Lag scores
Packages: dplyr
When analyzing dynamic systems, it is often important to consider the relationship between a variable and its past or future values. In time-series analysis, this is done with lagged or leading values of the variable. The ‘lag()’ and ‘lead()’ functions are commonly used in R for this purpose. Importantly, when a data frame contains multiple participants, the lag must be computed for each participant independently. Otherwise, the last value of one participant would be carried over to the first observation of the next participant (and the reverse for a lead). In addition, the data frame must be ordered by participant and by observation number before the lag is computed.
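The ‘lag()’ function allows us to access a previous value of a variable. There are two simple ways to compute it per participant: with ‘ave()’ from base R, as shown here, or with a dplyr pipeline (‘group_by()’ followed by ‘mutate()’), as used further down for a lag of 2.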
data = data[order(data$id, data$obsno), ] # Order the dataframe
data$PA1_lag = ave(data$PA1, data$id, FUN = function(x) lag(x))
The ‘lead()’ function allows you to access a future value of a variable.
data = data[order(data$id, data$obsno), ] # Order the dataframe
data$PA1_lead = ave(data$PA1, data$id, FUN = function(x) lead(x))
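To see why the per-participant computation matters, here is a minimal sketch on a made-up data frame. The object `toy` and its values are hypothetical; only the column names `id`, `obsno` and `PA1` follow the text. It compares the lag and lead computed within each participant with a naive lag over the whole column.

```r
library(dplyr)

# Hypothetical toy data: two participants, three observations each
toy <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  obsno = c(1L, 2L, 3L, 1L, 2L, 3L),
  PA1   = c(10L, 20L, 30L, 40L, 50L, 60L)
)

# Order by participant and observation number before lagging
toy <- toy[order(toy$id, toy$obsno), ]

# Lag and lead computed within each participant
toy$PA1_lag  <- ave(toy$PA1, toy$id, FUN = function(x) dplyr::lag(x))
toy$PA1_lead <- ave(toy$PA1, toy$id, FUN = function(x) dplyr::lead(x))

# Naive lag over the whole column, ignoring participants
toy$PA1_lag_naive <- dplyr::lag(toy$PA1)

print(toy)
```

In the first row of participant 2, `PA1_lag` is missing, whereas `PA1_lag_naive` contains the last value of participant 1. The explicit `dplyr::` prefix is a precaution in this sketch, because base R also ships a `lag()` function (in the stats package) that behaves differently.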
You can go further into the past (lag) or the future (lead) by specifying the extent of the lag (the ‘k’ argument in the base R function and the ‘n’ argument in the dplyr function). Here, we create a lag of 2:
data = data %>%
  arrange(id, obsno) %>%
  group_by(id) %>%
  mutate(PA1_lag2 = lag(PA1, n=2))
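The same pipeline can be tried on a small made-up data frame to check what a lag or lead of 2 returns. This is only a sketch: `toy`, its values, and the `PA1_lead2` column are hypothetical and not part of the original data.

```r
library(dplyr)

# Hypothetical toy data: two participants, four observations each
toy <- data.frame(
  id    = c(1, 1, 1, 1, 2, 2, 2, 2),
  obsno = rep(1:4, times = 2),
  PA1   = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L)
)

toy <- toy %>%
  arrange(id, obsno) %>%      # make sure rows are in time order
  group_by(id) %>%            # lag/lead restart for each participant
  mutate(
    PA1_lag2  = lag(PA1, n = 2),   # value two observations earlier
    PA1_lead2 = lead(PA1, n = 2)   # value two observations later
  ) %>%
  ungroup()                   # drop the grouping once the lags are computed

print(toy)
```

The first two rows of each participant have a missing `PA1_lag2`, and the last two rows have a missing `PA1_lead2`. The final `ungroup()` is a design choice in this sketch: without it, the result stays grouped by `id`, which affects later dplyr operations on the same object.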
Here is a subset of the new variables:
# A tibble: 6 × 5
# Groups:   id [1]
     id obsno PA1_lag PA1_lead PA1_lag2
  <dbl> <int>   <int>    <int>    <int>
1     1     1      NA       NA       NA
2     1     2      NA       NA       NA
3     1     3      NA        1       NA
4     1     4      NA       NA       NA
5     1     5       1       NA       NA
6     1     6      NA        1        1
Night issue: unequal time interval
When an ESM study has multiple beeps a day, there are often good reasons not to pair the last beep of one day with the first beep of the following day, because the overnight interval is much longer than the intervals within a day. The same applies, in the opposite direction, to the ‘lead()’ function. We present two methods to restrict the ‘lag()’ and the ‘lead()’ output to stay within a day. For both, the day can be identified either with ‘as.Date()’ or with the day number variable.
data$PA1_lag = rep(NA, nrow(data))
# Loop over the participants
for (i in unique(data$id)){
  day_id = unique(data$daycum[data$id==i])
  # Loop over the days of the participant
  for (day in day_id){
    # Rows belonging to this participant on this day
    position = data$daycum==day & data$id==i
    # Index the columns as vectors so that this works for data frames and tibbles alike
    data$PA1_lag[position] = lag(data$PA1[position])
  }
}
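The same restriction can also be sketched without an explicit loop, by grouping on both the participant and the day. This is one possible dplyr formulation, shown on hypothetical toy data (`toy` and its values are made up; `daycum` is assumed to be the cumulative day number used in the loop above).

```r
library(dplyr)

# Hypothetical toy data: one participant, two days, three beeps per day
toy <- data.frame(
  id     = rep(1, 6),
  daycum = rep(1:2, each = 3),
  obsno  = 1:6,
  PA1    = c(3L, 5L, 4L, 2L, 6L, 1L)
)

toy <- toy %>%
  arrange(id, obsno) %>%        # rows in time order within participant
  group_by(id, daycum) %>%      # lag/lead restart for every participant-day
  mutate(
    PA1_lag  = lag(PA1),        # previous beep of the same day
    PA1_lead = lead(PA1)        # next beep of the same day
  ) %>%
  ungroup()

print(toy)
```

The first beep of each day has a missing `PA1_lag` and the last beep of each day has a missing `PA1_lead`, so no value crosses the night. If the data contain a timestamp rather than a day number, a day variable could first be derived with ‘as.Date()’ and used in `group_by()` in place of `daycum`; the name of that timestamp column depends on the dataset.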