ESM Preprocessing Gallery – delay_2_sent

Delay between two consecutive beeps

Packages: dplyr, lubridate, ggplot2

The delay between two consecutive sent beeps is an important piece of information. Depending on the sampling scheme, this delay may be more or less fixed: the distribution of delays is likely to differ between random-contingent and fixed-contingent sampling schemes. However, unexpected delay values are common. Typical issues are:

a negative delay (i.e., a negative value), which indicates a problem in the order of the observations or in the recording of the timestamps.
a very long delay (i.e., a high value), which suggests that some beeps were not recorded or that the sending of a beep was delayed.

One way to investigate the delays is to plot their distribution. First, we compute the delay between two consecutive observations within each participant. When computing it, you can set the unit of the delays to ‘secs’, ‘mins’, ‘hours’, ‘days’, or ‘weeks’. Below, we use ‘hours’. Additionally, we can choose whether to take into account the nested structure of time:

Delays between two consecutive sent beeps across the whole study. This method particularly suits daily questionnaires.
Delays between two consecutive sent beeps within days. Here, the delay between the last beep of one day and the first beep of the next day is ignored.

Before computing the delay, it is important to check that the dataframe is sorted by participant (‘id’) and, within each participant, by the variable ‘sent’. If it is not, you can use the ‘arrange’ function from the dplyr package as follows: ‘data = arrange(data, id, sent)’.
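A quick way to run this check is sketched below. It is a minimal example on a hypothetical toy dataframe (‘toy’, with the same ‘id’ and ‘sent’ columns as above), not the only way to verify the ordering; it assumes that ‘sent’ is stored as a date-time (POSIXct).

```r
library(dplyr)

# Hypothetical toy data: the last two rows of participant 1 are out of order
toy = data.frame(
    id = c(1, 1, 1, 2, 2),
    sent = as.POSIXct(c("2023-05-01 09:00:00", "2023-05-01 13:00:00",
                        "2023-05-01 11:00:00", "2023-05-01 09:30:00",
                        "2023-05-01 12:30:00"), tz = "UTC")
)

# For each participant, TRUE if at least one timestamp is earlier than the previous one
order_check = toy %>%
    group_by(id) %>%
    summarise(unsorted = any(diff(as.numeric(sent)) < 0))

# Sort within participant, as suggested above
toy_sorted = arrange(toy, id, sent)
```

Once the ordering is confirmed (or fixed), the delay can be computed: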

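library(dplyr)
# Delay (in hours) from each sent beep to the next one, within participant;
# the last beep of each participant gets NA
df_int = data %>%
    group_by(id) %>%
    mutate(delay = difftime(as.POSIXct(lead(sent)), as.POSIXct(sent), units="hours"))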
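The code above computes the delays across the whole study. The within-day variant mentioned earlier could be sketched as follows. This is a minimal example on a hypothetical toy dataframe (‘toy’); the ‘day’ variable is introduced here for illustration, and the day boundary is assumed to be midnight in the UTC time zone, which you may need to adapt to your study.

```r
library(dplyr)

# Hypothetical toy data: one participant, two beeps per day over two days
toy = data.frame(
    id = c(1, 1, 1, 1),
    sent = as.POSIXct(c("2023-05-01 09:00:00", "2023-05-01 12:00:00",
                        "2023-05-02 10:00:00", "2023-05-02 14:00:00"), tz = "UTC")
)

# Group by participant and calendar day so that lead() never crosses midnight:
# the last beep of each day gets NA instead of an overnight delay
toy_within = toy %>%
    arrange(id, sent) %>%
    group_by(id, day = as.Date(sent, tz = "UTC")) %>%
    mutate(delay = difftime(lead(sent), sent, units = "hours")) %>%
    ungroup()
```

With this grouping, the overnight gap between the last beep of one day and the first beep of the next no longer appears in the distribution.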

Now, we can create a histogram to plot this variable:

library(ggplot2)
df_int %>%
    ggplot(aes(x = delay)) +
    geom_histogram(bins=100) +
    labs(x = "Delay between consecutive sent beeps (hours)")

From the plot above, we can check whether the distribution follows the expected pattern. In the case of a fixed-contingent design, we expect a very narrow distribution of delays between two consecutive beeps. In the case of a signal-contingent design, we expect the distribution to take a triangular shape (see de Haan-Rietdijk et al., 2017).
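Besides the visual check, the two common issues listed at the top (negative and very long delays) can be extracted directly. The sketch below is a minimal example on a hypothetical toy dataframe (‘toy’); the 24-hour cut-off (‘max_delay’) and the ‘issue’ labels are assumptions made for illustration and should be adapted to your sampling scheme.

```r
library(dplyr)

# Hypothetical toy data: participant 1 has one row out of order and one long gap
toy = data.frame(
    id = c(1, 1, 1, 1, 2, 2),
    sent = as.POSIXct(c("2023-05-01 09:00:00", "2023-05-01 12:00:00",
                        "2023-05-01 11:00:00", "2023-05-03 11:00:00",
                        "2023-05-01 09:30:00", "2023-05-01 12:30:00"), tz = "UTC")
)

max_delay = 24  # hypothetical cut-off in hours

# Label each delay; the last beep of each participant has an NA delay
# and falls through to "ok", so it is not flagged
flagged = toy %>%
    group_by(id) %>%
    mutate(delay = as.numeric(difftime(lead(sent), sent, units = "hours")),
           issue = case_when(delay < 0 ~ "negative",
                             delay > max_delay ~ "very long",
                             TRUE ~ "ok")) %>%
    ungroup() %>%
    filter(issue != "ok")
```

The resulting rows can then be inspected one by one to decide whether they reflect a sorting problem, a timestamp recording problem, or missing beeps.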

Going a step further, you might consider incorporating covariates into your plot, such as the day of the week or the month. This is particularly relevant if you programmed variations in the sampling scheme based on these temporal factors, for example, different sampling schemes for different days of the week. To do so, you can use the facet_wrap function in ggplot2 to create a separate plot for each level of the covariate. For example, the code below creates a separate density plot for each type of day (weekday vs. weekend day), which is derived from ‘sent’ with the wday function from lubridate. Here, we use density plots instead of histograms because we expect the number of observations to differ between weekdays and weekend days. However, if the delay is independent of the type of day, we expect the shape of the distribution of the delays to remain the same.

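library(lubridate)
df_int %>%
    # With week_start=1, wday() codes Monday as 1, so 6 and 7 are Saturday and Sunday
    mutate(weekday = ifelse(wday(sent, week_start=1) %in% c(6,7), "weekend", "weekday")) %>%
    ggplot(aes(x = delay)) +
    # geom_histogram(bins=100) +
    geom_density() +
    facet_wrap(~weekday)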

Above, we can see that the density distribution of the delay between two consecutive beeps looks similar between weekdays and weekend days.
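To complement the visual comparison, the delays can also be summarised numerically per type of day. The sketch below is a minimal example on a hypothetical toy dataframe (‘toy’); the choice of summary statistics (number of delays and median) is an assumption made for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: two beeps on a Friday and two on a Saturday
toy = data.frame(
    id = c(1, 1, 1, 1),
    sent = as.POSIXct(c("2023-05-05 09:00:00", "2023-05-05 12:00:00",
                        "2023-05-06 10:00:00", "2023-05-06 14:00:00"), tz = "UTC")
)

# Delays across the study, then one summary row per type of day
summary_by_day = toy %>%
    group_by(id) %>%
    mutate(delay = as.numeric(difftime(lead(sent), sent, units = "hours"))) %>%
    ungroup() %>%
    mutate(weekday = ifelse(wday(sent, week_start = 1) %in% c(6, 7), "weekend", "weekday")) %>%
    group_by(weekday) %>%
    summarise(n = sum(!is.na(delay)),
              median_delay = median(delay, na.rm = TRUE))
```

Similar medians across the two types of day would be in line with what the density plots suggest.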

References

de Haan-Rietdijk S, Voelkle MC, Keijsers L and Hamaker EL (2017) Discrete- vs. Continuous-Time Modeling of Unequally Spaced Experience Sampling Method Data. Front. Psychol. 8:1849. doi: 10.3389/fpsyg.2017.01849