Duration plot

Packages: dplyr, ggplot2, lubridate, ggalt


We call a duration plot a graphic that gives an overview of how long the ESM study lasted for each participant. It is an easy way to spot irregularities in participants' involvement, such as outliers (e.g., participants with a very short or very long study duration) or patterns in study length across participants (e.g., groups of participants with different study durations).

Warning

It is important to ensure that the timestamp variable used here (i.e., ‘sent’) is free of errors; otherwise the results might be misleading. Additionally, the following plots rely on the duration variable, which must be computed before running the chunks of R code below.
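As a reminder of what this prerequisite could look like, here is a minimal sketch that computes a per-participant duration in days. It assumes that duration is defined as the time between a participant's first and last ‘sent’ timestamp, which may differ from the definition used elsewhere in your pipeline; the toy_data object and its values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data standing in for the real ESM dataset
toy_data <- tibble(
  id = c(1, 1, 1, 2, 2),
  sent = as.POSIXct(c("2018-07-01 09:00:00", "2018-07-05 09:00:00",
                      "2018-07-15 09:00:00", "2018-08-01 10:00:00",
                      "2018-08-05 10:00:00"), tz = "UTC")
)

# Assumption: duration = days elapsed between the first and the last beep sent
toy_data <- toy_data %>%
  group_by(id) %>%
  mutate(duration = as.numeric(difftime(max(sent, na.rm = TRUE),
                                        min(sent, na.rm = TRUE),
                                        units = "days"))) %>%
  ungroup()
```

Because the value is repeated on every row of a participant, keeping a single row per participant is enough to plot it.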

In a first plot, we display the number of days over which each participant received beeps. It shows how long the data collection lasted for each participant, from their first to their last beep. We propose two display options, depending on whether the participants are ordered by their id number or by the duration of the study.

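library(dplyr)
library(ggplot2)
data %>% 
  group_by(id) %>% slice(1) %>% # Select the first row of each participant
  ungroup() %>%
  ggplot(aes(x=factor(id), y=duration)) +
    geom_col() +
    coord_flip() # Flip the axes to get one horizontal bar per participant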
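The second display option, with participants ordered by study duration rather than by id, could be sketched as follows. This is one possible approach using base R's reorder() on the id factor; the toy_duration object and its values are hypothetical and stand for a dataset with one row per participant.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: one row per participant with its duration in days
toy_duration <- tibble(id = 1:5, duration = c(14, 4, 10, 14, 7))

# reorder() sorts the factor levels of id by the value of duration
ordered_id <- reorder(factor(toy_duration$id), toy_duration$duration)

p <- toy_duration %>%
  ggplot(aes(x = reorder(factor(id), duration), y = duration)) +
    geom_col() +
    coord_flip() +
    labs(x = "Participant id (ordered by duration)", y = "Duration (days)")
p
```

Ordering by duration makes short-duration participants stand out at one end of the plot, whereas ordering by id makes it easier to look up a given participant.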

The plot shows that the study duration ranges from 4 to 14 days across participants. In particular, 18 participants have a study duration shorter than 14 days (the planned duration of the study). Such an irregularity would not be expected if all participants followed the same protocol.
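A count like the one reported here can also be checked numerically rather than read off the plot. The sketch below assumes a duration variable expressed in days and repeated on each row of a participant; toy_data and planned_days are hypothetical names, and the values are made up for illustration.

```r
library(dplyr)

planned_days <- 14  # Planned study duration, as stated in the text

# Hypothetical data: duration (in days) repeated on each row of a participant
toy_data <- tibble(
  id = c(1, 1, 2, 2, 3, 3),
  duration = c(14, 14, 4, 4, 10, 10)
)

# Keep one row per participant, then retain those below the planned duration
short_participants <- toy_data %>%
  distinct(id, duration) %>%
  filter(duration < planned_days)

nrow(short_participants)  # Number of participants with a shorter study duration
```

The resulting table lists the ids concerned, which is a convenient starting point for checking whether these participants dropped out or followed a different protocol.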

A second plot provides a broader view. It helps investigate the period of the year during which the surveys were sent to the participants. In addition to showing whether the data collection periods align with expectations, this plot puts the timing of participants’ responses in context with respect to specific events (e.g., holidays, exams, or national events).

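library(dplyr)
library(ggplot2)
library(ggalt)
data %>% group_by(id) %>%
  # First and last beep sent to each participant
  summarise(min_sent = min(sent, na.rm=TRUE), max_sent = max(sent, na.rm=TRUE)) %>%
  ggplot(aes(x=min_sent, xend=max_sent, y=factor(id), group=factor(id))) + 
    # One horizontal segment per participant, from first to last beep
    geom_dumbbell(color="black", 
                  size=0.75, 
                  point.colour.l="black") +
    # Monthly axis breaks, labelled with the month name and two-digit year
    scale_x_datetime(date_breaks="1 month", date_labels = "%B %y") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))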

Apart from a few early-bird and late-comer participants, the majority of the participants were involved in the study between July 2018 and December 2018. It might be worth investigating why these early-bird and late-comer participants took part outside the main period.
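To follow up on such participants, one option is to list those whose first or last beep falls outside the main data collection window. The sketch below is illustrative only: the window bounds are taken from the period mentioned above (July to December 2018), while toy_data, window_start, window_end, and the timestamps are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: participant 1 starts early, participant 3 starts late
toy_data <- tibble(
  id = c(1, 1, 2, 2, 3, 3),
  sent = as.POSIXct(c("2018-03-02 09:00:00", "2018-03-16 09:00:00",
                      "2018-07-10 09:00:00", "2018-07-24 09:00:00",
                      "2019-01-05 09:00:00", "2019-01-19 09:00:00"), tz = "UTC")
)

# Assumed main window: 1 July 2018 (included) to 1 January 2019 (excluded)
window_start <- as.POSIXct("2018-07-01 00:00:00", tz = "UTC")
window_end   <- as.POSIXct("2019-01-01 00:00:00", tz = "UTC")

# Participants with at least one bound of their study period outside the window
outside_window <- toy_data %>%
  group_by(id) %>%
  summarise(min_sent = min(sent, na.rm = TRUE),
            max_sent = max(sent, na.rm = TRUE)) %>%
  filter(min_sent < window_start | max_sent >= window_end)

outside_window
```

The choice of window is a judgment call and should be adapted to the planned recruitment period of the study.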