Sampling scheme participant plot

Packages: dplyr, ggplot2, scales


The sampling scheme participant plot gives you a broad view of all the ‘valid’ beeps and/or answered beeps of the participants. It is a useful tool to identify patterns in answers, patterns in missingness, and other characteristics of participants’ answers, or simply to better understand the data.

As with the sampling scheme plot, you can integrate other types of information into this plot. For instance, you can add the type of day (i.e., weekend vs. weekday) and check whether the pattern of answers differs across participants (e.g., participants answering less on weekends).
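As an illustration of adding the type of day, here is a minimal sketch on toy data. It assumes a timestamp column named ‘sent’ (a hypothetical name, not defined in this topic) and introduces a ‘daytype’ variable that is mapped to the dot color. Adapt the column names to your own dataset.

```r
library(dplyr)
library(ggplot2)

# Toy data: 'sent' is a hypothetical timestamp column (Friday to Monday)
data <- tibble(
  id    = rep(1:2, each = 4),
  obsno = rep(1:4, times = 2),
  valid = c(1, 1, 0, 1, 1, 1, 1, 0),
  sent  = as.POSIXct("2024-03-08 09:00:00", tz = "UTC") +
    rep(0:3, times = 2) * 86400
)

# Flag weekends: in POSIXlt, wday is 0 for Sunday and 6 for Saturday
data <- data %>%
  mutate(daytype = if_else(as.POSIXlt(sent)$wday %in% c(0, 6),
                           "weekend", "weekday"))

# Same participant plot as at the beep level, colored by type of day
p <- data %>%
  filter(valid == 1) %>%
  ggplot(aes(x = obsno, y = factor(id), color = daytype)) +
  geom_point(size = 1.5)
```

Using the numeric ‘wday’ field rather than weekday names keeps the flag independent of the system locale.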

Time frames

The beeps can be displayed at different time frames, such as:

  • Beep level: each dot is a valid beep of the participant.
  • Day level: each dot is a day for which the participant has at least one valid beep.
  • Study duration: each dot is a valid beep sent to the participant on a continuous time scale.

Note that this section is similar to the sampling scheme plot section, except that here we filter the dataframe using the ‘valid’ variable.

Beep level

In this plot, each dot is a valid beep of the participant. The plot relies on the observation number (‘obsno’) variable (see the Create time variable topic). Importantly, you must ensure that the computation of this variable accounts for any missing beeps; otherwise, the plot will be misleading.

data %>%
    filter(valid == 1) %>%  # Keep valid beeps only
    ggplot(aes(x = obsno, y = factor(id))) +
        geom_point(size = 1.5) +
        theme(axis.text.x = element_text(angle = 90))  # Rotate x labels
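To make the point about missing beeps concrete, here is a hedged sketch of one way to build ‘obsno’ so that missing beeps leave gaps on the x axis. It assumes a fixed number of beeps per day (‘beeps_per_day’) and a within-day beep number (‘beepno’); both names are hypothetical and may differ from what the Create time variable topic uses.

```r
library(dplyr)

beeps_per_day <- 5  # Hypothetical: scheduled beeps per day

# Toy data: the participant is missing several beeps on both days
data <- tibble(
  id     = c(1, 1, 1, 1),
  daycum = c(1, 1, 2, 2),
  beepno = c(1, 3, 2, 5),
  valid  = 1
)

# Position of each beep in the full schedule, so gaps are preserved
data <- data %>%
  mutate(obsno = (daycum - 1) * beeps_per_day + beepno)
```

With a simple row count per participant, these four beeps would be numbered 1 to 4 and the missing beeps would disappear from the plot.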

Day level

Here, each dot is a day for which the participant has at least one valid beep. This plot relies on the day number (‘daycum’) variable (see the Create time variable topic). A color scale can be used to display the number of valid beeps in the day.

data %>%
    filter(valid == 1) %>%  # Keep valid beeps only
    group_by(dyad, id, daycum) %>%
    summarize(count = n(), .groups = "drop") %>%  # Number of valid beeps per day
    ggplot(aes(x = factor(daycum), y = factor(id))) +
        geom_point(aes(color = factor(count)), size = 3) +  # Add color scale
        theme(axis.text.x = element_text(angle = 90))

Study duration

In this plot, each dot is a valid beep sent to the participant on a continuous time scale, expressed in minutes, with 0 being the start of the participant's first day in the ESM study. To create this plot, we first need to compute the continuous time variable (‘continuoustime’), counted from the start of the first day for each participant.
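A minimal sketch of that computation is given below. It assumes a POSIXct timestamp column named ‘sent’ (a hypothetical name) and uses midnight of each participant's first day as time 0; your dataset may need a different reference point.

```r
library(dplyr)

# Toy data: 'sent' is a hypothetical timestamp column
data <- tibble(
  id   = c(1, 1, 2),
  sent = as.POSIXct(c("2024-03-08 09:30:00",
                      "2024-03-09 10:00:00",
                      "2024-03-10 12:00:00"), tz = "UTC")
)

data <- data %>%
  group_by(id) %>%
  mutate(
    # Midnight of the participant's first day
    day_start = as.POSIXct(trunc(min(sent), units = "days")),
    # Minutes elapsed since that reference point
    continuoustime = as.numeric(difftime(sent, day_start, units = "mins"))
  ) %>%
  ungroup()
```

Because time 0 is midnight, every multiple of 1440 minutes marks the end of a day, which is what the breaks in the plot code rely on.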

# Handle x labels and breaks
breaks_ = seq(0, 1000000, 1440)[-1]  # Create vector of end of day values (in minutes)
labels_ = paste0(1:(length(breaks_)), " day")  # Create day labels

# Values for vertical lines (lower than maximal continuous time value)
breaks_limit = breaks_[breaks_ < max(data$continuoustime)]

# Create plot
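data %>%
    filter(valid == 1) %>%  # Keep valid beeps only
    ggplot(aes(x = continuoustime, y = factor(id))) +
        geom_point(size = 1.2) +
        scale_x_continuous(breaks = breaks_ - 720, labels = labels_) +  # Labels at midday
        geom_vline(xintercept = breaks_limit)  # Vertical line at the end of each day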