Time sent beeps

Packages: dplyr, ggplot2, lubridate, hms, scales


When designing an ESM study, researchers schedule the beeps sent to participants according to a predefined sampling scheme. The sending itself is then usually handled by an application and/or a server. This setup phase is prone to human error, such as incorrect times being entered into the system. Moreover, applications and servers are not infallible: they can have bugs or be affected by network issues.

One basic check to perform is to verify that the beeps were sent at the expected times in a day, which can be done using plots.

Distributions

The easiest check is to plot the distribution of the times of day at which the beeps were sent. To do so, we use the ‘sent’ variable, which contains the time at which each beep was sent. From this variable, we extract either the hour, using the ‘hour()’ function from the lubridate package, or the exact time of day, using the ‘as_hms()’ function from the hms package. Both options discard the date and keep only the time information. Based on that time information, we can create two types of plots:

  • a histogram, using the ‘geom_histogram()’ function and the exact time of day.
  • a bar plot, using the ‘geom_bar()’ function. Here, the counts are limited to specific time intervals, such as hourly segments.
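To make the two extraction options concrete, here is a minimal sketch on a single made-up timestamp. The value and the names ‘ts’, ‘ts_hour’ and ‘ts_time’ are illustrative and not part of the data set; the sketch also assumes the timestamp is stored as a date-time in UTC.

```r
library(lubridate)

# Hypothetical timestamp, for illustration only
ts <- ymd_hms("2023-05-01 14:35:10", tz = "UTC")

# Option 1: keep only the hour of the day (0-23)
ts_hour <- hour(ts)          # 14

# Option 2: keep the exact time of day and drop the date
ts_time <- hms::as_hms(ts)   # 14:35:10
```

Note that ‘as_hms()’ does not convert time zones, so the result reflects the time zone attached to the date-time object.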

data %>%
    # Drop the date and keep only the time of day
    ggplot(aes(x=hms::as_hms(sent))) +
        geom_histogram(bins=100) +
        # One axis break every 2 hours, labelled as HH:MM
        scale_x_time(breaks = scales::date_breaks("2 hours"),
                     labels = function(x) strftime(x, format = "%H:%M"))

In the plot above, we can see that some beeps were sent during the night, that is, outside the defined sampling scheme.
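The bar-plot variant mentioned earlier can be sketched in the same spirit. The example below builds a tiny made-up data frame (‘example_data’ is a hypothetical stand-in for the real data set, assumed to have a date-time ‘sent’ column) and counts beeps per hour. Using ‘coord_cartesian()’ to show the full day is one possible choice, not the only one.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Hypothetical stand-in for the real data: one row per beep
example_data <- tibble(
    sent = ymd_hms(c("2023-05-01 09:02:11", "2023-05-01 13:30:45",
                     "2023-05-02 03:15:00", "2023-05-02 09:10:00"))
)

# Keep only the hour of the day (0-23)
hourly <- example_data %>%
    mutate(hour = hour(sent))

# One bar per hour; coord_cartesian() zooms to the full day
# without discarding any bars
p <- hourly %>%
    ggplot(aes(x = hour)) +
        geom_bar() +
        scale_x_continuous(breaks = seq(0, 22, 2)) +
        coord_cartesian(xlim = c(-0.5, 23.5))
p
```

In this toy data, the beep sent at 03:15 would stand out as a bar at hour 3, which is the kind of night-time sending the check is meant to reveal.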

Distribution by categorical variable

We can go further by splitting the previous plots according to a categorical variable using the ‘facet_wrap()’ function. This is particularly useful when your sampling scheme varies over the study period or when you want to investigate possible predictors. For example, if your sampling scheme differs depending on the day of the week, it may be useful to split the plot by day.

Warning

In the following plot, we split the bar plot according to the day of the week using the ‘wday()’ function from the lubridate package. Remember that, by default, ‘wday()’ returns values from 1=Sunday to 7=Saturday. If you want the week to start on Monday (i.e., 1=Monday), set the argument week_start as follows: ‘wday(x, week_start=1)’ (see lubridate section).

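data %>%
    # Extract the hour (0-23) and the day of the week (1 = Sunday by default)
    mutate(hour = hour(sent), day = wday(sent)) %>%
    ggplot(aes(x = hour)) +
        geom_bar(position = "dodge") +
        facet_wrap(.~day) +
        # Pad the limits by half a bar width so the bars at 0 and 23 are not dropped
        scale_x_continuous(limits=c(-0.5,23.5), breaks=seq(0,22,2))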
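As a small illustration of the warning above, the sketch below applies ‘wday()’ to a single known date (1 January 2024, a Monday). The variable names are illustrative, and the default result assumes that the ‘lubridate.week.start’ option has not been changed in your session.

```r
library(lubridate)

# 1 January 2024 was a Monday
x <- ymd("2024-01-01")

day_default <- wday(x)                  # 2: weeks start on Sunday by default
day_monday  <- wday(x, week_start = 1)  # 1: weeks start on Monday
day_label   <- wday(x, label = TRUE)    # ordered factor with the day name instead of a number
```

Using ‘label = TRUE’ in the ‘mutate()’ step is one way to get facet titles that show day names rather than numbers, which removes any ambiguity about where the week starts.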