Check sampling scheme
The following set of R codes helps to check that the sampling scheme have been well followed by the application/server in charge of sending the beeps to the participant. The general idea is to check that the beeps have been sent in the rigth time interval. To do so, the function need to be adapted to the sampling scheme in hand.
The dificulty to adapt such a function will be determined by the type of sampling scheme:
- Time-contingent sampling (a)
- Interval-contingent sampling (a)
- Event-contingent sampling (b)
- Burst-contingent sampling (c)
- Signal-contingent sampling
- etc.
In addition, the sampling can remain the same (i.e., consistent) or change (i.e., unconsistent) over the study period. Nonetheless, it can be a difficult task especially when the sampling in unconsistent and/or when the sampling is a mix of event-based and interval based sampling.
(a) Check time and interval-contingent sampling
Packages: dplyr, ???
Because both time and interval-contingent sampling sent beeps in an interval of time. They can be checked with the same method. The steps are the following:
- Recreate the sampling scheme based on study information
- Extract the sampling scheme from the true data
- Compare both and extract the results
- Investigate the results
Recreate sampling scheme
Here, you will need to change the following variables:
- duration_study: the number of day the study last
- week_day: the week day numbers the beeps could be sent. We are using a string and separated the day number by a comma. Remind that in R: 1=Sunday and 7=Saturday.
- obsnoday: number of beeps in a day
- sched_sent: the starting time in a day of the periods when the beeps could be sent. We recommend to set up the time one before the actual time. Indeed, because server timing, the beeps could be few seconds before the actual time.
- sched_int_sent (in minutes): the period of time
- sched_int_start (in minutes): the period of time in which the participant can start the survey.
- int_end (in minutes): the period of time in which the participant can finish the survey the survey.
If your sampling scheme is inconsistent over the study duration, you can specify 2+ sampling scheme with the following line of codes and then merge them in a unique dataframe.
= 14
duration_study = "1,2,3,4,5,6,7"
week_day = 5
obsnoday = duration_study * obsnoday
obsno_max
= hms::as_hms(c("09:59:00", "10:59:00", "11:59:00", "12:59:00", "13:59:00"))
sched_sent = 11
int_sent = hms::as_hms(sched_sent + int_sent * 60)
sched_int_sent = 21
int_start = hms::as_hms(sched_sent + int_start * 20)
sched_int_start = 60
int_end = hms::as_hms(sched_sent + int_end * 60)
sched_int_end
= expand.grid(daycum=c(1:duration_study), obsnoday=c(1:obsnoday), week_day=week_day)
df_test = data.frame(obsnoday=1:obsnoday, sched_sent=sched_sent, sched_int_sent=sched_int_sent, sched_int_start=sched_int_start, sched_int_end=sched_int_end)
df_scheme_day = left_join(df_test, df_scheme_day, by=c("obsnoday")) %>%
df_test arrange(daycum, obsnoday) %>%
mutate(obsno = 1:n())
Extract sampling from the dataframe
To extract the sampling scheme, we need the following variables: id number, daycum, obsnoday, sent time, start time and end time.
# Extract sampling scheme
= data.frame(id = data$id,
df_sampling daycum = as.numeric(data$daycum), obsnoday=data$obsnoday,
sent_date = data$sent, sent_time=hms::as_hms(data$sent), start_time=hms::as_hms(data$start), end_time=hms::as_hms(data$end))
# Compute obsno
= df_sampling %>%
df_sampling arrange(id,daycum,obsnoday) %>%
group_by(id) %>%
mutate(obsno=1:n())
Compare sampling schemes
To check the consistency between the two sampling scheme, we will test if a observation in the true sampling scheme matched with a beep in the recreated sampling scheme. We can considere different outputs:
- beep_match_ok: if it matched a beep from the recreated sampling scheme
- beep_matched: the obsno of the matched beep
- sent_time_ok: if the sent time was in the interval defined by the matched beep.
- start_time_ok: if the start time was in the interval defined by the matched beep.
- end_time_ok: if the end time was in the interval defined by the matched beep.
We need to base the comparison on variables. We have two choices:
- day number and sent time variables
- day number and obsnoday variables
Choose one of the two methods in function of your dataset properties, knowing that the variables use as basis must be without mistakes!
= as.data.frame(df_sampling)
df_sampling_1 $beep_match_ok = 0
df_sampling_1$beep_matched = 0
df_sampling_1$sent_time_ok = 0
df_sampling_1$start_time_ok = 0
df_sampling_1$end_time_ok = 0
df_sampling_1
for(row in 1:nrow(df_sampling_1)){
= df_sampling_1[row, "daycum"]
day_cum # obsnoday = df_sampling_1[row, "obsnoday"]
= df_test$daycum == day_cum
daycum_test = which(daycum_test & df_test$sched_sent < as_hms(df_sampling_1[row, "sent_date"]) & df_test$sched_int_sent > as_hms(df_sampling_1[row, "sent_date"]))
pos
if (length(pos) == 1){
"beep_match_ok"] = 1
df_sampling_1[row, "beep_matched"] = df_test[pos, "obsno"]
df_sampling_1[row,
"sent_time_ok"] = ifelse(df_sampling_1[row,"sent_time"] >= df_test[pos,"sched_sent"] & df_sampling_1[row,"sent_time"] <= df_test[pos,"sched_int_sent"],
df_sampling_1[row,1, 0)
"start_time_ok"] = ifelse(df_sampling_1[row,"start_time"] >= df_test[pos,"sched_sent"] & df_sampling_1[row,"start_time"] <= df_test[pos,"sched_int_end"],
df_sampling_1[row,1, 0)
"end_time_ok"] = ifelse(df_sampling_1[row,"end_time"] >= df_test[pos, "sched_sent"] & df_sampling_1[row,"end_time"] <= df_test[pos, "sched_int_end"],
df_sampling_1[row,1, 0)
} }
Checking sampling
Now that the comparison have ran, we can analysis the different outputs.
First we will look for beep in the recreated sampling scheme that have not matched any observation in the true sampling scheme.
# Extract which obsno have not been sent(?)
= split(df_sampling_2$beep_matched, df_sampling_2$id)
obsno_split = lapply(obsno_split, function(x) c(1:obsno_max)[!c(1:obsno_max) %in% x])
obsno_missed = data.frame()
df_missed for (i in 1:length(obsno_missed)){
= names(obsno_split)[i]
id_ = which(df_test$obsno %in% obsno_missed[[i]])
pos if (length(pos) > 0){
= cbind(data.frame(id=id_), df_test[pos,])
new_df = rbind(df_missed, new_df)
df_missed
}
}
df_missed
data frame with 0 columns and 0 rows
Beeps not matched in the sampling scheme
= which(is.na(df_sampling_2$beep_matched))
beep_not_matched data[beep_not_matched,]
# A tibble: 0 × 20
# Groups: id, daycum [0]
# ℹ 20 variables: dyad <dbl>, role <int>, obsno <int>, id <dbl>, age <int>,
# cond_dyad <chr>, scheduled <dttm>, sent <dttm>, start <dttm>, end <dttm>,
# contact <int>, PA1 <int>, PA2 <int>, PA3 <int>, NA1 <int>, NA2 <int>,
# NA3 <int>, location <chr>, daycum <drtn>, obsnoday <int>
Beeps that have matched but sent outside of the time interval
= which(is.na(df_sampling_2$sent_time_ok))
sent_out data[sent_out,]
# A tibble: 0 × 20
# Groups: id, daycum [0]
# ℹ 20 variables: dyad <dbl>, role <int>, obsno <int>, id <dbl>, age <int>,
# cond_dyad <chr>, scheduled <dttm>, sent <dttm>, start <dttm>, end <dttm>,
# contact <int>, PA1 <int>, PA2 <int>, PA3 <int>, NA1 <int>, NA2 <int>,
# NA3 <int>, location <chr>, daycum <drtn>, obsnoday <int>
Beeps that have matched but start outside of the time interval
= which(is.na(df_sampling_2$start_time_ok))
start_out data[start_out,]
# A tibble: 521 × 20
# Groups: id, daycum [253]
dyad role obsno id age cond_dyad scheduled
<dbl> <int> <int> <dbl> <int> <chr> <dttm>
1 1 1 1 1 40 condB 2018-10-17 08:00:08
2 1 1 2 1 40 condB 2018-10-17 09:00:01
3 1 1 3 1 40 condB 2018-10-17 09:59:56
4 1 1 5 1 40 condB 2018-10-17 12:00:12
5 1 1 6 1 40 condB 2018-10-18 07:59:47
6 1 1 10 1 40 condB 2018-10-18 12:00:06
7 1 1 11 1 40 condB 2018-10-19 07:59:40
8 1 1 16 1 40 condB 2018-10-20 08:00:04
9 1 1 17 1 40 condB 2018-10-20 09:00:13
10 1 1 18 1 40 condB 2018-10-20 10:00:13
# ℹ 511 more rows
# ℹ 13 more variables: sent <dttm>, start <dttm>, end <dttm>, contact <int>,
# PA1 <int>, PA2 <int>, PA3 <int>, NA1 <int>, NA2 <int>, NA3 <int>,
# location <chr>, daycum <drtn>, obsnoday <int>
Beeps that have matched but end outside of the time interval
= which(is.na(df_sampling_2$end_time_ok))
end_out data[end_out,]
# A tibble: 521 × 20
# Groups: id, daycum [253]
dyad role obsno id age cond_dyad scheduled
<dbl> <int> <int> <dbl> <int> <chr> <dttm>
1 1 1 1 1 40 condB 2018-10-17 08:00:08
2 1 1 2 1 40 condB 2018-10-17 09:00:01
3 1 1 3 1 40 condB 2018-10-17 09:59:56
4 1 1 5 1 40 condB 2018-10-17 12:00:12
5 1 1 6 1 40 condB 2018-10-18 07:59:47
6 1 1 10 1 40 condB 2018-10-18 12:00:06
7 1 1 11 1 40 condB 2018-10-19 07:59:40
8 1 1 16 1 40 condB 2018-10-20 08:00:04
9 1 1 17 1 40 condB 2018-10-20 09:00:13
10 1 1 18 1 40 condB 2018-10-20 10:00:13
# ℹ 511 more rows
# ℹ 13 more variables: sent <dttm>, start <dttm>, end <dttm>, contact <int>,
# PA1 <int>, PA2 <int>, PA3 <int>, NA1 <int>, NA2 <int>, NA3 <int>,
# location <chr>, daycum <drtn>, obsnoday <int>
We can have a broad view with the a summary plot:
= df_sampling_2 %>%
df_sampling_2_ gather(type, ok, beep_match_ok, sent_time_ok, start_time_ok, end_time_ok)
%>%
df_sampling_2_ ggplot(aes(y=obsno, x=type, fill=as.logical(ok))) +
geom_tile() +
facet_grid(id~., space="free_x", scales="free_x") +
theme(strip.placement = "outside",
strip.background = element_rect(fill=NA,colour="grey50"),
panel.spacing=unit(0,"cm"),
axis.ticks.y = element_blank(),
axis.text.y = element_blank())
# scale_x_discrete(guide = guide_axis_nested(delim = "&")) +
# theme(ggh4x.axis.nesttext.y = element_blank())
(b) Check event-contingent sampling
IN DEVELOPMENT
(c) Check burst-contingent sampling
IN DEVELOPMENT
If the burst are trigered on event, then you can refer to the previous part to check it. Here, we will focus especially on checking if the follow-ups beeps have followed the defined sampling scheme.