ESM Preprocessing Gallery – common

Common scores

Packages: dplyr

Descriptive scores are a fundamental part of analyzing data from experience sampling method (ESM) studies. They summarize the distribution of responses and help identify patterns or trends in the data.

‘summarize()’ and ‘mutate()’ functions

The ‘summarize()’ and ‘mutate()’ functions from the dplyr package provide a convenient way to compute descriptive statistics such as means, standard deviations, and counts, either over a whole data frame or separately for each level of one or more grouping variables. In particular:

summarize(): calculates summary statistics for each group in the data frame and returns one row per group. Below, we group by ‘id’ and compute the mean of the ‘PA1’ variable for each participant, so the output has one row per participant.

data %>% 
    group_by(id) %>%
    summarize(mean_PA1 = mean(PA1, na.rm = TRUE)) %>%
    head()

# A tibble: 6 × 2
     id mean_PA1
  <dbl>    <dbl>
1     1     5.46
2     2    18.5 
3     3    20.3 
4     4    24.4 
5     5    10.8 
6     6     3.39

mutate(): creates new variables from existing ones. It keeps every row of the data frame and only adds a new column. Below, we group by ‘id’ and compute the mean of the ‘PA1’ variable for each participant. Unlike ‘summarize()’, which reduces the data to one summary value per group, ‘mutate()’ retains all rows and all other variables, and repeats the participant mean on each of that participant's rows.

data %>% 
    group_by(id) %>%
    mutate(mean_PA1 = mean(PA1, na.rm = TRUE)) %>%
    head()

# A tibble: 6 × 10
# Groups:   id [1]
     id daycum obsno   PA1   PA2   PA3   NA1   NA2   NA3 mean_PA1
  <dbl> <drtn> <int> <int> <int> <int> <int> <int> <int>    <dbl>
1     1 1 days     1    NA    NA    NA    NA    NA    NA     5.46
2     1 1 days     2    NA    NA    NA    NA    NA    NA     5.46
3     1 1 days     3    NA    NA    NA    NA    NA    NA     5.46
4     1 1 days     4     1    11    25    10    16    28     5.46
5     1 1 days     5    NA    NA    NA    NA    NA    NA     5.46
6     1 2 days     6    NA    NA    NA    NA    NA    NA     5.46
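
To make the difference in output shape concrete, here is a minimal sketch on a small made-up data frame. The object 'toy' and its values are hypothetical and not part of the ESM dataset used above; only the column names 'id' and 'PA1' follow the text.

```r
library(dplyr)

# Hypothetical toy data: 2 participants, 3 observations each
toy <- data.frame(
  id  = c(1, 1, 1, 2, 2, 2),
  PA1 = c(10, NA, 20, 5, 7, 9)
)

# summarize(): collapses to one row per participant
toy_summary <- toy %>%
  group_by(id) %>%
  summarize(mean_PA1 = mean(PA1, na.rm = TRUE))

# mutate(): keeps all six rows and repeats the participant mean on each row
toy_mutate <- toy %>%
  group_by(id) %>%
  mutate(mean_PA1 = mean(PA1, na.rm = TRUE)) %>%
  ungroup()  # drop the grouping so later steps are not computed per id

nrow(toy_summary)  # 2
nrow(toy_mutate)   # 6
```

Calling ungroup() after mutate() is a design choice in this sketch: it prevents later operations from silently running per participant.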

Descriptive statistics

In this section, we will go over some commonly used descriptive measures in ESM research and show how to compute them using R.

In particular, an important decision is how to handle missing values. With na.rm=TRUE, the function removes the missing values before computing the statistic (see check computation). With na.rm=FALSE (the default for base R functions such as mean(), sum(), and sd()), the missing values are kept and the function returns NA whenever at least one NA enters the computation.
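
The effect of na.rm can be checked on a short vector. This is a minimal sketch with made-up values; the object names are hypothetical. It also shows an edge case to be aware of: when every value is missing, na.rm=TRUE leaves nothing to compute on.

```r
x <- c(4, NA, 6)   # hypothetical responses with one missing value

mean_keep <- mean(x)                # na.rm = FALSE by default: result is NA
mean_drop <- mean(x, na.rm = TRUE)  # mean of 4 and 6: result is 5

# If all values are missing, na.rm = TRUE gives NaN for mean() and 0 for sum()
all_missing <- c(NA_real_, NA_real_)
mean_all_missing <- mean(all_missing, na.rm = TRUE)
sum_all_missing  <- sum(all_missing, na.rm = TRUE)
```

The last case matters for sum scores in particular: a row with no answers at all gets a score of 0 rather than NA, which may not be what you want.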

Here are examples of descriptive scores that are often computed. We propose two methods for each: one with base R functions and one with dplyr functions.

Compute mean

data$PA = apply(data[,c("PA1","PA2","PA3")], 1, mean)
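
A possible dplyr counterpart of the row-wise mean is sketched below, assuming dplyr 1.0 or later (for across()). The data frame 'toy' is hypothetical; only the item names 'PA1', 'PA2', and 'PA3' come from the text.

```r
library(dplyr)

# Hypothetical toy data with one missing item in the third row
toy <- data.frame(
  PA1 = c(1, 4, NA),
  PA2 = c(2, 5, 8),
  PA3 = c(3, 6, 9)
)

toy <- toy %>%
  mutate(
    # NA as soon as one item is missing (same behaviour as apply(..., 1, mean))
    PA      = rowMeans(across(c(PA1, PA2, PA3))),
    # mean of the available items only
    PA_narm = rowMeans(across(c(PA1, PA2, PA3)), na.rm = TRUE)
  )
```

Here rowMeans() is vectorised over rows, so no rowwise() step is needed. Whether to use na.rm = TRUE depends on whether a scale score based on fewer items is acceptable in your study.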

Compute weighted mean

weight = c(1, .3, .7)
# weighted.mean() divides the weighted sum by the sum of the weights
data$PA = apply(data[,c("PA1","PA2","PA3")], 1, function(x) weighted.mean(x, w = weight))
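
A weighted mean is the weighted sum divided by the sum of the weights, not by the number of items. The sketch below works this out on made-up values and shows one possible dplyr version using rowwise() and c_across(). The data frame 'toy' is hypothetical; the weights are the ones used in the text.

```r
library(dplyr)

weight <- c(1, .3, .7)

# Hypothetical toy data
toy <- data.frame(PA1 = c(10, 2), PA2 = c(20, 2), PA3 = c(30, 2))

# Row 1 by hand: (10*1 + 20*0.3 + 30*0.7) / (1 + 0.3 + 0.7) = 37 / 2 = 18.5
by_hand <- sum(c(10, 20, 30) * weight) / sum(weight)

toy <- toy %>%
  rowwise() %>%   # evaluate the next mutate() one row at a time
  mutate(PA = weighted.mean(c_across(c(PA1, PA2, PA3)), w = weight)) %>%
  ungroup()
```

A quick sanity check is the second row: when all items have the same value, the weighted mean must equal that value.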

Compute sum

data$PA = apply(data[,c("PA1","PA2","PA3")], 1, sum)

Compute standard deviation

data$PA = apply(data[,c("PA1","PA2","PA3")], 1, sd)

Compute max/min

# use min instead of max to get the row-wise minimum
data$PA = apply(data[,c("PA1","PA2","PA3")], 1, max)
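
The sum, standard deviation, maximum, and minimum can also be computed with dplyr. The following is a minimal sketch using rowwise() and c_across(), assuming dplyr 1.0 or later. The data frame 'toy' and the output column names ('PA_sum', 'PA_sd', 'PA_max', 'PA_min') are hypothetical.

```r
library(dplyr)

# Hypothetical toy data
toy <- data.frame(PA1 = c(1, 4), PA2 = c(2, 4), PA3 = c(6, 4))

toy <- toy %>%
  rowwise() %>%   # compute each score within a row, across the three items
  mutate(
    PA_sum = sum(c_across(c(PA1, PA2, PA3))),
    PA_sd  = sd(c_across(c(PA1, PA2, PA3))),
    PA_max = max(c_across(c(PA1, PA2, PA3))),
    PA_min = min(c_across(c(PA1, PA2, PA3)))
  ) %>%
  ungroup()
```

Each score is stored in its own column here so that one score does not overwrite another. Add na.rm = TRUE inside sum(), sd(), max(), or min() if rows with missing items should still receive a score.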

Compute a cumulative score within a period (e.g., a day). Because cumsum() returns NA for every position after the first missing value, you might need to replace missing values (NA) with 0 first.

data$PA1_na = replace(data$PA1, is.na(data$PA1), 0)
data$PA_cumsum = ave(data$PA1_na, data$id, data$daycum, FUN=cumsum)
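
A possible dplyr version of the cumulative score is sketched below. The data frame 'toy' is hypothetical, and 'daycum' is stored as a plain number here for simplicity. The sketch assumes the rows are already sorted in time order within each participant and day.

```r
library(dplyr)

# Hypothetical toy data: participant 1 has two days, participant 2 has one
toy <- data.frame(
  id     = c(1, 1, 1, 1, 2, 2),
  daycum = c(1, 1, 2, 2, 1, 1),
  PA1    = c(3, NA, 2, 5, 1, 4)
)

toy <- toy %>%
  group_by(id, daycum) %>%            # restart the running total each day
  mutate(
    PA1_na    = coalesce(PA1, 0),     # treat missing values as 0
    PA_cumsum = cumsum(PA1_na)
  ) %>%
  ungroup()

toy$PA_cumsum  # 3 3 2 7 1 5
```

Replacing NA with 0 means a missed observation adds nothing to the running total; if the data are not sorted, add an arrange() step on the time variable before grouping.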