ESM Preprocessing Gallery

Centering

Packages: dplyr

Centering is often used to disentangle within-level and between-level variance in the data (for example, within-person versus between-person). For this section, we use a subset of the simulated dataset. Here are its first rows:

  dyad id obsno PA1 PA2
1    1  1     1  NA  NA
2    1  1     2  NA  NA
3    1  1     3  NA  NA
4    1  1     4   1  11
5    1  2     1  32   3
6    1  2     2  23   4
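To follow along without the original file, the subset can be typed in by hand. The sketch below copies the 24 rows displayed in the results tables of this section; that these rows make up the complete subset is an assumption, not something the source states.

```r
# Rebuild the example subset by hand (values copied from the tables in this section).
# Assumption: these 24 rows are the whole subset used in the examples.
data <- data.frame(
  dyad  = rep(1:3, each = 8),   # 3 dyads, 2 partners x 4 observations each
  id    = rep(1:6, each = 4),   # 6 subjects, 4 observations each
  obsno = rep(1:4, times = 6),  # observation number within subject
  PA1   = c(NA, NA, NA, 1, 32, 23, 17, 24, 8, 19, NA, NA,
            NA, 15, 18, 19, NA, NA, NA, 11, NA, 1, NA, 1),
  PA2   = c(NA, NA, NA, 11, 3, 4, 7, 12, 21, 35, NA, NA,
            NA, 27, 34, 31, NA, NA, NA, 59, NA, 1, NA, 1)
)

head(data)  # should reproduce the first rows shown above
```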

Below, we present the usual centering methods, using the ‘PA1’ variable as an example. For each, we propose two ways to compute it: one with base R functions and one with dplyr functions. The results of each centering type for our dataset are shown after the last method.

Grand-mean centering (gc): subtract the overall mean of the variable, computed across all subjects and observations, from every value.

data$PA1_gc = data$PA1 - mean(data$PA1,na.rm=TRUE)
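The text mentions a dplyr alternative for each method. A minimal sketch of what the dplyr version of grand-mean centering might look like follows; the data frame is typed in by hand from the tables in this section and is assumed to be the complete subset.

```r
library(dplyr)

# Example rows typed in by hand (assumed to be the complete subset)
data <- data.frame(
  dyad = rep(1:3, each = 8), id = rep(1:6, each = 4), obsno = rep(1:4, times = 6),
  PA1  = c(NA, NA, NA, 1, 32, 23, 17, 24, 8, 19, NA, NA,
           NA, 15, 18, 19, NA, NA, NA, 11, NA, 1, NA, 1)
)

# No group_by(): the mean is taken over all non-missing PA1 values at once
data <- data %>%
  mutate(PA1_gc = PA1 - mean(PA1, na.rm = TRUE))
```

If the data frame is still grouped from an earlier step, calling ungroup() first matters here: on grouped data, mean() would be computed per group rather than over the whole variable.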

Person-mean centering (pc): subtract each subject’s own mean of the variable from that subject’s values.

data$PA1_pc = data$PA1 - ave(data$PA1, data$id, FUN=function(x) mean(x, na.rm=TRUE))
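A possible dplyr counterpart of the base R line above is sketched here. It groups by ‘id’ before taking the mean; the hand-typed data frame is an assumption about what the subset contains.

```r
library(dplyr)

# Example rows typed in by hand (assumed to be the complete subset)
data <- data.frame(
  dyad = rep(1:3, each = 8), id = rep(1:6, each = 4), obsno = rep(1:4, times = 6),
  PA1  = c(NA, NA, NA, 1, 32, 23, 17, 24, 8, 19, NA, NA,
           NA, 15, 18, 19, NA, NA, NA, 11, NA, 1, NA, 1)
)

data <- data %>%
  group_by(id) %>%                                   # one group per subject
  mutate(PA1_pc = PA1 - mean(PA1, na.rm = TRUE)) %>% # subtract the subject's own mean
  ungroup()                                          # drop the grouping afterwards
```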

Lag and person-mean centering (lag_pc): create a lagged variable (each subject’s previous observation) and center it on the subject’s mean of the original, unlagged variable.

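# Assumes rows are sorted by id, then by obsno
lag_PA1 = c(NA, data$PA1[-nrow(data)])       # shift PA1 down by one row
lag_PA1[which(!duplicated(data$id))] = NA    # a subject's first row has no previous value
data$PA1_lag_pc = lag_PA1 - ave(data$PA1, data$id, FUN=function(x) mean(x, na.rm=TRUE))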
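With dplyr, the lag and the centering could be combined in one grouped step. The following is a sketch under the same assumptions as the base R version (one row per observation, ordered by ‘obsno’ within ‘id’), using a hand-typed copy of the subset.

```r
library(dplyr)

# Example rows typed in by hand (assumed to be the complete subset)
data <- data.frame(
  dyad = rep(1:3, each = 8), id = rep(1:6, each = 4), obsno = rep(1:4, times = 6),
  PA1  = c(NA, NA, NA, 1, 32, 23, 17, 24, 8, 19, NA, NA,
           NA, 15, 18, 19, NA, NA, NA, 11, NA, 1, NA, 1)
)

data <- data %>%
  arrange(id, obsno) %>%   # make sure rows are in time order within subject
  group_by(id) %>%
  # dplyr::lag() shifts within the group, so each subject's first row becomes NA;
  # the mean is that of the original (unlagged) PA1
  mutate(PA1_lag_pc = dplyr::lag(PA1) - mean(PA1, na.rm = TRUE)) %>%
  ungroup()
```

Because the lag is taken inside group_by(id), no separate step is needed to blank out the first row of each subject.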

Dyad-mean centering (dc): subtract each dyad’s own mean of the variable, pooled over both partners, from that dyad’s values.

data$PA1_dc = data$PA1 - ave(data$PA1, data$dyad, FUN=function(x) mean(x, na.rm=TRUE))
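A dplyr version of dyad-mean centering might differ from the person-mean case only in the grouping variable. This is a sketch on a hand-typed copy of the subset, which is assumed to be complete.

```r
library(dplyr)

# Example rows typed in by hand (assumed to be the complete subset)
data <- data.frame(
  dyad = rep(1:3, each = 8), id = rep(1:6, each = 4), obsno = rep(1:4, times = 6),
  PA1  = c(NA, NA, NA, 1, 32, 23, 17, 24, 8, 19, NA, NA,
           NA, 15, 18, 19, NA, NA, NA, 11, NA, 1, NA, 1)
)

data <- data %>%
  group_by(dyad) %>%                                 # one group per dyad (both partners)
  mutate(PA1_dc = PA1 - mean(PA1, na.rm = TRUE)) %>% # subtract the dyad's mean
  ungroup()
```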

The results of the different centering methods are shown below. PA1_gc subtracts the grand mean of the 13 non-missing PA1 values shown here (189/13 ≈ 14.54) from every row:

   dyad id obsno PA1 PA2      PA1_gc     PA1_pc PA1_lag_pc     PA1_dc
1     1  1     1  NA  NA          NA         NA         NA         NA
2     1  1     2  NA  NA          NA         NA         NA         NA
3     1  1     3  NA  NA          NA         NA         NA         NA
4     1  1     4   1  11 -13.5384615  0.0000000         NA -18.400000
5     1  2     1  32   3  17.4615385  8.0000000         NA  12.600000
6     1  2     2  23   4   8.4615385 -1.0000000  8.0000000   3.600000
7     1  2     3  17   7   2.4615385 -7.0000000 -1.0000000  -2.400000
8     1  2     4  24  12   9.4615385  0.0000000 -7.0000000   4.600000
9     2  3     1   8  21  -6.5384615 -5.5000000         NA  -7.800000
10    2  3     2  19  35   4.4615385  5.5000000 -5.5000000   3.200000
11    2  3     3  NA  NA          NA         NA  5.5000000         NA
12    2  3     4  NA  NA          NA         NA         NA         NA
13    2  4     1  NA  NA          NA         NA         NA         NA
14    2  4     2  15  27   0.4615385 -2.3333333         NA  -0.800000
15    2  4     3  18  34   3.4615385  0.6666667 -2.3333333   2.200000
16    2  4     4  19  31   4.4615385  1.6666667  0.6666667   3.200000
17    3  5     1  NA  NA          NA         NA         NA         NA
18    3  5     2  NA  NA          NA         NA         NA         NA
19    3  5     3  NA  NA          NA         NA         NA         NA
20    3  5     4  11  59  -3.5384615  0.0000000         NA   6.666667
21    3  6     1  NA  NA          NA         NA         NA         NA
22    3  6     2   1   1 -13.5384615  0.0000000         NA  -3.333333
23    3  6     3  NA  NA          NA         NA  0.0000000         NA
24    3  6     4   1   1 -13.5384615  0.0000000         NA  -3.333333

Centering multiple variables at once

ESM datasets usually contain more than one variable to center. To avoid repeating the same code for each variable, we suggest using the dplyr package. The example below applies person-mean centering (pc) to the ‘PA1’ and ‘PA2’ variables. It can be adapted to other:

variables: by modifying the list of variables inside c().
centering methods: by adjusting the function following the ‘~’ symbol or changing the grouping level. For instance, for dyad-mean centering (dc), you only need to change the grouping level from ‘id’ to ‘dyad’ (and, to keep the column names accurate, the ‘_pc’ suffix in .names to ‘_dc’).

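library(dplyr)
data = data %>%
  group_by(id) %>%   # one group per subject
  mutate(across(c(PA1, PA2), ~ . - mean(., na.rm = TRUE), .names = "{.col}_pc")) %>%
  ungroup()          # drop the grouping so later computations are not done per subject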

The resulting person-mean-centered variables are:

   dyad id obsno PA1 PA2     PA1_pc     PA2_pc
1     1  1     1  NA  NA         NA         NA
2     1  1     2  NA  NA         NA         NA
3     1  1     3  NA  NA         NA         NA
4     1  1     4   1  11  0.0000000  0.0000000
5     1  2     1  32   3  8.0000000 -3.5000000
6     1  2     2  23   4 -1.0000000 -2.5000000
7     1  2     3  17   7 -7.0000000  0.5000000
8     1  2     4  24  12  0.0000000  5.5000000
9     2  3     1   8  21 -5.5000000 -7.0000000
10    2  3     2  19  35  5.5000000  7.0000000
11    2  3     3  NA  NA         NA         NA
12    2  3     4  NA  NA         NA         NA
13    2  4     1  NA  NA         NA         NA
14    2  4     2  15  27 -2.3333333 -3.6666667
15    2  4     3  18  34  0.6666667  3.3333333
16    2  4     4  19  31  1.6666667  0.3333333
17    3  5     1  NA  NA         NA         NA
18    3  5     2  NA  NA         NA         NA
19    3  5     3  NA  NA         NA         NA
20    3  5     4  11  59  0.0000000  0.0000000
21    3  6     1  NA  NA         NA         NA
22    3  6     2   1   1  0.0000000  0.0000000
23    3  6     3  NA  NA         NA         NA
24    3  6     4   1   1  0.0000000  0.0000000
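As a variation on the person-mean example above, dyad-mean centering of both variables could be sketched as follows. Only the grouping variable and the column suffix change; the data frame is typed in by hand and assumed to be the complete subset.

```r
library(dplyr)

# Example rows typed in by hand (assumed to be the complete subset)
data <- data.frame(
  dyad = rep(1:3, each = 8), id = rep(1:6, each = 4), obsno = rep(1:4, times = 6),
  PA1  = c(NA, NA, NA, 1, 32, 23, 17, 24, 8, 19, NA, NA,
           NA, 15, 18, 19, NA, NA, NA, 11, NA, 1, NA, 1),
  PA2  = c(NA, NA, NA, 11, 3, 4, 7, 12, 21, 35, NA, NA,
           NA, 27, 34, 31, NA, NA, NA, 59, NA, 1, NA, 1)
)

data <- data %>%
  group_by(dyad) %>%   # grouping level changed from id to dyad
  mutate(across(c(PA1, PA2), ~ .x - mean(.x, na.rm = TRUE), .names = "{.col}_dc")) %>%
  ungroup()

names(data)  # now includes PA1_dc and PA2_dc
```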