ESM Preprocessing Gallery – summary

Summary tables

Packages: dplyr, psych, kableExtra

IN DEVELOPMENT

Summary tables are often an efficient way to present variable information in articles. For instance, they can report the mean and standard deviation of variables, their correlation structure, or within- and between-participant differences. This section presents a few examples of informative tables and shows how to create them in R. It then shows how to export them in different formats so they can be integrated into an article.

Mean and variance tables

Within and between

describe_data <- data[,c("id","obsno","PA1","PA2","PA3","NA1","NA2","NA3")]
describe_data_2 <- psych::statsBy(describe_data, "id")

#Making tables
data_mean <- as.data.frame(describe_data_2[["mean"]]) #this takes the means and pops them in a new data frame
data_withinsd <- as.data.frame(describe_data_2[["sd"]]) #this takes the within-person SDs and pops them in a new data frame

#between-person SDs are not calculated by statsBy, so we compute them ourselves as the SD of the person means and make a third dataframe
data_betweensd <- as.data.frame(sapply(data_mean, sd, na.rm = TRUE))


# I send this all to a new dataframe
data_describe_output <- data.frame(Variable=colnames(data_mean), #this is grabbing the column names as a new column called "variable" to print
                                    M=colMeans(data_mean, na.rm = TRUE), #this is going to put the mean of the person means as a column called M
                                    Within_person_SD=colMeans(data_withinsd, na.rm=TRUE), #same for the within SDs
                                    Between_person_SD=data_betweensd[,1], #this is grabbing the first column of the between SD datafile (which has one column)
                                    ICC=describe_data_2[["ICC1"]])


#first, we want to delete the rows for the id and obsno (beep number) columns
#these are the first two rows, so we ask R to remove rows 1 and 2
data_describe_output <- data_describe_output[-c(1,2),]

#finally, outputting the tables
data_describe_output

    Variable        M Within_person_SD Between_person_SD       ICC
PA1      PA1 24.08572         7.376207         21.935024 0.7627714
PA2      PA2 21.87614         7.787698         17.299357 0.6337179
PA3      PA3 24.05133         8.902822         22.551042 0.7107763
NA1      NA1 22.88302         8.639781         24.505277 0.7540195
NA2      NA2 10.47019        10.873926          3.164872 0.0491020
NA3      NA3 62.53472        20.648478         20.280107 0.4684942
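
To make the meaning of each column concrete, here is a minimal base R sketch that rebuilds the same three statistics by hand for one variable. The `toy` data frame and its values are hypothetical and only serve as an illustration; the sketch mirrors the logic of the table above, not the internals of `psych::statsBy`.

```r
# Hypothetical long-format ESM data: 3 participants, 4 observations each
toy <- data.frame(
  id  = rep(1:3, each = 4),
  PA1 = c(10, 12, 14, 16,  20, 20, 22, 22,  30, 34, 38, 42)
)

# One mean and one SD per participant
person_mean <- tapply(toy$PA1, toy$id, mean, na.rm = TRUE)
person_sd   <- tapply(toy$PA1, toy$id, sd,   na.rm = TRUE)

toy_summary <- data.frame(
  Variable          = "PA1",
  M                 = mean(person_mean), # mean of the person means
  Within_person_SD  = mean(person_sd),   # average of the within-person SDs
  Between_person_SD = sd(person_mean)    # SD of the person means
)
toy_summary
```

A large between-person SD relative to the within-person SD indicates that participants differ more from each other than they fluctuate over time, which is also what a high ICC expresses.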

Distribution tables

ICC table

Correlation tables

Correlation: within and between

key_varEMA<- data[, c("id","PA1","PA2","PA3","NA1","NA2","NA3")]
# key_varEMA<-key_varEMA[,c(1:3,5,4,6)] #order so that this is in the same order as the descriptives

EMAcorr = psych::statsBy(key_varEMA, key_varEMA$id, na.rm=TRUE)


# Get between-subject correlations
Between = as.data.frame(EMAcorr[["rbg"]])
# Get within-subject correlations
Within = as.data.frame(EMAcorr[["rwg"]])


# Return a table with data under the diagonal corresponding to within
#   subject correlations, and the data above the diagonal corresponding
#   to between subject correlations.
EMA_corr = function(Data, GroupingID, roundDP){
  if (missing(roundDP)){roundDP = -1}

  # Get correlation data via the psych package
  stats = psych::statsBy(Data, GroupingID)
  # Get between-subject correlations
  Between = as.data.frame(stats[["rbg"]])
  # Get within-subject correlations
  Within = as.data.frame(stats[["rwg"]])
  # Round correlations
  if (roundDP > -1){
    Between = round(Between, roundDP)
    Within = round(Within, roundDP)
  }
  # Replace diagonal
  diag(Between) = '-'
  # Fill the cells below the diagonal (row index > column index) with the
  # within-subject correlations; cells above it keep the between-subject ones
  for (kk in seq_len(ncol(Between))){
    for (ii in seq_len(ncol(Between))){
      if (ii > kk){
        Between[ii,kk] = Within[ii,kk]
      }
    }
  }
  # Strip the ".bg" suffix from the column names and reuse them as row names
  colnames(Between) = gsub('\\.bg$', '', colnames(Between))
  rownames(Between) = colnames(Between)

  return(Between)
}


data_corr <- EMA_corr(key_varEMA, key_varEMA$id,2)
data_corr$Variable <- colnames(data_corr)
# drop both the row and the column for id, so rows and columns line up
data_corr2 <- data_corr[-1, -1]

data_corr2

      PA1   PA2   PA3   NA1   NA2   NA3 Variable
PA1     -  0.09   0.4   0.3  0.22 -0.05      PA1
PA2  0.57     -  0.06  0.15 -0.03 -0.02      PA2
PA3  0.41  0.19     -  0.26  0.12 -0.29      PA3
NA1 -0.06 -0.11  0.24     - -0.17 -0.04      NA1
NA2  0.02     0  0.01 -0.01     -  -0.1      NA2
NA3 -0.02  0.03 -0.02     0 -0.03     -      NA3
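
To illustrate what the two halves of this table represent, the following base R sketch computes a between-person and a within-person correlation by hand for two variables. The `toy` data are hypothetical. This is a conceptual sketch only: `psych::statsBy` may weight participants by their number of observations, so its values need not match an unweighted calculation like this one exactly.

```r
# Hypothetical long-format ESM data: 4 participants, 3 observations each
toy <- data.frame(
  id  = rep(1:4, each = 3),
  PA1 = c(1, 2, 3,  4, 6, 5,  7, 9, 8,  10, 12, 11),
  NA1 = c(5, 4, 3,  6, 5, 7,  2, 1, 3,   4,  2,  3)
)

# Between-person: correlate the person means (one row per participant)
person_means <- aggregate(toy[c("PA1", "NA1")], by = list(id = toy$id), FUN = mean)
between_r <- cor(person_means$PA1, person_means$NA1)

# Within-person: subtract each participant's own mean (person-mean centering),
# then correlate the pooled deviations
centered <- data.frame(
  PA1 = toy$PA1 - ave(toy$PA1, toy$id),
  NA1 = toy$NA1 - ave(toy$NA1, toy$id)
)
within_r <- cor(centered$PA1, centered$NA1)

c(between = between_r, within = within_r)
```

The two correlations answer different questions: the between-person value asks whether participants who are higher on one variable on average are also higher on the other, while the within-person value asks whether the two variables rise and fall together over time within the same participant.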

Export

Some packages, such as kableExtra, help you customize the design of your tables. This package lets you add lines, extra headers, colors, and other stylistic elements. A detailed exploration is beyond our scope; you can consult the package's vignette for comprehensive guidance. Here, we will only convert the dataframe to a kable object and change its style to a classic one.

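library(dplyr)      # provides the %>% pipe
library(kableExtra) # provides kbl() and kable_classic_2()

kbl_describe = data_describe_output %>%
  kbl() %>% # convert to kable object
  kable_classic_2(full_width = FALSE) # change style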

The table looks like this now:

	Variable	M	Within_person_SD	Between_person_SD	ICC
PA1	PA1	24.08572	7.376207	21.935024	0.7627714
PA2	PA2	21.87614	7.787698	17.299357	0.6337179
PA3	PA3	24.05133	8.902822	22.551043	0.7107763
NA1	NA1	22.88302	8.639781	24.505277	0.7540195
NA2	NA2	10.47019	10.873926	3.164872	0.0491020
NA3	NA3	62.53472	20.648477	20.280107	0.4684942
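
As one example of the extra headers mentioned above, the sketch below groups the two SD columns under a shared header and rounds the values to two decimals. It assumes the kableExtra package is installed; the `tab` data frame only reuses the first two rows of the table above, and the header label and column names are illustrative choices.

```r
library(kableExtra)

# First two rows of the descriptive table shown above
tab <- data.frame(
  Variable          = c("PA1", "PA2"),
  M                 = c(24.08572, 21.87614),
  Within_person_SD  = c(7.376207, 7.787698),
  Between_person_SD = c(21.935024, 17.299357),
  ICC               = c(0.7627714, 0.6337179)
)

# Convert to a kable object with rounded values and shorter column names
kbl_tab <- kbl(tab, digits = 2, row.names = FALSE,
               col.names = c("Variable", "M", "Within", "Between", "ICC"))

# Add a header spanning the two SD columns; the counts must sum to 5 columns
kbl_grouped <- add_header_above(kbl_tab, c(" " = 2, "Standard deviation" = 2, " " = 1))

# Apply the classic style
kbl_grouped <- kable_classic_2(kbl_grouped, full_width = FALSE)
kbl_grouped
```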

Depending on your needs, a table can be exported in different formats:

CSV: easy to import into Excel or Word. Note that this exports the raw dataframe only, not the kable object.
PNG: an image format, which makes the table easy to insert into many kinds of documents.
LaTeX: easy to integrate into a LaTeX document. Here, we need to specify the format in the format argument of the ‘kbl()’ function. We recommend adjusting the style of the table later, in your LaTeX document.
HTML: converts the table to an all-in-one HTML file, which is easy to integrate into a web page or a blog post.

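# The "article" folder must exist before writing; create it if needed
dir.create("article", showWarnings = FALSE)

# CSV: exports the raw dataframe, not the kable object
write.csv(data_describe_output, "article/table_describe_output.csv", row.names = FALSE)

# PNG: image export needs a screenshot backend (e.g. the webshot2 package)
save_kable(kbl_describe, "article/table_describe_output.png")

# LaTeX: the format is set in kbl() before saving
data_describe_output %>%
  kbl(format = "latex", booktabs = TRUE) %>%
  kable_styling(latex_options = c("striped", "scale_down")) %>%
  save_kable("article/table_describe_output.tex")

# HTML
save_kable(kbl_describe, "article/table_describe_output.html")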
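
Because the CSV export writes the raw dataframe, the values keep all their decimals. A minimal base R sketch of rounding before export, and of reading the file back to check it, could look like this. The `tab` data frame reuses two rows from the table above, and the file is written to a temporary folder purely for illustration.

```r
# Two rows taken from the descriptive table above
tab <- data.frame(
  Variable = c("PA1", "PA2"),
  M        = c(24.08572, 21.87614),
  ICC      = c(0.7627714, 0.6337179)
)

# Round only the numeric columns to two decimals
num_cols <- sapply(tab, is.numeric)
tab[num_cols] <- round(tab[num_cols], 2)

# Write to a temporary file, then read it back to check the result
out_file <- file.path(tempdir(), "table_describe_output.csv")
write.csv(tab, out_file, row.names = FALSE)
check <- read.csv(out_file)
check
```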