Summary tables
Packages: dplyr, psych, kableExtra
IN DEVELOPMENT
Summary tables are often an efficient way to present variable information in articles. For instance, they can be used to present the mean and standard deviation of variables, their correlation structure, or within- and between-participant differences. This section presents a few examples of informative tables and shows how to create them in R. Finally, it shows how to export them in different formats so they can be integrated into an article.
Mean and variance tables
Within and between
describe_data <- data[,c("id","obsno","PA1","PA2","PA3","NA1","NA2","NA3")]
describe_data_2 <- psych::statsBy(describe_data, "id")

#Making tables
data_mean <- as.data.frame(describe_data_2[["mean"]]) #this takes the means and pops them in a new data frame
data_withinsd <- as.data.frame(describe_data_2[["sd"]]) #this takes the within-person SDs and pops them in a new data frame

#between-person SDs are not calculated by statsby, so we can grab them ourselves and make a third dataframe
data_betweensd <- as.data.frame(sapply(data_mean, sd, na.rm = TRUE))

# I send this all to a new dataframe
data_describe_output <- data.frame(Variable=colnames(data_mean), #this is grabbing the column names as a new column called "variable" to print
                                   M=colMeans(data_mean, na.rm = TRUE), #this is going to put the mean of the person means as a column called M
                                   Within_person_SD=colMeans(data_withinsd, na.rm=TRUE), #same for the within SDs
                                   Between_person_SD=data_betweensd[,1], #this is grabbing the first column of the between SD datafile (which has one column)
                                   ICC=describe_data_2[["ICC1"]])

#first, we want to delete the row for ID and beepnumber
#we're asking R to remove the first two rows
data_describe_output <- data_describe_output[-c(1,2),]

#finally, outputting the tables
data_describe_output
    Variable        M Within_person_SD Between_person_SD       ICC
PA1      PA1 24.08572         7.376207         21.935024 0.7627714
PA2      PA2 21.87614         7.787698         17.299357 0.6337179
PA3      PA3 24.05133         8.902822         22.551042 0.7107763
NA1      NA1 22.88302         8.639781         24.505277 0.7540195
NA2      NA2 10.47019        10.873926          3.164872 0.0491020
NA3      NA3 62.53472        20.648478         20.280107 0.4684942
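To make the columns of this table concrete, here is a minimal base R sketch that computes the same three quantities (M, within-person SD, between-person SD) by hand. The data frame `toy` and its values are made up for illustration and are not the data used in this section; the ICC is left out because it requires a variance decomposition that `psych::statsBy` handles for us.

```r
# Toy long-format data (hypothetical values): 3 participants, 4 observations each
toy <- data.frame(
  id  = rep(1:3, each = 4),
  PA1 = c(10, 12, 14, 12,  20, 22, 18, 20,  30, 33, 27, 30)
)

# Person-level summaries: one mean and one SD per participant
person_mean <- tapply(toy$PA1, toy$id, mean, na.rm = TRUE)
person_sd   <- tapply(toy$PA1, toy$id, sd,   na.rm = TRUE)

# Same logic as the table: aggregate the person-level summaries
toy_summary <- data.frame(
  Variable          = "PA1",
  M                 = mean(person_mean),  # mean of the person means
  Within_person_SD  = mean(person_sd),    # average of the within-person SDs
  Between_person_SD = sd(person_mean)     # SD of the person means
)
toy_summary
```

In this toy example the person means (12, 20, 30) differ much more than the observations within any one person, so the between-person SD is clearly larger than the within-person SD.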
Distribution tables
ICC table
Correlation tables
Correlation: within and between
key_varEMA <- data[, c("id","PA1","PA2","PA3","NA1","NA2","NA3")]
# key_varEMA<-key_varEMA[,c(1:3,5,4,6)] #order so that this is in the same order as the descriptives
EMAcorr = psych::statsBy(key_varEMA, key_varEMA$id, na.rm=TRUE)

# Get between-subjects cor
Between = as.data.frame(EMAcorr[["rbg"]])
# Get within-subjects cor
Within = as.data.frame(EMAcorr[["rwg"]])
# Return a table with data under the diagonal corresponding to within
# subject correlations, and the data above the diagonal corresponding
# to between subject correlations.
EMA_corr = function(Data, GroupingID, roundDP){
  if (missing(roundDP)){roundDP = -1}
  # Get Cor Data via psych package
  EMA_corr = psych::statsBy(Data, GroupingID)
  # Get between-subjects cor
  Between = as.data.frame(EMA_corr[["rbg"]])
  # Get within-subjects cor
  Within = as.data.frame(EMA_corr[["rwg"]])
  # Round correlations
  if (roundDP > -1){
    Between = round(Between, roundDP)
    Within = round(Within, roundDP)
  }
  # Replace diagonal
  diag(Between) = '-'
  # Replace items along one half of the diagonal
  Check = c()
  for (kk in seq(ncol(Between))){
    for (ii in seq(ncol(Between))){
      if (ii != kk & !((ii^2 + kk^2) %in% Check) ){
        Check = c(Check, ii^2 + kk^2)
        Between[ii,kk] = Within[ii,kk]
      }
    }
  }
  # Replace column & row names
  colnames(Between) = gsub('.bg', '', colnames(Between))
  rownames(Between) = colnames(Between)
  return(Between)
}

data_corr <- EMA_corr(key_varEMA, key_varEMA$id,2)
data_corr$Variable <- colnames(data_corr)
data_corr2 <- data_corr[,-c(1)]

data_corr2
      PA1   PA2   PA3   NA1   NA2   NA3 Variable
id  -0.04 -0.13 -0.09 -0.17  0.01  0.15       id
PA1     -  0.09   0.4   0.3  0.22 -0.05      PA1
PA2  0.57     -  0.06  0.15 -0.03 -0.02      PA2
PA3  0.41  0.19     -  0.26  0.12 -0.29      PA3
NA1 -0.06 -0.11  0.24     - -0.17 -0.04      NA1
NA2  0.02     0  0.01 -0.01     -  -0.1      NA2
NA3 -0.02  0.03 -0.02     0 -0.03     -      NA3
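The layout of this table (within-person correlations below the diagonal, between-person correlations above it) can also be produced without a double loop. The following is a minimal sketch using base R's `lower.tri()`; it is an alternative to the function above, not a replacement for it. The two small matrices are illustrative stand-ins with made-up names (`within_r`, `between_r`), and the diagonal is set to `NA` rather than `'-'` so that the matrix stays numeric.

```r
# Illustrative within- and between-person correlation matrices (stand-in values)
vars <- c("PA1", "PA2", "NA1")
within_r <- matrix(c( 1.00,  0.57, -0.06,
                      0.57,  1.00, -0.11,
                     -0.06, -0.11,  1.00),
                   nrow = 3, dimnames = list(vars, vars))
between_r <- matrix(c(1.00, 0.09, 0.30,
                      0.09, 1.00, 0.15,
                      0.30, 0.15, 1.00),
                    nrow = 3, dimnames = list(vars, vars))

# Start from the between-person matrix (this keeps the upper triangle),
# then overwrite the lower triangle with the within-person correlations
combined <- between_r
combined[lower.tri(combined)] <- within_r[lower.tri(within_r)]

# Blank out the diagonal; NA keeps the matrix numeric for later rounding
diag(combined) <- NA
round(combined, 2)
```

Keeping the matrix numeric makes it easy to round or format later; the diagonal can be replaced by a dash at the display stage if desired.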
Export
Packages such as kableExtra let you further customize a table: you can add lines, extra headers, colors, and other stylistic elements. A detailed tour is beyond the scope of this section; consult the package’s vignette for comprehensive guidance. Here, we will only convert the data frame to a kable object and change its style to a classic one.
kbl_describe = data_describe_output %>%
  kbl() %>% # convert to kable object
  kable_classic_2(full_width = FALSE) # change style
The table looks like this now:
| | Variable | M | Within_person_SD | Between_person_SD | ICC |
|---|---|---|---|---|---|
| PA1 | PA1 | 24.08572 | 7.376207 | 21.935024 | 0.7627714 |
| PA2 | PA2 | 21.87614 | 7.787698 | 17.299357 | 0.6337179 |
| PA3 | PA3 | 24.05133 | 8.902822 | 22.551043 | 0.7107763 |
| NA1 | NA1 | 22.88302 | 8.639781 | 24.505277 | 0.7540195 |
| NA2 | NA2 | 10.47019 | 10.873926 | 3.164872 | 0.0491020 |
| NA3 | NA3 | 62.53472 | 20.648477 | 20.280107 | 0.4684942 |
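The table as printed repeats the variable names (once as row names, once in the Variable column) and shows many decimals. As a hedged sketch, the `digits`, `row.names`, and `col.names` arguments of `kbl()` can tidy this up. The data frame `tab` is a two-row stand-in for `data_describe_output`, the column labels are only a suggestion, and the call is wrapped in `requireNamespace()` so the snippet runs even where kableExtra is not installed.

```r
# Stand-in for data_describe_output (first two rows of the table above)
tab <- data.frame(
  Variable          = c("PA1", "PA2"),
  M                 = c(24.08572, 21.87614),
  Within_person_SD  = c(7.376207, 7.787698),
  Between_person_SD = c(21.935024, 17.299357),
  ICC               = c(0.7627714, 0.6337179),
  row.names         = c("PA1", "PA2")
)

if (requireNamespace("kableExtra", quietly = TRUE)) {
  library(kableExtra)
  kbl_tidy <- kbl(
    tab,
    format    = "html",
    digits    = 2,       # round numeric columns to two decimals
    row.names = FALSE,   # drop the duplicated row names
    col.names = c("Variable", "M", "Within-person SD", "Between-person SD", "ICC")
  )
  kbl_tidy <- kable_classic_2(kbl_tidy, full_width = FALSE)
  kbl_tidy
}
```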
Depending on your needs, a table can be exported in different formats:
- CSV: easy to import into Excel or Word. Note that this applies only to the raw data frame, not to the kable object.
- PNG: an image format, which makes the table easy to insert into many kinds of documents.
- LaTeX: easy to integrate into a LaTeX document. Here, we need to specify the format in the format argument of the ‘kbl()’ function. We recommend adjusting the style of the table afterwards, in your LaTeX document.
- HTML: converts the table to an all-in-one HTML file, which is easy to integrate into a web page or a blog post.
write.csv(data_describe_output, "article/table_describe_output.csv", row.names = FALSE)
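The other formats in the list can be produced along the following lines. This is a sketch under assumptions rather than the definitive workflow: `tab` is a small stand-in for `data_describe_output`, the files are written to a temporary folder (`out_dir`, a hypothetical location to replace with your own), and the HTML step is only run when pandoc is available, since writing a self-contained HTML file relies on it.

```r
# Small stand-in table and a temporary output folder (both hypothetical)
tab <- data.frame(Variable = c("PA1", "PA2"),
                  M        = c(24.09, 21.88),
                  ICC      = c(0.76, 0.63))
out_dir <- tempdir()

if (requireNamespace("kableExtra", quietly = TRUE)) {
  library(kableExtra)

  # LaTeX: request the format when creating the kable,
  # then write the generated code to a .tex file
  tab_latex <- kbl(tab, format = "latex", booktabs = TRUE)
  tex_file  <- file.path(out_dir, "table_describe_output.tex")
  writeLines(as.character(tab_latex), tex_file)

  # HTML: style the table, then save it as a standalone file
  tab_html  <- kable_classic_2(kbl(tab, format = "html"), full_width = FALSE)
  html_file <- file.path(out_dir, "table_describe_output.html")
  if (rmarkdown::pandoc_available()) {
    save_kable(tab_html, file = html_file)
  }

  # PNG: save_kable() also accepts a .png file name, but this needs an extra
  # screenshot package (webshot or webshot2), so it is left commented out
  # save_kable(tab_html, file = file.path(out_dir, "table_describe_output.png"))
}
```

The resulting .tex file can be pulled into a LaTeX document with `\input{}`; with `booktabs = TRUE`, the document also needs to load the booktabs package.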