ESM Preprocessing Gallery – preprocessing

Preprocessing report

The preprocessing report documents the steps taken while preprocessing the raw data and highlights any data modifications or issues encountered. It is a tool for sharing the preprocessing process and building confidence in the analysis outcomes; in other words, it improves the transparency and reproducibility of a study. The esmtools package provides a set of specialized tools for this purpose, each described in the Reporting tools section.

The report is based on the ESM preprocessing framework and hence uses the following structure:

Import packages and data
Step 1: Import data and preliminary preprocessing
Step 2: Design and sampling scheme
Step 3: Participants’ response behaviors
Step 4: Transform variables
Step 5 (optional): Descriptive statistics and visualization
Export data
Session and data info: see the session and data info section.

Examples of a preprocessing report can be found in the repository of the website or at the top of this page. Those examples showcase the structure and content that can be included in such a report. Additionally, a minimal checklist is provided in the preprocessing checklist section. Note that this checklist is not exhaustive and should be considered as a starting point.

Templates

The esmtools package offers two preprocessing report templates for R Markdown: ‘preprocess_esm’ and ‘advanced_preprocess_esm’. The advanced template adds buttons that let readers navigate between the sections of the report more easily. The drawback is that the document takes more time to organize.

Note

These templates can be imported into a directory using the ‘use_template()’ function.

esmtools::use_template("preprocess_report", output_dir = getwd())
# OR
esmtools::use_template("advanced_preprocess_report", output_dir = getwd())

More information can be found in the package’s documentation website.

Highlighting

In the report, we particularly encourage highlighting any data modifications and identified issues. Two methods, further described in the Reporting tools section, are available:

use HTML tags.
use the ‘txt()’ function from the esmtools package.

With HTML tags, first define the CSS class ‘text_issue’ at the top of the file:

<style>
  .text_issue {
    font-weight: bold;
    color: #e61919;
    text-decoration: underline #e61919 2px;
  }
</style>

The following HTML and inline R code (which assumes the counter ‘er’ was initialized to 0 earlier in the document):

<span class="text_issue"> Issue `r er=er+1 ; er`:</span> Description of the issue ...

returns:

Issue 1: Description of the issue …
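The inline counter above can also be wrapped in a small helper so that the numbering logic lives in one place. The following is only a minimal sketch: ‘make_label()’ and ‘issue()’ are hypothetical names introduced here for illustration and are not part of esmtools.

```r
# Hypothetical helper, not part of esmtools: returns a function that
# increments its own counter and wraps the label in the 'text_issue' span.
make_label <- function(label, css_class = "text_issue") {
  n <- 0L
  function() {
    n <<- n + 1L  # update the counter stored in the enclosing environment
    sprintf('<span class="%s"> %s %d:</span>', css_class, label, n)
  }
}

# One counter per kind of highlight
issue <- make_label("Issue")
```

Under these assumptions, writing `r issue()` inline in the report, followed by the description, would produce the same kind of numbered, highlighted label as the HTML snippet above.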

Example of reporting

Reporting an issue could have the following structure:

We checked the consistency of time-invariant variables. It is expected that participants have a unique ‘role’ value, either 1 or 2.

library(esmtools)
vars_consist(data, "id", c("role"))

   id   role
1   1      1
2   2      2
3   3      1
4   4      2
5   5 (1, 2)
6   6      2
7   7      1
8   8      2
9   9      1
10 10      2
11 11      1
12 12      2
13 13      1
14 14      2
15 15      1
16 16      2
17 17      1
18 18      2
19 19 (1, 2)
20 20      2
21 21      1
22 22      2
23 23      1
24 24      2
25 25      1
26 26      2
27 27      1
28 28      2
29 29      1
30 30      2
31 31      1
32 32      2
33 33      1
34 34      2
35 35      1
36 36      2
37 37      1
38 38      2
39 39      1
40 40      2
41 41      1
42 42      2
43 43      1
44 44      2
45 45      1
46 46      2
47 47      1
48 48      2
49 49      1
50 50      2
51 51      1
52 52      2
53 53      1
54 54      2
55 55      1
56 56      2
57 57      1
58 58      2
59 59      1
60 60      2

Issue 2: Participants 5 and 19 have two role values (1 and 2), where only one value is expected.

Modification 1: For participants 5 and 19, the correct value of the ‘role’ variable is 1. Because the variable is time-invariant, no missing values are expected either. We therefore set all of their ‘role’ values to 1.

data[data$id==5, "role"] = 1
data[data$id==19, "role"] = 1
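After such a modification, it can be useful to confirm that the issue is actually resolved. The sketch below uses base R only and a small toy data frame invented for illustration (it is not the dataset shown above); it counts the distinct ‘role’ values per participant before and after the fix.

```r
# Toy data (hypothetical): participant 5 has inconsistent 'role' values
data <- data.frame(
  id   = c(1, 1, 2, 2, 5, 5, 5),
  role = c(1, 1, 2, 2, 1, 2, 1)
)

# Number of distinct 'role' values per participant
n_roles <- tapply(data$role, data$id, function(x) length(unique(x)))
inconsistent_before <- names(n_roles)[n_roles > 1]

# Apply the modification
data[data$id == 5, "role"] <- 1

# Re-check: no participant should have more than one 'role' value
n_roles_after <- tapply(data$role, data$id, function(x) length(unique(x)))
inconsistent_after <- names(n_roles_after)[n_roles_after > 1]
```

Reporting both the check and the re-check in the document makes it easy for a reader to see that the modification had the intended effect.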

Dynamically knit the data quality report

Generating the data quality report dynamically while the preprocessing report is compiled is particularly useful. Each new iteration of the preprocessed data then produces a fresh data quality report that stays aligned with the preprocessing steps. The data quality report also acts as a secondary check of the preprocessing procedure: it helps to promptly identify issues or anomalies introduced during preprocessing, particularly in its final stages.

This method is already implemented in the “advanced_preprocess_esm” template and the data quality template. Specifically, the following lines of code can be found at the bottom of the preprocessing report templates:

# Path to the data quality report (.Rmd format) 
rmark_file = "path/data_quality_report.Rmd"

# Name of the output data quality report. Date is included to keep track of changes
filename_out = paste0(as.Date(Sys.time()), "_Data_Quality_Report.html")

# Knit the data quality report
rmarkdown::render(rmark_file, output_file=filename_out, params=list(file_path=file_path_preproc))

To make it work you need to:

Create the data quality report based on the proposed template (see Data quality report section) and incorporate relevant elements and plots within it. The data quality report template contains the following lines of R code. The ‘file_path’ variable is set either from the parameters passed when the preprocessing report is knitted, or from a manually specified path when the data quality report is run on its own (not together with the preprocessing report).

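# Use the path passed by the preprocessing report when available;
# otherwise fall back to a manually specified path.
if (exists("params") && !is.null(params$file_path)) {
    file_path = params$file_path
} else {
    file_path = "path/data_file.csv"
}
data = read.csv(file_path)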

Specify the path of the data quality report by assigning it to the ‘rmark_file’ variable.
Modify the name of the output data quality file by altering the value assigned to the ‘filename_out’ variable (optional).
Verify that the ‘file_path_preproc’ variable contains the accurate path to the exported preprocessed data. This variable will be used in the data quality report to import the preprocessed data.
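The path verification in the last step can be made explicit in code, so that a wrong path fails early with a clear message rather than inside the data quality report. This is a minimal sketch: the temporary CSV file is a hypothetical stand-in for the exported preprocessed data, and only the variable names ‘file_path_preproc’ and ‘filename_out’ come from the template lines above.

```r
# Hypothetical stand-in for the exported preprocessed data
file_path_preproc <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2, role = c(1, 2)), file_path_preproc,
          row.names = FALSE)

# Stop early with a clear message if the path is wrong
if (!file.exists(file_path_preproc)) {
  stop("Preprocessed data not found at: ", file_path_preproc)
}

# Dated output name, as in the template
filename_out <- paste0(Sys.Date(), "_Data_Quality_Report.html")
```

In an actual report, ‘file_path_preproc’ would simply be the path used in the export step, and the check would sit just before the call to ‘rmarkdown::render()’.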

Note

Any issue encountered while rendering the data quality report will halt the rendering of the preprocessing report as well. To avoid this, it is best to add this step once the preprocessing is well underway, so that problems in the data quality report do not disrupt the overall rendering process.
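One possible way to limit this coupling, which is not part of the templates, is to wrap the rendering step in ‘tryCatch()’ so that a failure becomes a warning instead of an error. The sketch below assumes a hypothetical wrapper named ‘safe_step()’; the commented usage reuses the variable names from the template lines above.

```r
# Hypothetical wrapper, not part of esmtools or rmarkdown: runs a step and
# converts any error into a warning so the parent report keeps rendering.
safe_step <- function(step) {
  tryCatch(
    {
      step()
      TRUE   # the step completed without error
    },
    error = function(e) {
      warning("Data quality report failed: ", conditionMessage(e),
              call. = FALSE)
      FALSE  # the step failed; the caller can decide what to do
    }
  )
}

# Possible usage at the bottom of the preprocessing report:
# ok <- safe_step(function() {
#   rmarkdown::render(rmark_file, output_file = filename_out,
#                     params = list(file_path = file_path_preproc))
# })
```

The trade-off is that a failed data quality report is easier to overlook, so the returned value or the warning should be checked after knitting.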