Preprocessing report
The preprocessing report documents the steps taken during the preprocessing of the raw data and highlights any data modifications or issues encountered. It is a tool for sharing the preprocessing process and instilling confidence in the analysis outcomes; in other words, it improves the transparency and reproducibility of a study. The esmtools package provides a set of specialized tools for this purpose, each of which is described in the Reporting tools section.
The report is based on the ESM preprocessing framework and hence uses the following structure:
- Import packages and data
- Step 1: Import data and preliminary preprocessing
- Step 2: Design and sampling scheme
- Step 3: Participants’ response behaviors
- Step 4: Transform variables
- Step 5 (optional): Descriptive statistics and visualization
- Export data
- Session and data info: see the session and data info section.
Examples of a preprocessing report can be found in the repository of the website or at the top of this page. Those examples showcase the structure and content that can be included in such a report. Additionally, a minimal checklist is provided in the preprocessing checklist section. Note that this checklist is not exhaustive and should be considered as a starting point.
Templates
The esmtools package offers two preprocessing report templates for R Markdown: ‘preprocess_esm’ and ‘advanced_preprocess_esm’. The advanced template adds buttons that let readers navigate between the sections of the report, which makes it more user-friendly. The drawback is that the document takes more time to organize.
These templates can be imported into a directory using the ‘use_template()’ function.
esmtools::use_template("preprocess_report", output_dir = getwd())
# OR
esmtools::use_template("advanced_preprocess_report", output_dir = getwd())
More information can be found in the package’s documentation website.
Highlighting
In the report, we particularly encourage highlighting any data modifications and identified issues. Two methods, both further described in the Reporting tools section, are available:
- use HTML tags.
- use the ‘txt()’ function from the esmtools package.
The first method relies on the CSS class ‘text_issue’, which should be defined at the top of the file:
<style>
.text_issue{
font-weight: bold;
color: #e61919;
text-decoration: underline #e61919 2px;
}</style>
The following HTML and inline R code:
<span class="text_issue"> Issue `r er=er+1 ; er`:</span> Description of the issue ...
returns:
Issue 1: Description of the issue …
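The inline code above only works if the counter ‘er’ exists before its first use. The sketch below shows one way to set this up in base R. It is an illustration under assumptions, not the esmtools ‘txt()’ function: the helper name ‘issue_tag()’ is hypothetical, and the counter is assumed to be initialised once in a setup chunk.

```r
# Counter initialised once, e.g. in a setup chunk at the top of the report
er <- 0

# Hypothetical helper (not part of esmtools): increments the counter
# and returns the HTML span that uses the 'text_issue' CSS class
issue_tag <- function() {
  er <<- er + 1
  paste0('<span class="text_issue">Issue ', er, ':</span>')
}

first <- issue_tag()
second <- issue_tag()
print(first)
print(second)
```

In an R Markdown document the helper would be called inline, followed by the description of the issue, so that issues are numbered in the order in which they appear.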
Example of reporting
Reporting an issue could have the following structure:
We checked the consistency of time-invariant variables. It is expected that participants have a unique ‘role’ value, either 1 or 2.
library(esmtools)
vars_consist(data, "id", c("role"))
id role
1 1 1
2 2 2
3 3 1
4 4 2
5 5 (1, 2)
6 6 2
7 7 1
8 8 2
9 9 1
10 10 2
11 11 1
12 12 2
13 13 1
14 14 2
15 15 1
16 16 2
17 17 1
18 18 2
19 19 (1, 2)
20 20 2
21 21 1
22 22 2
23 23 1
24 24 2
25 25 1
26 26 2
27 27 1
28 28 2
29 29 1
30 30 2
31 31 1
32 32 2
33 33 1
34 34 2
35 35 1
36 36 2
37 37 1
38 38 2
39 39 1
40 40 2
41 41 1
42 42 2
43 43 1
44 44 2
45 45 1
46 46 2
47 47 1
48 48 2
49 49 1
50 50 2
51 51 1
52 52 2
53 53 1
54 54 2
55 55 1
56 56 2
57 57 1
58 58 2
59 59 1
60 60 2
Issue 2: Participants 5 and 19 have two role values (1 and 2), where only one value is expected.
Modification 1: For participants 5 and 19, the accurate and expected value of the ‘role’ variable is 1. In addition, as the variable is time-invariant, no missing values are expected. Therefore, we set all ‘role’ values of these two participants to 1.
data[data$id==5, "role"] = 1
data[data$id==19, "role"] = 1
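After a modification like this one, it can be worth re-checking that the issue is resolved. The following base R sketch counts the distinct ‘role’ values per participant after the change. The toy data frame is invented for illustration and only stands in for the real dataset; the check could equally be done by rerunning ‘vars_consist()’.

```r
# Toy data standing in for the ESM dataset (values are illustrative only)
data <- data.frame(
  id   = c(4, 4, 5, 5, 5, 19, 19),
  role = c(2, 2, 1, 2, 1, 1, 2)
)

# Apply the modification described above
data[data$id == 5, "role"] <- 1
data[data$id == 19, "role"] <- 1

# Count the distinct 'role' values of each participant
n_roles <- tapply(data$role, data$id, function(x) length(unique(x)))

# Participants that still have more than one value (none expected)
inconsistent_ids <- names(n_roles)[n_roles > 1]
stopifnot(length(inconsistent_ids) == 0)
```

Keeping such a check in the report makes the effect of the modification visible to readers and stops the knit if the correction did not apply as intended.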
Dynamically knit the data quality report
Generating the data quality report dynamically while the preprocessing report is compiled is useful: a fresh data quality report is created with each new iteration of the preprocessed data, so it stays aligned with the preprocessing steps. The data quality report then serves as a secondary verification of the preprocessing procedure and helps to promptly identify issues or anomalies introduced during preprocessing, particularly during its final stages.
This method is already implemented in the “advanced_preprocess_esm” template and the data quality template. Specifically, the following lines of code can be found at the bottom of the preprocessing report templates:
# Path to the data quality report (.Rmd format)
rmark_file = "path/data_quality_report.Rmd"
# Name of the output data quality report. Date is included to keep track of changes
filename_out = paste0(as.Date(Sys.time()), "_Data_Quality_Report.html")
# Knit the data quality report
rmarkdown::render(rmark_file, output_file=filename_out, params=list(file_path=file_path_preproc))
To make it work you need to:
- Create the data quality report based on the proposed template (see Data quality report section) and incorporate relevant elements and plots within it. The data quality report template contains the following lines of R code, in which ‘file_path’ is taken either from the parameters passed when knitting the preprocessing report or from a manually specified path when the data quality report is run independently of the preprocessing report.
# Use the path passed by the preprocessing report when it is available
# (the parameter is named 'file_path' in the render() call)
if(exists("params")){
  file_path = params$file_path
} else {
  # Otherwise, specify the path manually
  file_path = "path/data_file.csv"
}
data = read.csv(file_path)
- Specify the path of the data quality report by assigning it to the ‘rmark_file’ variable.
- Modify the name of the output data quality file by altering the value assigned to the ‘filename_out’ variable (optional).
- Verify that the ‘file_path_preproc’ variable contains the accurate path to the exported preprocessed data. This variable will be used in the data quality report to import the preprocessed data.
It is important to note that any issues encountered during the data quality report’s rendering will halt the preprocessing report as well. To avoid this, it is best to initiate the process after the preprocessing phase is well underway, preventing potential report issues from impacting the overall rendering process.
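An alternative is to wrap the rendering call so that a failure in the data quality report is downgraded to a warning instead of stopping the preprocessing report. The sketch below uses base R ‘tryCatch()’; the helper name ‘render_safely()’ is hypothetical and not part of esmtools or the templates, and the failing call is simulated so that the example runs without any report file.

```r
# Hypothetical helper: runs a rendering function and turns any error
# into a warning, returning TRUE on success and FALSE on failure
render_safely <- function(render_call) {
  tryCatch(
    {
      render_call()
      TRUE
    },
    error = function(e) {
      warning("Data quality report not rendered: ",
              conditionMessage(e), call. = FALSE)
      FALSE
    }
  )
}

# Intended use in the preprocessing report (variables as defined above):
# ok <- render_safely(function() rmarkdown::render(
#   rmark_file, output_file = filename_out,
#   params = list(file_path = file_path_preproc)))

# Demonstration with a call that fails on purpose
ok <- suppressWarnings(render_safely(function() stop("file not found")))
print(ok)
```

The trade-off is that a failed data quality report no longer stops the knit, so the returned value or the warning has to be checked explicitly.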