ESM Preprocessing Gallery – reporting

Reporting tools

The esmtools package provides tools to help create preprocessing reports and data quality reports. In particular, we have developed two tools: text highlighting and a toggle button. The Session and Dataset info section is also part of these tools. All tools are implemented in the advanced version of the preprocessing report (see the report example folder).

Highlighting text

In the report, we particularly encourage highlighting any data modifications and identified issues. Highlighting emphasizes specific sections, making it easier to spot and communicate the changes made during the preprocessing phase. Often, you only need to copy and paste the required code (see the instructions in the template) and adapt the description part. We propose two methods:

The ‘txt()’ function from the esmtools package. It can be used in two different ways:
- Basic: makes reporting simple.
- Exporting: enables exporting the comments.
HTML tags and CSS code: this method does not require any function.

txt(): basic

The arguments of the function are the following:

id: a character string that links the element to a CSS style. Predefined styles are: ‘esm-issue’, ‘esm-mod’, ‘esm-inspect’. You can create your own CSS style and associate it.
title: the part that is highlighted.
count: a logical value (TRUE by default) indicating whether to include a count in the title part.

For instance, the following inline code `r txt(id='esm-issue',title='Issue',count=TRUE)` The issue is that ... gives:

Issue 1: The issue is that …

Note: for conciseness, you can omit the argument names (e.g., ‘txt('esm-issue','Issue',TRUE)’).

Note

You can also highlight:

Data modifications: using `r txt(id='esm-mod','Modification',TRUE)` I changed ...

Modification 1: I changed …

Data inspection: using `r txt(id='esm-inspect','Inspection')` Here we can see that ...

Inspection: Here we can see that …

Note

The ‘esm-issue’, ‘esm-mod’, and ‘esm-inspect’ CSS class styles are loaded whenever you load the esmtools package. You can override those styles to modify the fonts, colors, etc., as you like (the example below contains a commented-out font declaration). In certain cases, you might need to use the ‘!important’ declaration to override the default style definitions.


<style>
.esm-issue{
  /* font-family: Georgia; */
}
</style>

txt(): exporting

The txt() function generates custom text and can optionally include a count in the title. Whenever you have spotted an issue in your dataset (e.g., duplication), you can use the inline code below (within backticks). In contrast with the basic use of the txt() function, the description needs to be inside the function call, specified in the ‘text’ argument. The arguments of the function are the following:

id: a character string that links the element to a CSS style. Predefined styles are: ‘esm-issue’, ‘esm-mod’, ‘esm-inspect’. You can create your own CSS style and associate it.
title: the part that is highlighted.
text: a description of the spotted issue.
count: a logical value (TRUE by default) indicating whether to include a count in the title part.

For instance, the following inline code `r txt(id='esm-issue',title='Issue',text='The issue is that ...',count=TRUE)` gives:

Issue 2: The issue is that …

Note: for conciseness, you can omit the argument names (e.g., ‘txt('esm-issue','Issue','The issue is that …',TRUE)’).

Note

You can also highlight:

Data modifications: using `r txt(id='esm-mod','Modification','I changed ...',TRUE)`

Modification 2: I changed …

Data inspection: using `r txt(id='esm-inspect','Inspection','Here we can see that ...')`

Inspection: Here we can see that …

Additionally, we can use HTML code to change the font, the size, or the layout of the text. For instance, we can create an itemized list with the ul and li tags, such as `r txt(id='esm-issue',title='Issue',text='the plot aids in visualizing that: <br> <ul> <li>Firstly ...</li> <li>Secondly ...</li> </ul>')`

Issue 3: the plot aids in visualizing that:

Firstly …
Secondly …

Exporting the highlighted elements: by using the txt() function, you can export selected or all highlighted elements into a JSON file. This not only provides a concise summary of these elements but also creates a machine-readable file that can be used later. The output file is named after the RMarkdown file, with the suffix ‘_list’ (e.g., ‘Preprocessing_report_list.json’). For instance, the advanced preprocessing report example gives the following output:

{
    "Modification 1": "reformating the timestamps variables to be in POSIXct format.",
    "Modification 2": "create variable to report maximum value in the multi-response item 'perc_stress_child'.",
    "Error 1": "from the descriptive output in the Importation check section, we can see that the missing values in perc_stress_child are coded as ''",
    "Modification 3": "We recoded the missing values of the 'perc_stress_child' variable as NA",
    "Error 2": "There are inconsistent missing values:",
    "Modification 4": "set as missing the 'start' and 'end' variables of those 27 first inconsistent cases. For the 3 remaining observations, it will have no implications for the later analysis.",
    "Modification 5": "extract time elements (day, year, etc.) as well as create observation number (obsno), day number (daycum), beep number in a day (beepno) and duration in days variables (duration).",
    "Modification 6": "the valid observations are the ones in which there are no missing values in the variables of interest. It is 3 variables (from 'pos_aff', 'pos_neg', 'perc_stress_child' variables) and, in addition, 'perc_fun_child' and 'perc_fun_signaled' in function of the branching conditions (see above).\nWe created a function that can be reused later.",
    "Error 3": "the sampling scheme plot aids in visualizing that: <br> \n<ol>\n<li>There are big intervals between the first day and the rest of the days for many participants (e.g., participants 1, 49, 52, 72).<\/li> \n<li>The first and sometimes the second days of participation must be removed. Indeed, those days have often less than 4 beeps sent and are testing days. In the end, participants should only have 10 days of participation, starting on a Friday and including 2 weekends. <\/li>\n<\/ol>",
    "Modification 7": "remove test observations and recompute time variables (checking the new sampling scheme is in the supplementary part below). Test observations are all day 1 observations and: \n<ul>\n<li>day 4 observations for participants 79 and 73.<\/li> \n<li>day 17 observations for participants 1 and 72.<\/li> \n<li>day 10 observations for participants 49, 52.<\/li> \n<li>day 5 observations for participant 66.<\/li> \n<li>day 3 observations for participants 9 and 32. <\/li>\n<\/ul>",
    "Error 4": "both within and between observations. There is an observation with start before sent with an an hour of difference. It is not an issue for later analysis.",
    "Error 5": "negative time interval for an obs (issue already mentioned higher).",
    "Error 6": "there are outliers, specifically belonging to participants 7, 15, 66, 73, and 77, that require further investigation. Additionally, Participant 66 exhibits low compliance.",
    "Error 7": "there are time differences superior to 10 and to 20 minutes (max=28 mins). \nIt may be problematic for the analysis but differences follow the sampling scheme (i.e., delay to start the questionnaires).",
    "Error 8": "the overall compliance is rather low. In particular, the participant 66 has a compliance close to 0, and the participants 1, 32, 72 and 79 have a compliance lower than .2.",
    "Error 9": "when taking dyads' partner observations together, the dyads' compliance (defined as the proportion of beeps answer by both partners) are very low overall.",
    "Modification 8": "person-mean center the 'pos_aff' and 'neg_aff' variables.",
    "Modification 9": "remove irrelevant variables for later analysis"
}
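Because the export is plain JSON, it can be read back into R for later use. The following is a minimal sketch, assuming the jsonlite package is installed (it is not part of esmtools). The temporary file and its two entries are a stand-in excerpt of the output above, and the object names (`elements`, `summary_table`, etc.) are only illustrative.

```r
# Assumption: the jsonlite package is installed; it is not part of esmtools.
library(jsonlite)

# Stand-in for the exported file: a two-entry excerpt written to a temporary
# file so that this sketch runs on its own. In practice, point `path` to the
# exported file (e.g., 'Preprocessing_report_list.json').
path <- tempfile(fileext = ".json")
writeLines(c(
  '{',
  '  "Error 2": "There are inconsistent missing values:",',
  '  "Modification 9": "remove irrelevant variables for later analysis"',
  '}'
), path)

# A JSON object of strings is read as a named list:
# names are the titles, values are the descriptions.
elements <- fromJSON(path)

# Separate the entries by title prefix.
titles <- names(elements)
modifications <- elements[startsWith(titles, "Modification")]
errors <- elements[startsWith(titles, "Error")]

# One row per exported element, convenient for filtering or printing.
summary_table <- data.frame(
  title = titles,
  description = unlist(elements, use.names = FALSE),
  stringsAsFactors = FALSE
)
```

A table like `summary_table` could, for example, be printed in a later report or filtered to list only the issues that still need attention.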

To create this export, you need to specify the ids of the textual elements you wish to include in the ‘json_esm’ variable within the ‘params’ section of the header. For example, if you want to export all spotted issues (with id=‘esm-issue’) and data modifications (with id=‘esm-mod’), simply specify these ids in the ‘json_esm’ parameter, and proceed as follows:

---
title: report
output:
  html_document
params:
  json_esm: esm-issue, esm-mod
---


HTML tags

First, you need to define a CSS style associated with a class (e.g., ‘text_issue’). CSS (Cascading Style Sheets) is a stylesheet language used to describe how HTML elements are presented on a webpage: it defines the appearance and layout of elements such as text, images, and other content. When you want to apply a consistent style to multiple elements, you can define a CSS class. In RMarkdown files, classes should be created within style tags (<style> </style>) and their names need to begin with a dot (e.g., .text_issue{...}). Within the class, you can define the font weight (e.g., font-weight: bold;), the font color (e.g., color: white;) and the text decoration (e.g., text-decoration: underline #e61919 2px;). You can modify the CSS code as needed to achieve the desired style.

<style>
.text_issue{ 
  font-weight: bold; 
  color: #e61919; 
  text-decoration: underline #e61919 2px;
}
</style>

Next, to count the number of highlighted elements, you should initialize a variable that will be used later. Make sure to define the variable at the top of the file.

```{r}
er = 0
```

Then you can combine this style and the counting variable to highlight text. The style defined above is applied to whatever is placed within the span tag (i.e., <span> </span>). The counting variable is updated (e.g., er = er + 1) and printed by an inline R code each time. For instance, the following HTML and inline code:

<span class="text_issue"> Issue `r er=er+1 ; er`:</span> Description of the issue ...

gives:

Issue 1: Description of the issue …

Note

For each type of element you want to highlight (e.g., data modification, inspection), you’ll need to create a different CSS style and counter. By using this approach, you can effectively highlight various elements throughout your RMarkdown document while keeping track of the counts for each highlighted item.
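When several types of elements are highlighted this way, the span and counter pairs can be wrapped in a small helper. The sketch below uses base R only; `make_highlighter()` and the ‘text_mod’ class are hypothetical names introduced for illustration and are not part of esmtools. Each call to `make_highlighter()` returns a function with its own private counter.

```r
# Hypothetical helper (not part of esmtools): builds one highlighter per
# element type, each with its own CSS class and its own counter.
make_highlighter <- function(css_class, title) {
  n <- 0L  # counter private to this highlighter
  function() {
    n <<- n + 1L  # update the counter kept in the enclosing environment
    sprintf('<span class="%s">%s %d:</span>', css_class, title, n)
  }
}

# One highlighter per element type; 'text_mod' is an assumed class name
# that would need its own CSS definition, like 'text_issue'.
issue <- make_highlighter("text_issue", "Issue")
modif <- make_highlighter("text_mod", "Modification")

first_issue  <- issue()
second_issue <- issue()
first_modif  <- modif()
```

With the highlighters defined in a chunk at the top of the file, an inline call such as `r issue()` followed by the description would then take the place of the hand-written span and counter.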

Toggle button

The ‘button()’ and ‘endbutton()’ functions from the esmtools package delimit a part of the document that can be revealed using a button. This is particularly useful to hide elements that are non-essential (but still important to report) and so improve readability. A name can be given to the button using the ‘text’ argument.

Warning

Prior to using the button() function, the esmtools package must be loaded. Consequently, the esmtools package cannot be loaded within the part delimited by the ‘button()’ and ‘endbutton()’ functions. Make sure that loading the package is the first action in the document.

To make it work, enclose the part between the ‘button()’ and ‘endbutton()’, following this example:

`r button(text = 'Description')`

```{r}
print(1 + 1)
```

```{r}
print(2 + 2)
```

`r endbutton()`

The above example renders as a button named ‘Description’. Clicking it toggles the enclosed content.

Note

We propose two button names that are particularly relevant:

‘Descriptive’: add any descriptive analysis that helps to get a broad view of the data, such as summary tables.
‘Supplementary’: add any other functions you used to inspect the data but that were not needed to highlight the spotted issues.