First glimpse
Packages: dplyr, psych, skimr, Hmisc, esmtools
After importing your dataset or while preprocessing it, it is important to get a quick and efficient overview of it. To this end, we demonstrate how to display meta-data (e.g., dimensions), display a subset of rows, compute common descriptive statistics (e.g., mean, standard deviation), and compute occurrences of values. Beyond providing a good understanding of your dataset, this often reveals issues in the data (e.g., wrong minimal or maximal values, the loss of many rows after a data manipulation, an unexpectedly high number of occurrences of one value of a categorical variable).
Meta-data
Three important meta-data aspects to check are the number of rows and columns, the format of the columns, and the number of observations per participant. This can be done using:
- ‘dim()’: returns the number of rows (first number) and the number of columns (second number). It helps to quickly check whether those numbers are the expected ones (e.g., after a data modification). You can also retrieve the number of rows alone with ‘nrow()’ and the number of columns with ‘ncol()’.
dim(data)
[1] 4200 18
- ‘str()’: returns the columns’ formats and their first values. It is particularly useful to check whether variables are in the correct format (e.g., integer, character, POSIXct). The same can be achieved with ‘glimpse()’ from the dplyr package.
str(data)
'data.frame': 4200 obs. of 18 variables:
$ dyad : num 1 1 1 1 1 1 1 1 1 1 ...
$ role : int 1 1 1 1 1 1 1 1 1 1 ...
$ obsno : int 1 2 3 4 5 6 7 8 9 10 ...
$ id : num 1 1 1 1 1 1 1 1 1 1 ...
$ age : int 40 40 40 40 40 40 40 40 40 40 ...
$ cond_dyad: chr "condB" "condB" "condB" "condB" ...
$ scheduled: POSIXct, format: "2018-10-17 08:00:08" "2018-10-17 09:00:01" ...
$ sent : POSIXct, format: "2018-10-17 08:00:11" "2018-10-17 09:00:22" ...
$ start : POSIXct, format: NA NA ...
$ end : POSIXct, format: NA NA ...
$ contact : int NA NA NA 0 NA NA 0 0 0 NA ...
$ PA1 : int NA NA NA 1 NA NA 1 1 1 NA ...
$ PA2 : int NA NA NA 11 NA NA 1 1 1 NA ...
$ PA3 : int NA NA NA 25 NA NA 5 7 16 NA ...
$ NA1 : int NA NA NA 10 NA NA 30 30 43 NA ...
$ NA2 : int NA NA NA 16 NA NA 1 13 23 NA ...
$ NA3 : int NA NA NA 28 NA NA 35 41 46 NA ...
$ location : chr NA NA NA "A" ...
- Number of rows per participant: with base R functions or with dplyr. Be aware that in the base R version, the output displays each id number above the number of rows for that participant. In the dplyr version, ‘n()’ computes the number of rows for each group, so here for each participant.
sapply(split(data$id, data$id), length)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70 70
53 54 55 56 57 58 59 60
70 70 70 70 70 70 70 70
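The dplyr version mentioned above can be sketched as follows. The toy data frame here is a hypothetical stand-in for the chapter's ‘data’, with two participants and three rows each:

```r
library(dplyr)

# Toy data frame standing in for 'data' (hypothetical: two participants,
# three rows each)
toy <- data.frame(id = c(1, 1, 1, 2, 2, 2))

# n() returns the number of rows per group, here per participant
toy %>%
  group_by(id) %>%
  summarize(n_rows = n())
```

Unlike the base R output, the result is a data frame with one row per participant, which is convenient for further filtering (e.g., flagging participants with fewer rows than expected).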
Display rows
A direct inspection of the observations/rows can also be useful for a quick investigation in many situations, such as after importing data or creating new variables. The most common practice is to use ‘head()’ and ‘tail()’ to display the first or last rows of the dataset, respectively. However, only displaying the first or last rows of a dataset may not provide a good representation of it, and, worse, it may hide problematic patterns, outliers, or other data quality issues. Hence, displaying a set of random rows can be a good alternative.
- ‘head()’ and ‘tail()’: return, respectively, the first and last n rows (by default n=6). We can change n as follows: ‘head(data, n=10)’ or ‘tail(data, n=10)’.
head(data, n=5)
dyad role obsno id age cond_dyad scheduled sent
1 1 1 1 1 40 condB 2018-10-17 08:00:08 2018-10-17 08:00:11
2 1 1 2 1 40 condB 2018-10-17 09:00:01 2018-10-17 09:00:22
3 1 1 3 1 40 condB 2018-10-17 09:59:56 2018-10-17 10:00:08
4 1 1 4 1 40 condB 2018-10-17 10:59:48 2018-10-17 10:59:52
5 1 1 5 1 40 condB 2018-10-17 12:00:12 2018-10-17 12:00:15
start end contact PA1 PA2 PA3 NA1 NA2 NA3
1 <NA> <NA> NA NA NA NA NA NA NA
2 <NA> <NA> NA NA NA NA NA NA NA
3 <NA> <NA> NA NA NA NA NA NA NA
4 2018-10-17 11:00:12 2018-10-17 11:03:01 0 1 11 25 10 16 28
5 <NA> <NA> NA NA NA NA NA NA NA
location
1 <NA>
2 <NA>
3 <NA>
4 A
5 <NA>
- Random rows: the esmtools package provides functions to display:
- (1) random rows: with the function ‘randrows’, you can display n randomly selected rows from the dataset.
- (2) one random set of consecutive rows: with the function ‘folrows’, you can display one randomly selected set of n consecutive rows.
- (3) multiple random sets of consecutive rows: with the ‘nb_sample’ argument of ‘folrows’, you can display nb_sample randomly selected sets of n consecutive rows.
library(esmtools)
randrows(data, n=5)
dyad role obsno id age cond_dyad scheduled sent
3711 27 2 1 54 36 condB 2018-09-18 08:00:04 2018-09-18 08:00:12
273 2 2 63 4 25 condB 2018-08-21 10:00:08 2018-08-21 10:00:19
3863 28 2 13 56 48 condA 2018-10-29 11:00:01 2018-10-29 11:00:23
321 3 1 41 5 25 condB 2018-03-12 09:00:17 2018-03-12 09:00:28
607 5 1 47 9 48 condA 2018-02-11 10:00:04 2018-02-11 10:00:21
start end contact PA1 PA2 PA3 NA1 NA2 NA3
3711 <NA> <NA> NA NA NA NA NA NA NA
273 2018-08-21 10:00:30 2018-08-21 10:01:19 0 20 22 1 1 1 93
3863 2018-10-29 11:00:30 2018-10-29 11:03:20 0 27 18 1 1 10 79
321 <NA> <NA> NA NA NA NA NA NA NA
607 <NA> <NA> NA NA NA NA NA NA NA
location
3711 <NA>
273 D
3863 D
321 <NA>
607 <NA>
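The idea behind ‘folrows’ can be illustrated with a short base-R sketch: pick a random starting row, then return the n rows that follow it. This is an assumption-laden re-implementation for illustration, not the esmtools source:

```r
# Base-R sketch of the 'folrows' idea: display n consecutive rows starting
# from a randomly chosen position (the real esmtools function also supports
# multiple samples via its nb_sample argument, per the text above).
folrows_sketch <- function(df, n = 5) {
  start <- sample(seq_len(nrow(df) - n + 1), 1)  # random starting row
  df[start:(start + n - 1), ]                    # n consecutive rows
}

# Toy data frame standing in for 'data'
toy <- data.frame(obsno = 1:20, value = runif(20))
folrows_sketch(toy, n = 5)
```

Consecutive rows are especially informative in ESM data, since they preserve the temporal ordering of beeps within a participant.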
Common descriptive functions
There are multiple functions from many packages that can be used to compute descriptive statistics (e.g., mean, median, quantiles, proportion of missing values) over all variables of a dataframe. Computing descriptive statistics is a useful way to check assumptions about the data (e.g., the range of values), to detect potential issues (e.g., a large number of missing values), or to get a quick overview of the data. Here, we show four functions coming from different packages, starting with base R’s ‘summary()’:
summary(data)
dyad role obsno id age
Min. : 1.0 Min. :1.0 Min. : 1.0 Min. : 1.00 Min. :25.00
1st Qu.: 8.0 1st Qu.:1.0 1st Qu.:18.0 1st Qu.:15.75 1st Qu.:25.00
Median :15.5 Median :1.5 Median :35.5 Median :30.50 Median :35.50
Mean :15.5 Mean :1.5 Mean :35.5 Mean :30.50 Mean :35.10
3rd Qu.:23.0 3rd Qu.:2.0 3rd Qu.:53.0 3rd Qu.:45.25 3rd Qu.:42.25
Max. :30.0 Max. :2.0 Max. :70.0 Max. :60.00 Max. :65.00
cond_dyad scheduled
Length:4200 Min. :2018-02-02 08:59:47.00
Class :character 1st Qu.:2018-07-20 09:00:01.00
Mode :character Median :2018-09-13 11:00:13.50
Mean :2018-09-08 11:33:24.83
3rd Qu.:2018-10-24 02:59:52.50
Max. :2019-06-10 11:59:46.00
NA's :2
sent start
Min. :2018-02-02 08:59:51.00 Min. :2018-02-02 09:00:31.00
1st Qu.:2018-07-20 09:00:18.75 1st Qu.:2018-07-22 12:00:37.25
Median :2018-09-13 11:30:18.00 Median :2018-09-14 22:00:28.00
Mean :2018-09-08 11:54:09.94 Mean :2018-09-14 18:49:31.13
3rd Qu.:2018-10-23 17:00:04.00 3rd Qu.:2018-10-31 13:00:30.50
Max. :2019-06-10 11:59:54.00 Max. :2019-06-10 12:00:15.00
NA's :1254
end contact PA1
Min. :2018-02-02 09:03:07.00 Min. :0.0000 Min. : 1.00
1st Qu.:2018-07-22 12:01:49.25 1st Qu.:0.0000 1st Qu.: 4.00
Median :2018-09-14 22:02:02.00 Median :0.0000 Median : 18.00
Mean :2018-09-14 18:51:14.74 Mean :0.1229 Mean : 23.09
3rd Qu.:2018-10-31 13:02:31.00 3rd Qu.:0.0000 3rd Qu.: 32.00
Max. :2019-06-10 12:02:30.00 Max. :1.0000 Max. :100.00
NA's :1254 NA's :1254 NA's :1254
PA2 PA3 NA1 NA2
Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.0
1st Qu.: 3.00 1st Qu.: 3.00 1st Qu.: 1.00 1st Qu.: 1.0
Median : 19.00 Median : 16.00 Median : 11.00 Median : 7.0
Mean : 21.77 Mean : 23.32 Mean : 21.36 Mean :10.5
3rd Qu.: 33.00 3rd Qu.: 31.00 3rd Qu.: 31.00 3rd Qu.:15.0
Max. :100.00 Max. :100.00 Max. :100.00 Max. :83.0
NA's :1254 NA's :1254 NA's :1254 NA's :1254
NA3 location
Min. : 1.00 Length:4200
1st Qu.: 40.00 Class :character
Median : 72.00 Mode :character
Mean : 63.72
3rd Qu.: 89.00
Max. :100.00
NA's :1254
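If the richer summaries from psych, skimr, or Hmisc are not needed, the same kind of per-variable overview can be computed by hand in base R. A minimal sketch, using a toy data frame as a hypothetical stand-in for ‘data’:

```r
# Minimal base-R sketch of per-variable descriptives: mean, SD, range,
# and proportion of missing values for each numeric column.
toy <- data.frame(PA1 = c(1, 5, NA, 9), NA1 = c(2, NA, NA, 10))

desc <- sapply(toy, function(x) c(
  mean    = mean(x, na.rm = TRUE),
  sd      = sd(x, na.rm = TRUE),
  min     = min(x, na.rm = TRUE),
  max     = max(x, na.rm = TRUE),
  na_prop = mean(is.na(x))       # mean of a logical = proportion of NAs
))
round(desc, 2)
```

The proportion of missing values (‘na_prop’) is particularly worth monitoring in ESM data, where non-response to beeps is common.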
Occurrences of values
Finally, we may also be interested in the occurrences of values for specific items, particularly with Likert scales or multiple-choice questions. In such a case, it is useful to investigate the number of unique values and to check whether some values have unexpectedly high or low numbers of occurrences. We can display either:
- the overall occurrences: the number of occurrences of each value over the whole dataset.
- the within-participant occurrences: the number of occurrences of each value for each participant in the dataset. In the output, the rows are the participants and the columns are the different values of the item.
table(data$location, useNA="ifany")
A B C D E <NA>
621 543 535 608 639 1254
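The within-participant occurrences described above can be obtained with a two-way ‘table()’, participants as rows and item values as columns. A sketch on a toy data frame standing in for ‘data’:

```r
# Within-participant occurrences: a two-way table with participants as
# rows and item values as columns (toy data standing in for 'data').
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  item = c("A", "B", "B", "C", "C", NA)
)

# useNA = "ifany" adds an <NA> column so missing responses are counted too
table(toy$id, toy$item, useNA = "ifany")
```

On the real dataset, the equivalent call would be ‘table(data$id, data$location, useNA="ifany")’, which makes it easy to spot participants who never (or almost always) report a given value.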