Reformatting
Packages: dplyr
R stores data in several formats. Each format holds a particular type of data (e.g., text, integers, decimal numbers, booleans) and has specific restrictions and properties. Hence, we must ensure that our R objects have the correct format before passing them to a function or computing scores.
First, we need to know the format of each column of the dataframe. There are two ways to check:
- Check all variables at once with ‘str()’.
- Check one variable with ‘class()’.
str(data)
'data.frame': 4200 obs. of 18 variables:
$ dyad : num 1 1 1 1 1 1 1 1 1 1 ...
$ role : int 1 1 1 1 1 1 1 1 1 1 ...
$ obsno : int 1 2 3 4 5 6 7 8 9 10 ...
$ id : num 1 1 1 1 1 1 1 1 1 1 ...
$ age : int 40 40 40 40 40 40 40 40 40 40 ...
$ cond_dyad: chr "condB" "condB" "condB" "condB" ...
$ scheduled: POSIXct, format: "2018-10-17 08:00:08" "2018-10-17 09:00:01" ...
$ sent : POSIXct, format: "2018-10-17 08:00:11" "2018-10-17 09:00:22" ...
$ start : POSIXct, format: NA NA ...
$ end : POSIXct, format: NA NA ...
$ contact : int NA NA NA 0 NA NA 0 0 0 NA ...
$ PA1 : int NA NA NA 1 NA NA 1 1 1 NA ...
$ PA2 : int NA NA NA 11 NA NA 1 1 1 NA ...
$ PA3 : int NA NA NA 25 NA NA 5 7 16 NA ...
$ NA1 : int NA NA NA 10 NA NA 30 30 43 NA ...
$ NA2 : int NA NA NA 16 NA NA 1 13 23 NA ...
$ NA3 : int NA NA NA 28 NA NA 35 41 46 NA ...
$ location : chr NA NA NA "A" ...
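As a minimal sketch of the second method, ‘class()’ can be called on a single column, and wrapped in ‘sapply()’ to get one class per column. The small dataframe below is made up for illustration and only mimics two columns of the dataset above.

```r
# Toy dataframe (hypothetical values) mimicking two columns of the dataset
toy <- data.frame(
  age = c(40L, 35L),
  cond_dyad = c("condB", "condA"),
  stringsAsFactors = FALSE
)

# Format of a single variable
age_class <- class(toy$age)
print(age_class)

# Format of every variable, returned as a named character vector
all_classes <- sapply(toy, class)
print(all_classes)
```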
Now, we can reformat one or multiple columns using one of the following functions: ‘as.character()’, ‘as.factor()’, ‘as.logical()’, ‘as.numeric()’, ‘as.integer()’, ‘as.Date()’, ‘as.POSIXct()’. Frequently, several columns need to be converted to the same format, and converting them in a single call is more efficient than reassigning each column on a separate line. Hence, we can use either:
- ‘as.format()’ (e.g., ‘as.numeric()’), to reformat a single variable, such as one column of a dataframe.
- ‘mutate()’ and ‘across()’, which reformat multiple specified variables at once. To reformat several variables, specify them as a vector, ‘c(var1, var2, var3)’, or as a range, ‘var1:var3’.
- ‘mutate()’ and ‘across()’ with further arguments, written as ‘~ as.format(., arg = "arg1")’. The “.” given as the first argument stands for each selected variable in turn. This form allows adding further arguments, such as the time zone (‘tz’) and the time format (‘format’) for the ‘as.POSIXct()’ function.
data$PA1 = as.numeric(data$PA1)
data$cond_dyad = as.character(data$cond_dyad)
data$scheduled = as.POSIXct(data$scheduled, format="%Y-%m-%d %H:%M:%OS")
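The ‘mutate()’ and ‘across()’ options described above could look like the following sketch. It assumes a toy dataframe with made-up values in which the timestamps are still stored as text (in the dataset shown earlier they are already POSIXct), and the time zone "UTC" is an arbitrary choice for illustration.

```r
library(dplyr)

# Toy dataframe (hypothetical values) mimicking a few columns of the dataset
toy <- data.frame(
  PA1 = c(1L, 11L, 25L),
  PA2 = c(1L, 5L, 7L),
  PA3 = c(10L, 16L, 28L),
  scheduled = c("2018-10-17 08:00:08", "2018-10-17 09:00:01", "2018-10-17 10:00:03"),
  sent = c("2018-10-17 08:00:11", "2018-10-17 09:00:22", "2018-10-17 10:00:15"),
  stringsAsFactors = FALSE
)

# Reformat a range of columns at once: PA1 to PA3 become numeric
toy <- toy %>%
  mutate(across(PA1:PA3, as.numeric))

# Pass extra arguments with a lambda: "." stands for each selected column
toy <- toy %>%
  mutate(across(
    c(scheduled, sent),
    ~ as.POSIXct(., format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
  ))

str(toy)
```

The range ‘PA1:PA3’ relies on the columns being adjacent in the dataframe; when they are not, listing them in ‘c()’ is the safer choice.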
To learn more about timestamps, visit the “Reformat timestamps variables” section in the Create time variables topic.