Reformatting

Packages: dplyr


R has multiple formats (classes). Each format stores a particular type of data (e.g., text, whole numbers, decimal numbers, booleans) and has its own restrictions and properties. Hence, we must ensure that our R objects have the correct format before passing them to a function or using them to compute scores.
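As a quick illustration, the sketch below creates small vectors of some common formats and prints their classes. It also shows one typical restriction: text that cannot be read as a number becomes NA when converted to numeric. The vector names are hypothetical and only used for this example.

```r
# A few common formats (classes) in R and the kind of data each stores
txt  <- c("condA", "condB")          # character: text
cnt  <- c(40L, 41L)                  # integer: whole numbers (the L suffix forces integer)
dec  <- c(1.5, 2.25)                 # numeric (double): decimal numbers
flag <- c(TRUE, FALSE)               # logical: boolean values
grp  <- factor(c("condA", "condB"))  # factor: categories with fixed levels

# Returns "character" "integer" "numeric" "logical" "factor"
sapply(list(txt, cnt, dec, flag, grp), class)

# Restriction example: "abc" cannot become a number, so it is turned into NA
# (R normally warns "NAs introduced by coercion"; the warning is silenced here)
suppressWarnings(as.numeric(c("12", "abc")))
```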

First, we need to know the format of each column of the data frame. There are two ways to check it:

  1. Check all variables at once with ‘str()’.
  2. Check one variable with ‘class()’.

str(data)
'data.frame':   4200 obs. of  18 variables:
 $ dyad     : num  1 1 1 1 1 1 1 1 1 1 ...
 $ role     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ obsno    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ id       : num  1 1 1 1 1 1 1 1 1 1 ...
 $ age      : int  40 40 40 40 40 40 40 40 40 40 ...
 $ cond_dyad: chr  "condB" "condB" "condB" "condB" ...
 $ scheduled: POSIXct, format: "2018-10-17 08:00:08" "2018-10-17 09:00:01" ...
 $ sent     : POSIXct, format: "2018-10-17 08:00:11" "2018-10-17 09:00:22" ...
 $ start    : POSIXct, format: NA NA ...
 $ end      : POSIXct, format: NA NA ...
 $ contact  : int  NA NA NA 0 NA NA 0 0 0 NA ...
 $ PA1      : int  NA NA NA 1 NA NA 1 1 1 NA ...
 $ PA2      : int  NA NA NA 11 NA NA 1 1 1 NA ...
 $ PA3      : int  NA NA NA 25 NA NA 5 7 16 NA ...
 $ NA1      : int  NA NA NA 10 NA NA 30 30 43 NA ...
 $ NA2      : int  NA NA NA 16 NA NA 1 13 23 NA ...
 $ NA3      : int  NA NA NA 28 NA NA 35 41 46 NA ...
 $ location : chr  NA NA NA "A" ...
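To check a single column with ‘class()’, the sketch below uses a small hypothetical data frame, ‘toy’, which mimics a few columns of ‘data’ so that the example runs on its own. Note that date-time columns report two classes, "POSIXct" and "POSIXt"; ‘str()’ shows only the first.

```r
# Hypothetical miniature version of `data` (a few columns only), for illustration
toy <- data.frame(
  dyad      = c(1, 1, 2),
  PA1       = c(NA, 1L, 3L),
  cond_dyad = c("condB", "condB", "condA"),
  scheduled = as.POSIXct(c("2018-10-17 08:00:08", "2018-10-17 09:00:01",
                           "2018-10-17 10:00:05"), tz = "UTC"),
  stringsAsFactors = FALSE  # keep text as character (the default since R 4.0)
)

class(toy$PA1)        # "integer"
class(toy$cond_dyad)  # "character"
class(toy$scheduled)  # "POSIXct" "POSIXt"

# First class of every column, similar to the type shown on each str() line
sapply(toy, function(x) class(x)[1])
```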

We can then reformat one or more columns with one of the following functions: ‘as.character()’, ‘as.factor()’, ‘as.logical()’, ‘as.numeric()’, ‘as.integer()’, ‘as.Date()’, ‘as.POSIXct()’. Often, several columns need to be converted to the same new format. Rather than reassigning each column on a separate line, we can convert them all in a single call. We can use:

  1. ‘as.format()’ (a placeholder for any of the functions above, e.g., ‘as.numeric()’) to reformat a single vector or data frame column.
  2. ‘mutate()’ and ‘across()’ to reformat several specified variables at once. When there is more than one variable, specify them as a vector, ‘c(var1, var2, var3)’, or as a range, ‘var1:var3’.
  3. ‘mutate()’ and ‘across()’ with a formula, ‘~ as.format(., arg = "arg1")’, to pass further arguments. The dot (‘.’) stands for each selected variable in turn. This allows adding arguments such as the time zone (‘tz’) and the time format (‘format’) of the ‘as.POSIXct()’ function.
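Options 2 and 3 can be sketched with dplyr (version 1.0.0 or later, which provides ‘across()’) on a small hypothetical data frame, ‘toy’. The day/month/year timestamp layout and the ‘tz = "UTC"’ time zone are assumptions chosen for illustration; use the format and time zone that match your own data.

```r
library(dplyr)

# Hypothetical data: integer scores and timestamps stored as text
toy <- data.frame(
  PA1   = c(1L, 11L),
  PA2   = c(1L, 25L),
  PA3   = c(5L, 16L),
  start = c("17/10/2018 08:01:00", "17/10/2018 09:02:30"),
  end   = c("17/10/2018 08:03:10", "17/10/2018 09:05:00"),
  stringsAsFactors = FALSE
)

toy <- toy %>%
  # Option 2: apply the same function to a range of columns
  mutate(across(PA1:PA3, as.numeric)) %>%
  # Option 3: pass extra arguments; "." stands for each selected column
  mutate(across(c(start, end),
                ~ as.POSIXct(., format = "%d/%m/%Y %H:%M:%S", tz = "UTC")))

str(toy)  # PA1 to PA3 are now num; start and end are now POSIXct
```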

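# Option 1: reformat one column at a time
data$PA1 = as.numeric(data$PA1)              # integer -> numeric
data$cond_dyad = as.character(data$cond_dyad)
# 'format' describes how text timestamps are written (used for character input)
data$scheduled = as.POSIXct(data$scheduled, format="%Y-%m-%d %H:%M:%OS")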
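The ‘format’ argument of ‘as.POSIXct()’ matters when the timestamps are stored as text. The sketch below parses hypothetical character timestamps with the same format string as above; ‘%OS’ also accepts fractional seconds. A format that does not match the text returns NA rather than an error, so it is worth checking for NAs after converting. The ‘tz = "UTC"’ value is an assumption for illustration.

```r
# Hypothetical timestamps stored as text
x <- c("2018-10-17 08:00:08", "2018-10-17 09:00:01.5")

# Matching format: both values are parsed (%OS reads the .5 fractional second)
ts <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
class(ts)  # "POSIXct" "POSIXt"

# Non-matching format: the conversion silently produces NA values
bad <- as.POSIXct(x, format = "%d/%m/%Y %H:%M", tz = "UTC")
anyNA(bad)  # TRUE
```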
Note

To learn more about timestamps, visit the “Reformat timestamps variables” section in the Create time variables topic.