Import data

Packages: dplyr, readr


Importing your dataset into your R session is probably one of the first tasks you will do. Many file formats exist, and R offers many functions to read them. We briefly introduce the most common ones here.

Import a csv file

A common and practical file format to store ESM data is the csv (comma-separated values) file. To import a csv file, we can use the base R functions ‘read.csv()’ or ‘read.csv2()’. Both functions are wrappers around ‘read.table()’ that differ in the default values of two arguments: ‘sep’ (the character that separates the values) and ‘dec’ (the character used as the decimal mark):

  • ‘read.csv()’ has as default arguments sep="," and dec=".". It is roughly equivalent to ‘read.table(file, header=TRUE, sep=",", dec=".")’.
  • ‘read.csv2()’ has as default arguments sep=";" and dec=",". It is roughly equivalent to ‘read.table(file, header=TRUE, sep=";", dec=",")’.
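The correspondence between ‘read.csv2()’ and ‘read.table()’ can be checked on a small file. The sketch below writes a temporary file whose column names (‘id’, ‘PA1’) and values are made up for illustration, then reads it both ways. Note that ‘header = TRUE’ is passed explicitly to ‘read.table()’ because ‘read.csv2()’ reads a header row by default.

```r
# Write a small illustrative file: ";" as separator, "," as decimal mark
# (column names and values are hypothetical).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;PA1",
             "1;3,5",
             "2;4,25"), tmp)

# Read it with the wrapper and with the underlying function
via_csv2  <- read.csv2(tmp)
via_table <- read.table(tmp, header = TRUE, sep = ";", dec = ",")

str(via_csv2)                    # PA1 should be numeric, not character
all.equal(via_csv2, via_table)   # compares the two data frames
```

For a simple file like this one both calls should return the same data frame. The wrappers also change a few other defaults (quoting, comment characters, handling of incomplete rows), so results can differ on messier files.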

As an illustration, we want to import the file ‘data_sim.csv’ (can be downloaded above). This file contains values separated by ";" and decimals indicated by ",". The function that matches these settings is ‘read.csv2()’.

data = read.csv2(file="data/data_sim.csv")
Warning

Directly after importing data, inspect it to check that it has been imported correctly (see first glimpse). Also check any warning messages that appear during the import process.
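A quick inspection often reveals an import that went wrong. The sketch below, using a made-up two-column file, reads the same file with the wrong and the right function. With the wrong separator, all values end up in a single column, which ‘dim()’ and ‘str()’ make visible at once.

```r
# Illustrative file with ";" as separator (names and values are hypothetical)
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;PA1",
             "1;3",
             "2;4"), tmp)

wrong <- read.csv(tmp)    # expects "," as separator
right <- read.csv2(tmp)   # expects ";" as separator

dim(wrong)                # a single column: the separator was not recognised
dim(right)                # two columns, as intended

# Basic checks on the correctly imported data
str(right)                # column names and types
head(right)               # first rows
colSums(is.na(right))     # number of missing values per column
```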

Additional arguments

Some functions from other packages offer additional arguments when importing csv files (e.g., the code of missing values, the column types). A popular choice is the readr package, which provides the ‘read_delim()’ function and two wrappers around it: ‘read_csv()’ and ‘read_csv2()’. They play the same roles as ‘read.csv()’ and ‘read.csv2()’, but they offer many useful arguments, such as:

  • Missing value code with the ‘na’ argument. We can specify which values code for missing data in the file (e.g., ‘-999’ or ‘na’).
  • Column type with the ‘col_types’ argument and the ‘cols()’ function. We can specify ‘n’ for number, ‘i’ for integer, ‘d’ for double, ‘c’ for character, ‘f’ for factor, ‘D’ for date, ‘T’ for datetime, etc. For instance, if the variable PA1 is expected to be numeric: ‘col_types = cols(PA1="n")’. In the ‘cols()’ function, the ‘.default’ argument can be used to specify the default column type.
library(readr)
df = read_csv2(file="data/data_sim.csv",
    na=c("-999", "__na__"),              # Specify missing value format
    col_types = cols(.default = "i",     # Specify default column types, here integer
                     cond_dyad = "c", location="c"))

This function has further arguments that can be found in the function’s documentation: https://readr.tidyverse.org/reference/read_delim.html.
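To see the effect of ‘na’ and ‘col_types’, the arguments can be tried on a small temporary file. This is only a sketch: the column names (‘id’, ‘PA1’, ‘location’, ‘day’) and the values are invented for illustration and are not taken from ‘data_sim.csv’.

```r
library(readr)

# Illustrative file: ";" separator, "," decimal mark, two missing value codes
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;PA1;location;day",
             "1;3,5;home;2024-01-15",
             "2;-999;work;2024-01-16",
             "3;4,25;__na__;2024-01-17"), tmp)

df <- read_csv2(tmp,
                na = c("-999", "__na__"),        # codes treated as missing
                col_types = cols(.default = "d", # default: double
                                 id = "i",       # integer
                                 location = "c", # character
                                 day = "D"))     # date (uppercase "D")

spec(df)             # column specification that was actually used
colSums(is.na(df))   # missing values per column after recoding
```

Here ‘-999’ in PA1 and ‘__na__’ in location should both come back as NA, and ‘day’ should be stored as a Date rather than as text.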

Import other types of files

In case we have to import other types of files, here are some useful functions:

Function      Type of file           Package   Description
read.delim    .txt ; .csv            base R    Delimited text files; the separator defaults to a tab and can be changed with ‘sep’
read.table    .txt ; .csv ; .dat     base R    General function; allows more arguments to be specified
read_sas      .sas7bdat              haven     SAS data
read_stata    .dta                   haven     Stata data
read_spss     .sav                   haven     SPSS file
read_excel    .xls ; .xlsx           readxl    Excel files
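
As a minimal sketch of the two base R functions in the table, the example below writes two small temporary files with made-up content, one tab-separated and one using "|" as separator, and reads each with the function that fits it.

```r
# Tab-separated text file (names and values are hypothetical)
tab_file <- tempfile(fileext = ".txt")
writeLines(c("id\tPA1",
             "1\t3.5",
             "2\t4.25"), tab_file)
tab_data <- read.delim(tab_file)   # separator defaults to a tab

# File with an unusual separator, read with read.table()
pipe_file <- tempfile(fileext = ".dat")
writeLines(c("id|PA1",
             "1|3.5",
             "2|4.25"), pipe_file)
pipe_data <- read.table(pipe_file, header = TRUE, sep = "|")

str(tab_data)
str(pipe_data)
```

The haven and readxl functions are called in the same way, with the file path as first argument, but they require the corresponding package to be installed and loaded.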