Import data
Packages: dplyr, readr
Probably one of the first tasks you will do is to import your dataset into your R session. Many file formats exist, and R offers many functions to read them. We briefly introduce them here.
Import a csv file
A common and practical file format to store ESM data is the csv (comma-separated values) file. To import a csv file, we can use the base R functions ‘read.csv()’ or ‘read.csv2()’. Those functions are built on the ‘read.table()’ function and differ only in the default values of two arguments: ‘sep’ (i.e., the character that separates the values) and ‘dec’ (i.e., the character that marks the decimal point):
- ‘read.csv()’ has as default arguments ‘sep=","’ and ‘dec="."’. It is essentially equivalent to ‘read.table(file, header=TRUE, sep=",", dec=".")’.
- ‘read.csv2()’ has as default arguments ‘sep=";"’ and ‘dec=","’. It is essentially equivalent to ‘read.table(file, header=TRUE, sep=";", dec=",")’.
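To make this equivalence concrete, here is a minimal sketch that writes a small semicolon-separated file to a temporary location and reads it both ways. The file contents and the column names (‘id’, ‘PA1’, ‘NA1’) are made up for illustration and are not taken from any real dataset.

```r
# Write a tiny file that uses ";" as separator and "," as decimal mark
# (hypothetical content, for illustration only)
tmp = tempfile(fileext = ".csv")
writeLines(c("id;PA1;NA1",
             "1;2,5;3,0",
             "2;4,25;1,5"), tmp)

# read.csv2() already assumes sep = ";" and dec = ","
a = read.csv2(tmp)

# The same settings spelled out with read.table()
b = read.table(tmp, header = TRUE, sep = ";", dec = ",")

# Both calls should give the same data frame, with PA1 read as numeric
all.equal(a, b)
str(a)
```

If the decimal mark were not declared correctly, a column such as ‘PA1’ would typically be imported as character rather than numeric, which is one reason to check the column types right after import.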
As an illustration, we want to import the file ‘data_sim.csv’ (can be downloaded above). This file contains values separated by “;” and decimals indicated by “,”. The function that fits these settings is ‘read.csv2()’.
data = read.csv2(file="data/data_sim.csv")
Note that directly after importing data, it is recommended to inspect it to see if it has been imported correctly (see first glimpse). Also, you should check the warning messages that might appear during the import process.
Additional arguments
Some functions from other packages offer additional arguments (e.g., the code for missing values, the column types) when importing csv files. A popular choice is the readr package, which contains the ‘read_delim()’ function and two of its special cases: ‘read_csv()’ and ‘read_csv2()’. They play the same role as ‘read.csv()’ and ‘read.csv2()’ but provide many useful arguments, such as:
- Missing value code using the ‘na’ argument. We can specify which codes denote missing values in the file (e.g., ‘-999’ or ‘na’).
- Column type with the ‘col_types’ argument and the ‘cols()’ function. We can specify ‘n’ for number, ‘i’ for integer, ‘d’ for double, ‘c’ for character, ‘f’ for factor, ‘D’ for date, ‘T’ for datetime, etc. For instance, if the variable PA1 is expected to be numeric: ‘col_types = cols(PA1="n")’. In the ‘cols()’ function, the ‘.default’ argument can be used to specify the default column type.
library(readr)
df = read_csv2(file="data/data_sim.csv",
               na=c("-999", "__na__"), # Specify missing value format
               col_types = cols(.default = "i", # Specify default column types, here integer
                                cond_dyad = "c", location="c"))
This function has further arguments that can be found in the function’s documentation: https://readr.tidyverse.org/reference/read_delim.html.
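The effect of the ‘na’ and ‘col_types’ arguments can be checked on a small example. The sketch below assumes the readr package is installed and uses a made-up temporary file with hypothetical columns (‘id’, ‘PA1’, ‘location’); it is not the content of ‘data_sim.csv’.

```r
library(readr)

# Hypothetical file: ";" as separator, "," as decimal mark, -999 as missing code
tmp = tempfile(fileext = ".csv")
writeLines(c("id;PA1;location",
             "1;2,5;home",
             "2;-999;work"), tmp)

df_small = read_csv2(file = tmp,
                     na = c("-999", "__na__"),          # Codes to treat as missing
                     col_types = cols(.default = "c",   # Default: character
                                      id = "i",         # Integer
                                      PA1 = "n"))       # Number

# PA1 should be numeric, with the -999 entry turned into NA
df_small
```

Declaring the column types explicitly, rather than letting them be guessed, makes the import fail loudly or report parsing problems when a column contains unexpected values.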
Import other types of files
In case we have to import other types of files, here are some useful functions:
Function | Type of file | Package | Description |
---|---|---|---|
read.delim | .txt ; .csv | utils (base R) | Tab-separated by default; set ‘sep’ if the separator is another character |
read.table | .txt ; .csv ; .dat | utils (base R) | General reader for delimited text; ‘header’, ‘sep’ and ‘dec’ are set through arguments |
read_sas | .sas7bdat | haven | SAS data |
read_stata | .dta | haven | Stata data |
read_spss | .sav | haven | SPSS file |
read_excel | .xls ; .xlsx | readxl | Excel files |
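As a brief illustration of the two base R functions in the table, the following sketch reads a text file whose values are separated by a vertical bar. The file and its columns are hypothetical and created on the fly; only base R is used.

```r
# Hypothetical text file using "|" as separator
tmp = tempfile(fileext = ".txt")
writeLines(c("id|PA1|location",
             "1|2.5|home",
             "2|NA|work"), tmp)

# read.delim() expects a tab by default, so the separator is given explicitly
d1 = read.delim(tmp, sep = "|")

# read.table() needs the header and the separator to be stated
d2 = read.table(tmp, header = TRUE, sep = "|")

# Both calls should return the same data frame
all.equal(d1, d2)
str(d1)
```

The functions from haven and readxl listed above are not shown here because they require the corresponding packages and files in those formats.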