Importing Data in R

Importing data is the process of loading data into R from various sources.

Five common sources of data to import:

  • Flat files - Flat files are text-based files where data is organized in rows and columns.

  • Data from Excel

  • Databases

  • Web

  • Statistical software

read.csv()

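Designed for reading comma-separated values (CSV) files: the field separator defaults to a comma and the decimal mark to a period.

data <- read.csv("data.csv", stringsAsFactors = TRUE)

stringsAsFactors = TRUE converts character columns to factors. It was the default before R 4.0.0; since R 4.0.0 the default is FALSE, so character columns stay character vectors unless factors are requested.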
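A minimal sketch of how this argument changes the imported columns. The file contents and the names csv_path, as_text and as_factor are made up for illustration:

```r
# Write a small CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Alice,28", "Bob,32"), csv_path)

# Read the same file twice, with and without factor conversion.
as_text   <- read.csv(csv_path, stringsAsFactors = FALSE)
as_factor <- read.csv(csv_path, stringsAsFactors = TRUE)

class(as_text$name)    # "character"
class(as_factor$name)  # "factor"
```

Setting the argument explicitly keeps a script behaving the same way across R versions.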

read.csv2()

Reads European-style CSV files, where a semicolon is the field separator and a comma is the decimal mark.
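A short sketch of the semicolon and decimal-comma convention. The file contents and the names csv2_path and prices are hypothetical:

```r
# A semicolon-delimited file that uses a comma as the decimal mark.
csv2_path <- tempfile(fileext = ".csv")
writeLines(c("item;price", "bread;3,14", "milk;1,50"), csv2_path)

# read.csv2() uses sep = ";" and dec = "," by default.
prices <- read.csv2(csv2_path)

is.numeric(prices$price)  # TRUE: "3,14" was parsed as the number 3.14
```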

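read_csv()

Part of the readr package. Similar to read.csv(), but it is faster and returns a tibble rather than a plain data frame. For other delimiters, readr provides read_tsv() and read_delim().

Every comma is treated as a field separator, so a comma used as a thousands separator inside an unquoted number splits that number into two fields.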

Example file, with an unquoted comma as the thousands separator in Salary:

Name,Age,Salary
Alice,28,55,000
Bob,32,60,000
Charlie,22,48,000
David,35,72,000

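Each data row now has four fields for three column names, so read_csv() reports parsing problems and the salary is cut off at the comma. The exact result depends on the readr version; a typical output is: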

# A tibble: 4 x 3
  Name     Age Salary
  <chr>  <dbl>  <dbl>
1 Alice      28     55
2 Bob        32     60
3 Charlie    22     48
4 David      35     72

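read_csv2()

Provided in the readr package. Reads semicolon-delimited files in which the comma is the decimal mark and the period is the grouping mark. For example, 3,14 is read as 3.14.

read_csv2() does not repair the example above: that file contains no semicolons, so each line would be read as a single field. If the same data is stored European-style, with semicolons between fields and a period as the grouping mark (for example Alice;28;55.000), read_csv2() reads the salaries in full:

# A tibble: 4 x 3
  Name      Age Salary
  <chr>   <dbl>  <dbl>
1 Alice      28  55000
2 Bob        32  60000
3 Charlie    22  48000
4 David      35  72000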
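Another way to read salaries written with a thousands separator is to quote them in the file and parse the column as a number. A minimal sketch, assuming the readr package is installed; the file contents and the names path and salaries are made up for illustration:

```r
library(readr)

# Quoting "55,000" keeps the thousands separator inside a single field.
path <- tempfile(fileext = ".csv")
writeLines(c('Name,Age,Salary',
             'Alice,28,"55,000"',
             'Bob,32,"60,000"'), path)

# col_number() drops the grouping mark (a comma in the default locale).
salaries <- read_csv(
  path,
  col_types = cols(
    Name   = col_character(),
    Age    = col_integer(),
    Salary = col_number()
  )
)

salaries$Salary  # 55000 60000
```

Declaring col_types explicitly avoids relying on type guessing, which is a design choice here rather than a requirement.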

read.delim()

Defaults to the tab character as the field separator and the period as the decimal mark. Typically used for tab-delimited .txt or .tab files.

read.delim(file, header = TRUE, sep = "\t", quote = "\"", dec = ".", ...)

In the simplest case:

# Read the tab-delimited file into a data frame
data <- read.delim("data.txt", header = TRUE)

# View the contents of the data frame
print(data)

read.table()

A more general function that reads tabular text files with any delimiter. By default it expects whitespace-separated fields and no header row, so the delimiter, header, and other parameters usually have to be set explicitly. Fixed-width files are handled by the separate read.fwf() function.

mydata <- read.table("c:/mydata.csv", header=TRUE,
  sep=",", row.names="id")
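The same arguments work with any delimiter. A small sketch using the text argument of read.table() so that no file is needed; the pipe-delimited data and the names raw_text and scores are hypothetical:

```r
# Pipe-delimited data supplied inline through the text argument.
raw_text <- "id|name|score
a1|Alice|90.5
b2|Bob|78.0"

scores <- read.table(
  text = raw_text,
  header = TRUE,            # first line holds the column names
  sep = "|",                # custom delimiter
  row.names = "id",         # use the id column as row names
  stringsAsFactors = FALSE
)

rownames(scores)       # "a1" "b2"
scores["a1", "score"]  # 90.5
```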

read_tsv()

Part of the readr package.

Reads tab-delimited text files into a tibble.

read_delim() is a more general function that reads a file with any delimiter.

read_tsv(file, col_names = TRUE, col_types = NULL, skip = 0, n_max = Inf)
read_delim(file, delim = "/")
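A brief sketch comparing the two functions on the same data, assuming the readr package is installed. The file contents and all variable names are made up for illustration:

```r
library(readr)

# The same two-column data, once tab-delimited and once pipe-delimited.
tsv_path  <- tempfile(fileext = ".tsv")
pipe_path <- tempfile(fileext = ".txt")
writeLines(c("city\tpopulation", "Oslo\t700000", "Bergen\t285000"), tsv_path)
writeLines(c("city|population", "Oslo|700000", "Bergen|285000"), pipe_path)

# "cd" is readr's compact column specification: character, then double.
cities_tsv  <- read_tsv(tsv_path, col_types = "cd")
cities_pipe <- read_delim(pipe_path, delim = "|", col_types = "cd")

identical(cities_tsv$population, cities_pipe$population)  # TRUE
```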

read_xlsx()

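Reads .xlsx Excel files. Part of the readxl package, along with read_excel(), which handles both .xls and .xlsx files.

excel_sheets() lists the names of the sheets in an Excel workbook, which is useful before importing data with read_excel().

The xlsx package, a separate package that depends on Java, offers read.xlsx() instead:

library(xlsx)
mydata <- read.xlsx("c:/myexcel.xlsx", 1)

read_excel() imports the data. Part of the readxl package. When sheet is NULL, the first sheet is read.

read_excel(path, sheet = NULL, range = NULL, col_names = TRUE, col_types = NULL, ...)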
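A minimal sketch of listing sheets before importing one, assuming the readxl package is installed. It uses the datasets.xlsx example workbook that ships with readxl, so no file of your own is needed; the variable names are hypothetical:

```r
library(readxl)

# Path to an example workbook bundled with the readxl package.
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheet names first, then import one sheet by name.
sheet_names  <- excel_sheets(xlsx_path)
mtcars_sheet <- read_excel(xlsx_path, sheet = "mtcars")

# Import only a rectangle of cells from the first sheet:
# one header row plus three data rows, three columns wide.
first_cells <- read_excel(xlsx_path, sheet = 1, range = "A1:C4")
dim(first_cells)  # 3 3
```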

From SPSS

First save the SPSS dataset in portable (.por) format from within SPSS:

get file='c:\mydata.sav'.
export outfile='c:\mydata.por'.

# in R
library(Hmisc)
mydata <- spss.get("c:/mydata.por", use.value.labels=TRUE)
# last option converts value labels to R factors
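An alternative that reads .sav files directly is read_sav() from the haven package. A minimal round-trip sketch, assuming haven is installed; the data frame and the names sav_path and survey are made up so the example does not depend on an existing SPSS file:

```r
library(haven)

# Create a small .sav file so the example is self-contained.
sav_path <- tempfile(fileext = ".sav")
write_sav(data.frame(id = 1:3, score = c(4.5, 3.0, 5.0)), sav_path)

# Read the SPSS file back into R as a tibble.
survey <- read_sav(sav_path)
nrow(survey)  # 3
```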

fread()

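Part of the data.table package.

Similar to read.table(), but it detects the separator, header, and column types automatically and is extremely fast on large files. It returns a data.table by default; set data.table = FALSE to get a plain data frame.

fread(input, sep = "auto", header = "auto", data.table = TRUE, ...)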
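A small sketch of the automatic detection, assuming the data.table package is installed. The inline data and the names lines_in, dt and df are hypothetical:

```r
library(data.table)

# Semicolon-delimited lines passed inline; no separator is specified.
lines_in <- c("id;value", "1;2.5", "2;3.5")

dt <- fread(text = lines_in)                      # returns a data.table
df <- fread(text = lines_in, data.table = FALSE)  # returns a plain data frame

names(dt)          # "id" "value"
is.data.table(df)  # FALSE
```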
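
Summary of import functions and their saving counterparts:

Data Format        Import Function      Saving Function                       Package
CSV                read_csv()           write_csv()                           readr
TSV                read_tsv()           write_tsv()                           readr
Delimited Text     read_delim()         write_delim()                         readr
Excel              read_excel()         write_xlsx() (writexl package)        readxl
Table              read_table()         none in readr; base write.table()     readr
SAS                read_sas()           Not Applicable                        haven
SPSS               read_sav()           write_sav()                           haven
Feather            read_feather()       write_feather()                       arrow
Parquet            read_parquet()       write_parquet()                       arrow
Arrow IPC stream   read_ipc_stream()    write_ipc_stream()                    arrow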

Import from Web

# Example: Reading data from a web URL
url <- "https://example.com/data.csv"
data <- read.table(url, header = TRUE, sep = ",")

Import API Data

library(httr)
library(jsonlite)

# Make an API request, stop on an HTTP error, then parse the JSON body
response <- GET("https://api.example.com/data")
stop_for_status(response)
data <- fromJSON(content(response, as = "text", encoding = "UTF-8"))

GET() from httr sends an HTTP GET request, and fromJSON() from jsonlite parses the JSON response into R objects.

Import JSON Data

library(jsonlite)

# Read JSON data from a file
data <- fromJSON("data.json")
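fromJSON() also accepts a JSON string directly, which makes it easy to see how an array of records becomes a data frame. A minimal sketch, assuming the jsonlite package is installed; the JSON content and the variable names are made up for illustration:

```r
library(jsonlite)

# An array of JSON objects with the same keys.
json_text <- '[{"name": "Alice", "age": 28}, {"name": "Bob", "age": 32}]'

# Parsing the string gives a data frame with one row per object.
people <- fromJSON(json_text)

# Writing the same text to a file and reading it back gives the same result.
json_path <- tempfile(fileext = ".json")
writeLines(json_text, json_path)
from_file <- fromJSON(json_path)

people$age  # 28 32
```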

Further Reading

https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-import.pdf