Importing Data in R
Importing data is the process of loading and reading data into R from various sources.
Five types of data sources to import from:
Flat files - text-based files where data is organized in rows and columns, such as CSV or TSV files.
Data from Excel
Databases
Web
Statistical software
read.csv()
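Designed for reading comma-separated values (CSV) files: the field separator defaults to a comma and the decimal mark to a period.
data <- read.csv("data.csv", stringsAsFactors = TRUE)
The argument is spelled stringsAsFactors. When it is TRUE, character columns are converted to factors. TRUE was the default before R 4.0.0; since R 4.0.0 the default is FALSE, so character columns stay character unless you ask for factors.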
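The effect of stringsAsFactors can be checked without a file on disk. The sketch below is a minimal illustration that uses the text argument of read.csv(); the inline data and the variable names are made up for the example.

```r
# Inline CSV content stands in for a real file (hypothetical data).
csv_text <- "name,age\nAlice,28\nBob,32"

# Without the argument, the result depends on the R version's default.
people_default <- read.csv(text = csv_text)

# Explicitly convert character columns to factors.
people_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(people_default$name)
class(people_factor$name)
```

Setting the argument explicitly keeps a script's behaviour the same across R versions.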
read.csv2()
Reads European-style CSV files, where a semicolon is the field separator and a comma is the decimal mark.
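A short sketch of the semicolon and decimal-comma convention, again using inline text in place of a file. The data is invented for illustration.

```r
# Semicolon-separated fields, comma as the decimal mark (hypothetical data).
csv2_text <- "name;score\nAlice;3,14\nBob;2,50"

scores <- read.csv2(text = csv2_text)

# The score column is numeric: the decimal commas were converted.
str(scores)
```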
read_csv()
Part of the readr package. Like read.csv(), it reads comma-delimited files, but it is typically faster and returns a tibble. For other delimiters, readr provides read_delim().
Every unquoted comma is treated as a field separator, so numeric values must not contain unquoted commas.
Example file, where the Salary values contain unquoted commas as thousands separators:
Name,Age,Salary
Alice,28,55,000
Bob,32,60,000
Charlie,22,48,000
David,35,72,000
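If we use read_csv(), every comma acts as a separator, so each data row has four fields for three column names. readr reports parsing problems, and depending on the readr version the Salary column may lose everything after the comma: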
# A tibble: 4 x 3
Name Age Salary
<chr> <dbl> <dbl>
1 Alice 28 55
2 Bob 32 60
3 Charlie 22 48
4 David 35 72
read_csv2()
Provided in the readr package. Expects a semicolon as the field separator and a comma as the decimal mark. For example, 3,14 is read as 3.14.
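read_csv2() does not repair the example above: that file separates fields with commas, not semicolons, and read_csv2() would treat 55,000 as the decimal 55.000. To get the intended amounts, quote the salaries in the file (for example "55,000") and read it with read_csv(), which should then parse each quoted value as a number with a grouping mark:
# A tibble: 4 x 3
Name Age Salary
<chr> <dbl> <dbl>
1 Alice 28 55000
2 Bob 32 60000
3 Charlie 22 48000
4 David 35 72000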
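One way to read salaries written with thousands separators is to quote them in the file and parse the column with col_number(). The sketch below assumes the readr package is installed; the temporary file and the quoted values are a hypothetical variant of the example file, not the original data.

```r
library(readr)

# Hypothetical copy of the salary file with each salary quoted.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Name,Age,Salary",
  "Alice,28,\"55,000\"",
  "Bob,32,\"60,000\"",
  "Charlie,22,\"48,000\"",
  "David,35,\"72,000\""
), path)

# Declare the column types explicitly instead of relying on guessing.
salaries <- read_csv(
  path,
  col_types = cols(
    Name = col_character(),
    Age = col_double(),
    Salary = col_number()  # strips the grouping comma
  )
)

salaries$Salary
```

Declaring col_number() makes the intent explicit and does not depend on how a given readr version guesses column types.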
read.delim()
Defaults to the tab character as the separator between values and the period as the decimal mark. Typically used for tab-delimited .txt or .tab files.
read.delim(file, header = TRUE, sep = "\t", quote = "\"", ...)
It can also be called simply as
# Read the tab-delimited file into a data frame
data <- read.delim("data.txt", header = TRUE)
# View the contents of the data frame
print(data)
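The same call can be tried on inline text, which makes the tab separator visible. This is a minimal sketch with made-up data; the text argument is passed through to read.table().

```r
# Tab-separated content with a header row (hypothetical data).
tsv_text <- "id\tcity\ttemp\n1\tOslo\t4.5\n2\tCairo\t31.2"

weather <- read.delim(text = tsv_text)

# header = TRUE is the default, so the first line becomes the column names.
str(weather)
```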
read.table()
A more general function that reads tabular text files with any single-character delimiter. By default it splits on whitespace (sep = "") and assumes no header row (header = FALSE), so set the delimiter and header explicitly for other layouts. Fixed-width files need read.fwf() instead.
mydata <- read.table("c:/mydata.csv", header=TRUE,
sep=",", row.names="id")
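A sketch of read.table() with a custom delimiter and a column used as row names. The pipe-delimited data is invented for the example.

```r
# Pipe-delimited content (hypothetical data).
pipe_text <- "id|name|score\na1|Alice|9.5\nb2|Bob|7.0"

results <- read.table(
  text = pipe_text,
  header = TRUE,      # first line holds the column names
  sep = "|",          # custom field separator
  row.names = "id"    # use the id column as row names
)

rownames(results)
results
```

Because id becomes the row names, it no longer appears as a regular column in the result.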
read_tsv()
Found in the readr package. Reads tab-delimited text files into a tibble.
read_tsv(file, col_names = TRUE, col_types = NULL, skip = 0, n_max = Inf)
read_delim()
A more general readr function that reads a file with any delimiter.
read_delim(file, delim = "/")
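A small sketch of read_delim() with a slash as the delimiter, assuming the readr package is installed. The temporary file and its contents are hypothetical.

```r
library(readr)

# Write a small slash-delimited file to a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c("id/name/score", "1/Alice/9.5", "2/Bob/7"), path)

# col_types = cols() lets readr guess the types without printing a message.
slashed <- read_delim(path, delim = "/", col_types = cols())

slashed
```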
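read_xlsx()
Reads .xlsx files. Part of the readxl package; read_excel() chooses between read_xls() and read_xlsx() based on the file.
excel_sheets()
Lists the names of the sheets in an Excel workbook, which is useful before importing data with read_excel().
# Alternative: read.xlsx() from the xlsx package (requires Java); 1 is the sheet index
library(xlsx)
mydata <- read.xlsx("c:/myexcel.xlsx", 1)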
read_excel()
Imports data from .xls and .xlsx files. Part of the readxl package. By default (sheet = NULL) the first sheet is read.
read_excel(path, sheet = NULL, range = NULL, col_names = TRUE, col_types = NULL, ...)
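The two readxl functions are often used together: list the sheets first, then read one by name. The sketch below assumes readxl is installed and uses datasets.xlsx, an example workbook that ships with the package.

```r
library(readxl)

# Path to an example workbook bundled with readxl.
path <- readxl_example("datasets.xlsx")

# Inspect the sheet names before importing.
sheets <- excel_sheets(path)
sheets

# Read one sheet by name; a sheet position such as 2 would also work.
mtcars_sheet <- read_excel(path, sheet = "mtcars")
head(mtcars_sheet)
```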
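From SPSS
Save the data set in portable (.por) format from within SPSS, then read it with spss.get() from the Hmisc package. The first two lines below are SPSS syntax; the rest is R.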
get file='c:\mydata.sav'.
export outfile='c:\mydata.por'.
# in R
library(Hmisc)
mydata <- spss.get("c:/mydata.por", use.value.labels=TRUE)
# last option converts value labels to R factors
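fread()
Part of the data.table package. Similar to read.table(), but it detects the separator, header, and column types automatically and is extremely fast on large files.
fread(input, sep = "auto", header = "auto", data.table = TRUE, ...)
Set data.table = FALSE to get a plain data frame instead of a data.table.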
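A sketch of fread() detecting a separator on its own, assuming the data.table package is installed. The lines passed through the text argument are made-up data.

```r
library(data.table)

# Semicolon-separated lines (hypothetical data); no sep is specified.
input_lines <- c("id;amount", "1;10.5", "2;20.25")

# Default result is a data.table.
amounts_dt <- fread(text = input_lines)

# data.table = FALSE returns a plain data frame instead.
amounts_df <- fread(text = input_lines, data.table = FALSE)

class(amounts_dt)
class(amounts_df)
```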
Common import functions and their saving counterparts (from readr unless a package is noted):
Data Format | Import Function | Saving Function |
CSV | read_csv() | write_csv() |
TSV | read_tsv() | write_tsv() |
Delimited Text | read_delim() | write_delim() |
Excel | read_excel() (readxl) | write_xlsx() (writexl) |
Table | read_table() | write.table() (base R) |
SAS | read_sas() (haven) | Not Applicable |
SPSS | read_sav() (haven) | write_sav() (haven) |
Feather | read_feather() (arrow) | write_feather() (arrow) |
Parquet | read_parquet() (arrow) | write_parquet() (arrow) |
Arrow IPC stream | read_ipc_stream() (arrow) | write_ipc_stream() (arrow) |
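Import and saving functions are usually used as a pair. The sketch below shows a CSV round trip with readr, assuming the package is installed; the data frame and the temporary file are hypothetical.

```r
library(readr)

# A small data frame to save (hypothetical data).
orders <- data.frame(id = c(1, 2), item = c("pen", "ink"))

# Save to a temporary CSV file, then read it back.
path <- tempfile(fileext = ".csv")
write_csv(orders, path)
orders_back <- read_csv(path, col_types = cols())

orders_back
```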
Import from Web
# Example: Reading data from a web URL
url <- "https://example.com/data.csv"
data <- read.table(url, header = TRUE, sep = ",")
Import API Data
library(httr)
library(jsonlite)
# Make an API request and parse JSON response
response <- GET("https://api.example.com/data")
data <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
The GET() function from httr is used to make HTTP GET requests, and fromJSON() from jsonlite is used to parse JSON responses.
Import JSON Data
library(jsonlite)
# Read JSON data from a file
data <- fromJSON("data.json")
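fromJSON() also accepts a JSON string directly, which makes it easy to see what it returns. A minimal sketch, assuming jsonlite is installed; the JSON content is invented for the example.

```r
library(jsonlite)

# A JSON array of objects (hypothetical data).
json_text <- '[{"name": "Alice", "age": 28}, {"name": "Bob", "age": 32}]'

# An array of records with the same keys is simplified to a data frame.
people <- fromJSON(json_text)

str(people)
```

Nested or irregular JSON may come back as a list instead, so it is worth inspecting the result with str() before using it.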
Further Reading
https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-import.pdf