In this lesson we learn about how to import data into R. This lesson was inspired on Datacamp’s manuals on loading data in R, which you can find here and here
Remember: In this lesson, all R code is below the R Source button, while the output is hidden. To see it you will need to click on the green “R Output” button. At the bottom of the page, you will find a bar which allows you to change the theme of the webpage (changing colors and format) so it can easily adapt to your system and preferences. There you also find “Code highlighting” which changes how R code is displayed to you, and Toggle R code and Figures.
Instead of creating your dataset - as we did in the last lesson - it is often necessary to import data into R, so you start working on your analysis. Before we move on and learn how to load data into R, it might be useful to go over the below checklist which can make it easier to import the data correctly, and potentially avoid loading problems.
In R, you want to make sure that the Enviroment and loaded packages from another session does not - by accident - interfere with the current analysis. So, it might be a good idea to start with a clean slate. One way of doing this is by typing in the below code in your console.
# This prompts garbage collector to take place. This is useful to prompt R
# to return memory to the operating system. gc() also return a summary of
# the occupy memory
gc()
# Restarts the R session for a clean slate
.rs.restartR()
# Unloads all previously loaded packages It may give an error in the lapply
# if there were no loaded packages
pkgs <- names(sessionInfo()$otherPkgs)
pkgs <- paste("package:", pkgs, sep = "")
lapply(pkgs, detach, character.only = TRUE, unload = TRUE)
# This removes all R objects from the environment
rm(list = ls(all = TRUE))
# And this just cleans the console for a fresh start
cat("\f")
Next, you might also find it handy to know where your working directory is set at the moment:
getwd()
And you might consider changing the path that you get as a result of this function, maybe to the folder in which you have stored your data set (note the bars are inverted):
setwd("C:/Users/myusername/folder/R Workshop/Datasets")
For the purposes of this lessons, some datasets are available for your to download, place in your working directory and load in R. Please follow this link.
For these file extensions you want to use the read.table() function
dat <- read.table("pbcseq.dat")
You also want to check if the number of columns is the same for all rows.
cf <- count.fields("pbcseq.dat")
cf
# unique(cf) find number of unique observations on your vector all(cf == 19)
# Alternatevely, check if all are 19
Below we learn how to define “Missing data” while loading data in, For example, in our file pbcseq.dat missing data was codded as “.”. In R, however, missing is coded as “NA”s, so we need to make sure to tell our read.table() function about that difference.
dat <- read.table("pbcseq.dat", na.strings = ".")
dat <- read.table("pbcseq.dat", header = TRUE, na.strings = ".")
dat <- read.csv("aids.csv")
read.table("http://flavioazevedo.github.io/blog/R-Introduction-Course/Datasets/sexualat.dat")
Quite frequently, the sample data is in Excel format, and needs to be imported into R prior to use. For this, we can use the function read.xls from the gdata package. It reads from an Excel spreadsheet and returns a data frame. The following shows how to load an Excel spreadsheet named “mydata.xls”. This method requires Perl runtime to be present in the system, which you can download from this link.
library(gdata) # load gdata package
dat <- read.xls("mydata.xls") # read from first sheet
# help(read.xls) # documentation
Alternatively, we can use the function loadWorkbook from the XLConnect package to read the entire workbook, and then load the worksheets with readWorksheet. The XLConnect package requires Java to be pre-installed, which you can download from this link.
library(XLConnect)
wk <- loadWorkbook("mydata.xls")
df <- readWorksheet(wk, sheet = "Sheet1")
For the data files in SPSS format, it can be opened with the function read.spss from the foreign package. There is a “to.data.frame” option for choosing whether a data frame is to be returned. By default, it returns a list of components instead.
library(foreign) # load the foreign package
dat <- read.spss("myfile", to.data.frame = TRUE)
# help(read.spss) # documentation
If you would like to know more about loading data into R, see this link.
The official CRAN manual on importing/exporting data can be found here
There are numerous methods for exporting R objects into other formats . For SPSS, SAS and Stata. you will need to load the foreign packages. For Excel, you will need the xlsReadWrite package.
write.table(mydata, "c:/mydata.txt", sep = "\t")
library(xlsx)
write.xlsx(mydata, "c:/mydata.xlsx")
library(foreign)
write.foreign(mydata, "c:/mydata.txt", "c:/mydata.sps", package = "SPSS")
library(foreign)
write.foreign(mydata, "c:/mydata.txt", "c:/mydata.sas", package = "SAS")
library(foreign)
write.dta(mydata, "c:/mydata.dta")
If your data file is one that you have saved in R as an .rdata file, you can read it in as follows:
load(".RDA")