This is the first installment of “Introduction to R & Statistics”. We will learn the first steps of programming in R and run some essential statistical analyses.
R is a statistical programming language designed by Ross Ihaka and Robert Gentleman as an implementation of the S programming language.
Originally, the goal of the two researchers at the University of Auckland, New Zealand, was to develop a language capable of data analysis, statistics and graphical models in a user-friendly way. The project was conceived in 1992, with an initial version released in 1995 and a stable version (1.0.0) in 2000. Nowadays, R is the lingua franca of statistics, and it is developed by the R Core Team, of which John Chambers, the creator of S, is a member.
Curiously, R is named partly after the first names of its first two authors and partly as a play on the name of S.
There are many reasons why R is the language of Data Science and Statistics.
If you would like a more informative description of why you should learn R, there is one blog post that explains it at length. There is also this one.
While R has a command-line interface, several graphical front ends are available. In this course we will explore RStudio, which has many (many!) features that are useful when learning R. Here’s what the partnership between R & RStudio can do.
If you are interested in how R (and RStudio) compares to other software, here is a good source. The information contained in the link is summarized in the table below.
If you are wondering whether to learn R or Python, many statisticians and data scientists suggest starting with R and picking up Python as you go along. Python can be particularly useful for deep learning, scripting, and very large data sets (millions of cases or more). One of the best freely available resources discussing this question is on datacamp.com.
This is the website where you can download R and many of the add-on packages available for it (CRAN, the Comprehensive R Archive Network).
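As a minimal sketch of how packages from that site are typically installed and loaded from within R: the helper name `ensure_package` is hypothetical (it is not part of R), and the example uses the `stats` package, which ships with R, so that nothing is actually downloaded.

```r
# Hypothetical helper: install a package only if it is not already available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Downloads from the configured repository (usually a CRAN mirror)
    install.packages(pkg)
  }
  # Report whether the package can now be loaded
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call does not trigger a download
available <- ensure_package("stats")

# Attach the package so its functions can be called directly
library(stats)
```

Checking with `requireNamespace()` first avoids reinstalling a package every time a script is run.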
If you already have R installed, you will want to update it to the latest version. On Windows, you can do so by running the code below (the installr package is designed for Windows). It checks for newer versions and, if one is available, guides you through the decisions you need to make.
install.packages("installr") # Install R package that facilitates the process
library(installr) # load the package in R
updateR() # update R
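Before updating, it can help to see which version of R is currently running. The following is a minimal sketch using base R only. The threshold "4.0.0" is an arbitrary value chosen for illustration, not a requirement of this course.

```r
# Full version string of the running R installation
print(R.version.string)

# The same version as an object that can be compared
current <- getRversion()

# Arbitrary illustrative threshold (an assumption, not a course requirement)
threshold <- "4.0.0"
is_recent <- current >= threshold

if (is_recent) {
  message("R ", as.character(current), " is at least version ", threshold)
} else {
  message("R ", as.character(current), " is older than ", threshold,
          "; consider updating")
}
```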
RStudio is a great interface that makes R a lot more accessible. It includes a console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging and workspace management.
If you have RStudio installed, you will also want its latest version. Go to Help > Check for Updates in the menu.
Here’s a video depicting the installation of R and RStudio (link).
If you would like to learn R with video lessons, on this page you will find a collection of online R video courses on YouTube.
Before we start the workshop, let's go through a number of settings that are worth knowing about.
The advantages these settings will bring us:
Click on the Tools menu and choose Global Options (the last entry).
These options ensure that the content of one R session is never saved and then reloaded in a later session.
In the Pane Layout section of the settings you can rearrange certain user interface elements among the four available panes. Play around a bit with the locations and find a setup that works for you, but here is the layout I find most intuitive for beginners.
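With the workspace no longer saved or restored, anything worth keeping has to be written to disk explicitly. One common way to do that is sketched below with base R's `saveRDS()` and `readRDS()`. The data frame and the file name are made up for illustration, and the file is written to a temporary directory.

```r
# Made-up example data standing in for the result of an analysis
results <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

# Write the object to a file (temporary directory, illustrative file name)
path <- file.path(tempdir(), "results.rds")
saveRDS(results, path)

# In a later session, read it back under any name you like
restored <- readRDS(path)

# Check that the restored object matches the original
same <- isTRUE(all.equal(results, restored))
print(same)
```

Saving objects explicitly keeps each script reproducible, because a fresh session depends only on files you chose to create.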