This is the first installment of “Introduction to R & Statistics”. We will learn the first steps in programming in R and run some essential statistical analyses.

1. A bit of R history

R is a statistical programming language designed by Ross Ihaka and Robert Gentleman as an implementation of the S programming language.

Originally, the goal of the two researchers at the University of Auckland, New Zealand, was to develop a language capable of data analysis, statistics and graphical models in a user-friendly way. The project was conceived in 1992, with its first version released in 1995 and a stable beta version in 2000. Nowadays, R is the lingua franca of statistics, and it is currently developed by the R Development Core Team, of which John Chambers, the creator of S, is a member.

Curiously, R is named partly after the first names of its first two authors and partly as a play on the name of S.

2. Advantages of R

There are many reasons why R is the language of Data Science and Statistics.

  1. It is free and open-source.
  2. It runs on UNIX, Windows and Macintosh.
  3. It is written especially for vector operations, so many tasks need no for loops.
  4. It has one of the biggest online communities, where you can ask questions, get help, etc.
  5. It offers 7000+ packages, which expand its capabilities. Because anyone can contribute packages, R's possibilities are nearly endless.
  6. It is a programming language based on S, and its vectorized operations are fast, which is one reason it is considered the language of data science.
  7. There are several user interfaces you can use (e.g., RStudio, Jupyter).
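
To make the point about vector operations concrete, here is a minimal sketch comparing a for loop with the vectorized equivalent. The variable names and the height values are made up for illustration.

```r
# Heights in centimetres for five hypothetical people (made-up values)
heights_cm <- c(150, 162, 175, 181, 168)

# Loop version: convert each element one at a time
heights_m_loop <- numeric(length(heights_cm))
for (i in seq_along(heights_cm)) {
  heights_m_loop[i] <- heights_cm[i] / 100
}

# Vectorized version: the division is applied to every element at once
heights_m <- heights_cm / 100

# Check that both approaches give the same result
identical(heights_m_loop, heights_m)

# Many built-in functions, such as mean(), also work on whole vectors
mean(heights_m)
```

The vectorized line does the same work as the loop in a single expression, which is usually both shorter and easier to read.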

If you would like a more informative description of why you should learn R, there is one blog post that explains it at length. There is also this one.

3. Advantages of R with RStudio

While R has a command line interface, there are several graphical front-ends available. In this course we will explore RStudio, which has many (many!) features that will be useful in learning R. Here’s what the partnership between R & RStudio can do.

  1. High-end Graphics
  2. User-friendly interface
  3. Reports/Slides
  4. Shiny apps
  5. Abundant on-line Resources
  6. Abundant on-line Webinars

4. R with RStudio vs. other statistical software

If you are interested in knowing how R (and RStudio) compares to other software, here is a good source. The information contained in the link is summarized in the table below.

4.1 R with RStudio vs. Python

If you are wondering whether to learn R or Python, many statisticians and data scientists agree that you should probably start with R. As you go along, Python can be really useful, for example for deep learning, scripting, and big data sets (millions of cases or more). One of the best freely available resources discussing this issue is on datacamp.com.

5. Getting Started with R & RStudio

5.1 Downloading and Installing R

This is the website where you can download R, and many of the library packages that are available.

Link

5.2 Updating R

If you already have R installed, you will want to update it to the latest version. You can do so by running the code below (note that the installr package works on Windows only). It checks for newer versions and, if one is available, guides you through the decisions you need to make.

install.packages("installr")  # Install R package that facilitates the process
library(installr)  # load the package in R
updateR()  # update R
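
Before updating, it can help to know which version of R you are currently running. The sketch below uses only base R; the "4.0.0" threshold is an arbitrary value chosen for illustration, not a recommendation from this course.

```r
# Print the version string of the R installation currently running
R.version.string

# getRversion() returns a version object that can be compared directly
current <- getRversion()

# "4.0.0" is an arbitrary threshold used only for this example
if (current < "4.0.0") {
  message("R ", as.character(current), " is older than 4.0.0; consider updating.")
} else {
  message("R ", as.character(current), " is 4.0.0 or newer.")
}
```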

5.3 Downloading RStudio

RStudio is a great interface that makes R a lot more accessible. RStudio includes a console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging and workspace management.

Link

5.4 Updating RStudio

If you have RStudio installed, you will also want its latest version. Go to Help > Check for Updates in the menu.

6. Need more help?

Here’s a video depicting the installation of R and RStudio (link).

If you would like to learn R with video lessons, on this page you will find a collection of R online video courses on YouTube.

7. RStudio Settings: personal recommendations

Before we start the workshop, let's go through a number of settings that are worth knowing about.

These settings give us:

  • Code completion
  • Inline documentation
  • Live preview of R markdown documents

Click on the Tools menu and find Global Options (the last option).

  • Deselect Restore .RData into workspace at startup.
  • Set Save workspace to .RData on exit to Never.

These options ensure that any content of previous R sessions is never stored or reloaded between R sessions.
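
With the workspace no longer saved automatically, anything you want to keep between sessions has to be saved on purpose. One way to do that, shown here as a minimal sketch with made-up data and a temporary file path, is base R's saveRDS() and readRDS().

```r
# A small example data frame (made-up values)
scores <- data.frame(id = 1:3, score = c(7.5, 8.0, 6.5))

# Save the object explicitly to a file; tempfile() is used here so that
# nothing is written into your project folder
path <- tempfile(fileext = ".rds")
saveRDS(scores, path)

# Remove the object, as if a fresh session had started without it
rm(scores)

# Read the object back in only when you need it
scores_restored <- readRDS(path)
scores_restored
```

In a real project you would replace the temporary path with a file name of your own choosing.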

  • Set the Editor Theme to something easier on the eyes. I use Chaos, but Cobalt, Idle Fingers, Merbivore Soft, Monokai, Solarized Light and Dark, and Tomorrow Night are also good options.
  • Set the zoom according to your screen settings. You can also use Font Size to achieve the same result.

In the Pane Layout section of the settings you can switch the locations of certain user interface elements among the 4 available panels. Play around a bit with the locations and find a setting that works for you, but here’s what I think is most intuitive for beginners.

  • You want to make only one change: click on the bottom-left “Console” and choose the “Environment, History, Build…” option. This should swap these two panels, yielding a layout in which everything that is “input” is on the left and everything that is output is on the right.