In this lesson we learn a few technicalities of R that we will need in order to work with data.frames later on. We will also see several functions that are useful when programming in R.

11. Intro to the technicals

  • R is CaSe-SeNsItIvE, e.g., "Gender" is different from "gender"
  • Missing values are coded as NA (i.e., not available)
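
Both points can be seen in a minimal sketch. The vector `ages` is hypothetical example data, not part of the lesson's datasets:

```r
# Hypothetical example data (not from the lesson): one value is missing
ages <- c(23, NA, 31)

# Object names are case-sensitive: 'ages' exists, 'Ages' was never created
exists("ages")            # TRUE
exists("Ages")            # FALSE

# is.na() flags the missing values
is.na(ages)               # FALSE  TRUE FALSE

# Many summaries return NA unless missing values are removed explicitly
mean(ages)                # NA
mean(ages, na.rm = TRUE)  # 27
```

The `na.rm = TRUE` argument is available in many summary functions, such as sum(), min() and max().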

Variable types

  • character - Strings
  • integer - Integers
  • numeric - Real numbers (whole or decimal), stored as double-precision floating point
  • factor - Categorical variable where each level is a category
  • logical - Boolean
  • complex - Complex numbers
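
As a quick illustration, class() can be used to check the type of a value. The values below are arbitrary examples chosen for this sketch:

```r
# One small value of each type listed above
class("hello")              # "character"
class(5L)                   # "integer"  (the L suffix requests an integer)
class(5)                    # "numeric"  (plain numbers are stored as doubles)
class(factor(c("a", "b")))  # "factor"
class(TRUE)                 # "logical"
class(2 + 3i)               # "complex"
```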

Data types

  • vector - A collection of elements of the same type
  • matrix - A two-dimensional collection in which all elements are of the same type.
  • data.frame - The columns can contain different classes.
  • list - Can hold objects of different classes and lengths
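
The four structures can be sketched as follows. All object names (`v`, `m`, `df`, `l`) are made up for this illustration:

```r
v  <- c(1.5, 2, 3)                         # vector: all elements share one type
m  <- matrix(1:6, nrow = 2)                # matrix: 2 rows x 3 columns of integers
df <- data.frame(id = 1:3,                 # data.frame: columns may differ in type
                 name = c("a", "b", "c"),  # but must have the same length
                 stringsAsFactors = FALSE)
l  <- list(v, m, "text")                   # list: elements of any type and length

# Mixing types in a vector coerces every element to a single type
c(1, "a", TRUE)    # "1" "a" "TRUE"

dim(m)             # 2 3
sapply(df, class)  # id: "integer", name: "character"
length(l)          # 3
```

`stringsAsFactors = FALSE` is set explicitly so that the character column behaves the same way in older and newer versions of R.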

Classification of R Objects

It is important to learn these distinctions because, when programming in R, you need to know the characteristics of the object you want to manipulate.

mode

There are five main types (technically "modes") of R objects:

  • integer & numeric: used for quantitative data. Our vector height is numeric.
  • character: used for qualitative data, e.g., a string like "Hey There Mate" or a website address.
  • logical: TRUE or FALSE
  • list: special R object that can hold elements of different classes and lengths
  • function: sets of instructions contained in a sub program: e.g., mean()

mode(height)
[1] "numeric"

What is the mode of the R function mean()?

mode(mean)
[1] "function"

In depth knowledge: 'mode' is a mutually exclusive classification of objects according to their basic structure. An object has one and only one mode. The 'atomic' modes are numeric, complex, character and logical. Recursive objects have modes such as 'list' or 'function' or a few others. We say "main" above because there are other modes as well (such as raw and complex), but we are not going to use them in this course. For more info check ?mode.
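
As a small sketch of how mode relates to the finer-grained storage.mode() and typeof(), compare an integer with a plain number. The variable names `x` and `y` are only for illustration:

```r
x <- 1L  # integer
y <- 1   # double

# Both share the mode "numeric" but differ in how they are stored
c(mode(x), storage.mode(x), typeof(x))  # "numeric" "integer" "integer"
c(mode(y), storage.mode(y), typeof(y))  # "numeric" "double"  "double"

# A few of the modes we will not use in this course
mode(as.raw(1))  # "raw"
mode(1i)         # "complex"
mode(quote(y))   # "name"
```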

class

Here are some common R classes:

  • integer & numeric
  • character & factor
  • matrix (2D)
  • list & data.frame

In depth knowledge: 'class' is a property assigned to an object that determines how generic functions operate with it. It is not a mutually exclusive classification. If an object has no specific class assigned to it, such as a simple numeric vector, its class is usually the same as its mode, by convention. For more info check ?class.
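
To make this concrete, here is a minimal sketch of a generic function (summary()) behaving differently depending on class, and of an object that carries more than one class. The names `colours` and `now` are hypothetical:

```r
colours <- c("Red", "Green", "Red")

# The generic summary() picks its behaviour from the class of its argument
summary(colours)          # describes the vector: length, class and mode
summary(factor(colours))  # counts per level: Green 1, Red 2

# An object can carry more than one class at a time
now <- Sys.time()
class(now)                # "POSIXct" "POSIXt"
inherits(now, "POSIXt")   # TRUE

# Removing the class attribute reveals the underlying data
unclass(factor(colours))  # integer codes 2 1 2, plus a "levels" attribute
```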

x <- 1               ; c(class(x), mode(x))
[1] "numeric" "numeric"
x <- letters         ; c(class(x), mode(x))  # letters is a built-in vector of the lowercase letters a to z
[1] "character" "character"
x <- TRUE            ; c(class(x), mode(x))  # logical
[1] "logical" "logical"
x <- cars            ; c(class(x), mode(x))  # cars is probably the most famous dataset in R
[1] "data.frame" "list"      
x <- cars[1]         ; c(class(x), mode(x))  # first column of the cars dataset, kept as a data.frame
[1] "data.frame" "list"      
x <- cars[[1]]       ; c(class(x), mode(x))  # extracts the first column as a plain vector
[1] "numeric" "numeric"
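x <- matrix(cars)    ; c(class(x), mode(x))  # wraps the columns of cars in a 2 x 1 list-matrix (as.matrix(cars) would give a numeric matrix)
[1] "matrix" "list"  
# Note: from R 4.0.0 onwards, class(x) also returns "array" for a matrix
x <- expression(1+1) ; c(class(x), mode(x))  # stores the mathematical expression without evaluating it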
[1] "expression" "expression"
x <- quote(y <- 1)   ; c(class(x), mode(x))  # simply returns its argument, which is not evaluated
[1] "<-"   "call"
x <- mean            ; c(class(x), mode(x))  # a function is an object too
[1] "function" "function"

In depth knowledge: Optional read & info

There is an excellent discussion of the difference between mode, class, and other characteristics of R objects such as storage.mode and typeof here.

12. Useful functions

Here we will learn a few more functions in R that will be useful to us further along.

Sequences

# seq() function generates a sequence of numbers.
seq(from = 1, to = 9)
[1] 1 2 3 4 5 6 7 8 9
# We can omit the argument names 'from' and 'to'
seq(1, 9)
[1] 1 2 3 4 5 6 7 8 9
# We can define the step (increment) of the sequence with 'by' (default is 1)
seq(1, 9, by = 2)
[1] 1 3 5 7 9
# Steps can be any amount
seq(0, 9, by = 1.5)
[1] 0.0 1.5 3.0 4.5 6.0 7.5 9.0
# We can also use the 'length.out' argument instead, which splits the range
# into that many equally spaced values.
seq(from = 1, to = 10, length.out = 19)
 [1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5
[15]  8.0  8.5  9.0  9.5 10.0
# We can achieve a simpler version of seq with the operator ':'
0:9
 [1] 0 1 2 3 4 5 6 7 8 9
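
A few related cases, sketched under the assumption that you will later use sequences as indices. The helpers seq_len() and seq_along() are base R functions not covered above, and `n` is a made-up variable:

```r
# Sequences can also count down
10:1                 # 10 9 8 7 6 5 4 3 2 1
seq(10, 1, by = -3)  # 10 7 4 1

# Caution: ':' counts down when the end is smaller than the start
n <- 0
1:n                  # 1 0
seq_len(n)           # integer(0), an empty sequence

# seq_along() returns the positions of the elements of a vector
seq_along(c("a", "b", "c"))  # 1 2 3
```

Using seq_len(n) rather than 1:n avoids the surprise of a two-element sequence when n is 0.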

Repetitions

R also allows you to easily create vectors containing repetitions with the rep() function.

# rep() function replicates the values in x.
rep(c("Isn't it time for a break?"), times = 2)
[1] "Isn't it time for a break?" "Isn't it time for a break?"
# We can use it to create repetitions of numbers as well
rep(5, times = 3)
[1] 5 5 5
# Or of sequences of numbers
rep(1:5, times = 2)
 [1] 1 2 3 4 5 1 2 3 4 5
# We can also use 'each' to ask that each element is repeated 'each' times.
rep(1:5, each = 2)
 [1] 1 1 2 2 3 3 4 4 5 5
# What would this give?
rep(rep(4:-5, 5), 2)
 [1]  4  3  2  1  0 -1 -2 -3 -4 -5  4  3  2  1  0 -1 -2 -3 -4 -5  4  3  2
[24]  1  0 -1 -2 -3 -4 -5  4  3  2  1  0 -1 -2 -3 -4 -5  4  3  2  1  0 -1
[47] -2 -3 -4 -5  4  3  2  1  0 -1 -2 -3 -4 -5  4  3  2  1  0 -1 -2 -3 -4
[70] -5  4  3  2  1  0
 [ reached getOption("max.print") -- omitted 25 entries ]
# Or this? Try to press 'Enter' after each parenthesis and comma so you can
# isolate each part of the code and better understand it
rep(rep(seq(1, 10, length.out = 25), 2), 3)
 [1]  1.000  1.375  1.750  2.125  2.500  2.875  3.250  3.625  4.000  4.375
[11]  4.750  5.125  5.500  5.875  6.250  6.625  7.000  7.375  7.750  8.125
[21]  8.500  8.875  9.250  9.625 10.000  1.000  1.375  1.750  2.125  2.500
[31]  2.875  3.250  3.625  4.000  4.375  4.750  5.125  5.500  5.875  6.250
[41]  6.625  7.000  7.375  7.750  8.125  8.500  8.875  9.250  9.625 10.000
[51]  1.000  1.375  1.750  2.125  2.500  2.875  3.250  3.625  4.000  4.375
[61]  4.750  5.125  5.500  5.875  6.250  6.625  7.000  7.375  7.750  8.125
[71]  8.500  8.875  9.250  9.625 10.000
 [ reached getOption("max.print") -- omitted 75 entries ]
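
Some further variations of rep(), as a brief sketch. The 'times' and 'each' arguments appear above, while 'length.out' is an additional argument of rep() used here for illustration:

```r
# 'times' can be a vector: one repetition count per element
rep(c("a", "b", "c"), times = c(3, 1, 2))  # "a" "a" "a" "b" "c" "c"

# 'length.out' repeats the pattern until the requested length is reached
rep(1:3, length.out = 7)                   # 1 2 3 1 2 3 1

# 'each' is applied first, then 'times'
rep(1:2, each = 2, times = 3)              # 1 1 2 2 1 1 2 2 1 1 2 2

# Nested calls multiply: 10 values x 5 x 2
length(rep(rep(4:-5, 5), 2))               # 100
```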

Factors

Factors are variables in R which take on a limited number of different values. Such variables are often referred to as categorical variables. For example, in cross-national research, "Gender" is usually a variable that can take either the value "Male" or the value "Female". In R terms, the factor would be called Gender, and it would have two levels, "Male" and "Female".

Gender <- factor(rep(c("Male", "Female"), each = 5))
Gender
 [1] Male   Male   Male   Male   Male   Female Female Female Female Female
Levels: Female Male
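
Note that the levels above are listed alphabetically, not in order of appearance. A minimal sketch of how to inspect and control the levels follows; `Gender2` is a name made up for this example:

```r
Gender <- factor(rep(c("Male", "Female"), each = 5))

levels(Gender)      # "Female" "Male" (sorted alphabetically by default)
nlevels(Gender)     # 2
as.integer(Gender)  # 2 2 2 2 2 1 1 1 1 1 (the integer code behind each value)
table(Gender)       # Female: 5, Male: 5

# The 'levels' argument of factor() sets the order explicitly
Gender2 <- factor(rep(c("Male", "Female"), each = 5),
                  levels = c("Male", "Female"))
levels(Gender2)     # "Male" "Female"
```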

Alternatively, we can use the gl() function, which generates factors by specifying the pattern of their levels. Its usage and arguments are:

gl(n, k, length = n*k, labels = 1:n, ordered = FALSE)

  • n: number of levels
  • k: number of consecutive replications of each level
  • length: length of the result. By default: length = n*k
  • labels: labels for the resulting factor levels
  • ordered: whether the result should be ordered or not

Gender <- gl(2, 5, labels = c("Male", "Female"))
Gender
 [1] Male   Male   Male   Male   Male   Female Female Female Female Female
Levels: Male Female

Let's try creating a factor of 3 colors

gl(n = 3, k = 1, labels = c("Brown", "Red", "Green"))
[1] Brown Red   Green
Levels: Brown Red Green
# Now let's use the 'k' argument within the gl function to specify the
# number of replications of each factor level
gl(n = 3, k = 2, length = 9, labels = c("Brown", "Red", "Green"))
[1] Brown Brown Red   Red   Green Green Brown Brown Red  
Levels: Brown Red Green
# Here we see recycling: the 'length' argument sets the length of the
# result, so the pattern of levels is repeated until that length is reached
gl(n = 3, k = 1, length = 12, labels = c("Brown", "Red", "Green"))
 [1] Brown Red   Green Brown Red   Green Brown Red   Green Brown Red  
[12] Green
Levels: Brown Red Green
# We can use rep() and gl() together to repeat the whole set of factor
# levels at once, instead of repeating each level
rep(gl(n = 4, k = 1, labels = c("Brown", "Red", "Green", "Yellow")), 3)
 [1] Brown  Red    Green  Yellow Brown  Red    Green  Yellow Brown  Red   
[11] Green  Yellow
Levels: Brown Red Green Yellow
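
Two follow-up checks, as a hedged sketch. The names `cols`, `a`, `b` and `sizes` are made up for this example, and the size labels are arbitrary:

```r
cols <- c("Brown", "Red", "Green", "Yellow")

# rep() around gl() gives the same values as gl() with the 'length' argument
a <- rep(gl(n = 4, k = 1, labels = cols), 3)
b <- gl(n = 4, k = 1, length = 12, labels = cols)
all(a == b)          # TRUE

# ordered = TRUE makes the levels comparable, in the order given by 'labels'
sizes <- gl(n = 3, k = 2, labels = c("Small", "Medium", "Large"), ordered = TRUE)
is.ordered(sizes)    # TRUE
sizes[1] < sizes[6]  # TRUE, because Small < Large
```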