In this lesson we will learn about vectorized operations. Remember: In this lesson, all R code is below the "R Source" button, while the output is hidden by default. To see it you will need to click on the "R Output" button. Importantly, at the bottom of the page, you will find a bar which allows you to change the theme of the webpage (changing colors and format) so it can easily adapt to your system and preferences. There you also find "Code highlighting" which changes how R code is displayed to you, and Toggle R code and Figures.

Practical N.2

Vector and R functions

Create a vector called height.female which is composed by the height of all women in our sample, and another vector called n.R.courses for the number of R courses/workshops for each participant. Then for these two vectors, calculate the mean, median, variance, standard deviation, and range. Now, choose at least 5 (five) other functions from the list below and apply to both created vectors. See if you understand what the function returns.

R function Description
max(x) Largest Value
min(x) Smallest Value
mean(x) Arithmetic Mean
sum(x) Sum
median(x) Median
var(x) Variance
sd(x) Standard Deviation
abs(x) Absolute value
range(x) Range
length(x) Length
diff(x, lag=1) lagged differences
scale(x) column center or standardize a matrix.
sqrt(x) square root
ceiling(x) ceiling(3.475) is 4
floor(x) floor(3.475) is 3
round(x, digits=n) round(3.475, digits=2) is 3.48
log(x) natural logarithm
log10(x) common logarithm
exp(x) e^x
summary(x) Min, 1st Quan, Median, Mean, 3rd Quan, and Max
quantile(x) sample quantiles

Boolean expressions in R

Boolean expressions evaluate to either TRUE or FALSE. A crucial part of computing involves conditional statements. Is this value bigger than other? Are two vectors the same size? etc. Questions can be joined together using words like 'and' 'or', 'not'. In R, < means 'less than', > means 'greater than', and ! means 'not' (see Table below).

R function Symbol Description
! logical NOT
& logical AND
logical OR
< less than
<= less than or equal to
> greater than
= greater than or equal to
== logical equals (double =)
!= not equal
&& AND with IF
double upright bars OR with IF
xor(x,y) exclusive OR
isTRUE(x) an abbreviation of identical(TRUE,x)

For all these logical statements, try to figure out (before running the code) the result/answer. Then run the code by pressing ctrl + enter on the desired line.

# Is true equal to false?
TRUE == FALSE
[1] FALSE
# T and F are shorthand for TRUE and FALSE. Try this:
T == TRUE
[1] TRUE
T == F
[1] FALSE
T != F
[1] TRUE
# Is 4 smaller than 4
3 < 4
[1] TRUE
# Is 2 + 2 equal to 5
2 + 2 == 5
[1] FALSE
# Is 2 smaller than 5
2 < 5
[1] TRUE
# Is 7 smaller or equal than minus 2
7 <= -2
[1] FALSE
# Try to figure these out
3 > (3 + 1)
[1] FALSE
4 >= 4
[1] TRUE
(3/4) == (9/12)
[1] TRUE
# The symbol '!' is a negation of a logical statement
!TRUE
[1] FALSE
!F
[1] TRUE
2^4 != 4^2
[1] FALSE
!(2 < 1)
[1] TRUE
!(3 < 6)
[1] FALSE
# The ampersand symbol & means 'and'. A statement is TRUE only if the
# expressions on both sides of the operator are TRUE. One can also think of
# 'intersection' as in set operations
3 * 4 == 12 & 6/8 < 1
[1] TRUE
(3 < 5) & (2 > 0)
[1] TRUE
(2 < 3) & (5 > 5)
[1] FALSE
# The symbol | means 'or'. The | operator is TRUE if at least one of the
# expressions surrounding it is TRUE. You can also think in terms of set
# operations as in 'union' of sets.
(3 < 5) | (2 > 3)
[1] TRUE
(2 < 1) | (5 > 5)
[1] FALSE
TRUE | FALSE
[1] TRUE
FALSE | TRUE
[1] TRUE
FALSE | 2 + 2 == 4
[1] TRUE
# Can you guess?
c(1, 2, 3, 4, 5) <= 3
[1]  TRUE  TRUE  TRUE FALSE FALSE
((5 > 4) & !(3 < 2)) | (6 > 7)
[1] TRUE
# Most Boolean operators act element-wise.  %in% is a matching operator
c(1, 2, 3, 4, 5) %in% c(1, 2, 3)
[1]  TRUE  TRUE  TRUE FALSE FALSE
height %in% c(157.9, 172.8, 180.8, 146.5, 174.3)
[1] TRUE TRUE TRUE TRUE TRUE
height.female %in% c(161.6, 194.2, 171.3, 165.1, 165.6)
[1] FALSE FALSE FALSE FALSE FALSE
# %% is the symbol for modulus. In computing, the modulo operation finds the
# remainder after division of one number by another (sometimes called
# modulus)
5%%2
[1] 1
9%%3
[1] 0
V <- c(3, 2, 8, 6, 5, 6, 11, 0)
I <- (V %in% 2 == 1)

# Lets try to use what we learned thus far with our vectors
height == height.female
[1] FALSE FALSE FALSE FALSE FALSE
height > height.female
[1] FALSE FALSE  TRUE FALSE  TRUE
height < height.female
[1]  TRUE  TRUE FALSE  TRUE FALSE

Advanced Exercises 1

These exercises are slightly adapted (shamelessly copied with minor adjustments) from R-exercises, a website that offers many exercises for you to test your R skills. They also offer a R Course Finder which catalogs several R courses on MOOCs (Massive Open Online Courses) such as Coursera and Khan Academy and other online learning platforms (e.g. Udemy, EdX, Lynda.com).

Exercise 1

There are two main different type of interest, simple and compound. To start let's create 3 variables, initial investment (S = 100), annual simple interest (i1=.02), annual compound interest (i2=.015), and the years that the investment will last (n=2).

Simple Interest: define a variable called simple equal to S * (1 + i1 * n)

Compound Interest: define a variable called compound equal to S * (1 + i2)n

R source
S <- 100
i1 <- 0.1
i2 <- 0.09
n <- 2
simple <- S * (1 + i1 * n)
compound <- S * (1 + i2)^n

Exercise 2

It's natural to ask which type of interest for this values gives more amount of money after 2 years (n = 2). Using logical functions <,>, == check which variable is bigger between simple and compound

R source
simple > compound
R output
[1] TRUE

Exercise 3

Using logical functions <,>, ==, |, & find out if simple or compound is equal to 120

Using logical functions <,>, ==, |, & find out if simple and compound is equal to 120

R source
simple >= 120 | compound >= 120
R output
[1] TRUE
R source
simple >= 120 & compound >= 120
R output
[1] FALSE

Exercise 4

Formulas can deal with vectors, so let's define a vector and use it in one of the formulas we defined earlier. Let's define S as a vector with the following values 100, 96. Remember that c() is the function that allow us to define vectors.

Apply to S the simple interest formula and store the value of the vector in simple

R source
S <- c(100, 95)
simple <- S * (1 + i1 * n)

Exercise 5

Using logical functions <,>, == check if any of the simple values is smaller or equal to compound

R source
simple <= compound
R output
[1] FALSE  TRUE

Exercise 6

Using the function %/% find out how many $20 candies can you buy with the money stored in compound

R source
compound%/%20
R output
[1] 5

Exercise 7

Using the function %% find out how much money is left after buying the candies.

R source
compound%%20
R output
[1] 18.81

Exercise 8

Let's create two new variables, ode defined as rational=1/3 and decimal=0.33. Using the logical function != Verify if this two values are different.

R source
rational <- 1/3
decimal <- 0.33
rational != decimal
R output
[1] TRUE

Exercise 9

There are other functions that can help us compare two variables.

Use the logical function == verify if rational and decimal are the same.

Use the logical function isTRUE verify if rational and decimal are the same.

Use the logical function identical verify if rational and decimal are the same.

R source
rational == decimal
R output
[1] FALSE
R source
isTRUE(rational == decimal)
R output
[1] FALSE
R source
identical(rational, decimal)
R output
[1] FALSE

Exercise 10

Using the help of the logical functions of the previous exercise find the approximation that R uses for 1/3. Hint: It is not the value that R prints when you define 1/3

R source
1/3 == 0.333333333333333
R output
[1] FALSE

Advanced Exercises 2

  1. Calculate square root of 729
  2. Create a new variable 'b' with value 5124
  3. Create a vector numbers from 1 to 21 and find out its class
  4. Create a vector containing following mixed elements {2131, 24, 'j', 2, 'b'} and find out its class
  5. Initialise a character vector of length 26
  6. Assign the character 'a' to the first element in above vector
  7. Create a vector with some of your class mates names (at least 5)
  8. Get the length of above vector
  9. Get the first two friends from above vector
  10. Get the 2nd and 3rd friends
  11. Sort your friends by names
  12. Reverse direction of the above sort
  13. Create with rep() and seq() R functions the following vector: 'a','a','a', 1,2,3,4,5,11,13,15,17,19,21
  14. Sample 50 random numbers between 1 to 100
  15. Sample 50 random numbers between 1 to 500, with replacement
  16. Find the class of 'iris' dataframe, find the class of all the columns of 'iris', get the summary of 'iris', get the top 6 rows, view it in a spreadsheet format, get row names, get column names, get number of rows and get number of columns.
  17. Apply the above functions and inspect results on 'iris' (a base R dataframe)
  18. Get the last 2 rows in last 2 columns from iris dataset
  19. Get rows with Sepal.Width > 3.5 from iris
  20. Get the rows with 'versicolor' species using subset() from iris
R source
sqrt(729)
b <- 5124
one_to_21 <- 1:21
class(one_to_21)
my.vector <- c(2131, 24, "j", 2, "b")
class(my.vector)
charHundred <- character(26)
charHundred
charHundred[1] <- "a"
myFriends <- c("alan", "bala", "amir", "tsong", "chan")
length(myFriends)
myFriends[1:2]
myFriends[c(2, 3)]
sort(myFriends)
myFriends[order(myFriends)]
sort(myFriends, decreasing = TRUE)
myFriends[rev(order(myFriends))]
out <- c(rep("a", 3), seq(1, 5), seq(11, 21, by = 2))
mySample <- sample(1:100, 50)
mySample <- sample(1:500, 50, replace = T)
class(iris)  # get class
sapply(iris, class)  # get class of all columns
str(iris)  # structure
summary(iris)  # summary of airquality
head(iris)  # view the first 6 obs
fix(iris)  # view spreadsheet like grid
rownames(iris)  # row names
colnames(iris)  # columns names
nrow(iris)  # number of rows
ncol(iris)  # number of columns
numRows <- nrow(iris)
numCols <- ncol(iris)
iris[(numRows - 1):numRows, (numCols - 1):numCols]
iris[iris$Sepal.Width > 3, ]
iris[which(iris$Sepal.Width > 3), ]
subset(iris, Species == "versicolor")