In this lesson we will learn about vectorized operations. Remember: all R code is below the “R Source” button, while the output is hidden by default; to see it, click the “R Output” button. At the bottom of the page, you will find a bar that lets you change the theme of the webpage (colors and format) so it suits your system and preferences. There you will also find “Code highlighting”, which changes how R code is displayed, and toggles for R code and figures.
While R would make a great calculator, it is designed to help statisticians and data scientists work with data, and data rarely comes as a single data point. Instead, we usually have several observations at a time for a given variable. So we are going to learn how to handle the R objects that store them.
Say I am interested in how the height, weight, and gender of students relate to the number of R workshops each student has taken. So I take a sample of 10 students from the student population, measure their height and weight, and ask about their age, their gender, and how many R courses/workshops they have attended. Our data would then look like this:
Names | Age | Height | Weight | Gender | Courses |
---|---|---|---|---|---|
Alan | 23 | 170.6 | 76.9 | Male | 1 |
Brian | 31 | 179.6 | 59.6 | Male | 2 |
Carlos | 31 | 168.9 | 48.3 | Male | 0 |
Dalton | 25 | 164.9 | 78.6 | Male | 2 |
Ethan | 32 | 160.9 | 54.6 | Male | 0 |
Flora | 26 | 161.6 | 69.8 | Female | 4 |
Gaia | 35 | 194.2 | 56.0 | Female | 0 |
Helen | 26 | 171.3 | 86.5 | Female | 3 |
Ingrid | 27 | 165.1 | 62.9 | Female | 0 |
Jennifer | 20 | 165.6 | 59.4 | Female | 2 |
In analyzing a dataset, we often want to perform operations on the whole set of values of a given variable, which we call a vector. A vector can hold numbers, strings, or logical values, but all of its elements share the same type. For example, our participants’ Height is a numeric vector, and our participants’ Gender is a string vector (R calls this type “character”). In this way, one can build a dataset out of vectors of several types.
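As a small illustration of the vector types mentioned above, here is a sketch that stores a few columns from the first five rows of the table. The object names (`student_height`, `student_gender`, `took_course`, `mixed`) are made up for this example, and the `c()` function used to build the vectors is explained later in this lesson.

```r
# Values copied from the first five rows of the table (hypothetical object names)
student_height <- c(170.6, 179.6, 168.9, 164.9, 160.9)        # numbers
student_gender <- c("Male", "Male", "Male", "Male", "Male")   # strings
took_course    <- c(TRUE, TRUE, FALSE, TRUE, FALSE)           # logical: Courses > 0

# class() reports the type of data each vector holds
class(student_height)   # "numeric"
class(student_gender)   # "character"
class(took_course)      # "logical"

# Mixing types in one vector converts every element to a common type
mixed <- c(1, "a", TRUE)
class(mixed)            # "character"
```

Because a vector holds a single type, a table like the one above is usually stored as several vectors, one per column.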
Let’s focus on male students’ heights as our example. Suppose we are interested in their arithmetic mean. How can we calculate it in R? Here’s the formula:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}$$
The first thing we need to do, according to the formula, is to add all the heights. So let’s…
157.9 + 172.8 + 180.8 + 146.5 + 174.3
… then we need to divide this sum by the number of observations, which is 5. So, what is the mean height of male students in our sample?
(157.9 + 172.8 + 180.8 + 146.5 + 174.3)/5
Now, let’s do the same operation using a vector. In the previous section we learned how to store a mathematical expression in an object. Now, we are going to store more than one piece of data in a single object (i.e., a vector). First, we need to name our vector; let’s call it “height”. It will receive all five values. In R, we do this by “combining” or “concatenating” several values with the function c(): we write “c” followed by parentheses, with the values inside separated by commas.
height <- c(157.9, 172.8, 180.8, 146.5, 174.3)
Let’s check if we created our vector correctly.
height
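Besides printing the vector, a couple of base R functions can help confirm it was created as intended. The sketch below assumes the same five values as above and checks how many elements the vector has and whether it holds numbers.

```r
height <- c(157.9, 172.8, 180.8, 146.5, 174.3)

length(height)       # number of elements; five values were entered
is.numeric(height)   # TRUE when the vector holds numbers
```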
Now that we have created our vector, we can apply the same kinds of operations we used above to the whole height vector at once. This is one of the main advantages of R over other statistical software.
So, type in your Source panel (top-left) the following expressions:
Multiplication
height * 2
Division
height/2
Exponentiation
height^2
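To see what “vectorized” means here, it may help to compare the one-line expression with an explicit loop that does the same work element by element. This is only a sketch for comparison; the names `doubled_vectorized` and `doubled_loop` are made up for the example.

```r
height <- c(157.9, 172.8, 180.8, 146.5, 174.3)

# Vectorized form: one expression, applied to every element
doubled_vectorized <- height * 2

# Equivalent explicit loop, shown only for comparison
doubled_loop <- numeric(length(height))   # empty numeric vector of the same length
for (i in seq_along(height)) {
  doubled_loop[i] <- height[i] * 2
}

# Both approaches give the same five values
all.equal(doubled_vectorized, doubled_loop)
```

The vectorized version is shorter and easier to read, which is why it is the usual way to write this in R.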
Vectorized operations are one of the most important strengths of R because they make working with data much easier. For example, if we want to calculate the mean height for males, all we need is the R function that calculates the mean, mean(), with our vector placed inside the parentheses.
mean(height)
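To connect mean() back to the formula, the sketch below computes the same quantity in two steps with sum() and length(), and compares the result with mean(). The name `manual_mean` is made up for this example.

```r
height <- c(157.9, 172.8, 180.8, 146.5, 174.3)

n <- length(height)              # number of observations
manual_mean <- sum(height) / n   # add all heights, then divide by n

# Compare with the built-in function (all.equal allows for tiny rounding differences)
all.equal(manual_mean, mean(height))
```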
Variance
var(height)
Standard Deviation
sd(height)
Median
median(height)
Range
range(height)
Summary
summary(height)
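The functions above can also be reproduced by hand with vectorized arithmetic. The sketch below assumes the usual sample formulas, which is what var() and sd() use in R (division by n - 1); the names `deviations` and `manual_var` are made up for the example.

```r
height <- c(157.9, 172.8, 180.8, 146.5, 174.3)
n <- length(height)

# Sample variance: squared deviations from the mean, divided by n - 1
deviations <- height - mean(height)
manual_var <- sum(deviations^2) / (n - 1)
all.equal(manual_var, var(height))

# The standard deviation is the square root of the variance
all.equal(sqrt(manual_var), sd(height))

# range() returns the minimum and the maximum, not their difference
range(height)
diff(range(height))   # width of the range
```

Note that `height - mean(height)` subtracts a single number from every element of the vector, the same kind of operation as `height * 2` earlier.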
We can also do regular operations with vectors without naming them:
c(1, 2, 3, 4, 5) * 2
Can you guess which mathematical operation the code below performs?
c(1, 2, 3, 4, 5) * c(1, 2, 3, 4, 5)
How about this last one?
height - c(1, 2, 3, 4, 5) * c(5, 4, 3, 2, 1)
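Once you have made your guesses, one way to check them is to break each expression into steps. The sketch below relies on two facts: arithmetic between vectors of the same length works element by element, and multiplication is evaluated before subtraction. The names `x`, `product`, and `result` are made up for the example.

```r
height <- c(157.9, 172.8, 180.8, 146.5, 174.3)
x <- c(1, 2, 3, 4, 5)

# Multiplying a vector by itself pairs up elements: 1*1, 2*2, ...
all.equal(x * x, x^2)

# In the last expression, the multiplication happens first...
product <- c(1, 2, 3, 4, 5) * c(5, 4, 3, 2, 1)
product   # 5 8 9 8 5

# ...and the result is then subtracted from height, element by element
result <- height - product
all.equal(result, height - c(1, 2, 3, 4, 5) * c(5, 4, 3, 2, 1))
```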