Vectors - Part 3

Introduction

In the previous post, we learnt to perform simple operations on vectors and handle missing values. In this post, we will learn to index/subset vectors.

Index Vectors

One of the most important steps in data analysis is selecting a subset of data from a bigger data set. Indexing lets us retrieve a single value, or a set of values that meet specific criteria. In this post, we look at various ways of indexing/subsetting vectors.

Index Operator

[] is the index operator in R. We can use various expressions within [] to subset data. In R, index positions begin at 1 and not 0. To begin with, let us look at values in different index positions:

# random sample of 10 values
vect1 <- sample(10)
vect1
##  [1]  5  7  2  8  9  6 10  4  1  3
# return third element
vect1[3]
## [1] 2
# return seventh element
vect1[7]
## [1] 10

Out of Range Index

# random sample of 10 values
vect1 <- sample(10)
vect1
##  [1]  8  2  5  1 10  9  3  4  7  6
# return value at index 0
vect1[0]
## integer(0)
# length of the vector
length(vect1)
## [1] 10
# out of range index
vect1[11]
## [1] NA

In the first case, we specified the index as 0 and in the second case we used the index 11, which is greater than the length of the vector. Neither is an error: R returns an empty vector, integer(0), in the first case and NA in the second case.
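
Because neither case raises an error, an out of range index can go unnoticed. The sketch below uses a fixed vector instead of sample() so that the results are reproducible; the names x and pos are only illustrative. It checks the two cases above and shows one possible way to filter out positions that are too large before subsetting:

```r
# fixed vector so the results are reproducible
x <- c(8, 2, 5, 1, 10, 9, 3, 4, 7, 6)
# index 0 returns a zero-length vector
length(x[0])
## [1] 0
# an out of range index returns a missing value
is.na(x[11])
## [1] TRUE
# a 0 mixed with valid positions is silently dropped
x[c(0, 2)]
## [1] 2
# keep only the positions that exist before subsetting
pos <- c(2, 11)
x[pos[pos <= length(x)]]
## [1] 2
```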

Negative Index

Using a negative index returns the vector without the value in that index position. The original vector is not modified unless we assign the result back to it. Unlike some other languages (Python, for example), a negative index does not count backwards from the end of the vector. Let us look at an example to understand how a negative index works in R:

# random sample of 10 values
vect1 <- sample(10)
vect1
##  [1]  6  9  3  2 10  8  4  7  1  5
# drop third element
vect1[-3]
## [1]  6  9  2 10  8  4  7  1  5
# drop seventh element
vect1[-7]
## [1]  6  9  3  2 10  8  7  1  5
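
A few related cases are worth trying out. The sketch below again uses a fixed vector named x (an illustrative name) and shows that several positions can be dropped at once, that the original vector stays the same, that the last element has to be requested by its position, and that positive and negative positions cannot be mixed in one index. tryCatch() is used here only to capture the error so that the script keeps running:

```r
x <- c(6, 9, 3, 2, 10, 8, 4, 7, 1, 5)
# drop the first two elements
x[-c(1, 2)]
## [1]  3  2 10  8  4  7  1  5
# the original vector is unchanged
length(x)
## [1] 10
# the last element has to be requested by position
x[length(x)]
## [1] 5
# mixing positive and negative positions is an error
mixed <- tryCatch(x[c(-1, 2)], error = function(e) "error")
mixed
## [1] "error"
```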

Subset Multiple Elements

If we do not specify anything within [], all the elements in the vector will be returned. We can specify the index elements using any expression that generates a sequence of integers. Let us look at a few examples:

# random sample of 10 values
vect1 <- sample(10)
vect1
##  [1]  2  8  1  9  7 10  5  4  3  6
# return all elements
vect1[]
##  [1]  2  8  1  9  7 10  5  4  3  6
# return first 5 values
vect1[1:5]
## [1] 2 8 1 9 7
# return all values from the 5th position
end <- length(vect1)
vect1[5:end]
## [1]  7 10  5  4  3  6

If you are using the colon to generate the index positions, you have to specify both the starting and the ending position. Leaving either one out, as in vect1[5:], is a syntax error in R.
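
As a small sketch of the alternatives, the end position can be computed inline with length(), or the base R helpers head() and tail() can be used when we only need elements from one end of the vector. The vector x below is a fixed, illustrative copy of the values shown above:

```r
x <- c(2, 8, 1, 9, 7, 10, 5, 4, 3, 6)
# spell out the end position with length()
x[5:length(x)]
## [1]  7 10  5  4  3  6
# tail() with a negative n drops the first n elements
tail(x, -4)
## [1]  7 10  5  4  3  6
# head() returns the first n elements
head(x, 5)
## [1] 2 8 1 9 7
```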

In the last example, the elements we extracted were in consecutive positions. What if we want elements that are not in a sequence? In such cases, we have to create a vector of positions using c() and use it to extract elements from the original vector. Below is an example:

# random sample of 10 values
vect1 <- sample(10)
vect1
##  [1]  7  4 10  3  9  8  5  1  2  6
# extract 2nd, 5th and 7th element
select <- c(2, 5, 7)
vect1[select]
## [1] 4 9 5
# extract elements in position 1 to 4, 6 and 9
select <- c(1:4, 6, 9)
vect1[select]
## [1]  7  4 10  3  8  2
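
The index vector does not have to be sorted or free of duplicates. The sketch below (with a fixed, illustrative vector x) shows that elements come back in the order requested, that a repeated position returns its element more than once, and that seq() can generate evenly spaced positions:

```r
x <- c(7, 4, 10, 3, 9, 8, 5, 1, 2, 6)
# positions are returned in the order requested
x[c(7, 2)]
## [1] 5 4
# a repeated position returns the element more than once
x[c(1, 1, 3)]
## [1]  7  7 10
# seq() generates evenly spaced positions: 1, 5 and 9
x[seq(1, 9, by = 4)]
## [1] 7 9 2
```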

Subset Named Vectors

Vectors can be subset using the names of the elements. When using names for subsetting, ensure that they are enclosed in single or double quotes. Without quotes, R treats the name as an object and returns an error if no such object exists. Let us look at a few examples:

vect1 <- c(score1 = 8, score2 = 6, score3 = 9)
vect1
## score1 score2 score3 
##      8      6      9
# extract score2
vect1['score2']
## score2 
##      6
# extract score1 and score3
vect1[c('score1', 'score3')]
## score1 score3 
##      8      9
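
Two details are easy to miss with named vectors. A name that does not exist returns NA instead of an error, and a negative sign cannot be used with names, so excluding an element by name needs a comparison against names(). The sketch below illustrates both; the vector name scores is only illustrative:

```r
scores <- c(score1 = 8, score2 = 6, score3 = 9)
# names() lists the names available for subsetting
names(scores)
## [1] "score1" "score2" "score3"
# a name that does not exist returns NA rather than an error
anyNA(scores["score4"])
## [1] TRUE
# unname() drops the name and keeps the value
unname(scores["score2"])
## [1] 6
# exclude an element by name using a comparison on names()
names(scores[names(scores) != "score2"])
## [1] "score1" "score3"
```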

Subset Using Logical Values

Logical values can be used to subset vectors: elements in positions marked TRUE are returned and those marked FALSE are dropped. Fixed logical values are not very flexible but can be used for simple indexing. In all of the examples below, the logical vectors are shorter than the vector we subset, so R recycles them to match its length:

# random sample of 10 values
vect1 <- sample(10)
vect1
##  [1]  8  1  4  5 10  9  3  6  2  7
# returns all values
vect1[TRUE]
##  [1]  8  1  4  5 10  9  3  6  2  7
# empty vector
vect1[FALSE]
## integer(0)
# values in odd positions
vect1[c(TRUE, FALSE)]
## [1]  8  4 10  3  2
# values in even positions
vect1[c(FALSE, TRUE)]
## [1] 1 5 9 6 7
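
To make the recycling visible, the sketch below uses a length 3 logical vector to pick every third position, and then builds the recycled index explicitly with rep(). The names x and keep are only illustrative, and the vector is fixed so the results are reproducible:

```r
x <- c(8, 1, 4, 5, 10, 9, 3, 6, 2, 7)
# a length 3 logical vector is recycled: positions 1, 4, 7 and 10
x[c(TRUE, FALSE, FALSE)]
## [1] 8 5 3 7
# the recycled index for odd positions written out in full
keep <- rep(c(TRUE, FALSE), length.out = length(x))
keep
##  [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
x[keep]
## [1]  8  4 10  3  2
```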

Subset Using Logical Expressions

Logical expressions can be used to extract elements that meet specific criteria. This method is the most flexible and useful, as we can combine multiple conditions using comparison and logical operators. Before we use logical expressions, let us spend some time understanding comparison and logical operators, as we will be using them extensively hereafter.

Comparison Operators

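When you compare a vector with a value using a comparison operator, the output is a logical vector with one TRUE or FALSE for each element (or NA where the element is missing). Using that logical vector inside [] returns the elements for which the comparison is TRUE. Let us see how we can use comparison operators to subset data: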

# random sample of 10 values
vect1 <- sample(10)
vect1
##  [1]  3  1  9 10  5  8  2  6  4  7
# return elements greater than 5
vect1 > 5
##  [1] FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
vect1[vect1 > 5]
## [1]  9 10  8  6  7
# return elements greater than or equal to 5
vect1 >= 5
##  [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE
vect1[vect1 >= 5]
## [1]  9 10  5  8  6  7
# return elements less than 5
vect1 < 5
##  [1]  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
vect1[vect1 < 5]
## [1] 3 1 2 4
# return elements less than or equal to 5
vect1 <= 5
##  [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
vect1[vect1 <= 5]
## [1] 3 1 5 2 4
# return elements equal to 5
vect1 == 5
##  [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
vect1[vect1 == 5]
## [1] 5
# return elements not equal to 5
vect1 != 5
##  [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
vect1[vect1 != 5]
## [1]  3  1  9 10  8  2  6  4  7
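
The vectors above have no missing values. As a hedged sketch of what happens when they do, the example below uses a small illustrative vector x containing an NA. The comparison returns NA for that element, and subsetting with it keeps an NA in the result; wrapping the comparison in which() is one common way to keep only the positions that are TRUE:

```r
x <- c(3, NA, 9, 10, 5)
# a comparison with NA is itself NA
x > 5
## [1] FALSE    NA  TRUE  TRUE FALSE
# subsetting with that result keeps an NA
x[x > 5]
## [1] NA  9 10
# which() returns only the positions that are TRUE
which(x > 5)
## [1] 3 4
x[which(x > 5)]
## [1]  9 10
```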

Logical Operators

Let us combine comparison and logical operators to create expressions and use them to subset vectors:

# random sample of 10 values
vect1 <- sample(10)
vect1
##  [1]  3  2  9  7  5 10  4  8  6  1
# return all elements less than 8 or divisible by 3
vect1[(vect1 < 8 | (vect1 %% 3 == 0))]
## [1] 3 2 9 7 5 4 6 1
# return all elements less than 7 or divisible by 2
vect1[(vect1 < 7 | (vect1 %% 2 == 0))]
## [1]  3  2  5 10  4  8  6  1
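
The examples above combine conditions with | (or). The sketch below shows the other two element-wise logical operators, & (and) and ! (not), on a fixed, illustrative vector x:

```r
x <- c(3, 2, 9, 7, 5, 10, 4, 8, 6, 1)
# elements greater than 3 and divisible by 2
x[x > 3 & x %% 2 == 0]
## [1] 10  4  8  6
# elements that are not less than 8
x[!(x < 8)]
## [1]  9 10  8
# elements between 4 and 8, both inclusive
x[x >= 4 & x <= 8]
## [1] 7 5 4 8 6
```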