Introduction
In the previous post, we learnt to create vectors of different data types. In this post, we will learn to:
- coerce different data types
- perform simple operations on vectors
- handle missing data
- index/subset vectors
Naming Vector Elements
It is possible to name the different elements of a vector. The advantage of naming vector elements is that we can later use these names to access the elements. Use names() to specify the names of a vector. You can specify the names while creating the vector or add them later.
Method 1: Create vector and add names later
# create vector and add names later
vect1 <- c(1, 2, 3)
# name the elements of the vector
names(vect1) <- c("One", "Two", "Three")
# call vect1
vect1
##   One   Two Three
##     1     2     3
Method 2: Specify names while creating vector
# specify names while creating vector
vect2 <- c(John = 1, Jack = 2, Jill = 3, Jovial = 4)
# call vect2
vect2
##   John   Jack   Jill Jovial
##      1      2      3      4
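The advantage mentioned above, using names to access elements, can be illustrated with a short sketch. It reuses vect2 from Method 2; the particular names picked out here are chosen only for illustration.

```r
# named vector, as in Method 2
vect2 <- c(John = 1, Jack = 2, Jill = 3, Jovial = 4)
# access a single element by its name
jill <- vect2["Jill"]
jill
## Jill
##    3
# access several elements by name
vect2[c("John", "Jovial")]
##   John Jovial
##      1      4
# retrieve the names themselves
names(vect2)
## [1] "John"   "Jack"   "Jill"   "Jovial"
```

Accessing by name keeps the code readable and continues to work even if the order of the elements changes.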
Vector Coercion
Vectors are homogeneous, i.e. all the elements of a vector must be of the same type. If we try to create a vector by combining different data types, the elements will be coerced to the most flexible type present. Coercion follows a fixed order, illustrated by the cases below. character is the most flexible data type while logical is the least flexible. If you combine any other data type with character, all the elements will be coerced to character. In the absence of character data, the presence of a numeric value coerces all the elements to numeric. Finally, if the data includes neither character nor numeric values but contains both integer and logical values, all the elements will be coerced to integer.
Case 1: Different Data Types
# vector of different data types
vect1 <- c(1, 1L, 'one', TRUE)
# call vect1
vect1
## [1] "1" "1" "one" "TRUE"
# check data type
class(vect1)
## [1] "character"
Case 2: Numeric, Integer and Logical
# vector of different data types
vect1 <- c(1, 1L, TRUE)
# call vect1
vect1
## [1] 1 1 1
# check data type
class(vect1)
## [1] "numeric"
Case 3: Integer and Logical
# vector of different data types
vect1 <- c(1L, TRUE)
# call vect1
vect1
## [1] 1 1
# check data type
class(vect1)
## [1] "integer"
To summarize, coercion proceeds from the least flexible to the most flexible type: logical < integer < numeric < character.
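The cases above show implicit coercion performed by c(). Coercion can also be requested explicitly with the base R as.*() functions. The sketch below is illustrative only; the variable names and input values are assumptions made for the example.

```r
# logical to integer
int_val <- as.integer(TRUE)
int_val
## [1] 1
# numeric to character
chr_val <- as.character(c(1, 2, 3))
chr_val
## [1] "1" "2" "3"
# character to numeric
num_val <- as.numeric("3.5")
num_val
## [1] 3.5
# text that cannot be parsed as a number becomes NA
# (R also emits a coercion warning, suppressed here)
bad_val <- suppressWarnings(as.numeric("one"))
bad_val
## [1] NA
# typeof() reports the underlying storage type after coercion
typeof(c(1L, TRUE))
## [1] "integer"
typeof(c(1, 1L, TRUE))
## [1] "double"
```

Note that class() reports "numeric" for a vector whose storage type, as reported by typeof(), is "double".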
Vector Operations
In this section, we look at simple operations that can be performed on vectors in R. Remember that the nature of the operations depends upon the type of data. Below are a few examples:
Case 1: Vectors of same length
# create two vectors
vect1 <- c(1, 3, 8, 4)
vect2 <- c(2, 7, 1, 9)
# addition
vect1 + vect2
## [1] 3 10 9 13
# subtraction
vect1 - vect2
## [1] -1 -4 7 -5
# multiplication
vect1 * vect2
## [1] 2 21 8 36
# division
vect1 / vect2
## [1] 0.5000000 0.4285714 8.0000000 0.4444444
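How an operation behaves depends on the type of data in the vector. The following sketch, using a hypothetical logical vector named flags, shows that logical values are treated as 0 and 1 in arithmetic and that comparison operators also work element by element.

```r
# logical values are treated as 0/1 in arithmetic
flags <- c(TRUE, FALSE, TRUE, TRUE)
sum(flags)
## [1] 3
flags * 10
## [1] 10  0 10 10
# comparison operators work element by element and return a logical vector
vect1 <- c(1, 3, 8, 4)
vect2 <- c(2, 7, 1, 9)
larger <- vect1 > vect2
larger
## [1] FALSE FALSE  TRUE FALSE
# arithmetic is not defined for character vectors;
# uncommenting the next line would raise an error
# c("1", "2") + 1
```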
Case 2: Vectors of different length
In the previous case, both vectors had the same length, i.e. the same number of elements. What happens if the lengths of the vectors are unequal? In such cases, the shorter vector is recycled, i.e. its elements are repeated in order, until it matches the length of the longer vector. The example below illustrates this:
# create two vectors
vect1 <- c(2, 7)
vect2 <- c(1, 8, 5, 2)
# addition
vect1 + vect2
## [1] 3 15 7 9
# subtraction
vect1 - vect2
## [1] 1 -1 -3 5
# multiplication
vect1 * vect2
## [1] 2 56 10 14
# division
vect1 / vect2
## [1] 2.000 0.875 0.400 3.500
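To make the recycling step visible, the sketch below builds the recycled vector by hand with rep() and checks that it gives the same result as the direct addition. The variable name recycled is introduced only for this illustration.

```r
vect1 <- c(2, 7)
vect2 <- c(1, 8, 5, 2)
# repeat the shorter vector until it matches the longer one
recycled <- rep(vect1, length.out = length(vect2))
recycled
## [1] 2 7 2 7
# the direct sum and the hand-recycled sum are the same
identical(vect1 + vect2, recycled + vect2)
## [1] TRUE
# a single value is recycled across the whole vector
scaled <- vect2 * 10
scaled
## [1] 10 80 50 20
```

If the length of the longer vector is not a multiple of the length of the shorter one, R still recycles the shorter vector but issues a warning.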
Missing Data
Missing data is a reality. No matter how careful you are in collecting data for your analysis, chances are high that you will end up with some missing data. In R, missing values are represented by NA. In this section, we will focus on the following:
- test for missing data
- remove missing data
- exclude missing data from analysis
Detect missing data
We first create a vector with missing values. After that, we will use is.na() to test whether the data contains missing values. is.na() returns a logical vector of the same length as the vector being tested, with TRUE marking each missing value. Another function that can be used for detecting missing values is complete.cases(), which returns TRUE for each element that is not missing. Below is an example:
# vector with missing values
vect1 <- c(1, 3, NA, 5, 2)
# use is.na
is.na(vect1)
## [1] FALSE FALSE TRUE FALSE FALSE
# use complete.cases
complete.cases(vect1)
## [1] TRUE TRUE FALSE TRUE TRUE
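Since is.na() returns a logical vector, it can be combined with other base R functions to summarize the missing values. The sketch below is one possible way to check for, count and locate them; the variable names are chosen for illustration.

```r
vect1 <- c(1, 3, NA, 5, 2)
# is there any missing value at all?
has_missing <- anyNA(vect1)
has_missing
## [1] TRUE
# how many values are missing?
n_missing <- sum(is.na(vect1))
n_missing
## [1] 1
# at which positions?
pos_missing <- which(is.na(vect1))
pos_missing
## [1] 3
```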
Omit missing data
In the presence of missing data, most computations in R will return NA. To avoid this, we might want to remove the missing data before doing any computation. na.omit() will remove all missing values from the data and record their positions in the na.action attribute. Let us look at an example:
# vector with missing values
vect1 <- c(1, 3, NA, 5, 2)
# call vect1
vect1
## [1] 1 3 NA 5 2
# omit missing values
na.omit(vect1)
## [1] 1 3 5 2
## attr(,"na.action")
## [1] 3
## attr(,"class")
## [1] "omit"
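An alternative to na.omit(), shown here only as a sketch, is to keep the non-missing elements with a logical index built from is.na(). The result is a plain vector without the extra na.action attribute. The variable name clean is an assumption made for this example.

```r
vect1 <- c(1, 3, NA, 5, 2)
# keep only the elements that are not missing
clean <- vect1[!is.na(vect1)]
clean
## [1] 1 3 5 2
# the result carries no extra attributes
attributes(clean)
## NULL
```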
Exclude missing data
To exclude missing values from a computation without modifying the vector, set the na.rm argument of the function to TRUE.
# vector with missing values
vect1 <- c(1, 3, NA, 5, 2)
# compute mean
mean(vect1)
## [1] NA
# compute mean by excluding missing value
mean(vect1, na.rm = TRUE)
## [1] 2.75
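The na.rm argument is not specific to mean(). Several other base R summary functions accept it as well, as the following sketch with sum(), max() and median() suggests.

```r
vect1 <- c(1, 3, NA, 5, 2)
# sum of the non-missing values
total <- sum(vect1, na.rm = TRUE)
total
## [1] 11
# largest non-missing value
largest <- max(vect1, na.rm = TRUE)
largest
## [1] 5
# median of the non-missing values
middle <- median(vect1, na.rm = TRUE)
middle
## [1] 2.5
```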