Introduction
In the previous post, we learnt about lists. In this post, we will learn about dataframes and cover the following:
- create dataframe
- select columns
- select rows
- utility functions
Create dataframes
Use data.frame() to create dataframes. Below is the function syntax:
args(data.frame)
## function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,
## fix.empty.names = TRUE, stringsAsFactors = default.stringsAsFactors())
## NULL
A data frame is essentially a list whose elements (the columns) all have the same length. Like a list, it is heterogeneous: each column can hold a different type of data. Let us create a dataframe:
name <- c('John', 'Jack', 'Jill')
age <- c(29, 25, 27)
graduate <- c(TRUE, TRUE, FALSE)
students <- data.frame(name, age, graduate)
students
## name age graduate
## 1 John 29 TRUE
## 2 Jack 25 TRUE
## 3 Jill 27 FALSE
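Since a data frame is a list of equal-length columns, the usual list functions work on it. The following is a minimal sketch that rebuilds the students data frame from above and checks this. The tryCatch() wrapper and the result variable are only there for illustration, so that the example runs through even though the second call to data.frame() fails.

```r
# rebuild the data frame from the example above
name <- c('John', 'Jack', 'Jill')
age <- c(29, 25, 27)
graduate <- c(TRUE, TRUE, FALSE)
students <- data.frame(name, age, graduate)

# a data frame is stored as a list with one element per column
is.list(students)
## [1] TRUE
typeof(students)
## [1] "list"
length(students)
## [1] 3

# columns whose lengths differ (3 names, 2 ages) are rejected with an error
result <- tryCatch(
  data.frame(name, age = c(29, 25)),
  error = function(e) "error"
)
result
## [1] "error"
```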
Basic Information
class(students)
## [1] "data.frame"
names(students)
## [1] "name" "age" "graduate"
colnames(students)
## [1] "name" "age" "graduate"
str(students)
## 'data.frame': 3 obs. of 3 variables:
## $ name : chr "John" "Jack" "Jill"
## $ age : num 29 25 27
## $ graduate: logi TRUE TRUE FALSE
dim(students)
## [1] 3 3
nrow(students)
## [1] 3
ncol(students)
## [1] 3
Select Columns
Columns can be selected using any of the following operators:
- []
- [[]]
- $
# using [
students[1]
## name
## 1 John
## 2 Jack
## 3 Jill
# using [[
students[[1]]
## [1] "John" "Jack" "Jill"
# using $
students$name
## [1] "John" "Jack" "Jill"
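The outputs above differ in shape: [ returns a data frame with one column, while [[ and $ return the column itself as a vector. Here is a short sketch that makes the difference explicit using the age column, and also shows that columns can be selected by name instead of position. The data frame is rebuilt so that the example is self-contained.

```r
students <- data.frame(
  name = c('John', 'Jack', 'Jill'),
  age = c(29, 25, 27),
  graduate = c(TRUE, TRUE, FALSE)
)

# [ keeps the data frame structure
class(students["age"])
## [1] "data.frame"

# [[ and $ extract the column as a vector
class(students[["age"]])
## [1] "numeric"

# selecting by name with [[ gives the same vector as $
identical(students[["age"]], students$age)
## [1] TRUE
```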
Multiple Columns
students[, 1:3]
## name age graduate
## 1 John 29 TRUE
## 2 Jack 25 TRUE
## 3 Jill 27 FALSE
students[, c(1, 3)]
## name graduate
## 1 John TRUE
## 2 Jack TRUE
## 3 Jill FALSE
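Column names work in place of positions here as well. One thing to watch out for, sketched below, is that selecting a single column with the [row, column] form simplifies the result to a vector unless drop = FALSE is supplied. The variable names by_name, as_vector and as_frame are made up for this example.

```r
students <- data.frame(
  name = c('John', 'Jack', 'Jill'),
  age = c(29, 25, 27),
  graduate = c(TRUE, TRUE, FALSE)
)

# select multiple columns by name
by_name <- students[, c("name", "graduate")]
names(by_name)
## [1] "name" "graduate"

# a single column is simplified to a vector by default
as_vector <- students[, 2]
as_vector
## [1] 29 25 27

# drop = FALSE keeps the result as a data frame
as_frame <- students[, 2, drop = FALSE]
class(as_frame)
## [1] "data.frame"
```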
Select Rows
# single row
students[1, ]
## name age graduate
## 1 John 29 TRUE
# multiple rows
students[c(1, 3), ]
## name age graduate
## 1 John 29 TRUE
## 3 Jill 27 FALSE
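Rows do not have to be selected by position. As a small sketch going slightly beyond the examples above, a logical vector or a negative index can be used in the row position as well. The names older and without_second are made up for this example.

```r
students <- data.frame(
  name = c('John', 'Jack', 'Jill'),
  age = c(29, 25, 27),
  graduate = c(TRUE, TRUE, FALSE)
)

# keep only the rows where the condition is TRUE
older <- students[students$age > 26, ]
older$age
## [1] 29 27

# a negative index drops the given row
without_second <- students[-2, ]
rownames(without_second)
## [1] "1" "3"
```

Note that the original row names (1 and 3) are kept in both results.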
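If you have observed carefully, the name column is of type character in the output of str(students) above. In versions of R before 4.0.0, it would have been coerced to type factor, because the stringsAsFactors argument of data.frame() was set to TRUE by default. Since R 4.0.0, the default is FALSE. If you do not want strings to be treated as factors regardless of the R version, set the argument to FALSE explicitly: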
students <- data.frame(name, age, graduate, stringsAsFactors = FALSE)
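To see what the argument does, here is a small sketch that creates the data frame both ways and compares the name column. The names as_factor and as_string are made up for this example.

```r
name <- c('John', 'Jack', 'Jill')
age <- c(29, 25, 27)
graduate <- c(TRUE, TRUE, FALSE)

# same data, with and without conversion of strings to factors
as_factor <- data.frame(name, age, graduate, stringsAsFactors = TRUE)
as_string <- data.frame(name, age, graduate, stringsAsFactors = FALSE)

class(as_factor$name)
## [1] "factor"
class(as_string$name)
## [1] "character"

# factor levels are the unique values, sorted alphabetically
levels(as_factor$name)
## [1] "Jack" "Jill" "John"
```

Only the character column is affected; age and graduate keep their types in both cases.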
We will learn about wrangling dataframes in a different post.