
Dataframes

Introduction

In the previous post, we learnt about lists. In this post, we will learn about dataframes. We will cover the following:

  • create dataframes
  • select columns
  • select rows
  • utility functions

Create Dataframes

Use data.frame to create dataframes. Below is the function syntax:

args(data.frame)
## function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, 
##     fix.empty.names = TRUE, stringsAsFactors = default.stringsAsFactors()) 
## NULL

Dataframes are essentially lists whose elements all have the same length. Because they are lists, they can be heterogeneous: each column can hold a different type of data. Let us create a dataframe:

name <- c('John', 'Jack', 'Jill')
age <- c(29, 25, 27)
graduate <- c(TRUE, TRUE, FALSE)
students <- data.frame(name, age, graduate)
students
##   name age graduate
## 1 John  29     TRUE
## 2 Jack  25     TRUE
## 3 Jill  27    FALSE
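
Because a dataframe is a list under the hood, the usual list functions work on it. The short sketch below rebuilds the students dataframe from above (passing stringsAsFactors = FALSE explicitly, so that it behaves the same on older versions of R) and checks both the list structure and the type of each column:

```r
name <- c('John', 'Jack', 'Jill')
age <- c(29, 25, 27)
graduate <- c(TRUE, TRUE, FALSE)
students <- data.frame(name, age, graduate, stringsAsFactors = FALSE)

# a dataframe is stored as a list of columns
is.list(students)
## [1] TRUE
typeof(students)
## [1] "list"

# the length of the list is the number of columns
length(students)
## [1] 3

# each column keeps its own type
class(students$name)
## [1] "character"
class(students$age)
## [1] "numeric"
class(students$graduate)
## [1] "logical"
```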

Basic Information

class(students)
## [1] "data.frame"
names(students)
## [1] "name"     "age"      "graduate"
colnames(students)
## [1] "name"     "age"      "graduate"
str(students)
## 'data.frame':    3 obs. of  3 variables:
##  $ name    : chr  "John" "Jack" "Jill"
##  $ age     : num  29 25 27
##  $ graduate: logi  TRUE TRUE FALSE
dim(students)
## [1] 3 3
nrow(students)
## [1] 3
ncol(students)
## [1] 3

Select Columns

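Columns can be selected in three ways:

  • [] returns the selection as a dataframe
  • [[]] returns the column as a vector
  • $ returns the column as a vector
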
# using [
students[1]
##   name
## 1 John
## 2 Jack
## 3 Jill

# using [[
students[[1]]
## [1] "John" "Jack" "Jill"

# using $
students$name
## [1] "John" "Jack" "Jill"
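
The three operators differ in what they return, which matters when the result is passed on to another function. As a small illustration, the sketch below rebuilds the same students dataframe and compares the class of each result, this time selecting the column by name rather than by position:

```r
students <- data.frame(
  name = c('John', 'Jack', 'Jill'),
  age = c(29, 25, 27),
  graduate = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# [ keeps the dataframe structure
class(students['age'])
## [1] "data.frame"

# [[ and $ extract the column itself
class(students[['age']])
## [1] "numeric"
class(students$age)
## [1] "numeric"

# [[ and $ return exactly the same vector
identical(students[['age']], students$age)
## [1] TRUE
```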

Multiple Columns

students[, 1:3]
##   name age graduate
## 1 John  29     TRUE
## 2 Jack  25     TRUE
## 3 Jill  27    FALSE

students[, c(1, 3)]
##   name graduate
## 1 John     TRUE
## 2 Jack     TRUE
## 3 Jill    FALSE
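
Columns can also be picked by name, which is usually more robust than positions if the column order changes. One thing to watch for: selecting a single column with the [, j] form simplifies the result to a vector unless drop = FALSE is passed. A minimal sketch using the same students data:

```r
students <- data.frame(
  name = c('John', 'Jack', 'Jill'),
  age = c(29, 25, 27),
  graduate = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# select multiple columns by name
students[, c('name', 'graduate')]
##   name graduate
## 1 John     TRUE
## 2 Jack     TRUE
## 3 Jill    FALSE

# a single column selected with [, j] is simplified to a vector
students[, 2]
## [1] 29 25 27

# drop = FALSE keeps the result as a dataframe
students[, 2, drop = FALSE]
##   age
## 1  29
## 2  25
## 3  27
```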

Select Rows

# single row
students[1, ]
##   name age graduate
## 1 John  29     TRUE

# multiple rows
students[c(1, 3), ]
##   name age graduate
## 1 John  29     TRUE
## 3 Jill  27    FALSE
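
Rows do not have to be selected by position. The row index can also be a logical vector, so a condition on a column can be used to pick the matching rows. This goes slightly beyond the examples above and is only a minimal sketch on the same students data:

```r
students <- data.frame(
  name = c('John', 'Jack', 'Jill'),
  age = c(29, 25, 27),
  graduate = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# rows where age is greater than 26
students[students$age > 26, ]
##   name age graduate
## 1 John  29     TRUE
## 3 Jill  27    FALSE

# rows where graduate is TRUE
students[students$graduate, ]
##   name age graduate
## 1 John  29     TRUE
## 2 Jack  25     TRUE
```

Note the trailing comma in both cases: it tells R that the condition applies to rows and that all columns should be kept.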

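Pay attention to the type of the name column. In versions of R before 4.0.0, the stringsAsFactors argument of data.frame defaulted to TRUE, so character columns such as name were coerced to factors. Since R 4.0.0 the default is FALSE, which is why the str() output above reports name as chr. If you are on an older version and do not want strings to be treated as factors, set the argument to FALSE explicitly: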

students <- data.frame(name, age, graduate, stringsAsFactors = FALSE)
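
To see what the argument does regardless of the R version in use, both settings can be passed explicitly and the resulting name column compared. The names with_factors and without_factors are used here only for illustration:

```r
name <- c('John', 'Jack', 'Jill')
age <- c(29, 25, 27)
graduate <- c(TRUE, TRUE, FALSE)

# strings converted to factors
with_factors <- data.frame(name, age, graduate, stringsAsFactors = TRUE)
class(with_factors$name)
## [1] "factor"
levels(with_factors$name)
## [1] "Jack" "Jill" "John"

# strings kept as character vectors
without_factors <- data.frame(name, age, graduate, stringsAsFactors = FALSE)
class(without_factors$name)
## [1] "character"
```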

We will learn about wrangling dataframes in a different post.