Introduction to tibbles

Introduction

A tibble, or tbl_df, is a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not. Tibbles are data.frames that are lazy and surly: they do less (i.e. they don’t change variable names or types, and don’t do partial matching) and complain more (e.g. when a variable does not exist). This forces you to confront problems earlier, typically leading to cleaner, more expressive code. Tibbles also have an enhanced print() method which makes them easier to use with large datasets containing complex objects.

Source: https://tibble.tidyverse.org/

In this post, we will explore tibbles. More precisely, we will learn:

  • how tibbles differ from data frames
  • how to create tibbles
  • how to manipulate tibbles

Libraries, Code & Data

We will use the tibble and dplyr packages:

library(tibble)
library(dplyr)

Creating tibbles

Tibbles can be created using any of the following:

  • tibble()
  • as_tibble()
  • tribble()

Let us start with tibble().

tibble(x = letters,
       y = 1:26,
       z = sample(100, 26))
## # A tibble: 26 x 3
##    x         y     z
##    <chr> <int> <int>
##  1 a         1    12
##  2 b         2    38
##  3 c         3    14
##  4 d         4    39
##  5 e         5    62
##  6 f         6    85
##  7 g         7     7
##  8 h         8    78
##  9 i         9    65
## 10 j        10    73
## # ... with 16 more rows

Each argument is a column name followed by its data. If you do not name a column, tibble() derives the name from the expression you supplied, such as the name of the variable. All columns must have the same length; only vectors of length 1 are recycled.
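The naming rule can be checked with a minimal sketch; the variables a and b below are illustrative only. Variables passed without a name keep their variable names, and any other unnamed expression becomes the column name verbatim.

```r
library(tibble)

a <- 1:3
b <- c("x", "y", "z")

# Unnamed arguments: names are taken from the variables supplied
t1 <- tibble(a, b)
names(t1)      # "a" "b"

# An unnamed expression becomes a (non-syntactic) column name
t2 <- tibble(1:3)
names(t2)      # "1:3"

# Explicit names always take precedence
t3 <- tibble(id = a, label = b)
names(t3)      # "id" "label"
```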

tibble features

  • never changes input’s types

tibble() never alters the type of its inputs. For example, a character vector is not converted to a factor. With data.frame(), by contrast, you had to set stringsAsFactors = FALSE to get this behavior in R versions before 4.0.0, where converting strings to factors was the default.

tibble(x = letters,
       y = 1:26,
       z = sample(100, 26))
## # A tibble: 26 x 3
##    x         y     z
##    <chr> <int> <int>
##  1 a         1    72
##  2 b         2    29
##  3 c         3    78
##  4 d         4    15
##  5 e         5    90
##  6 f         6     1
##  7 g         7    99
##  8 h         8    80
##  9 i         9    63
## 10 j        10    62
## # ... with 16 more rows
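The <chr> type above is the point to notice. For comparison, here is a small sketch that reproduces the old data.frame() default by setting stringsAsFactors = TRUE explicitly (since R 4.0.0 it is no longer the default):

```r
library(tibble)

# Old data.frame() default (before R 4.0.0): strings become factors
df <- data.frame(x = c("a", "b"), stringsAsFactors = TRUE)
class(df$x)   # "factor"

# tibble() has no such argument; character input stays character
tb <- tibble(x = c("a", "b"))
class(tb$x)   # "character"
```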
  • never adjusts variable names

tibble() never modifies column names. In the example below, data.frame() replaces the space with a . (it runs the names through make.names()), whereas tibble() keeps the column name as is.

names(data.frame(`order value` = 10))
## [1] "order.value"
names(tibble(`order value` = 10))
## [1] "order value"
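If you need data.frame() to keep such names too, its name checking can be switched off with check.names = FALSE. A short sketch:

```r
library(tibble)

# data.frame() repairs names via make.names() unless told not to
names(data.frame(`order value` = 10))                       # "order.value"
names(data.frame(`order value` = 10, check.names = FALSE))  # "order value"

# tibble() keeps the name exactly as written
names(tibble(`order value` = 10))                          # "order value"
```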
  • never prints all rows

A tibble will never clutter your console by printing all of its rows. When it has more than 20 rows, only the first 10 are shown, along with as many columns as fit the width of the console.

x <- 1:100
y <- letters[1]
z <- sample(c(TRUE, FALSE), 100, replace = TRUE)
tibble(x, y, z)
## # A tibble: 100 x 3
##        x y     z    
##    <int> <chr> <lgl>
##  1     1 a     TRUE 
##  2     2 a     FALSE
##  3     3 a     FALSE
##  4     4 a     TRUE 
##  5     5 a     FALSE
##  6     6 a     FALSE
##  7     7 a     TRUE 
##  8     8 a     TRUE 
##  9     9 a     FALSE
## 10    10 a     TRUE 
## # ... with 90 more rows
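When you do want to see more, print() accepts an n argument for the number of rows (n = Inf shows them all) and a width argument for the columns. A sketch on a fresh illustrative tibble:

```r
library(tibble)

tb <- tibble(x = 1:100, y = "a")

# Show the first 15 rows instead of the default 10
print(tb, n = 15)

# Ignore the console width and show every column
print(tb, width = Inf)
```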
  • never recycles vector of length greater than 1

Silently recycling vectors of length greater than 1 is a common source of bugs, so tibble() recycles only vectors of length 1.

x <- 1:100
y <- letters
z <- sample(c(TRUE, FALSE), 100, replace = TRUE)
tibble(x, y, z)
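
tibble() stops with an error here because y has length 26 while x and z have length 100. The exact wording of the error message depends on the version of tibble you have installed.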
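For contrast, data.frame() silently recycles a shorter vector whenever its length divides the number of rows evenly. The sketch below uses tryCatch() to capture tibble()'s error instead of stopping:

```r
library(tibble)

# data.frame() recycles y (length 2) to fill 4 rows without any warning
df <- data.frame(x = 1:4, y = 1:2)
df$y                       # 1 2 1 2

# tibble() refuses to recycle a length-2 vector
res <- tryCatch(tibble(x = 1:4, y = 1:2),
                error = function(e) "error")
res                        # "error"

# A length-1 vector is recycled as expected
tibble(x = 1:4, y = 0)
```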

Testing for Tibbles

We can test if an object is a tibble using is_tibble().

is_tibble(mtcars)
## [1] FALSE
is_tibble(as_tibble(mtcars))
## [1] TRUE
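Note that as_tibble() drops row names by default, which matters for mtcars because its car models are stored as row names. The rownames argument keeps them as an ordinary column instead; the column name model in this sketch is our own choice:

```r
library(tibble)

cars_tbl <- as_tibble(mtcars, rownames = "model")

is_tibble(cars_tbl)               # TRUE
inherits(cars_tbl, "data.frame")  # TRUE: a tibble is still a data frame
cars_tbl$model[1]                 # "Mazda RX4"
ncol(cars_tbl)                    # 12: the 11 original columns plus model
```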

Tribble

Another way to create tibbles is using tribble():

  • it is short for transposed tibble
  • it is designed for data entry in code, one row per line
  • column headings are formulas, so each name starts with ~
  • values are separated by commas

tribble(
  ~x, ~y, ~z,
  #--|--|----
  1, TRUE, 'a',
  2, FALSE, 'b'
)
## # A tibble: 2 x 3
##       x y     z    
##   <dbl> <lgl> <chr>
## 1     1 TRUE  a    
## 2     2 FALSE b
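Because tribble() is laid out row by row, it is simply a transposed way of writing what tibble() builds column by column. A quick check, with t_rows and t_cols as illustrative names:

```r
library(tibble)

# Row-wise layout
t_rows <- tribble(
  ~x, ~y,
  1,  "a",
  2,  "b"
)

# Column-wise layout of the same data
t_cols <- tibble(x = c(1, 2), y = c("a", "b"))

identical(t_rows$x, t_cols$x)  # TRUE
identical(t_rows$y, t_cols$y)  # TRUE
```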

Column Names

Column names in tibbles need not be valid R variable names. They can contain unusual characters such as a space or a smiley, but must then be enclosed in backticks.

tibble(
  ` ` = 'space',
  `2` = 'integer',
  `:)` = 'smiley'
)
## # A tibble: 1 x 3
##   ` `   `2`     `:)`  
##   <chr> <chr>   <chr> 
## 1 space integer smiley
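Such columns also need backticks when you refer to them later with $, while [[ ]] takes the name as an ordinary string. A sketch using the same data:

```r
library(tibble)

odd <- tibble(` ` = "space", `2` = "integer", `:)` = "smiley")

odd$`:)`      # "smiley"
odd[["2"]]    # "integer"
odd$` `       # "space"
```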

Add Rows

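Let us create a small data set of web traffic share by browser with enframe(), which converts a named vector into a tibble with two columns, name and value. We can then add data for the Safari browser using add_row().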

browsers <- enframe(c(chrome = 40, firefox = 20, edge = 30))
browsers
## # A tibble: 3 x 2
##   name    value
##   <chr>   <dbl>
## 1 chrome     40
## 2 firefox    20
## 3 edge       30
add_row(browsers, name = 'safari', value = 10)
## # A tibble: 4 x 2
##   name    value
##   <chr>   <dbl>
## 1 chrome     40
## 2 firefox    20
## 3 edge       30
## 4 safari     10

To insert the new row at a particular position, pass the row number to the .before argument. Let us insert the Safari row as the second row instead of the last one.

add_row(browsers, name = 'safari', value = 10, .before = 2)
## # A tibble: 4 x 2
##   name    value
##   <chr>   <dbl>
## 1 chrome     40
## 2 safari     10
## 3 firefox    20
## 4 edge       30
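add_row() also accepts .after, adds several rows at once when given vectors, and fills any column you leave out with NA. A sketch reusing browsers; the opera and brave values are made up for illustration:

```r
library(tibble)

browsers <- enframe(c(chrome = 40, firefox = 20, edge = 30))

# Insert after the first row
after_first <- add_row(browsers, name = "safari", value = 10, .after = 1)
after_first$name[2]             # "safari"

# Add two rows in one call
more <- add_row(browsers, name = c("opera", "brave"), value = c(5, 3))
nrow(more)                      # 5

# Omitted columns are filled with NA
partial <- add_row(browsers, name = "opera")
partial$value[4]                # NA
```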

Add Columns

add_column() adds one or more new columns to a tibble. Let us add the number of visits for each browser:

browsers <- enframe(c(chrome = 40, firefox = 20, edge = 30, safari = 10))
add_column(browsers, visits = c(4000, 2000, 3000, 1000))
## # A tibble: 4 x 3
##   name    value visits
##   <chr>   <dbl>  <dbl>
## 1 chrome     40   4000
## 2 firefox    20   2000
## 3 edge       30   3000
## 4 safari     10   1000
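Like add_row(), add_column() takes .before or .after, given as a column name or position, to control where the new column goes. For example, to place visits right after name:

```r
library(tibble)

browsers <- enframe(c(chrome = 40, firefox = 20, edge = 30, safari = 10))

with_visits <- add_column(browsers,
                          visits = c(4000, 2000, 3000, 1000),
                          .after = "name")
names(with_visits)   # "name" "visits" "value"
```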

Rownames

The tibble package provides a set of functions to deal with rownames. Unlike data frames, tibbles are not meant to carry rownames: as_tibble() drops them by default. To check whether a data set has rownames, use has_rownames().

has_rownames(mtcars)
## [1] TRUE
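has_rownames() returns FALSE both for a tibble and for a data frame that only has the automatic row numbers 1, 2, 3, ...; iris is such a data frame:

```r
library(tibble)

has_rownames(mtcars)             # TRUE: car models are rownames
has_rownames(as_tibble(mtcars))  # FALSE: as_tibble() drops them
has_rownames(iris)               # FALSE: only automatic row numbers
```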

Remove Rownames

remove_rownames() drops the rownames and replaces them with the default row numbers:

remove_rownames(mtcars)
##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## 7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## 8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## 9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## 10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## 11 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## 12 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## 13 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## 14 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## 15 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## 16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## 17 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## 18 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## 19 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## 20 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## 21 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## 22 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## 23 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## 24 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## 25 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## 26 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## 27 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## 28 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## 29 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## 30 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## 31 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## 32 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
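The result is still a plain data frame, only its rownames have been reset. A quick check:

```r
library(tibble)

no_names <- remove_rownames(mtcars)

class(no_names)          # "data.frame"
has_rownames(no_names)   # FALSE
rownames(no_names)[1:3]  # "1" "2" "3"
```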

Rownames to Column

rownames_to_column() moves the rownames into a new first column, named rowname by default:

head(rownames_to_column(mtcars))
##             rowname  mpg cyl disp  hp drat    wt  qsec vs am gear carb
## 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
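The new column can be given a different name through var. A sketch that uses model as the column name (our choice) and then converts the result to a tibble:

```r
library(tibble)

mt_models <- rownames_to_column(mtcars, var = "model")
names(mt_models)[1]        # "model"

# The rownames now live in a column, so nothing is lost by as_tibble()
mt_tbl <- as_tibble(mt_models)
mt_tbl$model[1:2]          # "Mazda RX4" "Mazda RX4 Wag"
```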

Column to Rownames

To turn a column back into rownames, use column_to_rownames(). By default it uses the column named rowname, which is the column rownames_to_column() creates:

mtcars_tbl <- rownames_to_column(mtcars)
column_to_rownames(mtcars_tbl)
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
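column_to_rownames() also accepts var, so a column with another name can be turned back into rownames. The round trip below (model is again an illustrative column name) restores the original rownames and columns:

```r
library(tibble)

with_model <- rownames_to_column(mtcars, var = "model")
restored <- column_to_rownames(with_model, var = "model")

identical(rownames(restored), rownames(mtcars))  # TRUE
identical(names(restored), names(mtcars))        # TRUE
```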

Glimpse

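Use glimpse() to get an overview of the data. It prints one line per column, showing the column's type and its first few values, so even wide data sets fit on the screen.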

glimpse(mtcars)
## Observations: 32
## Variables: 11
## $ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19....
## $ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, ...
## $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 1...
## $ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, ...
## $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.9...
## $ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3...
## $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 2...
## $ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, ...
## $ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, ...
## $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, ...
## $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, ...
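glimpse() returns its input invisibly, so it can sit in the middle of a pipeline to peek at the data without interrupting it. A sketch using the %>% pipe and filter() from dplyr:

```r
library(tibble)
library(dplyr)

# glimpse() prints the overview, then passes mtcars on unchanged
n_six <- mtcars %>%
  glimpse() %>%
  filter(cyl == 6) %>%
  nrow()
n_six   # 7
```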

Check Column

has_name() checks whether a data frame or tibble has a column with a given name.

has_name(mtcars, 'cyl')
## [1] TRUE
has_name(mtcars, 'gears')
## [1] FALSE
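has_name() is vectorized over the names, which is handy for checking several required columns at once. A sketch with an illustrative vector of required names:

```r
library(tibble)

required <- c("mpg", "cyl", "gears")
has_name(mtcars, required)              # TRUE TRUE FALSE

# Report the columns that are missing
required[!has_name(mtcars, required)]   # "gears"
```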

Summary

Creating tibbles

  • use tibble() to create tibbles
  • use as_tibble() to coerce other objects to tibble
  • use enframe() to convert a vector to a two-column tibble (name, value)
  • use tribble() to create a tibble row by row, for data entry in code

Modifying tibbles

  • use add_row() to add a new row
  • use add_column() to add a new column
  • use remove_rownames() to remove rownames from data
  • use rownames_to_column() to move rownames into a new first column
  • use column_to_rownames() to turn a column (rowname by default) into rownames

Testing tibbles

  • use is_tibble() to test if an object is a tibble
  • use has_rownames() to check whether a data set has rownames
  • use has_name() to check if tibble has a specific column
  • use glimpse() to get an overview of data