Quick Guide: R & SQLite

Twitter Facebook

Introduction

This is the first post in the series R & Databases. You can find the links to the other two posts of this series below:

In this post, we will learn to:

  • connect to a SQLite database from R
  • display database information
  • list tables in the database
  • query data
    • read entire table
    • read few rows
    • read data in batches
  • create table in database
  • overwrite table in database
  • append data to table in database
  • remove table from database
  • generate SQL query
  • close database connection

Libraries, Code & Data

We will use the following libraries in this post:

All the data sets used in this post can be found here and code can be downloaded from here.

Connection

The first step is to connect to a database. In this post, we will connect to an in memory SQLite databse using dbConnect().

con <- dbConnect(RSQLite::SQLite(), ":memory:")

Connection Summary

We can get the more information about the connection using summary().

summary(con)
##           Length            Class             Mode 
##                1 SQLiteConnection               S4

List Tables

Now that we are connected to a database, let us list all the tables present in it using dbListTables().

dbListTables(con)
## [1] "ecom"         "sqlite_stat1" "sqlite_stat4"

List Fields

Time to explore the ecom table in the database. Use dbListFields() to list all the fields in the table.

dbListFields(con, "ecom")
## [1] "referrer" "device"   "bouncers" "n_visit"  "n_pages"  "duration"

Querying Data

The main objectives of connecting to a database are to:

  • query data from the tables already present
  • create new tables
  • overwrite existing tables
  • delete existing tables

Let us begin with querying data. We can do this in the following ways:

  • read an entire table at once
  • read few rows from a table
  • read data in batches

Entire Table

We can read an entire table from a database using dbReadTable().

dbReadTable(con, 'ecom')
##    referrer device bouncers n_visit n_pages duration
## 1    google laptop     true      10       1      693
## 2     yahoo tablet     true       9       1      459
## 3    direct laptop     true       0       1      996
## 4      bing tablet    false       3      18      468
## 5     yahoo mobile     true       9       1      955
## 6     yahoo laptop    false       5       5      135
## 7     yahoo mobile     true      10       1       75
## 8    direct mobile     true      10       1      908
## 9      bing mobile    false       3      19      209
## 10   google mobile     true       6       1      208
## 11   direct laptop     true       9       1      738
## 12   direct tablet    false       6      12      132
## 13   direct mobile    false       9      14      406
## 14    yahoo tablet    false       5       8       80
## 15    yahoo mobile    false       7       1       19
## 16     bing laptop     true       1       1      995
## 17     bing tablet    false       5      16      368
## 18   google tablet     true       7       1      406
## 19   social tablet    false       7      10      290
## 20   social tablet    false       2       1       28

In some cases, we may not need the entire table but only a specific number of rows. Use dbGetQuery() and supply a SQL statement specifying the number of rows of data to be read from the table. In the below example, we read ten rows of data from the ecom table.

Few Rows

dbGetQuery(con, "select * from ecom limit 10")
##    referrer device bouncers n_visit n_pages duration
## 1    google laptop     true      10       1      693
## 2     yahoo tablet     true       9       1      459
## 3    direct laptop     true       0       1      996
## 4      bing tablet    false       3      18      468
## 5     yahoo mobile     true       9       1      955
## 6     yahoo laptop    false       5       5      135
## 7     yahoo mobile     true      10       1       75
## 8    direct mobile     true      10       1      908
## 9      bing mobile    false       3      19      209
## 10   google mobile     true       6       1      208

In case of very large table, we can read data in batches using dbSendQuery() and dbFetch(). We can mention the number of rows of data to be read while fetching the data using the query generated by dbGetQuery().

Read Data in Batches

query <- dbSendQuery(con, 'select * from ecom')
result <- dbFetch(query, n = 15)
result
##    referrer device bouncers n_visit n_pages duration
## 1    google laptop     true      10       1      693
## 2     yahoo tablet     true       9       1      459
## 3    direct laptop     true       0       1      996
## 4      bing tablet    false       3      18      468
## 5     yahoo mobile     true       9       1      955
## 6     yahoo laptop    false       5       5      135
## 7     yahoo mobile     true      10       1       75
## 8    direct mobile     true      10       1      908
## 9      bing mobile    false       3      19      209
## 10   google mobile     true       6       1      208
## 11   direct laptop     true       9       1      738
## 12   direct tablet    false       6      12      132
## 13   direct mobile    false       9      14      406
## 14    yahoo tablet    false       5       8       80
## 15    yahoo mobile    false       7       1       19

Query

Query Status

To know the status of a query, use dbHasCompleted(). It is very useful in cases of queries that might take a long time to complete.

dbHasCompleted(query)
## [1] FALSE

Query Info

dbGetInfo() will return the following:

  • the sql staement
  • number of rows fetched
  • number of rows modified/affected
  • status of the query
dbGetInfo(query)
## $statement
## [1] "select * from ecom"
## 
## $row.count
## [1] 15
## 
## $rows.affected
## [1] 0
## 
## $has.completed
## [1] FALSE

Latest Query

To get the latest query, use dbGetStatement().

dbGetStatement(query)
## [1] "select * from ecom"

Rows Fetched

To check the number of rows of data returned by a query, use dbGetRowCount().

dbGetRowCount(query)
## [1] 15

Rows Affected

To know the number of rows modified or affected in the table, use dbGetRowsAffected().

dbGetRowsAffected(query)
## [1] 0

Column Info

To know the name of the columns and their data types, use dbColumnInfo().

dbColumnInfo(query)
##       name      type
## 1 referrer character
## 2   device character
## 3 bouncers character
## 4  n_visit   integer
## 5  n_pages    double
## 6 duration    double

Create Table

So far we have explored querying data from an existing table. Now, let us turn our attention to creating new tables in the database.

Introduction

To create a new table, use dbWriteTable(). It takes the following 3 arguments:

  • connection name
  • name of the new table
  • data for the new table
x <- 1:10
y <- letters[1:10]
trial <- tibble::tibble(x, y)
dbWriteTable(con, "trial", trial)
## Warning: Closing open result set, pending rows

Let us check if the new table has been created.

dbListTables(con)
## [1] "ecom"         "sqlite_stat1" "sqlite_stat4" "trial"
dbExistsTable(con, "trial")
## [1] TRUE

Let us query data from the new table.

dbGetQuery(con, "select * from trial limit 5")
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e

Overwrite Table

In some cases, you may want to overwrite the data in an existing table. Use the overwrite argument in dbWriteTable() and set it to TRUE.

x <- sample(100, 10)
y <- letters[11:20]
trial2 <- tibble::tibble(x, y)
dbWriteTable(con, "trial", trial2, overwrite = TRUE)

Let us see if the trial table has been overwritten.

dbGetQuery(con, "select * from trial limit 5")
##    x y
## 1  1 k
## 2 54 l
## 3  6 m
## 4 33 n
## 5 20 o

Append Data

You can append data to an existing table by setting the append argument in dbWriteTable() to TRUE.

x <- sample(100, 10)
y <- letters[5:14]
trial3 <- tibble::tibble(x, y)
dbWriteTable(con, "trial", trial3, append = TRUE)

Let us quickly check if the new data has been appended to the trial table.

dbReadTable(con, "trial")
##     x y
## 1   1 k
## 2  54 l
## 3   6 m
## 4  33 n
## 5  20 o
## 6  17 p
## 7  93 q
## 8   5 r
## 9  57 s
## 10  4 t
## 11 57 e
## 12 39 f
## 13 98 g
## 14 89 h
## 15 73 i
## 16 99 j
## 17 91 k
## 18  1 l
## 19  9 m
## 20 41 n

We can also use sqlAppendTable() to append data to an existing table.

sqlAppendTable(con, "ecom", head(ecom))
## <SQL> INSERT INTO `ecom`
##   (`referrer`, `device`, `bouncers`, `n_visit`, `n_pages`, `duration`)
## VALUES
##   ('google', 'laptop', 'true', 10, 1, 693),
##   ('yahoo', 'tablet', 'true', 9, 1, 459),
##   ('direct', 'laptop', 'true', 0, 1, 996),
##   ('bing', 'tablet', 'false', 3, 18, 468),
##   ('yahoo', 'mobile', 'true', 9, 1, 955),
##   ('yahoo', 'laptop', 'false', 5, 5, 135)

Insert Rows

Introduction

We can insert new rows into existing tables using:

  • dbExecute()
  • dbSendStatement()

Both the function take 2 arguments:

  • connection name
  • sql statement
# use dbExecute
dbExecute(con,
  "INSERT into trial (x, y) VALUES (32, 'c'), (45, 'k'), (61, 'h')"
)
## [1] 3

# use dbSendStatement
dbSendStatement(con,
  "INSERT into trial (x, y) VALUES (25, 'm'), (54, 'l'), (16, 'y')"
)
## <SQLiteResult>
##   SQL  INSERT into trial (x, y) VALUES (25, 'm'), (54, 'l'), (16, 'y')
##   ROWS Fetched: 0 [complete]
##        Changed: 3

Remove Table

If you want to delete/remove a table from the database, use dbRemoveTable().

dbRemoveTable(con, "trial")
## Warning: Closing open result set, pending rows

SQLite Data Type

If you want to know the data type, use dbDataType().

dbDataType(RSQLite::SQLite(), "a")
## [1] "TEXT"
dbDataType(RSQLite::SQLite(), 1:5)
## [1] "INTEGER"
dbDataType(RSQLite::SQLite(), 1.5)
## [1] "REAL"

Close Connection

It is a good practice to close connection to a database when you no longer need to read/write data from/to it. Use dbDisconnect() to close the database connection.

dbDisconnect(con)

Up Next..

In the next post, we will learn about data wrangling using dbplyr.