Rsquared Academy Blog

Explore..Discover..Learn

SQL for Data Science - Part 2

Introduction: This is the fourth post in the series R & Databases. You can find the links to the other three posts of this series below: Quick Guide: R & SQLite; Data Wrangling with dbplyr; SQL for Data Science - Part 1. In this post, we will learn to aggregate, order, and group data. Libraries, Code & Data: We will use the following libraries in this post:

SQL for Data Science - Part 1

Introduction: This is the third post in the series R & Databases. You can find the links to the other three posts of this series below: Quick Guide: R & SQLite; Data Wrangling with dbplyr; SQL for Data Science - Part 2. In this post, we will learn to select a single column, multiple columns, and distinct values in a column; limit the number of records returned; handle NULL values; and filter columns using the following operators: WHERE; AND, OR & NOT; BETWEEN; IN; LIKE. Libraries, Code & Data: We will use the following libraries in this post:

Data Wrangling with dbplyr

Introduction: This is the second post in the series R & Databases. You can find the link to the first post of this series below: Quick Guide: R & SQLite. In this post, we will learn to query data from a database using dplyr. Libraries, Code & Data: We will use the following libraries in this post: DBI, RSQLite, dbplyr, and dplyr. All the data sets used in this post can be found here and the code can be downloaded from here.
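As a rough illustration of querying a database with dplyr verbs, here is a minimal sketch. It assumes an in-memory SQLite database and the built-in `mtcars` data set, neither of which comes from the post itself; the table name `"mtcars"` is chosen only for this example.

```r
library(DBI)
library(dplyr)   # dbplyr must be installed; dplyr uses it behind the scenes

# Assumption: a throwaway in-memory database stands in for a real one
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# tbl() creates a lazy reference; nothing is read until collect()
cars_db <- tbl(con, "mtcars")

four_cyl <- cars_db %>%
  filter(cyl == 4) %>%   # translated to a SQL WHERE clause
  select(mpg, cyl) %>%   # translated to the SQL column list
  collect()              # run the query and return a tibble

dbDisconnect(con)
```

The filtering and column selection run inside the database; only the final result is pulled into R.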

Quick Guide: R & SQLite

Introduction: This is the first post in the series R & Databases. You can find the links to the other three posts of this series below: Data Wrangling with dbplyr; SQL for Data Science - Part 1; SQL for Data Science - Part 2. In this post, we will learn to: connect to a SQLite database from R, display database information, list tables in the database, query data, read an entire table, read a few rows, read data in batches, create a table in the database, overwrite a table in the database, append data to a table in the database, remove a table from the database, generate a SQL query, and close the database connection. Libraries, Code & Data: We will use the following libraries in this post:

Categorical Data Analysis using forcats

Introduction: In this post, we will learn to work with categorical/qualitative data in R using forcats. Let us begin by installing and loading forcats and a set of other packages we will be using. Libraries & Code: We will use the following packages: forcats, dplyr, magrittr, ggplot2, tibble, purrr, and readr. The code can be downloaded from here.

library(forcats)
library(tibble)
library(magrittr)
library(purrr)
library(dplyr)
library(ggplot2)
library(readr)

Case Study: We will use a case study to explore the various features of the forcats package.

Working with Date and Time in R

Introduction: In this post, we will learn to work with date/time data in R using lubridate, an R package that makes it easy to work with dates and times. Let us begin by installing and loading the package. Libraries, Code & Data: We will use the following packages: lubridate, dplyr, magrittr, and readr. The data sets can be downloaded from here and the code from here.

library(lubridate)
library(dplyr)
library(magrittr)
library(readr)

Quick Intro - Origin: Let us look at the origin for the numbering system used for date and time calculations in R.

Hacking strings with stringr

Introduction: In this post, we will learn to work with string data in R using stringr. As we did in the other posts, we will use a case study to explore the various features of the stringr package. Let us begin by installing and loading stringr and a set of other packages we will be using. Libraries, Code & Data: We will use the following libraries: stringr, dplyr, magrittr, tibble, purrr, and readr. The data sets can be downloaded from here and the code from here.

Readable Code with Pipes

Introduction: R code contains a lot of parentheses when several operations are applied in sequence. In complex code, this results in nested function calls that are hard to read and maintain. The magrittr package by Stefan Milton Bache provides pipes that enable us to write readable R code. Pipes allow us to clearly express a sequence of multiple operations by: structuring operations from left to right; avoiding nested function calls, intermediate steps, and overwriting of original data; and minimizing the creation of local variables. Pipes: If you are using the tidyverse, magrittr will be automatically loaded.
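To make the contrast concrete, here is a minimal sketch comparing a nested call with its piped equivalent. The vector `x` and the chosen functions are illustrative assumptions, not taken from the post.

```r
library(magrittr)

x <- c(4, 9, 16)  # hypothetical input, chosen only for illustration

# Nested form: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped form: read from left to right, one step at a time
piped <- x %>%
  sqrt() %>%
  mean() %>%
  round(1)

identical(nested, piped)
```

Both forms compute the same value; the piped version simply lists the steps in the order they are applied, with no intermediate variables.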

Introduction to tibbles

Introduction A tibble, or tbl_df, is a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not. Tibbles are data.frames that are lazy and surly: they do less (i.e. they don’t change variable names or types, and don’t do partial matching) and complain more (e.g. when a variable does not exist). This forces you to confront problems earlier, typically leading to cleaner, more expressive code.
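The "do less, complain more" behaviour can be seen in a small sketch like the one below. The column names and values are made up for illustration.

```r
library(tibble)

# Hypothetical one-row data; the column name contains a space on purpose
df <- data.frame(`first name` = "Ada", score = 1)
tb <- tibble(`first name` = "Ada", score = 1)

# data.frame() rewrites the non-syntactic name; tibble() keeps it as given
df_names <- names(df)
tb_names <- names(tb)

# data.frame allows partial matching with $; a tibble does not,
# returning NULL with a warning about an unknown column
df_partial <- df$sc
tb_partial <- suppressWarnings(tb$sc)
```

Here `df_names` holds the altered name `first.name`, while `tb_names` keeps `first name`; `df$sc` silently resolves to the `score` column, whereas the tibble refuses the partial match.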

Data Wrangling with dplyr - Part 3

Introduction: In the previous post, we learnt to combine tables using dplyr. In this post, we will explore a set of helper functions in order to: extract unique rows, rename columns, sample data, extract columns, slice rows, arrange rows, compare tables, extract/mutate data using predicate functions, and count observations for different levels of a variable. Libraries, Code & Data: We will use the following packages: dplyr and readr. The data sets can be downloaded from here and the code from here.