Data Wrangling

Categorical Data Analysis using forcats

Introduction
In this post, we will learn to work with categorical/qualitative data in R using forcats. Let us begin by installing and loading forcats and the other packages we will be using.
Libraries & Code
We will use the following packages: forcats, dplyr, magrittr, ggplot2, tibble, purrr, and readr. The code can be downloaded from here.
library(forcats)
library(tibble)
library(magrittr)
library(purrr)
library(dplyr)
library(ggplot2)
library(readr)
Case Study
We will use a case study to explore the various features of the forcats package.

Working with Date and Time in R

Introduction
In this post, we will learn to work with date/time data in R using lubridate, an R package that makes it easy to work with dates and times. Let us begin by installing and loading the package.
Libraries, Code & Data
We will use the following packages: lubridate, dplyr, magrittr, and readr. The data sets can be downloaded from here and the code from here.
library(lubridate)
library(dplyr)
library(magrittr)
library(readr)
Quick Intro
Origin
Let us look at the origin for the numbering system used for date and time calculations in R.
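As a rough illustration of what the origin means in practice, the sketch below shows that R stores dates as days, and date-times as seconds, counted from 1970-01-01 UTC. The specific dates and the variable names (days_since, secs_since) are chosen here for illustration and are not taken from the post.

```r
library(lubridate)

# lubridate exports `origin`, a POSIXct object for 1970-01-01 UTC.
origin

# The origin itself is stored as zero seconds.
origin_secs <- as.numeric(origin)

# A Date is stored as the number of days since the origin.
days_since <- as.numeric(as.Date("1970-01-02"))

# A date-time is stored as the number of seconds since the origin
# (ymd_hms() parses in UTC unless told otherwise).
secs_since <- as.numeric(ymd_hms("1970-01-01 00:01:00"))
```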

Introduction to tibbles

Introduction
A tibble, or tbl_df, is a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not. Tibbles are data.frames that are lazy and surly: they do less (i.e. they don’t change variable names or types, and don’t do partial matching) and complain more (e.g. when a variable does not exist). This forces you to confront problems earlier, typically leading to cleaner, more expressive code.
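A small sketch of two of these differences, using made-up column names (order value, revenue) that are assumptions for illustration only: a data.frame rewrites a non-syntactic name and allows partial matching with $, while a tibble keeps the name as given and returns NULL with a warning for a column that does not exist.

```r
library(tibble)

# Variable names: data.frame() rewrites non-syntactic names by default,
# tibble() keeps them exactly as written.
df <- data.frame(`order value` = c(10, 20))
tb <- tibble(`order value` = c(10, 20))
df_names <- names(df)   # name is rewritten with a dot
tb_names <- names(tb)   # name is kept with the space

# Partial matching: `$` on a data.frame matches `rev` to `revenue`,
# whereas a tibble warns about an unknown column and returns NULL.
df2 <- data.frame(revenue = c(1, 2))
tb2 <- tibble(revenue = c(1, 2))
partial_df <- df2$rev
partial_tb <- tb2$rev
```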

Data Wrangling with dplyr - Part 3

Introduction
In the previous post, we learnt to combine tables using dplyr. In this post, we will explore a set of helper functions in order to: extract unique rows, rename columns, sample data, extract columns, slice rows, arrange rows, compare tables, extract/mutate data using predicate functions, and count observations for different levels of a variable.
Libraries, Code & Data
We will use the following packages: dplyr and readr. The data sets can be downloaded from here and the code from here.
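A few of the helpers listed above can be sketched as follows. The orders table and its columns are hypothetical toy data, not the data set used in the post, and select(where(...)) assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data with one duplicated row.
orders <- tibble(
  id     = c(1, 2, 2, 3),
  device = c("mobile", "laptop", "laptop", "mobile"),
  amount = c(20, 35, 35, 50)
)

# Extract unique rows.
unique_orders <- distinct(orders)

# Rename a column (new_name = old_name).
renamed <- rename(unique_orders, order_id = id)

# Arrange rows by amount, largest first, then slice the first two.
top_two <- unique_orders %>%
  arrange(desc(amount)) %>%
  slice(1:2)

# Extract a single column as a vector.
amounts <- pull(unique_orders, amount)

# Keep only the columns that satisfy a predicate function.
numeric_only <- select(unique_orders, where(is.numeric))

# Count observations for each level of a variable.
by_device <- count(unique_orders, device)
```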

Data Wrangling with dplyr - Part 2

Introduction
In the previous post, we learnt about dplyr verbs and used them to compute the average order value for an online retail company. In this post, we will learn to combine tables using the different *_join functions provided in dplyr.
Libraries, Code & Data
We will use the following packages: dplyr and readr. The data sets can be downloaded from here and the code from here.
library(dplyr)
library(readr)
options(tibble.
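To give a feel for how the *_join functions differ, here is a minimal sketch on two hypothetical tables, orders and customers, that share an id column. The tables and their values are invented for illustration and are not the post's data.

```r
library(dplyr)

# Hypothetical tables sharing the key column `id`.
orders    <- tibble(id = c(1, 2, 3), amount = c(20, 35, 50))
customers <- tibble(id = c(1, 2, 4), city = c("Pune", "Oslo", "Lima"))

# Keep only ids present in both tables.
inner <- inner_join(orders, customers, by = "id")

# Keep every order; city is NA where no customer matches.
left <- left_join(orders, customers, by = "id")

# Keep every id from either table.
full <- full_join(orders, customers, by = "id")

# Keep orders that have no matching customer.
unmatched <- anti_join(orders, customers, by = "id")
```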

Data Wrangling with dplyr - Part 1

Introduction
According to a survey by CrowdFlower, data scientists spend most of their time cleaning and manipulating data rather than mining or modeling it for insights. As such, it becomes important to have tools that make data manipulation faster and easier. In today’s post, we introduce you to dplyr, a grammar of data manipulation.
Libraries, Code & Data
We will use the following libraries: dplyr and readr. The data sets can be downloaded from here and the code from here.