Introduction
This is the fourth post in the series Elegant Data Visualization with
ggplot2. In the previous post, we learnt about geoms and how we can use them
to build different plots. In this post, we will focus on aesthetics, i.e.
color, shape, size, alpha, line type, line width etc. We can either map these
to variables or set them to fixed values. To map an aesthetic to a variable,
we must specify it within the aes() function; to set a fixed value, we specify
it outside aes(). We will look at both methods in the following sections.
Explore aesthetics such as
- color
- shape
- size
- fill
- alpha
- width
Libraries, Code & Data
We will use the following libraries in this post:
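Going by the functions called in the rest of this post (ggplot(), select(), gather() and read_csv()), the packages below should cover everything. Treat the list as an assumption based on that code rather than an official requirement list.

```r
# Packages assumed from the functions used later in the post
library(ggplot2)  # ggplot(), geoms and aes()
library(dplyr)    # select() and the %>% pipe
library(tidyr)    # gather()
library(readr)    # read_csv()
```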
All the data sets used in this post can be found here and code can be downloaded from here.
Data
ecom <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/web.csv')
ecom
## # A tibble: 1,000 x 11
## id referrer device bouncers n_visit n_pages duration country purchase
## <dbl> <chr> <chr> <lgl> <dbl> <dbl> <dbl> <chr> <lgl>
## 1 1 google laptop TRUE 10 1 693 Czech ~ FALSE
## 2 2 yahoo tablet TRUE 9 1 459 Yemen FALSE
## 3 3 direct laptop TRUE 0 1 996 Brazil FALSE
## 4 4 bing tablet FALSE 3 18 468 China TRUE
## 5 5 yahoo mobile TRUE 9 1 955 Poland FALSE
## 6 6 yahoo laptop FALSE 5 5 135 South ~ FALSE
## 7 7 yahoo mobile TRUE 10 1 75 Bangla~ FALSE
## 8 8 direct mobile TRUE 10 1 908 Indone~ FALSE
## 9 9 bing mobile FALSE 3 19 209 Nether~ FALSE
## 10 10 google mobile TRUE 6 1 208 Czech ~ FALSE
## # ... with 990 more rows, and 2 more variables: order_items <dbl>,
## # order_value <dbl>
Data Dictionary
- id: row id
- referrer: referrer website/search engine
- device: device used to visit the website
- bouncers: whether the visit bounced
- n_visit: frequency of visits
- n_pages: number of pages visited
- duration: time spent on the website (in seconds)
- country: country of origin
- purchase: whether visitor purchased
- order_items: number of items ordered
- order_value: order value of visitor (in dollars)
Color
In ggplot2, when we mention color or colour, it usually refers to the color
of the geoms. The fill argument specifies the interior color of geoms that
have an area, such as bars, boxes and certain point shapes. In this first
section, we will see how we can specify the color for the different geoms we
learnt in the previous post.
Point
For points, the color argument specifies the color of the point for certain
shapes and the border for others. The fill argument specifies the background
for some shapes and has no effect on the others. Let us look at an example:
ggplot(mtcars, aes(x = disp, y = mpg, color = factor(cyl))) +
geom_point()
We can map the variable to color in the geom_point() function as well since
it inherits the data from the ggplot() function.
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point(aes(color = factor(cyl)))
If you do not want to map a variable to color, you can specify a value using
the color argument, but in this case it should be outside the aes() function.
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point(color = 'blue')
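A common slip is to put the fixed value inside aes(). The sketch below contrasts the two cases. The object names p_set and p_mapped are only for illustration, and layer_data() is used here just to inspect the colors ggplot2 ends up drawing.

```r
library(ggplot2)

# Setting: the constant sits outside aes(), so every point is drawn in blue.
p_set <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(color = 'blue')

# Mapping by mistake: inside aes(), 'blue' is treated as a one-level
# categorical variable, so ggplot2 picks a color from its default palette
# and adds a legend instead of drawing blue points.
p_mapped <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(aes(color = 'blue'))

# Inspect the colors actually used by each plot
unique(layer_data(p_set)$colour)
unique(layer_data(p_mapped)$colour)
```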
Now we will change the shape of the points to understand the difference between
the color and fill arguments. Like color, shape can either be mapped to a
variable or set to a value. Let us first map shape to a variable.
ggplot(mtcars, aes(x = disp, y = mpg, shape = factor(cyl))) +
geom_point()
Let us map shape to cyl in the geom_point() function. Remember, when you
are mapping an aesthetic to a variable, it must be inside aes().
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point(aes(shape = factor(cyl)))
Instead of mapping shape to a variable, let us specify a value for shape. In
this case, shape is not wrapped inside aes() as we are not mapping it to
a variable.
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point(shape = 5)
Let us specify a color for the point using the color argument.
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point(shape = 5, color = 'blue')
Background color cannot be added for all shapes. In the below example, we try
to modify the background color using the fill argument but it does not work.
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point(shape = 5, fill = 'blue')
Only shapes 21 to 25 have a fillable interior. Since the shape number is now
in that range, the fill argument will add a background color in the below case.
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point(shape = 22, fill = 'blue')
For shapes 21 to 25, the color argument will modify the border of the shape.
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point(shape = 22, color = 'blue')
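To see at a glance which shapes respond to fill, the sketch below draws all 26 point shapes (0 to 25) with both a border color and a fill. The data frame and its grid layout are made up for illustration, and scale_shape_identity() is used so the numbers in the shape column are taken as shape codes directly.

```r
library(ggplot2)

# Illustrative data: one row per shape code, laid out on a 7-column grid
shapes <- data.frame(
  shape = 0:25,
  x = (0:25) %% 7,
  y = -((0:25) %/% 7)
)

p_shapes <- ggplot(shapes, aes(x, y)) +
  # color sets the outline (or the whole point for shapes 0 to 20);
  # fill only shows up for shapes 21 to 25
  geom_point(aes(shape = shape), size = 5, color = 'blue', fill = 'yellow') +
  geom_text(aes(label = shape), nudge_y = -0.3) +
  scale_shape_identity() +
  theme_void()

p_shapes
```

In the resulting plot, only the last five shapes should show the yellow interior.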
Let us map size of points to a variable. It is advised to map size only to continuous variables and not categorical variables.
ggplot(mtcars, aes(x = disp, y = mpg, size = disp)) +
geom_point()
If you map size to categorical variables, ggplot2 will throw a warning.
To give all points the same size, specify a value for size outside aes().
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point(size = 4)
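To modify the opacity of the color, use the alpha argument. In the below
example, alpha is mapped to a categorical variable, which works but triggers
a warning.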
ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point(aes(alpha = factor(cyl)), color = 'blue')
## Warning: Using alpha for a discrete variable is not advised.
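When the goal is simply to make overlapping points easier to see, alpha can also be set to a fixed value outside aes(), which avoids the warning. A minimal sketch, where the values 0.4 and 4 are arbitrary choices:

```r
library(ggplot2)

# alpha ranges from 0 (fully transparent) to 1 (fully opaque)
p_alpha <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(color = 'blue', size = 4, alpha = 0.4)

p_alpha
```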
So far we have focussed on geom_point() to learn how to map aesthetics to
variables. To explore line type and line width, we will use geom_line(). In
the previous post, we used geom_line() to build line charts. Now we will
modify the appearance of the line. In the section below, we will specify values
for color, line type and width. In the next section, we will map the same to
variables in the data. We will use a new data set. You can download it from
here. It contains GDP (Gross Domestic Product) growth data for the BRICS
countries (Brazil, Russia, India, China, South Africa) for the years 2000 to
2005.
Data
gdp <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/gdp.csv')
## Warning: Missing column names filled in: 'X1' [1]
A line chart can be created using geom_line(). In the below example, we
examine the GDP trend of India and modify the color of the line to 'blue'.
ggplot(gdp, aes(year, india)) +
geom_line(color = 'blue')
To modify the line type, use the linetype argument. It can take integer
values from 0 (blank) to 6 (twodash).
ggplot(gdp, aes(year, india)) +
geom_line(linetype = 2)
The line type can also be mentioned in the following way:
ggplot(gdp, aes(year, india)) +
geom_line(linetype = 'dashed')
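The integer codes and the names refer to the same set of line types. The sketch below draws one segment per type so they can be compared side by side. The lty data frame is made up for illustration, and scale_linetype_identity() makes ggplot2 use the names in the column as they are.

```r
library(ggplot2)

# Integer codes and their equivalent names
lty <- data.frame(
  code = 0:6,
  name = c('blank', 'solid', 'dashed', 'dotted',
           'dotdash', 'longdash', 'twodash')
)

p_lty <- ggplot(lty) +
  # one horizontal segment per line type; 'blank' draws nothing
  geom_segment(aes(x = 0, xend = 1, y = code, yend = code, linetype = name)) +
  scale_linetype_identity() +
  scale_y_continuous(breaks = lty$code, labels = paste(lty$code, lty$name)) +
  labs(x = NULL, y = NULL)

p_lty
```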
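The width of the line can be modified using the size argument. From ggplot2
3.4.0 onwards, the linewidth argument is preferred for lines and using size
for this purpose is deprecated.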
ggplot(gdp, aes(year, india)) +
geom_line(size = 2)
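Since the argument name depends on the installed version, here is a sketch that picks linewidth on ggplot2 3.4.0 or later and falls back to size otherwise. It uses the economics data set that ships with ggplot2 instead of the GDP data so that it runs without a download; the object names are only for illustration.

```r
library(ggplot2)

base <- ggplot(economics, aes(date, unemploy))

# linewidth was introduced in ggplot2 3.4.0; older versions use size
p_wide <- if (utils::packageVersion('ggplot2') >= '3.4.0') {
  base + geom_line(linewidth = 2)
} else {
  base + geom_line(size = 2)
}

p_wide
```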
Now let us map the aesthetics to variables. The data used in the above
example cannot be used as is, since we need a single variable that holds the
country names. We will use the gather() function from the tidyr package to
reshape the data from wide to long format.
gdp2 <-
gdp %>%
select(year, growth, india, china) %>%
gather(key = country, value = gdp, -year)
gdp2
## # A tibble: 18 x 3
## year country gdp
## <date> <chr> <dbl>
## 1 2000-01-01 growth 6
## 2 2001-01-01 growth 9
## 3 2002-01-01 growth 8
## 4 2003-01-01 growth 9
## 5 2004-01-01 growth 9
## 6 2005-01-01 growth 8
## 7 2000-01-01 india 5
## 8 2001-01-01 india 9
## 9 2002-01-01 india 8
## 10 2003-01-01 india 8
## 11 2004-01-01 india 5
## 12 2005-01-01 india 7
## 13 2000-01-01 china 8
## 14 2001-01-01 china 5
## 15 2002-01-01 china 6
## 16 2003-01-01 china 8
## 17 2004-01-01 china 9
## 18 2005-01-01 china 8
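In recent versions of tidyr, gather() is superseded by pivot_longer(). The sketch below shows the equivalent reshape on a small toy data frame. The name gdp_wide and its plain integer years are made up for illustration; the values are copied from the first three years of the output above.

```r
library(tidyr)

# Toy wide data: one column per country
gdp_wide <- data.frame(
  year  = 2000:2002,
  india = c(5, 9, 8),
  china = c(8, 5, 6)
)

# Every column except year becomes a (country, gdp) pair
gdp_long <- pivot_longer(
  gdp_wide,
  cols = -year,
  names_to = 'country',
  values_to = 'gdp'
)

gdp_long
```

Note that pivot_longer() orders the rows by year first, whereas gather() stacks one country after another. The plot is unaffected by this difference.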
To draw a separate line for each country, we use the group argument. In the
below example, we map group to country. This plots the GDP trend of all
countries in one go, but we cannot distinguish between the lines as their
color, width and line type are the same. Now, let us ensure that we can
distinguish and identify them using different aesthetics.
ggplot(gdp2, aes(year, gdp, group = country)) +
geom_line()
Let us begin by ensuring that the lines have different colors using the
color argument within aes() and assigning it the variable country.
ggplot(gdp2, aes(year, gdp, group = country)) +
geom_line(aes(color = country))
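When color, linetype or another aesthetic is mapped to a categorical variable, ggplot2 also uses that variable to form groups, so the explicit group mapping can usually be left out. A minimal sketch on toy data (toy_gdp is a made-up data frame, not the gdp2 data above):

```r
library(ggplot2)

# Toy long-format data with two countries
toy_gdp <- data.frame(
  year    = rep(2000:2002, times = 2),
  country = rep(c('india', 'china'), each = 3),
  gdp     = c(5, 9, 8, 8, 5, 6)
)

# No group argument: mapping color to country is enough to get two lines
p_grouped <- ggplot(toy_gdp, aes(year, gdp)) +
  geom_line(aes(color = country))

p_grouped
```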
Instead of color, now we modify the line type using the linetype argument.
ggplot(gdp2, aes(year, gdp, group = country)) +
geom_line(aes(linetype = country))
In the below instance, we assign different widths to the lines using the size
argument. Since country is a categorical variable, ggplot2 throws a warning.
ggplot(gdp2, aes(year, gdp, group = country)) +
geom_line(aes(size = country))
## Warning: Using size for a discrete variable is not advised.
Before we wrap up, let us quickly see how we can map aesthetics to variables for different plots.
Bar Plots
Here we create a stacked bar plot by mapping fill to purchase.
ggplot(ecom, aes(device, fill = purchase)) +
geom_bar()
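Stacking is only the default arrangement. As a sketch of an alternative, the position argument of geom_bar() can place the bars side by side. The example uses mtcars so that it runs without downloading the ecom data; the choice of cyl and am is arbitrary.

```r
library(ggplot2)

# fill is mapped inside aes(); position = 'dodge' places the bars
# for each am value next to each other instead of stacking them
p_dodge <- ggplot(mtcars, aes(factor(cyl), fill = factor(am))) +
  geom_bar(position = 'dodge')

p_dodge
```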
Histograms
Instead of a bar chart, we create a histogram and again map fill to purchase.
ggplot(ecom) +
geom_histogram(aes(duration, fill = purchase), bins = 10)
Box Plots
We repeat the same exercise below, but replace the bar plot with a box plot.
ggplot(ecom) +
geom_boxplot(aes(device, duration, fill = purchase))
In all the above cases, you can observe that when we are mapping aesthetics
such as color, fill, shape, size or linetype to variables, they are all wrapped
inside aes().
Summary
In this post, we learnt about aesthetics i.e. how to modify the properties of geoms such as:
- color
- shape
- size
- fill
- alpha
- width
Up Next..
In the next post, we will learn to modify the axis and labels of a plot.