Introduction
This is the fifth post in the series Elegant Data Visualization with ggplot2. In the previous post, we learnt about text annotations. In this post, we will:
- build scatter plots
- modify points
  - color
  - fill
  - alpha
  - shape
  - size
- fit regression line
Libraries, Code & Data
We will use the following libraries in this post:
All the data sets used in this post can be found here and code can be downloaded from here.
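The list of libraries did not survive in this copy of the post. As an assumption, the examples that follow only call functions from ggplot2 and use the mtcars data that ships with R, so a minimal setup might look like this:

```r
# Assumption: ggplot2 is the only package the examples below rely on.
# install.packages("ggplot2")  # uncomment if the package is not installed
library(ggplot2)

# mtcars comes with R (datasets package), so nothing needs to be downloaded
head(mtcars[, c("mpg", "disp", "hp", "cyl")])
```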
Basic Plot
As we did in the previous post, let us begin by creating a scatter plot using
geom_point() to examine the relationship between displacement and miles per
gallon using the mtcars data.
ggplot(mtcars) +
  geom_point(aes(disp, mpg))
Jitter
If you want to avoid overplotting, use the position argument and supply it
the value 'jitter'. It adds a small amount of random noise to the position of
each point, which makes overlapping points easier to tell apart.
ggplot(mtcars) +
  geom_point(aes(disp, mpg), position = 'jitter')
Another way to avoid overplotting is to use geom_jitter().
ggplot(mtcars) +
  geom_jitter(aes(disp, mpg))
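Because the noise is random, the plot changes a little every time it is drawn. If that is a problem, one option is position_jitter(), which lets you set the amount of noise and a seed. The sketch below is only an illustration; the width, height and seed values are arbitrary choices, not recommendations.

```r
library(ggplot2)

# width and height are in data units (disp and mpg here);
# seed makes the random noise repeatable between runs
p <- ggplot(mtcars) +
  geom_point(aes(disp, mpg),
             position = position_jitter(width = 5, height = 0.5, seed = 42))

p
```

With a fixed seed, rebuilding the plot gives the same jittered positions each time.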
Aesthetics
Now let us modify the appearance of the points. There are two ways:
- specify values
- map them to variables using aes()
Specify Values
Color
To modify the color of the points, you can use the color argument and
supply it a valid color name. In the below example, we change the color of the
points to 'blue'. Keep in mind that the color argument should be outside
aes().
ggplot(mtcars) +
  geom_point(aes(disp, mpg), color = 'blue', position = 'jitter')
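To see why the color argument belongs outside aes(), it can help to compare the two placements. The sketch below is only an illustration; the names p_outside and p_inside are made up for this example.

```r
library(ggplot2)

# color set outside aes(): every point is drawn in blue, no legend
p_outside <- ggplot(mtcars) +
  geom_point(aes(disp, mpg), color = 'blue')

# color placed inside aes(): 'blue' is treated as a one-level variable,
# so the points get a color from the default palette plus a legend
p_inside <- ggplot(mtcars) +
  geom_point(aes(disp, mpg, color = 'blue'))

# layer_data() returns the values ggplot2 actually uses for drawing
unique(layer_data(p_outside)$colour)
unique(layer_data(p_inside)$colour)
```

The first call should report 'blue', while the second reports a palette color, which is why the points do not come out blue when the value sits inside aes().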
Alpha
The transparency of the color can be modified using the alpha argument. It
takes values between 0 (fully transparent) and 1 (fully opaque).
ggplot(mtcars) +
  geom_point(aes(disp, mpg), color = 'blue', alpha = 0.4, position = 'jitter')
Shape
The shape of the points can be modified using the shape argument. It
takes integer values between 0 and 25.
ggplot(mtcars) +
  geom_point(aes(disp, mpg), shape = 3, position = 'jitter')
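A quick way to see what each shape code looks like is to draw all of them in one chart. The sketch below is an illustration only; the shapes data frame and its grid layout are made up for this example. Shapes 21 to 25 have both an outline and an interior, so they are the ones that respond to fill as well as color.

```r
library(ggplot2)

# one row per shape code, laid out on a simple grid
code <- 0:25
shapes <- data.frame(
  code = code,
  x = code %% 6,
  y = -(code %/% 6)
)

# scale_shape_identity() tells ggplot2 to use the codes as they are;
# fill only shows up on shapes 21 to 25
shape_chart <- ggplot(shapes, aes(x, y)) +
  geom_point(aes(shape = code), size = 4, color = 'blue', fill = 'yellow') +
  geom_text(aes(label = code), nudge_y = -0.35, size = 3) +
  scale_shape_identity()

shape_chart
```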
Size
The size of the points can be modified using the size argument. It can take
any value greater than 0.
ggplot(mtcars) +
  geom_point(aes(disp, mpg), size = 3, position = 'jitter')
Map Variables
So far, we have specified values for color, shape, size etc. Now, let us map
them to variables using aes().
Color
You can modify the color of the points by mapping it to a variable using
aes(). It allows you to examine the relationship between two continuous
variables at different levels of a categorical variable.
ggplot(mtcars) +
  geom_point(aes(disp, mpg, color = factor(cyl)),
             position = 'jitter')
Color can be mapped to a continuous variable as well. In this case, you can
examine the relationship between two continuous variables across the range of
values of a third variable.
ggplot(mtcars) +
  geom_point(aes(disp, mpg, color = hp),
             position = 'jitter')
Shape
Shape can be mapped to categorical variables. In the below example, we use
factor() to convert cyl to categorical data before mapping shape to it.
ggplot2 will throw an error if you map shape to a continuous variable.
ggplot(mtcars) +
  geom_point(aes(disp, mpg, shape = factor(cyl)), position = 'jitter')
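To check the error for yourself without stopping a script, you can wrap the plot build in tryCatch(). This is a minimal sketch; hp is used as the continuous variable purely as an example, and the exact wording of the error message may differ between ggplot2 versions.

```r
library(ggplot2)

# hp is numeric, so ggplot2 treats it as continuous
p <- ggplot(mtcars) +
  geom_point(aes(disp, mpg, shape = hp))

# the error is raised when the plot is built (or printed),
# not when it is defined
result <- tryCatch(ggplot_build(p), error = function(e) e)

inherits(result, "error")
conditionMessage(result)
```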
Size
Size should be mapped to continuous variables. In the below example, we
have mapped size to the hp variable.
ggplot(mtcars) +
  geom_point(aes(disp, mpg, size = hp), color = 'blue', position = 'jitter')
If you map size to categorical data as shown in the below example, ggplot2 will throw a warning.
ggplot(mtcars) +
  geom_point(aes(disp, mpg, size = factor(cyl)), color = 'blue',
             position = 'jitter')
## Warning: Using size for a discrete variable is not advised.
Regression Line
geom_smooth() allows us to fit a regression line to the plot. By default, it
chooses the smoothing method based on the size of the data (loess for fewer
than 1,000 observations), so to fit a straight line you need to ask for least
squares explicitly. In the below example, we fit a regression line using the
least squares technique by supplying the value 'lm' to the method argument.
ggplot(mtcars, aes(disp, mpg)) +
  geom_point(position = 'jitter') +
  geom_smooth(method = 'lm', se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
Intercept & Slope
If you know the intercept and the slope of the line, you can use geom_abline().
Let us regress mpg on disp and then use the result to add the line.
Regression
lm(mpg ~ disp, data = mtcars)
##
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
##
## Coefficients:
## (Intercept)         disp
##    29.59985     -0.04122
Add Line
ggplot(mtcars, aes(disp, mpg)) +
  geom_point(position = 'jitter') +
  geom_abline(slope = -0.04122, intercept = 29.59985)
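Typing the coefficients by hand works, but the values are rounded and easy to mistype. As an alternative sketch, you could pull them from the fitted model with coef(). The object names model and coefs are only illustrative.

```r
library(ggplot2)

# fit the same regression as above and keep the result
model <- lm(mpg ~ disp, data = mtcars)

# coef() returns a named vector: "(Intercept)" and "disp"
coefs <- coef(model)

ggplot(mtcars, aes(disp, mpg)) +
  geom_point(position = 'jitter') +
  geom_abline(intercept = coefs[["(Intercept)"]], slope = coefs[["disp"]])
```

This way the line stays in sync with the model if the data or the formula changes.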
The se argument of geom_smooth() will add a confidence interval around the
regression line, if set to TRUE.
Conf. Interval
ggplot(mtcars, aes(disp, mpg)) +
  geom_point(position = 'jitter') +
  geom_smooth(method = 'lm', se = TRUE)
## `geom_smooth()` using formula 'y ~ x'
Loess Method
In the below example, we use the loess method instead of least squares to fit
a smooth curve to the data.
ggplot(mtcars, aes(disp, mpg)) +
  geom_point(position = 'jitter') +
  geom_smooth(method = 'loess', se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
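How closely the loess curve follows the points is controlled by the span argument of geom_smooth(). Smaller values give a wigglier curve and larger values a smoother one. The sketch below uses span = 0.5 purely as an example value.

```r
library(ggplot2)

# span = 0.5 is an arbitrary illustration; try other values to compare
p <- ggplot(mtcars, aes(disp, mpg)) +
  geom_point(position = 'jitter') +
  geom_smooth(method = 'loess', span = 0.5, se = FALSE)

p
```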
Summary
In this post, we learnt to:
- build scatter plots
- map aesthetics to variables
- fit regression line
Up Next..
In the next post, we will learn to build line charts.