Practice with ggplot - Part 1
Introduction
Learning Outcomes
In this lab you will:
Use the ggplot2 library in R to generate data visualizations for the following plots:
Scatterplots
Bar/column plots
Line plots
Grouped bar plots
Pie charts
Box plots
Jitter/strip plots
Apply the grammar of graphics to describe data visualizations
Choose an appropriate visualization when presented with a research question and data
Extract insight(s) from a visualization
Summarize the benefits and disadvantages of two plot types showing the same data
# Protip: the next line suppresses the annoying tidyverse
# startup messages
options(tidyverse.quiet = TRUE)
# Protip x 2: add the line above to a file you create in your home directory
# called .Rprofile to make this change permanent. You're welcome!
library(tidyverse)
library(datasets)
Practice Problems (PP) - Basics of plotting with ggplot
**Please note that ALL of Practice Problem 0 is meant for practice and will not be graded by the TAs.**
The purpose of these practice problems is to help you learn the ggplot2 syntax and prepare you to answer the lab questions.
PP 1 - Column/bar plot
Let's start by loading the gapminder dataset with library("gapminder"). It contains data on the wealth and life expectancy of countries over time.
Task: Make a column/bar plot of the total number of observations by continent. See the image below; your plot should look like that.
This time, let's look at a summary of the data frame. Try entering summary(gapminder) in a new code cell.
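If you want to see the two ggplot2 bar geoms side by side before tackling the task, here is a minimal sketch on the built-in mtcars data (my choice of dataset and axis labels, not part of this lab). geom_bar() counts the rows in each category for you, whereas geom_col() expects the bar heights to be computed beforehand.

```r
library(ggplot2)
library(dplyr)

# Illustrative example on mtcars (not part of the lab).
# geom_bar(): ggplot2 counts the rows for each value of x (stat = "count")
cyl_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  labs(x = "Number of cylinders", y = "Number of cars")

# geom_col(): the same bars, but from counts computed beforehand
cyl_counts <- mtcars %>% count(cyl)  # one row per cyl value, count in column n
cyl_col <- ggplot(cyl_counts, aes(x = factor(cyl), y = n)) +
  geom_col() +
  labs(x = "Number of cylinders", y = "Number of cars")

cyl_bar
```

Both plots show the same bars; geom_bar() is usually the shorter route when each row is one observation.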
### YOUR SOLUTION HERE
PP 2 - Line plot
We will continue using the gapminder dataset.
Task: Create a line plot of the average life expectancy per continent over time. See the image below; your plot should look like that.
Hint: you’re still learning how to wrangle data in DSCI 523, so we will give you a starting point for the command to plot this:
```
gapminder %>%
  group_by(`year`, `continent`) %>%
  mutate(mean_lifeExp = mean(`lifeExp`)) %>% ...
  ggplot(<<<CONTINUE HERE>>>)
```
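The same "summarize per group, then draw one line per group" pattern can be sketched on the built-in ChickWeight data (my choice, not part of this lab). The hint uses mutate(), which keeps every row and repeats the group mean; this sketch uses summarise() instead, which collapses each group to a single row. Both produce the same lines, because the repeated points simply overlap.

```r
library(ggplot2)
library(dplyr)

# Illustrative example on ChickWeight (not part of the lab):
# body weight of chicks measured over time on four diets.
chick_means <- as.data.frame(ChickWeight) %>%
  group_by(Time, Diet) %>%
  summarise(mean_weight = mean(weight), .groups = "drop")  # one row per (Time, Diet)

# Mapping colour to the discrete Diet variable gives one line (and legend entry) per diet
chick_plot <- ggplot(chick_means, aes(x = Time, y = mean_weight, colour = Diet)) +
  geom_line() +
  labs(x = "Days since birth", y = "Mean body weight (g)")

chick_plot
```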

### YOUR SOLUTION HERE
PP 3 - Strip/Jitter plot
We'll stay with the gapminder dataset.
Task: Create a “strip” or “jitter” plot showing all observations of life expectancy, split by continent. See the image below; your plot should look like that.
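A strip/jitter plot draws every observation and nudges the points sideways so that points in the same category don't pile up on a single vertical line. Here is a minimal sketch on the built-in iris data (my choice, not part of this lab). Setting height = 0 is a deliberate choice: geom_jitter() also jitters vertically by default, which would slightly distort the values on a continuous y-axis.

```r
library(ggplot2)

# Illustrative example on iris (not part of the lab).
# width controls the horizontal spread; height = 0 leaves the y values untouched.
iris_strip <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_jitter(width = 0.2, height = 0) +
  labs(x = "Species", y = "Sepal length (cm)")

iris_strip
```

Jittering is random, so the horizontal positions change slightly each time the plot is drawn.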
### YOUR SOLUTION HERE
PP 4 - Scatterplot
Switching gears, let's now grab the airquality dataset from the datasets package. airquality contains 6 variables of daily air quality measurements in New York from May to September 1973:
Ozone: Mean ozone in parts per billion from 1300 to 1500 hours at Roosevelt Island
Solar.R: Solar radiation in Langleys in the frequency band 4000–7700 Angstroms from 0800 to 1200 hours at Central Park
Wind: Average wind speed in miles per hour at 0700 and 1000 hours at LaGuardia Airport
Temp: Maximum daily temperature in degrees Fahrenheit at LaGuardia Airport
Month: Month of the year (1–12)
Day: Day of the month (1–31)
The head() function shows the first few rows of a data frame. Try running the following command in a cell:
head(airquality)
Task: Create a scatterplot of the mean ozone level vs. the maximum daily temperature with ggplot2. See the image below; your plot should look like that.
Hint 1: Think about which geom_*() function you'd like to use for a scatterplot.
Hint 2: You may ignore the warning about missing data!
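The missing-data warning in Hint 2 comes from the NA values in Ozone: ggplot2 drops those rows and reports how many it removed. A minimal sketch on airquality that plots Ozone against Wind (a different pair of variables than the task, chosen for illustration) shows how na.rm = TRUE drops those rows silently instead.

```r
library(ggplot2)

# Illustrative example: Ozone vs. Wind (the task itself asks for Ozone vs. Temp).
# Ozone contains NAs; na.rm = TRUE drops those rows without printing a warning.
ozone_wind <- ggplot(airquality, aes(x = Wind, y = Ozone)) +
  geom_point(na.rm = TRUE) +
  labs(x = "Average wind speed (mph)", y = "Mean ozone (ppb)")

ozone_wind

# Number of rows that will be dropped because Ozone is missing
sum(is.na(airquality$Ozone))
```

Leaving na.rm at its default (FALSE) draws the same points; the only difference is the warning.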
### YOUR SOLUTION HERE
PP5 - Box plot
We'll continue with airquality here.
Box plots are the bread and butter of many data scientists, so it's worth knowing how to create them, even if we will learn better visualization techniques soon. If you want to refresh your memory of what a box represents, head over here.
Task: Make a box plot of the maximum temperature, grouped by month. See the image below; your plot should look like that.
Challenge: Can you make the x-axis show actual month names (May to September) rather than the numbers 5 to 9?
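The idea behind the challenge (turning a numeric column into a labelled factor) can be sketched on the built-in ToothGrowth data, where dose is stored as the numbers 0.5, 1 and 2 (mg/day). The dataset and the label text are my choices, not part of this lab. A factor also matters for grouping: if x is numeric, ggplot2 typically draws a single box unless you set a group aesthetic.

```r
library(ggplot2)
library(dplyr)

# Illustrative example on ToothGrowth (not part of the lab).
# factor() with labels turns the numeric doses into named, ordered categories.
tooth <- ToothGrowth %>%
  mutate(dose_label = factor(dose,
                             levels = c(0.5, 1, 2),
                             labels = c("0.5 mg/day", "1 mg/day", "2 mg/day")))

# One box per dose level, with the labels on the x-axis
tooth_box <- ggplot(tooth, aes(x = dose_label, y = len)) +
  geom_boxplot() +
  labs(x = "Vitamin C dose", y = "Tooth length")

tooth_box
```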
### YOUR SOLUTION HERE
PP6 - Jitter plot on top of a box plot
Let's improve on the box plot above by adding a jitter plot on top, so we can see all the points together.
Task: Make a box plot of the maximum temperature, grouped by month, and add a jitter plot of the same data on top with a transparency (alpha) of 0.2. Remove the outliers that the box plot drew in PP5 (the 4 circles beyond the whiskers). See the image below; your plot should look like that.
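Layering works because ggplot2 draws layers in the order they are added, so the jitter layer ends up on top of the boxes. Here is a minimal sketch on ToothGrowth (my choice, not part of this lab). outlier.shape = NA hides the box plot's own outlier points; otherwise they would be drawn twice, once by the box plot and once by the jitter layer.

```r
library(ggplot2)

# Illustrative example on ToothGrowth (not part of the lab).
tooth_box_jitter <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
  geom_boxplot(outlier.shape = NA) +                    # hide the box plot's outlier points
  geom_jitter(width = 0.2, height = 0, alpha = 0.2) +   # alpha: 0 = invisible, 1 = opaque
  labs(x = "Vitamin C dose (mg/day)", y = "Tooth length")

tooth_box_jitter
```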
### YOUR SOLUTION HERE
PP7 - Grouped bar chart
This is the last practice problem; you're nearly at the end!
We'll need a new dataset for this example. Let's load the HairEyeColor dataset, which has the distribution of hair and eye colour for 592 statistics students.
Unfortunately, we first need to convert it to a data.frame with HairEyeColor_df <- as.data.frame(HairEyeColor). This is necessary because some datasets in the datasets package, including HairEyeColor, are stored as contingency tables rather than data frames. as.data.frame() turns the table into one row per combination of Hair, Eye and Sex, with the count in a Freq column.
Task: Create a grouped bar chart of the frequency of eye colour, split by sex (Male and Female in the dataset). See the image below; your plot should look like that.
# Need to convert the raw data to a dataframe for easy plotting
HairEyeColor_df <- as.data.frame(HairEyeColor)
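A converted contingency table has a count column (Freq) rather than one row per person, and several rows can share the same x and fill values. A minimal sketch on the built-in Titanic table (my choice, not part of this lab) handles both points with the weight aesthetic and a dodged position.

```r
library(ggplot2)

# Illustrative example on Titanic (not part of the lab).
# as.data.frame() gives one row per Class/Sex/Age/Survived combination, count in Freq.
titanic_df <- as.data.frame(Titanic)

# weight = Freq makes geom_bar() add up Freq instead of counting rows, so rows with
# the same Class and Survived (but different Sex/Age) are summed into one bar.
# position = "dodge" places the Survived bars side by side instead of stacking them.
titanic_bars <- ggplot(titanic_df, aes(x = Class, weight = Freq, fill = Survived)) +
  geom_bar(position = "dodge") +
  labs(x = "Class", y = "Number of people")

titanic_bars
```

An equivalent route is to aggregate the counts first (for example with dplyr) and then use geom_col(position = "dodge").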
### YOUR SOLUTION HERE
Congratulations!