Practice with Altair - Part 1
Learning Outcomes#
In this lab you will:
Use the Altair library in Python to generate data visualizations for the following plots:
Scatterplots
Bar/column plots
Line plots
Grouped bar plots
Box plots
Jitter/strip plots
Heatmaps
Extract insight(s) from a visualization
Summarize the benefits and disadvantages of two plot types showing the same data
import pandas as pd
import altair as alt
from vega_datasets import data
Depending on your programming environment, you may need to specify a particular renderer for Altair. If you are using JupyterLab, you do not need to do anything; the correct renderer will be enabled automatically.
Practice Problems (PP) - Basics of plotting with altair#
Please note that Practice Problems are meant to be for practice, and solutions have been released already.
The purpose of these practice problems is to help you learn the Altair syntax and prepare you to answer the lab questions.
PP 1 - Column/bar plot#
Similarly to working with ggplot in R, visualization in Altair is best done with "tidy" data. Let's start by loading a data frame:
disasters = data.disasters() # You will need to be connected to the internet for this line to work
disasters2017 = disasters.loc[disasters["Year"] == 2017]
disasters2017.head() # display the first five rows
index | Entity | Year | Deaths |
---|---|---|---|
116 | All natural disasters | 2017 | 2087 |
266 | Earthquake | 2017 | 49 |
335 | Epidemic | 2017 | 386 |
390 | Extreme temperature | 2017 | 130 |
501 | Extreme weather | 2017 | 394 |
Task: Make a column/bar plot of the total number of Deaths by disaster Entity. See the image below; your plot should look like it.
Plotting in Altair usually starts with calling alt.Chart() and specifying a data source (in ggplot: ggplot()).
Unlike in ggplot, column names in Altair must be passed as quoted strings (single or double quotes both work).
Next, you need to specify how the data will be mapped to the axes with .encode() (in ggplot: aes()).
Traits of the x and y axes (labels, coordinates, ticks, etc.) are specified within the alt.X and alt.Y objects.
Then, you specify how the data will be shown on the plot with .mark_...() (in ggplot: geom_...()).
You can then add overall .properties() to tweak things like the height and width of your plot.
#### Your Solution here
PP 2 - Customize encodings#
We will continue using the disasters dataset, but will do some wrangling to look at other trends.
Task: Create a bar plot of Deaths for all natural disasters over time. See the image; your plot should look like it.
Wrangle the data to select only the "All natural disasters" entity:
all_disasters = disasters.loc[disasters['Entity'] == 'All natural disasters']
In the examples above, the data type for each field is inferred automatically based on its type within the Pandas data frame. We can also explicitly indicate the data type to Altair by annotating the field name. In our case we can use Year:O.
variable:N indicates a nominal type (unordered, categorical data),
variable:O indicates an ordinal type (rank-ordered data),
variable:Q indicates a quantitative type (numerical data with meaningful magnitudes), and
variable:T indicates a temporal type (date/time data).
Altair also provides construction methods for encoding definitions, using the syntax alt.X('Year'). This alternative is useful for providing more parameters to an encoding like:
alt.X('Year:O', scale=alt.Scale(type='log'), axis=alt.Axis(labelAngle=50) )
You can change the dimensions of your plot by adding:
.properties(width=400, height=100)
all_disasters = disasters.loc[disasters["Entity"] == 'All natural disasters']
#### Your Solution here
PP 3 - Stacked bar chart#
Staying with the disasters dataset, let's look a little closer at the years 2011 to 2017.
Task: Create a "Normalized Stacked" bar chart showing the percentage of all Deaths by Entity for each year from 2011 onwards. See the image below; your plot should look like it.
# as always we provide a bit of data wrangling
# Year is stored as an integer, so filter with integers rather than strings
disasters_since2010 = disasters.loc[(disasters["Year"].between(2011, 2017))
                                    & (disasters["Entity"] != "All natural disasters")]
#### Your Solution here
PP 4 - Scatter plots#
Switching gears to a new dataset, let's now grab the seattle_weather dataset from the vega-datasets package. seattle_weather contains 5 variables of daily weather measurements in Seattle from 2012 to 2015:
daily precipitation (in mm)
daily maximum temperature (in C)
daily minimum temperature (in C)
wind speed (metres/sec)
and a weather descriptor
Task: Create a scatterplot of the maximum daily temperature coloured by the weather descriptor for each day. See image below, your plot should look like that.
Hint1: Think about which mark_*() you'd like to use for a scatterplot
weather = data.seattle_weather()
weather.head()
index | date | precipitation | temp_max | temp_min | wind | weather |
---|---|---|---|---|---|---|
0 | 2012-01-01 | 0.0 | 12.8 | 5.0 | 4.7 | drizzle |
1 | 2012-01-02 | 10.9 | 10.6 | 2.8 | 4.5 | rain |
2 | 2012-01-03 | 0.8 | 11.7 | 7.2 | 2.3 | rain |
3 | 2012-01-04 | 20.3 | 12.2 | 5.6 | 4.7 | rain |
4 | 2012-01-05 | 1.3 | 8.9 | 2.8 | 6.1 | rain |
#### Your Solution here
PP5 - Line plot#
Continuing with the weather dataset here...
Altair offers a variety of data transformations that are defined while plotting. For this exercise we recommend opening the Data Transformation Documentation while working through the exercise. Dates can be handled separately within Altair encodings; for more information, look at the dates documentation.
Task: Make a line plot for the mean max temperature, grouped by each month. See image below, your plot should look like that.
Challenge: Can you make the x-axis show up as actual months (May-August)?
#### Your Solution here
PP6 - Box plot#
Let's also assess the minimum temperatures for Seattle!
Task: Create a box plot showing minimum temperature ranges per month. Bonus: Can you figure out how to display whiskers ranging from minimum to maximum values?
HINT: You can stylise your plot using .configure_axis() and .properties() in Altair.
#### Your Solution here
PP7 - Jitter/strip plot#
In Altair, the jitter plot we have seen is called a "strip plot". It is created by starting with a mark_circle and then adding a manual transformation to jitter the points.
Task: Create a strip plot showing minimum temperature ranges per month.
In Altair, there is currently no default mark_jitter().
To create a jitter plot, we will start with a mark_circle and then specify the transformation function to use and add it to the Altair object.
We will select a widely accepted transformation for this, which is the "Box-Muller" transform:
Start with mark_circle().encode(alt.X('jitter:Q')) and then specify a transformation like: .transform_calculate(jitter='sqrt(-2*log(random()))*cos(2*PI*random())')
Use .configure_facet(spacing=0) to create a continuous-looking y-axis.
#### Your Solution here
PP8 - Concatenating plots#
Sometimes it is extremely useful to display two plots next to each other. Let's build on the line plot above and display a stacked bar plot next to the line plot. You can check Altair's documentation on the different possibilities to concatenate plots.
Task: Make a line plot for the mean max temperature, grouped by each month, and add a stacked bar plot of the same data next to it. See image below, your plot should look like that.
#### Your Solution here
Congratulations!