Practice with Altair - Part 1#

Learning Outcomes#

In this lab you will:

  • Use the Altair library in Python to generate data visualizations of the following plot types

    • Scatterplots

    • Bar/column plots

    • Line plots

    • Grouped bar plots

    • Box plots

    • Jitter/strip plots

    • Heatmaps

  • Extract insight(s) from a visualization

  • Summarize the benefits and disadvantages of two plot types showing the same data

import pandas as pd
import altair as alt
from vega_datasets import data

Depending on your programming environment, you may need to specify a particular renderer for Altair. If you are using JupyterLab, you do not need to do anything; the correct renderer will be enabled automatically.

Practice Problems (PP) - Basics of plotting with Altair#

Please note that the Practice Problems are meant for practice only, and their solutions have already been released.

The purpose of these practice problems is to help you learn the Altair syntax and prepare you to answer the lab questions.

PP 1 - Column/bar plot#

As with ggplot in R, visualization in Altair works best with “tidy” data. Let’s start by loading a data frame:

disasters = data.disasters() # You will need to be connected to the internet for this line to work
disasters2017 = disasters.loc[disasters["Year"] == 2017]
disasters2017.head() # display the first five rows
Entity Year Deaths
116 All natural disasters 2017 2087
266 Earthquake 2017 49
335 Epidemic 2017 386
390 Extreme temperature 2017 130
501 Extreme weather 2017 394

Task: Make a column/bar plot of the total number of Deaths by disaster Entity. Your plot should look like the image below.

  • Plotting in Altair usually starts with calling alt.Chart() and specifying a data source

    • in ggplot: ggplot()

    • Unlike in ggplot, column names in Altair must be passed as quoted strings (single or double quotes both work)

  • Next, you need to specify how the data will be mapped to the axes with .encode()

    • in ggplot: aes()

    • Traits of the x and y axes (labels, coordinates, ticks, etc…) are specified within the alt.X and alt.Y objects

  • Then, you specify how the data will be shown on the plot with .mark_...

    • in ggplot: geom_...()

  • You can then add overall .properties() to tweak things like the height and width of your plot.

PP1

#### Your Solution here

PP 2 - Customize encodings#

We will continue using the disasters dataset, but will do some wrangling to look at other trends.

Task: Create a bar plot of Deaths for all natural disasters over time. Your plot should look like the image below.

  • Wrangle the data to select only the “All natural disasters”: all_disasters = disasters.loc[disasters['Entity'] == 'All natural disasters']

  • In the examples above, the data type for each field is inferred automatically based on its type within the Pandas data frame. We can also explicitly indicate the data type to Altair by annotating the field name. In our case we can use Year:O:

    • variable:N indicates a nominal type (unordered, categorical data),

    • variable:O indicates an ordinal type (rank-ordered data),

    • variable:Q indicates a quantitative type (numerical data with meaningful magnitudes), and

    • variable:T indicates a temporal type (date/time data)

  • Altair also provides construction methods for encoding definitions, using the syntax alt.X('Year'). This alternative is useful for passing more parameters to an encoding, for example: alt.X('Year:O', scale=alt.Scale(type='log'), axis=alt.Axis(labelAngle=50))

  • You can change the dimensions of your plot by adding: .properties(width=400, height=100)

PP2

all_disasters = disasters.loc[disasters["Entity"] == 'All natural disasters']

#### Your Solution here

PP 3 - Stacked bar chart#

Staying with the disasters dataset, let’s look a little closer at the years 2011 to 2017.

Task: Create a ‘Normalized Stacked’ bar chart showing, for each year from 2011 onwards, the percentage of all Deaths attributed to each disaster type. Your plot should look like the image below.

PP3

# As always, we provide a bit of data wrangling.
# Year is stored as an integer, so compare against numbers rather than strings.
disasters_since2010 = disasters.loc[(disasters["Year"].between(2011, 2017))
                             & (disasters["Entity"] != "All natural disasters")]
#### Your Solution here

PP 4 - Scatter plots#

Switching gears to a new dataset, let’s now grab the seattle_weather dataset from the vega-datasets package. seattle_weather contains 5 variables of daily weather measurements in Seattle from 2012 to 2015:

  • daily precipitation (in mm)

  • daily maximum temperature (in °C)

  • daily minimum temperature (in °C)

  • wind speed (metres/sec)

  • and a weather descriptor

Task: Create a scatterplot of the maximum daily temperature, coloured by the weather descriptor for each day. Your plot should look like the image below.

Hint1: Think about which mark_() you’d like to use for a scatterplot

PP4

weather = data.seattle_weather()
weather.head()
date precipitation temp_max temp_min wind weather
0 2012-01-01 0.0 12.8 5.0 4.7 drizzle
1 2012-01-02 10.9 10.6 2.8 4.5 rain
2 2012-01-03 0.8 11.7 7.2 2.3 rain
3 2012-01-04 20.3 12.2 5.6 4.7 rain
4 2012-01-05 1.3 8.9 2.8 6.1 rain
#### Your Solution here

PP 5 - Line plot#

Continuing with the weather dataset here…

Altair offers a variety of data transformations that can be defined while plotting. For this exercise we recommend keeping the Data Transformation Documentation open while you work. Dates can be handled separately within Altair encodings; for more information, see the dates documentation.

Task: Make a line plot of the mean maximum temperature, grouped by month. Your plot should look like the image below.

Challenge: Can you make the x-axis show up as actual months (May-August)?

PP5

#### Your Solution here

PP 6 - Box plot#

Let’s also assess the minimum temperatures for Seattle!

Task: Create a box plot, showing minimum temperature ranges per month. Bonus: Can you figure out how to display whiskers ranging from minimum to maximum values?

HINT: You can stylise your plot using .configure_axis() and .properties() in Altair.

#### Your Solution here

PP 7 - Jitter/strip plot#

In Altair, the jitter plot we have seen is called a “strip plot”. It is created by starting with a mark_circle and then adding a manual transformation that jitters the points.

Task: Create a strip plot, showing minimum temperature ranges per month.

  • In Altair, there currently is no default mark_jitter()

  • To create a jitter plot, we will start with a mark_circle and then specify the transformation function to use and add it to the Altair object.

  • We will use a widely accepted transformation for this, the “Box-Muller” transform.

  • Start with mark_circle().encode(alt.X('jitter:Q')) and then specify a transformation like:

    • .transform_calculate(jitter='sqrt(-2*log(random()))*cos(2*PI*random())')

  • Use .configure_facet(spacing=0) to create a continuous looking y-axis.

PP7

#### Your Solution here

PP 8 - Concatenating plots#

Sometimes it is extremely useful to display two plots next to each other. Let’s build on the line plot above and display a stacked bar plot next to it. You can check Altair’s documentation for the different ways to concatenate plots.

Task: Make a line plot of the mean maximum temperature, grouped by month, and add a stacked bar plot of the same data next to it. Your plot should look like the image below.

PP8

#### Your Solution here

Congratulations!