Lecture 7#

Trendlines, confidence intervals, and composite figures#

Lecture learning goals#

By the end of the lecture you will be able to:

  1. Visualize pair-wise differences using a slope plot.

  2. Visualize trends using regression and loess lines.

  3. Create and understand how to interpret confidence intervals and confidence bands.

  4. Telling a story with data (reading only)

  5. Layout plots in panels of a figure grid.

  6. Save figures outside the notebook.

  7. Understand figure formats in the notebook.

  8. Retrieve info on further topics online.

Readings#

This lecture’s readings are both from Fundamentals of Data Visualization.


# Run this cell to ensure that altair plots show up in the exported HTML
# and that the R cell magic works
import altair as alt
import pandas as pd

# Save a vega-lite spec and a PNG blob for each plot in the notebook
alt.renderers.enable('mimetype')
# Handle large data sets without embedding them in the notebook
alt.data_transformers.enable('data_server')

# Load the R cell magic
%load_ext rpy2.ipython

Read in data#

Let’s start by looking at your results from the world health quiz we did in lab 1! Below, I read in the data and assign a label for whether each student had a positive or negative outlook of their own results compared to their estimation of the class average.

%%R
options(tidyverse.quiet = TRUE) 
library(tidyverse)

theme_set(theme_light(base_size = 18))

scores_raw <- read_csv('data/students-gapminder.csv')
colnames(scores_raw) <- c('time', 'student_score', 'estimated_score')
scores <- scores_raw %>%
    mutate(diff = student_score - estimated_score,
           estimation = case_when(
               diff == 0 ~ 'neutral',
               diff < 0 ~ 'negative',
               diff > 0 ~ 'positive')) %>%
    pivot_longer(!c(time, estimation, diff)) %>%
    arrange(desc(diff))
scores
R[write to console]: Error in (function (filename = "Rplot%03d.png", width = 480, height = 480,  : 
  Graphics API version mismatch
---------------------------------------------------------------------------
RRuntimeError                             Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 get_ipython().run_cell_magic('R', '', "options(tidyverse.quiet = TRUE) \nlibrary(tidyverse)\n\ntheme_set(theme_light(base_size = 18))\n\nscores_raw <- read_csv('data/students-gapminder.csv')\ncolnames(scores_raw) <- c('time', 'student_score', 'estimated_score')\nscores <- scores_raw %>%\n    mutate(diff = student_score - estimated_score,\n           estimation = case_when(\n               diff == 0 ~ 'neutral',\n               diff < 0 ~ 'negative',\n               diff > 0 ~ 'positive')) %>%\n    pivot_longer(!c(time, estimation, diff)) %>%\n    arrange(desc(diff))\nscores\n")

File /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/IPython/core/interactiveshell.py:2358, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2356 with self.builtin_trap:
   2357     args = (magic_arg_s, cell)
-> 2358     result = fn(*args, **kwargs)
   2359 return result

File /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/rpy2/ipython/rmagic.py:765, in RMagics.R(self, line, cell, local_ns)
    762 else:
    763     cell_display = CELL_DISPLAY_DEFAULT
--> 765 tmpd = self.setup_graphics(args)
    767 text_output = ''
    768 try:

File /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/rpy2/ipython/rmagic.py:461, in RMagics.setup_graphics(self, args)
    457 tmpd_fix_slashes = tmpd.replace('\\', '/')
    459 if self.device == 'png':
    460     # Note: that %% is to pass into R for interpolation there
--> 461     grdevices.png("%s/Rplots%%03d.png" % tmpd_fix_slashes,
    462                   **argdict)
    463 elif self.device == 'svg':
    464     self.cairo.CairoSVG("%s/Rplot.svg" % tmpd_fix_slashes,
    465                         **argdict)

File /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/rpy2/robjects/functions.py:203, in SignatureTranslatedFunction.__call__(self, *args, **kwargs)
    201         v = kwargs.pop(k)
    202         kwargs[r_k] = v
--> 203 return (super(SignatureTranslatedFunction, self)
    204         .__call__(*args, **kwargs))

File /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/rpy2/robjects/functions.py:126, in Function.__call__(self, *args, **kwargs)
    124     else:
    125         new_kwargs[k] = cv.py2rpy(v)
--> 126 res = super(Function, self).__call__(*new_args, **new_kwargs)
    127 res = cv.rpy2py(res)
    128 return res

File /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/rpy2/rinterface_lib/conversion.py:45, in _cdata_res_to_rinterface.<locals>._(*args, **kwargs)
     44 def _(*args, **kwargs):
---> 45     cdata = function(*args, **kwargs)
     46     # TODO: test cdata is of the expected CType
     47     return _cdata_to_rinterface(cdata)

File /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/rpy2/rinterface.py:813, in SexpClosure.__call__(self, *args, **kwargs)
    806     res = rmemory.protect(
    807         openrlib.rlib.R_tryEval(
    808             call_r,
    809             call_context.__sexp__._cdata,
    810             error_occured)
    811     )
    812     if error_occured[0]:
--> 813         raise embedded.RRuntimeError(_rinterface._geterrmessage())
    814 return res

RRuntimeError: Error in (function (filename = "Rplot%03d.png", width = 480, height = 480,  : 
  Graphics API version mismatch

We could make a distribution plot, such as a KDE for the students score and estimated class score. From this plot we can see that on average students seemed to believe their classmates scored better, but we don’t know if this is because all students thought this, or some thought their classmates scored much better while others thought it was about the same.

%%R 
ggplot(scores) +
    aes(x = value,
       fill = name) +
    geom_density(alpha=0.4)
../_images/Lecture7_6_0.png

Paired comparisons with slope plots#

Drawing out each students score and estimated score, and then connecting them with a line allows us to easily see the trends in how many students thought their score was better or worse than the class.

%%R -o scores
ggplot(scores) +
    aes(x = name,
        y = value,
       group = time) +
    geom_line(aes(color = estimation), size = 1.5) + 
    geom_point(size=3)
    
## This plot is *moderately* hard to read, I suggest faceting by estimation, or avoiding it altogether.
../_images/Lecture7_8_0.png

To make it easier to see how much better or worse each student score is compared to the class estimate, we can color the lines by the difference and set a diverging colormap.

%%R -o scores
ggplot(scores) +
    aes(x = name,
        y = value,
       group = time) +
    geom_line(aes(color = diff), size = 1.5) + 
    geom_point(size=3) +
    scale_color_distiller(palette = 'PuOr', lim = c(-5, 5)) +
    theme(panel.grid.major = element_blank(),
          panel.grid.minor = element_blank())
    
## This plot is *really* hard to read! Change colour scheme, or just avoid altogether!
../_images/Lecture7_10_0.png

Another way we could have visualized these differences would have been as a bar plot of the differences, but we would not know the students’ score, just the difference.

%%R
ggplot(scores) +
    aes(x = diff) +
    geom_bar()
../_images/Lecture7_12_0.png

A scatter plot could also work for this comparison, ideally with a diagonal line at zero difference.

%%R
ggplot(scores_raw) +
    aes(x = student_score,
        y = estimated_score) +
    geom_abline(slope = 1, intercept = 0, color = 'white', size = 3) +
    geom_point()
../_images/Lecture7_14_0.png

Altair#

points = alt.Chart(scores).mark_circle(size=50, color='black', opacity=1).encode(
    alt.X('name'),
    alt.Y('value'),
    alt.Detail('time')).properties(width=300)
points.mark_line(size=4, opacity=0.5).encode(alt.Color('estimation')) + points
../_images/Lecture7_16_0.png
points = alt.Chart(scores).mark_circle(size=50, color='black', opacity=1).encode(
    alt.X('name'),
    alt.Y('value'),
    alt.Detail('time')).properties(width=300)
points.mark_line(size=4, opacity=0.8).encode(alt.Color('diff', scale=alt.Scale(scheme='blueorange', domain=(-6, 6)))) + points
../_images/Lecture7_17_0.png

Trendlines#

Trendlines (also sometimes called “lines of best fit”, or “fitted lines”) are good to highlight general trends in the data that can be hard to elucidate by looking at the raw data points. This can happen if there are many data points or many groups inside the data.

from vega_datasets import data

cars = data.cars()
cars 
Name Miles_per_Gallon Cylinders Displacement Horsepower Weight_in_lbs Acceleration Year Origin
0 chevrolet chevelle malibu 18.0 8 307.0 130.0 3504 12.0 1970-01-01 USA
1 buick skylark 320 15.0 8 350.0 165.0 3693 11.5 1970-01-01 USA
2 plymouth satellite 18.0 8 318.0 150.0 3436 11.0 1970-01-01 USA
3 amc rebel sst 16.0 8 304.0 150.0 3433 12.0 1970-01-01 USA
4 ford torino 17.0 8 302.0 140.0 3449 10.5 1970-01-01 USA
... ... ... ... ... ... ... ... ... ...
401 ford mustang gl 27.0 4 140.0 86.0 2790 15.6 1982-01-01 USA
402 vw pickup 44.0 4 97.0 52.0 2130 24.6 1982-01-01 Europe
403 dodge rampage 32.0 4 135.0 84.0 2295 11.6 1982-01-01 USA
404 ford ranger 28.0 4 120.0 79.0 2625 18.6 1982-01-01 USA
405 chevy s-10 31.0 4 119.0 82.0 2720 19.4 1982-01-01 USA

406 rows Ă— 9 columns

points = alt.Chart(cars).mark_point(opacity=0.3).encode(
    alt.X('Year'),
    alt.Y('Horsepower'),
    alt.Color('Origin')) 
points
../_images/Lecture7_20_0.png

A not so effective way to visualize the trend in this data is to connect all data points with a line.

points + points.mark_line()
../_images/Lecture7_22_0.png

A simple way is to use the mean y-value at each x. This works OK in hour case because each year has several values, but in many cases with a continuous x-axis, you would need to bin it in order to avoid noise from outliers.

points + points.encode(y='mean(Horsepower)').mark_line()
../_images/Lecture7_24_0.png

An alternative to binning continuous data is to use a moving/rolling average, that takes the mean of the last \(n\) observations. In our example here, a moving average becomes a bit complicated because there are so many y-values for the exact same x-value, so we would need to calculate the average for each year first, and then move/roll over that, which can be done using the window_transform method in Altair. A more common and simpler example can be viewed in the Altair docs.

# This will not be on a quiz, I added it for those who are interested.
mean_per_year = cars.groupby(['Origin', 'Year'])['Horsepower'].mean().reset_index()
alt.Chart(mean_per_year).transform_window(
    mean_hp='mean(Horsepower)',
    frame=[-1, 1],
    groupby=['Origin']).mark_line().encode(
    x='Year',
    y='mean_hp:Q',
    color='Origin')
../_images/Lecture7_26_0.png

We can also use the rolling method in pandas for this calculation, but it handles the edges a bit differently.

# This will not be on a quiz, I added it for those who are interested.
mean_per_year = cars.groupby(['Origin', 'Year'])['Horsepower'].mean().reset_index()
mean_per_year['rolling_hp'] = mean_per_year.groupby('Origin')['Horsepower'].rolling(3, center=True).mean().to_numpy()
alt.Chart(mean_per_year).mark_line().encode(
    x='Year',
    y='rolling_hp',
    color='Origin')
../_images/Lecture7_28_0.png

Another way of showing a trend in the data is via regression. You will learn more about this later in the program, but in brief you are fitting a straight line through the data, by choosing the line that minimizes a certain cost function (or penalty), commonly the sum squared deviations of the data to the line (least squares).

points +  points.transform_regression(
    'Year', 'Horsepower', groupby=['Origin']).mark_line(size=3)
../_images/Lecture7_30_0.png

You are not limited to fitting linear lines, but can try fits that are quadratic, polynomial, etc.

points +  points.transform_regression(
    'Year', 'Horsepower', groupby=['Origin'], method='poly').mark_line(size=3)
../_images/Lecture7_32_0.png

Sometimes it is difficult to find a single regression equation that describes the entire data set. Instead, you can fit multiple equations (usually linear and quadratic) to smaller subsets of the data, and add them together to get the final line. This is conceptually similar to a moving/rolling average, which also only uses part of the data to create the trend line.

points +  points.transform_loess(
    'Year', 'Horsepower', groupby=['Origin']).mark_line(size=3)
../_images/Lecture7_34_0.png

The bandwidth parameter controls how much the loess fit should be influenced by local variation in the data, similar to the effect of the bandwidth parameter for a KDE.

# The default is 0.3 and 1 is often close to a linear fit.
points +  points.transform_loess(
    'Year', 'Horsepower', groupby=['Origin'], bandwidth=0.8).mark_line(size=3)
../_images/Lecture7_36_0.png

When to choose which trendline?#

  • If it is important that the line has values that are easy to interpret, choose a rolling mean. These are also the most straightforward trendlines when communicating data to a general audience.

  • If you think a simple line equation (e.g. linear) describes your data well, this can be advantageous since you would know that your data follows a set pattern, and it is easy to predict how the data behaves outside the values you have collected (of course still with more uncertainty the further away from your data you predict).

  • If you are mainly interested in highlighting a trend in the current data, and the two situations described above are not of great importance for your figure, then a loess line could be sutiabe. It has the advantage that it describes trends in data very “naturally”, meaning that it highlights patterns we would tend to highlight ourselves in qualitative assessment. It also less strict in its statistical assumption compared to e.g. a linear regression, so you don’t have to worry about finding the correct equation for the line, and assessing whether your data truly follows that equation globally.

ggplot#

Using the mean as a trendline.

%%R -i cars
ggplot(cars) +
    aes(x = Year,
        y = Horsepower,
        color = Origin) +
    geom_point() +
    geom_line(stat = 'summary', fun = 'mean')
../_images/Lecture7_40_0.png

geom_smooth creates a loess trendline by default. The shaded gray area is the 95% confidence interval of the fitted line.

%%R -i cars
ggplot(cars) +
    aes(x = Year,
        y = Horsepower,
        color = Origin) +
    geom_point() +
    geom_smooth()
R[write to console]: `geom_smooth()` using method = 'loess' and formula 'y ~ x'
../_images/Lecture7_42_1.png

We can color the confidence interval the same as the lines.

%%R -i cars
ggplot(cars) +
    aes(x = Year,
        y = Horsepower,
        color = Origin,
        fill = Origin) +
    geom_point() +
    geom_smooth()
R[write to console]: `geom_smooth()` using method = 'loess' and formula 'y ~ x'
../_images/Lecture7_44_1.png

And also remove it.

%%R -i cars
ggplot(cars) +
    aes(x = Year,
        y = Horsepower,
        color = Origin,
        fill = Origin) +
    geom_point() +
    geom_smooth(se = FALSE, size = 2)
R[write to console]: `geom_smooth()` using method = 'loess' and formula 'y ~ x'
../_images/Lecture7_46_1.png

Similar to the bandwidth in Altair, you can set the span in geom_smooth to alter how sensitive the loess fit is to local variation.

%%R
ggplot(cars) +
    aes(x = Year,
        y = Horsepower,
        color = Origin,
        fill = Origin) +
    geom_point() +
    geom_smooth(se = FALSE, size = 2, span = 1)
R[write to console]: `geom_smooth()` using method = 'loess' and formula 'y ~ x'
../_images/Lecture7_48_1.png

If you wnat a linear regression instead of loess you can set the method to lm (linear model).

%%R
ggplot(cars) +
    aes(x = Year,
        y = Horsepower,
        color = Origin,
        fill = Origin) +
    geom_point() +
    geom_smooth(se = FALSE, size = 2, method = 'lm')
R[write to console]: `geom_smooth()` using formula 'y ~ x'
../_images/Lecture7_50_1.png

Confidence intervals#

To show the confidence interval of the points as a band, we can use mark_errorband. By default this mark show the standard deviation of the points, but we can change the extent to use bootstrapping on the sample data to construct the 95% confidence interval of the mean.

points.mark_errorband(extent='ci')
../_images/Lecture7_52_0.png

We can add in the mean line.

points.mark_errorband(extent='ci') + points.encode(y='mean(Horsepower)').mark_line()
../_images/Lecture7_54_0.png

We can use mark_errorbar to show the standard deviation or confidence interval around a single point.

alt.Chart(cars).mark_errorbar(extent='ci', rule=alt.LineConfig(size=2)).encode(
    x='Horsepower',
    y='Origin')
../_images/Lecture7_56_0.png

Also here, it is helpful to include an indication of the mean.

err_bars = alt.Chart(cars).mark_errorbar(extent='ci', rule=alt.LineConfig(size=2)).encode(
    x='Horsepower',
    y='Origin')

err_bars + err_bars.mark_point(color='black').encode(x='mean(Horsepower)')
../_images/Lecture7_58_0.png

An particularly usful visualization is to combine the above with an indication of the distribution of the data, e.g. as a faded violinplot in the background or as faded marks for all observations. This gives the reader a chance to study the raw data in addition to seeing the mean and its certainty.

err_bars = alt.Chart(cars).mark_errorbar(extent='ci', rule=alt.LineConfig(size=2)).encode(
    x='Horsepower',
    y='Origin')

(err_bars.mark_tick(color='lightgrey')
 + err_bars
 + err_bars.mark_point(color='black').encode(x='mean(Horsepower)'))
../_images/Lecture7_60_0.png

ggplot#

In ggplot, we can create confidence bands via geom_ribbon. Previously we have passed specific statistic summary functions to the fun parameter, but here we will use fun.data because we need both the lower and upper bond of where to plot the ribbon. Whereas fun only allows functions that return a single value which decides where to draw the point on the y-axis (such as mean), fun.data allows functions to return three values (the min, middle, and max y-value). The mean_cl_boot function is especially helpful here, since it returns the upper and lower bound of the bootstrapped CI (and also the mean value, but that is not used by geom_ribbon).

You need the Hmisc package installed in order to use mean_cl_boot, if you don’t nothing will show up but you wont get an error, so it can be tricky to realize what is wrong.

%%R -w 600
ggplot(cars) +
    aes(x = Year,
        y = Horsepower,
        color = Origin,
        fill = Origin) +
    geom_ribbon(stat = 'summary', fun.data = mean_cl_boot, alpha=0.5, color = NA)
    # `color = NA` removes the ymin/ymax lines and shows only the shaded filled area
../_images/Lecture7_63_0.png

We can add a line for the mean here as well.

%%R -w 600
ggplot(cars) +
    aes(x = Year,
        y = Horsepower,
        color = Origin,
        fill = Origin) +
    geom_line(stat = 'summary', fun = mean) +
    geom_ribbon(stat = 'summary', fun.data = mean_cl_boot, alpha=0.5, color = NA)
../_images/Lecture7_65_0.png

To plot the confidence interval around a single point, we can use geom_pointrange, which also plots the mean (so it uses all three values return from mean_cl_boot).

%%R
ggplot(cars) +
    aes(x = Horsepower,
        y = Origin) +
    geom_pointrange(stat = 'summary', fun.data = mean_cl_boot)
../_images/Lecture7_67_0.png

And finally we can plot the observations in the backgound here.

%%R
ggplot(cars) +
    aes(x = Horsepower,
        y = Origin) +
    geom_point(shape = '|', color='grey', size=5) +
    geom_pointrange(stat = 'summary', fun.data = mean_cl_boot, size = 0.7)
../_images/Lecture7_69_0.png
%%R
ggplot(cars) +
    aes(x = Horsepower,
        y = Origin) +
    geom_violin(color = NA, fill = 'steelblue', alpha = 0.4) +
    geom_pointrange(stat = 'summary', fun.data = mean_cl_boot, size = 0.7)
../_images/Lecture7_70_0.png

Figure composition in Altair#

Let’s create two figures to layout together. The titles here are a bit redundant, they’re just mean to facilitate spotting which figure goes where in the multi-panel figure.

from vega_datasets import data

cars = data.cars()

mpg_weight = alt.Chart(cars, title='x = mpg').mark_point().encode(
    x=alt.X('Miles_per_Gallon'),
    y=alt.Y('Weight_in_lbs'),
    color='Origin')
mpg_weight
../_images/Lecture7_73_0.png

If a variable is shared between two figures, it is a good idea to have it on the same axis. This makes it easier to compare the relationship with the previous plot.

hp_weight = alt.Chart(cars, title='x = hp').mark_point().encode(
    x=alt.X('Horsepower'),
    y=alt.Y('Weight_in_lbs'),
    color='Origin')
hp_weight
../_images/Lecture7_75_0.png

To concatenate plots vertically, we can use the ampersand operator.

mpg_weight & hp_weight
../_images/Lecture7_77_0.png

To concatenate horizontally, we use the pipe operator.

mpg_weight | hp_weight
../_images/Lecture7_79_0.png

To add an overall title to the figure, we can use the properties method. We need to surround the plots with a parentheis to show that we are using properties of the composed figures rather than just hp_weight one.

(mpg_weight | hp_weight).properties(title='Overall title')
../_images/Lecture7_81_0.png

In addition to & and |, we could use the functions vconcat and hconcat. You can use what you find the most convenient, this is how to add a title with one of those functions.

alt.hconcat(mpg_weight, hp_weight, title='Overall title')
../_images/Lecture7_83_0.png

We can also build up a figure with varying sizes for the different panels, e.g. adding marginal distribution plots to a scatter plot.

mpg_hist = alt.Chart(cars).mark_bar().encode(
    alt.X('Miles_per_Gallon', bin=True),
    y='count()').properties(height=100)
mpg_hist
../_images/Lecture7_85_0.png
weight_ticks = alt.Chart(cars).mark_tick().encode( 
    x='Origin',
    y='Weight_in_lbs',
    color='Origin')
weight_ticks
../_images/Lecture7_86_0.png
mpg_weight | weight_ticks
../_images/Lecture7_87_0.png

Just adding operations after each other can lead to the wrong grouping of the panels in the figure.

mpg_hist & mpg_weight | weight_ticks
../_images/Lecture7_89_0.png

Adding parenthesis can indicate how to group the different panels.

mpg_hist & (mpg_weight | weight_ticks)
../_images/Lecture7_91_0.png

Saving figures in Altair#

Saving as HTML ensures that any interactive features are still present in the saved file.

combo = mpg_hist & (mpg_weight | weight_ticks)
combo.save('combo.html')

It is also possible to save as non-interactive formats such as png, svg, and pdf. The png and pdf formats can be a bit tricky to getto work correctly, as they depend on other packages. Installation for these should have been taken care of via installing altair_saver, but there are additional instruction in the github repo in case that didn’t work.

# To save a file as png, simply do: 

# combo.save('combo.png')

The resolution/size of the saved image can be controlled via the scale_factor parameter.

#combo.save('combo-hires.png', scale_factor=3)

Figure composition in ggplot#

%%R
library(tidyverse)
%%R -i cars
mpg_weight <- ggplot(cars) +
    aes(x = Miles_per_Gallon,
        y = Weight_in_lbs,
        color = Origin) +
    geom_point()
mpg_weight
../_images/Lecture7_101_0.png
%%R -i cars
hp_weight <- ggplot(cars) +
    aes(x = Horsepower,
        y = Weight_in_lbs,
        color = Origin) +
    geom_point()
hp_weight
../_images/Lecture7_102_0.png

Laying out figures is not built into ggplot, but the functionality is added in separate packages. patchwork is similar to the operator-syntax we used with Altair, and cowplot works similar to the concatenation functions in Altair. Here I will be showing the latter, but you’re free to use either (“cow” are the author’s initials, Claus O Wilke, the same person who wrote Fundamentals of Data Visualization).

%%R -w 600 -h 300
library(cowplot)

plot_grid(hp_weight, mpg_weight)
../_images/Lecture7_104_0.png

Panels can easily be labeled.

%%R -w 600 -h 300
plot_grid(hp_weight, mpg_weight, labels=c('A', 'B'))
../_images/Lecture7_106_0.png

And this can even be automated.

%%R -w 600 -h 300
plot_grid(hp_weight, mpg_weight, labels='AUTO')
../_images/Lecture7_108_0.png

Let’s create a composite figure with marginal distribution plots.

%%R
ggplot(cars) +
    aes(x = Origin) +
    geom_bar()
../_images/Lecture7_110_0.png

If we were to present this barplot as a communication firgure, the bars should not be that wide. It is more visually appealing with narrower bars.

%%R -w 200
ggplot(cars) +
    aes(x = Origin) +
    geom_bar()
../_images/Lecture7_112_0.png
%%R -w 200
origin_count <- ggplot(cars) +
    aes(x = Origin) +
    geom_bar()
origin_count
../_images/Lecture7_113_0.png

To set the widths of figures in the composition plot, we can use rel_widths.

%%R -w 800
plot_grid(mpg_weight, origin_count)
../_images/Lecture7_115_0.png
%%R -w 800
plot_grid(mpg_weight, origin_count, rel_widths=c(3, 1))
../_images/Lecture7_116_0.png

It is not that nice to see the legend between the plots, so lets reorder them.

%%R -w 800
plot_grid(origin_count, mpg_weight, rel_widths=c(1, 3))
../_images/Lecture7_118_0.png

To concatenate vertically, we set the number of columns to 1.

%%R -w 400
plot_grid(origin_count, mpg_weight, ncol=1)
../_images/Lecture7_120_0.png

Finally, we can nest plot grids within each other.

%%R -w 400
top_row <- plot_grid(origin_count, origin_count)
plot_grid(top_row, mpg_weight, ncol=1, rel_heights=c(1,2))
../_images/Lecture7_122_0.png

There are some more tricks in the readme, including how to add a common title for the figures via ggdraw.

Saving figures in ggplot#

The ggsave functions saves the most recent plot to a file.

%%R
grid <- plot_grid(top_row, mpg_weight, ncol=1)
ggsave('grid.png')
R[write to console]: Saving 6.67 x 6.67 in image

You can also specify which figure to save.

%%R
ggsave('grid.png', grid)
R[write to console]: Saving 6.67 x 6.67 in image

Setting the dpi controls the resolution of the saved figure.

%%R
grid <- plot_grid(top_row, mpg_weight, ncol=1)
ggsave('grid2.png', dpi=96)
R[write to console]: Saving 6.67 x 6.67 in image

You can save to PDF and SVG as well.

%%R
grid <- plot_grid(top_row, mpg_weight, ncol=1)
ggsave('grid2.pdf')
R[write to console]: Saving 6.67 x 6.67 in image