# Lecture 8: Tidy evaluation in R

**Note from Firas**

Our series of R lectures will be presented by Dr. Tiffany Timbers, the other option co-director of the Vancouver MDS program.

### What Metaprogramming lets you do in R

- write `library(purrr)` instead of `library("purrr")`
- enable `plot(x, sin(x))` to automatically label the axes with `x` and `sin(x)`
- create a model object via `lm(y ~ x1 + x2, data = df)`
- and much much more (that you will see in Data Wrangling as we explore the tidyverse)

### What is metaprogramming?

Code that writes code/code that mutates code.
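
As a small illustration of "code that writes code", the sketch below uses only base R. The object name `generated` is made up for this example: `call()` builds an unevaluated function call, we then modify it like any other object, and only afterwards run it with `eval()`.

```r
# Build the call `sum(1, 2, 3)` as a data structure instead of typing it.
generated <- call("sum", 1, 2, 3)
print(generated)

# Code that mutates code: swap the function being called.
generated[[1]] <- as.name("prod")
print(generated)

# Only now is the generated code actually run.
result <- eval(generated)
print(result)
```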

### Our narrow focus on metaprogramming for this course:

Tidy evaluation

### Why focus on tidy evaluation

In the rest of MDS you will be relying on functions from the tidyverse to do a lot of:
- data wrangling
- statistics
- data visualization

### The functions from the tidyverse are beautiful to use interactively:

In [1]:
library(gapminder)
library(dplyr)
head(gapminder)


Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union



country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Afghanistan,Asia,1952,28.801,8425333,779.4453
Afghanistan,Asia,1957,30.332,9240934,820.853
Afghanistan,Asia,1962,31.997,10267083,853.1007
Afghanistan,Asia,1967,34.02,11537966,836.1971
Afghanistan,Asia,1972,36.088,13079460,739.9811
Afghanistan,Asia,1977,38.438,14880372,786.1134


With base R:

In [2]:
gapminder[gapminder$country == "Canada" & gapminder$year == 1952, ]

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Canada,Americas,1952,68.75,14785584,11367.16


In the tidyverse:

In [None]:
filter(gapminder, country == "Canada", year == 1952)

### How does that even work?

- When functions like `filter` are called, evaluation of their arguments is delayed: the expressions you typed are captured rather than evaluated immediately

- The data frame is then temporarily given priority over the workspace (global environment) as the place where R looks up names

- As a result, R can find the relevant columns for the computation, while still falling back to the workspace for any other names

*This is referred to as data masking: the data masks the workspace*

### We can capture expressions and later evaluate the code:

We will use functions from base R and rlang to demonstrate this:

In [6]:
library(rlang)
x <- 10
y <- expr(x)

In [7]:
eval(y)

### We can also manipulate the environment code is evaluated in:

In [9]:
x <- 10
y <- 2
expr(x + y)
eval(expr(x + y), env(x = 1000))

x + y

### You can evaluate in a data mask:

In [11]:
#head(gapminder)

In [12]:
# column names are not objects in the workspace, so this fails:
gdpPercap * pop

ERROR: Error in eval(expr, envir, enclos): object 'gdpPercap' not found


In [64]:
x <- expr(gdpPercap * pop)
x

gdpPercap * pop

In [66]:
eval(x, gapminder)

The **data mask** allows you to mingle variables from an environment and a data frame in a single expression. 

In [15]:
y <- 1000
new_exp <- expr(pop / y)

eval(new_exp, gapminder)
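
To make the lookup order concrete, here is a minimal sketch using base R's `eval()` on a tiny made-up data frame (`df` and its values are assumptions for illustration, not gapminder). It shows that names are looked up in the data frame first and in the environment second, so a column wins when the same name exists in both places.

```r
# A tiny stand-in for gapminder (values are made up).
df <- data.frame(pop = c(100, 200, 300))

# `y` exists only in the environment, `pop` only in the data frame.
y <- 10
masked <- eval(quote(pop / y), df)
print(masked)

# When a name exists in both places, the data frame column masks the variable.
pop <- 1
col_wins <- eval(quote(pop * 2), df)
print(col_wins)
```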

### Back to our example:

What is going on here?

- code evaluation is delayed 
- the `filter` function quotes columns `country` and `year`
- the `filter` function then creates a data mask (to mingle variables from the environment and the data frame)
- the columns `country` and `year` are unquoted and evaluated within the data mask

In [None]:
filter(gapminder, country == "Canada", year == 1952)
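
The steps listed above can be sketched with a toy function written in base R. This is a simplification under stated assumptions, not how `dplyr::filter` is actually implemented (dplyr relies on rlang's quosures and data masks); `my_filter`, `df` and `target` are hypothetical names used only here.

```r
my_filter <- function(data, condition) {
    # Quote: capture the caller's expression instead of its value.
    cond_expr <- substitute(condition)
    # Evaluate it with `data` as the mask; names that are not columns
    # are looked up in the caller's environment.
    keep <- eval(cond_expr, data, parent.frame())
    data[keep, , drop = FALSE]
}

df <- data.frame(country = c("Canada", "Chile", "Canada"),
                 year = c(1952, 1952, 1957))
target <- 1952   # lives in the environment, not in `df`

res <- my_filter(df, country == "Canada" & year == target)
print(res)
```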

### Trade off of lovely interactivity of tidyverse functions...

### programming with them can be more challenging.

Let's try writing a function which wraps filter for gapminder:

In [16]:
#filter(gapminder, country == "Canada")

filter_gap <- function(col, val) {
    filter(gapminder, col == val)
}

filter_gap(country, "Canada")

ERROR: Error: object 'country' not found


Why does `filter` work with non-quoted variable names, but our function `filter_gap` fails?

### Proper way of defining this function:

Use `enquo` to quote the column names, and then `!!` to unquote them in context.

In [17]:
filter_gap <- function(col, val) {
    col <- enquo(col)
    filter(gapminder, !!col == val)
}

filter_gap(country, "Canada")

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Canada,Americas,1952,68.75,14785584,11367.16
Canada,Americas,1957,69.96,17010154,12489.95
Canada,Americas,1962,71.3,18985849,13462.49
Canada,Americas,1967,72.13,20819767,16076.59
Canada,Americas,1972,72.88,22284500,18970.57
Canada,Americas,1977,74.21,23796400,22090.88
Canada,Americas,1982,75.76,25201900,22898.79
Canada,Americas,1987,76.86,26549700,26626.52
Canada,Americas,1992,77.95,28523502,26342.88
Canada,Americas,1997,78.61,30305843,28954.93


### Evaluating functions and quoting functions in R

- differ in the way they get their arguments

- evaluating functions take arguments as values

- a quoting function is not passed the value of an expression, it is passed the expression itself

### Evaluating functions 

- take arguments as values:

In [18]:
identity(6)

In [19]:
identity(2 * 3)

In [20]:
a <- 2
b <- 3
identity(a * b)

### Quoting functions 

- take the expression itself, not the value

In [25]:
typeof(quote(6))

In [26]:
typeof(quote(2 * 3))

In [27]:
typeof(quote(a * b))
identity(quote(a * b))

a * b

You get the code! Not the value!
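
The same contrast can be reproduced with two one-line functions of our own. This is a minimal base R sketch; `evaluating` and `quoting` are made-up names, and `substitute()` is the base R tool for capturing the expression a caller supplied.

```r
evaluating <- function(x) x               # receives the value of its argument
quoting    <- function(x) substitute(x)   # receives the expression itself

a <- 2
b <- 3

v <- evaluating(a * b)
q <- quoting(a * b)

print(v)         # a number
print(q)         # the code that was typed
print(eval(q))   # evaluating the captured code later gives the value
```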

### Not always one or the other:

In practice, some functions take a mix of evaluated and quoted arguments, for example:

```select(iris, Species)``` 

Here `iris` is an evaluated argument, and `Species` is a quoted argument.

### How can you tell if an argument is quoted?

A quoted argument will not work correctly outside of its original context, and ordinary indirect references to it do not work. Some examples:

```library(dplyr)```


In [28]:
temp <- dplyr

ERROR: Error in eval(expr, envir, enclos): object 'dplyr' not found


In [29]:
temp <- "dplyr"
library(temp)

ERROR: Error in library(temp): there is no package called ‘temp’


We get these errors because there is no `dplyr` object for R to find, and `temp` is interpreted by `library` directly as a package name rather than as an indirect reference.
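
Quoting functions often provide an escape hatch for indirect references. For `library`, that is the `character.only` argument. The sketch below uses the `stats` package only because it ships with every R installation; `pkg` is a made-up variable name.

```r
# Store the package name in a variable (an indirect reference).
pkg <- "stats"

# character.only = TRUE tells library() to use the value of `pkg`
# instead of treating the symbol `pkg` as the package name.
library(pkg, character.only = TRUE)

attached <- paste0("package:", pkg) %in% search()
print(attached)
```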

```filter(mtcars, cyl == 4)```

In [30]:
temp <- cyl == 4

ERROR: Error in eval(expr, envir, enclos): object 'cyl' not found


In [31]:
temp <- "cyl" == 4
filter(mtcars, temp)

mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>


R cannot find `cyl` because we haven't told it where to look: this object exists only inside the `mtcars` data frame. In the second attempt, `"cyl" == 4` compares a string to a number, so `temp` is simply `FALSE`. When we pass `temp` to `filter`, there is no column called `temp`, so the value `FALSE` is picked up from the workspace and no rows are returned.

```sum(mtcars$am)```

In [32]:
temp <- mtcars$am

In [33]:
sum(temp)

It worked! `sum()` is an evaluating function and the indirect reference was resolved in the ordinary way.

### Challenge 1

Which of the function arguments in the function call below are quoted? Which are evaluated?

```arrange(mtcars, cyl)```

### Let's try some quoting and unquoting so we can use indirect references in a dplyr function:

Unquoting is accomplished using the `!!` (pronounced "bang bang") operator.

In [38]:
col1 <- quote(country)
val1 <- "Canada"
col2 <- quote(year)
val2 <- 2007

typeof(col1)
col2

year

In [35]:
filter(gapminder, !!col1 == val1, !!col2 == val2)

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Canada,Americas,2007,80.653,33390141,36319.24


### Viewing the unquoted expression:

The `qq_show` function from the `rlang` package performs unquoting and prints the result to the screen:

In [39]:
qq_show(filter(gapminder, !!col1 == val1, !!col2 == val2))

filter(gapminder, country == val1, year == val2)
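
If you want the unquoted expression as an object rather than printed to the screen, `expr()` from `rlang` performs the same unquoting. The sketch below is an illustration with made-up result names (`only_col`, `both`); it also shows that `!!` can unquote a value as well as a column name.

```r
library(rlang)

col1 <- quote(country)
val1 <- "Canada"

# Unquote only the column: `val1` stays as a name to be looked up later.
only_col <- expr(!!col1 == val1)
print(only_col)

# Unquote both sides: the value of `val1` is inlined into the expression.
both <- expr(!!col1 == !!val1)
print(both)
```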


### Challenge 2

Re-write the code below using quoting and unquoting so that you can create two variables, `var_1` and `var_2`, to indirectly reference the column names in this function call. Also use `rlang::qq_show` to check your expression.

```arrange(mtcars, hp, mpg)```

In [41]:
var_1 <- quote(hp)
var_2 <- quote(mpg)

qq_show(arrange(mtcars, !!var_1, !!var_2))

arrange(mtcars, hp, mpg)


### `enquo` vs `quote` when writing a function

- `quote` quotes what you typed

- `enquo` quotes what your user typed (i.e., it makes a function argument automatically quote its input)

In [43]:
filter_gap <- function(col, val) {
    col <- enquo(col)
    filter(gapminder, !!col == val)
}

filter_gap(country, "Canada")

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Canada,Americas,1952,68.75,14785584,11367.16
Canada,Americas,1957,69.96,17010154,12489.95
Canada,Americas,1962,71.3,18985849,13462.49
Canada,Americas,1967,72.13,20819767,16076.59
Canada,Americas,1972,72.88,22284500,18970.57
Canada,Americas,1977,74.21,23796400,22090.88
Canada,Americas,1982,75.76,25201900,22898.79
Canada,Americas,1987,76.86,26549700,26626.52
Canada,Americas,1992,77.95,28523502,26342.88
Canada,Americas,1997,78.61,30305843,28954.93
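
To see the difference directly, the sketch below calls both from inside a function and inspects what was captured. The helper names `with_quote` and `with_enquo` are hypothetical; `quo_get_expr()` is used here only to pull the expression out of the quosure that `enquo` returns.

```r
library(rlang)

# quote() captures what the function author typed: the symbol `col`.
with_quote <- function(col) quote(col)

# enquo() captures what the user passed in for `col`.
with_enquo <- function(col) quo_get_expr(enquo(col))

q1 <- with_quote(country)
q2 <- with_enquo(country)

print(q1)
print(q2)
```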


### Quote - unquote pattern all in one step: interpolation

- Version 0.4.0 of `rlang` introduced the `{{` (pronounced "curly curly") operator.

- It does the same thing as `enquo` and `!!` but is (hopefully) easier to use.

In [44]:
filter_gap <- function(col, val) {
    filter(gapminder, {{col}} == val)
}

filter_gap(country, "Canada")

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Canada,Americas,1952,68.75,14785584,11367.16
Canada,Americas,1957,69.96,17010154,12489.95
Canada,Americas,1962,71.3,18985849,13462.49
Canada,Americas,1967,72.13,20819767,16076.59
Canada,Americas,1972,72.88,22284500,18970.57
Canada,Americas,1977,74.21,23796400,22090.88
Canada,Americas,1982,75.76,25201900,22898.79
Canada,Americas,1987,76.86,26549700,26626.52
Canada,Americas,1992,77.95,28523502,26342.88
Canada,Americas,1997,78.61,30305843,28954.93
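
The same pattern extends to functions with more than one quoted argument. The following is a minimal sketch on a made-up data frame (`scores`, `mean_by` and the column names are assumptions for illustration), wrapping `group_by` and `summarise`.

```r
library(dplyr)

# Small made-up data set so the example is self-contained.
scores <- data.frame(team = c("a", "a", "b"),
                     points = c(1, 3, 10))

# Each {{ }} forwards one user-supplied column to a dplyr verb.
mean_by <- function(data, group, value) {
    summarise(group_by(data, {{ group }}), avg = mean({{ value }}))
}

out <- mean_by(scores, team, points)
print(out)
```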


### Creating functions that handle unquoted column names:

Use `enquo` + `!!` as shown earlier, or do it all in one step with the curly curly `{{` operator:

In [None]:
filter_gap <- function(col, val) {
    filter(gapminder, {{col}} == val)
}

filter_gap(country, "Canada")

### Creating functions that handle column names as strings:

Sometimes you want to pass a column name into a function as a string (often useful when you are programming and have the column names as a character vector).

You can do this by using symbols + unquoting with `sym` + `!!` :

In [None]:
# example of what we want to wrap: filter(gapminder, country == "Canada")
filter_gap <- function(col, val) {
    col <- sym(col)
    filter(gapminder, !!col == val)
}

filter_gap("country", "Canada")
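
An alternative for string column names is the `.data` pronoun, which dplyr makes available inside data-masking verbs. The sketch below is one possible variant on a made-up data frame; `filter_df` and `df` are hypothetical names, not part of the lecture code.

```r
library(dplyr)

df <- data.frame(country = c("Canada", "Chile", "Canada"),
                 year = c(1952, 1952, 1957),
                 stringsAsFactors = FALSE)

# `.data[[col]]` looks up the column whose name is stored in the string `col`.
filter_df <- function(data, col, val) {
    filter(data, .data[[col]] == val)
}

res <- filter_df(df, "country", "Canada")
print(res)
```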

### Another operator is needed when assigning values...

- `:=` must be used in place of `=` when the name on the left-hand side is unquoted:

In [61]:
library(rlang)

In [71]:
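r <- quote(c(1, 2, 3))   # capture the call without evaluating it
eval(r)                  # evaluate the captured call
{{c(1, 2, 3)}}           # outside a tidy eval function, `{{` is just two nested braces

?eval_tidy               # help page for rlang's data-mask-aware version of eval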

In [63]:
x <- quote(mpg)
typeof(x)
select(mtcars, !!x)

,mpg
,<dbl>
Mazda RX4,21.0
Mazda RX4 Wag,21.0
Datsun 710,22.8
Hornet 4 Drive,21.4
Hornet Sportabout,18.7
Valiant,18.1
Duster 360,14.3
Merc 240D,24.4
Merc 230,22.8
Merc 280,19.2


In [48]:
old_col <- quote(mpg)
new_col <- quote(kml)

mutate(mtcars, !!new_col := !!old_col * 0.425144)

# equivalent to: mutate(mtcars, kml = mpg * 0.425144)

mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb,kml
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4,8.928024
21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4,8.928024
22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1,9.693283
21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1,9.098082
18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2,7.950193
18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1,7.695106
14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4,6.079559
24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2,10.373514
22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2,9.693283
19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4,8.162765
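
The reason `:=` exists is that R's parser rejects `!!new_col = ...` as an argument name, so `rlang` provides `:=` as a stand-in for `=`. It also accepts a string on the left-hand side, which is handy when the new column name is stored in a character variable. A minimal sketch on a made-up data frame (`cars_small` and `new_name` are assumptions for illustration):

```r
library(dplyr)

cars_small <- data.frame(mpg = c(21, 30))

# The new column name is held in a string.
new_name <- "kml"

# `:=` lets the unquoted string be used as the column name.
out <- mutate(cars_small, !!new_name := mpg * 0.425144)
print(out)
```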


### Pass the dots when you can

If you are only passing variables on to a tidyverse function, and those variables are not used in logical comparisons or in variable assignment, you can get away with passing the dots:

In [49]:
sort_gap <- function(...) {
    arrange(gapminder, ...)
}

sort_gap(year)

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Afghanistan,Asia,1952,28.801,8425333,779.4453
Albania,Europe,1952,55.230,1282697,1601.0561
Algeria,Africa,1952,43.077,9279525,2449.0082
Angola,Africa,1952,30.015,4232095,3520.6103
Argentina,Americas,1952,62.485,17876956,5911.3151
Australia,Oceania,1952,69.120,8691212,10039.5956
Austria,Europe,1952,66.800,6927772,6137.0765
Bahrain,Asia,1952,50.939,120447,9867.0848
Bangladesh,Asia,1952,37.484,46886859,684.2442
Belgium,Europe,1952,68.000,8730405,8343.1051


### Notes on passing the dots

- the dots should usually be the last function argument: any argument that comes after `...` can only be matched by name
- they are useful because you can pass along any number of arguments

For example:

In [None]:
sort_gap <- function(..., x) {
    print(x + 1)
    arrange(gapminder, ...)
}

# fails: `2` is absorbed by `...`, so `x` is missing (it would need `x = 2`)
sort_gap(year, continent, country, 2)

In [None]:
sort_gap <- function(x, ...) {
    print(x + 1)
    arrange(gapminder, ...)
}

sort_gap(1, year, continent, country)
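
The matching rule behind these two cells can be checked in isolation with base R. In this sketch, `count_dots` is a made-up helper: an argument placed after `...` is only filled when it is supplied by name, otherwise the value ends up in the dots.

```r
# `x` comes after the dots and has a default of 0.
count_dots <- function(..., x = 0) {
    length(list(...)) + x
}

# Positional: 100 is absorbed by `...`, so there are 4 dots and x stays 0.
positional <- count_dots(1, 2, 3, 100)

# Named: x is matched by name, leaving 3 values in the dots.
named <- count_dots(1, 2, 3, x = 100)

print(positional)
print(named)
```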

### Pass the dots is not always the solution...

In [52]:
library(dplyr)
select_n_change <- function(data, ...) {
    out <- select(data, ...)
    mutate(out, ... := ... / mean(...))
}

select_n_change(mtcars, mpg, cyl, disp, hp)

ERROR: Error: object 'mpg' not found


### When passing in different column names to different functions, use quote & unquote:

In [53]:
select_n_change <- function(data, col_range, col_to_change) {
    out <- select(data, {{col_range}})
    mutate(out, {{col_to_change}} := {{col_to_change}} / mean({{col_to_change}}))
}

select_n_change(mtcars, mpg:hp, mpg)

mpg,cyl,disp,hp
<dbl>,<dbl>,<dbl>,<dbl>
1.0452636,6,160.0,110
1.0452636,6,160.0,110
1.1348577,4,108.0,93
1.0651734,6,258.0,110
0.9307824,8,360.0,175
0.9009177,6,225.0,105
0.7117748,8,360.0,245
1.2144968,4,146.7,62
1.1348577,4,140.8,95
0.9556696,6,167.6,123


### Combine quoting & unquoting with pass the dots:

In [54]:
select_n_change <- function(data, col_to_change, ...) {
    out <- select(data, ...)
    mutate(out, {{col_to_change}} := {{col_to_change}} / mean({{col_to_change}}))
}

select_n_change(mtcars, mpg, mpg, drat, carb)

mpg,drat,carb
<dbl>,<dbl>,<dbl>
1.0452636,3.9,4
1.0452636,3.9,4
1.1348577,3.85,1
1.0651734,3.08,1
0.9307824,3.15,2
0.9009177,2.76,1
0.7117748,3.21,4
1.2144968,3.69,2
1.1348577,3.92,2
0.9556696,3.92,4



### What did we learn?
- quoting and unquoting
- data masking
- without quoting and unquoting function arguments for quoting functions, R will attempt to evaluate the arguments, potentially out of context.
- introduction to the tidyverse

### Attribution:

- [Tidy evaluation](https://tidyeval.tidyverse.org/) by Lionel Henry & Hadley Wickham
- [Tidy eval in context](https://speakerdeck.com/jennybc/tidy-eval-in-context)  talk by Jenny Bryan
- [Programming in the tidyverse](https://dplyr.tidyverse.org/articles/programming.html) 
