bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git

commit d14e20a20f2afe981a1cc42fa35ba89f9cb5e4d6
parent 05c2058ec26cca2c35a07a03ad921963d04873cc
Author: Steffi LaZerte <steffi@steffi.ca>
Date:   Fri, 11 Oct 2024 09:46:59 -0500

Steffi's chp 19 edits (#74)

* Steffi's chp 19 edits

* Actually formula not required
Diffstat:
M 19_Quasiquotation.Rmd | 446 +++++++++++++++++++++++++++++++++++++++++--------------------------------------
1 file changed, 229 insertions(+), 217 deletions(-)

diff --git a/19_Quasiquotation.Rmd b/19_Quasiquotation.Rmd
@@ -1,9 +1,3 @@
-```{r, echo= FALSE, message=FALSE}
-library(rlang)
-library(purrr)
-```
-
-
 # Quasiquotation
 
 **Learning objectives:**
@@ -12,27 +6,36 @@ library(purrr)
 - Why it's important
 - Learn some practical uses
 
+```{r, message=FALSE}
+library(rlang)
+library(purrr)
+```
+
 ## Introduction
 
-- Three pillars of *tidy* evaluation
+Three pillars of *tidy* evaluation
+
   1. Quasiquotation
   2. Quosures (chapter 20)
   3. Data masks (Chapter 20)
 
-- Quasiquotation = quotation + unquotation:
-  - **Quote.** Capture unevaluated expression ...("defuse")
-  - **Unquote.** Except for selected parts which we do want to evaluate! ("inject")
-
-- Functions that use these features are said to use Non-standard evaluation (NSE)
+**Quasiquotation = quotation + unquotation**
 
+- **Quote.** Capture unevaluated expression... ("defuse")
+- **Unquote.** Evaluate selections of quoted expression! ("inject")
+- Functions that use these features are said to use Non-standard evaluation (NSE)
 - Note: related to Lisp macros, and also exists in other languages with Lisp heritage, e.g. Julia
 
-## Motivation
+> On it's own, Quasiquotation good for programming, but combined with other tools,
+> important for data analysis.
 
+## Motivation
 
 Simple *concrete* example:
 
-`Cement` is a function that works like `paste` but doesn't need need quotes:
+`cement()` is a function that works like `paste()` but doesn't need need quotes
+
+(Think of automatically adding 'quotes' to the arguments)
 
 ```{r}
 cement <- function(...) {
@@ -43,209 +46,251 @@ cement <- function(...) {
 cement(Good, morning, Hadley)
 ```
 
-What if we wanted to use variables ? This is where 'unquoting' comes in!
+What if we wanted to use variables? What is an object and what should be quoted?
+
+This is where 'unquoting' comes in!
 
 ```{r}
-name = "Bob"
-cement(Good, afternoon, !!name)
+name <- "Bob"
+cement(Good, afternoon, !!name) # Bang-bang!
 ```
 
+## Vocabulary {-}
 
-
-## Nonstandard evaluation {-}
+Can think of `cement()` and `paste()` as being 'mirror-images' of each other.
 
-* Functions like `dplyr::filter` use nonstandard evaluation, and quote some of their arguments to help make code more *tidy*.
+- `paste()` - define what to quote - **Evaluates** arguments
+- `cement()` - define what to unquote - **Quotes** arguments
 
-```{r}
-#| eval: FALSE
-# `cyl` is written as a bare name--a symbol defined in the global environment
-# but `cyl` only exists in the data frame "environment"
-# so, `{dplyr}` quotes the argument
-dplyr::filter(mtcars, cyl == 4)
+**Quoting function** similar to, but more precise than, **Non-standard evaluation (NSE)**
+
+- Tidyverse functions - e.g., `dplyr::mutate()`, `tidyr::pivot_longer()`
+- Base functions - e.g., `library()`, `subset()`, `with()`
+
+**Quoting function** arguments cannot be evaluated outside of function:
+```{r, error = TRUE}
+cement(Good, afternoon, Cohort) # No problem
+Good # Error!
 ```
 
-
-* You often can detect this if the argument wouldn't work in isolation, for example:
-```{r, eval = FALSE}
-library(MASS) # this is fine
-MASS
-#> Error: object MASS not found
+**Non-quoting (standard) function** arguments can be evaluated:
+```{r}
+paste("Good", "afternoon", "Cohort")
+"Good"
 ```
 
-and
-```{r, eval = FALSE}
-cyl
-#> Error: object 'cyl' not found
+## Quoting
+
+**Capture expressions without evaluating them**
+
+```{r, echo = FALSE}
+data.frame(
+  t = rep(c("One", "Many"), 3),
+  Developer = c("`expr()`","`exprs()`",
+                "`quote()`", "`substitute()`",
+                "", ""),
+  User = c("`enexpr()`", "`enexprs()`",
+           "`alist()`", "`as.list(substitute(...()))`",
+           "`ensym()`", "`ensyms()`"),
+  type = c("Expression", "Expression", "R Base", "R Base", "Symbol", "Symbol")) |>
+  dplyr::group_by(type) |>
+  gt::gt() |>
+  gt::tab_row_group(label = "R Base (Quotation)", rows = type == "R Base")|>
+  gt::tab_row_group(label = "Symbol (Quasiquotation)", rows = type == "Symbol") |>
+  gt::tab_row_group(label = "Expression (Quasiquotation)", rows = type == "Expression")|>
+  gt::cols_label(t = "") |>
+  gt::tab_options(row_group.font.weight = "bold") |>
+  gt::tab_style(style = gt::cell_text(align = "center", weight = "bold"),
+                locations = gt::cells_column_labels()) |>
+  gt::tab_style(style = gt::cell_borders(style = "hidden"), locations = gt::cells_body()) |>
+  gt::tab_style(style = gt::cell_borders(sides = "top", style = "solid"),
+                locations = gt::cells_body(rows = c(1, 3, 5))) |>
+  gt::tab_style(style = gt::cell_borders(sides = "bottom", style = "solid"),
+                locations = gt::cells_body(rows = c(2, 4))) |>
+  gt::cols_align("center", columns = -1) |>
+  gt::fmt_markdown() |>
+  gt::cols_width(t ~ px(100))
 ```
 
+- Non-base functions are from **rlang**
+- **Developer** - From you, direct, fixed, interactive
+- **User** - From the user, indirect, varying, programmatic
 
-## Quote
+Also:
 
-- Expression
+- `bquote()` provides a limited form of quasiquotation
+- `~`, the formula, is a quoting function (see [Section 20.3.4](https://adv-r.hadley.nz/evaluation.html#quosure-impl))
 
+### `expr()` and `exprs()` {-}
 
 ```{r}
-# for interactive use
-rlang::expr(x+y)
-
-# enexpr works on function arguments (looks at internal promise object)
-f2 <- function(x) rlang::enexpr(x)
-f2(a + b + c)
+expr(x + y)
+exprs(exp1 = x + y, exp2 = x * y)
 ```
 
-- To capture multiple arguments, use `enexprs()`
+### `enexpr()`^[`enexpr()` = **en**rich `expr()`] and `enexprs()` {-}
 
 ```{r}
-f <- function(...) enexprs(...)
-f(x=1, y= 10 *z)
-```
-
-
-- For symbols, there is `ensym` and `ensyms` which check that the argument is a symbol or string.
+f <- function(x) enexpr(x)
+f(a + b + c)
 
-## Base R method {-}
+f2 <- function(x, y) enexprs(exp1 = x, exp2 = y)
+f2(x = a + b, y = c + d)
+```
 
-* Base R methods do not support unquoting.
+### `ensym()` and `ensyms()` {-}
 
-* Base R equivalent of `expr` is `quote`
+- **[Remember](https://adv-r.hadley.nz/expressions.html#symbols):** Symbol represents the name of an object. Can only be length 1.
+- These are stricter than `enexpr/s()`
 
-* Base R equivalent of `enexpr` is `substitute` (note that `enexpr` uses `substitute`!)
+```{r}
+f <- function(x) ensym(x)
+f(a)
 
-```{r, eval = FALSE}
-enexpr
-#>function (arg)
-#>{
-#>    .Call(ffi_enexpr, substitute(arg), parent.frame())
-#>}
+f2 <- function(x, y) ensyms(sym1 = x, sym2 = y)
+f2(x = a, y = "b")
 ```
 
-* `bquote()` provides a limited form of quasiquotation, see section 19.5
+## Unquoting
 
-* `~`, the formula, is a quoting function, discussed in Section 20.3.4
+**Selectively evaluate parts of an expression**
 
-## Unquote
+- Merges ASTs with template
+- 1 argument `!!` (**unquote**, **bang-bang**)
+  - Unquoting a *function call* evaluates and returns results
+  - Unquoting a *function (name)* replaces the function (alternatively use `call2()`)
+- \>1 arguments `!!!` (**unquote-splice**, **bang-bang-bang**, **triple bang**)
+- `!!` and `!!!` only work like this inside quoting function using rlang
 
-- Unquoting allows you to merge together ASTs with selective evaluation.
+### Basic unquoting {-}
 
-- Use `!!` (*inject* operator)
-
-- One argument
+**One argument**
 
 ```{r}
-# quote `-1` as `x`
-x <- rlang::expr(-1)
-# unquote `x` to substitute its unquoted value
-# use bang-bang operator
-res = rlang::expr(f(!!x, y))
-print(res)
-lobstr::ast(!!res)
+x <- expr(a + b)
+y <- expr(c / d)
 ```
 
-- If the right-hand side of `!!` is a function call, it will evalute the function and insert the results.
+```{r, collapse = TRUE}
+expr(f(x, y)) # No unquoting
+expr(f(!!x, !!y)) # Unquoting
+```
 
+**Multiple arguments**
 ```{r}
-mean_rm <- function(var) {
-  var <- ensym(var)
-  expr(mean(!!var, na.rm = TRUE))
-}
-expr(!!mean_rm(x) + !!mean_rm(y))
-#> mean(x, na.rm = TRUE) + mean(y, na.rm = TRUE)
+z <- exprs(a + b, c + d)
+w <- exprs(exp1 = a + b, exp2 = c + d)
 ```
 
+```{r, collapse = TRUE}
+expr(f(z)) # No unquoting
+expr(f(!!!z)) # Unquoting
+expr(f(!!!w)) # Unquoting when named
+```
 
-- Multiple arguments, use `!!!` *Splice*
+### Special usages or cases {-}
 
-```{r}
-xs <- rlang::exprs(1, a, -b)
-# unquote multiple arguments
-# use bang-bang-bang operator
-res=expr(f(!!!xs, y))
-res
+For example, get the AST of an expression
+```{r, collapse = TRUE}
+lobstr::ast(x)
+lobstr::ast(!!x)
 ```
 
-```{r}
-lobstr::ast(!!res)
+
+
+Unquote *function call*
+```{r, collapse = TRUE}
+expr(f(!!mean(c(100, 200, 300)), y))
 ```
 
-## ... (dot-dot-dot)
+Unquote *function*
+```{r, collapse = TRUE}
+f <- expr(sd)
+expr((!!f)(x))
+expr((!!f)(!!x + !!y))
+```
 
-* !!! is also useful in other places where you have a list of expressions you want to insert into a call.
+## Non-quoting
 
-* Two motivating examples:
+Only `bquote()` provides a limited form of quasiquotation.
 
-List of dataframes you want to `rbind` (a list of arbitrary length)
+The rest of base selectively uses or does not use quoting (rather than unquoting).
 
-```{r}
-dfs <- list(
-  a = data.frame(x = 1, y = 2),
-  b = data.frame(x = 3, y = 4)
-)
-```
+Four basic forms of quoting/non-quoting:
 
-How to supply an argument name indirectly?
-
-```{r}
-var <- "x"
-val <- c(4, 3, 9)
-```
-
-
-* For the first one, we can use unquote (splice) in `dplyr::bind_rows``
+1. **Pair of functions** - Quoting and non-quoting
+   - e.g., `$` (quoting) and `[[` (non-quoting)
+2. **Pair of Arguments** - Quoting and non-quoting
+   - e.g., `rm(...)` (quoting) and `rm(list = c(...))` (non-quoting)
+3. **Arg to control quoting**
+   - e.g., `library(rlang)` (quoting) and `library(pkg, character.only = TRUE)` (where `pkg <- "rlang"`)
+4. **Quote if evaluation fails**
+   - `help(var)` - Quote, show help for var
+   - `help(var)` (where `var <- "mean"`) - No quote, show help for mean
+   - `help(var)` (where `var <- 10`) - Quote fails, show help for var
 
-```{r}
-dplyr::bind_rows(!!!dfs)
-```
 
-This is known 'splatting' in some other langauges (Ruby, Go, Julia). Python calls this argument unpacking (`**kwarg`)
+## ... (dot-dot-dot) [When using ... with quoting]
 
-* For the second we need to unquote the left side of an `=`. Tidy eval lets us do this with a special `:=`
+- Sometimes need to supply an *arbitrary* list of expressions or arguments in a function (`...`)
+- But need a way to use these when we don't necessarily have the names
+- Remember `!!` and `!!!` only work with functions that use rlang
+- Can use `list2(...)` to turn `...` into "tidy dots" which *can* be unquoted and spliced
+- Require `list2()` if going to be passing or using `!!` or `!!!` in `...`
+- `list2()` is a wrapper around `dots_list()` with the most common defaults
 
-```{r}
-tibble::tibble(!!var := val)
+**No need for `list2()`**
+```{r, collapse = TRUE}
+d <- function(...) data.frame(list(...))
+d(x = c(1:3), y = c(2, 4, 6))
 ```
 
-* Functions that have these capabilities are said to have *tidy dots* (or apparently now it is called *dynamic dots*). To get this capability in your own functions, use `list2`!
-
-## Example of `list2()` {-}
+**Require `list2()`**
+```{r, collapse = TRUE, error = TRUE}
+vars <- list(x = c(1:3), y = c(2, 4, 6))
+d(!!!vars)
+d2 <- function(...) data.frame(list2(...))
+d2(!!!vars)
+# Same result but x and y evaluated later
+vars_expr <- exprs(x = c(1:3), y = c(2, 4, 6))
+d2(!!!vars_expr)
+```
 
+Getting argument names (symbols) from variables
 ```{r}
-set_attr <- function(.x, ...) {
-  attr <- rlang::list2(...)
-  attributes(.x) <- attr
-  .x
-}
+nm <- "z"
+val <- letters[1:4]
+d2(x = 1:4, !!nm := val)
+```
 
-attrs <- list(x = 1, y = 2)
-attr_name <- "z"
+## `exec()` [Making your own ...] {-}
 
-1:10 %>%
-  set_attr(w = 0, !!!attrs, !!attr_name := 3) %>%
-  str()
-```
 
-### Exercise from 19.6.5 {-}
+What if your function doesn't have tidy dots?
 
-What is the problem here?
-```{r, eval=FALSE}
-set_attr <- function(x, ...) {
-  attr <- rlang::list2(...)
-  attributes(x) <- attr
-  x
+Can't use `!!` or `:=` if doesn't support rlang or dynamic dots
+```{r, collapse=TRUE, error = TRUE}
+my_mean <- function(x, arg_name, arg_val) {
+  mean(x, !!arg_name := arg_val)
 }
-set_attr(1:10, x = 10)
-#> Error in attributes(x) <- attr : attributes must be named
+
+my_mean(c(NA, 1:10), arg_name = "na.rm", arg_val = TRUE)
 ```
 
-## Exec {-}
+Let's use the ... from `exec()`
+```{r, eval = FALSE}
+exec(.fn, ..., .env = caller_env())
+```
 
-What about existing functions that don't support tidy dots? Use `exec`
 
-```{r}
-arg_name <- "na.rm"
-arg_val <- TRUE
-exec("mean", 1:10, !!arg_name := arg_val)
+```{r, collapse=TRUE}
+my_mean <- function(x, arg_name, arg_val) {
+  exec("mean", x, !!arg_name := arg_val)
+}
+
+my_mean(c(NA, 1:10), arg_name = "na.rm", arg_val = TRUE)
 ```
 
-Note that you do not unquote arg_val.
+Note that you do not unquote `arg_val`.
 
 Also `exec` is useful for mapping over a list of functions:
 
@@ -255,23 +300,18 @@ funs <- c("mean", "median", "sd")
 purrr::map_dbl(funs, exec, x, na.rm = TRUE)
 ```
 
-
-
-## dots_list {-}
-
-- `list2()` is a wrapper around `dots_list` with the most common defaults:
-
-  - `.ignore_empty` : Ignores any empty arguments, lets you use trailing commas in a list
-  - `.homonyms` : controls what happens when multiple arguments use the same name, `list2()` uses default of `keep`
-  - `.preserve_empty` controls what do so with empty arguments if they are not ignored.
-
 ## Base R `do.call` {-}
 
-`do.call(what, args)` . `what` is a function to call, `args` is a list of arguments to pass to the function.
+`do.call(what, args)`
 
-```{r}
-do.call("rbind", dfs)
+- `what` is a function to call
+- `args` is a list of arguments to pass to the function.
+
+```{r, collapse = TRUE}
+nrow(mtcars)
+mtcars3 <- do.call("rbind", list(mtcars, mtcars, mtcars))
+nrow(mtcars3)
 ```
 
 
@@ -286,76 +326,48 @@ exec_ <- function(f, ..., .env = caller_env()){
 }
 ```
 
-## Map-reduce example {-}
+## Case Studies (side note)
 
-Function that will return an expression corresponding to a linear model.
+Sometimes you want to run a bunch of models, without having to copy/paste each one.
 
-```{r}
-linear <- function(var, val) {
-
-  # capture variable as a symbol
-  var <- ensym(var)
-
-  # Create a list of symbols of the form var[[1]], var[[2], etc]
-  coef_name <- map(seq_along(val[-1]), ~ expr((!!var)[[!!.x]]))
+BUT, you also want the summary function to show the appropriate model call,
+not one with hidden variables (e.g., `lm(y ~ x, data = data)`).
 
-  # map over the coefficients and the names to create the terms
-  summands <- map2(val[-1], coef_name, ~ expr((!!.x * !!.y)))
-
-  # Dont forget the intercept
-  summands <- c(val[[1]], summands)
+We can achieve this by building expressions and unquoting as needed:
 
-  # Reduce!
-  reduce(summands, ~ expr(!!.x + !!.y))
-}
-
-linear(x, c(10, 5, -4))
-#> 10 + (5 * x[[1L]]) + (-4 * x[[2L]])
-```
-
-
-## Creating functions example {-}
+```{r, collapse = TRUE}
+library(purrr)
 
-* `rlang::new_function()` creates a function from its three components and supports tidy evaluation
+vars <- data.frame(x = c("hp", "hp"),
+                   y = c("mpg", "cyl"))
 
-* Alternative to function factories.
+x_sym <- syms(vars$x)
+y_sym <- syms(vars$y)
 
-Example:
-```{r}
-power <- function(exponent) {
-  new_function(
-    exprs(x = ),
-    expr({
-      x ^ !!exponent
-    }),
-    caller_env()
-  )
-}
-power(0.5)
-
+formulae <- map2(x_sym, y_sym, \(x, y) expr(!!y ~ !!x))
+formulae
+models <- map(formulae, \(f) expr(lm(!!f, data = mtcars)))
+summary(eval(models[[1]]))
 ```
 
-Another example, is `graphics::curve` that allows you to plot an expression without creating a function. It could be implemented like this:
-
-```{r}
-curve2 <- function(expr, xlim = c(0, 1), n = 100) {
-  expr <- enexpr(expr)
-  f <- new_function(exprs(x = ), expr)
+As a function:
+```{r, collapse = TRUE}
+lm_df <- function(df, data) {
+  x_sym <- map(df$x, as.symbol)
+  y_sym <- map(df$y, as.symbol)
+  data <- enexpr(data)
 
-  x <- seq(xlim[1], xlim[2], length = n)
-  y <- f(x)
-
-  plot(x, y, type = "l", ylab = expr_text(expr))
+  formulae <- map2(x_sym, y_sym, \(x, y) expr(!!y ~ !!x))
+  models <- map(formulae, \(f) expr(lm(!!f, !!data)))
+
+  map(models, \(m) summary(eval(m)))
 }
-curve2(sin(exp(4 * x)), n = 1000)
-```
-
-## Summary {-}
-
-* In this chapter we dove into non-standard evaluation with quasiquotation
+vars <- data.frame(x = c("hp", "hp"),
+                   y = c("mpg", "cyl"))
+lm_df(vars, data = mtcars)
+```
 
-* Quasiquotation is useful on its own but in the next chapter we will look at the `quosures` and `data masks` to unleash the full power of *tidy evaluation*!
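
Note on the case study added in this commit: the new "Non-quoting" section says `bquote()` provides a limited form of quasiquotation, so the same "build the call, then evaluate it" idea can probably be sketched in base R alone. The block below is an illustration, not part of the commit. The names `x_name`, `y_name`, `model_formula`, `model_call` and `fit` are hypothetical, and `.()` is the base-R counterpart of `!!` inside `bquote()`.

```r
# Base-R sketch (assumption: not from the commit) of building a model call
# with bquote(), where .() unquotes in the way !! does in rlang::expr().
x_name <- "hp"
y_name <- "mpg"

# Turn the strings into symbols and build the unevaluated call `mpg ~ hp`
model_formula <- bquote(.(as.name(y_name)) ~ .(as.name(x_name)))

# Insert that call into an lm() call, still without evaluating anything
model_call <- bquote(lm(.(model_formula), data = mtcars))
print(model_call)

# Evaluate only at the end, so the stored call names the real variables
fit <- eval(model_call)
print(fit$call)
```

Because the symbols are injected before evaluation, the call recorded in the fitted model should show `mpg ~ hp` rather than a placeholder variable, which is the property the case study is after.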