bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git
Log | Files | Refs | README | LICENSE

commit 8ba1eb56584403feeaf279de57d73e1279d3dc6a
parent b3d19774aaad49c2c92942d528bde2bf87843f7f
Author: Collin K. Berke, Ph.D <32435546+collinberke@users.noreply.github.com>
Date:   Mon, 24 Apr 2023 17:41:25 -0500

Update Chapter 18 notes for cohort 7 (#52)

* Edit notes

* Continue to edit notes

---------

Co-authored-by: Jon Harmon <jonthegeek@gmail.com>
Diffstat:
M18_Expressions.Rmd | 656+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 541 insertions(+), 115 deletions(-)

diff --git a/18_Expressions.Rmd b/18_Expressions.Rmd @@ -2,68 +2,81 @@ **Learning objectives:** -- We are going to understand the idea of the abstract syntax tree (AST) which is the tree like structure that underlie all R code. - -- We are going to understand the data structures that underlie the AST such as constants, symbols and calls which are collectively known as expression. - -- We are going to explore the idea behind parsing, the act of converting the linear sequence of character in code into the AST, and we will use that idea to explore some details of R’s grammar. - -- We are going to see how we are going to use recursive functions to compute on the language, i.e. writting functions that compute with expressions. - -- We are going to see how to work with three more specialized data structures: **pairlists**, **missing arguments**, and **expression vectors**. - -```{r,message=FALSE,warning=FALSE} +* Understand the idea of the abstract syntax tree (AST). +* Discuss the data structures that underlie the AST: + * Constants + * Symbols + * Calls +* Explore the idea behind parsing. +* Explore some details of R's grammar. +* Discuss the use or recursive functions to compute on the language. +* Work with three other more specialized data structures: + * Pairlists + * Missing arguments + * Expression vectors + +```{r, message = FALSE, warning = FALSE} library(rlang) library(lobstr) ``` ## Introduction -- To compute on the language, we first need to understand its structure. This requires some new vocabulary, some new tools, and some new ways of thinking about R code. The first of these is the distinction between an operation and its result. +> To compute on the language, we first need to understand its structure. +* This requires a few things: + * New vocabulary. + * New tools to inspect and modify expressions. + * Approach the use of the language with new ways of thinking. +* One of the first new ways of thinking is the distinction between an operation and its result. -```{r,error=TRUE} +```{r, error = TRUE} y <- x * 10 ``` -- We can capture the intent of the code without executing it using the rlang package. + +* We can capture the intent of the code without executing it using the rlang package. ```{r} z <- rlang::expr(y <- x * 10) z ``` -- We can the evaluate the expression using the **base::eval** function. -```{r} +* We can then evaluate the expression using the **base::eval** function. + +```{r} x <- 4 -base::eval(expr =(y <- x * 10)) +base::eval(expr(y <- x * 10)) + +y ``` -**Evaluating multiple expressions** +### Evaluating multiple expressions + +* We can write multiple expressions at once and it acts similar to `source()`. -- We can write multiple expressions at once & it acts similar to source() +* `expression()` returns a vector and can be passed to `eval()`. -- expression() returns a vector and can be passed to eval() ```{r} -eval(expression( -x <- 4, -x * 10 -)) +eval(expression(x <- 4, x * 10)) ``` -exprs() does not and has to be used in a loop +`exprs()` does not and has to be used in a loop + ```{r} -for (i in exprs(x <- 4,x * 10)) { +for (i in exprs(x <- 4, x * 10)) { print(i) print(eval(i)) } ``` -## Abstract Syntact Tree (AST) - -- Expression is an object that captures the structure of code without evaluating it. -- Expressions are also called abstract syntax trees (ASTs) because the structure of code is hierarchical and can be naturally represented as a tree. Understanding this tree structure is crucial for inspecting and modifying expressions. +## Abstract Syntact Tree (AST) +* Expressions are objects that capture the structure of code without evaluating it. +* Expressions are also called abstract syntax trees (ASTs) because the structure of code is hierarchical and can be naturally represented as a tree. +* Understanding this tree structure is crucial for inspecting and modifying expressions. + * Branches = Calls + * Leaves = Symbols and constants ```{r,eval=FALSE} f(x, "y", 1) @@ -71,14 +84,13 @@ f(x, "y", 1) ![](images/simple.png) - -**With lobstr::ast():** +### With `lobstr::ast():` ```{r} lobstr::ast(f(x, "y", 1)) ``` -- Some function might also contain more call like the example below: +* Some functions might also contain more calls like the example below: ```{r,eval=FALSE} f(g(1, 2), h(3, 4, i())): @@ -88,107 +100,137 @@ f(g(1, 2), h(3, 4, i())): ```{r} lobstr::ast(f(g(1, 2), h(3, 4, i()))) ``` -- We can read the **hand-drawn diagrams** from left-to-right (ignoring vertical position), and the **lobstr-drawn diagrams** from top-to-bottom (ignoring horizontal position). - -- The depth within the tree is determined by the nesting of function calls. This also determines evaluation order, **as evaluation generally proceeds from deepest-to-shallowest, but this is not guaranteed because of lazy evaluation**. +* Read the **hand-drawn diagrams** from left-to-right (ignoring vertical position) +* Read the **lobstr-drawn diagrams** from top-to-bottom (ignoring horizontal position). +* The depth within the tree is determined by the nesting of function calls. +* Depth also determines evaluation order, **as evaluation generally proceeds from deepest-to-shallowest, but this is not guaranteed because of lazy evaluation**. ## Infix calls -- Infix calls are function in which the function names comes in between the arguments while the prefix calls the function names comes before it arguments. - -- Every call in R can be written in tree form because any call can be written in prefix form. Take **y <- x * 10** again: what are the functions that are being called? It is not as easy to spot as f(x, 1) because this expression contains **two infix calls:** **<-** and *. That means that these two lines of code are equivalent: +> Every call in R can be written in tree form because any call can be written in prefix form. -```{r,eval=FALSE} +```{r eval=FALSE} y <- x * 10 - `<-`(y, `*`(x, 10)) ``` -![](images/prefix.png) +* Since this is a characteristic of the language, regardless if it's a function written in infix or prefix form, all function calls can be represented using an AST. +![](images/prefix.png) ```{r} lobstr::ast(y <- x * 10) ``` -- There is no difference between the ASTs, and if you generate an expression with prefix calls, R will still print it in infix form: +```{r} +lobstr::ast(`<-`(y, `*`(x, 10))) +``` + +* There is no difference between the ASTs, and if you generate an expression with prefix calls, R will still print it in infix form: ```{r} rlang::expr(`<-`(y, `*`(x, 10))) ``` -**18.2.4 Exercises** + +## Expression components + +* Collectively, the data structures present in the AST are called expressions. +* These include: + 1. Constants + 2. Symbols + 3. Calls + 4. Pairlists ## Constants -- Scalar constants are the simplest component of the AST. -- A constant is either **NULL** or a **length-1** atomic vector (or scalar) like TRUE, 1L, 2.5 or "x". -- We can test for a constant with **rlang::is_syntactic_literal()**. +* Scalar constants are the simplest component of the AST. +* A constant is either **NULL** or a **length-1** atomic vector (or scalar) + * e.g., `TRUE`, `1L`, `2.5` or `x`. +* We can test for a constant with **rlang::is_syntactic_literal()**. +* Constants are self-quoting in the sense that the expression used to represent a constant is the same constant: -- Constants are self-quoting in the sense that the expression used to represent a constant is the same constant: ```{r} identical(expr(TRUE), TRUE) -``` -```{r} +# [1] TRUE identical(expr(1), 1) -``` -```{r} +# [1] TRUE identical(expr(2L), 2L) -``` -```{r} +# [1] TRUE identical(expr("x"), "x") +# [1] TRUE ``` ## Symbols -- A symbol represents the name of an object like **x**, **mtcars**, or **mean**. In base R, the terms symbol and name are used interchangeably (i.e. **is.name()** is identical to **is.symbol())**, but this book used symbol consistently because **“name”** has many other meanings. -- You can create a symbol in two ways: by capturing code that references an object with **expr()**, or turning a string into a symbol with **rlang::sym():** +* A symbol represents the name of an object. + * `x` + * `mtcars` + * `mean`. +* In base R, the terms symbol and name are used interchangeably (i.e. `is.name()` is identical to `is.symbol()`), but this book used symbol consistently because **“name”** has many other meanings. +* You can create a symbol in two ways: + 1. by capturing code that references an object with `expr()`. + 2. turning a string into a symbol with `rlang::sym():`. + ```{r} expr(x) ``` + ```{r} sym("x") ``` -- Symbol can be turned back into a string with **as.character()** or **rlang::as_string()**. as_string() has the advantage of clearly signalling that you’ll get a character vector of length 1. + +* A symbol can be turned back into a string with `as.character()` or `rlang::as_string()`. +* `as_string()` has the advantage of clearly signalling that you’ll get a character vector of length 1. ```{r} as_string(expr(x)) ``` -- We can recognise a symbol because it’s printed without quotes, **str()** tells you that it’s a symbol, and **is.symbol()** is TRUE: + +* We can recognise a symbol because it’s printed without quotes +* `str()` tells you that it’s a symbol, and `is.symbol()` is TRUE: ```{r} str(expr(x)) ``` + ```{r} is.symbol(expr(x)) ``` -- The symbol type is not vectorised, i.e. a symbol is always length 1. If you want multiple symbols, you’ll need to put them in a list, using **rlang::syms()**. - -## Calls - -- A call object represents a captured function call. -- Call objects are a special type of list90 where the first component specifies the function to call (usually a symbol), and the remaining elements are the arguments for that call. +* The symbol type is not vectorised, i.e. a symbol is always length 1. +* If you want multiple symbols, you’ll need to put them in a list, using `rlang::syms()`. -- Call objects create branches in the AST, because calls can be nested inside other calls. +## Calls -- You can identify a call object when printed because it looks just like a function call. Confusingly **typeof()** and **str()** print **language 91** for call objects, but **is.call()** returns TRUE: +* A call object represents a captured function call. +* Call objects are a special type of list. + * The first component specifies the function to call (usually a symbol). + * The remaining elements are the arguments for that call. +* Call objects create branches in the AST, because calls can be nested inside other calls. +* You can identify a call object when printed because it looks just like a function call. +* Confusingly `typeof()` and `str()` print language for call objects, but `is.call()` returns TRUE: ```{r} lobstr::ast(read.table("important.csv", row.names = FALSE)) ``` + ```{r} x <- expr(read.table("important.csv", row.names = FALSE)) ``` + ```{r} typeof(x) ``` + ```{r} is.call(x) ``` ## Subsetting -- Calls generally behave like lists, i.e. you can use standard subsetting tools. The first element of the call object is the function to call, which is usually a symbol: +* Calls generally behave like lists. +* Since they are list like, you can use standard subsetting tools. +* The first element of the call object is the function to call, which is usually a symbol: ```{r} x[[1]] @@ -197,48 +239,61 @@ x[[1]] ```{r} is.symbol(x[[1]]) ``` -- The remainder of the elements are the arguments: +* The remainder of the elements are the arguments: ```{r} as.list(x[-1]) ``` -- We can extract individual arguments with [[ or, if named, $: +* We can extract individual arguments with [[ or, if named, $: + ```{r} x[[2]] ``` + ```{r} x$row.names ``` -- We can determine the number of arguments in a call object by subtracting 1 from its length: + +* We can determine the number of arguments in a call object by subtracting 1 from its length: + ```{r} length(x) - 1 ``` -- Extracting specific arguments from calls is challenging because of R’s flexible rules for argument matching: it could potentially be in any location, with the full name, with an abbreviated name, or with no name. -- To work around this problem, you can use **rlang::call_standardise()** which standardises all arguments to use the full name: +* Extracting specific arguments from calls is challenging because of R’s flexible rules for argument matching: + * It could potentially be in any location, with the full name, with an abbreviated name, or with no name. + +* To work around this problem, you can use **rlang::call_standardise()** which standardises all arguments to use the full name: + ```{r} rlang::call_standardise(x) ``` -- But If the function uses ... it’s not possible to standardise all arguments. -- Calls can be modified in the same way as lists: +* But If the function uses ... it’s not possible to standardise all arguments. +* Calls can be modified in the same way as lists: + ```{r} x$header <- TRUE x ``` + ## Function position -- The first element of the call object is the function position. This contains the function that will be called when the object is evaluated, and is usually a symbol. + +* The first element of the call object is the function position. This contains the function that will be called when the object is evaluated, and is usually a symbol. ```{r} lobstr::ast(foo()) ``` -- While R allows you to surround the name of the function with quotes, the parser converts it to a symbol: +* While R allows you to surround the name of the function with quotes, the parser converts it to a symbol: ```{r} lobstr::ast("foo"()) ``` -- However, sometimes the function doesn’t exist in the current environment and you need to do some computation to retrieve it: for example, if the function is in another package, is a method of an R6 object, or is created by a function factory. In this case, the function position will be occupied by another call: + +* However, sometimes the function doesn’t exist in the current environment and you need to do some computation to retrieve it: + * For example, if the function is in another package, is a method of an R6 object, or is created by a function factory. In this case, the function position will be occupied by another call: + ```{r} lobstr::ast(pkg::foo(1)) ``` @@ -246,107 +301,434 @@ lobstr::ast(pkg::foo(1)) ```{r} lobstr::ast(obj$foo(1)) ``` + ```{r} lobstr::ast(foo(1)(2)) ``` ![](images/call-call.png) - ## Constructing -- You can construct a call object from its components using rlang::call2(). The first argument is the name of the function to call (either as a string, a symbol, or another call). The remaining arguments will be passed along to the call: +* You can construct a call object from its components using rlang::call2(). +* The first argument is the name of the function to call (either as a string, a symbol, or another call). +* The remaining arguments will be passed along to the call: ```{r} call2("mean", x = expr(x), na.rm = TRUE) ``` + ```{r} call2(expr(base::mean), x = expr(x), na.rm = TRUE) ``` -- Infix calls created in this way still print as usual. + +* Infix calls created in this way still print as usual. ```{r} call2("<-", expr(x), 10) ``` -**18.3.5 Exercises** - ## Parsing and grammar -- The process by which a computer language takes a string and constructs an expression is called parsing, and is governed by a set of rules known as a grammar. - -- We are going to use **lobstr::ast()** to explore some of the details of R’s grammar, and then show how you can transform back and forth between expressions and strings. - - **Operator precedence** -- Infix functions introduce two sources of ambiguity. -- The first source of ambiguity arises from infix functions: what does 1 + 2 * 3 yield? Do you get 9 (i.e. (1 + 2) * 3), or 7 (i.e. 1 + (2 * 3))? In other words, which of the two possible parse trees below does R use? +* **Parsing** - The process by which a computer language takes a string and constructs an expression. Parsing is governed by a set of rules known as a grammar. +* We are going to use **lobstr::ast()** to explore some of the details of R’s grammar, and then show how you can transform back and forth between expressions and strings. +* **Operator precedence** - Conventions used by the programming language to resolve ambiguity. +* Infix functions introduce two sources of ambiguity. +* The first source of ambiguity arises from infix functions: what does 1 + 2 * 3 yield? Do you get 9 (i.e. (1 + 2) * 3), or 7 (i.e. 1 + (2 * 3))? In other words, which of the two possible parse trees below does R use? ![](images/ambig-order.png) - -- Programming languages use conventions called operator precedence to resolve this ambiguity. We can use **ast()** to see what R does: +* Programming languages use conventions called operator precedence to resolve this ambiguity. We can use **ast()** to see what R does: ```{r} lobstr::ast(1 + 2 * 3) ``` -- Predicting the precedence of other operators is harder. There’s one particularly surprising case in R: ! has a much lower precedence (i.e. it binds less tightly) than you might expect. This allows you to write useful operations like: + +* PEMDAS is pretty clear on what to do. Other operator precedence isn't as clear. +* There’s one particularly surprising case in R: + * ! has a much lower precedence (i.e. it binds less tightly) than you might expect. + * This allows you to write useful operations like: + ```{r} lobstr::ast(!x %in% y) ``` -- **R has over 30 infix operators divided into 18 precedence** groups. While the details are described in **?Syntax**, very few people have memorised the complete ordering. If there’s any confusion, use parentheses! +* **R has over 30 infix operators divided into 18 precedence** groups. +* While the details are described in **?Syntax**, very few people have memorised the complete ordering. +* If there’s any confusion, use parentheses! -**Associativity** -- The second source of ambiguity is introduced by repeated usage of the same infix function. For example, is 1 + 2 + 3 equivalent to **(1 + 2) + 3** or to **1 + (2 + 3)**? This normally doesn’t matter because **x + (y + z) == (x + y) + z**, i.e. addition is associative, but is needed because some S3 classes define + in a non-associative way. +```{r} +# override PEMDAS +lobstr::ast((1 + 2) * 3) +``` -- For example, ggplot2 overloads + to build up a complex plot from simple pieces; this is non-associative because earlier layers are drawn underneath later layers (i.e. **geom_point()** + **geom_smooth()** does not yield the same plot as **geom_smooth()** + **geom_point())**. +## Associativity -- In R, most operators are left-associative, i.e. the operations on the left are evaluated first: +* The second source of ambiguity is introduced by repeated usage of the same infix function. + +```{r} +1 + 2 + 3 + +# What does R do first? +(1 + 2) + 3 + +# or +1 + (2 + 3) +``` + +* In this case it doesn't matter. Other places it might, like in `ggplot2`. + +* In R, most operators are left-associative, i.e. the operations on the left are evaluated first: ```{r} lobstr::ast(1 + 2 + 3) ``` - **Parsing and deparsing** -- Most of the time you type code into the console, and R takes care of turning the characters you’ve typed into an AST. But occasionally you have code stored in a string, and you want to parse it yourself. You can do so using **rlang::parse_expr()**: + +* There's two exceptions to this rule: + 1. exponentiation + 2. assignment + +```{r eval=FALSE} +lobstr::ast(2 ^ 2 ^ 3) +# █─`^` +# ├─2 +# └─█─`^` +# ├─2 +# └─3 +``` ```{r} +lobstr::ast(x <- y <- z) +# █─`<-` +# ├─x +# └─█─`<-` +# ├─y +# └─z +``` + +## Parsing and deparsing + +* Parsing - turning characters you've typed into an AST. +* R usually takes care of parsing code for us. +* But occasionally you have code stored as a string, and you want to parse it yourself. +* You can do so using **rlang::parse_expr()**: + +```{r eval=FALSE} x1 <- "y <- x + 10" x1 +# [1] "y <- x + 10" +is.call(x1) +# [1] FALSE ``` + +```{r eval=FALSE} +x2 <- rlang::parse_expr(x1) +x2 +# y <- x + 10 +is.call(x2) +# [1] TRUE +``` + +* **parse_expr()** always returns a single expression. +* If you have multiple expression separated by ; or \n, you’ll need to use **rlang::parse_exprs()**. It returns a list of expressions: + ```{r} -is.call(x1) +x3 <- "a <- 1; a + 1" +``` + +```{r eval=FALSE} +rlang::parse_exprs(x3) +# [[1]] +# a <- 1 +# +# [[2]] +# a + 1 +# ``` +* If you do this a lot, **quasiquotation** may be a safer approach. + * More about this in Chapter 19. +* The inverse of parsing is deparsing. +* **Deparsing** - given an expression, you want the string that would generate it. +* This happens automatically when you print an expression. +* You can get the string with **rlang::expr_text()**: +* Parsing and deparsing are not symmetric. + * Parsing creates the AST. + ```{r} -x2 <- rlang::parse_expr(x1) +z <- expr(y <- x + 10) +``` + +```{r} +expr_text(z) +``` -x2 +## Using the AST to solve more complicated problems + +* Here we focus on what we learned to perform recursion on the AST. +* Two parts of a recursive function: + * Recursive case + * Base case +* These two parts correspond well to tree like data structures. + +## Two helper functions + +* First, we need an `epxr_type()` function to return the type of expression element as a string. + +```{r} +expr_type <- function(x) { + if (rlang::is_syntactic_literal(x)) { + "constant" + } else if (is.symbol(x)) { + "symbol" + } else if (is.call(x)) { + "call" + } else if (is.pairlist(x)) { + "pairlist" + } else { + typeof(x) + } +} +``` +```{r eval=FALSE} +expr_type(expr("a")) +# [1] "constant" +expr_type(expr(x)) +# [1] "symbol" +expr_type(expr(f(1, 2))) +# [1] "call" ``` +* Second, we need a wrapper function to handle exceptions. + ```{r} -is.call(x2) +switch_expr <- function(x, ...) { + switch(expr_type(x), + ..., + stop("Don't know how to handle type ", typeof(x), call. = FALSE) + ) +} +``` + +* Lastly, we can write a basic template that walks the AST using the `switch()` statement. + +```{r, eval=FALSE} +recurse_call <- function(x) { + switch_expr(x, + # Base cases + symbol = , + constant = , + + # Recursive cases + call = , + pairlist = + ) +} +``` + +## Specific use cases for `recurse_call()` + +### Example 1: Finding F and T + +* Using `F` and `T` in our code rather than `FALSE` and `TRUE` is bad practice. +* Say we want to walk the AST to find times when we use `F` and `T`. +* Start off by finding the type of `T` vs `TRUE`. + +```{r eval=FALSE} +expr_type(expr(TRUE)) +# [1] "constant" + +expr_type(expr(T)) +# [1] "symbol" ``` -- **parse_expr()** always returns a single expression. If you have multiple expression separated by ; or \n, you’ll need to use **rlang::parse_exprs()**. It returns a list of expressions: + +* With this knowledge, we can now write the base cases of our recursive function. +* The logic is as follows: + * A constant is never a logical abberviation and a symbol is an abbreviation if it's "F" or "T": ```{r} -x3 <- "a <- 1; a + 1" +logical_abbr_rec <- function(x) { + switch_expr(x, + constant = FALSE, + symbol = as_string(x) %in% c("F", "T") + ) +} +``` + +```{r eval=FALSE} +logical_abbr_rec(expr(TRUE)) +# [1] FALSE +logical_abbr_rec(expr(T)) +# [1] TRUE ``` + +* It's best practice to write another wrapper, assuming every input you receive will be an expression. + ```{r} -rlang::parse_exprs(x3) +logical_abbr <- function(x) { + logical_abbr_rec(enexpr(x)) +} + +logical_abbr(T) +# [1] TRUE +logical_abbr(FALSE) +# [1] FALSE ``` -- If we find ourselve working with strings containing code very frequently, we should reconsider our process. We are going to address this in Chapter 19 where we are going to generate expressions using **quasiquotation** more safely. -- The inverse of parsing is deparsing: given an expression, you want the string that would generate it. This happens automatically when you print an expression, and you can get the string with **rlang::expr_text()**: +### Next step: code for the recursive cases + +* Here we want to do the same thing for calls and for pairlists. +* Here's the logic: recursively apply the function to each subcomponent, and return `TRUE` if any subcomponent contains a logical abbreviation. +* This is simplified by using the `purrr::some()` function, which iterates over a list and returns `TRUE` if the predicate function is true for any element. + ```{r} -z <- expr(y <- x + 10) +logical_abbr_rec <- function(x) { + switch_expr(x, + # Base cases + constant = FALSE, + symbol = as_string(x) %in% c("F", "T"), + # Recursive cases + call = , + # Are we sure this is the correct function to use? + # Why not logical_abbr_rec? + pairlist = purrr::some(x, logical_abbr_rec) + ) +} + +logical_abbr(mean(x, na.rm = T)) +# [1] TRUE + +logical_abbr(function(x, na.rm = T) FALSE) +# [1] TRUE ``` + +## Example 2: Finding all variables created by assignment + +* Listing all the variables is a little more complicated. +* Figure out what assignment looks like based on the AST. + +```{r eval=FALSE} +ast(x <- 10) +# █─`<-` +# ├─x +# └─10 +``` + +* Now we need to decide what data structure we're going to use for the results. + * Easiest thing will be to return a character vector. + * We would need to use a list if we wanted to return symbols. + +### Dealing with the base cases + +```{r eval=FALSE} +find_assign_rec <- function(x) { + switch_expr(x, + constant = , + symbol = character() + ) +} +find_assign <- function(x) find_assign_rec(enexpr(x)) + +find_assign("x") +#> character(0) +find_assign(x) +#> character(0) +``` + +### Dealing with the recursive cases + +* Here is the function to flatten pairlists. + ```{r} -expr_text(z) +flat_map_chr <- function(.x, .f, ...) { + purrr::flatten_chr(purrr::map(.x, .f, ...)) +} + +flat_map_chr(letters[1:3], ~ rep(., sample(3, 1))) +#> [1] "a" "b" "b" "b" "c" "c" "c" +``` + +* Here is the code needed to identify calls. + +```{r eval=FALSE} +find_assign_rec <- function(x) { + switch_expr(x, + # Base cases + constant = , + symbol = character(), + + # Recursive cases + pairlist = flat_map_chr(as.list(x), find_assign_rec), + call = { + if (is_call(x, "<-")) { + as_string(x[[2]]) + } else { + flat_map_chr(as.list(x), find_assign_rec) + } + } + ) +} + +find_assign(a <- 1) +#> [1] "a" +find_assign({ + a <- 1 + { + b <- 2 + } +}) +#> [1] "a" "b" +``` + +### Make the function more robust + +* Throw cases at it that we think might break the function. +* Write a function to handle these cases. + +```{r eval=FALSE} +find_assign_call <- function(x) { + if (is_call(x, "<-") && is_symbol(x[[2]])) { + lhs <- as_string(x[[2]]) + children <- as.list(x)[-1] + } else { + lhs <- character() + children <- as.list(x) + } + + c(lhs, flat_map_chr(children, find_assign_rec)) +} + +find_assign_rec <- function(x) { + switch_expr(x, + # Base cases + constant = , + symbol = character(), + + # Recursive cases + pairlist = flat_map_chr(x, find_assign_rec), + call = find_assign_call(x) + ) +} + +find_assign(a <- b <- c <- 1) +#> [1] "a" "b" "c" +find_assign(system.time(x <- print(y <- 5))) +#> [1] "x" "y" ``` - **18.4.4 Exercises** + +* This approach certainly is more complicated, but it's important to start simple and move up. + +## Specialised data structures + +* Pairlists +* Missing arguments +* Expression vectors ## Pairlists -- Pairlists are a remnant of R’s past and have been replaced by lists almost everywhere. The only place you are likely to see pairlists in R is when working with calls to the function, as the formal arguments to a function are stored in a pairlist: + +* Pairlists are a remnant of R’s past and have been replaced by lists almost everywhere. +* The only place you are likely to see pairlists in R is when working with calls to the function, as the formal arguments to a function are stored in a pairlist: + ```{r} f <- expr(function(x, y = 10) x + y) ``` @@ -357,21 +739,62 @@ args ```{r} typeof(args) ``` -- Fortunately, whenever you encounter a pairlist, you can treat it just like a regular list: +* Fortunately, whenever you encounter a pairlist, you can treat it just like a regular list: ```{r} pl <- pairlist(x = 1, y = 2) ``` + ```{r} length(pl) ``` + ```{r} pl$x ``` + +## Missing arguments + +* Empty symbols +* To create an empty symbol, you need to use `missing_arg()` or `expr()`. + +```{r} +missing_arg() +typeof(missing_arg()) +``` + +* Empty symbols don't print anything. + * To check, we need to use `rlang::is_missing()` + +```{r} +is_missing(missing_arg()) +# [1] TRUE +``` + +* These are usually present in function formals: + +```{r} +f <- expr(function(x, y = 10) x + y) +# function(x, y = 10) x + y +args <- f[[2]] +# $x +# +# +# $y +# [1] 10 +# +is_missing(args[[1]]) +#> [1] TRUE +``` + ## Expression vectors -- an expression vector is just a list of expressions. The only difference is that calling eval() on an expression evaluates each individual expression. -- Expression vectors are only produced by two base functions: expression() and parse(): +* An expression vector is just a list of expressions. + * The only difference is that calling eval() on an expression evaluates each individual expression. + * Instead, it might be more advantageous to use a list of expressions. + +* Expression vectors are only produced by two base functions: + `expression()` and `parse()`: ```{r} exp1 <- parse(text = c(" @@ -379,6 +802,7 @@ x <- 4 x ")) ``` + ```{r} exp2 <- expression(x <- 4, x) ``` @@ -392,7 +816,9 @@ typeof(exp2) exp1 exp2 ``` + - Like calls and pairlists, expression vectors behave like lists: + ```{r} length(exp1) exp1[[1]]