bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git

20.Rmd (7627B)


      1 ---
      2 engine: knitr
      3 title: Evaluation
      4 ---
      5 
      6 ## Learning objectives:
      7 
      8 - Learn evaluation basics
      9 - Learn about **quosures** and **data masks**
     10 - Understand tidy evaluation
     11 
     12 ```{r message=FALSE,warning=FALSE}
     13 library(rlang)
     14 library(purrr)
     15 ```
     16 
     17 ## A bit of a recap
     18 
     19 - Metaprogramming: separate the description of an action from the action itself, that is, separate code from its evaluation.
     20 - Quasiquotation: combine code written by the *function's author* with code written by the *function's user*.
     21   - Unquotation: gives the *user* the ability to evaluate parts of a quoted argument.
     22   - Evaluation: gives the *developer* the ability to evaluate quoted expressions in custom environments.
     23 
     24 **Tidy evaluation**: quasiquotation, quosures and data masks
     25 
     26 ## Evaluation basics 
     27 
     28 We use `eval()` to evaluate (run, execute) expressions. Its two key arguments are: 
     29 
     30 - `expr`: the object to evaluate, either an expression or a symbol.
     31 - `envir`: the environment in which to evaluate the expression, i.e. where to look up the values of its symbols. 
     32 Defaults to the current environment.
     33 
     34 ```{r}
     35 sumexpr <- expr(x + y)
     36 x <- 10
     37 y <- 40
     38 eval(sumexpr)
     39 ```
     40 
     41 ```{r}
     42 eval(sumexpr, envir = env(x = 1000, y = 10))
     43 ```
     44 
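A common slip is forgetting to quote the first argument. The sketch below contrasts the two cases; the names `a`, `b`, and `custom_env` are illustrative and not part of the original notes. An unquoted argument is evaluated in the calling environment before `eval()` ever sees it, so the custom environment has no effect.

```{r}
library(rlang)

a <- 10
b <- 40
custom_env <- env(a = 1000, b = 10)

# Quoted: the symbols a and b are looked up in custom_env
quoted_result <- eval(expr(a + b), envir = custom_env)

# Unquoted: a + b is computed in the current environment first,
# so eval() only receives the number 50
unquoted_result <- eval(a + b, envir = custom_env)

quoted_result    # 1010
unquoted_result  # 50
```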
     45 
     46 ## Application: reimplementing `source()`
     47 
     48 What do we need?
     49 
     50 - Read the file being sourced. 
     51 - Parse its contents into a list of expressions (parsing returns them unevaluated, i.e. already quoted)
     52 - Evaluate each expression saving the results 
     53 - Return the results
     54 
     55 ```{r}
     56 source2 <- function(path, env = caller_env()) {
     57   file <- paste(readLines(path, warn = FALSE), collapse = "\n")
     58   exprs <- parse_exprs(file)
     59 
     60   res <- NULL
     61   for (i in seq_along(exprs)) {
     62     res <- eval(exprs[[i]], env)
     63   }
     64 
     65   invisible(res)
     66 }
     67 ```
     68 
     69 The real `source()` is much more complex.
     70 
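A quick way to try `source2()` is to write a small script to a temporary file and source it into a fresh environment. The script contents and the names `script_path`, `script_env`, and `total` are made up for this sketch; the function definition is repeated only so the chunk runs on its own.

```{r}
library(rlang)

# Same definition as above, repeated so this chunk is self-contained
source2 <- function(path, env = caller_env()) {
  file <- paste(readLines(path, warn = FALSE), collapse = "\n")
  exprs <- parse_exprs(file)
  res <- NULL
  for (i in seq_along(exprs)) {
    res <- eval(exprs[[i]], env)
  }
  invisible(res)
}

# Hypothetical two-line script written to a temporary file
script_path <- tempfile(fileext = ".R")
writeLines(c("total <- 1 + 2", "total * 10"), script_path)

script_env <- env()
last_value <- source2(script_path, env = script_env)

last_value                    # value of the last expression: 30
env_get(script_env, "total")  # the assignment landed in script_env: 3
```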
     71 ## Quosures
     72 
     73 **Quosures** are a data structure from `rlang` containing both an expression and an environment.
     74 
     75 The name combines *quoting* and *closure*: a quosure quotes the expression and encloses the environment.
     76 
     77 Three ways to create them:
     78 
     79 -  Used mostly for learning: `new_quosure()`, creates a quosure from its components.
     80 
     81 ```{r}
     82 q1 <- rlang::new_quosure(expr(x + y), 
     83                          env(x = 1, y = 10))
     84 ```
     85 
     86 With a quosure, we can use `eval_tidy()` directly. 
     87 
     88 ```{r}
     89 rlang::eval_tidy(q1)
     90 ```
     91 
     92 And get its components
     93 
     94 ```{r}
     95 rlang::get_expr(q1)
     96 rlang::get_env(q1)
     97 ```
     98 
     99 Or set them
    100 
    101 ```{r}
    102 q1 <- set_env(q1, env(x = 3, y = 4))
    103 eval_tidy(q1)
    104 ```
    105 
    106 
    107 - Used in the real world: `enquo()` or `enquos()`, to capture user-supplied expressions. They record the environment in which the user wrote the expression. 
    108 
    109 ```{r}
    110 foo <- function(x) enquo(x)
    111 quo_foo <- foo(a + b)
    112 ```
    113 
    114 ```{r}
    115 get_expr(quo_foo)
    116 get_env(quo_foo)
    117 ```
    118 
    119 - Almost never used: `quo()` and `quos()`, the quosure counterparts of `expr()` and `exprs()`.
    120 
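As a small sketch of that last option, `quo()` captures an expression together with the environment in which `quo()` itself is called. The helper `make_quo()` and the variable `offset` are invented for illustration.

```{r}
library(rlang)

make_quo <- function() {
  offset <- 5        # local variable, not visible from outside
  quo(offset * 2)    # the quosure remembers this function's environment
}

q_local <- make_quo()

get_expr(q_local)    # the captured expression: offset * 2
eval_tidy(q_local)   # evaluated where `offset` is defined: 10
```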
    121 ## Quosures and `...`
    122 
    123 Quosures are usually just a convenience, but they are essential when working with `...`, because each argument passed through `...` can be associated with a different environment. 
    124 
    125 ```{r}
    126 g <- function(...) {
    127   ## Creating our quosures from ...
    128   enquos(...)
    129 }
    130 
    131 createQuos <- function(...) {
    132   ## symbol from the function environment
    133   x <- 1
    134   g(..., f = x)
    135 }
    136 ```
    137 
    138 ```{r}
    139 ## symbol from the global environment
    140 x <- 0
    141 qs <- createQuos(global = x)
    142 qs
    143 ```
    144 
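To see why the separate environments matter, each captured quosure can be evaluated on its own. This sketch uses `purrr::map_dbl()` and renamed helpers (`capture_dots()`, `add_local()`, both hypothetical stand-ins for `g()` and `createQuos()`) so it runs independently.

```{r}
library(rlang)
library(purrr)

capture_dots <- function(...) enquos(...)

add_local <- function(...) {
  x <- 1                      # local x, distinct from the caller's x
  capture_dots(..., f = x)
}

x <- 0
qs_demo <- add_local(global = x)

# Both quosures hold the symbol `x`, but each is evaluated
# in its own environment: global = 0, f = 1
qs_values <- map_dbl(qs_demo, eval_tidy)
qs_values
```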
    145 ## Other facts about quosures
    146 
    147 Formulas were the inspiration for quosures because they also capture an expression and an environment
    148 
    149 ```{r}
    150 f <- ~runif(3)
    151 str(f)
    152 ```
    153 
    154 There was an early version of tidy evaluation with formulas, but there's no easy way to implement quasiquotation with them. 
    155 
    156 Quosures are a subclass of formulas, so under the hood they are call objects 
    157 
    158 ```{r}
    159 q4 <- new_quosure(expr(x + y + z))
    160 class(q4)
    161 is.call(q4)
    162 ```
    163 
    164 with an attribute to store the environment
    165 
    166 ```{r}
    167 attr(q4, ".Environment")
    168 ```
    169 
    170 
    171 **Nested quosures**
    172 
    173 With quasiquotation we can embed quosures in expressions. 
    174 
    175 ```{r}
    176 q2 <- new_quosure(expr(x), env(x = 1))
    177 q3 <- new_quosure(expr(x), env(x = 100))
    178 
    179 nq <- expr(!!q2 + !!q3)
    180 ```
    181 
    182 And evaluate them 
    183 
    184 ```{r}
    185 eval_tidy(nq)
    186 ```
    187 
    188 But for printing it's better to use `expr_print()`, because the default print method shows the embedded quosures as formulas 
    189 
    190 ```{r}
    191 expr_print(nq)
    192 nq
    193 ```
    194 
    195 ## Data mask
    196 
    197 A data mask is a data frame in which the evaluated code looks first for its variable definitions. 
    198 
    199 Used in packages like dplyr and ggplot2. 
    200 
    201 To use it we supply the data mask as the second argument to `eval_tidy()`:
    202 
    203 ```{r}
    204 q1 <- new_quosure(expr(x * y), env(x = 100))
    205 df <- data.frame(y = 1:10)
    206 
    207 eval_tidy(q1, df)
    208 ```
    209 
    210 Everything together, in one function. 
    211 
    212 ```{r}
    213 with2 <- function(data, expr) {
    214   expr <- enquo(expr)
    215   eval_tidy(expr, data)
    216 }
    217 ```
    218 
    219 Objects that are not part of the data mask must exist in the calling environment:
    220 ```{r}
    221 x <- 100
    222 with2(df, x * y)
    223 ```
    224 
    225 This is also doable with `base::eval()` instead of `rlang::eval_tidy()`, but we have to use `base::substitute()` instead of `enquo()` (as we did in place of `enexpr()`) and specify the enclosing environment explicitly.
    226 
    227 ```{r}
    228 with3 <- function(data, expr) {
    229   expr <- substitute(expr)
    230   eval(expr, data, caller_env())
    231 }
    232 ```
    233 
    234 ```{r}
    235 with3(df, x*y)
    236 ```
    237 
    238 ## Pronouns: .data$ and .env$
    239 
    240 **Ambiguity!!**
    241 
    242 An object's value can come from the environment or from the data mask. When a name exists in both, the data mask wins:
    243 
    244 ```{r}
    245 q1 <- new_quosure(expr(x * y + x), env = env(x = 1))
    246 df <- data.frame(y = 1:5, 
    247                  x = 10)
    248 
    249 eval_tidy(q1, df)
    250 ```
    251 
    252 To say explicitly which one we mean, we use pronouns: 
    253 
    254 - `.data$x`: `x` from the data mask
    255 - `.env$x`: `x` from the environment
    256 
    257 
    258 ```{r}
    259 q1 <- new_quosure(expr(.data$x * y + .env$x), env = env(x = 1))
    260 eval_tidy(q1, df)
    261 ```
    262 
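One further difference is worth sketching: `.data$` does not fall back to the environment. Assuming current rlang behaviour, asking for a column that is not in the data mask signals an error instead of silently finding a variable elsewhere. The names `mask_df` and `z` are illustrative.

```{r}
library(rlang)

mask_df <- data.frame(y = 1:3)
z <- 100   # exists in the environment, not in mask_df

# A bare `z` falls through the data mask to the environment
fallthrough <- eval_tidy(quo(z + y), mask_df)
fallthrough

# `.data$z` refuses to look outside the data mask
strict <- tryCatch(
  eval_tidy(quo(.data$z + y), mask_df),
  error = function(e) "error: z is not a column"
)
strict
```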
    263 ## Application: reimplementing `base::subset()`
    264 
    265 `base::subset()` works like `dplyr::filter()`: it selects rows of a data frame given an expression. 
    266 
    267 What do we need?
    268 
    269 - Quote the expression to filter
    270 - Figure out which rows in the data frame pass the filter
    271 - Subset the data frame
    272 
    273 ```{r}
    274 subset2 <- function(data, rows) {
    275   rows <- enquo(rows)
    276   rows_val <- eval_tidy(rows, data)
    277   stopifnot(is.logical(rows_val))
    278 
    279   data[rows_val, , drop = FALSE]
    280 }
    281 ```
    282 
    283 ```{r}
    284 sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 2, 4, 1))
    285 
    286 # Shorthand for sample_df[sample_df$b == sample_df$c, ]
    287 subset2(sample_df, b == c)
    288 ```
    289 
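Because `subset2()` captures a quosure, the condition can mix columns with variables from the calling environment. The sketch below repeats the definition so it runs on its own; `demo_df` and `cutoff` are hypothetical names. It also shows the `stopifnot()` guard rejecting a non-logical condition.

```{r}
library(rlang)

subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

demo_df <- data.frame(a = 1:5, b = 5:1)
cutoff <- 3   # lives in the calling environment, not in demo_df

# `a` comes from the data mask, `cutoff` from the quosure's environment
kept <- subset2(demo_df, a >= cutoff)
kept

# A non-logical condition is rejected by the stopifnot() check
rejected <- tryCatch(
  subset2(demo_df, a + b),
  error = function(e) "condition must be logical"
)
rejected
```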
    290 ## Using tidy evaluation
    291 
    292 Most of the time we won't call `eval_tidy()` directly; instead we call a function that uses it, which makes us both developer AND user.
    293 
    294 **Use case**: resample and subset
    295 
    296 We have a function that resamples a dataset: 
    297 
    298 ```{r}
    299 resample <- function(df, n) {
    300   idx <- sample(nrow(df), n, replace = TRUE)
    301   df[idx, , drop = FALSE]
    302 }
    303 ```
    304 
    305 ```{r}
    306 resample(sample_df, 10)
    307 ```
    308 
    309 But we also want to subset, so we want a function that allows us to resample and subset (with `subset2()`) in a single step. 
    310 
    311 First attempt: 
    312 
    313 ```{r}
    314 subsample <- function(df, cond, n = nrow(df)) {
    315   df <- subset2(df, cond)
    316   resample(df, n)
    317 }
    318 ```
    319 
    320 ```{r error=TRUE}
    321 subsample(sample_df, b == c, 10)
    322 ```
    323 
    324 What happened? 
    325 
    326 `subsample()` doesn't quote any arguments, so `cond` is evaluated normally, outside the data mask.
    327 
    328 So we have to quote `cond` and unquote it when we pass it to `subset2()`:
    329 
    330 ```{r}
    331 subsample <- function(df, cond, n = nrow(df)) {
    332   cond <- enquo(cond)
    333 
    334   df <- subset2(df, !!cond)
    335   resample(df, n)
    336 }
    337 ```
    338 
    339 ```{r}
    340 subsample(sample_df, b == c, 10)
    341 ```
    342 
    343 **Be careful!** There is potential for ambiguity:
    344 
    345 ```{r}
    346 threshold_x <- function(df, val) {
    347   subset2(df, x >= val)
    348 }
    349 ```
    350 
    351 What would happen if `x` exists in the calling environment but doesn't exist in `df`? Or if `val` also exists in `df`?
    352 
    353 So, as developers of `threshold_x()` and users of `subset2()`, we have to add some pronouns:
    354 
    355 ```{r}
    356 threshold_x <- function(df, val) {
    357   subset2(df, .data$x >= .env$val)
    358 }
    359 ```
    360 
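The second question can be checked with a data frame that happens to contain a `val` column. This is a minimal sketch: `threshold_unsafe()`, `threshold_safe()`, and `clash_df` are hypothetical names, and `subset2()` is repeated so the chunk is self-contained.

```{r}
library(rlang)

subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

# Without pronouns: `val` is looked up in the data mask first
threshold_unsafe <- function(df, val) subset2(df, x >= val)

# With pronouns: `x` must be a column, `val` must be the argument
threshold_safe <- function(df, val) subset2(df, .data$x >= .env$val)

clash_df <- data.frame(x = 1:3, val = 9:7)

n_unsafe <- nrow(threshold_unsafe(clash_df, 2))  # column `val` shadows the argument: 0 rows
n_safe <- nrow(threshold_safe(clash_df, 2))      # argument `val` is used: 2 rows
c(unsafe = n_unsafe, safe = n_safe)
```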
    361 
    362 Just remember:  
    363 
    364 > As a general rule of thumb, as a function author it’s your responsibility 
    365 > to avoid ambiguity with any expressions that you create; 
    366 > it’s the user’s responsibility to avoid ambiguity in expressions that they create.
    367 
    368 
    369 ## Base evaluation
    370 
    371 Check Section 20.6 in the book!