html.json - bookclub-advr - DSLC Advanced R Book Club

{
  "hash": "4b17f38460385558b331471436bda17a",
  "result": {
    "engine": "knitr",
    "markdown": "---\nengine: knitr\ntitle: Evaluation\n---\n\n## Learning objectives:\n\n- Learn evaluation basics\n- Learn about **quosures** and **data masks**\n- Understand tidy evaluation\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(rlang)\nlibrary(purrr)\n```\n:::\n\n\n## A bit of a recap\n\n- Metaprogramming: separating our description of an action from the action itself, that is, separating code from its evaluation.\n- Quasiquotation: combine code written by the *function's author* with code written by the *function's user*.\n  - Unquotation: it gives the *user* the ability to evaluate parts of a quoted argument.\n  - Evaluation: it gives the *developer* the ability to evaluate quoted expressions in custom environments.\n\n**Tidy evaluation**: quasiquotation, quosures and data masks\n\n## Evaluation basics \n\nWe use `eval()` to evaluate, run, or execute expressions. It takes two main arguments: \n\n- `expr`: the object to evaluate, either an expression or a symbol.\n- `envir`: the environment in which to evaluate the expression or where to look for the values. \nDefaults to the current environment.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsumexpr <- expr(x + y)\nx <- 10\ny <- 40\neval(sumexpr)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 50\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\neval(sumexpr, envir = env(x = 1000, y = 10))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1010\n```\n\n\n:::\n:::\n\n\n\n## Application: reimplementing `source()`\n\nWhat do we need?\n\n- Read the file being sourced. 
\n- Parse its expressions (quote them?)\n- Evaluate each expression, saving the results \n- Return the results\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsource2 <- function(path, env = caller_env()) {\n  file <- paste(readLines(path, warn = FALSE), collapse = \"\\n\")\n  exprs <- parse_exprs(file)\n\n  res <- NULL\n  for (i in seq_along(exprs)) {\n    res <- eval(exprs[[i]], env)\n  }\n\n  invisible(res)\n}\n```\n:::\n\n\nThe real `source()` is much more complex.\n\n## Quosures\n\n**Quosures** are a data structure from `rlang` containing both an expression and an environment.\n\nThe name combines *quoting* + *closure*, because a quosure quotes the expression and encloses the environment.\n\nThree ways to create them:\n\n- Used mostly for learning: `new_quosure()` creates a quosure from its components.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nq1 <- rlang::new_quosure(expr(x + y), \n                         env(x = 1, y = 10))\n```\n:::\n\n\nWith a quosure, we can use `eval_tidy()` directly. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nrlang::eval_tidy(q1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 11\n```\n\n\n:::\n:::\n\n\nAnd get its components\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrlang::get_expr(q1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x + y\n```\n\n\n:::\n\n```{.r .cell-code}\nrlang::get_env(q1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: 0x000001f161f6d698>\n```\n\n\n:::\n:::\n\n\nOr set them\n\n\n::: {.cell}\n\n```{.r .cell-code}\nq1 <- set_env(q1, env(x = 3, y = 4))\neval_tidy(q1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 7\n```\n\n\n:::\n:::\n\n\n\n- Used in the real world: `enquo()` or `enquos()`, to capture user-supplied expressions. They record the environment in which the user's expression was created. 
\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfoo <- function(x) enquo(x)\nquo_foo <- foo(a + b)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_expr(quo_foo)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a + b\n```\n\n\n:::\n\n```{.r .cell-code}\nget_env(quo_foo)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: R_GlobalEnv>\n```\n\n\n:::\n:::\n\n\n- Almost never used: `quo()` and `quos()`, which exist to match `expr()` and `exprs()`.\n\n## Quosures and `...`\n\nQuosures are usually just a convenience, but they are essential when working with `...`, because each argument passed through `...` can be associated with a different environment. \n\n\n::: {.cell}\n\n```{.r .cell-code}\ng <- function(...) {\n  ## Creating our quosures from ...\n  enquos(...)\n}\n\ncreateQuos <- function(...) {\n  ## symbol from the function environment\n  x <- 1\n  g(..., f = x)\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## symbol from the global environment\nx <- 0\nqs <- createQuos(global = x)\nqs\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <list_of<quosure>>\n#> \n#> $global\n#> <quosure>\n#> expr: ^x\n#> env:  global\n#> \n#> $f\n#> <quosure>\n#> expr: ^x\n#> env:  0x000001f15fc6f2e0\n```\n\n\n:::\n:::\n\n\n## Other facts about quosures\n\nFormulas were the inspiration for quosures because they also capture an expression and an environment.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- ~runif(3)\nstr(f)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Class 'formula'  language ~runif(3)\n#>   ..- attr(*, \".Environment\")=<environment: R_GlobalEnv>\n```\n\n\n:::\n:::\n\n\nThere was an early version of tidy evaluation with formulas, but there's no easy way to implement quasiquotation with them. 
\n\nQuosures are actually call objects \n\n\n::: {.cell}\n\n```{.r .cell-code}\nq4 <- new_quosure(expr(x + y + z))\nclass(q4)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"quosure\" \"formula\"\n```\n\n\n:::\n\n```{.r .cell-code}\nis.call(q4)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\nwith an attribute to store the environment\n\n\n::: {.cell}\n\n```{.r .cell-code}\nattr(q4, \".Environment\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: R_GlobalEnv>\n```\n\n\n:::\n:::\n\n\n\n**Nested quosures**\n\nWith quasiquotation we can embed quosures in expressions. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nq2 <- new_quosure(expr(x), env(x = 1))\nq3 <- new_quosure(expr(x), env(x = 100))\n\nnq <- expr(!!q2 + !!q3)\n```\n:::\n\n\nAnd evaluate them \n\n\n::: {.cell}\n\n```{.r .cell-code}\neval_tidy(nq)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 101\n```\n\n\n:::\n:::\n\n\nBut for printing it's better to use `expr_print(x)` \n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr_print(nq)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> (^x) + (^x)\n```\n\n\n:::\n\n```{.r .cell-code}\nnq\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> (~x) + ~x\n```\n\n\n:::\n:::\n\n\n## Data mask\n\nA data mask is a data frame where the evaluated code looks first for its variable definitions. \n\nUsed in packages like dplyr and ggplot2. \n\nTo use it, we supply the data mask as the second argument to `eval_tidy()`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nq1 <- new_quosure(expr(x * y), env(x = 100))\ndf <- data.frame(y = 1:10)\n\neval_tidy(q1, df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  [1]  100  200  300  400  500  600  700  800  900 1000\n```\n\n\n:::\n:::\n\n\nEverything together, in one function. 
\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith2 <- function(data, expr) {\n  expr <- enquo(expr)\n  eval_tidy(expr, data)\n}\n```\n:::\n\n\nBut we need to create the objects that are not part of our data mask\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 100\nwith2(df, x * y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  [1]  100  200  300  400  500  600  700  800  900 1000\n```\n\n\n:::\n:::\n\n\nAlso doable with `base::eval()` instead of `rlang::eval_tidy()` but we have to use `base::substitute()` instead of `enquo()` (like we did for `enexpr()`) and we need to specify the environment.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith3 <- function(data, expr) {\n  expr <- substitute(expr)\n  eval(expr, data, caller_env())\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith3(df, x*y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  [1]  100  200  300  400  500  600  700  800  900 1000\n```\n\n\n:::\n:::\n\n\n## Pronouns: .data$ and .env$\n\n**Ambiguity!!**\n\nAn object value can come from the env or from the data mask\n\n\n::: {.cell}\n\n```{.r .cell-code}\nq1 <- new_quosure(expr(x * y + x), env = env(x = 1))\ndf <- data.frame(y = 1:5, \n                 x = 10)\n\neval_tidy(q1, df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 20 30 40 50 60\n```\n\n\n:::\n:::\n\n\nWe use pronouns: \n\n- `.data$x`: `x` from the data mask\n- `.env$x`: `x` from the environment\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nq1 <- new_quosure(expr(.data$x * y + .env$x), env = env(x = 1))\neval_tidy(q1, df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 11 21 31 41 51\n```\n\n\n:::\n:::\n\n\n## Application: reimplementing `base::subset()`\n\n`base::subset()` works like `dplyr::filter()`: it selects rows of a data frame given an expression. 
\n\nWhat do we need?\n\n- Quote the expression to filter\n- Figure out which rows in the data frame pass the filter\n- Subset the data frame\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsubset2 <- function(data, rows) {\n  rows <- enquo(rows)\n  rows_val <- eval_tidy(rows, data)\n  stopifnot(is.logical(rows_val))\n\n  data[rows_val, , drop = FALSE]\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 2, 4, 1))\n\n# Shorthand for sample_df[sample_df$b == sample_df$c, ]\nsubset2(sample_df, b == c)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   a b c\n#> 1 1 5 5\n#> 5 5 1 1\n```\n\n\n:::\n:::\n\n\n## Using tidy evaluation\n\nMost of the time we won't call `eval_tidy()` directly; instead we call a function that uses it, becoming both developer and user.\n\n**Use case**: resample and subset\n\nWe have a function that resamples a dataset: \n\n\n::: {.cell}\n\n```{.r .cell-code}\nresample <- function(df, n) {\n  idx <- sample(nrow(df), n, replace = TRUE)\n  df[idx, , drop = FALSE]\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nresample(sample_df, 10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>     a b c\n#> 4   4 2 4\n#> 3   3 3 2\n#> 1   1 5 5\n#> 1.1 1 5 5\n#> 3.1 3 3 2\n#> 5   5 1 1\n#> 5.1 5 1 1\n#> 3.2 3 3 2\n#> 5.2 5 1 1\n#> 4.1 4 2 4\n```\n\n\n:::\n:::\n\n\nBut we also want to subset, so we want a function that allows us to subset (with `subset2()`) and resample in a single step. \n\nFirst attempt: \n\n\n::: {.cell}\n\n```{.r .cell-code}\nsubsample <- function(df, cond, n = nrow(df)) {\n  df <- subset2(df, cond)\n  resample(df, n)\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsubsample(sample_df, b == c, 10)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: object 'b' not found\n```\n\n\n:::\n:::\n\n\nWhat happened? 
\n\n`subsample()` doesn't quote any arguments and `cond` is evaluated normally\n\nSo we have to quote `cond` and unquote it when we pass it to `subset2()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsubsample <- function(df, cond, n = nrow(df)) {\n  cond <- enquo(cond)\n\n  df <- subset2(df, !!cond)\n  resample(df, n)\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsubsample(sample_df, b == c, 10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>     a b c\n#> 5   5 1 1\n#> 5.1 5 1 1\n#> 5.2 5 1 1\n#> 1   1 5 5\n#> 5.3 5 1 1\n#> 1.1 1 5 5\n#> 5.4 5 1 1\n#> 5.5 5 1 1\n#> 1.2 1 5 5\n#> 1.3 1 5 5\n```\n\n\n:::\n:::\n\n\n**Be careful!**, potential ambiguity:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nthreshold_x <- function(df, val) {\n  subset2(df, x >= val)\n}\n```\n:::\n\n\nWhat would happen if `x` exists in the calling environment but doesn't exist in `df`? Or if `val` also exists in `df`?\n\nSo, as developers of `threshold_x()` and users of `subset2()`, we have to add some pronouns:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nthreshold_x <- function(df, val) {\n  subset2(df, .data$x >= .env$val)\n}\n```\n:::\n\n\n\nJust remember:  \n\n> As a general rule of thumb, as a function author it’s your responsibility \n> to avoid ambiguity with any expressions that you create; \n> it’s the user’s responsibility to avoid ambiguity in expressions that they create.\n\n\n## Base evaluation\n\nCheck 20.6 in the book!\n",
    "supporting": [
      "20_files"
    ],
    "filters": [
      "rmarkdown/pagebreak.lua"
    ],
    "includes": {},
    "engineDependencies": {},
    "preserve": {},
    "postProcess": true
  }
}
	git clone https://git.eamoncaddigan.net/bookclub-advr.git