bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git

07.qmd (11138B)


      1 ---
      2 engine: knitr
      3 title: Environments
      4 ---
      5 
      6 ## Learning objectives:
      7 
      8 - Create, modify, and inspect environments 
      9 
     10 - Recognize special environments 
     11 
     12 - Understand how environments power lexical scoping and namespaces
     13 
     14 # 7.2 Environment Basics
     15 
     16 ## Environments are similar to lists
     17 
     18 Generally, an environment is similar to a named list, with four important exceptions:
     19 
     20 - Every name must be unique.
     21 
     22 - The names in an environment are not ordered.
     23 
     24 - An environment has a parent.
     25 
     26 - Environments are not copied when modified.
     27 
     28 ::: {.notes}
     29 
     30 - Lists can have duplicate names, e.g. x <- base::list(a = 1, a = 1)
     31 
     32 - Lists have an inherent order, e.g. x[[1]] returns the first element of the list above
     33 
     34 - Environments are modified in place (reference semantics) rather than copied on modification, e.g.:
     35 
     36   Modifying a list produces a different memory address
     37   base::identical(lobstr::obj_addr(x), lobstr::obj_addr({x[[1]] <- 2; x}))
     38 
     39   y <- rlang::env()
     40   y$'a' <- 1
     41 
     42   base::identical(
     43     lobstr::obj_addr(y),
     44     lobstr::obj_addr({y[['a']] <- 2; y})
     45   )
     46 
     47 :::
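
The contrast between value and reference semantics can be sketched with base R alone. This is a minimal illustration rather than part of the original notes; the names `l1`, `l2`, `env1`, and `env2` are made up for the example.

```r
# Lists have value semantics: changing a copy leaves the original untouched.
l1 <- list(a = 1)
l2 <- l1
l2$a <- 2
l1$a  # still 1

# Environments have reference semantics: both names refer to the same object.
env1 <- new.env()
env1$a <- 1
env2 <- env1
env2$a <- 2
env1$a  # now 2
```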
     48 
     49 ## Create a new environment with `{rlang}`
     50 
     51 :::: {.columns}
     52 
     53 ::: {.column}
     54 
     55 ```{r}
     56 e1 <- rlang::env(
     57   rlang::global_env(),
     58   a = FALSE,
     59   b = "a",
     60   c = 2.3,
     61   d = 1:3,
     62 )
     63 ```
     64 
     65 :::
     66 
     67 ::: {.column}
     68 
     69 ```{r}
     70 e2 <- rlang::new_environment(
     71   data = list(
     72     a = FALSE,
     73     b = "a",
     74     c = 2.3,
     75     d = 1:3
     76   ),
     77   parent = rlang::global_env()
     78 )
     79 ```
     80 
     81 :::
     82 
     83 ::::
     84 
     85 ::: {.notes}
     86 
     87 rlang::env() creates a child of the current environment by default and takes a variable number of named objects to populate it.
     88 
     89 rlang::new_environment() creates a child of the empty environment by default and takes a named list of objects to populate it.
     90 
     91 :::
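
The difference in default parents described in the notes can be checked directly. The sketch below assumes `{rlang}` is installed; `e_default` and `e_empty` are hypothetical names used only here.

```r
library(rlang)

# env() uses the current environment as the parent unless told otherwise.
e_default <- env(a = 1)

# new_environment() uses the empty environment as the parent unless told otherwise.
e_empty <- new_environment(list(a = 1))

identical(env_parent(e_default), current_env())
identical(env_parent(e_empty), empty_env())
```

One practical consequence: code evaluated in `e_empty` cannot see any function, not even `+`, because nothing is reachable through its parent.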
     92 
     93 
     94 ## An environment associates, or **binds**, a set of names to a set of values
     95 
     96 :::: {.columns}
     97 
     98 ::: {.column}
     99 
    100 - A bag of names with no implied order
    101 
    102 - Bindings live within the environment
    103 
    104 ![](images/07-bindings.png)
    105 
    106 :::
    107 
    108 ::: {.column}
    109 
    110 - Environments have reference semantics and thus can contain themselves
    111 
    112 ```{r}
    113 #| eval: false
    114 e1$d <- e1
    115 ```
    116 
    117 ![](images/07_loop.png)
    118 
    119 :::
    120 
    121 ::::
    122 
    123 ::: {.notes}
    124 
    125 no implied order (unlike a list, so no index subsetting)
    126 
    127 the grey box represents an environment 
    128 
    129 the blue dot represents the parent environment 
    130 
    131 letters represent variable names bound within the environment
    132 
    133 --- 
    134 
    135 reference semantics store a reference to the object's memory address, not the actual value (as is done in value semantics)
    136 
    137 :::
    138  
    139 ## Inspect environments with `{rlang}`
    140 
    141 ```{r}
    142 rlang::env_print(e1)
    143 ```
    144 
    145 ```{r}
    146 rlang::env_names(e1)
    147 ```
    148 
    149 ```{r}
    150 rlang::env_has(e1, "a")
    151 ```
    152 
    153 ```{r}
    154 rlang::env_get(e1, "a")
    155 ```
    156 
    157 ```{r}
    158 rlang::env_parent(e1)
    159 ```
    160 
    161 ::: {.notes}
    162 
    163 base::print() displays the memory address and is not as helpful as rlang::env_print()
    164 
    165 :::
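
For comparison, here is a rough sketch of base R counterparts to the `{rlang}` helpers above. The mapping is approximate: for instance, `ls()` skips names that start with a dot unless `all.names = TRUE` is set. The environment `e` is a stand-in created just for this example.

```r
e <- new.env(parent = globalenv())
assign("a", FALSE, envir = e)
assign("b", "a", envir = e)

sort(ls(e))                               # names, roughly rlang::env_names()
exists("a", envir = e, inherits = FALSE)  # roughly rlang::env_has()
get("a", envir = e)                       # roughly rlang::env_get()
parent.env(e)                             # roughly rlang::env_parent()
```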
    166 
    167 ## By default, the current environment is your global environment
    168 
    169 - The current environment is where code is currently executing
    170 
    171 - The global environment *is* your current environment when working interactively
    172 
    173 ```{r}
    174 rlang::current_env()
    175 
    176 rlang::global_env()
    177 
    178 base::identical(
    179   rlang::current_env(),
    180   rlang::global_env()
    181 )
    182 ```
    183 
    184 ::: {.notes}
    185 
    186 If you open a new R session, you are in the global environment by default (unless this is changed by, say, the `.Rprofile` file)
    187 
    188 The current environment isn't *always* your global environment. Your current environment changes as you move into and out of functions, for example.
    189 
    190 :::
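
A small base R sketch of how the current environment changes inside a function. `inside()` is a hypothetical helper that simply returns the environment it is running in.

```r
# environment() with no arguments returns the environment where it is evaluated.
inside <- function() environment()

# While the function runs, the current environment is a fresh one,
# not the global environment.
identical(inside(), globalenv())  # FALSE

# That fresh environment is a child of the environment where `inside` was defined.
identical(parent.env(inside()), environment(inside))  # TRUE
```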
    191 
    192 ## Every environment has a parent environment 
    193 
    194 - Allows for lexical scoping
    195 
    196 ```{r}
    197 e2a <- rlang::env(d = 4, e = 5)
    198 
    199 e2b <- rlang::env(e2a, a = 1, b = 2, c = 3)
    200 
    201 rlang::env_parent(e2b)
    202 
    203 rlang::env_parents(e2b)
    204 ```
    205 
    206 ![](images/07_parents.png)
    207 
    208 ::: {.notes}
    209 
    210 Lexical scoping means if a name is not found in an environment, then R will look in its parent (and so on)
    211 
    212 Lexical scoping contrasts with dynamic scoping, where a name is looked up in the environment of the caller at run time
    213 
    214 :::
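
The parent lookup described in the notes can be sketched in base R. The names `parent_env` and `child_env` are made up for the example.

```r
parent_env <- new.env()
child_env <- new.env(parent = parent_env)
assign("d", 4, envir = parent_env)

# `d` is not bound in the child itself ...
exists("d", envir = child_env, inherits = FALSE)  # FALSE

# ... but a normal lookup continues into the parent and finds it there.
get("d", envir = child_env)  # 4
```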
    215 
    216 ## Only the **empty** environment does not have a parent
    217 
    218 ```{r}
    219 e2c <- rlang::env(rlang::empty_env(), d = 4, e = 5)
    220 
    221 e2d <- rlang::env(e2c, a = 1, b = 2, c = 3)
    222 ```
    223 
    224 ![](images/07_parents-empty.png){width=50% height=50%}
    225 
    226 ::: {.notes}
    227 
    228 The lack of a parent is shown by the hollow blue dot
    229 
    230 :::
    231 
    232 ## All environments eventually terminate with the empty environment
    233 
    234 ```{r}
    235 rlang::env_parents(e2b, last = rlang::empty_env())
    236 ```
    237 
    238 ::: {.notes}
    239 
    240 The empty environment typically isn't shown but can be displayed by setting the `last` argument of `rlang::env_parents()`
    241 
    242 :::
    243 
    244 ## Be wary of using `<<-`
    245 
    246 - Regular assignment (`<-`) always creates a variable in the current environment
    247 
    248 - Super assignment (`<<-`) does a few things:
    249 
    250   1. modifies the variable if it exists in a parent environment
    251 
    252   2. creates the variable in the global environment if it does not exist in any parent environment
    253 
    254 ::: {.notes}
    255 
    256 `<<-` starts in the parent of the current environment and walks up the chain of parent environments, modifying the first binding of the variable that it finds
    257 
    258 If no existing binding is found anywhere along that chain, `<<-` creates the variable in the global environment
    259 
    260 e1 <- rlang::env()
    261 e2 <- rlang::env(e1)
    262 
    263 rlang::env_poke(e1, "a", 1)
    264 
    265 # evaluate in e2 so that `<<-` starts its search in e2's parent, e1
    266 base::evalq(a <<- 2, e2)
    267 rlang::env_get(e1, "a")
    268 
    269 :::
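
A common, fairly safe use of `<<-` is a closure that keeps private state. The sketch below is illustrative only; `make_counter()` and `counter` are hypothetical names.

```r
make_counter <- function() {
  count <- 0
  function() {
    # `count` is not bound in this inner function's execution environment,
    # so `<<-` updates the binding in the enclosing environment instead.
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()  # 1
counter()  # 2
```

Because `count` already exists in the enclosing environment, `<<-` never reaches the global environment here.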
    270 
    271 
    272 ## Retrieve environment variables with `$`, `[[`, or `{rlang}` functions
    273 
    274 ```{r}
    275 e3 <- rlang::env(x = 1, y = 2)
    276 
    277 e3$x
    278 ```
    279 
    280 ```{r}
    281 e3[["x"]]
    282 ```
    283 
    284 ```{r}
    285 rlang::env_get(e3, "x")
    286 ```
    287 
    288 ```{r}
    289 #| error: true
    290 e3[[1]]
    291 ```
    292 
    293 ```{r}
    294 #| error: true
    295 e3["x"]
    296 ```
    297 
    298 ## Add bindings to an environment with `$`, `[[`, or `{rlang}` functions
    299 
    300 ```{r}
    301 e3$z <- 3
    302 
    303 e3[["z"]] <- 3
    304 
    305 rlang::env_poke(e3, "z", 3)
    306 
    307 rlang::env_bind(e3, z = 3, b = 20)
    308 
    309 rlang::env_unbind(e3, "z")
    310 ```
    311 
    312 ::: {.notes}
    313 
    314 rlang::env_has() is used to check if a variable exists within the environment
    315 
    316 rlang::env_unbind() is used to unbind a variable from an environment
    317 
    318 :::
    319 
    320 ## Special cases for binding environment variables
    321 
    322 - `rlang::env_bind_lazy()` creates delayed bindings
    323 
    324   - evaluated the first time they are accessed
    325 
    326 - `rlang::env_bind_active()` creates active bindings
    327 
    328   - re-computed every time they’re accessed
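
Delayed and active bindings can also be sketched with base R's `delayedAssign()` and `makeActiveBinding()`, which behave similarly to the `{rlang}` helpers listed above. The environment `calls` is a hypothetical counter used only to make the evaluation counts visible.

```r
e <- new.env()
calls <- new.env()
calls$lazy <- 0
calls$active <- 0

# Delayed binding: the expression runs once, on first access.
delayedAssign("lazy_value", {
  calls$lazy <- calls$lazy + 1
  "computed once"
}, assign.env = e)

# Active binding: the function runs on every access.
makeActiveBinding("active_value", function() {
  calls$active <- calls$active + 1
  calls$active
}, e)

e$lazy_value
e$lazy_value
e$active_value
e$active_value

calls$lazy    # 1
calls$active  # 2
```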
    329 
    330 # 7.3 Recursing over environments
    331 
    332 ## Explore environments recursively 
    333 
    334 ```{r}
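    335 where <- function(name, env = rlang::caller_env()) {
    336   if (identical(env, rlang::empty_env())) {
    337     # Base case: reached the empty environment without finding the name
    338     stop("Can't find ", name, call. = FALSE)
    339   } else if (rlang::env_has(env, name)) {
    340     # Success case: the name is bound in this environment
    341     env
    342   } else {
    343     # Recursive case: keep looking in the parent
    344     where(name, rlang::env_parent(env))
    345   }
    346 }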
    347 ```
    348 
    349 ::: {.notes}
    350 
    351 Why is recursing over environments important? 
    352 
    353 Recursion is not the same thing as iteration
    354 
    355 :::
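
To make the contrast with iteration concrete, here is a sketch of the same search written as a loop in base R. `where_iter()` is a hypothetical name, not a function from the book or from `{rlang}`.

```r
where_iter <- function(name, env = parent.frame()) {
  # Walk up the chain of parents until the empty environment is reached.
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE)) {
      return(env)
    }
    env <- parent.env(env)
  }
  stop("Can't find ", name, call. = FALSE)
}

# `mean` is defined in the base package.
where_iter("mean")
```

Both versions visit the same environments in the same order; the loop simply avoids growing the call stack.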
    356 
    357 # 7.4 Special Environments
    358 
    359 ## Attaching packages changes the search path 
    360 
    361 - The **search path** is the order in which R will look through environments for objects
    362 
    363 - Attached packages become a parent of the global environment
    364 
    365 - The immediate parent of the global environment is the last package attached
    366 
    367 ![](images/07_search-path.png)
    368 
    369 
    370 ::: {.notes}
    371 
    372 Autoloads and base are always the last two environments on the search path
    373 
    374 Autoloads uses lazy loading to make large package objects (like datasets) available without taking up memory
    375 
    376 Functions within base are used to load all other packages
    377 
    378 :::
    379 
    380 ## Attaching packages changes the search path 
    381 
    382 :::: {.columns}
    383 
    384 ::: {.column}
    385 
    386 ```{r}
    387 rlang::search_envs()
    388 ```
    389 
    390 :::
    391 
    392 ::: {.column}
    393 
    394 ```{r}
    395 library(rlang)
    396 
    397 rlang::search_envs()
    398 ```
    399 
    400 :::
    401 
    402 ::::
    403 
    404 ::: {.notes}
    405 
    406 Attaching `{rlang}` modifies the search path
    407 
    408 :::
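
Base R exposes the same information through `search()`. A brief sketch follows; the entries between the first and last depend on which packages are attached in your session.

```r
path <- search()

path[1]             # ".GlobalEnv"
path[length(path)]  # "package:base"

# The second entry on the search path is the parent of the global environment.
identical(parent.env(globalenv()), as.environment(2))
```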
    409 
    410 ## Functions enclose their current environment
    411 
    412 - A function encloses the current environment when it is created
    413 
    414 ```{r}
    415 y <- 1
    416 
    417 f <- function(x) x + y
    418 
    419 rlang::fn_env(f)
    420 ```
    421 
    422 ![](images/07_binding.png)
    423 
    424 ::: {.notes}
    425 
    426 The function environment is represented by the black dot
    427 
    428 The function `f()` knows where to look for y thanks to the function environment
    429 
    430 :::
    431 
    432 ## Functions enclose their current environment
    433 
    434 - `g()` is *bound by* the environment `e` but *encloses* the global environment 
    435 
    436 - The function environment is the global environment but the binding environment is `e` 
    437 
    438 ```{r}
    439 e <- env()
    440 
    441 e$g <- function() 1
    442 
    443 rlang::fn_env(e$g)
    444 ```
    445 
    446 ## Functions enclose their current environment
    447 
    448 ![](images/07_binding-2.png)
    449 
    450 ## Namespaces ensure package environment independence
    451 
    452 - Every package has an underlying namespace
    453 
    454 - Every function in a package is associated with a package environment and a namespace environment
    455 
    456 - Package environments contain exported objects
    457 
    458 - Namespace environments contain exported and internal objects
    459 
    460 ::: {.notes}
    461 
    462 Contrast `dplyr::across` and `dplyr:::across_glue_mask()` 
    463 
    464 `sd()` is bound to the `{stats}` namespace environment
    465 
    466 :::
    467 
    468 ## Namespaces ensure package environment independence
    469 
    470 ![](images/07_namespace-bind.png)
    471 
    472 ## Namespaces ensure package environment independence
    473 
    474 ![](images/07_namespace.png)
    475 
    476 ::: {.notes}
    477 
    478 `var()` is found in the stats namespace first, so that is the definition of var that is used by `sd()`
    479 
    480 If an object called by `sd()` wasn't found in the stats namespace, R would look next in the imports environment, then the base namespace, and only then in the global environment and the rest of the search path 
    481 
    482 :::
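
The chain of environments behind `sd()` can be inspected with base R. This is a sketch for exploration; `ns` is a name introduced only here.

```r
ns <- asNamespace("stats")

# sd() encloses the stats namespace, so internal helpers such as var()
# are found there first.
identical(environment(stats::sd), ns)

# The namespace's parent is the imports environment ...
environmentName(parent.env(ns))

# ... whose parent is the base namespace, whose parent is the global environment.
identical(parent.env(parent.env(ns)), .BaseNamespaceEnv)
identical(parent.env(.BaseNamespaceEnv), globalenv())
```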
    483 
    484 ## Functions use ephemeral execution environments
    485 
    486 - Functions create a new environment to use whenever executed
    487 
    488 - The execution environment is a child of the function environment
    489 
    490 - Execution environments are usually garbage collected on function exit, unless something (such as a returned function) still references them
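
A minimal base R sketch of these points. `run()` is a hypothetical function that returns its own execution environment, which also keeps that environment alive after the call ends.

```r
run <- function() {
  x <- 1
  environment()  # return the execution environment of this call
}

env_a <- run()
env_b <- run()

# Each call gets a fresh execution environment.
identical(env_a, env_b)  # FALSE

# Its parent is the function environment of `run`.
identical(parent.env(env_a), environment(run))  # TRUE

# Because env_a is still referenced, its contents survive the call.
env_a$x  # 1
```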
    491 
    492 ## Functions use ephemeral execution environments
    493 
    494 ![](images/07_execution.png)
    495 
    496 # 7.5 Call stacks
    497 
    498 ## The caller environment informs the call stack
    499 
    500 - The caller environment is the environment from which the function was called
    501 
    502 - Accessed with `rlang::caller_env()` 
    503 
    504 - The call stack is made up of frames, one per active function call, each linked to the frame that called it
    505 
    506 :::: {.columns}
    507 
    508 ::: {.column}
    509 
    510 ```{r}
    511 f <- function(x) {
    512   g(x = 2)
    513 }
    514 g <- function(x) {
    515   h(x = 3)
    516 }
    517 h <- function(x) {
    518   lobstr::cst()
    519 }
    520 ```
    521 
    522 :::
    523 
    524 ::: {.column}
    525 
    526 ```{r}
    527 f(x = 1)
    528 ```
    529 
    530 :::
    531 
    532 ::::
    533 
    534 ::: {.notes}
    535 
    536 `traceback()` is the base R approach
    537 
    538 `lobstr::cst()` prints the call stack in order of call, opposite of `traceback()`
    539 
    540 Does `lobstr::cst()` now print the caller environment?
    541 
    542 :::
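
The base R counterpart of `rlang::caller_env()` is `parent.frame()`. The sketch below separates the caller environment from the function environment; `show_caller()` and `from_function()` are hypothetical names.

```r
show_caller <- function() parent.frame()

from_function <- function() {
  list(
    caller = show_caller(),  # environment that show_caller() was called from
    here = environment()     # execution environment of from_function()
  )
}

res <- from_function()

# The caller environment of show_caller() is from_function()'s execution environment.
identical(res$caller, res$here)  # TRUE

# The function environment of show_caller() is where it was defined,
# which is a different environment.
identical(environment(show_caller), res$here)  # FALSE
```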
    543 
    544 ## The caller environment informs the call stack
    545 
    546 - The call stack is more complicated with lazy evaluation
    547 
    548 ```{r}
    549 a <- function(x) b(x)
    550 b <- function(x) d(x)
    551 d <- function(x) x
    552 
    553 a(f())
    554 ```
    555 
    556 ::: {.notes}
    557 
    558 Do different branches represent different caller environments?
    559 
    560 Note that `c()` was replaced with `d()` as it could not be rendered with `c()`
    561 
    562 :::
    563 
    564 ## The caller environment informs the call stack
    565 
    566 ![](images/07_calling.png) 
    567 
    568 ::: {.notes}
    569 
    570 - Each frame contains: 
    571 
    572   1. An expression
    573 
    574   2. An environment
    575 
    576   3. A parent
    577 
    578 :::
    579 
    580 ## R uses lexical scoping, not dynamic scoping
    581 
    582 > R uses lexical scoping: it looks up the values of names based on how a function is defined, not how it is called. “Lexical” here is not the English adjective that means relating to words or a vocabulary. It’s a technical CS term that tells us that the scoping rules use a parse-time, rather than a run-time structure. - [Chapter 6 - functions](https://adv-r.hadley.nz/functions.html)
    583 
    584 - Dynamic scoping means functions use variables as they are defined in the calling environment
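
The difference can be sketched in base R. R itself only does the lexical lookup; the "dynamic" variant below is an emulation that explicitly reads from the caller's frame, and all function names are made up for the example.

```r
x <- "where f was defined"

# Lexical scoping: `x` is resolved through the function's enclosing environment.
f_lexical <- function() x

# Emulated dynamic scoping: `x` is fetched from the caller's environment.
f_dynamic <- function() get("x", envir = parent.frame())

caller <- function() {
  x <- "where f was called"
  c(lexical = f_lexical(), dynamic = f_dynamic())
}

caller()
```

`f_lexical()` ignores the `x` defined inside `caller()`, while the emulated dynamic version picks it up.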
    585 
    586 # 7.6 Data structures
    587 
    588 ## Environments are useful data structures
    589 
    590 - Use cases include:
    591 
    592   1. Avoiding copies of large data
    593 
    594   2. Managing state within a package
    595 
    596   3. As a hashmap
    597 
    598 ::: {.notes} 
    599 
    600 Environments behave like hashmaps, so looking up a name (for example, finding a function in a package) takes roughly constant time
    601 
    602 :::
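
As an illustration of the hashmap use case, here is a small word counter built on a base R environment. This is a sketch, not code from the book; `counts` and `words` are hypothetical names.

```r
counts <- new.env()
words <- c("a", "b", "a", "c", "a")

for (w in words) {
  # `[[` on an environment returns NULL for a name that is not bound yet.
  current <- counts[[w]]
  counts[[w]] <- if (is.null(current)) 1 else current + 1
}

counts[["a"]]    # 3
sort(ls(counts)) # "a" "b" "c"
```

Because the environment is modified in place, the loop never copies the whole table, which is the same property that makes environments useful for large data and package state.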
    603 
    604 
    605