bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git

html.json (11354B)


{
  "hash": "5d45e2f475170b2f4aeae3a9e7c745dc",
  "result": {
    "engine": "knitr",
    "markdown": "---\nengine: knitr\ntitle: Names and values\n---\n\n## Learning objectives\n\n- Distinguish between an *object* and its *name*.\n- Identify when data are *copied* versus *modified*.\n- Trace and identify the memory used by R.\n\nThe `{lobstr}` package will help us throughout the chapter.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(lobstr)\n```\n:::\n\n\n\n## Syntactic names are easier to create and work with than non-syntactic names\n\n\n- Syntactic names: `my_variable`, `x`, `cpp11`, `.by`.\n  - Can't use names in `?Reserved`\n\n- Non-syntactic names need to be surrounded in backticks. \n\n## Names are *bound to* values with `<-`\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- c(1, 2, 3)\na\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x1624dfae968\"\n```\n\n\n:::\n:::\n\n\n## Many names can be bound to the same values\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb <- a\nobj_addr(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x1624dfae968\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x1624dfae968\"\n```\n\n\n:::\n:::\n\n\n## If shared values are modified, the object is copied to a new address\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb[[1]] <- 5\nobj_addr(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x1624dfae968\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x16249d8f228\"\n```\n\n\n:::\n:::\n\n\n## Memory addresses can differ even if objects seem the same\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- 1:10\nb <- a\nc <- 1:10\n\nobj_addr(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x1624bd3f5e8\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 
\"0x1624bd3f5e8\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x1624bd7a540\"\n```\n\n\n:::\n:::\n\n\n## Functions have a single address regardless of how they're referenced\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobj_addr(mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x162498c1738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(base::mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x162498c1738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(get(\"mean\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x162498c1738\"\n```\n\n\n:::\n:::\n\n\n## Unlike most objects, environments keep the same memory address on modify\n\n\n::: {.cell}\n\n```{.r .cell-code}\nd <- new.env()\nobj_addr(d)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x1624d749900\"\n```\n\n\n:::\n\n```{.r .cell-code}\ne <- d\ne[['a']] <- 1\nobj_addr(e)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x1624d749900\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(d)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x1624d749900\"\n```\n\n\n:::\n\n```{.r .cell-code}\nd[['a']]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n:::\n\n\n## Use `tracemem` to validate if values are copied or modified\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(10)\ntracemem(x)\n#> [1] \"<000001F4185B4B08>\"\ny <- x\nx[[1]] <- 10\n#> tracemem[0x000001f4185b4b08 -> 0x000001f4185b4218]:\nuntracemem(x)\n```\n:::\n\n\n## `tracemem` shows internal C code minimizes copying\n\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- as.list(x)\ntracemem(y)\n#> [1] \"<000001AD67FDCD38>\"\nmedians <- vapply(x, median, numeric(1))\nfor (i in 1:5) {\n  y[[i]] <- y[[i]] - medians[[i]]\n}\n#> tracemem[0x000001ad67fdcd38 -> 0x000001ad61982638]:\nuntracemem(y)\n```\n:::\n\n\n## A function's environment follows copy-on-modify rules\n\n:::: 
columns\n\n::: column\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- function(a) {\n  a\n}\n\nx <- c(1, 2, 3)\nz <- f(x) # No change in value\n\nobj_addr(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x1624dda26e8\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(z) # No address change \n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x1624dda26e8\"\n```\n\n\n:::\n:::\n\n:::\n\n::: column\n![](images/02-trace.png)\n:::\n\n::::\n\n::: notes\n- Diagrams will be explained more in chapter 7.\n- `a` points to same address as `x`.\n- If `a` modified inside function, `z` would have new address.\n:::\n\n\n## `ref()` shows the memory address of a list and its *elements*\n\n:::: columns\n\n::: column\n\n::: {.cell}\n\n```{.r .cell-code}\nl1 <- list(1, 2, 3)\nobj_addr(l1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x1624e315d68\"\n```\n\n\n:::\n\n```{.r .cell-code}\nl2 <- l1\nl2[[3]] <- 4\nref(l1, l2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x1624e315d68] <list> \n#> ├─[2:0x1624e77ebd0] <dbl> \n#> ├─[3:0x1624e77ea10] <dbl> \n#> └─[4:0x1624e77e850] <dbl> \n#>  \n#> █ [5:0x1624e34ed18] <list> \n#> ├─[2:0x1624e77ebd0] \n#> ├─[3:0x1624e77ea10] \n#> └─[6:0x1624e754e70] <dbl>\n```\n\n\n:::\n:::\n\n:::\n\n::: column\n![](images/02-l-modify-2.png){width=50%}\n:::\n\n::::\n\n## Since dataframes are lists of (column) vectors, mutating a column modifies only that column\n\n\n::: {.cell}\n\n```{.r .cell-code}\nd1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))\nd2 <- d1\nd2[, 2] <- d2[, 2] * 2\nref(d1, d2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x1624ec8d348] <df[,2]> \n#> ├─x = [2:0x1624f230148] <dbl> \n#> └─y = [3:0x1624f2300f8] <dbl> \n#>  \n#> █ [4:0x1624ee2aec8] <df[,2]> \n#> ├─x = [2:0x1624f230148] \n#> └─y = [5:0x1624f7aab98] <dbl>\n```\n\n\n:::\n:::\n\n\n## Since dataframes are lists of (column) vectors, mutating a row modifies the value\n\n\n::: {.cell}\n\n```{.r .cell-code}\nd1 <- 
data.frame(x = c(1, 5, 6), y = c(2, 4, 3))\nd2 <- d1\nd2[1, ] <- d2[1, ] * 2\nref(d1, d2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x1624f91a508] <df[,2]> \n#> ├─x = [2:0x16250095318] <dbl> \n#> └─y = [3:0x162500952c8] <dbl> \n#>  \n#> █ [4:0x1624fb324c8] <df[,2]> \n#> ├─x = [5:0x162501700e8] <dbl> \n#> └─y = [6:0x16250170098] <dbl>\n```\n\n\n:::\n:::\n\n\n::: notes\n- Here \"mutate\" means \"change\", not `dplyr::mutate()`\n:::\n\n## Characters are unique due to the global string pool\n\n:::: columns\n\n::: column\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 1:4\nref(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1:0x1624ff60cb0] <int>\n```\n\n\n:::\n\n```{.r .cell-code}\ny <- 1:4\nref(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1:0x162500b4318] <int>\n```\n\n\n:::\n\n```{.r .cell-code}\nx <- c(\"a\", \"a\", \"b\")\nref(x, character = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x1625089a188] <chr> \n#> ├─[2:0x16241da3118] <string: \"a\"> \n#> ├─[2:0x16241da3118] \n#> └─[3:0x1624818b3b8] <string: \"b\">\n```\n\n\n:::\n\n```{.r .cell-code}\ny <- c(\"a\")\nref(y, character = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x16250892930] <chr> \n#> └─[2:0x16241da3118] <string: \"a\">\n```\n\n\n:::\n:::\n\n:::\n\n::: column\n![](images/02-character-2.png)\n:::\n\n::::\n\n::: notes\n- \"a\" is always at the same address.\n- Each member of character vector has its own address (kind of list-like).\n:::\n\n## Memory amount can also be measured, using `lobstr::obj_size`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbanana <- \"bananas bananas bananas\"\nobj_addr(banana)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x1624bb90318\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_size(banana)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 136 B\n```\n\n\n:::\n:::\n\n\n## Alternative Representation or ALTREPs represent vector values efficiently\n\n\n::: 
{.cell}\n\n```{.r .cell-code}\nx <- 1:10\nobj_size(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 680 B\n```\n\n\n:::\n\n```{.r .cell-code}\ny <- 1:10000\nobj_size(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 680 B\n```\n\n\n:::\n:::\n\n\n## We can measure memory & speed using `bench::mark()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmed <- function(d, medians) {\n  for (i in seq_along(medians)) {\n    d[[i]] <- d[[i]] - medians[[i]]\n  }\n}\nx <- data.frame(matrix(runif(5 * 1e4), ncol = 5))\nmedians <- vapply(x, median, numeric(1))\ny <- as.list(x)\n\nbench::mark(\n  \"data.frame\" = med(x, medians),\n  \"list\" = med(y, medians)\n)[, c(\"min\", \"median\", \"mem_alloc\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 3\n#>        min   median mem_alloc\n#>   <bch:tm> <bch:tm> <bch:byt>\n#> 1   52.7µs   71.2µs     491KB\n#> 2   16.8µs   35.1µs     391KB\n```\n\n\n:::\n:::\n\n\n::: notes\n- The thing to see: the list version uses less memory and is faster.\n:::\n\n## The garbage collector `gc()` explicitly clears out unbound objects\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 1:3\nx <- 2:4 # \"1:3\" is orphaned\nrm(x) # \"2:4\" is orphaned\ngc()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>           used (Mb) gc trigger (Mb) max used (Mb)\n#> Ncells  791094 42.3    1505455 80.4  1505455 80.4\n#> Vcells 1497588 11.5    8388608 64.0  8388528 64.0\n```\n\n\n:::\n\n```{.r .cell-code}\nlobstr::mem_used() # Wrapper around gc()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 56.29 MB\n```\n\n\n:::\n:::\n\n\n::: aside\n`gc()` runs automatically; you never *need* to call it.\n:::\n\n::: notes\n- `mem_used()` multiplies Ncells \"used\" by either 28 (32-bit architecture) or 56 (64-bit architecture), and Vcells \"used\" by 8, adds them, and converts to Mb.\n:::",
    "supporting": [
      "02_files"
    ],
    "filters": [
      "rmarkdown/pagebreak.lua"
    ],
    "includes": {
      "include-after-body": [
        "\n<script>\n  // htmlwidgets need to know to resize themselves when slides are shown/hidden.\n  // Fire the \"slideenter\" event (handled by htmlwidgets.js) when the current\n  // slide changes (different for each slide format).\n  (function () {\n    // dispatch for htmlwidgets\n    function fireSlideEnter() {\n      const event = window.document.createEvent(\"Event\");\n      event.initEvent(\"slideenter\", true, true);\n      window.document.dispatchEvent(event);\n    }\n\n    function fireSlideChanged(previousSlide, currentSlide) {\n      fireSlideEnter();\n\n      // dispatch for shiny\n      if (window.jQuery) {\n        if (previousSlide) {\n          window.jQuery(previousSlide).trigger(\"hidden\");\n        }\n        if (currentSlide) {\n          window.jQuery(currentSlide).trigger(\"shown\");\n        }\n      }\n    }\n\n    // hookup for slidy\n    if (window.w3c_slidy) {\n      window.w3c_slidy.add_observer(function (slide_num) {\n        // slide_num starts at position 1\n        fireSlideChanged(null, w3c_slidy.slides[slide_num - 1]);\n      });\n    }\n\n  })();\n</script>\n\n"
      ]
    },
    "engineDependencies": {},
    "preserve": {},
    "postProcess": true
  }
}