Advanced R book club - Chapter 2 document edits (#87) - bookclub-advr

commit 802c1533e2ed1c0736cb9aade05099ae82f83df7
parent 011e94299c79a179e804427680c9786d18651635
Author: Nick Giangreco <nick.giangreco@gmail.com>
Date:   Mon, 18 Aug 2025 07:53:01 -0400

Advanced R book club - Chapter 2 document edits (#87)

* first pass at restructuring slide headers to be a more compact presentation experience

* finished revising chapter 2 slides

* Update slides/02.qmd

make LOs declarative

Co-authored-by: Jon Harmon <jonthegeek@gmail.com>

* Update slides/02.qmd

Co-authored-by: Jon Harmon <jonthegeek@gmail.com>

* Update slides/02.qmd

Co-authored-by: Jon Harmon <jonthegeek@gmail.com>

* Update slides/02.qmd

Co-authored-by: Jon Harmon <jonthegeek@gmail.com>

* Update slides/02.qmd

Co-authored-by: Jon Harmon <jonthegeek@gmail.com>

* Update slides/02.qmd

Co-authored-by: Jon Harmon <jonthegeek@gmail.com>

* Update slides/02.qmd

Co-authored-by: Jon Harmon <jonthegeek@gmail.com>

* Update slides/02.qmd

Co-authored-by: Jon Harmon <jonthegeek@gmail.com>

* using assertion slide making technique

* fix spelling and shorten titles

* Tweak 02 slides and repair shared metadata.

---------

Co-authored-by: Jon Harmon <jonthegeek@gmail.com>

Diffstat:
M _freeze/slides/02/execute-results/html.json  | 10 +++++++---
A slides/.gitignore  | 2 ++
M slides/02.qmd  | 489 +++++++++++++++++++++++--------------------------------------------------------
M slides/_metadata.yml  | 5 +----

4 files changed, 148 insertions(+), 358 deletions(-)
diff --git a/_freeze/slides/02/execute-results/html.json b/_freeze/slides/02/execute-results/html.json
@@ -1,15 +1,19 @@
 {
-  "hash": "a11b52aa373c64a45cd125fdb1d36946",
+  "hash": "387178242b89e67960838bbc8e8752bc",
   "result": {
     "engine": "knitr",
-    "markdown": "---\nengine: knitr\ntitle: Names and values\n---\n\n## Learning objectives\n\n- To be able to understand distinction between an *object* and its *name*\n- With this knowledge, to be able write faster code using less memory\n- To better understand R's functional programming tools\n\nUsing lobstr package here.\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(lobstr)\n```\n:::\n\n\n\n## Quiz\n\n### 1. How do I create a new column called `3` that contains the sum of `1` and `2`?\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(runif(3), runif(3))\nnames(df) <- c(1, 2)\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>           1         2\n#> 1 0.8893205 0.9874973\n#> 2 0.4645398 0.7004741\n#> 3 0.7312149 0.2986040\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf$`3` <- df$`1` + df$`2`\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>           1         2        3\n#> 1 0.8893205 0.9874973 1.876818\n#> 2 0.4645398 0.7004741 1.165014\n#> 3 0.7312149 0.2986040 1.029819\n```\n\n\n:::\n:::\n\n\n**What makes these names challenging?**\n\n> You need to use backticks (`) when the name of an object doesn't start with a \n> a character or '.' [or . followed by a number] (non-syntactic names).\n\n### 2. How much memory does `y` occupy?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(1e6)\ny <- list(x, x, x)\n```\n:::\n\n\nNeed to use the lobstr package:\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::obj_size(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8.00 MB\n```\n\n\n:::\n:::\n\n\n> Note that if you look in the RStudio Environment or use R base `object.size()`\n> you actually get a value of 24 MB\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobject.size(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 24000224 bytes\n```\n\n\n:::\n:::\n\n\n### 3. 
On which line does `a` get copied in the following example?\n\n::: {.cell}\n\n```{.r .cell-code}\na <- c(1, 5, 3, 2)\nb <- a\nb[[1]] <- 10\n```\n:::\n\n\n> Not until `b` is modified, the third line\n\n## Binding basics\n\n- Create values and *bind* a name to them\n- Names have values (rather than values have names)\n- Multiple names can refer to the same values\n- We can look at an object's address to keep track of the values independent of their names\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1, 2, 3)\ny <- x\nobj_addr(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2a58503acd8\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2a58503acd8\"\n```\n\n\n:::\n:::\n\n\n\n### Exercises\n\n##### 1. Explain the relationships\n\n::: {.cell}\n\n```{.r .cell-code}\na <- 1:10\nb <- a\nc <- b\nd <- 1:10\n```\n:::\n\n\n> `a` `b` and `c` are all names that refer to the first value `1:10`\n> \n> `d` is a name that refers to the *second* value of `1:10`.\n\n\n##### 2. Do the following all point to the same underlying function object? 
hint: `lobstr::obj_addr()`\n\n::: {.cell}\n\n```{.r .cell-code}\nobj_addr(mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2a5828bf738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(base::mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2a5828bf738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(get(\"mean\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2a5828bf738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(evalq(mean))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2a5828bf738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(match.fun(\"mean\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2a5828bf738\"\n```\n\n\n:::\n:::\n\n\n> Yes!\n\n## Copy-on-modify\n\n- If you modify a value bound to multiple names, it is 'copy-on-modify'\n- If you modify a value bound to a single name, it is 'modify-in-place'\n- Use `tracemem()` to see when a name's value changes\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1, 2, 3)\ncat(tracemem(x), \"\\n\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <000002A585CC0FF8>\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- x\ny[[3]] <- 4L  # Changes (copy-on-modify)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tracemem[0x000002a585cc0ff8 -> 0x000002a58600d5e8]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main\n```\n\n\n:::\n\n```{.r .cell-code}\ny[[3]] <- 5L  # Doesn't change (modify-in-place)\n```\n:::\n\n\nTurn off `tracemem()` with `untracemem()`\n\n> Can also use `ref(x)` to get the address of the value bound to a given name\n\n\n## Functions\n\n- Copying also applies within functions\n- If you 
copy (but don't modify) `x` within `f()`, no copy is made\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- function(a) {\n  a\n}\n\nx <- c(1, 2, 3)\nz <- f(x) # No change in value\n\nref(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1:0x2a58669dd18] <dbl>\n```\n\n\n:::\n\n```{.r .cell-code}\nref(z)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1:0x2a58669dd18] <dbl>\n```\n\n\n:::\n:::\n\n\n<!-- ![](images/02-trace.png) -->\n\n## Lists\n\n- A list overall, has it's own reference (id)\n- List *elements* also each point to other values\n- List doesn't store the value, it *stores a reference to the value*\n- As of R 3.1.0, modifying lists creates a *shallow copy*\n    - References (bindings) are copied, but *values are not*\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl1 <- list(1, 2, 3)\nl2 <- l1\nl2[[3]] <- 4\n```\n:::\n\n\n- We can use `ref()` to see how they compare\n  - See how the list reference is different\n  - But first two items in each list are the same\n\n\n::: {.cell}\n\n```{.r .cell-code}\nref(l1, l2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x2a586f2e698] <list> \n#> ├─[2:0x2a5877133b8] <dbl> \n#> ├─[3:0x2a5877131f8] <dbl> \n#> └─[4:0x2a587713038] <dbl> \n#>  \n#> █ [5:0x2a586fc3098] <list> \n#> ├─[2:0x2a5877133b8] \n#> ├─[3:0x2a5877131f8] \n#> └─[6:0x2a58770fc78] <dbl>\n```\n\n\n:::\n:::\n\n\n![](images/02-l-modify-2.png){width=50%}\n\n## Data Frames\n\n- Data frames are lists of vectors\n- So copying and modifying a column *only affects that column*\n- **BUT** if you modify a *row*, every column must be copied\n\n\n::: {.cell}\n\n```{.r .cell-code}\nd1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))\nd2 <- d1\nd3 <- d1\n```\n:::\n\n\nOnly the modified column changes\n\n::: {.cell}\n\n```{.r .cell-code}\nd2[, 2] <- d2[, 2] * 2\nref(d1, d2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x2a584931608] <df[,2]> \n#> ├─x = [2:0x2a57f3b9cc8] <dbl> \n#> └─y = [3:0x2a57f3b9c78] <dbl> \n#>  \n#> █ 
[4:0x2a5810eb508] <df[,2]> \n#> ├─x = [2:0x2a57f3b9cc8] \n#> └─y = [5:0x2a57feb2058] <dbl>\n```\n\n\n:::\n:::\n\n\nAll columns change\n\n::: {.cell}\n\n```{.r .cell-code}\nd3[1, ] <- d3[1, ] * 3\nref(d1, d3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x2a584931608] <df[,2]> \n#> ├─x = [2:0x2a57f3b9cc8] <dbl> \n#> └─y = [3:0x2a57f3b9c78] <dbl> \n#>  \n#> █ [4:0x2a57faa92c8] <df[,2]> \n#> ├─x = [5:0x2a585a91b38] <dbl> \n#> └─y = [6:0x2a585a91ae8] <dbl>\n```\n\n\n:::\n:::\n\n\n## Character vectors\n\n- R has a **global string pool**\n- Elements of character vectors point to unique strings in the pool\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(\"a\", \"a\", \"abc\", \"d\")\n```\n:::\n\n\n![](images/02-character-2.png)\n\n## Exercises\n\n##### 1. Why is `tracemem(1:10)` not useful?\n\n> Because it tries to trace a value that is not bound to a name\n\n##### 2. Why are there two copies?\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1L, 2L, 3L)\ntracemem(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"<000002A5856391C8>\"\n```\n\n\n:::\n\n```{.r .cell-code}\nx[[3]] <- 4\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tracemem[0x000002a5856391c8 -> 0x000002a585653f08]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a585653f08 -> 0x000002a58663f8b8]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main\n```\n\n\n:::\n:::\n\n\n> 
Because we convert an *integer* vector (using 1L, etc.) to a *double* vector (using just 4)- \n\n##### 3. What is the relationships among these objects?\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- 1:10      \nb <- list(a, a)\nc <- list(b, a, 1:10) # \n```\n:::\n\n\na <- obj 1    \nb <- obj 1, obj 1    \nc <- b(obj 1, obj 1), obj 1, 1:10    \n\n\n::: {.cell}\n\n```{.r .cell-code}\nref(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x2a586fc3ea8] <list> \n#> ├─█ [2:0x2a585a1a308] <list> \n#> │ ├─[3:0x2a585aa1c40] <int> \n#> │ └─[3:0x2a585aa1c40] \n#> ├─[3:0x2a585aa1c40] \n#> └─[4:0x2a585b13d90] <int>\n```\n\n\n:::\n:::\n\n\n\n##### 4. What happens here?\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(1:10)\nx[[2]] <- x\n```\n:::\n\n\n- `x` is a list\n- `x[[2]] <- x` creates a new list, which in turn contains a reference to the \n  original list\n- `x` is no longer bound to `list(1:10)`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nref(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x2a586172508] <list> \n#> ├─[2:0x2a58641c040] <int> \n#> └─█ [3:0x2a586b06c48] <list> \n#>   └─[2:0x2a58641c040]\n```\n\n\n:::\n:::\n\n\n![](images/02-copy_on_modify_fig2.png){width=50%}\n\n## Object Size\n\n- Use `lobstr::obj_size()` \n- Lists may be smaller than expected because of referencing the same value\n- Strings may be smaller than expected because using global string pool\n- Difficult to predict how big something will be\n  - Can only add sizes together if they share no references in common\n\n### Alternative Representation\n- As of R 3.5.0 - ALTREP\n- Represent some vectors compactly\n    - e.g., 1:1000 - not 10,000 values, just 1 and 1,000\n\n### Exercises\n\n##### 1. 
Why are the sizes so different?\n\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- rep(list(runif(1e4)), 100)\n\nobject.size(y) # ~8000 kB\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8005648 bytes\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_size(y)    # ~80   kB\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 80.90 kB\n```\n\n\n:::\n:::\n\n\n> From `?object.size()`: \n> \n> \"This function merely provides a rough indication: it should be reasonably accurate for atomic vectors, but **does not detect if elements of a list are shared**, for example.\n\n##### 2. Why is the size misleading?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfuns <- list(mean, sd, var)\nobj_size(funs)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 18.76 kB\n```\n\n\n:::\n:::\n\n\n> Because they reference functions from base and stats, which are always available.\n> Why bother looking at the size? What use is that?\n\n##### 3. Predict the sizes\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- runif(1e6) # 8 MB\nobj_size(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8.00 MB\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb <- list(a, a)\n```\n:::\n\n\n- There is one value ~8MB\n- `a` and `b[[1]]` and `b[[2]]` all point to the same value.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobj_size(b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8.00 MB\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_size(a, b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8.00 MB\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb[[1]][[1]] <- 10\n```\n:::\n\n- Now there are two values ~8MB each (16MB total)\n- `a` and `b[[2]]` point to the same value (8MB)\n- `b[[1]]` is new (8MB) because the first element (`b[[1]][[1]]`) has been changed\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobj_size(b)     # 16 MB (two values, two element references)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 16.00 MB\n```\n\n\n:::\n\n```{.r 
.cell-code}\nobj_size(a, b)  # 16 MB (a & b[[2]] point to the same value)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 16.00 MB\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb[[2]][[1]] <- 10\n```\n:::\n\n- Finally, now there are three values ~8MB each (24MB total)\n- Although `b[[1]]` and `b[[2]]` have the same contents, \n  they are not references to the same object.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobj_size(b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 16.00 MB\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_size(a, b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 24.00 MB\n```\n\n\n:::\n:::\n\n\n\n## Modify-in-place\n\n- Modifying usually creates a copy except for\n    - Objects with a single binding (performance optimization)\n    - Environments (special)\n\n### Objects with a single binding\n\n- Hard to know if copy will occur\n- If you have 2+ bindings and remove them, R can't follow how many are removed (so will always think there are more than one)\n- May make a copy even if there's only one binding left\n- Using a function makes a reference to it **unless it's a function based on C**\n- Best to use `tracemem()` to check rather than guess.\n\n\n#### Example - lists vs. 
data frames in for loop\n\n**Setup**  \n\nCreate the data to modify\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- data.frame(matrix(runif(5 * 1e4), ncol = 5))\nmedians <- vapply(x, median, numeric(1))\n```\n:::\n\n\n\n**Data frame - Copied every time!**\n\n::: {.cell}\n\n```{.r .cell-code}\ncat(tracemem(x), \"\\n\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <000002A587857268>\n```\n\n\n:::\n\n```{.r .cell-code}\nfor (i in seq_along(medians)) {\n  x[[i]] <- x[[i]] - medians[[i]]\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tracemem[0x000002a587857268 -> 0x000002a584b5de78]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584b5de78 -> 0x000002a584b5d2a8]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584b5d2a8 -> 0x000002a584b5d238]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584b5d238 -> 0x000002a584b5d1c8]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart 
withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584b5d1c8 -> 0x000002a584b5d158]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584b5d158 -> 0x000002a584b5d0e8]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584b5d0e8 -> 0x000002a584b5d078]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584b5d078 -> 0x000002a584b5d008]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584b5d008 -> 0x000002a584bbfea8]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList 
doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584bbfea8 -> 0x000002a584bbfe38]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main\n```\n\n\n:::\n\n```{.r .cell-code}\nuntracemem(x)\n```\n:::\n\n\n**List (uses internal C code) - Copied once!**\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- as.list(x)\n\ncat(tracemem(y), \"\\n\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <000002A584B16388>\n```\n\n\n:::\n\n```{.r .cell-code}\nfor (i in seq_along(medians)) {\n  y[[i]] <- y[[i]] - medians[[i]]\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tracemem[0x000002a584b16388 -> 0x000002a582d8fea8]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main\n```\n\n\n:::\n\n```{.r .cell-code}\nuntracemem(y)\n```\n:::\n\n\n#### Benchmark this (Exercise #2)\n\n**First wrap in a function**\n\n::: {.cell}\n\n```{.r .cell-code}\nmed <- function(d, medians) {\n  for (i in seq_along(medians)) {\n    d[[i]] <- d[[i]] - medians[[i]]\n  }\n}\n```\n:::\n\n\n**Try with 5 columns**\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- data.frame(matrix(runif(5 * 1e4), ncol = 5))\nmedians <- vapply(x, median, numeric(1))\ny <- as.list(x)\n\nbench::mark(\n  \"data.frame\" = med(x, medians),\n  
\"list\" = med(y, medians)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 6\n#>   expression      min   median `itr/sec` mem_alloc `gc/sec`\n#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>\n#> 1 data.frame   52.3µs   68.2µs    13411.     410KB     201.\n#> 2 list         16.2µs     33µs    28621.     391KB     279.\n```\n\n\n:::\n:::\n\n\n**Try with 20 columns**\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- data.frame(matrix(runif(5 * 1e4), ncol = 20))\nmedians <- vapply(x, median, numeric(1))\ny <- as.list(x)\n\nbench::mark(\n  \"data.frame\" = med(x, medians),\n  \"list\" = med(y, medians)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 6\n#>   expression      min   median `itr/sec` mem_alloc `gc/sec`\n#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>\n#> 1 data.frame  143.8µs  189.7µs     4722.     400KB     50.1\n#> 2 list         25.6µs   39.7µs    24419.     392KB    243.\n```\n\n\n:::\n:::\n\n\n**WOW!**\n\n\n### Environmments\n- Always modified in place (**reference semantics**)\n- Interesting because if you modify the environment, all existing bindings have the same reference\n- If two names point to the same environment, and you update one, you update both!\n\n\n::: {.cell}\n\n```{.r .cell-code}\ne1 <- rlang::env(a = 1, b = 2, c = 3)\ne2 <- e1\ne1$c <- 4\ne2$c\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4\n```\n\n\n:::\n:::\n\n\n- This means that environments can contain themselves (!)\n\n### Exercises\n\n##### 1. Why isn't this circular?\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list()\nx[[1]] <- x\n```\n:::\n\n\n> Because the binding to the list() object moves from `x` in the first line to `x[[1]]` in the second.\n\n##### 2. (see \"Objects with a single binding\")\n\n##### 3. 
What happens if you attempt to use tracemem() on an environment?\n\n\n::: {.cell}\n\n```{.r .cell-code}\ne1 <- rlang::env(a = 1, b = 2, c = 3)\ntracemem(e1)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in tracemem(e1): 'tracemem' is not useful for promise and environment objects\n```\n\n\n:::\n:::\n\n\n> Because environments always modified in place, there's no point in tracing them\n\n\n## Unbinding and the garbage collector\n\n- If you delete the 'name' bound to an object, the object still exists\n- R runs a \"garbage collector\" (GC) to remove these objects when it needs more memory\n- \"Looking from the outside, it’s basically impossible to predict when the GC will run. In fact, you shouldn’t even try.\"\n- If you want to know when it runs, use `gcinfo(TRUE)` to get a message printed\n- You can force GC with `gc()` but you never need to to use more memory *within* R\n- Only reason to do so is to free memory for other system software, or, to get the\nmessage printed about how much memory is being used\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngc()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>           used (Mb) gc trigger (Mb) max used (Mb)\n#> Ncells  805637 43.1    1486050 79.4  1486050 79.4\n#> Vcells 4532584 34.6   10146329 77.5 10146315 77.5\n```\n\n\n:::\n\n```{.r .cell-code}\nmem_used()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 81.38 MB\n```\n\n\n:::\n:::\n\n\n- These numbers will **not** be what you OS tells you because, \n  1. It includes objects created by R, but not R interpreter\n  2. R and OS are lazy and don't reclaim/release memory until it's needed\n  3. R counts memory from objects, but there are gaps due to those that are deleted -> \n  *memory fragmentation* [less memory actually available they you might think]\n",
+    "markdown": "---\nengine: knitr\ntitle: Names and values\n---\n\n## Learning objectives\n\n- Distinguish between an *object* and its *name*.\n- Identify when data are *copied* versus *modified*.\n- Trace and identify the memory used by R.\n\nThe `{lobstr}` package will help us throughout the chapter\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(lobstr)\n```\n:::\n\n\n\n## Syntactic names are easier to create and work with than non-syntactic names\n\n\n- Syntactic names: `my_variable`, `x`, `cpp11`, `.by`.\n  - Can't use names in `?Reserved`\n\n- Non-syntactic names need to be surrounded in backticks. \n\n## Names are *bound to* values with `<-`\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- c(1, 2, 3)\na\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2226bfbc968\"\n```\n\n\n:::\n:::\n\n\n## Many names can be bound to the same values\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb <- a\nobj_addr(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2226bfbc968\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2226bfbc968\"\n```\n\n\n:::\n:::\n\n\n## If shared values are modified, the object is copied to a new address\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb[[1]] <- 5\nobj_addr(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2226bfbc968\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2226fda7278\"\n```\n\n\n:::\n:::\n\n\n## Memory addresses can differ even if objects seem the same\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- 1:10\nb <- a\nc <- 1:10\n\nobj_addr(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x22271c9e7b0\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 
\"0x22271c9e7b0\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x22271d77708\"\n```\n\n\n:::\n:::\n\n\n## Functions have a single address regardless of how they're referenced\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobj_addr(mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2226f891738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(base::mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2226f891738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(get(\"mean\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2226f891738\"\n```\n\n\n:::\n:::\n\n\n## Unlike most objects, environments keep the same memory address on modify\n\n\n::: {.cell}\n\n```{.r .cell-code}\nd <- new.env()\nobj_addr(d)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2226b7489d8\"\n```\n\n\n:::\n\n```{.r .cell-code}\ne <- d\ne[['a']] <- 1\nobj_addr(e)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2226b7489d8\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(d)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2226b7489d8\"\n```\n\n\n:::\n\n```{.r .cell-code}\nd[['a']]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n:::\n\n\n## Use `tracemem` to validate if values are copied or modified\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(10)\ntracemem(x)\n#> [1] \"<000001F4185B4B08>\"\ny <- x\nx[[1]] <- 10\n#> tracemem[0x000001f4185b4b08 -> 0x000001f4185b4218]:\nuntracemem(x)\n```\n:::\n\n\n## `tracemem` shows internal C code minimizes copying\n\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- as.list(x)\ntracemem(y)\n#> [1] \"<000001AD67FDCD38>\"\nmedians <- vapply(x, median, numeric(1))\nfor (i in 1:5) {\n  y[[i]] <- y[[i]] - medians[[i]]\n}\n#> tracemem[0x000001ad67fdcd38 -> 0x000001ad61982638]:\nuntracemem(y)\n```\n:::\n\n\n## A function's environment follows copy-on-modify rules\n\n:::: 
{.columns}\n\n::: {.column}\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- function(a) {\n  a\n}\n\nx <- c(1, 2, 3)\nz <- f(x) # No change in value\n\nobj_addr(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2226bdb8738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(z) # No address change \n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2226bdb8738\"\n```\n\n\n:::\n:::\n\n:::\n\n::: {.column}\n![](images/02-trace.png)\n:::\n\n::::\n\n::: notes\n- Diagrams will be explained more in chapter 7.\n- `a` points to same address as `x`.\n- If `a` modified inside function, `z` would have new address.\n:::\n\n\n## `ref()` shows the memory address of a list and its *elements*\n\n:::: {.columns}\n\n::: {.column}\n\n::: {.cell}\n\n```{.r .cell-code}\nl1 <- list(1, 2, 3)\nobj_addr(l1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2226c315db8\"\n```\n\n\n:::\n\n```{.r .cell-code}\nl2 <- l1\nl2[[3]] <- 4\nref(l1, l2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x2226c315db8] <list> \n#> ├─[2:0x2226c730078] <dbl> \n#> ├─[3:0x2226c75cdc8] <dbl> \n#> └─[4:0x2226c75cc08] <dbl> \n#>  \n#> █ [5:0x2226c36cd68] <list> \n#> ├─[2:0x2226c730078] \n#> ├─[3:0x2226c75cdc8] \n#> └─[6:0x2226c75b318] <dbl>\n```\n\n\n:::\n:::\n\n:::\n\n::: {.column}\n![](images/02-l-modify-2.png){width=50%}\n:::\n\n::::\n\n## Since dataframes are lists of (column) vectors, mutating a column modifies only that column\n\n\n::: {.cell}\n\n```{.r .cell-code}\nd1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))\nd2 <- d1\nd2[, 2] <- d2[, 2] * 2\nref(d1, d2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x2226cca93c8] <df[,2]> \n#> ├─x = [2:0x22272216148] <dbl> \n#> └─y = [3:0x222722160f8] <dbl> \n#>  \n#> █ [4:0x2226ce0cf48] <df[,2]> \n#> ├─x = [2:0x22272216148] \n#> └─y = [5:0x222727b0c38] <dbl>\n```\n\n\n:::\n:::\n\n\n## Since dataframes are lists of (column) vectors, mutating a row modifies the value\n\n\n::: {.cell}\n\n```{.r 
.cell-code}\nd1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))\nd2 <- d1\nd2[1, ] <- d2[1, ] * 2\nref(d1, d2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x22272912588] <df[,2]> \n#> ├─x = [2:0x222730bd408] <dbl> \n#> └─y = [3:0x222730bd3b8] <dbl> \n#>  \n#> █ [4:0x22272b0e548] <df[,2]> \n#> ├─x = [5:0x222731501d8] <dbl> \n#> └─y = [6:0x22273150188] <dbl>\n```\n\n\n:::\n:::\n\n\n::: notes\n- Here \"mutate\" means \"change\", not `dplyr::mutate()`\n:::\n\n## Characters are unique due to the global string pool\n\n:::: {.columns}\n\n::: {.column}\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 1:4\nref(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1:0x22272f78460] <int>\n```\n\n\n:::\n\n```{.r .cell-code}\ny <- 1:4\nref(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1:0x222730899d8] <int>\n```\n\n\n:::\n\n```{.r .cell-code}\nx <- c(\"a\", \"a\", \"b\")\nref(x, character = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x2227394f1d8] <chr> \n#> ├─[2:0x22267d9b118] <string: \"a\"> \n#> ├─[2:0x22267d9b118] \n#> └─[3:0x2226e17f3b8] <string: \"b\">\n```\n\n\n:::\n\n```{.r .cell-code}\ny <- c(\"a\")\nref(y, character = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x2227397fce8] <chr> \n#> └─[2:0x22267d9b118] <string: \"a\">\n```\n\n\n:::\n:::\n\n:::\n\n::: {.column}\n![](images/02-character-2.png)\n:::\n\n::::\n\n::: notes\n- \"a\" is always at the same address.\n- Each member of character vector has its own address (kind of list-like).\n:::\n\n## Memory amount can also be measured, using `lobstr::obj_size`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbanana <- \"bananas bananas bananas\"\nobj_addr(banana)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x22271bc25b8\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_size(banana)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 136 B\n```\n\n\n:::\n:::\n\n\n## Alternative Representation or ALTREPs represent vector values 
efficiently\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 1:10\nobj_size(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 680 B\n```\n\n\n:::\n\n```{.r .cell-code}\ny <- 1:10000\nobj_size(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 680 B\n```\n\n\n:::\n:::\n\n\n## We can measure memory & speed using `bench::mark()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmed <- function(d, medians) {\n  for (i in seq_along(medians)) {\n    d[[i]] <- d[[i]] - medians[[i]]\n  }\n}\nx <- data.frame(matrix(runif(5 * 1e4), ncol = 5))\nmedians <- vapply(x, median, numeric(1))\ny <- as.list(x)\n\nbench::mark(\n  \"data.frame\" = med(x, medians),\n  \"list\" = med(y, medians)\n)[, c(\"min\", \"median\", \"mem_alloc\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 3\n#>        min   median mem_alloc\n#>   <bch:tm> <bch:tm> <bch:byt>\n#> 1   52.7µs   66.6µs     491KB\n#> 2   22.8µs   33.7µs     391KB\n```\n\n\n:::\n:::\n\n\n::: notes\n- The thing to see: list version uses less RAM and is faster\n:::\n\n## The garbage collector `gc()` explicitly clears out unbound objects\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 1:3\nx <- 2:4 # \"1:3\" is orphaned\nrm(x) # \"2:4\" is orphaned\ngc()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>           used (Mb) gc trigger (Mb) max used (Mb)\n#> Ncells  791104 42.3    1505464 80.5  1505464 80.5\n#> Vcells 1497631 11.5    8388608 64.0  8388482 64.0\n```\n\n\n:::\n\n```{.r .cell-code}\nlobstr::mem_used() # Wrapper around gc()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 56.29 MB\n```\n\n\n:::\n:::\n\n\n::: aside\n`gc()` runs automatically, never *need* to call\n:::\n\n::: notes\n- `mem_used()` multiplies Ncells \"used\" by either 28 (32-bit architecture) or 56 (64-bit architecture)., and Vcells \"used\" by 8, adds them, and converts to Mb.\n:::",
     "supporting": [
       "02_files"
     ],
     "filters": [
       "rmarkdown/pagebreak.lua"
     ],
-    "includes": {},
+    "includes": {
+      "include-after-body": [
+        "\n<script>\n  // htmlwidgets need to know to resize themselves when slides are shown/hidden.\n  // Fire the \"slideenter\" event (handled by htmlwidgets.js) when the current\n  // slide changes (different for each slide format).\n  (function () {\n    // dispatch for htmlwidgets\n    function fireSlideEnter() {\n      const event = window.document.createEvent(\"Event\");\n      event.initEvent(\"slideenter\", true, true);\n      window.document.dispatchEvent(event);\n    }\n\n    function fireSlideChanged(previousSlide, currentSlide) {\n      fireSlideEnter();\n\n      // dispatch for shiny\n      if (window.jQuery) {\n        if (previousSlide) {\n          window.jQuery(previousSlide).trigger(\"hidden\");\n        }\n        if (currentSlide) {\n          window.jQuery(currentSlide).trigger(\"shown\");\n        }\n      }\n    }\n\n    // hookup for slidy\n    if (window.w3c_slidy) {\n      window.w3c_slidy.add_observer(function (slide_num) {\n        // slide_num starts at position 1\n        fireSlideChanged(null, w3c_slidy.slides[slide_num - 1]);\n      });\n    }\n\n  })();\n</script>\n\n"
+      ]
+    },
     "engineDependencies": {},
     "preserve": {},
     "postProcess": true
diff --git a/slides/.gitignore b/slides/.gitignore
@@ -0,0 +1 @@
+*_files/
+\ No newline at end of file
diff --git a/slides/02.qmd b/slides/02.qmd
@@ -5,132 +5,114 @@ title: Names and values
 
 ## Learning objectives
 
-- To be able to understand distinction between an *object* and its *name*
-- With this knowledge, to be able write faster code using less memory
-- To better understand R's functional programming tools
+- Distinguish between an *object* and its *name*.
+- Identify when data are *copied* versus *modified*.
+- Trace and identify the memory used by R.
+
+The `{lobstr}` package will help us throughout the chapter.
 
-Using lobstr package here.
 ```{r}
 library(lobstr)
 ```
 
 
-## Quiz
-
-### 1. How do I create a new column called `3` that contains the sum of `1` and `2`?
+## Syntactic names are easier to create and work with than non-syntactic names
 
-```{r}
-df <- data.frame(runif(3), runif(3))
-names(df) <- c(1, 2)
-df
-```
 
-```{r}
-df$`3` <- df$`1` + df$`2`
-df
-```
+- Syntactic names: `my_variable`, `x`, `cpp11`, `.by`.
+  - Can't use names in `?Reserved`
 
-**What makes these names challenging?**
+- Non-syntactic names need to be surrounded by backticks.
 
-> You need to use backticks (`) when the name of an object doesn't start with a 
-> a character or '.' [or . followed by a number] (non-syntactic names).
-
-### 2. How much memory does `y` occupy?
+## Names are *bound to* values with `<-`
 
 ```{r}
-x <- runif(1e6)
-y <- list(x, x, x)
+a <- c(1, 2, 3)
+a
+obj_addr(a)
 ```
 
-Need to use the lobstr package:
-```{r}
-lobstr::obj_size(y)
-```
-
-> Note that if you look in the RStudio Environment or use R base `object.size()`
-> you actually get a value of 24 MB
+## Many names can be bound to the same values
 
 ```{r}
-object.size(y)
-```
-
-### 3. On which line does `a` get copied in the following example?
-```{r}
-a <- c(1, 5, 3, 2)
 b <- a
-b[[1]] <- 10
+obj_addr(a)
+obj_addr(b)
 ```
 
-> Not until `b` is modified, the third line
-
-## Binding basics
-
-- Create values and *bind* a name to them
-- Names have values (rather than values have names)
-- Multiple names can refer to the same values
-- We can look at an object's address to keep track of the values independent of their names
+## If shared values are modified, the object is copied to a new address
 
 ```{r}
-x <- c(1, 2, 3)
-y <- x
-obj_addr(x)
-obj_addr(y)
+b[[1]] <- 5
+obj_addr(a)
+obj_addr(b)
 ```
 
+## Memory addresses can differ even if objects seem the same
 
-### Exercises
-
-##### 1. Explain the relationships
 ```{r}
 a <- 1:10
 b <- a
-c <- b
-d <- 1:10
-```
+c <- 1:10
 
-> `a` `b` and `c` are all names that refer to the first value `1:10`
-> 
-> `d` is a name that refers to the *second* value of `1:10`.
+obj_addr(a)
+obj_addr(b)
+obj_addr(c)
+```
 
+## Functions have a single address regardless of how they're referenced
 
-##### 2. Do the following all point to the same underlying function object? hint: `lobstr::obj_addr()`
 ```{r}
 obj_addr(mean)
 obj_addr(base::mean)
 obj_addr(get("mean"))
-obj_addr(evalq(mean))
-obj_addr(match.fun("mean"))
 ```
 
-> Yes!
-
-## Copy-on-modify
-
-- If you modify a value bound to multiple names, it is 'copy-on-modify'
-- If you modify a value bound to a single name, it is 'modify-in-place'
-- Use `tracemem()` to see when a name's value changes
+## Unlike most objects, environments keep the same memory address on modify
 
 ```{r}
-x <- c(1, 2, 3)
-cat(tracemem(x), "\n")
+d <- new.env()
+obj_addr(d)
+e <- d
+e[['a']] <- 1
+obj_addr(e)
+obj_addr(d)
+d[['a']]
 ```
 
+## Use `tracemem` to validate if values are copied or modified
+
 ```{r}
+#| eval: false
+x <- runif(10)
+tracemem(x)
+#> [1] "<000001F4185B4B08>"
 y <- x
-y[[3]] <- 4L  # Changes (copy-on-modify)
-y[[3]] <- 5L  # Doesn't change (modify-in-place)
+x[[1]] <- 10
+#> tracemem[0x000001f4185b4b08 -> 0x000001f4185b4218]:
+untracemem(x)
 ```
 
-Turn off `tracemem()` with `untracemem()`
-
-> Can also use `ref(x)` to get the address of the value bound to a given name
+## `tracemem` shows internal C code minimizes copying
 
+```{r}
+#| eval: false
+y <- as.list(x)
+tracemem(y)
+#> [1] "<000001AD67FDCD38>"
+medians <- vapply(x, median, numeric(1))
+for (i in 1:5) {
+  y[[i]] <- y[[i]] - medians[[i]]
+}
+#> tracemem[0x000001ad67fdcd38 -> 0x000001ad61982638]:
+untracemem(y)
+```
 
-## Functions
+## Function arguments follow copy-on-modify rules
 
-- Copying also applies within functions
-- If you copy (but don't modify) `x` within `f()`, no copy is made
+:::: {.columns}
 
+::: {.column}
 ```{r}
 f <- function(a) {
   a
@@ -139,264 +121,119 @@ f <- function(a) {
 x <- c(1, 2, 3)
 z <- f(x) # No change in value
 
-ref(x)
-ref(z)
+obj_addr(x)
+obj_addr(z) # No address change 
 ```
+:::
 
-<!-- ![](images/02-trace.png) -->
+::: {.column}
+![](images/02-trace.png)
+:::
 
-## Lists
+::::
 
-- A list overall, has it's own reference (id)
-- List *elements* also each point to other values
-- List doesn't store the value, it *stores a reference to the value*
-- As of R 3.1.0, modifying lists creates a *shallow copy*
-    - References (bindings) are copied, but *values are not*
+::: notes
+- Diagrams will be explained more in chapter 7.
+- `a` points to the same address as `x`.
+- If `a` were modified inside the function, `z` would have a new address.
+:::
 
+
+## `ref()` shows the memory address of a list and its *elements*
+
+:::: {.columns}
+
+::: {.column}
 ```{r}
 l1 <- list(1, 2, 3)
+obj_addr(l1)
 l2 <- l1
 l2[[3]] <- 4
-```
-
-- We can use `ref()` to see how they compare
-  - See how the list reference is different
-  - But first two items in each list are the same
-
-```{r}
 ref(l1, l2)
 ```
+:::
 
+::: {.column}
 ![](images/02-l-modify-2.png){width=50%}
+:::
 
-## Data Frames
+::::
 
-- Data frames are lists of vectors
-- So copying and modifying a column *only affects that column*
-- **BUT** if you modify a *row*, every column must be copied
+## Since dataframes are lists of (column) vectors, mutating a column modifies only that column
 
 ```{r}
 d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))
 d2 <- d1
-d3 <- d1
-```
-
-Only the modified column changes
-```{r}
 d2[, 2] <- d2[, 2] * 2
 ref(d1, d2)
 ```
 
-All columns change
-```{r}
-d3[1, ] <- d3[1, ] * 3
-ref(d1, d3)
-```
-
-## Character vectors
-
-- R has a **global string pool**
-- Elements of character vectors point to unique strings in the pool
+## Since dataframes are lists of (column) vectors, mutating a row copies every column
 
 ```{r}
-x <- c("a", "a", "abc", "d")
-```
-
-![](images/02-character-2.png)
-
-## Exercises
-
-##### 1. Why is `tracemem(1:10)` not useful?
-
-> Because it tries to trace a value that is not bound to a name
-
-##### 2. Why are there two copies?
-```{r}
-x <- c(1L, 2L, 3L)
-tracemem(x)
-x[[3]] <- 4
-```
-
-> Because we convert an *integer* vector (using 1L, etc.) to a *double* vector (using just 4)- 
-
-##### 3. What is the relationships among these objects?
-
-```{r}
-a <- 1:10      
-b <- list(a, a)
-c <- list(b, a, 1:10) # 
-```
-
-a <- obj 1    
-b <- obj 1, obj 1    
-c <- b(obj 1, obj 1), obj 1, 1:10    
-
-```{r}
-ref(c)
+d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))
+d2 <- d1
+d2[1, ] <- d2[1, ] * 2
+ref(d1, d2)
 ```
 
+::: notes
+- Here "mutate" means "change", not `dplyr::mutate()`
+:::
 
-##### 4. What happens here?
-```{r}
-x <- list(1:10)
-x[[2]] <- x
-```
+## Identical strings are stored once in the global string pool
 
-- `x` is a list
-- `x[[2]] <- x` creates a new list, which in turn contains a reference to the 
-  original list
-- `x` is no longer bound to `list(1:10)`
+:::: {.columns}
 
+::: {.column}
 ```{r}
+x <- 1:4
 ref(x)
+y <- 1:4
+ref(y)
+x <- c("a", "a", "b")
+ref(x, character = TRUE)
+y <- c("a")
+ref(y, character = TRUE)
 ```
+:::
 
-![](images/02-copy_on_modify_fig2.png){width=50%}
-
-## Object Size
-
-- Use `lobstr::obj_size()` 
-- Lists may be smaller than expected because of referencing the same value
-- Strings may be smaller than expected because using global string pool
-- Difficult to predict how big something will be
-  - Can only add sizes together if they share no references in common
-
-### Alternative Representation
-- As of R 3.5.0 - ALTREP
-- Represent some vectors compactly
-    - e.g., 1:1000 - not 10,000 values, just 1 and 1,000
-
-### Exercises
-
-##### 1. Why are the sizes so different?
-
-```{r}
-y <- rep(list(runif(1e4)), 100)
-
-object.size(y) # ~8000 kB
-obj_size(y)    # ~80   kB
-```
-
-> From `?object.size()`: 
-> 
-> "This function merely provides a rough indication: it should be reasonably accurate for atomic vectors, but **does not detect if elements of a list are shared**, for example.
-
-##### 2. Why is the size misleading?
-
-```{r}
-funs <- list(mean, sd, var)
-obj_size(funs)
-```
-
-> Because they reference functions from base and stats, which are always available.
-> Why bother looking at the size? What use is that?
-
-##### 3. Predict the sizes
-
-```{r}
-a <- runif(1e6) # 8 MB
-obj_size(a)
-```
-
-
-```{r}
-b <- list(a, a)
-```
-
-- There is one value ~8MB
-- `a` and `b[[1]]` and `b[[2]]` all point to the same value.
-
-```{r}
-obj_size(b)
-obj_size(a, b)
-```
-
-
-```{r}
-b[[1]][[1]] <- 10
-```
-- Now there are two values ~8MB each (16MB total)
-- `a` and `b[[2]]` point to the same value (8MB)
-- `b[[1]]` is new (8MB) because the first element (`b[[1]][[1]]`) has been changed
-
-```{r}
-obj_size(b)     # 16 MB (two values, two element references)
-obj_size(a, b)  # 16 MB (a & b[[2]] point to the same value)
-```
-
-
-```{r}
-b[[2]][[1]] <- 10
-```
-- Finally, now there are three values ~8MB each (24MB total)
-- Although `b[[1]]` and `b[[2]]` have the same contents, 
-  they are not references to the same object.
-
-```{r}
-obj_size(b)
-obj_size(a, b)
-```
-
-
-## Modify-in-place
-
-- Modifying usually creates a copy except for
-    - Objects with a single binding (performance optimization)
-    - Environments (special)
-
-### Objects with a single binding
-
-- Hard to know if copy will occur
-- If you have 2+ bindings and remove them, R can't follow how many are removed (so will always think there are more than one)
-- May make a copy even if there's only one binding left
-- Using a function makes a reference to it **unless it's a function based on C**
-- Best to use `tracemem()` to check rather than guess.
+::: {.column}
+![](images/02-character-2.png)
+:::
 
+::::
 
-#### Example - lists vs. data frames in for loop
+::: notes
+- "a" is always at the same address.
+- Each member of character vector has its own address (kind of list-like).
+:::
 
-**Setup**  
+## Memory amount can also be measured, using `lobstr::obj_size`
 
-Create the data to modify
 ```{r}
-x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
-medians <- vapply(x, median, numeric(1))
+banana <- "bananas bananas bananas"
+obj_addr(banana)
+obj_size(banana)
 ```
 
+## Alternative Representation or ALTREPs represent vector values efficiently
 
-**Data frame - Copied every time!**
-```{r}
-cat(tracemem(x), "\n")
-for (i in seq_along(medians)) {
-  x[[i]] <- x[[i]] - medians[[i]]
-}
-untracemem(x)
-```
-
-**List (uses internal C code) - Copied once!**
 ```{r}
-y <- as.list(x)
-
-cat(tracemem(y), "\n")
-for (i in seq_along(medians)) {
-  y[[i]] <- y[[i]] - medians[[i]]
-}
-untracemem(y)
+x <- 1:10
+obj_size(x)
+y <- 1:10000
+obj_size(y)
 ```
 
-#### Benchmark this (Exercise #2)
+## We can measure memory & speed using `bench::mark()`
 
-**First wrap in a function**
 ```{r}
 med <- function(d, medians) {
   for (i in seq_along(medians)) {
     d[[i]] <- d[[i]] - medians[[i]]
   }
 }
-```
-
-**Try with 5 columns**
-```{r}
 x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
 medians <- vapply(x, median, numeric(1))
 y <- as.list(x)
@@ -404,78 +241,27 @@ y <- as.list(x)
 bench::mark(
   "data.frame" = med(x, medians),
   "list" = med(y, medians)
-)
-```
-
-**Try with 20 columns**
-```{r}
-x <- data.frame(matrix(runif(5 * 1e4), ncol = 20))
-medians <- vapply(x, median, numeric(1))
-y <- as.list(x)
-
-bench::mark(
-  "data.frame" = med(x, medians),
-  "list" = med(y, medians)
-)
+)[, c("min", "median", "mem_alloc")]
 ```
 
-**WOW!**
+::: notes
+- Key takeaway: the list version allocates less memory and runs faster.
+:::
 
-
-### Environmments
-- Always modified in place (**reference semantics**)
-- Interesting because if you modify the environment, all existing bindings have the same reference
-- If two names point to the same environment, and you update one, you update both!
-
-```{r}
-e1 <- rlang::env(a = 1, b = 2, c = 3)
-e2 <- e1
-e1$c <- 4
-e2$c
-```
-
-- This means that environments can contain themselves (!)
-
-### Exercises
-
-##### 1. Why isn't this circular?
-```{r}
-x <- list()
-x[[1]] <- x
-```
-
-> Because the binding to the list() object moves from `x` in the first line to `x[[1]]` in the second.
-
-##### 2. (see "Objects with a single binding")
-
-##### 3. What happens if you attempt to use tracemem() on an environment?
-
-```{r}
-#| error: true
-e1 <- rlang::env(a = 1, b = 2, c = 3)
-tracemem(e1)
-```
-
-> Because environments always modified in place, there's no point in tracing them
-
-
-## Unbinding and the garbage collector
-
-- If you delete the 'name' bound to an object, the object still exists
-- R runs a "garbage collector" (GC) to remove these objects when it needs more memory
-- "Looking from the outside, it’s basically impossible to predict when the GC will run. In fact, you shouldn’t even try."
-- If you want to know when it runs, use `gcinfo(TRUE)` to get a message printed
-- You can force GC with `gc()` but you never need to to use more memory *within* R
-- Only reason to do so is to free memory for other system software, or, to get the
-message printed about how much memory is being used
+## The garbage collector `gc()` explicitly clears out unbound objects
 
 ```{r}
+x <- 1:3
+x <- 2:4 # "1:3" is orphaned
+rm(x) # "2:4" is orphaned
 gc()
-mem_used()
+lobstr::mem_used() # Wrapper around gc()
 ```
 
-- These numbers will **not** be what you OS tells you because, 
-  1. It includes objects created by R, but not R interpreter
-  2. R and OS are lazy and don't reclaim/release memory until it's needed
-  3. R counts memory from objects, but there are gaps due to those that are deleted -> 
-  *memory fragmentation* [less memory actually available they you might think]
+::: aside
+`gc()` runs automatically; you never *need* to call it.
+:::
+
+::: notes
+- `mem_used()` multiplies Ncells "used" by either 28 (32-bit architecture) or 56 (64-bit architecture), multiplies Vcells "used" by 8, adds them, and converts to Mb.
+:::
+\ No newline at end of file
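
Review sketch for the `mem_used()` note at the end of `slides/02.qmd`: the arithmetic it describes can be reproduced in base R, which may help when presenting the slide. This is a rough sketch under the assumptions stated in the note (28 or 56 bytes per Ncell, 8 bytes per Vcell). The names `node_bytes`, `cells`, and `approx_bytes` are hypothetical and do not come from `{lobstr}`.

```r
# Bytes per Ncell, as stated in the note: 56 on 64-bit builds, 28 on 32-bit.
node_bytes <- if (.Machine$sizeof.pointer == 8L) 56 else 28

# Cell counts currently in use, taken from the "used" column of gc().
cells <- gc()[c("Ncells", "Vcells"), "used"]

# Ncells * node size + Vcells * 8 bytes.
approx_bytes <- sum(cells * c(node_bytes, 8))

# Rough size in megabytes (decimal MB assumed here).
approx_bytes / 1e6
```

The result should land close to what `lobstr::mem_used()` prints in the same session, although the two calls will not match exactly because each call to `gc()` sees a slightly different heap.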
diff --git a/slides/_metadata.yml b/slides/_metadata.yml
@@ -7,10 +7,7 @@ format:
     incremental: false
 execute:
   eval: true
-  r:
-    echo: true
-  mermaid:
-    echo: false
+  echo: true
 knitr:
   opts_chunk:
     comment: "#>"
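
Review sketch for the `slides/02.qmd` changes above: the copy-on-modify slide and the environment slide make claims about addresses, and the same behaviour can be checked through values alone, without `{lobstr}`. This is a minimal base-R sketch that reuses the variable names from the slides. The `stopifnot()` checks are an addition for illustration and are not part of the deck.

```r
# Vectors follow copy-on-modify: changing `b` leaves `a` untouched.
a <- c(1, 2, 3)
b <- a
b[[1]] <- 5
stopifnot(identical(a, c(1, 2, 3)))
stopifnot(identical(b, c(5, 2, 3)))

# Environments use reference semantics: `d` and `e` name the same object,
# so a binding created through `e` is visible through `d`.
d <- new.env()
e <- d
e[["a"]] <- 1
stopifnot(identical(d, e))
stopifnot(identical(d[["a"]], 1))
```

If both groups of checks pass silently, the vector was copied on modification while the environment was modified in place, which is the contrast the two slides set up.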

	bookclub-advr DSLC Advanced R Book Club
	git clone https://git.eamoncaddigan.net/bookclub-advr.git