commit f22786e78fef423e1665535f51e10df65a64f3a0 parent 1f5873e49a7b16de137bb3753538e93eefd74005 Author: Jon Harmon <jonthegeek@gmail.com> Date: Mon, 4 Aug 2025 14:52:47 -0500 Extract videos into separate files (#83) I kept the script I used so I can copy it to an eventual package to help me make future conversions. Diffstat:
232 files changed, 1341 insertions(+), 1437 deletions(-)
diff --git a/.Rbuildignore b/.Rbuildignore @@ -1,3 +1,5 @@ ^.*\.Rproj$ ^\.Rproj\.user$ ^\.github$ +^[.]?air[.]toml$ +^\.vscode$ diff --git a/_freeze/slides/02/execute-results/html.json b/_freeze/slides/02/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "187fa4fdc5274717a47ba9d683c72618", + "hash": "a11b52aa373c64a45cd125fdb1d36946", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Names and values\n---\n\n## Learning objectives\n\n- To be able to understand distinction between an *object* and its *name*\n- With this knowledge, to be able write faster code using less memory\n- To better understand R's functional programming tools\n\nUsing lobstr package here.\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(lobstr)\n```\n:::\n\n\n\n## Quiz\n\n### 1. How do I create a new column called `3` that contains the sum of `1` and `2`?\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(runif(3), runif(3))\nnames(df) <- c(1, 2)\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 1 2\n#> 1 0.3404627 0.03332227\n#> 2 0.9892320 0.19464379\n#> 3 0.7218355 0.77230025\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf$`3` <- df$`1` + df$`2`\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 1 2 3\n#> 1 0.3404627 0.03332227 0.373785\n#> 2 0.9892320 0.19464379 1.183876\n#> 3 0.7218355 0.77230025 1.494136\n```\n\n\n:::\n:::\n\n\n**What makes these names challenging?**\n\n> You need to use backticks (`) when the name of an object doesn't start with a \n> a character or '.' [or . followed by a number] (non-syntactic names).\n\n### 2. How much memory does `y` occupy?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(1e6)\ny <- list(x, x, x)\n```\n:::\n\n\nNeed to use the lobstr package:\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::obj_size(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8.00 MB\n```\n\n\n:::\n:::\n\n\n> Note that if you look in the RStudio Environment or use R base `object.size()`\n> you actually get a value of 24 MB\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobject.size(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 24000224 bytes\n```\n\n\n:::\n:::\n\n\n### 3. On which line does `a` get copied in the following example?\n\n::: {.cell}\n\n```{.r .cell-code}\na <- c(1, 5, 3, 2)\nb <- a\nb[[1]] <- 10\n```\n:::\n\n\n> Not until `b` is modified, the third line\n\n## Binding basics\n\n- Create values and *bind* a name to them\n- Names have values (rather than values have names)\n- Multiple names can refer to the same values\n- We can look at an object's address to keep track of the values independent of their names\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1, 2, 3)\ny <- x\nobj_addr(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x188d0060c28\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x188d0060c28\"\n```\n\n\n:::\n:::\n\n\n\n### Exercises\n\n##### 1. Explain the relationships\n\n::: {.cell}\n\n```{.r .cell-code}\na <- 1:10\nb <- a\nc <- b\nd <- 1:10\n```\n:::\n\n\n> `a` `b` and `c` are all names that refer to the first value `1:10`\n> \n> `d` is a name that refers to the *second* value of `1:10`.\n\n\n##### 2. Do the following all point to the same underlying function object? hint: `lobstr::obj_addr()`\n\n::: {.cell}\n\n```{.r .cell-code}\nobj_addr(mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x188cb8b1738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(base::mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x188cb8b1738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(get(\"mean\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x188cb8b1738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(evalq(mean))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x188cb8b1738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(match.fun(\"mean\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x188cb8b1738\"\n```\n\n\n:::\n:::\n\n\n> Yes!\n\n## Copy-on-modify\n\n- If you modify a value bound to multiple names, it is 'copy-on-modify'\n- If you modify a value bound to a single name, it is 'modify-in-place'\n- Use `tracemem()` to see when a name's value changes\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1, 2, 3)\ncat(tracemem(x), \"\\n\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <00000188D0CBA058>\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- x\ny[[3]] <- 4L # Changes (copy-on-modify)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tracemem[0x00000188d0cba058 -> 0x00000188d0fa2648]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main\n```\n\n\n:::\n\n```{.r .cell-code}\ny[[3]] <- 5L # Doesn't change (modify-in-place)\n```\n:::\n\n\nTurn off `tracemem()` with `untracemem()`\n\n> Can also use `ref(x)` to get the address of the value bound to a given name\n\n\n## Functions\n\n- Copying also applies within functions\n- If you copy (but don't modify) `x` within `f()`, no copy is made\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- function(a) {\n a\n}\n\nx <- c(1, 2, 3)\nz <- f(x) # No change in value\n\nref(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1:0x188d1604d78] <dbl>\n```\n\n\n:::\n\n```{.r .cell-code}\nref(z)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1:0x188d1604d78] <dbl>\n```\n\n\n:::\n:::\n\n\n<!--  -->\n\n## Lists\n\n- A list overall, has it's own reference (id)\n- List *elements* also each point to other values\n- List doesn't store the value, it *stores a reference to the value*\n- As of R 3.1.0, modifying lists creates a *shallow copy*\n - References (bindings) are copied, but *values are not*\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl1 <- list(1, 2, 3)\nl2 <- l1\nl2[[3]] <- 4\n```\n:::\n\n\n- We can use `ref()` to see how they compare\n - See how the list reference is different\n - But first two items in each list are the same\n\n\n::: {.cell}\n\n```{.r .cell-code}\nref(l1, l2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x188d1f72868] <list> \n#> ├─[2:0x188d2624388] <dbl> \n#> ├─[3:0x188d26241c8] <dbl> \n#> └─[4:0x188d2624008] <dbl> \n#> \n#> █ [5:0x188d1f713c8] <list> \n#> ├─[2:0x188d2624388] \n#> ├─[3:0x188d26241c8] \n#> └─[6:0x188d261ac48] <dbl>\n```\n\n\n:::\n:::\n\n\n{width=50%}\n\n## Data Frames\n\n- Data frames are lists of vectors\n- So copying and modifying a column *only affects that column*\n- **BUT** if you modify a *row*, every column must be copied\n\n\n::: {.cell}\n\n```{.r .cell-code}\nd1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))\nd2 <- d1\nd3 <- d1\n```\n:::\n\n\nOnly the modified column changes\n\n::: {.cell}\n\n```{.r .cell-code}\nd2[, 2] <- d2[, 2] * 2\nref(d1, d2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x188cd88acc8] <df[,2]> \n#> ├─x = [2:0x188cf51e418] <dbl> \n#> └─y = [3:0x188cf51e3c8] <dbl> \n#> \n#> █ [4:0x188cf309dc8] <df[,2]> \n#> ├─x = [2:0x188cf51e418] \n#> └─y = [5:0x188d0111bd8] <dbl>\n```\n\n\n:::\n:::\n\n\nAll columns change\n\n::: {.cell}\n\n```{.r .cell-code}\nd3[1, ] <- d3[1, ] * 3\nref(d1, d3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x188cd88acc8] <df[,2]> \n#> ├─x = [2:0x188cf51e418] <dbl> \n#> └─y = [3:0x188cf51e3c8] <dbl> \n#> \n#> █ [4:0x188cfbb7048] <df[,2]> \n#> ├─x = [5:0x188d0b36148] <dbl> \n#> └─y = [6:0x188d0b360f8] <dbl>\n```\n\n\n:::\n:::\n\n\n## Character vectors\n\n- R has a **global string pool**\n- Elements of character vectors point to unique strings in the pool\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(\"a\", \"a\", \"abc\", \"d\")\n```\n:::\n\n\n\n\n## Exercises\n\n##### 1. Why is `tracemem(1:10)` not useful?\n\n> Because it tries to trace a value that is not bound to a name\n\n##### 2. Why are there two copies?\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1L, 2L, 3L)\ntracemem(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"<00000188D06A9888>\"\n```\n\n\n:::\n\n```{.r .cell-code}\nx[[3]] <- 4\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tracemem[0x00000188d06a9888 -> 0x00000188d0711508]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x00000188d0711508 -> 0x00000188d16add18]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main\n```\n\n\n:::\n:::\n\n\n> Because we convert an *integer* vector (using 1L, etc.) to a *double* vector (using just 4)- \n\n##### 3. What is the relationships among these objects?\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- 1:10 \nb <- list(a, a)\nc <- list(b, a, 1:10) # \n```\n:::\n\n\na <- obj 1 \nb <- obj 1, obj 1 \nc <- b(obj 1, obj 1), obj 1, 1:10 \n\n\n::: {.cell}\n\n```{.r .cell-code}\nref(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x188d2022558] <list> \n#> ├─█ [2:0x188d0b05b48] <list> \n#> │ ├─[3:0x188d0b5b858] <int> \n#> │ └─[3:0x188d0b5b858] \n#> ├─[3:0x188d0b5b858] \n#> └─[4:0x188d0c0e4a0] <int>\n```\n\n\n:::\n:::\n\n\n\n##### 4. What happens here?\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(1:10)\nx[[2]] <- x\n```\n:::\n\n\n- `x` is a list\n- `x[[2]] <- x` creates a new list, which in turn contains a reference to the \n original list\n- `x` is no longer bound to `list(1:10)`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nref(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x188d1229388] <list> \n#> ├─[2:0x188d150cd98] <int> \n#> └─█ [3:0x188d1c717e8] <list> \n#> └─[2:0x188d150cd98]\n```\n\n\n:::\n:::\n\n\n{width=50%}\n\n## Object Size\n\n- Use `lobstr::obj_size()` \n- Lists may be smaller than expected because of referencing the same value\n- Strings may be smaller than expected because using global string pool\n- Difficult to predict how big something will be\n - Can only add sizes together if they share no references in common\n\n### Alternative Representation\n- As of R 3.5.0 - ALTREP\n- Represent some vectors compactly\n - e.g., 1:1000 - not 10,000 values, just 1 and 1,000\n\n### Exercises\n\n##### 1. Why are the sizes so different?\n\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- rep(list(runif(1e4)), 100)\n\nobject.size(y) # ~8000 kB\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8005648 bytes\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_size(y) # ~80 kB\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 80.90 kB\n```\n\n\n:::\n:::\n\n\n> From `?object.size()`: \n> \n> \"This function merely provides a rough indication: it should be reasonably accurate for atomic vectors, but **does not detect if elements of a list are shared**, for example.\n\n##### 2. Why is the size misleading?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfuns <- list(mean, sd, var)\nobj_size(funs)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 18.76 kB\n```\n\n\n:::\n:::\n\n\n> Because they reference functions from base and stats, which are always available.\n> Why bother looking at the size? What use is that?\n\n##### 3. Predict the sizes\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- runif(1e6) # 8 MB\nobj_size(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8.00 MB\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb <- list(a, a)\n```\n:::\n\n\n- There is one value ~8MB\n- `a` and `b[[1]]` and `b[[2]]` all point to the same value.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobj_size(b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8.00 MB\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_size(a, b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8.00 MB\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb[[1]][[1]] <- 10\n```\n:::\n\n- Now there are two values ~8MB each (16MB total)\n- `a` and `b[[2]]` point to the same value (8MB)\n- `b[[1]]` is new (8MB) because the first element (`b[[1]][[1]]`) has been changed\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobj_size(b) # 16 MB (two values, two element references)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 16.00 MB\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_size(a, b) # 16 MB (a & b[[2]] point to the same value)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 16.00 MB\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb[[2]][[1]] <- 10\n```\n:::\n\n- Finally, now there are three values ~8MB each (24MB total)\n- Although `b[[1]]` and `b[[2]]` have the same contents, \n they are not references to the same object.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobj_size(b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 16.00 MB\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_size(a, b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 24.00 MB\n```\n\n\n:::\n:::\n\n\n\n## Modify-in-place\n\n- Modifying usually creates a copy except for\n - Objects with a single binding (performance optimization)\n - Environments (special)\n\n### Objects with a single binding\n\n- Hard to know if copy will occur\n- If you have 2+ bindings and remove them, R can't follow how many are removed (so will always think there are more than one)\n- May make a copy even if there's only one binding left\n- Using a function makes a reference to it **unless it's a function based on C**\n- Best to use `tracemem()` to check rather than guess.\n\n\n#### Example - lists vs. data frames in for loop\n\n**Setup** \n\nCreate the data to modify\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- data.frame(matrix(runif(5 * 1e4), ncol = 5))\nmedians <- vapply(x, median, numeric(1))\n```\n:::\n\n\n\n**Data frame - Copied every time!**\n\n::: {.cell}\n\n```{.r .cell-code}\ncat(tracemem(x), \"\\n\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <00000188D27817A8>\n```\n\n\n:::\n\n```{.r .cell-code}\nfor (i in seq_along(medians)) {\n x[[i]] <- x[[i]] - medians[[i]]\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tracemem[0x00000188d27817a8 -> 0x00000188d283d118]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x00000188d283d118 -> 0x00000188d283c548]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x00000188d283c548 -> 0x00000188d283c4d8]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x00000188d283c4d8 -> 0x00000188d283c468]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x00000188d283c468 -> 0x00000188d283c3f8]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x00000188d283c3f8 -> 0x00000188d283c388]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x00000188d283c388 -> 0x00000188d283c318]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x00000188d283c318 -> 0x00000188d283c2a8]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x00000188d283c2a8 -> 0x00000188d283c238]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x00000188d283c238 -> 0x00000188d283c1c8]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main\n```\n\n\n:::\n\n```{.r .cell-code}\nuntracemem(x)\n```\n:::\n\n\n**List (uses internal C code) - Copied once!**\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- as.list(x)\n\ncat(tracemem(y), \"\\n\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <00000188CD850188>\n```\n\n\n:::\n\n```{.r .cell-code}\nfor (i in seq_along(medians)) {\n y[[i]] <- y[[i]] - medians[[i]]\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tracemem[0x00000188cd850188 -> 0x00000188cde947e8]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main\n```\n\n\n:::\n\n```{.r .cell-code}\nuntracemem(y)\n```\n:::\n\n\n#### Benchmark this (Exercise #2)\n\n**First wrap in a function**\n\n::: {.cell}\n\n```{.r .cell-code}\nmed <- function(d, medians) {\n for (i in seq_along(medians)) {\n d[[i]] <- d[[i]] - medians[[i]]\n }\n}\n```\n:::\n\n\n**Try with 5 columns**\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- data.frame(matrix(runif(5 * 1e4), ncol = 5))\nmedians <- vapply(x, median, numeric(1))\ny <- as.list(x)\n\nbench::mark(\n \"data.frame\" = med(x, medians),\n \"list\" = med(y, medians)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 6\n#> expression min median `itr/sec` mem_alloc `gc/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>\n#> 1 data.frame 54.3µs 69.9µs 12785. 410KB 192.\n#> 2 list 21.9µs 32.2µs 29164. 391KB 283.\n```\n\n\n:::\n:::\n\n\n**Try with 20 columns**\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- data.frame(matrix(runif(5 * 1e4), ncol = 20))\nmedians <- vapply(x, median, numeric(1))\ny <- as.list(x)\n\nbench::mark(\n \"data.frame\" = med(x, medians),\n \"list\" = med(y, medians)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 6\n#> expression min median `itr/sec` mem_alloc `gc/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>\n#> 1 data.frame 135.6µs 174.7µs 5314. 400KB 57.3\n#> 2 list 28.4µs 38.6µs 24248. 392KB 243.\n```\n\n\n:::\n:::\n\n\n**WOW!**\n\n\n### Environmments\n- Always modified in place (**reference semantics**)\n- Interesting because if you modify the environment, all existing bindings have the same reference\n- If two names point to the same environment, and you update one, you update both!\n\n\n::: {.cell}\n\n```{.r .cell-code}\ne1 <- rlang::env(a = 1, b = 2, c = 3)\ne2 <- e1\ne1$c <- 4\ne2$c\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4\n```\n\n\n:::\n:::\n\n\n- This means that environments can contain themselves (!)\n\n### Exercises\n\n##### 1. Why isn't this circular?\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list()\nx[[1]] <- x\n```\n:::\n\n\n> Because the binding to the list() object moves from `x` in the first line to `x[[1]]` in the second.\n\n##### 2. (see \"Objects with a single binding\")\n\n##### 3. What happens if you attempt to use tracemem() on an environment?\n\n\n::: {.cell}\n\n```{.r .cell-code}\ne1 <- rlang::env(a = 1, b = 2, c = 3)\ntracemem(e1)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in tracemem(e1): 'tracemem' is not useful for promise and environment objects\n```\n\n\n:::\n:::\n\n\n> Because environments always modified in place, there's no point in tracing them\n\n\n## Unbinding and the garbage collector\n\n- If you delete the 'name' bound to an object, the object still exists\n- R runs a \"garbage collector\" (GC) to remove these objects when it needs more memory\n- \"Looking from the outside, it’s basically impossible to predict when the GC will run. In fact, you shouldn’t even try.\"\n- If you want to know when it runs, use `gcinfo(TRUE)` to get a message printed\n- You can force GC with `gc()` but you never need to to use more memory *within* R\n- Only reason to do so is to free memory for other system software, or, to get the\nmessage printed about how much memory is being used\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngc()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> used (Mb) gc trigger (Mb) max used (Mb)\n#> Ncells 806847 43.1 1485610 79.4 1485610 79.4\n#> Vcells 4533816 34.6 10146329 77.5 10146311 77.5\n```\n\n\n:::\n\n```{.r .cell-code}\nmem_used()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 81.46 MB\n```\n\n\n:::\n:::\n\n\n- These numbers will **not** be what you OS tells you because, \n 1. It includes objects created by R, but not R interpreter\n 2. R and OS are lazy and don't reclaim/release memory until it's needed\n 3. R counts memory from objects, but there are gaps due to those that are deleted -> \n *memory fragmentation* [less memory actually available they you might think]\n\n\n## Meeting Videos\n\n### Cohort 1\n\n(no video recorded)\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/pCiNj2JRK50\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/-bEXdOoxO_E\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/gcVU_F-L6zY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/aqcvKox9V0Q\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/O4Oo_qO7SIY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:16:57\tFederica Gazzelloni:\tcohort 2 video: https://www.youtube.com/watch?v=pCiNj2JRK50\n00:18:39\tFederica Gazzelloni:\tcohort 2 presentation: https://r4ds.github.io/bookclub-Advanced_R/Presentations/Week02/Cohort2_America/Chapter2Slides.html#1\n00:40:24\tArthur Shaw:\tJust the opposite, Ryan. Very clear presentation!\n00:51:54\tTrevin:\tparquet?\n00:53:00\tArthur Shaw:\tWe may all be right. {arrow} looks to deal with feather and parquet files: https://arrow.apache.org/docs/r/\n01:00:04\tArthur Shaw:\tSome questions for future meetings. (1) I find Ryan's use of slides hugely effective in conveying information. Would it be OK if future sessions (optionally) used slides? If so, should/could we commit slides to some folder on the repo? (2) I think reusing the images from Hadley's books really helps understanding and discussion. Is that OK to do? Here I'm thinking about copyright concerns. (If possible, I would rather not redraw variants of Hadley's images.)\n01:01:35\tFederica Gazzelloni:\tIt's all ok, you can use past presentation, you don't need to push them to the repo, you can use the images from the book\n01:07:19\tFederica Gazzelloni:\tCan I use: gc(reset = TRUE) safely?\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/kpAUoGO6elE\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n\n<summary>Meeting chat log</summary>\n```\n00:09:40\tRyan Honomichl:\thttps://drdoane.com/three-deep-truths-about-r/\n00:12:51\tRobert Hilly:\tBe right back\n00:36:12\tRyan Honomichl:\tbrb\n00:41:18\tRon:\tI tried mapply and also got different answers\n00:41:44\tcollinberke:\tInteresting, would like to know more what is going on.\n00:49:57\tRobert Hilly:\tsimple_map <- function(x, f, ...) {\n out <- vector(\"list\", length(x))\n for (i in seq_along(x)) {\n out[[i]] <- f(x[[i]], ...)\n }\n out\n}\n```\n</details>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: Names and values\n---\n\n## Learning objectives\n\n- To be able to understand distinction between an *object* and its *name*\n- With this knowledge, to be able write faster code using less memory\n- To better understand R's functional programming tools\n\nUsing lobstr package here.\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(lobstr)\n```\n:::\n\n\n\n## Quiz\n\n### 1. How do I create a new column called `3` that contains the sum of `1` and `2`?\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(runif(3), runif(3))\nnames(df) <- c(1, 2)\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 1 2\n#> 1 0.8893205 0.9874973\n#> 2 0.4645398 0.7004741\n#> 3 0.7312149 0.2986040\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf$`3` <- df$`1` + df$`2`\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 1 2 3\n#> 1 0.8893205 0.9874973 1.876818\n#> 2 0.4645398 0.7004741 1.165014\n#> 3 0.7312149 0.2986040 1.029819\n```\n\n\n:::\n:::\n\n\n**What makes these names challenging?**\n\n> You need to use backticks (`) when the name of an object doesn't start with a \n> a character or '.' [or . followed by a number] (non-syntactic names).\n\n### 2. How much memory does `y` occupy?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(1e6)\ny <- list(x, x, x)\n```\n:::\n\n\nNeed to use the lobstr package:\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::obj_size(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8.00 MB\n```\n\n\n:::\n:::\n\n\n> Note that if you look in the RStudio Environment or use R base `object.size()`\n> you actually get a value of 24 MB\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobject.size(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 24000224 bytes\n```\n\n\n:::\n:::\n\n\n### 3. On which line does `a` get copied in the following example?\n\n::: {.cell}\n\n```{.r .cell-code}\na <- c(1, 5, 3, 2)\nb <- a\nb[[1]] <- 10\n```\n:::\n\n\n> Not until `b` is modified, the third line\n\n## Binding basics\n\n- Create values and *bind* a name to them\n- Names have values (rather than values have names)\n- Multiple names can refer to the same values\n- We can look at an object's address to keep track of the values independent of their names\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1, 2, 3)\ny <- x\nobj_addr(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2a58503acd8\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2a58503acd8\"\n```\n\n\n:::\n:::\n\n\n\n### Exercises\n\n##### 1. Explain the relationships\n\n::: {.cell}\n\n```{.r .cell-code}\na <- 1:10\nb <- a\nc <- b\nd <- 1:10\n```\n:::\n\n\n> `a` `b` and `c` are all names that refer to the first value `1:10`\n> \n> `d` is a name that refers to the *second* value of `1:10`.\n\n\n##### 2. Do the following all point to the same underlying function object? hint: `lobstr::obj_addr()`\n\n::: {.cell}\n\n```{.r .cell-code}\nobj_addr(mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2a5828bf738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(base::mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2a5828bf738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(get(\"mean\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2a5828bf738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(evalq(mean))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2a5828bf738\"\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_addr(match.fun(\"mean\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"0x2a5828bf738\"\n```\n\n\n:::\n:::\n\n\n> Yes!\n\n## Copy-on-modify\n\n- If you modify a value bound to multiple names, it is 'copy-on-modify'\n- If you modify a value bound to a single name, it is 'modify-in-place'\n- Use `tracemem()` to see when a name's value changes\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1, 2, 3)\ncat(tracemem(x), \"\\n\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <000002A585CC0FF8>\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- x\ny[[3]] <- 4L # Changes (copy-on-modify)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tracemem[0x000002a585cc0ff8 -> 0x000002a58600d5e8]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main\n```\n\n\n:::\n\n```{.r .cell-code}\ny[[3]] <- 5L # Doesn't change (modify-in-place)\n```\n:::\n\n\nTurn off `tracemem()` with `untracemem()`\n\n> Can also use `ref(x)` to get the address of the value bound to a given name\n\n\n## Functions\n\n- Copying also applies within functions\n- If you copy (but don't modify) `x` within `f()`, no copy is made\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- function(a) {\n a\n}\n\nx <- c(1, 2, 3)\nz <- f(x) # No change in value\n\nref(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1:0x2a58669dd18] <dbl>\n```\n\n\n:::\n\n```{.r .cell-code}\nref(z)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1:0x2a58669dd18] <dbl>\n```\n\n\n:::\n:::\n\n\n<!--  -->\n\n## Lists\n\n- A list overall, has it's own reference (id)\n- List *elements* also each point to other values\n- List doesn't store the value, it *stores a reference to the value*\n- As of R 3.1.0, modifying lists creates a *shallow copy*\n - References (bindings) are copied, but *values are not*\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl1 <- list(1, 2, 3)\nl2 <- l1\nl2[[3]] <- 4\n```\n:::\n\n\n- We can use `ref()` to see how they compare\n - See how the list reference is different\n - But first two items in each list are the same\n\n\n::: {.cell}\n\n```{.r .cell-code}\nref(l1, l2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x2a586f2e698] <list> \n#> ├─[2:0x2a5877133b8] <dbl> \n#> ├─[3:0x2a5877131f8] <dbl> \n#> └─[4:0x2a587713038] <dbl> \n#> \n#> █ [5:0x2a586fc3098] <list> \n#> ├─[2:0x2a5877133b8] \n#> ├─[3:0x2a5877131f8] \n#> └─[6:0x2a58770fc78] <dbl>\n```\n\n\n:::\n:::\n\n\n{width=50%}\n\n## Data Frames\n\n- Data frames are lists of vectors\n- So copying and modifying a column *only affects that column*\n- **BUT** if you modify a *row*, every column must be copied\n\n\n::: {.cell}\n\n```{.r .cell-code}\nd1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))\nd2 <- d1\nd3 <- d1\n```\n:::\n\n\nOnly the modified column changes\n\n::: {.cell}\n\n```{.r .cell-code}\nd2[, 2] <- d2[, 2] * 2\nref(d1, d2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x2a584931608] <df[,2]> \n#> ├─x = [2:0x2a57f3b9cc8] <dbl> \n#> └─y = [3:0x2a57f3b9c78] <dbl> \n#> \n#> █ [4:0x2a5810eb508] <df[,2]> \n#> ├─x = [2:0x2a57f3b9cc8] \n#> └─y = [5:0x2a57feb2058] <dbl>\n```\n\n\n:::\n:::\n\n\nAll columns change\n\n::: {.cell}\n\n```{.r .cell-code}\nd3[1, ] <- d3[1, ] * 3\nref(d1, d3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x2a584931608] <df[,2]> \n#> ├─x = [2:0x2a57f3b9cc8] <dbl> \n#> └─y = [3:0x2a57f3b9c78] <dbl> \n#> \n#> █ [4:0x2a57faa92c8] <df[,2]> \n#> ├─x = [5:0x2a585a91b38] <dbl> \n#> └─y = [6:0x2a585a91ae8] <dbl>\n```\n\n\n:::\n:::\n\n\n## Character vectors\n\n- R has a **global string pool**\n- Elements of character vectors point to unique strings in the pool\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(\"a\", \"a\", \"abc\", \"d\")\n```\n:::\n\n\n\n\n## Exercises\n\n##### 1. Why is `tracemem(1:10)` not useful?\n\n> Because it tries to trace a value that is not bound to a name\n\n##### 2. Why are there two copies?\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1L, 2L, 3L)\ntracemem(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"<000002A5856391C8>\"\n```\n\n\n:::\n\n```{.r .cell-code}\nx[[3]] <- 4\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tracemem[0x000002a5856391c8 -> 0x000002a585653f08]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a585653f08 -> 0x000002a58663f8b8]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main\n```\n\n\n:::\n:::\n\n\n> Because we convert an *integer* vector (using 1L, etc.) to a *double* vector (using just 4)- \n\n##### 3. What is the relationships among these objects?\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- 1:10 \nb <- list(a, a)\nc <- list(b, a, 1:10) # \n```\n:::\n\n\na <- obj 1 \nb <- obj 1, obj 1 \nc <- b(obj 1, obj 1), obj 1, 1:10 \n\n\n::: {.cell}\n\n```{.r .cell-code}\nref(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x2a586fc3ea8] <list> \n#> ├─█ [2:0x2a585a1a308] <list> \n#> │ ├─[3:0x2a585aa1c40] <int> \n#> │ └─[3:0x2a585aa1c40] \n#> ├─[3:0x2a585aa1c40] \n#> └─[4:0x2a585b13d90] <int>\n```\n\n\n:::\n:::\n\n\n\n##### 4. What happens here?\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(1:10)\nx[[2]] <- x\n```\n:::\n\n\n- `x` is a list\n- `x[[2]] <- x` creates a new list, which in turn contains a reference to the \n original list\n- `x` is no longer bound to `list(1:10)`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nref(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █ [1:0x2a586172508] <list> \n#> ├─[2:0x2a58641c040] <int> \n#> └─█ [3:0x2a586b06c48] <list> \n#> └─[2:0x2a58641c040]\n```\n\n\n:::\n:::\n\n\n{width=50%}\n\n## Object Size\n\n- Use `lobstr::obj_size()` \n- Lists may be smaller than expected because of referencing the same value\n- Strings may be smaller than expected because using global string pool\n- Difficult to predict how big something will be\n - Can only add sizes together if they share no references in common\n\n### Alternative Representation\n- As of R 3.5.0 - ALTREP\n- Represent some vectors compactly\n - e.g., 1:1000 - not 10,000 values, just 1 and 1,000\n\n### Exercises\n\n##### 1. Why are the sizes so different?\n\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- rep(list(runif(1e4)), 100)\n\nobject.size(y) # ~8000 kB\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8005648 bytes\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_size(y) # ~80 kB\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 80.90 kB\n```\n\n\n:::\n:::\n\n\n> From `?object.size()`: \n> \n> \"This function merely provides a rough indication: it should be reasonably accurate for atomic vectors, but **does not detect if elements of a list are shared**, for example.\n\n##### 2. Why is the size misleading?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfuns <- list(mean, sd, var)\nobj_size(funs)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 18.76 kB\n```\n\n\n:::\n:::\n\n\n> Because they reference functions from base and stats, which are always available.\n> Why bother looking at the size? What use is that?\n\n##### 3. Predict the sizes\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- runif(1e6) # 8 MB\nobj_size(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8.00 MB\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb <- list(a, a)\n```\n:::\n\n\n- There is one value ~8MB\n- `a` and `b[[1]]` and `b[[2]]` all point to the same value.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobj_size(b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8.00 MB\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_size(a, b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8.00 MB\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb[[1]][[1]] <- 10\n```\n:::\n\n- Now there are two values ~8MB each (16MB total)\n- `a` and `b[[2]]` point to the same value (8MB)\n- `b[[1]]` is new (8MB) because the first element (`b[[1]][[1]]`) has been changed\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobj_size(b) # 16 MB (two values, two element references)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 16.00 MB\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_size(a, b) # 16 MB (a & b[[2]] point to the same value)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 16.00 MB\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb[[2]][[1]] <- 10\n```\n:::\n\n- Finally, now there are three values ~8MB each (24MB total)\n- Although `b[[1]]` and `b[[2]]` have the same contents, \n they are not references to the same object.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobj_size(b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 16.00 MB\n```\n\n\n:::\n\n```{.r .cell-code}\nobj_size(a, b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 24.00 MB\n```\n\n\n:::\n:::\n\n\n\n## Modify-in-place\n\n- Modifying usually creates a copy except for\n - Objects with a single binding (performance optimization)\n - Environments (special)\n\n### Objects with a single binding\n\n- Hard to know if copy will occur\n- If you have 2+ bindings and remove them, R can't follow how many are removed (so will always think there are more than one)\n- May make a copy even if there's only one binding left\n- Using a function makes a reference to it **unless it's a function based on C**\n- Best to use `tracemem()` to check rather than guess.\n\n\n#### Example - lists vs. data frames in for loop\n\n**Setup** \n\nCreate the data to modify\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- data.frame(matrix(runif(5 * 1e4), ncol = 5))\nmedians <- vapply(x, median, numeric(1))\n```\n:::\n\n\n\n**Data frame - Copied every time!**\n\n::: {.cell}\n\n```{.r .cell-code}\ncat(tracemem(x), \"\\n\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <000002A587857268>\n```\n\n\n:::\n\n```{.r .cell-code}\nfor (i in seq_along(medians)) {\n x[[i]] <- x[[i]] - medians[[i]]\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tracemem[0x000002a587857268 -> 0x000002a584b5de78]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584b5de78 -> 0x000002a584b5d2a8]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584b5d2a8 -> 0x000002a584b5d238]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584b5d238 -> 0x000002a584b5d1c8]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584b5d1c8 -> 0x000002a584b5d158]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584b5d158 -> 0x000002a584b5d0e8]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584b5d0e8 -> 0x000002a584b5d078]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584b5d078 -> 0x000002a584b5d008]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584b5d008 -> 0x000002a584bbfea8]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main \n#> tracemem[0x000002a584bbfea8 -> 0x000002a584bbfe38]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main\n```\n\n\n:::\n\n```{.r .cell-code}\nuntracemem(x)\n```\n:::\n\n\n**List (uses internal C code) - Copied once!**\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- as.list(x)\n\ncat(tracemem(y), \"\\n\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <000002A584B16388>\n```\n\n\n:::\n\n```{.r .cell-code}\nfor (i in seq_along(medians)) {\n y[[i]] <- y[[i]] - medians[[i]]\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tracemem[0x000002a584b16388 -> 0x000002a582d8fea8]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main\n```\n\n\n:::\n\n```{.r .cell-code}\nuntracemem(y)\n```\n:::\n\n\n#### Benchmark this (Exercise #2)\n\n**First wrap in a function**\n\n::: {.cell}\n\n```{.r .cell-code}\nmed <- function(d, medians) {\n for (i in seq_along(medians)) {\n d[[i]] <- d[[i]] - medians[[i]]\n }\n}\n```\n:::\n\n\n**Try with 5 columns**\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- data.frame(matrix(runif(5 * 1e4), ncol = 5))\nmedians <- vapply(x, median, numeric(1))\ny <- as.list(x)\n\nbench::mark(\n \"data.frame\" = med(x, medians),\n \"list\" = med(y, medians)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 6\n#> expression min median `itr/sec` mem_alloc `gc/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>\n#> 1 data.frame 52.3µs 68.2µs 13411. 410KB 201.\n#> 2 list 16.2µs 33µs 28621. 391KB 279.\n```\n\n\n:::\n:::\n\n\n**Try with 20 columns**\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- data.frame(matrix(runif(5 * 1e4), ncol = 20))\nmedians <- vapply(x, median, numeric(1))\ny <- as.list(x)\n\nbench::mark(\n \"data.frame\" = med(x, medians),\n \"list\" = med(y, medians)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 6\n#> expression min median `itr/sec` mem_alloc `gc/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>\n#> 1 data.frame 143.8µs 189.7µs 4722. 400KB 50.1\n#> 2 list 25.6µs 39.7µs 24419. 392KB 243.\n```\n\n\n:::\n:::\n\n\n**WOW!**\n\n\n### Environmments\n- Always modified in place (**reference semantics**)\n- Interesting because if you modify the environment, all existing bindings have the same reference\n- If two names point to the same environment, and you update one, you update both!\n\n\n::: {.cell}\n\n```{.r .cell-code}\ne1 <- rlang::env(a = 1, b = 2, c = 3)\ne2 <- e1\ne1$c <- 4\ne2$c\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4\n```\n\n\n:::\n:::\n\n\n- This means that environments can contain themselves (!)\n\n### Exercises\n\n##### 1. Why isn't this circular?\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list()\nx[[1]] <- x\n```\n:::\n\n\n> Because the binding to the list() object moves from `x` in the first line to `x[[1]]` in the second.\n\n##### 2. (see \"Objects with a single binding\")\n\n##### 3. What happens if you attempt to use tracemem() on an environment?\n\n\n::: {.cell}\n\n```{.r .cell-code}\ne1 <- rlang::env(a = 1, b = 2, c = 3)\ntracemem(e1)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in tracemem(e1): 'tracemem' is not useful for promise and environment objects\n```\n\n\n:::\n:::\n\n\n> Because environments always modified in place, there's no point in tracing them\n\n\n## Unbinding and the garbage collector\n\n- If you delete the 'name' bound to an object, the object still exists\n- R runs a \"garbage collector\" (GC) to remove these objects when it needs more memory\n- \"Looking from the outside, it’s basically impossible to predict when the GC will run. In fact, you shouldn’t even try.\"\n- If you want to know when it runs, use `gcinfo(TRUE)` to get a message printed\n- You can force GC with `gc()` but you never need to to use more memory *within* R\n- Only reason to do so is to free memory for other system software, or, to get the\nmessage printed about how much memory is being used\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngc()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> used (Mb) gc trigger (Mb) max used (Mb)\n#> Ncells 805637 43.1 1486050 79.4 1486050 79.4\n#> Vcells 4532584 34.6 10146329 77.5 10146315 77.5\n```\n\n\n:::\n\n```{.r .cell-code}\nmem_used()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 81.38 MB\n```\n\n\n:::\n:::\n\n\n- These numbers will **not** be what you OS tells you because, \n 1. It includes objects created by R, but not R interpreter\n 2. R and OS are lazy and don't reclaim/release memory until it's needed\n 3. R counts memory from objects, but there are gaps due to those that are deleted -> \n *memory fragmentation* [less memory actually available they you might think]\n", + "supporting": [ + "02_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/03/execute-results/html.json b/_freeze/slides/03/execute-results/html.json @@ -1,13 +1,17 @@ { - "hash": "36d6421e6ed573ea0c17c4ed33ad9b3b", + "hash": "eb6c217758ae1a65dff62d4b190e7c4e", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Vectors\n---\n\n## Learning objectives:\n\n- Learn about different types of vectors and their attributes\n- Navigate through vector types and their value types\n- Venture into factors and date-time objects\n- Discuss the differences between data frames and tibbles\n- Do not get absorbed by the `NA` and `NULL` black hole\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"dplyr\")\nlibrary(\"gt\")\nlibrary(\"palmerpenguins\")\n```\n:::\n\n\n\n<details>\n<summary>Session Info</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nutils::sessionInfo()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> R version 4.5.1 (2025-06-13 ucrt)\n#> Platform: x86_64-w64-mingw32/x64\n#> Running under: Windows 11 x64 (build 26100)\n#> \n#> Matrix products: default\n#> LAPACK version 3.12.1\n#> \n#> locale:\n#> [1] LC_COLLATE=English_United States.utf8 \n#> [2] LC_CTYPE=English_United States.utf8 \n#> [3] LC_MONETARY=English_United States.utf8\n#> [4] LC_NUMERIC=C \n#> [5] LC_TIME=English_United States.utf8 \n#> \n#> time zone: America/Chicago\n#> tzcode source: internal\n#> \n#> attached base packages:\n#> [1] stats graphics grDevices utils datasets methods base \n#> \n#> other attached packages:\n#> [1] palmerpenguins_0.1.1 gt_1.0.0 dplyr_1.1.4 \n#> \n#> loaded via a namespace (and not attached):\n#> [1] digest_0.6.37 R6_2.6.1 fastmap_1.2.0 tidyselect_1.2.1 \n#> [5] xfun_0.52 magrittr_2.0.3 glue_1.8.0 tibble_3.3.0 \n#> [9] knitr_1.50 pkgconfig_2.0.3 htmltools_0.5.8.1 generics_0.1.4 \n#> [13] rmarkdown_2.29 lifecycle_1.0.4 xml2_1.3.8 cli_3.6.5 \n#> [17] vctrs_0.6.5 compiler_4.5.1 tools_4.5.1 pillar_1.11.0 \n#> [21] evaluate_1.0.4 yaml_2.3.10 rlang_1.1.6 jsonlite_2.0.0 \n#> [25] htmlwidgets_1.6.4 keyring_1.4.1\n```\n\n\n:::\n:::\n\n</details>\n\n## Aperitif\n\n\n\n### Counting Penguins\n\nConsider this code to count the number of Gentoo penguins in the `penguins` data set. We see that there are 124 Gentoo penguins.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsum(\"Gentoo\" == penguins$species)\n# output: 124\n```\n:::\n\n\n### In\n\nOne subtle error can arise in trying out `%in%` here instead.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nspecies_vector <- penguins |> select(species)\nprint(\"Gentoo\" %in% species_vector)\n# output: FALSE\n```\n:::\n\n\n\n\n### Fix: base R \n\n\n::: {.cell}\n\n```{.r .cell-code}\nspecies_unlist <- penguins |> select(species) |> unlist()\nprint(\"Gentoo\" %in% species_unlist)\n# output: TRUE\n```\n:::\n\n\n### Fix: dplyr\n\n\n::: {.cell}\n\n```{.r .cell-code}\nspecies_pull <- penguins |> select(species) |> pull()\nprint(\"Gentoo\" %in% species_pull)\n# output: TRUE\n```\n:::\n\n\n### Motivation\n\n* What are the different types of vectors?\n* How does this affect accessing vectors?\n\n<details>\n<summary>Side Quest: Looking up the `%in%` operator</summary>\nIf you want to look up the manual pages for the `%in%` operator with the `?`, use backticks:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n?`%in%`\n```\n:::\n\n\nand we find that `%in%` is a wrapper for the `match()` function.\n\n</details>\n\n\n## Types of Vectors\n\n \n\nTwo main types:\n\n- **Atomic**: Elements all the same type.\n- **List**: Elements are different Types.\n\nClosely related but not technically a vector:\n\n- **NULL**: Null elements. Often length zero.\n\n## Atomic Vectors\n\n### Types of atomic vectors\n\n \n\n- **Logical**: True/False\n- **Integer**: Numeric (discrete, no decimals)\n- **Double**: Numeric (continuous, decimals)\n- **Character**: String\n\n### Vectors of Length One\n\n**Scalars** are vectors that consist of a single value.\n\n#### Logicals\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlgl1 <- TRUE\nlgl2 <- T #abbreviation for TRUE\nlgl3 <- FALSE\nlgl4 <- F #abbreviation for FALSE\n```\n:::\n\n\n#### Doubles\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# integer, decimal, scientific, or hexidecimal format\ndbl1 <- 1\ndbl2 <- 1.234 # decimal\ndbl3 <- 1.234e0 # scientific format\ndbl4 <- 0xcafe # hexidecimal format\n```\n:::\n\n\n#### Integers\n\nIntegers must be followed by L and cannot have fractional values\n\n\n::: {.cell}\n\n```{.r .cell-code}\nint1 <- 1L\nint2 <- 1234L\nint3 <- 1234e0L\nint4 <- 0xcafeL\n```\n:::\n\n\n<details>\n<summary>Pop Quiz: Why \"L\" for integers?</summary>\nWickham notes that the use of `L` dates back to the **C** programming language and its \"long int\" type for memory allocation.\n</details>\n\n#### Strings\n\nStrings can use single or double quotes and special characters are escaped with \\\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr1 <- \"hello\" # double quotes\nstr2 <- 'hello' # single quotes\nstr3 <- \"مرحبًا\" # Unicode\nstr4 <- \"\\U0001f605\" # sweaty_smile\n```\n:::\n\n\n### Longer\n\nThere are several ways to make longer vectors:\n\n**1. With single values** inside c() for combine.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlgl_var <- c(TRUE, FALSE)\nint_var <- c(1L, 6L, 10L)\ndbl_var <- c(1, 2.5, 4.5)\nchr_var <- c(\"these are\", \"some strings\")\n```\n:::\n\n\n \n\n**2. With other vectors**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nc(c(1, 2), c(3, 4)) # output is not nested\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3 4\n```\n\n\n:::\n:::\n\n\n<details>\n<summary>Side Quest: rlang</summary>\n\n`{rlang}` has [vector constructor functions too](https://rlang.r-lib.org/reference/vector-construction.html):\n\n- `rlang::lgl(...)`\n- `rlang::int(...)`\n- `rlang::dbl(...)`\n- `rlang::chr(...)`\n\nThey look to do both more and less than `c()`.\n\n- More:\n - Enforce type\n - Splice lists\n - More types: `rlang::bytes()`, `rlang::cpl(...)`\n- Less:\n - Stricter rules on names\n\nNote: currently has `questioning` lifecycle badge, since these constructors may get moved to `vctrs`\n\n</details>\n\n### Type and Length\n\nWe can determine the type of a vector with `typeof()` and its length with `length()`\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n```{=html}\n<div id=\"ggazdkgmch\" style=\"padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;\">\n<style>#ggazdkgmch table {\n font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';\n -webkit-font-smoothing: antialiased;\n -moz-osx-font-smoothing: grayscale;\n}\n\n#ggazdkgmch thead, #ggazdkgmch tbody, #ggazdkgmch tfoot, #ggazdkgmch tr, #ggazdkgmch td, #ggazdkgmch th {\n border-style: none;\n}\n\n#ggazdkgmch p {\n margin: 0;\n padding: 0;\n}\n\n#ggazdkgmch .gt_table {\n display: table;\n border-collapse: collapse;\n line-height: normal;\n margin-left: auto;\n margin-right: auto;\n color: #333333;\n font-size: 16px;\n font-weight: normal;\n font-style: normal;\n background-color: #FFFFFF;\n width: auto;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #A8A8A8;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #A8A8A8;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n}\n\n#ggazdkgmch .gt_caption {\n padding-top: 4px;\n padding-bottom: 4px;\n}\n\n#ggazdkgmch .gt_title {\n color: #333333;\n font-size: 125%;\n font-weight: initial;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-color: #FFFFFF;\n border-bottom-width: 0;\n}\n\n#ggazdkgmch .gt_subtitle {\n color: #333333;\n font-size: 85%;\n font-weight: initial;\n padding-top: 3px;\n padding-bottom: 5px;\n padding-left: 5px;\n padding-right: 5px;\n border-top-color: #FFFFFF;\n border-top-width: 0;\n}\n\n#ggazdkgmch .gt_heading {\n background-color: #FFFFFF;\n text-align: center;\n border-bottom-color: #FFFFFF;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n}\n\n#ggazdkgmch .gt_bottom_border {\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#ggazdkgmch .gt_col_headings {\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n}\n\n#ggazdkgmch .gt_col_heading {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: normal;\n text-transform: inherit;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: bottom;\n padding-top: 5px;\n padding-bottom: 6px;\n padding-left: 5px;\n padding-right: 5px;\n overflow-x: hidden;\n}\n\n#ggazdkgmch .gt_column_spanner_outer {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: normal;\n text-transform: inherit;\n padding-top: 0;\n padding-bottom: 0;\n padding-left: 4px;\n padding-right: 4px;\n}\n\n#ggazdkgmch .gt_column_spanner_outer:first-child {\n padding-left: 0;\n}\n\n#ggazdkgmch .gt_column_spanner_outer:last-child {\n padding-right: 0;\n}\n\n#ggazdkgmch .gt_column_spanner {\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n vertical-align: bottom;\n padding-top: 5px;\n padding-bottom: 5px;\n overflow-x: hidden;\n display: inline-block;\n width: 100%;\n}\n\n#ggazdkgmch .gt_spanner_row {\n border-bottom-style: hidden;\n}\n\n#ggazdkgmch .gt_group_heading {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n text-transform: inherit;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: middle;\n text-align: left;\n}\n\n#ggazdkgmch .gt_empty_group_heading {\n padding: 0.5px;\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n vertical-align: middle;\n}\n\n#ggazdkgmch .gt_from_md > :first-child {\n margin-top: 0;\n}\n\n#ggazdkgmch .gt_from_md > :last-child {\n margin-bottom: 0;\n}\n\n#ggazdkgmch .gt_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n margin: 10px;\n border-top-style: solid;\n border-top-width: 1px;\n border-top-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: middle;\n overflow-x: hidden;\n}\n\n#ggazdkgmch .gt_stub {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n text-transform: inherit;\n border-right-style: solid;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#ggazdkgmch .gt_stub_row_group {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n text-transform: inherit;\n border-right-style: solid;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n padding-left: 5px;\n padding-right: 5px;\n vertical-align: top;\n}\n\n#ggazdkgmch .gt_row_group_first td {\n border-top-width: 2px;\n}\n\n#ggazdkgmch .gt_row_group_first th {\n border-top-width: 2px;\n}\n\n#ggazdkgmch .gt_summary_row {\n color: #333333;\n background-color: #FFFFFF;\n text-transform: inherit;\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#ggazdkgmch .gt_first_summary_row {\n border-top-style: solid;\n border-top-color: #D3D3D3;\n}\n\n#ggazdkgmch .gt_first_summary_row.thick {\n border-top-width: 2px;\n}\n\n#ggazdkgmch .gt_last_summary_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#ggazdkgmch .gt_grand_summary_row {\n color: #333333;\n background-color: #FFFFFF;\n text-transform: inherit;\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#ggazdkgmch .gt_first_grand_summary_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-top-style: double;\n border-top-width: 6px;\n border-top-color: #D3D3D3;\n}\n\n#ggazdkgmch .gt_last_grand_summary_row_top {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-style: double;\n border-bottom-width: 6px;\n border-bottom-color: #D3D3D3;\n}\n\n#ggazdkgmch .gt_striped {\n background-color: rgba(128, 128, 128, 0.05);\n}\n\n#ggazdkgmch .gt_table_body {\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#ggazdkgmch .gt_footnotes {\n color: #333333;\n background-color: #FFFFFF;\n border-bottom-style: none;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n}\n\n#ggazdkgmch .gt_footnote {\n margin: 0px;\n font-size: 90%;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#ggazdkgmch .gt_sourcenotes {\n color: #333333;\n background-color: #FFFFFF;\n border-bottom-style: none;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n}\n\n#ggazdkgmch .gt_sourcenote {\n font-size: 90%;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#ggazdkgmch .gt_left {\n text-align: left;\n}\n\n#ggazdkgmch .gt_center {\n text-align: center;\n}\n\n#ggazdkgmch .gt_right {\n text-align: right;\n font-variant-numeric: tabular-nums;\n}\n\n#ggazdkgmch .gt_font_normal {\n font-weight: normal;\n}\n\n#ggazdkgmch .gt_font_bold {\n font-weight: bold;\n}\n\n#ggazdkgmch .gt_font_italic {\n font-style: italic;\n}\n\n#ggazdkgmch .gt_super {\n font-size: 65%;\n}\n\n#ggazdkgmch .gt_footnote_marks {\n font-size: 75%;\n vertical-align: 0.4em;\n position: initial;\n}\n\n#ggazdkgmch .gt_asterisk {\n font-size: 100%;\n vertical-align: 0;\n}\n\n#ggazdkgmch .gt_indent_1 {\n text-indent: 5px;\n}\n\n#ggazdkgmch .gt_indent_2 {\n text-indent: 10px;\n}\n\n#ggazdkgmch .gt_indent_3 {\n text-indent: 15px;\n}\n\n#ggazdkgmch .gt_indent_4 {\n text-indent: 20px;\n}\n\n#ggazdkgmch .gt_indent_5 {\n text-indent: 25px;\n}\n\n#ggazdkgmch .katex-display {\n display: inline-flex !important;\n margin-bottom: 0.75em !important;\n}\n\n#ggazdkgmch div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {\n height: 0px !important;\n}\n</style>\n<table class=\"gt_table\" data-quarto-disable-processing=\"false\" data-quarto-bootstrap=\"false\">\n <thead>\n <tr class=\"gt_heading\">\n <td colspan=\"4\" class=\"gt_heading gt_title gt_font_normal gt_bottom_border\" style>Types of Atomic Vectors<span class=\"gt_footnote_marks\" style=\"white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;\"><sup>1</sup></span></td>\n </tr>\n \n <tr class=\"gt_col_headings\">\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_names\">name</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_values\">value</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_type\">typeof()</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_length\">length()</th>\n </tr>\n </thead>\n <tbody class=\"gt_table_body\">\n <tr><td headers=\"var_names\" class=\"gt_row gt_center\">lgl_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">TRUE, FALSE</td>\n<td headers=\"var_type\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">logical</td>\n<td headers=\"var_length\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">2</td></tr>\n <tr><td headers=\"var_names\" class=\"gt_row gt_center\">int_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">1L, 6L, 10L</td>\n<td headers=\"var_type\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">integer</td>\n<td headers=\"var_length\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">3</td></tr>\n <tr><td headers=\"var_names\" class=\"gt_row gt_center\">dbl_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">1, 2.5, 4.5</td>\n<td headers=\"var_type\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">double</td>\n<td headers=\"var_length\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">3</td></tr>\n <tr><td headers=\"var_names\" class=\"gt_row gt_center\">chr_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">'these are', 'some strings'</td>\n<td headers=\"var_type\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">character</td>\n<td headers=\"var_length\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">2</td></tr>\n </tbody>\n \n <tfoot class=\"gt_footnotes\">\n <tr>\n <td class=\"gt_footnote\" colspan=\"4\"><span class=\"gt_footnote_marks\" style=\"white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;\"><sup>1</sup></span> Source: https://adv-r.hadley.nz/index.html</td>\n </tr>\n </tfoot>\n</table>\n</div>\n```\n\n:::\n:::\n\n\n<details>\n<summary>Side Quest: Penguins</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(penguins$species)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(penguins$species)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(species_unlist)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(species_unlist)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(species_pull)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(species_pull)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n:::\n\n\n</details>\n\n### Missing values\n\n#### Contagion\n\nFor most computations, an operation over values that includes a missing value yields a missing value (unless you're careful)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# contagion\n5*NA\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] NA\n```\n\n\n:::\n\n```{.r .cell-code}\nsum(c(1, 2, NA, 3))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] NA\n```\n\n\n:::\n:::\n\n\n#### Exceptions\n\n\n::: {.cell}\n\n```{.r .cell-code}\nNA ^ 0\n#> [1] 1\nNA | TRUE\n#> [1] TRUE\nNA & FALSE\n#> [1] FALSE\n```\n:::\n\n\n\n#### Innoculation\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsum(c(1, 2, NA, 3), na.rm = TRUE)\n# output: 6\n```\n:::\n\n\nTo search for missing values use `is.na()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(NA, 5, NA, 10)\nx == NA\n# output: NA NA NA NA [BATMAN!]\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.na(x)\n# output: TRUE FALSE TRUE FALSE\n```\n:::\n\n\n<details>\n<summary>Side Quest: NA Types</summary>\n\nEach type has its own NA type\n\n- Logical: `NA`\n- Integer: `NA_integer`\n- Double: `NA_double`\n- Character: `NA_character`\n\nThis may not matter in many contexts.\n\nBut this does matter for operations where types matter like `dplyr::if_else()`.\n</details>\n\n\n### Testing\n\n**What type of vector `is.*`() it?**\n\nTest data type:\n\n- Logical: `is.logical()`\n- Integer: `is.integer()`\n- Double: `is.double()`\n- Character: `is.character()`\n\n**What type of object is it?**\n\nDon't test objects with these tools:\n\n- `is.vector()`\n- `is.atomic()`\n- `is.numeric()` \n\nThey don’t test if you have a vector, atomic vector, or numeric vector; you’ll need to carefully read the documentation to figure out what they actually do (preview: *attributes*)\n\n<details>\n<summary>Side Quest: rlang</summary>\n\nInstead, maybe, use `{rlang}`\n\n- `rlang::is_vector`\n- `rlang::is_atomic`\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# vector\nrlang::is_vector(c(1, 2))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nrlang::is_vector(list(1, 2))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\n# atomic\nrlang::is_atomic(c(1, 2))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nrlang::is_atomic(list(1, \"a\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n:::\n\n\nSee more [here](https://rlang.r-lib.org/reference/type-predicates.html)\n</details>\n\n\n### Coercion\n\n* R follows rules for coercion: character → double → integer → logical\n\n* R can coerce either automatically or explicitly\n\n#### **Automatic**\n\nTwo contexts for automatic coercion:\n\n1. Combination\n2. Mathematical\n\n##### Coercion by Combination:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(c(TRUE, \"TRUE\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> chr [1:2] \"TRUE\" \"TRUE\"\n```\n\n\n:::\n:::\n\n\n##### Coercion by Mathematical operations:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# imagine a logical vector about whether an attribute is present\nhas_attribute <- c(TRUE, FALSE, TRUE, TRUE)\n\n# number with attribute\nsum(has_attribute)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3\n```\n\n\n:::\n:::\n\n\n#### **Explicit**\n\n<!--\n\nUse `as.*()`\n\n- Logical: `as.logical()`\n- Integer: `as.integer()`\n- Double: `as.double()`\n- Character: `as.character()`\n\n-->\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n```{=html}\n<div id=\"iququhfgjf\" style=\"padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;\">\n<style>#iququhfgjf table {\n font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';\n -webkit-font-smoothing: antialiased;\n -moz-osx-font-smoothing: grayscale;\n}\n\n#iququhfgjf thead, #iququhfgjf tbody, #iququhfgjf tfoot, #iququhfgjf tr, #iququhfgjf td, #iququhfgjf th {\n border-style: none;\n}\n\n#iququhfgjf p {\n margin: 0;\n padding: 0;\n}\n\n#iququhfgjf .gt_table {\n display: table;\n border-collapse: collapse;\n line-height: normal;\n margin-left: auto;\n margin-right: auto;\n color: #333333;\n font-size: 16px;\n font-weight: normal;\n font-style: normal;\n background-color: #FFFFFF;\n width: auto;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #A8A8A8;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #A8A8A8;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n}\n\n#iququhfgjf .gt_caption {\n padding-top: 4px;\n padding-bottom: 4px;\n}\n\n#iququhfgjf .gt_title {\n color: #333333;\n font-size: 125%;\n font-weight: initial;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-color: #FFFFFF;\n border-bottom-width: 0;\n}\n\n#iququhfgjf .gt_subtitle {\n color: #333333;\n font-size: 85%;\n font-weight: initial;\n padding-top: 3px;\n padding-bottom: 5px;\n padding-left: 5px;\n padding-right: 5px;\n border-top-color: #FFFFFF;\n border-top-width: 0;\n}\n\n#iququhfgjf .gt_heading {\n background-color: #FFFFFF;\n text-align: center;\n border-bottom-color: #FFFFFF;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n}\n\n#iququhfgjf .gt_bottom_border {\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#iququhfgjf .gt_col_headings {\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n}\n\n#iququhfgjf .gt_col_heading {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: normal;\n text-transform: inherit;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: bottom;\n padding-top: 5px;\n padding-bottom: 6px;\n padding-left: 5px;\n padding-right: 5px;\n overflow-x: hidden;\n}\n\n#iququhfgjf .gt_column_spanner_outer {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: normal;\n text-transform: inherit;\n padding-top: 0;\n padding-bottom: 0;\n padding-left: 4px;\n padding-right: 4px;\n}\n\n#iququhfgjf .gt_column_spanner_outer:first-child {\n padding-left: 0;\n}\n\n#iququhfgjf .gt_column_spanner_outer:last-child {\n padding-right: 0;\n}\n\n#iququhfgjf .gt_column_spanner {\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n vertical-align: bottom;\n padding-top: 5px;\n padding-bottom: 5px;\n overflow-x: hidden;\n display: inline-block;\n width: 100%;\n}\n\n#iququhfgjf .gt_spanner_row {\n border-bottom-style: hidden;\n}\n\n#iququhfgjf .gt_group_heading {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n text-transform: inherit;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: middle;\n text-align: left;\n}\n\n#iququhfgjf .gt_empty_group_heading {\n padding: 0.5px;\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n vertical-align: middle;\n}\n\n#iququhfgjf .gt_from_md > :first-child {\n margin-top: 0;\n}\n\n#iququhfgjf .gt_from_md > :last-child {\n margin-bottom: 0;\n}\n\n#iququhfgjf .gt_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n margin: 10px;\n border-top-style: solid;\n border-top-width: 1px;\n border-top-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: middle;\n overflow-x: hidden;\n}\n\n#iququhfgjf .gt_stub {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n text-transform: inherit;\n border-right-style: solid;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#iququhfgjf .gt_stub_row_group {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n text-transform: inherit;\n border-right-style: solid;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n padding-left: 5px;\n padding-right: 5px;\n vertical-align: top;\n}\n\n#iququhfgjf .gt_row_group_first td {\n border-top-width: 2px;\n}\n\n#iququhfgjf .gt_row_group_first th {\n border-top-width: 2px;\n}\n\n#iququhfgjf .gt_summary_row {\n color: #333333;\n background-color: #FFFFFF;\n text-transform: inherit;\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#iququhfgjf .gt_first_summary_row {\n border-top-style: solid;\n border-top-color: #D3D3D3;\n}\n\n#iququhfgjf .gt_first_summary_row.thick {\n border-top-width: 2px;\n}\n\n#iququhfgjf .gt_last_summary_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#iququhfgjf .gt_grand_summary_row {\n color: #333333;\n background-color: #FFFFFF;\n text-transform: inherit;\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#iququhfgjf .gt_first_grand_summary_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-top-style: double;\n border-top-width: 6px;\n border-top-color: #D3D3D3;\n}\n\n#iququhfgjf .gt_last_grand_summary_row_top {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-style: double;\n border-bottom-width: 6px;\n border-bottom-color: #D3D3D3;\n}\n\n#iququhfgjf .gt_striped {\n background-color: rgba(128, 128, 128, 0.05);\n}\n\n#iququhfgjf .gt_table_body {\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#iququhfgjf .gt_footnotes {\n color: #333333;\n background-color: #FFFFFF;\n border-bottom-style: none;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n}\n\n#iququhfgjf .gt_footnote {\n margin: 0px;\n font-size: 90%;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#iququhfgjf .gt_sourcenotes {\n color: #333333;\n background-color: #FFFFFF;\n border-bottom-style: none;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n}\n\n#iququhfgjf .gt_sourcenote {\n font-size: 90%;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#iququhfgjf .gt_left {\n text-align: left;\n}\n\n#iququhfgjf .gt_center {\n text-align: center;\n}\n\n#iququhfgjf .gt_right {\n text-align: right;\n font-variant-numeric: tabular-nums;\n}\n\n#iququhfgjf .gt_font_normal {\n font-weight: normal;\n}\n\n#iququhfgjf .gt_font_bold {\n font-weight: bold;\n}\n\n#iququhfgjf .gt_font_italic {\n font-style: italic;\n}\n\n#iququhfgjf .gt_super {\n font-size: 65%;\n}\n\n#iququhfgjf .gt_footnote_marks {\n font-size: 75%;\n vertical-align: 0.4em;\n position: initial;\n}\n\n#iququhfgjf .gt_asterisk {\n font-size: 100%;\n vertical-align: 0;\n}\n\n#iququhfgjf .gt_indent_1 {\n text-indent: 5px;\n}\n\n#iququhfgjf .gt_indent_2 {\n text-indent: 10px;\n}\n\n#iququhfgjf .gt_indent_3 {\n text-indent: 15px;\n}\n\n#iququhfgjf .gt_indent_4 {\n text-indent: 20px;\n}\n\n#iququhfgjf .gt_indent_5 {\n text-indent: 25px;\n}\n\n#iququhfgjf .katex-display {\n display: inline-flex !important;\n margin-bottom: 0.75em !important;\n}\n\n#iququhfgjf div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {\n height: 0px !important;\n}\n</style>\n<table class=\"gt_table\" data-quarto-disable-processing=\"false\" data-quarto-bootstrap=\"false\">\n <thead>\n <tr class=\"gt_heading\">\n <td colspan=\"6\" class=\"gt_heading gt_title gt_font_normal gt_bottom_border\" style>Coercion of Atomic Vectors<span class=\"gt_footnote_marks\" style=\"white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;\"><sup>1</sup></span></td>\n </tr>\n \n <tr class=\"gt_col_headings\">\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_names\">name</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_values\">value</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"as_logical\">as.logical()</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"as_integer\">as.integer()</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"as_double\">as.double()</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"as_character\">as.character()</th>\n </tr>\n </thead>\n <tbody class=\"gt_table_body\">\n <tr><td headers=\"var_names\" class=\"gt_row gt_center\">lgl_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">TRUE, FALSE</td>\n<td headers=\"as_logical\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">TRUE FALSE</td>\n<td headers=\"as_integer\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">1 0</td>\n<td headers=\"as_double\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">1 0</td>\n<td headers=\"as_character\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">'TRUE' 'FALSE'</td></tr>\n <tr><td headers=\"var_names\" class=\"gt_row gt_center\">int_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">1L, 6L, 10L</td>\n<td headers=\"as_logical\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">TRUE TRUE TRUE</td>\n<td headers=\"as_integer\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">1 6 10</td>\n<td headers=\"as_double\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">1 6 10</td>\n<td headers=\"as_character\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">'1' '6' '10'</td></tr>\n <tr><td headers=\"var_names\" class=\"gt_row gt_center\">dbl_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">1, 2.5, 4.5</td>\n<td headers=\"as_logical\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">TRUE TRUE TRUE</td>\n<td headers=\"as_integer\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">1 2 4</td>\n<td headers=\"as_double\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">1.0 2.5 4.5</td>\n<td headers=\"as_character\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">'1' '2.5' '4.5'</td></tr>\n <tr><td headers=\"var_names\" class=\"gt_row gt_center\">chr_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">'these are', 'some strings'</td>\n<td headers=\"as_logical\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">NA NA</td>\n<td headers=\"as_integer\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">NA_integer</td>\n<td headers=\"as_double\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">NA_double</td>\n<td headers=\"as_character\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">'these are', 'some strings'</td></tr>\n </tbody>\n \n <tfoot class=\"gt_footnotes\">\n <tr>\n <td class=\"gt_footnote\" colspan=\"6\"><span class=\"gt_footnote_marks\" style=\"white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;\"><sup>1</sup></span> Source: https://adv-r.hadley.nz/index.html</td>\n </tr>\n </tfoot>\n</table>\n</div>\n```\n\n:::\n:::\n\n\nBut note that coercion may fail in one of two ways, or both:\n\n- With warning/error\n- NAs\n\n\n::: {.cell}\n\n```{.r .cell-code}\nas.integer(c(1, 2, \"three\"))\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: NAs introduced by coercion\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 NA\n```\n\n\n:::\n:::\n\n\n### Exercises\n\n1. How do you create raw and complex scalars?\n\n<details><summary>Answer(s)</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nas.raw(42)\n#> [1] 2a\ncharToRaw(\"A\")\n#> [1] 41\ncomplex(length.out = 1, real = 1, imaginary = 1)\n#> [1] 1+1i\n```\n:::\n\n</details>\n\n2. Test your knowledge of the vector coercion rules by predicting the output of the following uses of c():\n\n\n::: {.cell}\n\n```{.r .cell-code}\nc(1, FALSE)\nc(\"a\", 1)\nc(TRUE, 1L)\n```\n:::\n\n\n<details><summary>Answer(s)</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nc(1, FALSE) # will be coerced to double -> 1 0\nc(\"a\", 1) # will be coerced to character -> \"a\" \"1\"\nc(TRUE, 1L) # will be coerced to integer -> 1 1\n```\n:::\n\n</details>\n\n3. Why is `1 == \"1\"` true? Why is `-1 < FALSE` true? Why is `\"one\" < 2` false?\n\n<details><summary>Answer(s)</summary>\nThese comparisons are carried out by operator-functions (==, <), which coerce their arguments to a common type. In the examples above, these types will be character, double and character: 1 will be coerced to \"1\", FALSE is represented as 0 and 2 turns into \"2\" (and numbers precede letters in lexicographic order (may depend on locale)).\n\n</details>\n\n4. Why is the default missing value, NA, a logical vector? What’s special about logical vectors?\n\n<details><summary>Answer(s)</summary>\nThe presence of missing values shouldn’t affect the type of an object. Recall that there is a type-hierarchy for coercion from character → double → integer → logical. When combining `NA`s with other atomic types, the `NA`s will be coerced to integer (`NA_integer_`), double (`NA_real_`) or character (`NA_character_`) and not the other way round. If `NA` were a character and added to a set of other values all of these would be coerced to character as well.\n</details>\n\n5. Precisely what do `is.atomic()`, `is.numeric()`, and `is.vector()` test for?\n\n<details><summary>Answer(s)</summary>\nThe documentation states that:\n\n* `is.atomic()` tests if an object is an atomic vector (as defined in *Advanced R*) or is `NULL` (!).\n* `is.numeric()` tests if an object has type integer or double and is not of class `factor`, `Date`, `POSIXt` or `difftime`.\n* `is.vector()` tests if an object is a vector (as defined in *Advanced R*) or an expression and has no attributes, apart from names.\n\nAtomic vectors are defined in *Advanced R* as objects of type logical, integer, double, complex, character or raw. Vectors are defined as atomic vectors or lists.\n</details>\n\n\n\n## Attributes\n\nAttributes are name-value pairs that attach metadata to an object(vector).\n\n* **Name-value pairs**: attributes have a name and a value\n* **Metadata**: not data itself, but data about the data\n\n### How? \n\n#### Getting and Setting\n\nThree functions:\n\n1. retrieve and modify single attributes with `attr()`\n2. retrieve en masse with `attributes()`\n3. set en masse with `structure()`\n\n**Single attribute**\n\nUse `attr()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# some object\na <- c(1, 2, 3)\n\n# set attribute\nattr(x = a, which = \"attribute_name\") <- \"some attribute\"\n\n# get attribute\nattr(a, \"attribute_name\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"some attribute\"\n```\n\n\n:::\n:::\n\n\n**Multiple attributes**\n\nTo set multiple attributes, use `structure()` To get multiple attributes, use `attributes()`\n\n\n::: {.cell}\n\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- 1:3\nattr(a, \"x\") <- \"abcdef\"\nattr(a, \"x\")\n#> [1] \"abcdef\"\n\nattr(a, \"y\") <- 4:6\nstr(attributes(a))\n#> List of 2\n#> $ x: chr \"abcdef\"\n#> $ y: int [1:3] 4 5 6\n\n# Or equivalently\na <- structure(\n 1:3, \n x = \"abcdef\",\n y = 4:6\n)\nstr(attributes(a))\n#> List of 2\n#> $ x: chr \"abcdef\"\n#> $ y: int [1:3] 4 5 6\n```\n:::\n\n\n \n\n### Why\n\nThree particularly important attributes: \n\n1. **names** - a character vector giving each element a name\n2. **dimension** - (or dim) turns vectors into matrices and arrays \n3. **class** - powers the S3 object system (we'll learn more about this in chapter 13)\n\nMost attributes are lost by most operations. Only two attributes are routinely preserved: names and dimension.\n\n#### Names\n\n~~Three~~ Four ways to name:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# (1) When creating it: \nx <- c(A = 1, B = 2, C = 3)\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> A B C \n#> 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\n# (2) By assigning a character vector to names()\ny <- 1:3\nnames(y) <- c(\"a\", \"b\", \"c\")\ny\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c \n#> 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\n# (3) Inline, with setNames():\nz <- setNames(1:3, c(\"a\", \"b\", \"c\"))\nz\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c \n#> 1 2 3\n```\n\n\n:::\n:::\n\n\n \n\n\n::: {.cell}\n\n```{.r .cell-code}\n# (4) By setting names--with {rlang}\na <- 1:3\nrlang::set_names(\n x = a,\n nm = c(\"a\", \"b\", \"c\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c \n#> 1 2 3\n```\n\n\n:::\n:::\n\n\n \n\n* You can remove names from a vector by using `x <- unname(x)` or `names(x) <- NULL`.\n* Thematically but not directly related: labelled class vectors with `haven::labelled()`\n\n\n#### Dimensions\n\nCreate matrices and arrays with `matrix()` and `array()`, or by using the assignment form of `dim()`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Two scalar arguments specify row and column sizes\nx <- matrix(1:6, nrow = 2, ncol = 3)\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2] [,3]\n#> [1,] 1 3 5\n#> [2,] 2 4 6\n```\n\n\n:::\n\n```{.r .cell-code}\n# One vector argument to describe all dimensions\ny <- array(1:24, c(2, 3, 4)) # rows, columns, no of arrays\ny\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> , , 1\n#> \n#> [,1] [,2] [,3]\n#> [1,] 1 3 5\n#> [2,] 2 4 6\n#> \n#> , , 2\n#> \n#> [,1] [,2] [,3]\n#> [1,] 7 9 11\n#> [2,] 8 10 12\n#> \n#> , , 3\n#> \n#> [,1] [,2] [,3]\n#> [1,] 13 15 17\n#> [2,] 14 16 18\n#> \n#> , , 4\n#> \n#> [,1] [,2] [,3]\n#> [1,] 19 21 23\n#> [2,] 20 22 24\n```\n\n\n:::\n\n```{.r .cell-code}\n# You can also modify an object in place by setting dim()\nz <- 1:6\ndim(z) <- c(2, 3) # rows, columns\nz\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2] [,3]\n#> [1,] 1 3 5\n#> [2,] 2 4 6\n```\n\n\n:::\n\n```{.r .cell-code}\na <- 1:24\ndim(a) <- c(2, 3, 4) # rows, columns, no of arrays\na\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> , , 1\n#> \n#> [,1] [,2] [,3]\n#> [1,] 1 3 5\n#> [2,] 2 4 6\n#> \n#> , , 2\n#> \n#> [,1] [,2] [,3]\n#> [1,] 7 9 11\n#> [2,] 8 10 12\n#> \n#> , , 3\n#> \n#> [,1] [,2] [,3]\n#> [1,] 13 15 17\n#> [2,] 14 16 18\n#> \n#> , , 4\n#> \n#> [,1] [,2] [,3]\n#> [1,] 19 21 23\n#> [2,] 20 22 24\n```\n\n\n:::\n:::\n\n\n##### Functions for working with vectors, matrices and arrays:\n\nVector | Matrix\t| Array\n:----- | :---------- | :-----\n`names()` | `rownames()`, `colnames()` | `dimnames()`\n`length()` | `nrow()`, `ncol()` | `dim()`\n`c()` | `rbind()`, `cbind()` | `abind::abind()`\n— | `t()` | `aperm()`\n`is.null(dim(x))` | `is.matrix()` | `is.array()`\n\n* **Caution**: A vector without a `dim` attribute set is often thought of as 1-dimensional, but actually has `NULL` dimensions.\n* One dimension?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(1:3) # 1d vector\n#> int [1:3] 1 2 3\nstr(matrix(1:3, ncol = 1)) # column vector\n#> int [1:3, 1] 1 2 3\nstr(matrix(1:3, nrow = 1)) # row vector\n#> int [1, 1:3] 1 2 3\nstr(array(1:3, 3)) # \"array\" vector\n#> int [1:3(1d)] 1 2 3\n```\n:::\n\n\n\n### Exercises\n\n1. How is `setNames()` implemented? How is `unname()` implemented? Read the source code.\n\n<details><summary>Answer(s)</summary>\n`setNames()` is implemented as:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetNames <- function(object = nm, nm) {\n names(object) <- nm\n object\n}\n```\n:::\n\n\nBecause the data argument comes first, `setNames()` also works well with the magrittr-pipe operator. When no first argument is given, the result is a named vector (this is rather untypical as required arguments usually come first):\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetNames( , c(\"a\", \"b\", \"c\"))\n#> a b c \n#> \"a\" \"b\" \"c\"\n```\n:::\n\n\n`unname()` is implemented in the following way:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nunname <- function(obj, force = FALSE) {\n if (!is.null(names(obj))) \n names(obj) <- NULL\n if (!is.null(dimnames(obj)) && (force || !is.data.frame(obj))) \n dimnames(obj) <- NULL\n obj\n}\n```\n:::\n\n\n`unname()` removes existing names (or dimnames) by setting them to `NULL`.\n</details>\n\n2. What does `dim()` return when applied to a 1-dimensional vector? When might you use `NROW()` or `NCOL()`?\n\n<details><summary>Answer(s)</summary>\n\n> dim() will return NULL when applied to a 1d vector.\n\nOne may want to use `NROW()` or `NCOL()` to handle atomic vectors, lists and NULL values in the same way as one column matrices or data frames. For these objects `nrow()` and `ncol()` return NULL:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 1:10\n\n# Return NULL\nnrow(x)\n#> NULL\nncol(x)\n#> NULL\n\n# Pretend it's a column vector\nNROW(x)\n#> [1] 10\nNCOL(x)\n#> [1] 1\n```\n:::\n\n\n</details>\n\n3. How would you describe the following three objects? What makes them different from `1:5`?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx1 <- array(1:5, c(1, 1, 5))\nx2 <- array(1:5, c(1, 5, 1))\nx3 <- array(1:5, c(5, 1, 1))\n```\n:::\n\n\n<details><summary>Answer(s)</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nx1 <- array(1:5, c(1, 1, 5)) # 1 row, 1 column, 5 in third dim.\nx2 <- array(1:5, c(1, 5, 1)) # 1 row, 5 columns, 1 in third dim.\nx3 <- array(1:5, c(5, 1, 1)) # 5 rows, 1 column, 1 in third dim.\n```\n:::\n\n</details>\n\n\n4. An early draft used this code to illustrate `structure()`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstructure(1:5, comment = \"my attribute\")\n#> [1] 1 2 3 4 5\n```\n:::\n\n\nBut when you print that object you don’t see the comment attribute. Why? Is the attribute missing, or is there something else special about it?\n\n<details><summary>Answer(s)</summary>\nThe documentation states (see `?comment`):\n\n> Contrary to other attributes, the comment is not printed (by print or print.default).\n\nAlso, from `?attributes:`\n\n> Note that some attributes (namely class, comment, dim, dimnames, names, row.names and tsp) are treated specially and have restrictions on the values which can be set.\n\nWe can retrieve comment attributes by calling them explicitly:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfoo <- structure(1:5, comment = \"my attribute\")\n\nattributes(foo)\n#> $comment\n#> [1] \"my attribute\"\nattr(foo, which = \"comment\")\n#> [1] \"my attribute\"\n```\n:::\n\n\n</details>\n\n\n\n## **Class** - S3 atomic vectors\n\n \n\nCredit: [Advanced R](https://adv-r.hadley.nz/index.html) by Hadley Wickham\n\n**Having a class attribute turns an object into an S3 object.**\n\nWhat makes S3 atomic vectors different?\n\n1. behave differently from a regular vector when passed to a generic function \n2. often store additional information in other attributes\n\nFour important S3 vectors used in base R:\n\n1. **Factors** (categorical data)\n2. **Dates**\n3. **Date-times** (POSIXct)\n4. **Durations** (difftime)\n\n### Factors\n\nA factor is a vector used to store categorical data that can contain only predefined values.\n\nFactors are integer vectors with:\n\n- Class: \"factor\"\n- Attributes: \"levels\", or the set of allowed values\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncolors = c('red', 'blue', 'green','red','red', 'green')\n# Build a factor\na_factor <- factor(\n # values\n x = colors,\n # exhaustive list of values\n levels = c('red', 'blue', 'green', 'yellow')\n)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Useful when some possible values are not present in the data\ntable(colors)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> colors\n#> blue green red \n#> 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\ntable(a_factor)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a_factor\n#> red blue green yellow \n#> 3 1 2 0\n```\n\n\n:::\n\n```{.r .cell-code}\n# - type\ntypeof(a_factor)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(a_factor)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# - attributes\nattributes(a_factor)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $levels\n#> [1] \"red\" \"blue\" \"green\" \"yellow\"\n#> \n#> $class\n#> [1] \"factor\"\n```\n\n\n:::\n:::\n\n\n#### Custom Order\n\nFactors can be ordered. This can be useful for models or visualizations where order matters.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvalues <- c('high', 'med', 'low', 'med', 'high', 'low', 'med', 'high')\n\nordered_factor <- ordered(\n # values\n x = values,\n # levels in ascending order\n levels = c('low', 'med', 'high')\n)\n\n# Inspect\nordered_factor\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] high med low med high low med high\n#> Levels: low < med < high\n```\n\n\n:::\n\n```{.r .cell-code}\ntable(values)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> values\n#> high low med \n#> 3 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\ntable(ordered_factor)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> ordered_factor\n#> low med high \n#> 2 3 3\n```\n\n\n:::\n:::\n\n\n### Dates\n\nDates are:\n\n- Double vectors\n- With class \"Date\"\n- No other attributes\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnotes_date <- Sys.Date()\n\n# type\ntypeof(notes_date)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# class\nattributes(notes_date)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $class\n#> [1] \"Date\"\n```\n\n\n:::\n:::\n\n\nThe double component represents the number of days since since the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time) `1970-01-01`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndate <- as.Date(\"1970-02-01\")\nunclass(date)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 31\n```\n\n\n:::\n:::\n\n\n### Date-times\n\nThere are 2 Date-time representations in base R:\n\n- POSIXct, where \"ct\" denotes *calendar time*\n- POSIXlt, where \"lt\" designates *local time*\n\n<!--\n\nJust for fun:\n\"How to pronounce 'POSIXct'?\"\nhttps://www.howtopronounce.com/posixct\n\n-->\n\nWe'll focus on POSIXct because:\n\n- Simplest\n- Built on an atomic (double) vector\n- Most appropriate for use in a data frame\n\nLet's now build and deconstruct a Date-time\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Build\nnote_date_time <- as.POSIXct(\n x = Sys.time(), # time\n tz = \"America/New_York\" # time zone, used only for formatting\n)\n\n# Inspect\nnote_date_time\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"2025-08-04 12:16:00 EDT\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# - type\ntypeof(note_date_time)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# - attributes\nattributes(note_date_time)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $class\n#> [1] \"POSIXct\" \"POSIXt\" \n#> \n#> $tzone\n#> [1] \"America/New_York\"\n```\n\n\n:::\n\n```{.r .cell-code}\nstructure(note_date_time, tzone = \"Europe/Paris\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"2025-08-04 18:16:00 CEST\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndate_time <- as.POSIXct(\"2024-02-22 12:34:56\", tz = \"EST\")\nunclass(date_time)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1708623296\n#> attr(,\"tzone\")\n#> [1] \"EST\"\n```\n\n\n:::\n:::\n\n\n\n### Durations\n\nDurations represent the amount of time between pairs of dates or date-times.\n\n- Double vectors\n- Class: \"difftime\"\n- Attributes: \"units\", or the unit of duration (e.g., weeks, hours, minutes, seconds, etc.)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Construct\none_minute <- as.difftime(1, units = \"mins\")\n# Inspect\none_minute\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Time difference of 1 mins\n```\n\n\n:::\n\n```{.r .cell-code}\n# Dissect\n# - type\ntypeof(one_minute)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# - attributes\nattributes(one_minute)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $class\n#> [1] \"difftime\"\n#> \n#> $units\n#> [1] \"mins\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntime_since_01_01_1970 <- notes_date - date\ntime_since_01_01_1970\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Time difference of 20273 days\n```\n\n\n:::\n:::\n\n\n\nSee also:\n\n- [`lubridate::make_difftime()`](https://lubridate.tidyverse.org/reference/make_difftime.html)\n- [`clock::date_time_build()`](https://clock.r-lib.org/reference/date_time_build.html)\n\n\n### Exercises\n\n1. What sort of object does `table()` return? What is its type? What attributes does it have? How does the dimensionality change as you tabulate more variables?\n\n<details><summary>Answer(s)</summary>\n\n`table()` returns a contingency table of its input variables. It is implemented as an integer vector with class table and dimensions (which makes it act like an array). Its attributes are dim (dimensions) and dimnames (one name for each input column). The dimensions correspond to the number of unique values (factor levels) in each input variable.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- table(mtcars[c(\"vs\", \"cyl\", \"am\")])\n\ntypeof(x)\n#> [1] \"integer\"\nattributes(x)\n#> $dim\n#> [1] 2 3 2\n#> \n#> $dimnames\n#> $dimnames$vs\n#> [1] \"0\" \"1\"\n#> \n#> $dimnames$cyl\n#> [1] \"4\" \"6\" \"8\"\n#> \n#> $dimnames$am\n#> [1] \"0\" \"1\"\n#> \n#> \n#> $class\n#> [1] \"table\"\n```\n:::\n\n</details>\n\n2. What happens to a factor when you modify its levels?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf1 <- factor(letters)\nlevels(f1) <- rev(levels(f1))\n```\n:::\n\n\n<details><summary>Answer(s)</summary>\nThe underlying integer values stay the same, but the levels are changed, making it look like the data has changed.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf1 <- factor(letters)\nf1\n#> [1] a b c d e f g h i j k l m n o p q r s t u v w x y z\n#> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z\nas.integer(f1)\n#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25\n#> [26] 26\n\nlevels(f1) <- rev(levels(f1))\nf1\n#> [1] z y x w v u t s r q p o n m l k j i h g f e d c b a\n#> Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a\nas.integer(f1)\n#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25\n#> [26] 26\n```\n:::\n\n</details>\n\n3. What does this code do? How do `f2` and `f3` differ from `f1`?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf2 <- rev(factor(letters))\nf3 <- factor(letters, levels = rev(letters))\n```\n:::\n\n\n<details><summary>Answer(s)</summary>\nFor `f2` and `f3` either the order of the factor elements or its levels are being reversed. For `f1` both transformations are occurring.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Reverse element order\n(f2 <- rev(factor(letters)))\n#> [1] z y x w v u t s r q p o n m l k j i h g f e d c b a\n#> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z\nas.integer(f2)\n#> [1] 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2\n#> [26] 1\n\n# Reverse factor levels (when creating factor)\n(f3 <- factor(letters, levels = rev(letters)))\n#> [1] a b c d e f g h i j k l m n o p q r s t u v w x y z\n#> Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a\nas.integer(f3)\n#> [1] 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2\n#> [26] 1\n```\n:::\n\n</details>\n\n\n\n\n\n\n## Lists\n\n* sometimes called a generic vector or recursive vector\n* Recall ([section 2.3.3](https://adv-r.hadley.nz/names-values.html#list-references)): each element is really a *reference* to another object\n* an be composed of elements of different types (as opposed to atomic vectors which must be of only one type)\n\n### Constructing\n\nSimple lists:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Construct\nsimple_list <- list(\n c(TRUE, FALSE), # logicals\n 1:20, # integers\n c(1.2, 2.3, 3.4), # doubles\n c(\"primo\", \"secundo\", \"tercio\") # characters\n)\n\nsimple_list\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] TRUE FALSE\n#> \n#> [[2]]\n#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20\n#> \n#> [[3]]\n#> [1] 1.2 2.3 3.4\n#> \n#> [[4]]\n#> [1] \"primo\" \"secundo\" \"tercio\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# Inspect\n# - type\ntypeof(simple_list)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"list\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# - structure\nstr(simple_list)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 4\n#> $ : logi [1:2] TRUE FALSE\n#> $ : int [1:20] 1 2 3 4 5 6 7 8 9 10 ...\n#> $ : num [1:3] 1.2 2.3 3.4\n#> $ : chr [1:3] \"primo\" \"secundo\" \"tercio\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# Accessing\nsimple_list[1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] TRUE FALSE\n```\n\n\n:::\n\n```{.r .cell-code}\nsimple_list[2]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20\n```\n\n\n:::\n\n```{.r .cell-code}\nsimple_list[3]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 1.2 2.3 3.4\n```\n\n\n:::\n\n```{.r .cell-code}\nsimple_list[4]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] \"primo\" \"secundo\" \"tercio\"\n```\n\n\n:::\n\n```{.r .cell-code}\nsimple_list[[1]][2]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n\n```{.r .cell-code}\nsimple_list[[2]][8]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 8\n```\n\n\n:::\n\n```{.r .cell-code}\nsimple_list[[3]][2]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.3\n```\n\n\n:::\n\n```{.r .cell-code}\nsimple_list[[4]][3]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"tercio\"\n```\n\n\n:::\n:::\n\n\nEven Simpler List\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Construct\nsimpler_list <- list(TRUE, FALSE, \n 1, 2, 3, 4, 5, \n 1.2, 2.3, 3.4, \n \"primo\", \"secundo\", \"tercio\")\n\n# Accessing\nsimpler_list[1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nsimpler_list[5]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 3\n```\n\n\n:::\n\n```{.r .cell-code}\nsimpler_list[9]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 2.3\n```\n\n\n:::\n\n```{.r .cell-code}\nsimpler_list[11]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] \"primo\"\n```\n\n\n:::\n:::\n\n\nNested lists:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnested_list <- list(\n # first level\n list(\n # second level\n list(\n # third level\n list(1)\n )\n )\n)\n\nstr(nested_list)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 1\n#> $ :List of 1\n#> ..$ :List of 1\n#> .. ..$ :List of 1\n#> .. .. ..$ : num 1\n```\n\n\n:::\n:::\n\n\nLike JSON.\n\nCombined lists\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# with list()\nlist_comb1 <- list(list(1, 2), list(3, 4))\n# with c()\nlist_comb2 <- c(list(1, 2), list(3, 4))\n\n# compare structure\nstr(list_comb1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 2\n#> $ :List of 2\n#> ..$ : num 1\n#> ..$ : num 2\n#> $ :List of 2\n#> ..$ : num 3\n#> ..$ : num 4\n```\n\n\n:::\n\n```{.r .cell-code}\nstr(list_comb2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 4\n#> $ : num 1\n#> $ : num 2\n#> $ : num 3\n#> $ : num 4\n```\n\n\n:::\n\n```{.r .cell-code}\n# does this work if they are different data types?\nlist_comb3 <- c(list(1, 2), list(TRUE, FALSE))\nstr(list_comb3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 4\n#> $ : num 1\n#> $ : num 2\n#> $ : logi TRUE\n#> $ : logi FALSE\n```\n\n\n:::\n:::\n\n\n### Testing\n\nCheck that is a list:\n\n- `is.list()`\n- \\`rlang::is_list()\\`\\`\n\nThe two do the same, except that the latter can check for the number of elements\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# is list\nbase::is.list(list_comb2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nrlang::is_list(list_comb2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\n# is list of 4 elements\nrlang::is_list(x = list_comb2, n = 4)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\n# is a vector (of a special type)\n# remember the family tree?\nrlang::is_vector(list_comb2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n### Coercion\n\nUse `as.list()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlist(1:3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nas.list(1:3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 1\n#> \n#> [[2]]\n#> [1] 2\n#> \n#> [[3]]\n#> [1] 3\n```\n\n\n:::\n:::\n\n\n### Matrices and arrays\n\nAlthough not often used, the dimension attribute can be added to create **list-matrices** or **list-arrays**.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl <- list(1:3, \"a\", TRUE, 1.0)\ndim(l) <- c(2, 2)\nl\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2]\n#> [1,] integer,3 TRUE\n#> [2,] \"a\" 1\n```\n\n\n:::\n\n```{.r .cell-code}\nl[[1, 1]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n:::\n\n\n\n### Exercises\n\n1. List all the ways that a list differs from an atomic vector.\n\n<details><summary>Answer(s)</summary>\n\n* Atomic vectors are always homogeneous (all elements must be of the same type). Lists may be heterogeneous (the elements can be of different types) as described in the introduction of the vectors chapter.\n* Atomic vectors point to one address in memory, while lists contain a separate reference for each element. (This was described in the list sections of the vectors and the names and values chapters.)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ref(1:2)\n#> [1:0x7fcd936f6e80] <int>\nlobstr::ref(list(1:2, 2))\n#> █ [1:0x7fcd93d53048] <list> \n#> ├─[2:0x7fcd91377e40] <int> \n#> └─[3:0x7fcd93b41eb0] <dbl>\n```\n:::\n\n\n\n* Subsetting with out-of-bounds and NA values leads to different output. For example, [ returns NA for atomics and NULL for lists. (This is described in more detail within the subsetting chapter.)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Subsetting atomic vectors\n(1:2)[3]\n#> [1] NA\n(1:2)[NA]\n#> [1] NA NA\n\n# Subsetting lists\nas.list(1:2)[3]\n#> [[1]]\n#> NULL\nas.list(1:2)[NA]\n#> [[1]]\n#> NULL\n#> \n#> [[2]]\n#> NULL\n```\n:::\n\n\n\n</details>\n\n2. Why do you need to use `unlist()` to convert a list to an atomic vector? Why doesn’t `as.vector()` work?\n\n<details><summary>Answer(s)</summary>\nA list is already a vector, though not an atomic one! Note that as.vector() and is.vector() use different definitions of “vector!”\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.vector(as.vector(mtcars))\n#> [1] FALSE\n```\n:::\n\n\n</details>\n\n3. Compare and contrast `c()` and `unlist()` when combining a date and date-time into a single vector.\n\n<details><summary>Answer(s)</summary>\nDate and date-time objects are both built upon doubles. While dates store the number of days since the reference date 1970-01-01 (also known as “the Epoch”) in days, date-time-objects (POSIXct) store the time difference to this date in seconds.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndate <- as.Date(\"1970-01-02\")\ndttm_ct <- as.POSIXct(\"1970-01-01 01:00\", tz = \"UTC\")\n\n# Internal representations\nunclass(date)\n#> [1] 1\nunclass(dttm_ct)\n#> [1] 3600\n#> attr(,\"tzone\")\n#> [1] \"UTC\"\n```\n:::\n\n\nAs the c() generic only dispatches on its first argument, combining date and date-time objects via c() could lead to surprising results in older R versions (pre R 4.0.0):\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Output in R version 3.6.2\nc(date, dttm_ct) # equal to c.Date(date, dttm_ct) \n#> [1] \"1970-01-02\" \"1979-11-10\"\nc(dttm_ct, date) # equal to c.POSIXct(date, dttm_ct)\n#> [1] \"1970-01-01 02:00:00 CET\" \"1970-01-01 01:00:01 CET\"\n```\n:::\n\n\nIn the first statement above c.Date() is executed, which incorrectly treats the underlying double of dttm_ct (3600) as days instead of seconds. Conversely, when c.POSIXct() is called on a date, one day is counted as one second only.\n\nWe can highlight these mechanics by the following code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Output in R version 3.6.2\nunclass(c(date, dttm_ct)) # internal representation\n#> [1] 1 3600\ndate + 3599\n#> \"1979-11-10\"\n```\n:::\n\n\nAs of R 4.0.0 these issues have been resolved and both methods now convert their input first into POSIXct and Date, respectively.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nc(dttm_ct, date)\n#> [1] \"1970-01-01 01:00:00 UTC\" \"1970-01-02 00:00:00 UTC\"\nunclass(c(dttm_ct, date))\n#> [1] 3600 86400\n\nc(date, dttm_ct)\n#> [1] \"1970-01-02\" \"1970-01-01\"\nunclass(c(date, dttm_ct))\n#> [1] 1 0\n```\n:::\n\n\nHowever, as c() strips the time zone (and other attributes) of POSIXct objects, some caution is still recommended.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n(dttm_ct <- as.POSIXct(\"1970-01-01 01:00\", tz = \"HST\"))\n#> [1] \"1970-01-01 01:00:00 HST\"\nattributes(c(dttm_ct))\n#> $class\n#> [1] \"POSIXct\" \"POSIXt\"\n```\n:::\n\n\nA package that deals with these kinds of problems in more depth and provides a structural solution for them is the {vctrs} package9 which is also used throughout the tidyverse.10\n\nLet’s look at unlist(), which operates on list input.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Attributes are stripped\nunlist(list(date, dttm_ct)) \n#> [1] 1 39600\n```\n:::\n\n\nWe see again that dates and date-times are internally stored as doubles. Unfortunately, this is all we are left with, when unlist strips the attributes of the list.\n\nTo summarise: c() coerces types and strips time zones. Errors may have occurred in older R versions because of inappropriate method dispatch/immature methods. unlist() strips attributes.\n</details>\n\n\n\n## Data frames and tibbles\n\n \n\nCredit: [Advanced R](https://adv-r.hadley.nz/index.html) by Hadley Wickham\n\n### Data frame\n\nA data frame is a:\n\n- Named list of vectors (i.e., column names)\n- Attributes:\n - (column) `names`\n - `row.names`\n - Class: \"data frame\"\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Construct\ndf <- data.frame(\n col1 = c(1, 2, 3), # named atomic vector\n col2 = c(\"un\", \"deux\", \"trois\") # another named atomic vector\n # ,stringsAsFactors = FALSE # default for versions after R 4.1\n)\n\n# Inspect\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> col1 col2\n#> 1 1 un\n#> 2 2 deux\n#> 3 3 trois\n```\n\n\n:::\n\n```{.r .cell-code}\n# Deconstruct\n# - type\ntypeof(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"list\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# - attributes\nattributes(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $names\n#> [1] \"col1\" \"col2\"\n#> \n#> $class\n#> [1] \"data.frame\"\n#> \n#> $row.names\n#> [1] 1 2 3\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrownames(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"1\" \"2\" \"3\"\n```\n\n\n:::\n\n```{.r .cell-code}\ncolnames(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"col1\" \"col2\"\n```\n\n\n:::\n\n```{.r .cell-code}\nnames(df) # Same as colnames(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"col1\" \"col2\"\n```\n\n\n:::\n\n```{.r .cell-code}\nnrow(df) \n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3\n```\n\n\n:::\n\n```{.r .cell-code}\nncol(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n\n```{.r .cell-code}\nlength(df) # Same as ncol(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n:::\n\n\nUnlike other lists, the length of each vector must be the same (i.e. as many vector elements as rows in the data frame).\n\n### Tibble\n\nCreated to relieve some of the frustrations and pain points created by data frames, tibbles are data frames that are:\n\n- Lazy (do less)\n- Surly (complain more)\n\n#### Lazy\n\nTibbles do not:\n\n- Coerce strings\n- Transform non-syntactic names\n- Recycle vectors of length greater than 1\n\n**Coerce strings**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nchr_col <- c(\"don't\", \"factor\", \"me\", \"bro\")\n\n# data frame\ndf <- data.frame(\n a = chr_col,\n # in R 4.1 and earlier, this was the default\n stringsAsFactors = TRUE\n)\n\n# tibble\ntbl <- tibble::tibble(\n a = chr_col\n)\n\n# contrast the structure\nstr(df$a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Factor w/ 4 levels \"bro\",\"don't\",..: 2 3 4 1\n```\n\n\n:::\n\n```{.r .cell-code}\nstr(tbl$a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> chr [1:4] \"don't\" \"factor\" \"me\" \"bro\"\n```\n\n\n:::\n:::\n\n\n**Transform non-syntactic names**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# data frame\ndf <- data.frame(\n `1` = c(1, 2, 3)\n)\n\n# tibble\ntbl <- tibble::tibble(\n `1` = c(1, 2, 3)\n)\n\n# contrast the names\nnames(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"X1\"\n```\n\n\n:::\n\n```{.r .cell-code}\nnames(tbl)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"1\"\n```\n\n\n:::\n:::\n\n\n**Recycle vectors of length greater than 1**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# data frame\ndf <- data.frame(\n col1 = c(1, 2, 3, 4),\n col2 = c(1, 2)\n)\n\n# tibble\ntbl <- tibble::tibble(\n col1 = c(1, 2, 3, 4),\n col2 = c(1, 2)\n)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `tibble::tibble()`:\n#> ! Tibble columns must have compatible sizes.\n#> • Size 4: Existing data.\n#> • Size 2: Column `col2`.\n#> ℹ Only values of size one are recycled.\n```\n\n\n:::\n:::\n\n\n#### Surly\n\nTibbles do only what they're asked and complain if what they're asked doesn't make sense:\n\n- Subsetting always yields a tibble\n- Complains if cannot find column\n\n**Subsetting always yields a tibble**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# data frame\ndf <- data.frame(\n col1 = c(1, 2, 3, 4)\n)\n\n# tibble\ntbl <- tibble::tibble(\n col1 = c(1, 2, 3, 4)\n)\n\n# contrast\ndf_col <- df[, \"col1\"]\nstr(df_col)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> num [1:4] 1 2 3 4\n```\n\n\n:::\n\n```{.r .cell-code}\ntbl_col <- tbl[, \"col1\"]\nstr(tbl_col)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tibble [4 × 1] (S3: tbl_df/tbl/data.frame)\n#> $ col1: num [1:4] 1 2 3 4\n```\n\n\n:::\n\n```{.r .cell-code}\n# to select a vector, do one of these instead\ntbl_col_1 <- tbl[[\"col1\"]]\nstr(tbl_col_1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> num [1:4] 1 2 3 4\n```\n\n\n:::\n\n```{.r .cell-code}\ntbl_col_2 <- dplyr::pull(tbl, col1)\nstr(tbl_col_2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> num [1:4] 1 2 3 4\n```\n\n\n:::\n:::\n\n\n**Complains if cannot find column**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"col1\"\n```\n\n\n:::\n\n```{.r .cell-code}\ndf$col\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3 4\n```\n\n\n:::\n\n```{.r .cell-code}\nnames(tbl)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"col1\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntbl$col\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: Unknown or uninitialised column: `col`.\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n#### One more difference\n\n**`tibble()` allows you to refer to variables created during construction**\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntibble::tibble(\n x = 1:3,\n y = x * 2 # x refers to the line above\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 2\n#> x y\n#> <int> <dbl>\n#> 1 1 2\n#> 2 2 4\n#> 3 3 6\n```\n\n\n:::\n:::\n\n\n<details>\n<summary>Side Quest: Row Names</summary>\n\n- character vector containing only unique values\n- get and set with `rownames()`\n- can use them to subset rows\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf3 <- data.frame(\n age = c(35, 27, 18),\n hair = c(\"blond\", \"brown\", \"black\"),\n row.names = c(\"Bob\", \"Susan\", \"Sam\")\n)\ndf3\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> age hair\n#> Bob 35 blond\n#> Susan 27 brown\n#> Sam 18 black\n```\n\n\n:::\n\n```{.r .cell-code}\nrownames(df3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Bob\" \"Susan\" \"Sam\"\n```\n\n\n:::\n\n```{.r .cell-code}\ndf3[\"Bob\", ]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> age hair\n#> Bob 35 blond\n```\n\n\n:::\n\n```{.r .cell-code}\nrownames(df3) <- c(\"Susan\", \"Bob\", \"Sam\")\nrownames(df3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Susan\" \"Bob\" \"Sam\"\n```\n\n\n:::\n\n```{.r .cell-code}\ndf3[\"Bob\", ]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> age hair\n#> Bob 27 brown\n```\n\n\n:::\n:::\n\n\nThere are three reasons why row names are undesirable:\n\n3. Metadata is data, so storing it in a different way to the rest of the data is fundamentally a bad idea. \n2. Row names are a poor abstraction for labelling rows because they only work when a row can be identified by a single string. This fails in many cases.\n3. Row names must be unique, so any duplication of rows (e.g. from bootstrapping) will create new row names.\n\n</details>\n\n\n### Printing\n\nData frames and tibbles print differently\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf3\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> age hair\n#> Susan 35 blond\n#> Bob 27 brown\n#> Sam 18 black\n```\n\n\n:::\n\n```{.r .cell-code}\ntibble::as_tibble(df3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 2\n#> age hair \n#> <dbl> <chr>\n#> 1 35 blond\n#> 2 27 brown\n#> 3 18 black\n```\n\n\n:::\n:::\n\n\n\n### Subsetting\n\nTwo undesirable subsetting behaviours:\n\n1. When you subset columns with `df[, vars]`, you will get a vector if vars selects one variable, otherwise you’ll get a data frame, unless you always remember to use `df[, vars, drop = FALSE]`.\n2. When you attempt to extract a single column with `df$x` and there is no column `x`, a data frame will instead select any variable that starts with `x`. If no variable starts with `x`, `df$x` will return NULL.\n\nTibbles tweak these behaviours so that a [ always returns a tibble, and a $ doesn’t do partial matching and warns if it can’t find a variable (*this is what makes tibbles surly*).\n\n### Testing\n\nWhether data frame: `is.data.frame()`. Note: both data frame and tibble are data frames.\n\nWhether tibble: `tibble::is_tibble`. Note: only tibbles are tibbles. Vanilla data frames are not.\n\n### Coercion\n\n- To data frame: `as.data.frame()`\n- To tibble: `tibble::as_tibble()`\n\n### List Columns\n\nList-columns are allowed in data frames but you have to do a little extra work by either adding the list-column after creation or wrapping the list in `I()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf4 <- data.frame(x = 1:3)\ndf4$y <- list(1:2, 1:3, 1:4)\ndf4\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x y\n#> 1 1 1, 2\n#> 2 2 1, 2, 3\n#> 3 3 1, 2, 3, 4\n```\n\n\n:::\n\n```{.r .cell-code}\ndf5 <- data.frame(\n x = 1:3, \n y = I(list(1:2, 1:3, 1:4))\n)\ndf5\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x y\n#> 1 1 1, 2\n#> 2 2 1, 2, 3\n#> 3 3 1, 2, 3, 4\n```\n\n\n:::\n:::\n\n\n### Matrix and data frame columns\n\n- As long as the number of rows matches the data frame, it’s also possible to have a matrix or data frame as a column of a data frame.\n- same as list-columns, must either addi the list-column after creation or wrapping the list in `I()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndfm <- data.frame(\n x = 1:3 * 10,\n y = I(matrix(1:9, nrow = 3))\n)\n\ndfm$z <- data.frame(a = 3:1, b = letters[1:3], stringsAsFactors = FALSE)\n\nstr(dfm)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 'data.frame':\t3 obs. of 3 variables:\n#> $ x: num 10 20 30\n#> $ y: 'AsIs' int [1:3, 1:3] 1 2 3 4 5 6 7 8 9\n#> $ z:'data.frame':\t3 obs. of 2 variables:\n#> ..$ a: int 3 2 1\n#> ..$ b: chr \"a\" \"b\" \"c\"\n```\n\n\n:::\n\n```{.r .cell-code}\ndfm$y\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2] [,3]\n#> [1,] 1 4 7\n#> [2,] 2 5 8\n#> [3,] 3 6 9\n```\n\n\n:::\n\n```{.r .cell-code}\ndfm$z\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b\n#> 1 3 a\n#> 2 2 b\n#> 3 1 c\n```\n\n\n:::\n:::\n\n\n\n### Exercises\n\n1. Can you have a data frame with zero rows? What about zero columns?\n\n<details><summary>Answer(s)</summary>\nYes, you can create these data frames easily; either during creation or via subsetting. Even both dimensions can be zero. Create a 0-row, 0-column, or an empty data frame directly:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata.frame(a = integer(), b = logical())\n#> [1] a b\n#> <0 rows> (or 0-length row.names)\n\ndata.frame(row.names = 1:3) # or data.frame()[1:3, ]\n#> data frame with 0 columns and 3 rows\n\ndata.frame()\n#> data frame with 0 columns and 0 rows\n```\n:::\n\n\nCreate similar data frames via subsetting the respective dimension with either 0, `NULL`, `FALSE` or a valid 0-length atomic (`logical(0)`, `character(0)`, `integer(0)`, `double(0)`). Negative integer sequences would also work. The following example uses a zero:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmtcars[0, ]\n#> [1] mpg cyl disp hp drat wt qsec vs am gear carb\n#> <0 rows> (or 0-length row.names)\n\nmtcars[ , 0] # or mtcars[0]\n#> data frame with 0 columns and 32 rows\n\nmtcars[0, 0]\n#> data frame with 0 columns and 0 rows\n```\n:::\n\n\n\n</details>\n\n2. What happens if you attempt to set rownames that are not unique?\n\n<details><summary>Answer(s)</summary>\nMatrices can have duplicated row names, so this does not cause problems.\n\nData frames, however, require unique rownames and you get different results depending on how you attempt to set them. If you set them directly or via `row.names()`, you get an error:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata.frame(row.names = c(\"x\", \"y\", \"y\"))\n#> Error in data.frame(row.names = c(\"x\", \"y\", \"y\")): duplicate row.names: y\n\ndf <- data.frame(x = 1:3)\nrow.names(df) <- c(\"x\", \"y\", \"y\")\n#> Warning: non-unique value when setting 'row.names': 'y'\n#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed\n```\n:::\n\n\nIf you use subsetting, `[` automatically deduplicates:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrow.names(df) <- c(\"x\", \"y\", \"z\")\ndf[c(1, 1, 1), , drop = FALSE]\n#> x\n#> x 1\n#> x.1 1\n#> x.2 1\n```\n:::\n\n\n</details>\n\n3. If `df` is a data frame, what can you say about `t(df)`, and `t(t(df))`? Perform some experiments, making sure to try different column types.\n\n<details><summary>Answer(s)</summary>\nBoth of `t(df)` and `t(t(df))` will return matrices:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(x = 1:3, y = letters[1:3])\nis.matrix(df)\n#> [1] FALSE\nis.matrix(t(df))\n#> [1] TRUE\nis.matrix(t(t(df)))\n#> [1] TRUE\n```\n:::\n\n\nThe dimensions will respect the typical transposition rules:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndim(df)\n#> [1] 3 2\ndim(t(df))\n#> [1] 2 3\ndim(t(t(df)))\n#> [1] 3 2\n```\n:::\n\n\nBecause the output is a matrix, every column is coerced to the same type. (It is implemented within `t.data.frame()` via `as.matrix()` which is described below).\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf\n#> x y\n#> 1 1 a\n#> 2 2 b\n#> 3 3 c\nt(df)\n#> [,1] [,2] [,3]\n#> x \"1\" \"2\" \"3\" \n#> y \"a\" \"b\" \"c\"\n```\n:::\n\n\n</details>\n\n4. What does `as.matrix()` do when applied to a data frame with columns of different types? How does it differ from `data.matrix()`?\n\n<details><summary>Answer(s)</summary>\nThe type of the result of as.matrix depends on the types of the input columns (see `?as.matrix`):\n\n> The method for data frames will return a character matrix if there is only atomic columns and any non-(numeric/logical/complex) column, applying as.vector to factors and format to other non-character columns. Otherwise the usual coercion hierarchy (logical < integer < double < complex) will be used, e.g. all-logical data frames will be coerced to a logical matrix, mixed logical-integer will give an integer matrix, etc.\n\nOn the other hand, `data.matrix` will always return a numeric matrix (see `?data.matrix()`).\n\n> Return the matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Factors and ordered factors are replaced by their internal codes. […] Character columns are first converted to factors and then to integers.\n\nWe can illustrate and compare the mechanics of these functions using a concrete example. `as.matrix()` makes it possible to retrieve most of the original information from the data frame but leaves us with characters. To retrieve all information from `data.matrix()`’s output, we would need a lookup table for each column.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf_coltypes <- data.frame(\n a = c(\"a\", \"b\"),\n b = c(TRUE, FALSE),\n c = c(1L, 0L),\n d = c(1.5, 2),\n e = factor(c(\"f1\", \"f2\"))\n)\n\nas.matrix(df_coltypes)\n#> a b c d e \n#> [1,] \"a\" \"TRUE\" \"1\" \"1.5\" \"f1\"\n#> [2,] \"b\" \"FALSE\" \"0\" \"2.0\" \"f2\"\ndata.matrix(df_coltypes)\n#> a b c d e\n#> [1,] 1 1 1 1.5 1\n#> [2,] 2 0 0 2.0 2\n```\n:::\n\n\n</details>\n\n\n\n\n\n\n\n## `NULL`\n\nSpecial type of object that:\n\n- Length 0\n- Cannot have attributes\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(NULL)\n#> [1] \"NULL\"\n\nlength(NULL)\n#> [1] 0\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- NULL\nattr(x, \"y\") <- 1\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in attr(x, \"y\") <- 1: attempt to set an attribute on NULL\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.null(NULL)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n\n## Digestif\n\nLet is use some of this chapter's skills on the `penguins` data.\n\n### Attributes\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(penguins_raw)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tibble [344 × 17] (S3: tbl_df/tbl/data.frame)\n#> $ studyName : chr [1:344] \"PAL0708\" \"PAL0708\" \"PAL0708\" \"PAL0708\" ...\n#> $ Sample Number : num [1:344] 1 2 3 4 5 6 7 8 9 10 ...\n#> $ Species : chr [1:344] \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" ...\n#> $ Region : chr [1:344] \"Anvers\" \"Anvers\" \"Anvers\" \"Anvers\" ...\n#> $ Island : chr [1:344] \"Torgersen\" \"Torgersen\" \"Torgersen\" \"Torgersen\" ...\n#> $ Stage : chr [1:344] \"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" ...\n#> $ Individual ID : chr [1:344] \"N1A1\" \"N1A2\" \"N2A1\" \"N2A2\" ...\n#> $ Clutch Completion : chr [1:344] \"Yes\" \"Yes\" \"Yes\" \"Yes\" ...\n#> $ Date Egg : Date[1:344], format: \"2007-11-11\" \"2007-11-11\" ...\n#> $ Culmen Length (mm) : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...\n#> $ Culmen Depth (mm) : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...\n#> $ Flipper Length (mm): num [1:344] 181 186 195 NA 193 190 181 195 193 190 ...\n#> $ Body Mass (g) : num [1:344] 3750 3800 3250 NA 3450 ...\n#> $ Sex : chr [1:344] \"MALE\" \"FEMALE\" \"FEMALE\" NA ...\n#> $ Delta 15 N (o/oo) : num [1:344] NA 8.95 8.37 NA 8.77 ...\n#> $ Delta 13 C (o/oo) : num [1:344] NA -24.7 -25.3 NA -25.3 ...\n#> $ Comments : chr [1:344] \"Not enough blood for isotopes.\" NA NA \"Adult not sampled.\" ...\n#> - attr(*, \"spec\")=List of 3\n#> ..$ cols :List of 17\n#> .. ..$ studyName : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> .. ..$ Sample Number : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#> .. ..$ Species : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> .. ..$ Region : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> .. ..$ Island : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> .. ..$ Stage : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> .. ..$ Individual ID : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> .. ..$ Clutch Completion : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> .. ..$ Date Egg :List of 1\n#> .. .. ..$ format: chr \"\"\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_date\" \"collector\"\n#> .. ..$ Culmen Length (mm) : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#> .. ..$ Culmen Depth (mm) : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#> .. ..$ Flipper Length (mm): list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#> .. ..$ Body Mass (g) : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#> .. ..$ Sex : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> .. ..$ Delta 15 N (o/oo) : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#> .. ..$ Delta 13 C (o/oo) : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#> .. ..$ Comments : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> ..$ default: list()\n#> .. ..- attr(*, \"class\")= chr [1:2] \"collector_guess\" \"collector\"\n#> ..$ skip : num 1\n#> ..- attr(*, \"class\")= chr \"col_spec\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(penguins_raw, give.attr = FALSE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tibble [344 × 17] (S3: tbl_df/tbl/data.frame)\n#> $ studyName : chr [1:344] \"PAL0708\" \"PAL0708\" \"PAL0708\" \"PAL0708\" ...\n#> $ Sample Number : num [1:344] 1 2 3 4 5 6 7 8 9 10 ...\n#> $ Species : chr [1:344] \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" ...\n#> $ Region : chr [1:344] \"Anvers\" \"Anvers\" \"Anvers\" \"Anvers\" ...\n#> $ Island : chr [1:344] \"Torgersen\" \"Torgersen\" \"Torgersen\" \"Torgersen\" ...\n#> $ Stage : chr [1:344] \"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" ...\n#> $ Individual ID : chr [1:344] \"N1A1\" \"N1A2\" \"N2A1\" \"N2A2\" ...\n#> $ Clutch Completion : chr [1:344] \"Yes\" \"Yes\" \"Yes\" \"Yes\" ...\n#> $ Date Egg : Date[1:344], format: \"2007-11-11\" \"2007-11-11\" ...\n#> $ Culmen Length (mm) : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...\n#> $ Culmen Depth (mm) : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...\n#> $ Flipper Length (mm): num [1:344] 181 186 195 NA 193 190 181 195 193 190 ...\n#> $ Body Mass (g) : num [1:344] 3750 3800 3250 NA 3450 ...\n#> $ Sex : chr [1:344] \"MALE\" \"FEMALE\" \"FEMALE\" NA ...\n#> $ Delta 15 N (o/oo) : num [1:344] NA 8.95 8.37 NA 8.77 ...\n#> $ Delta 13 C (o/oo) : num [1:344] NA -24.7 -25.3 NA -25.3 ...\n#> $ Comments : chr [1:344] \"Not enough blood for isotopes.\" NA NA \"Adult not sampled.\" ...\n```\n\n\n:::\n:::\n\n\n### Data Frames vs Tibbles\n\n\n::: {.cell}\n\n```{.r .cell-code}\npenguins_df <- data.frame(penguins)\npenguins_tb <- penguins #i.e. penguins was already a tibble\n```\n:::\n\n\n#### Printing\n\n* Tip: print out these results in RStudio under different editor themes\n\n\n::: {.cell}\n\n```{.r .cell-code}\nprint(penguins_df) #don't run this\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhead(penguins_df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g\n#> 1 Adelie Torgersen 39.1 18.7 181 3750\n#> 2 Adelie Torgersen 39.5 17.4 186 3800\n#> 3 Adelie Torgersen 40.3 18.0 195 3250\n#> 4 Adelie Torgersen NA NA NA NA\n#> 5 Adelie Torgersen 36.7 19.3 193 3450\n#> 6 Adelie Torgersen 39.3 20.6 190 3650\n#> sex year\n#> 1 male 2007\n#> 2 female 2007\n#> 3 female 2007\n#> 4 <NA> 2007\n#> 5 female 2007\n#> 6 male 2007\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\npenguins_tb\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 344 × 8\n#> species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g\n#> <fct> <fct> <dbl> <dbl> <int> <int>\n#> 1 Adelie Torgersen 39.1 18.7 181 3750\n#> 2 Adelie Torgersen 39.5 17.4 186 3800\n#> 3 Adelie Torgersen 40.3 18 195 3250\n#> 4 Adelie Torgersen NA NA NA NA\n#> 5 Adelie Torgersen 36.7 19.3 193 3450\n#> 6 Adelie Torgersen 39.3 20.6 190 3650\n#> 7 Adelie Torgersen 38.9 17.8 181 3625\n#> 8 Adelie Torgersen 39.2 19.6 195 4675\n#> 9 Adelie Torgersen 34.1 18.1 193 3475\n#> 10 Adelie Torgersen 42 20.2 190 4250\n#> # ℹ 334 more rows\n#> # ℹ 2 more variables: sex <fct>, year <int>\n```\n\n\n:::\n:::\n\n\n### Atomic Vectors\n\n\n::: {.cell}\n\n```{.r .cell-code}\nspecies_vector_df <- penguins_df |> select(species)\nspecies_unlist_df <- penguins_df |> select(species) |> unlist()\nspecies_pull_df <- penguins_df |> select(species) |> pull()\n\nspecies_vector_tb <- penguins_tb |> select(species)\nspecies_unlist_tb <- penguins_tb |> select(species) |> unlist()\nspecies_pull_tb <- penguins_tb |> select(species) |> pull()\n```\n:::\n\n\n<details>\n<summary>`typeof()` and `class()`</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(species_vector_df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"list\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(species_vector_df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"data.frame\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(species_unlist_df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(species_unlist_df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(species_pull_df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(species_pull_df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(species_vector_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"list\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(species_vector_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"tbl_df\" \"tbl\" \"data.frame\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(species_unlist_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(species_unlist_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(species_pull_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(species_pull_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n:::\n\n</details>\n\n### Column Names\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncolnames(penguins_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"species\" \"island\" \"bill_length_mm\" \n#> [4] \"bill_depth_mm\" \"flipper_length_mm\" \"body_mass_g\" \n#> [7] \"sex\" \"year\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames(penguins_tb) == colnames(penguins_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames(penguins_df) == names(penguins_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE\n```\n\n\n:::\n:::\n\n\n* What if we only invoke a partial name of a column of a tibble?\n\n\n::: {.cell}\n\n```{.r .cell-code}\npenguins_tb$y \n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: Unknown or uninitialised column: `y`.\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n\n\n* What if we only invoke a partial name of a column of a data frame?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhead(penguins_df$y) #instead of `year`\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2007 2007 2007 2007 2007 2007\n```\n\n\n:::\n:::\n\n\n* Is this evaluation in alphabetical order or column order?\n\n\n::: {.cell}\n\n```{.r .cell-code}\npenguins_df_se_sp <- penguins_df |> select(sex, species)\npenguins_df_sp_se <- penguins_df |> select(species, sex)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhead(penguins_df_se_sp$s)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhead(penguins_df_sp_se$s)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n\n## Chapter Quiz\n\n1. What are the four common types of atomic vectors? What are the two rare types?\n\n<details><summary>Answer(s)</summary>\nThe four common types of atomic vector are logical, integer, double and character. The two rarer types are complex and raw.\n</details>\n\n2. What are attributes? How do you get them and set them?\n\n<details><summary>Answer(s)</summary>\nAttributes allow you to associate arbitrary additional metadata to any object. You can get and set individual attributes with `attr(x, \"y\")` and `attr(x, \"y\") <- value`; or you can get and set all attributes at once with `attributes()`.\n</details>\n\n3. How is a list different from an atomic vector? How is a matrix different from a data frame?\n\n<details><summary>Answer(s)</summary>\nThe elements of a list can be any type (even a list); the elements of an atomic vector are all of the same type. Similarly, every element of a matrix must be the same type; in a data frame, different columns can have different types.\n</details>\n\n4. Can you have a list that is a matrix? Can a data frame have a column that is a matrix?\n\n<details><summary>Answer(s)</summary>\nYou can make a list-array by assigning dimensions to a list. You can make a matrix a column of a data frame with `df$x <- matrix()`, or by using `I()` when creating a new data frame `data.frame(x = I(matrix()))`.\n</details>\n\n5. How do tibbles behave differently from data frames?\n\n<details><summary>Answer(s)</summary>\nTibbles have an enhanced print method, never coerce strings to factors, and provide stricter subsetting methods.\n</details>\n\n\n\n\n\n\n\n\n\n\n\n\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/pQ-xDAPEQaw\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/CpLM6SdpTFY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/9E4RlbW8vxU\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/LCAgxwm5Ydg\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/DrVY6DE9ymY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/mmcnkIjANps\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary>Meeting chat log</summary>\n\n```\n00:10:18\tOluwafemi Oyedele:\tHi, good evening\n00:23:31\tFederica Gazzelloni:\tHi Kiante!\n00:24:21\tFederica Gazzelloni:\tThanks Arthur\n00:25:46\tTrevin:\tWelcome Matt!\n00:26:02\tMatt Dupree:\thello! thank you!\n00:30:34\tFederica Gazzelloni:\tHello Matt!\n00:30:46\tMatt Dupree:\thello!\n00:38:24\tRyan Metcalf:\t`rlang::cpl()` = “complex”. For example `0+1i`\n00:55:37\tTrevin:\t> two <- c(1,2,3)\n> names(two) <- c(\"one\", \"two\")\n> two\n one two <NA> \n 1 2 3\n00:57:25\tRyan Metcalf:\tExcellent Trevin. You beat me to the output! Assuming we didn't supply the string, `NA` is entered instead.\n01:08:50\tRyan Metcalf:\tWithout further research, this is the \"Unix Epoch”. However, varying operating systems use different Epochs.\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/QcdByYHo1ms\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n\n<summary>Meeting chat log</summary>\n```\n00:54:07\tRon:\thttps://www.tidyverse.org/blog/2021/03/clock-0-1-0/\n01:14:39\tRobert Hilly:\thttps://www.amazon.com/Effective-Pandas-Patterns-Manipulation-Treading/dp/B09MYXXSFM\n```\n</details>\n", + "markdown": "---\nengine: knitr\ntitle: Vectors\n---\n\n## Learning objectives:\n\n- Learn about different types of vectors and their attributes\n- Navigate through vector types and their value types\n- Venture into factors and date-time objects\n- Discuss the differences between data frames and tibbles\n- Do not get absorbed by the `NA` and `NULL` black hole\n\n\n::: {.cell}\n\n:::\n\n\n\n<details>\n<summary>Session Info</summary>\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> R version 4.5.1 (2025-06-13 ucrt)\n#> Platform: x86_64-w64-mingw32/x64\n#> Running under: Windows 11 x64 (build 26100)\n#> \n#> Matrix products: default\n#> LAPACK version 3.12.1\n#> \n#> locale:\n#> [1] LC_COLLATE=English_United States.utf8 \n#> [2] LC_CTYPE=English_United States.utf8 \n#> [3] LC_MONETARY=English_United States.utf8\n#> [4] LC_NUMERIC=C \n#> [5] LC_TIME=English_United States.utf8 \n#> \n#> time zone: America/Chicago\n#> tzcode source: internal\n#> \n#> attached base packages:\n#> [1] stats graphics grDevices utils datasets methods base \n#> \n#> other attached packages:\n#> [1] palmerpenguins_0.1.1 gt_1.0.0 dplyr_1.1.4 \n#> \n#> loaded via a namespace (and not attached):\n#> [1] digest_0.6.37 R6_2.6.1 fastmap_1.2.0 tidyselect_1.2.1 \n#> [5] xfun_0.52 magrittr_2.0.3 glue_1.8.0 tibble_3.3.0 \n#> [9] knitr_1.50 pkgconfig_2.0.3 htmltools_0.5.8.1 rmarkdown_2.29 \n#> [13] generics_0.1.4 lifecycle_1.0.4 xml2_1.3.8 cli_3.6.5 \n#> [17] vctrs_0.6.5 compiler_4.5.1 tools_4.5.1 pillar_1.11.0 \n#> [21] evaluate_1.0.4 yaml_2.3.10 rlang_1.1.6 jsonlite_2.0.0 \n#> [25] keyring_1.4.1\n```\n\n\n:::\n:::\n\n</details>\n\n## Aperitif\n\n\n\n### Counting Penguins\n\nConsider this code to count the number of Gentoo penguins in the `penguins` data set. We see that there are 124 Gentoo penguins.\n\n\n::: {.cell}\n\n:::\n\n\n### In\n\nOne subtle error can arise in trying out `%in%` here instead.\n\n\n::: {.cell}\n\n:::\n\n\n\n\n### Fix: base R \n\n\n::: {.cell}\n\n:::\n\n\n### Fix: dplyr\n\n\n::: {.cell}\n\n:::\n\n\n### Motivation\n\n* What are the different types of vectors?\n* How does this affect accessing vectors?\n\n<details>\n<summary>Side Quest: Looking up the `%in%` operator</summary>\nIf you want to look up the manual pages for the `%in%` operator with the `?`, use backticks:\n\n\n::: {.cell}\n\n:::\n\n\nand we find that `%in%` is a wrapper for the `match()` function.\n\n</details>\n\n\n## Types of Vectors\n\n \n\nTwo main types:\n\n- **Atomic**: Elements all the same type.\n- **List**: Elements are different Types.\n\nClosely related but not technically a vector:\n\n- **NULL**: Null elements. Often length zero.\n\n## Atomic Vectors\n\n### Types of atomic vectors\n\n \n\n- **Logical**: True/False\n- **Integer**: Numeric (discrete, no decimals)\n- **Double**: Numeric (continuous, decimals)\n- **Character**: String\n\n### Vectors of Length One\n\n**Scalars** are vectors that consist of a single value.\n\n#### Logicals\n\n\n::: {.cell}\n\n:::\n\n\n#### Doubles\n\n\n::: {.cell}\n\n:::\n\n\n#### Integers\n\nIntegers must be followed by L and cannot have fractional values\n\n\n::: {.cell}\n\n:::\n\n\n<details>\n<summary>Pop Quiz: Why \"L\" for integers?</summary>\nWickham notes that the use of `L` dates back to the **C** programming language and its \"long int\" type for memory allocation.\n</details>\n\n#### Strings\n\nStrings can use single or double quotes and special characters are escaped with \\\n\n\n::: {.cell}\n\n:::\n\n\n### Longer\n\nThere are several ways to make longer vectors:\n\n**1. With single values** inside c() for combine.\n\n\n::: {.cell}\n\n:::\n\n\n \n\n**2. With other vectors**\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3 4\n```\n\n\n:::\n:::\n\n\n<details>\n<summary>Side Quest: rlang</summary>\n\n`{rlang}` has [vector constructor functions too](https://rlang.r-lib.org/reference/vector-construction.html):\n\n- `rlang::lgl(...)`\n- `rlang::int(...)`\n- `rlang::dbl(...)`\n- `rlang::chr(...)`\n\nThey look to do both more and less than `c()`.\n\n- More:\n - Enforce type\n - Splice lists\n - More types: `rlang::bytes()`, `rlang::cpl(...)`\n- Less:\n - Stricter rules on names\n\nNote: currently has `questioning` lifecycle badge, since these constructors may get moved to `vctrs`\n\n</details>\n\n### Type and Length\n\nWe can determine the type of a vector with `typeof()` and its length with `length()`\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n```{=html}\n<div id=\"gvfpwanldj\" style=\"padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;\">\n<style>#gvfpwanldj table {\n font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';\n -webkit-font-smoothing: antialiased;\n -moz-osx-font-smoothing: grayscale;\n}\n\n#gvfpwanldj thead, #gvfpwanldj tbody, #gvfpwanldj tfoot, #gvfpwanldj tr, #gvfpwanldj td, #gvfpwanldj th {\n border-style: none;\n}\n\n#gvfpwanldj p {\n margin: 0;\n padding: 0;\n}\n\n#gvfpwanldj .gt_table {\n display: table;\n border-collapse: collapse;\n line-height: normal;\n margin-left: auto;\n margin-right: auto;\n color: #333333;\n font-size: 16px;\n font-weight: normal;\n font-style: normal;\n background-color: #FFFFFF;\n width: auto;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #A8A8A8;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #A8A8A8;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n}\n\n#gvfpwanldj .gt_caption {\n padding-top: 4px;\n padding-bottom: 4px;\n}\n\n#gvfpwanldj .gt_title {\n color: #333333;\n font-size: 125%;\n font-weight: initial;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-color: #FFFFFF;\n border-bottom-width: 0;\n}\n\n#gvfpwanldj .gt_subtitle {\n color: #333333;\n font-size: 85%;\n font-weight: initial;\n padding-top: 3px;\n padding-bottom: 5px;\n padding-left: 5px;\n padding-right: 5px;\n border-top-color: #FFFFFF;\n border-top-width: 0;\n}\n\n#gvfpwanldj .gt_heading {\n background-color: #FFFFFF;\n text-align: center;\n border-bottom-color: #FFFFFF;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n}\n\n#gvfpwanldj .gt_bottom_border {\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#gvfpwanldj .gt_col_headings {\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n}\n\n#gvfpwanldj .gt_col_heading {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: normal;\n text-transform: inherit;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: bottom;\n padding-top: 5px;\n padding-bottom: 6px;\n padding-left: 5px;\n padding-right: 5px;\n overflow-x: hidden;\n}\n\n#gvfpwanldj .gt_column_spanner_outer {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: normal;\n text-transform: inherit;\n padding-top: 0;\n padding-bottom: 0;\n padding-left: 4px;\n padding-right: 4px;\n}\n\n#gvfpwanldj .gt_column_spanner_outer:first-child {\n padding-left: 0;\n}\n\n#gvfpwanldj .gt_column_spanner_outer:last-child {\n padding-right: 0;\n}\n\n#gvfpwanldj .gt_column_spanner {\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n vertical-align: bottom;\n padding-top: 5px;\n padding-bottom: 5px;\n overflow-x: hidden;\n display: inline-block;\n width: 100%;\n}\n\n#gvfpwanldj .gt_spanner_row {\n border-bottom-style: hidden;\n}\n\n#gvfpwanldj .gt_group_heading {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n text-transform: inherit;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: middle;\n text-align: left;\n}\n\n#gvfpwanldj .gt_empty_group_heading {\n padding: 0.5px;\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n vertical-align: middle;\n}\n\n#gvfpwanldj .gt_from_md > :first-child {\n margin-top: 0;\n}\n\n#gvfpwanldj .gt_from_md > :last-child {\n margin-bottom: 0;\n}\n\n#gvfpwanldj .gt_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n margin: 10px;\n border-top-style: solid;\n border-top-width: 1px;\n border-top-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: middle;\n overflow-x: hidden;\n}\n\n#gvfpwanldj .gt_stub {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n text-transform: inherit;\n border-right-style: solid;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#gvfpwanldj .gt_stub_row_group {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n text-transform: inherit;\n border-right-style: solid;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n padding-left: 5px;\n padding-right: 5px;\n vertical-align: top;\n}\n\n#gvfpwanldj .gt_row_group_first td {\n border-top-width: 2px;\n}\n\n#gvfpwanldj .gt_row_group_first th {\n border-top-width: 2px;\n}\n\n#gvfpwanldj .gt_summary_row {\n color: #333333;\n background-color: #FFFFFF;\n text-transform: inherit;\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#gvfpwanldj .gt_first_summary_row {\n border-top-style: solid;\n border-top-color: #D3D3D3;\n}\n\n#gvfpwanldj .gt_first_summary_row.thick {\n border-top-width: 2px;\n}\n\n#gvfpwanldj .gt_last_summary_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#gvfpwanldj .gt_grand_summary_row {\n color: #333333;\n background-color: #FFFFFF;\n text-transform: inherit;\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#gvfpwanldj .gt_first_grand_summary_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-top-style: double;\n border-top-width: 6px;\n border-top-color: #D3D3D3;\n}\n\n#gvfpwanldj .gt_last_grand_summary_row_top {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-style: double;\n border-bottom-width: 6px;\n border-bottom-color: #D3D3D3;\n}\n\n#gvfpwanldj .gt_striped {\n background-color: rgba(128, 128, 128, 0.05);\n}\n\n#gvfpwanldj .gt_table_body {\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#gvfpwanldj .gt_footnotes {\n color: #333333;\n background-color: #FFFFFF;\n border-bottom-style: none;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n}\n\n#gvfpwanldj .gt_footnote {\n margin: 0px;\n font-size: 90%;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#gvfpwanldj .gt_sourcenotes {\n color: #333333;\n background-color: #FFFFFF;\n border-bottom-style: none;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n}\n\n#gvfpwanldj .gt_sourcenote {\n font-size: 90%;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#gvfpwanldj .gt_left {\n text-align: left;\n}\n\n#gvfpwanldj .gt_center {\n text-align: center;\n}\n\n#gvfpwanldj .gt_right {\n text-align: right;\n font-variant-numeric: tabular-nums;\n}\n\n#gvfpwanldj .gt_font_normal {\n font-weight: normal;\n}\n\n#gvfpwanldj .gt_font_bold {\n font-weight: bold;\n}\n\n#gvfpwanldj .gt_font_italic {\n font-style: italic;\n}\n\n#gvfpwanldj .gt_super {\n font-size: 65%;\n}\n\n#gvfpwanldj .gt_footnote_marks {\n font-size: 75%;\n vertical-align: 0.4em;\n position: initial;\n}\n\n#gvfpwanldj .gt_asterisk {\n font-size: 100%;\n vertical-align: 0;\n}\n\n#gvfpwanldj .gt_indent_1 {\n text-indent: 5px;\n}\n\n#gvfpwanldj .gt_indent_2 {\n text-indent: 10px;\n}\n\n#gvfpwanldj .gt_indent_3 {\n text-indent: 15px;\n}\n\n#gvfpwanldj .gt_indent_4 {\n text-indent: 20px;\n}\n\n#gvfpwanldj .gt_indent_5 {\n text-indent: 25px;\n}\n\n#gvfpwanldj .katex-display {\n display: inline-flex !important;\n margin-bottom: 0.75em !important;\n}\n\n#gvfpwanldj div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {\n height: 0px !important;\n}\n</style>\n<table class=\"gt_table\" data-quarto-disable-processing=\"false\" data-quarto-bootstrap=\"false\">\n <thead>\n <tr class=\"gt_heading\">\n <td colspan=\"4\" class=\"gt_heading gt_title gt_font_normal gt_bottom_border\" style>Types of Atomic Vectors<span class=\"gt_footnote_marks\" style=\"white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;\"><sup>1</sup></span></td>\n </tr>\n \n <tr class=\"gt_col_headings\">\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_names\">name</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_values\">value</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_type\">typeof()</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_length\">length()</th>\n </tr>\n </thead>\n <tbody class=\"gt_table_body\">\n <tr><td headers=\"var_names\" class=\"gt_row gt_center\">lgl_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">TRUE, FALSE</td>\n<td headers=\"var_type\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">logical</td>\n<td headers=\"var_length\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">2</td></tr>\n <tr><td headers=\"var_names\" class=\"gt_row gt_center\">int_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">1L, 6L, 10L</td>\n<td headers=\"var_type\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">integer</td>\n<td headers=\"var_length\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">3</td></tr>\n <tr><td headers=\"var_names\" class=\"gt_row gt_center\">dbl_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">1, 2.5, 4.5</td>\n<td headers=\"var_type\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">double</td>\n<td headers=\"var_length\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">3</td></tr>\n <tr><td headers=\"var_names\" class=\"gt_row gt_center\">chr_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">'these are', 'some strings'</td>\n<td headers=\"var_type\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">character</td>\n<td headers=\"var_length\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">2</td></tr>\n </tbody>\n \n <tfoot class=\"gt_footnotes\">\n <tr>\n <td class=\"gt_footnote\" colspan=\"4\"><span class=\"gt_footnote_marks\" style=\"white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;\"><sup>1</sup></span> Source: https://adv-r.hadley.nz/index.html</td>\n </tr>\n </tfoot>\n</table>\n</div>\n```\n\n:::\n:::\n\n\n<details>\n<summary>Side Quest: Penguins</summary>\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n:::\n\n\n</details>\n\n### Missing values\n\n#### Contagion\n\nFor most computations, an operation over values that includes a missing value yields a missing value (unless you're careful)\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] NA\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] NA\n```\n\n\n:::\n:::\n\n\n#### Exceptions\n\n\n::: {.cell}\n\n:::\n\n\n\n#### Innoculation\n\n\n::: {.cell}\n\n:::\n\n\nTo search for missing values use `is.na()`\n\n\n::: {.cell}\n\n:::\n\n\n\n::: {.cell}\n\n:::\n\n\n<details>\n<summary>Side Quest: NA Types</summary>\n\nEach type has its own NA type\n\n- Logical: `NA`\n- Integer: `NA_integer`\n- Double: `NA_double`\n- Character: `NA_character`\n\nThis may not matter in many contexts.\n\nBut this does matter for operations where types matter like `dplyr::if_else()`.\n</details>\n\n\n### Testing\n\n**What type of vector `is.*`() it?**\n\nTest data type:\n\n- Logical: `is.logical()`\n- Integer: `is.integer()`\n- Double: `is.double()`\n- Character: `is.character()`\n\n**What type of object is it?**\n\nDon't test objects with these tools:\n\n- `is.vector()`\n- `is.atomic()`\n- `is.numeric()` \n\nThey don’t test if you have a vector, atomic vector, or numeric vector; you’ll need to carefully read the documentation to figure out what they actually do (preview: *attributes*)\n\n<details>\n<summary>Side Quest: rlang</summary>\n\nInstead, maybe, use `{rlang}`\n\n- `rlang::is_vector`\n- `rlang::is_atomic`\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n:::\n\n\nSee more [here](https://rlang.r-lib.org/reference/type-predicates.html)\n</details>\n\n\n### Coercion\n\n* R follows rules for coercion: character → double → integer → logical\n\n* R can coerce either automatically or explicitly\n\n#### **Automatic**\n\nTwo contexts for automatic coercion:\n\n1. Combination\n2. Mathematical\n\n##### Coercion by Combination:\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> chr [1:2] \"TRUE\" \"TRUE\"\n```\n\n\n:::\n:::\n\n\n##### Coercion by Mathematical operations:\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3\n```\n\n\n:::\n:::\n\n\n#### **Explicit**\n\n<!--\n\nUse `as.*()`\n\n- Logical: `as.logical()`\n- Integer: `as.integer()`\n- Double: `as.double()`\n- Character: `as.character()`\n\n-->\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n```{=html}\n<div id=\"abudsgypeu\" style=\"padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;\">\n<style>#abudsgypeu table {\n font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';\n -webkit-font-smoothing: antialiased;\n -moz-osx-font-smoothing: grayscale;\n}\n\n#abudsgypeu thead, #abudsgypeu tbody, #abudsgypeu tfoot, #abudsgypeu tr, #abudsgypeu td, #abudsgypeu th {\n border-style: none;\n}\n\n#abudsgypeu p {\n margin: 0;\n padding: 0;\n}\n\n#abudsgypeu .gt_table {\n display: table;\n border-collapse: collapse;\n line-height: normal;\n margin-left: auto;\n margin-right: auto;\n color: #333333;\n font-size: 16px;\n font-weight: normal;\n font-style: normal;\n background-color: #FFFFFF;\n width: auto;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #A8A8A8;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #A8A8A8;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n}\n\n#abudsgypeu .gt_caption {\n padding-top: 4px;\n padding-bottom: 4px;\n}\n\n#abudsgypeu .gt_title {\n color: #333333;\n font-size: 125%;\n font-weight: initial;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-color: #FFFFFF;\n border-bottom-width: 0;\n}\n\n#abudsgypeu .gt_subtitle {\n color: #333333;\n font-size: 85%;\n font-weight: initial;\n padding-top: 3px;\n padding-bottom: 5px;\n padding-left: 5px;\n padding-right: 5px;\n border-top-color: #FFFFFF;\n border-top-width: 0;\n}\n\n#abudsgypeu .gt_heading {\n background-color: #FFFFFF;\n text-align: center;\n border-bottom-color: #FFFFFF;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n}\n\n#abudsgypeu .gt_bottom_border {\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#abudsgypeu .gt_col_headings {\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n}\n\n#abudsgypeu .gt_col_heading {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: normal;\n text-transform: inherit;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: bottom;\n padding-top: 5px;\n padding-bottom: 6px;\n padding-left: 5px;\n padding-right: 5px;\n overflow-x: hidden;\n}\n\n#abudsgypeu .gt_column_spanner_outer {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: normal;\n text-transform: inherit;\n padding-top: 0;\n padding-bottom: 0;\n padding-left: 4px;\n padding-right: 4px;\n}\n\n#abudsgypeu .gt_column_spanner_outer:first-child {\n padding-left: 0;\n}\n\n#abudsgypeu .gt_column_spanner_outer:last-child {\n padding-right: 0;\n}\n\n#abudsgypeu .gt_column_spanner {\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n vertical-align: bottom;\n padding-top: 5px;\n padding-bottom: 5px;\n overflow-x: hidden;\n display: inline-block;\n width: 100%;\n}\n\n#abudsgypeu .gt_spanner_row {\n border-bottom-style: hidden;\n}\n\n#abudsgypeu .gt_group_heading {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n text-transform: inherit;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: middle;\n text-align: left;\n}\n\n#abudsgypeu .gt_empty_group_heading {\n padding: 0.5px;\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n vertical-align: middle;\n}\n\n#abudsgypeu .gt_from_md > :first-child {\n margin-top: 0;\n}\n\n#abudsgypeu .gt_from_md > :last-child {\n margin-bottom: 0;\n}\n\n#abudsgypeu .gt_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n margin: 10px;\n border-top-style: solid;\n border-top-width: 1px;\n border-top-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: middle;\n overflow-x: hidden;\n}\n\n#abudsgypeu .gt_stub {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n text-transform: inherit;\n border-right-style: solid;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#abudsgypeu .gt_stub_row_group {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n text-transform: inherit;\n border-right-style: solid;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n padding-left: 5px;\n padding-right: 5px;\n vertical-align: top;\n}\n\n#abudsgypeu .gt_row_group_first td {\n border-top-width: 2px;\n}\n\n#abudsgypeu .gt_row_group_first th {\n border-top-width: 2px;\n}\n\n#abudsgypeu .gt_summary_row {\n color: #333333;\n background-color: #FFFFFF;\n text-transform: inherit;\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#abudsgypeu .gt_first_summary_row {\n border-top-style: solid;\n border-top-color: #D3D3D3;\n}\n\n#abudsgypeu .gt_first_summary_row.thick {\n border-top-width: 2px;\n}\n\n#abudsgypeu .gt_last_summary_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#abudsgypeu .gt_grand_summary_row {\n color: #333333;\n background-color: #FFFFFF;\n text-transform: inherit;\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#abudsgypeu .gt_first_grand_summary_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-top-style: double;\n border-top-width: 6px;\n border-top-color: #D3D3D3;\n}\n\n#abudsgypeu .gt_last_grand_summary_row_top {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-style: double;\n border-bottom-width: 6px;\n border-bottom-color: #D3D3D3;\n}\n\n#abudsgypeu .gt_striped {\n background-color: rgba(128, 128, 128, 0.05);\n}\n\n#abudsgypeu .gt_table_body {\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#abudsgypeu .gt_footnotes {\n color: #333333;\n background-color: #FFFFFF;\n border-bottom-style: none;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n}\n\n#abudsgypeu .gt_footnote {\n margin: 0px;\n font-size: 90%;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#abudsgypeu .gt_sourcenotes {\n color: #333333;\n background-color: #FFFFFF;\n border-bottom-style: none;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n}\n\n#abudsgypeu .gt_sourcenote {\n font-size: 90%;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#abudsgypeu .gt_left {\n text-align: left;\n}\n\n#abudsgypeu .gt_center {\n text-align: center;\n}\n\n#abudsgypeu .gt_right {\n text-align: right;\n font-variant-numeric: tabular-nums;\n}\n\n#abudsgypeu .gt_font_normal {\n font-weight: normal;\n}\n\n#abudsgypeu .gt_font_bold {\n font-weight: bold;\n}\n\n#abudsgypeu .gt_font_italic {\n font-style: italic;\n}\n\n#abudsgypeu .gt_super {\n font-size: 65%;\n}\n\n#abudsgypeu .gt_footnote_marks {\n font-size: 75%;\n vertical-align: 0.4em;\n position: initial;\n}\n\n#abudsgypeu .gt_asterisk {\n font-size: 100%;\n vertical-align: 0;\n}\n\n#abudsgypeu .gt_indent_1 {\n text-indent: 5px;\n}\n\n#abudsgypeu .gt_indent_2 {\n text-indent: 10px;\n}\n\n#abudsgypeu .gt_indent_3 {\n text-indent: 15px;\n}\n\n#abudsgypeu .gt_indent_4 {\n text-indent: 20px;\n}\n\n#abudsgypeu .gt_indent_5 {\n text-indent: 25px;\n}\n\n#abudsgypeu .katex-display {\n display: inline-flex !important;\n margin-bottom: 0.75em !important;\n}\n\n#abudsgypeu div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {\n height: 0px !important;\n}\n</style>\n<table class=\"gt_table\" data-quarto-disable-processing=\"false\" data-quarto-bootstrap=\"false\">\n <thead>\n <tr class=\"gt_heading\">\n <td colspan=\"6\" class=\"gt_heading gt_title gt_font_normal gt_bottom_border\" style>Coercion of Atomic Vectors<span class=\"gt_footnote_marks\" style=\"white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;\"><sup>1</sup></span></td>\n </tr>\n \n <tr class=\"gt_col_headings\">\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_names\">name</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_values\">value</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"as_logical\">as.logical()</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"as_integer\">as.integer()</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"as_double\">as.double()</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"as_character\">as.character()</th>\n </tr>\n </thead>\n <tbody class=\"gt_table_body\">\n <tr><td headers=\"var_names\" class=\"gt_row gt_center\">lgl_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">TRUE, FALSE</td>\n<td headers=\"as_logical\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">TRUE FALSE</td>\n<td headers=\"as_integer\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">1 0</td>\n<td headers=\"as_double\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">1 0</td>\n<td headers=\"as_character\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">'TRUE' 'FALSE'</td></tr>\n <tr><td headers=\"var_names\" class=\"gt_row gt_center\">int_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">1L, 6L, 10L</td>\n<td headers=\"as_logical\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">TRUE TRUE TRUE</td>\n<td headers=\"as_integer\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">1 6 10</td>\n<td headers=\"as_double\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">1 6 10</td>\n<td headers=\"as_character\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">'1' '6' '10'</td></tr>\n <tr><td headers=\"var_names\" class=\"gt_row gt_center\">dbl_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">1, 2.5, 4.5</td>\n<td headers=\"as_logical\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">TRUE TRUE TRUE</td>\n<td headers=\"as_integer\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">1 2 4</td>\n<td headers=\"as_double\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">1.0 2.5 4.5</td>\n<td headers=\"as_character\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">'1' '2.5' '4.5'</td></tr>\n <tr><td headers=\"var_names\" class=\"gt_row gt_center\">chr_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">'these are', 'some strings'</td>\n<td headers=\"as_logical\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">NA NA</td>\n<td headers=\"as_integer\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">NA_integer</td>\n<td headers=\"as_double\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">NA_double</td>\n<td headers=\"as_character\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">'these are', 'some strings'</td></tr>\n </tbody>\n \n <tfoot class=\"gt_footnotes\">\n <tr>\n <td class=\"gt_footnote\" colspan=\"6\"><span class=\"gt_footnote_marks\" style=\"white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;\"><sup>1</sup></span> Source: https://adv-r.hadley.nz/index.html</td>\n </tr>\n </tfoot>\n</table>\n</div>\n```\n\n:::\n:::\n\n\nBut note that coercion may fail in one of two ways, or both:\n\n- With warning/error\n- NAs\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 NA\n```\n\n\n:::\n:::\n\n\n### Exercises\n\n1. How do you create raw and complex scalars?\n\n<details><summary>Answer(s)</summary>\n\n::: {.cell}\n\n:::\n\n</details>\n\n2. Test your knowledge of the vector coercion rules by predicting the output of the following uses of c():\n\n\n::: {.cell}\n\n:::\n\n\n<details><summary>Answer(s)</summary>\n\n::: {.cell}\n\n:::\n\n</details>\n\n3. Why is `1 == \"1\"` true? Why is `-1 < FALSE` true? Why is `\"one\" < 2` false?\n\n<details><summary>Answer(s)</summary>\nThese comparisons are carried out by operator-functions (==, <), which coerce their arguments to a common type. In the examples above, these types will be character, double and character: 1 will be coerced to \"1\", FALSE is represented as 0 and 2 turns into \"2\" (and numbers precede letters in lexicographic order (may depend on locale)).\n\n</details>\n\n4. Why is the default missing value, NA, a logical vector? What’s special about logical vectors?\n\n<details><summary>Answer(s)</summary>\nThe presence of missing values shouldn’t affect the type of an object. Recall that there is a type-hierarchy for coercion from character → double → integer → logical. When combining `NA`s with other atomic types, the `NA`s will be coerced to integer (`NA_integer_`), double (`NA_real_`) or character (`NA_character_`) and not the other way round. If `NA` were a character and added to a set of other values all of these would be coerced to character as well.\n</details>\n\n5. Precisely what do `is.atomic()`, `is.numeric()`, and `is.vector()` test for?\n\n<details><summary>Answer(s)</summary>\nThe documentation states that:\n\n* `is.atomic()` tests if an object is an atomic vector (as defined in *Advanced R*) or is `NULL` (!).\n* `is.numeric()` tests if an object has type integer or double and is not of class `factor`, `Date`, `POSIXt` or `difftime`.\n* `is.vector()` tests if an object is a vector (as defined in *Advanced R*) or an expression and has no attributes, apart from names.\n\nAtomic vectors are defined in *Advanced R* as objects of type logical, integer, double, complex, character or raw. Vectors are defined as atomic vectors or lists.\n</details>\n\n\n\n## Attributes\n\nAttributes are name-value pairs that attach metadata to an object(vector).\n\n* **Name-value pairs**: attributes have a name and a value\n* **Metadata**: not data itself, but data about the data\n\n### How? \n\n#### Getting and Setting\n\nThree functions:\n\n1. retrieve and modify single attributes with `attr()`\n2. retrieve en masse with `attributes()`\n3. set en masse with `structure()`\n\n**Single attribute**\n\nUse `attr()`\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"some attribute\"\n```\n\n\n:::\n:::\n\n\n**Multiple attributes**\n\nTo set multiple attributes, use `structure()` To get multiple attributes, use `attributes()`\n\n\n::: {.cell}\n\n:::\n\n\n\n\n::: {.cell}\n\n:::\n\n\n \n\n### Why\n\nThree particularly important attributes: \n\n1. **names** - a character vector giving each element a name\n2. **dimension** - (or dim) turns vectors into matrices and arrays \n3. **class** - powers the S3 object system (we'll learn more about this in chapter 13)\n\nMost attributes are lost by most operations. Only two attributes are routinely preserved: names and dimension.\n\n#### Names\n\n~~Three~~ Four ways to name:\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> A B C \n#> 1 2 3\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c \n#> 1 2 3\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c \n#> 1 2 3\n```\n\n\n:::\n:::\n\n\n \n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c \n#> 1 2 3\n```\n\n\n:::\n:::\n\n\n \n\n* You can remove names from a vector by using `x <- unname(x)` or `names(x) <- NULL`.\n* Thematically but not directly related: labelled class vectors with `haven::labelled()`\n\n\n#### Dimensions\n\nCreate matrices and arrays with `matrix()` and `array()`, or by using the assignment form of `dim()`:\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2] [,3]\n#> [1,] 1 3 5\n#> [2,] 2 4 6\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> , , 1\n#> \n#> [,1] [,2] [,3]\n#> [1,] 1 3 5\n#> [2,] 2 4 6\n#> \n#> , , 2\n#> \n#> [,1] [,2] [,3]\n#> [1,] 7 9 11\n#> [2,] 8 10 12\n#> \n#> , , 3\n#> \n#> [,1] [,2] [,3]\n#> [1,] 13 15 17\n#> [2,] 14 16 18\n#> \n#> , , 4\n#> \n#> [,1] [,2] [,3]\n#> [1,] 19 21 23\n#> [2,] 20 22 24\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2] [,3]\n#> [1,] 1 3 5\n#> [2,] 2 4 6\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> , , 1\n#> \n#> [,1] [,2] [,3]\n#> [1,] 1 3 5\n#> [2,] 2 4 6\n#> \n#> , , 2\n#> \n#> [,1] [,2] [,3]\n#> [1,] 7 9 11\n#> [2,] 8 10 12\n#> \n#> , , 3\n#> \n#> [,1] [,2] [,3]\n#> [1,] 13 15 17\n#> [2,] 14 16 18\n#> \n#> , , 4\n#> \n#> [,1] [,2] [,3]\n#> [1,] 19 21 23\n#> [2,] 20 22 24\n```\n\n\n:::\n:::\n\n\n##### Functions for working with vectors, matrices and arrays:\n\nVector | Matrix\t| Array\n:----- | :---------- | :-----\n`names()` | `rownames()`, `colnames()` | `dimnames()`\n`length()` | `nrow()`, `ncol()` | `dim()`\n`c()` | `rbind()`, `cbind()` | `abind::abind()`\n— | `t()` | `aperm()`\n`is.null(dim(x))` | `is.matrix()` | `is.array()`\n\n* **Caution**: A vector without a `dim` attribute set is often thought of as 1-dimensional, but actually has `NULL` dimensions.\n* One dimension?\n\n\n::: {.cell}\n\n:::\n\n\n\n### Exercises\n\n1. How is `setNames()` implemented? How is `unname()` implemented? Read the source code.\n\n<details><summary>Answer(s)</summary>\n`setNames()` is implemented as:\n\n\n::: {.cell}\n\n:::\n\n\nBecause the data argument comes first, `setNames()` also works well with the magrittr-pipe operator. When no first argument is given, the result is a named vector (this is rather untypical as required arguments usually come first):\n\n\n::: {.cell}\n\n:::\n\n\n`unname()` is implemented in the following way:\n\n\n::: {.cell}\n\n:::\n\n\n`unname()` removes existing names (or dimnames) by setting them to `NULL`.\n</details>\n\n2. What does `dim()` return when applied to a 1-dimensional vector? When might you use `NROW()` or `NCOL()`?\n\n<details><summary>Answer(s)</summary>\n\n> dim() will return NULL when applied to a 1d vector.\n\nOne may want to use `NROW()` or `NCOL()` to handle atomic vectors, lists and NULL values in the same way as one column matrices or data frames. For these objects `nrow()` and `ncol()` return NULL:\n\n\n::: {.cell}\n\n:::\n\n\n</details>\n\n3. How would you describe the following three objects? What makes them different from `1:5`?\n\n\n::: {.cell}\n\n:::\n\n\n<details><summary>Answer(s)</summary>\n\n::: {.cell}\n\n:::\n\n</details>\n\n\n4. An early draft used this code to illustrate `structure()`:\n\n\n::: {.cell}\n\n:::\n\n\nBut when you print that object you don’t see the comment attribute. Why? Is the attribute missing, or is there something else special about it?\n\n<details><summary>Answer(s)</summary>\nThe documentation states (see `?comment`):\n\n> Contrary to other attributes, the comment is not printed (by print or print.default).\n\nAlso, from `?attributes:`\n\n> Note that some attributes (namely class, comment, dim, dimnames, names, row.names and tsp) are treated specially and have restrictions on the values which can be set.\n\nWe can retrieve comment attributes by calling them explicitly:\n\n\n::: {.cell}\n\n:::\n\n\n</details>\n\n\n\n## **Class** - S3 atomic vectors\n\n \n\nCredit: [Advanced R](https://adv-r.hadley.nz/index.html) by Hadley Wickham\n\n**Having a class attribute turns an object into an S3 object.**\n\nWhat makes S3 atomic vectors different?\n\n1. behave differently from a regular vector when passed to a generic function \n2. often store additional information in other attributes\n\nFour important S3 vectors used in base R:\n\n1. **Factors** (categorical data)\n2. **Dates**\n3. **Date-times** (POSIXct)\n4. **Durations** (difftime)\n\n### Factors\n\nA factor is a vector used to store categorical data that can contain only predefined values.\n\nFactors are integer vectors with:\n\n- Class: \"factor\"\n- Attributes: \"levels\", or the set of allowed values\n\n\n::: {.cell}\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> colors\n#> blue green red \n#> 1 2 3\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a_factor\n#> red blue green yellow \n#> 3 1 2 0\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $levels\n#> [1] \"red\" \"blue\" \"green\" \"yellow\"\n#> \n#> $class\n#> [1] \"factor\"\n```\n\n\n:::\n:::\n\n\n#### Custom Order\n\nFactors can be ordered. This can be useful for models or visualizations where order matters.\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] high med low med high low med high\n#> Levels: low < med < high\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> values\n#> high low med \n#> 3 2 3\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> ordered_factor\n#> low med high \n#> 2 3 3\n```\n\n\n:::\n:::\n\n\n### Dates\n\nDates are:\n\n- Double vectors\n- With class \"Date\"\n- No other attributes\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $class\n#> [1] \"Date\"\n```\n\n\n:::\n:::\n\n\nThe double component represents the number of days since since the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time) `1970-01-01`\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 31\n```\n\n\n:::\n:::\n\n\n### Date-times\n\nThere are 2 Date-time representations in base R:\n\n- POSIXct, where \"ct\" denotes *calendar time*\n- POSIXlt, where \"lt\" designates *local time*\n\n<!--\n\nJust for fun:\n\"How to pronounce 'POSIXct'?\"\nhttps://www.howtopronounce.com/posixct\n\n-->\n\nWe'll focus on POSIXct because:\n\n- Simplest\n- Built on an atomic (double) vector\n- Most appropriate for use in a data frame\n\nLet's now build and deconstruct a Date-time\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"2025-08-04 15:39:49 EDT\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $class\n#> [1] \"POSIXct\" \"POSIXt\" \n#> \n#> $tzone\n#> [1] \"America/New_York\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"2025-08-04 21:39:49 CEST\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1708623296\n#> attr(,\"tzone\")\n#> [1] \"EST\"\n```\n\n\n:::\n:::\n\n\n\n### Durations\n\nDurations represent the amount of time between pairs of dates or date-times.\n\n- Double vectors\n- Class: \"difftime\"\n- Attributes: \"units\", or the unit of duration (e.g., weeks, hours, minutes, seconds, etc.)\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Time difference of 1 mins\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $class\n#> [1] \"difftime\"\n#> \n#> $units\n#> [1] \"mins\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Time difference of 20273 days\n```\n\n\n:::\n:::\n\n\n\nSee also:\n\n- [`lubridate::make_difftime()`](https://lubridate.tidyverse.org/reference/make_difftime.html)\n- [`clock::date_time_build()`](https://clock.r-lib.org/reference/date_time_build.html)\n\n\n### Exercises\n\n1. What sort of object does `table()` return? What is its type? What attributes does it have? How does the dimensionality change as you tabulate more variables?\n\n<details><summary>Answer(s)</summary>\n\n`table()` returns a contingency table of its input variables. It is implemented as an integer vector with class table and dimensions (which makes it act like an array). Its attributes are dim (dimensions) and dimnames (one name for each input column). The dimensions correspond to the number of unique values (factor levels) in each input variable.\n\n\n::: {.cell}\n\n:::\n\n</details>\n\n2. What happens to a factor when you modify its levels?\n\n\n::: {.cell}\n\n:::\n\n\n<details><summary>Answer(s)</summary>\nThe underlying integer values stay the same, but the levels are changed, making it look like the data has changed.\n\n\n::: {.cell}\n\n:::\n\n</details>\n\n3. What does this code do? How do `f2` and `f3` differ from `f1`?\n\n\n::: {.cell}\n\n:::\n\n\n<details><summary>Answer(s)</summary>\nFor `f2` and `f3` either the order of the factor elements or its levels are being reversed. For `f1` both transformations are occurring.\n\n\n::: {.cell}\n\n:::\n\n</details>\n\n\n\n\n\n\n## Lists\n\n* sometimes called a generic vector or recursive vector\n* Recall ([section 2.3.3](https://adv-r.hadley.nz/names-values.html#list-references)): each element is really a *reference* to another object\n* an be composed of elements of different types (as opposed to atomic vectors which must be of only one type)\n\n### Constructing\n\nSimple lists:\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] TRUE FALSE\n#> \n#> [[2]]\n#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20\n#> \n#> [[3]]\n#> [1] 1.2 2.3 3.4\n#> \n#> [[4]]\n#> [1] \"primo\" \"secundo\" \"tercio\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"list\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 4\n#> $ : logi [1:2] TRUE FALSE\n#> $ : int [1:20] 1 2 3 4 5 6 7 8 9 10 ...\n#> $ : num [1:3] 1.2 2.3 3.4\n#> $ : chr [1:3] \"primo\" \"secundo\" \"tercio\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] TRUE FALSE\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 1.2 2.3 3.4\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] \"primo\" \"secundo\" \"tercio\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 8\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.3\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"tercio\"\n```\n\n\n:::\n:::\n\n\nEven Simpler List\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] TRUE\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 3\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 2.3\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] \"primo\"\n```\n\n\n:::\n:::\n\n\nNested lists:\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 1\n#> $ :List of 1\n#> ..$ :List of 1\n#> .. ..$ :List of 1\n#> .. .. ..$ : num 1\n```\n\n\n:::\n:::\n\n\nLike JSON.\n\nCombined lists\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 2\n#> $ :List of 2\n#> ..$ : num 1\n#> ..$ : num 2\n#> $ :List of 2\n#> ..$ : num 3\n#> ..$ : num 4\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 4\n#> $ : num 1\n#> $ : num 2\n#> $ : num 3\n#> $ : num 4\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 4\n#> $ : num 1\n#> $ : num 2\n#> $ : logi TRUE\n#> $ : logi FALSE\n```\n\n\n:::\n:::\n\n\n### Testing\n\nCheck that is a list:\n\n- `is.list()`\n- \\`rlang::is_list()\\`\\`\n\nThe two do the same, except that the latter can check for the number of elements\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n### Coercion\n\nUse `as.list()`\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 1 2 3\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 1\n#> \n#> [[2]]\n#> [1] 2\n#> \n#> [[3]]\n#> [1] 3\n```\n\n\n:::\n:::\n\n\n### Matrices and arrays\n\nAlthough not often used, the dimension attribute can be added to create **list-matrices** or **list-arrays**.\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2]\n#> [1,] integer,3 TRUE\n#> [2,] \"a\" 1\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n:::\n\n\n\n### Exercises\n\n1. List all the ways that a list differs from an atomic vector.\n\n<details><summary>Answer(s)</summary>\n\n* Atomic vectors are always homogeneous (all elements must be of the same type). Lists may be heterogeneous (the elements can be of different types) as described in the introduction of the vectors chapter.\n* Atomic vectors point to one address in memory, while lists contain a separate reference for each element. (This was described in the list sections of the vectors and the names and values chapters.)\n\n\n::: {.cell}\n\n:::\n\n\n\n* Subsetting with out-of-bounds and NA values leads to different output. For example, [ returns NA for atomics and NULL for lists. (This is described in more detail within the subsetting chapter.)\n\n\n::: {.cell}\n\n:::\n\n\n\n</details>\n\n2. Why do you need to use `unlist()` to convert a list to an atomic vector? Why doesn’t `as.vector()` work?\n\n<details><summary>Answer(s)</summary>\nA list is already a vector, though not an atomic one! Note that as.vector() and is.vector() use different definitions of “vector!”\n\n\n::: {.cell}\n\n:::\n\n\n</details>\n\n3. Compare and contrast `c()` and `unlist()` when combining a date and date-time into a single vector.\n\n<details><summary>Answer(s)</summary>\nDate and date-time objects are both built upon doubles. While dates store the number of days since the reference date 1970-01-01 (also known as “the Epoch”) in days, date-time-objects (POSIXct) store the time difference to this date in seconds.\n\n\n::: {.cell}\n\n:::\n\n\nAs the c() generic only dispatches on its first argument, combining date and date-time objects via c() could lead to surprising results in older R versions (pre R 4.0.0):\n\n\n::: {.cell}\n\n:::\n\n\nIn the first statement above c.Date() is executed, which incorrectly treats the underlying double of dttm_ct (3600) as days instead of seconds. Conversely, when c.POSIXct() is called on a date, one day is counted as one second only.\n\nWe can highlight these mechanics by the following code:\n\n\n::: {.cell}\n\n:::\n\n\nAs of R 4.0.0 these issues have been resolved and both methods now convert their input first into POSIXct and Date, respectively.\n\n\n::: {.cell}\n\n:::\n\n\nHowever, as c() strips the time zone (and other attributes) of POSIXct objects, some caution is still recommended.\n\n\n::: {.cell}\n\n:::\n\n\nA package that deals with these kinds of problems in more depth and provides a structural solution for them is the {vctrs} package9 which is also used throughout the tidyverse.10\n\nLet’s look at unlist(), which operates on list input.\n\n\n::: {.cell}\n\n:::\n\n\nWe see again that dates and date-times are internally stored as doubles. Unfortunately, this is all we are left with, when unlist strips the attributes of the list.\n\nTo summarise: c() coerces types and strips time zones. Errors may have occurred in older R versions because of inappropriate method dispatch/immature methods. unlist() strips attributes.\n</details>\n\n\n\n## Data frames and tibbles\n\n \n\nCredit: [Advanced R](https://adv-r.hadley.nz/index.html) by Hadley Wickham\n\n### Data frame\n\nA data frame is a:\n\n- Named list of vectors (i.e., column names)\n- Attributes:\n - (column) `names`\n - `row.names`\n - Class: \"data frame\"\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> col1 col2\n#> 1 1 un\n#> 2 2 deux\n#> 3 3 trois\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"list\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $names\n#> [1] \"col1\" \"col2\"\n#> \n#> $class\n#> [1] \"data.frame\"\n#> \n#> $row.names\n#> [1] 1 2 3\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"1\" \"2\" \"3\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"col1\" \"col2\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"col1\" \"col2\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n:::\n\n\nUnlike other lists, the length of each vector must be the same (i.e. as many vector elements as rows in the data frame).\n\n### Tibble\n\nCreated to relieve some of the frustrations and pain points created by data frames, tibbles are data frames that are:\n\n- Lazy (do less)\n- Surly (complain more)\n\n#### Lazy\n\nTibbles do not:\n\n- Coerce strings\n- Transform non-syntactic names\n- Recycle vectors of length greater than 1\n\n**Coerce strings**\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Factor w/ 4 levels \"bro\",\"don't\",..: 2 3 4 1\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> chr [1:4] \"don't\" \"factor\" \"me\" \"bro\"\n```\n\n\n:::\n:::\n\n\n**Transform non-syntactic names**\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"X1\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"1\"\n```\n\n\n:::\n:::\n\n\n**Recycle vectors of length greater than 1**\n\n\n::: {.cell}\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `tibble::tibble()`:\n#> ! Tibble columns must have compatible sizes.\n#> • Size 4: Existing data.\n#> • Size 2: Column `col2`.\n#> ℹ Only values of size one are recycled.\n```\n\n\n:::\n:::\n\n\n#### Surly\n\nTibbles do only what they're asked and complain if what they're asked doesn't make sense:\n\n- Subsetting always yields a tibble\n- Complains if cannot find column\n\n**Subsetting always yields a tibble**\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> num [1:4] 1 2 3 4\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tibble [4 × 1] (S3: tbl_df/tbl/data.frame)\n#> $ col1: num [1:4] 1 2 3 4\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> num [1:4] 1 2 3 4\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> num [1:4] 1 2 3 4\n```\n\n\n:::\n:::\n\n\n**Complains if cannot find column**\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"col1\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3 4\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"col1\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: Unknown or uninitialised column: `col`.\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n#### One more difference\n\n**`tibble()` allows you to refer to variables created during construction**\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 2\n#> x y\n#> <int> <dbl>\n#> 1 1 2\n#> 2 2 4\n#> 3 3 6\n```\n\n\n:::\n:::\n\n\n<details>\n<summary>Side Quest: Row Names</summary>\n\n- character vector containing only unique values\n- get and set with `rownames()`\n- can use them to subset rows\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> age hair\n#> Bob 35 blond\n#> Susan 27 brown\n#> Sam 18 black\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Bob\" \"Susan\" \"Sam\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> age hair\n#> Bob 35 blond\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Susan\" \"Bob\" \"Sam\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> age hair\n#> Bob 27 brown\n```\n\n\n:::\n:::\n\n\nThere are three reasons why row names are undesirable:\n\n3. Metadata is data, so storing it in a different way to the rest of the data is fundamentally a bad idea. \n2. Row names are a poor abstraction for labelling rows because they only work when a row can be identified by a single string. This fails in many cases.\n3. Row names must be unique, so any duplication of rows (e.g. from bootstrapping) will create new row names.\n\n</details>\n\n\n### Printing\n\nData frames and tibbles print differently\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> age hair\n#> Susan 35 blond\n#> Bob 27 brown\n#> Sam 18 black\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 2\n#> age hair \n#> <dbl> <chr>\n#> 1 35 blond\n#> 2 27 brown\n#> 3 18 black\n```\n\n\n:::\n:::\n\n\n\n### Subsetting\n\nTwo undesirable subsetting behaviours:\n\n1. When you subset columns with `df[, vars]`, you will get a vector if vars selects one variable, otherwise you’ll get a data frame, unless you always remember to use `df[, vars, drop = FALSE]`.\n2. When you attempt to extract a single column with `df$x` and there is no column `x`, a data frame will instead select any variable that starts with `x`. If no variable starts with `x`, `df$x` will return NULL.\n\nTibbles tweak these behaviours so that a [ always returns a tibble, and a $ doesn’t do partial matching and warns if it can’t find a variable (*this is what makes tibbles surly*).\n\n### Testing\n\nWhether data frame: `is.data.frame()`. Note: both data frame and tibble are data frames.\n\nWhether tibble: `tibble::is_tibble`. Note: only tibbles are tibbles. Vanilla data frames are not.\n\n### Coercion\n\n- To data frame: `as.data.frame()`\n- To tibble: `tibble::as_tibble()`\n\n### List Columns\n\nList-columns are allowed in data frames but you have to do a little extra work by either adding the list-column after creation or wrapping the list in `I()`\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x y\n#> 1 1 1, 2\n#> 2 2 1, 2, 3\n#> 3 3 1, 2, 3, 4\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x y\n#> 1 1 1, 2\n#> 2 2 1, 2, 3\n#> 3 3 1, 2, 3, 4\n```\n\n\n:::\n:::\n\n\n### Matrix and data frame columns\n\n- As long as the number of rows matches the data frame, it’s also possible to have a matrix or data frame as a column of a data frame.\n- same as list-columns, must either addi the list-column after creation or wrapping the list in `I()`\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 'data.frame':\t3 obs. of 3 variables:\n#> $ x: num 10 20 30\n#> $ y: 'AsIs' int [1:3, 1:3] 1 2 3 4 5 6 7 8 9\n#> $ z:'data.frame':\t3 obs. of 2 variables:\n#> ..$ a: int 3 2 1\n#> ..$ b: chr \"a\" \"b\" \"c\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2] [,3]\n#> [1,] 1 4 7\n#> [2,] 2 5 8\n#> [3,] 3 6 9\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b\n#> 1 3 a\n#> 2 2 b\n#> 3 1 c\n```\n\n\n:::\n:::\n\n\n\n### Exercises\n\n1. Can you have a data frame with zero rows? What about zero columns?\n\n<details><summary>Answer(s)</summary>\nYes, you can create these data frames easily; either during creation or via subsetting. Even both dimensions can be zero. Create a 0-row, 0-column, or an empty data frame directly:\n\n\n::: {.cell}\n\n:::\n\n\nCreate similar data frames via subsetting the respective dimension with either 0, `NULL`, `FALSE` or a valid 0-length atomic (`logical(0)`, `character(0)`, `integer(0)`, `double(0)`). Negative integer sequences would also work. The following example uses a zero:\n\n\n::: {.cell}\n\n:::\n\n\n\n</details>\n\n2. What happens if you attempt to set rownames that are not unique?\n\n<details><summary>Answer(s)</summary>\nMatrices can have duplicated row names, so this does not cause problems.\n\nData frames, however, require unique rownames and you get different results depending on how you attempt to set them. If you set them directly or via `row.names()`, you get an error:\n\n\n::: {.cell}\n\n:::\n\n\nIf you use subsetting, `[` automatically deduplicates:\n\n\n::: {.cell}\n\n:::\n\n\n</details>\n\n3. If `df` is a data frame, what can you say about `t(df)`, and `t(t(df))`? Perform some experiments, making sure to try different column types.\n\n<details><summary>Answer(s)</summary>\nBoth of `t(df)` and `t(t(df))` will return matrices:\n\n\n::: {.cell}\n\n:::\n\n\nThe dimensions will respect the typical transposition rules:\n\n\n::: {.cell}\n\n:::\n\n\nBecause the output is a matrix, every column is coerced to the same type. (It is implemented within `t.data.frame()` via `as.matrix()` which is described below).\n\n\n::: {.cell}\n\n:::\n\n\n</details>\n\n4. What does `as.matrix()` do when applied to a data frame with columns of different types? How does it differ from `data.matrix()`?\n\n<details><summary>Answer(s)</summary>\nThe type of the result of as.matrix depends on the types of the input columns (see `?as.matrix`):\n\n> The method for data frames will return a character matrix if there is only atomic columns and any non-(numeric/logical/complex) column, applying as.vector to factors and format to other non-character columns. Otherwise the usual coercion hierarchy (logical < integer < double < complex) will be used, e.g. all-logical data frames will be coerced to a logical matrix, mixed logical-integer will give an integer matrix, etc.\n\nOn the other hand, `data.matrix` will always return a numeric matrix (see `?data.matrix()`).\n\n> Return the matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Factors and ordered factors are replaced by their internal codes. […] Character columns are first converted to factors and then to integers.\n\nWe can illustrate and compare the mechanics of these functions using a concrete example. `as.matrix()` makes it possible to retrieve most of the original information from the data frame but leaves us with characters. To retrieve all information from `data.matrix()`’s output, we would need a lookup table for each column.\n\n\n::: {.cell}\n\n:::\n\n\n</details>\n\n\n\n\n\n\n\n## `NULL`\n\nSpecial type of object that:\n\n- Length 0\n- Cannot have attributes\n\n\n::: {.cell}\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in attr(x, \"y\") <- 1: attempt to set an attribute on NULL\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n\n## Digestif\n\nLet is use some of this chapter's skills on the `penguins` data.\n\n### Attributes\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tibble [344 × 17] (S3: tbl_df/tbl/data.frame)\n#> $ studyName : chr [1:344] \"PAL0708\" \"PAL0708\" \"PAL0708\" \"PAL0708\" ...\n#> $ Sample Number : num [1:344] 1 2 3 4 5 6 7 8 9 10 ...\n#> $ Species : chr [1:344] \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" ...\n#> $ Region : chr [1:344] \"Anvers\" \"Anvers\" \"Anvers\" \"Anvers\" ...\n#> $ Island : chr [1:344] \"Torgersen\" \"Torgersen\" \"Torgersen\" \"Torgersen\" ...\n#> $ Stage : chr [1:344] \"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" ...\n#> $ Individual ID : chr [1:344] \"N1A1\" \"N1A2\" \"N2A1\" \"N2A2\" ...\n#> $ Clutch Completion : chr [1:344] \"Yes\" \"Yes\" \"Yes\" \"Yes\" ...\n#> $ Date Egg : Date[1:344], format: \"2007-11-11\" \"2007-11-11\" ...\n#> $ Culmen Length (mm) : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...\n#> $ Culmen Depth (mm) : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...\n#> $ Flipper Length (mm): num [1:344] 181 186 195 NA 193 190 181 195 193 190 ...\n#> $ Body Mass (g) : num [1:344] 3750 3800 3250 NA 3450 ...\n#> $ Sex : chr [1:344] \"MALE\" \"FEMALE\" \"FEMALE\" NA ...\n#> $ Delta 15 N (o/oo) : num [1:344] NA 8.95 8.37 NA 8.77 ...\n#> $ Delta 13 C (o/oo) : num [1:344] NA -24.7 -25.3 NA -25.3 ...\n#> $ Comments : chr [1:344] \"Not enough blood for isotopes.\" NA NA \"Adult not sampled.\" ...\n#> - attr(*, \"spec\")=List of 3\n#> ..$ cols :List of 17\n#> .. ..$ studyName : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> .. ..$ Sample Number : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#> .. ..$ Species : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> .. ..$ Region : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> .. ..$ Island : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> .. ..$ Stage : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> .. ..$ Individual ID : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> .. ..$ Clutch Completion : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> .. ..$ Date Egg :List of 1\n#> .. .. ..$ format: chr \"\"\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_date\" \"collector\"\n#> .. ..$ Culmen Length (mm) : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#> .. ..$ Culmen Depth (mm) : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#> .. ..$ Flipper Length (mm): list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#> .. ..$ Body Mass (g) : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#> .. ..$ Sex : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> .. ..$ Delta 15 N (o/oo) : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#> .. ..$ Delta 13 C (o/oo) : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#> .. ..$ Comments : list()\n#> .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#> ..$ default: list()\n#> .. ..- attr(*, \"class\")= chr [1:2] \"collector_guess\" \"collector\"\n#> ..$ skip : num 1\n#> ..- attr(*, \"class\")= chr \"col_spec\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tibble [344 × 17] (S3: tbl_df/tbl/data.frame)\n#> $ studyName : chr [1:344] \"PAL0708\" \"PAL0708\" \"PAL0708\" \"PAL0708\" ...\n#> $ Sample Number : num [1:344] 1 2 3 4 5 6 7 8 9 10 ...\n#> $ Species : chr [1:344] \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" ...\n#> $ Region : chr [1:344] \"Anvers\" \"Anvers\" \"Anvers\" \"Anvers\" ...\n#> $ Island : chr [1:344] \"Torgersen\" \"Torgersen\" \"Torgersen\" \"Torgersen\" ...\n#> $ Stage : chr [1:344] \"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" ...\n#> $ Individual ID : chr [1:344] \"N1A1\" \"N1A2\" \"N2A1\" \"N2A2\" ...\n#> $ Clutch Completion : chr [1:344] \"Yes\" \"Yes\" \"Yes\" \"Yes\" ...\n#> $ Date Egg : Date[1:344], format: \"2007-11-11\" \"2007-11-11\" ...\n#> $ Culmen Length (mm) : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...\n#> $ Culmen Depth (mm) : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...\n#> $ Flipper Length (mm): num [1:344] 181 186 195 NA 193 190 181 195 193 190 ...\n#> $ Body Mass (g) : num [1:344] 3750 3800 3250 NA 3450 ...\n#> $ Sex : chr [1:344] \"MALE\" \"FEMALE\" \"FEMALE\" NA ...\n#> $ Delta 15 N (o/oo) : num [1:344] NA 8.95 8.37 NA 8.77 ...\n#> $ Delta 13 C (o/oo) : num [1:344] NA -24.7 -25.3 NA -25.3 ...\n#> $ Comments : chr [1:344] \"Not enough blood for isotopes.\" NA NA \"Adult not sampled.\" ...\n```\n\n\n:::\n:::\n\n\n### Data Frames vs Tibbles\n\n\n::: {.cell}\n\n:::\n\n\n#### Printing\n\n* Tip: print out these results in RStudio under different editor themes\n\n\n::: {.cell}\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g\n#> 1 Adelie Torgersen 39.1 18.7 181 3750\n#> 2 Adelie Torgersen 39.5 17.4 186 3800\n#> 3 Adelie Torgersen 40.3 18.0 195 3250\n#> 4 Adelie Torgersen NA NA NA NA\n#> 5 Adelie Torgersen 36.7 19.3 193 3450\n#> 6 Adelie Torgersen 39.3 20.6 190 3650\n#> sex year\n#> 1 male 2007\n#> 2 female 2007\n#> 3 female 2007\n#> 4 <NA> 2007\n#> 5 female 2007\n#> 6 male 2007\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 344 × 8\n#> species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g\n#> <fct> <fct> <dbl> <dbl> <int> <int>\n#> 1 Adelie Torgersen 39.1 18.7 181 3750\n#> 2 Adelie Torgersen 39.5 17.4 186 3800\n#> 3 Adelie Torgersen 40.3 18 195 3250\n#> 4 Adelie Torgersen NA NA NA NA\n#> 5 Adelie Torgersen 36.7 19.3 193 3450\n#> 6 Adelie Torgersen 39.3 20.6 190 3650\n#> 7 Adelie Torgersen 38.9 17.8 181 3625\n#> 8 Adelie Torgersen 39.2 19.6 195 4675\n#> 9 Adelie Torgersen 34.1 18.1 193 3475\n#> 10 Adelie Torgersen 42 20.2 190 4250\n#> # ℹ 334 more rows\n#> # ℹ 2 more variables: sex <fct>, year <int>\n```\n\n\n:::\n:::\n\n\n### Atomic Vectors\n\n\n::: {.cell}\n\n:::\n\n\n<details>\n<summary>`typeof()` and `class()`</summary>\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"list\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"data.frame\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"list\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"tbl_df\" \"tbl\" \"data.frame\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n:::\n\n</details>\n\n### Column Names\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"species\" \"island\" \"bill_length_mm\" \n#> [4] \"bill_depth_mm\" \"flipper_length_mm\" \"body_mass_g\" \n#> [7] \"sex\" \"year\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE\n```\n\n\n:::\n:::\n\n\n* What if we only invoke a partial name of a column of a tibble?\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n\n\n* What if we only invoke a partial name of a column of a data frame?\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2007 2007 2007 2007 2007 2007\n```\n\n\n:::\n:::\n\n\n* Is this evaluation in alphabetical order or column order?\n\n\n::: {.cell}\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n\n## Chapter Quiz\n\n1. What are the four common types of atomic vectors? What are the two rare types?\n\n<details><summary>Answer(s)</summary>\nThe four common types of atomic vector are logical, integer, double and character. The two rarer types are complex and raw.\n</details>\n\n2. What are attributes? How do you get them and set them?\n\n<details><summary>Answer(s)</summary>\nAttributes allow you to associate arbitrary additional metadata to any object. You can get and set individual attributes with `attr(x, \"y\")` and `attr(x, \"y\") <- value`; or you can get and set all attributes at once with `attributes()`.\n</details>\n\n3. How is a list different from an atomic vector? How is a matrix different from a data frame?\n\n<details><summary>Answer(s)</summary>\nThe elements of a list can be any type (even a list); the elements of an atomic vector are all of the same type. Similarly, every element of a matrix must be the same type; in a data frame, different columns can have different types.\n</details>\n\n4. Can you have a list that is a matrix? Can a data frame have a column that is a matrix?\n\n<details><summary>Answer(s)</summary>\nYou can make a list-array by assigning dimensions to a list. You can make a matrix a column of a data frame with `df$x <- matrix()`, or by using `I()` when creating a new data frame `data.frame(x = I(matrix()))`.\n</details>\n\n5. How do tibbles behave differently from data frames?\n\n<details><summary>Answer(s)</summary>\nTibbles have an enhanced print method, never coerce strings to factors, and provide stricter subsetting methods.\n</details>\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" ], - "includes": {}, + "includes": { + "include-after-body": [ + "\n<script>\n // htmlwidgets need to know to resize themselves when slides are shown/hidden.\n // Fire the \"slideenter\" event (handled by htmlwidgets.js) when the current\n // slide changes (different for each slide format).\n (function () {\n // dispatch for htmlwidgets\n function fireSlideEnter() {\n const event = window.document.createEvent(\"Event\");\n event.initEvent(\"slideenter\", true, true);\n window.document.dispatchEvent(event);\n }\n\n function fireSlideChanged(previousSlide, currentSlide) {\n fireSlideEnter();\n\n // dispatch for shiny\n if (window.jQuery) {\n if (previousSlide) {\n window.jQuery(previousSlide).trigger(\"hidden\");\n }\n if (currentSlide) {\n window.jQuery(currentSlide).trigger(\"shown\");\n }\n }\n }\n\n // hookup for slidy\n if (window.w3c_slidy) {\n window.w3c_slidy.add_observer(function (slide_num) {\n // slide_num starts at position 1\n fireSlideChanged(null, w3c_slidy.slides[slide_num - 1]);\n });\n }\n\n })();\n</script>\n\n" + ] + }, "engineDependencies": {}, "preserve": {}, "postProcess": true diff --git a/_freeze/slides/04/execute-results/html.json b/_freeze/slides/04/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "d5f624e204ffbbbd3b10c1cdf0b87972", + "hash": "706ff898197dfba9ce51c9d83ac97658", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Subsetting\n---\n\n## Learning objectives:\n\n- Learn about the 6 ways to subset atomic vectors\n- Learn about the 3 subsetting operators: `[[`, `[`, and `$`\n- Learn how subsetting works with different vector types\n- Learn how subsetting can be combined with assignment\n\n## Selecting multiple elements\n\n### Atomic Vectors\n\n- 6 ways to subset atomic vectors\n\nLet's take a look with an example vector.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1.1, 2.2, 3.3, 4.4)\n```\n:::\n\n\n**Positive integer indices**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# return elements at specified positions which can be out of order\nx[c(4, 1)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4.4 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\n# duplicate indices return duplicate values\nx[c(2, 2)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 2.2\n```\n\n\n:::\n\n```{.r .cell-code}\n# real numbers truncate to integers\n# so this behaves as if it is x[c(3, 3)]\nx[c(3.2, 3.8)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3.3 3.3\n```\n\n\n:::\n:::\n\n\n**Negative integer indices**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n### excludes elements at specified positions\nx[-c(1, 3)] # same as x[c(-1, -3)] or x[c(2, 4)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\n### mixing positive and negative is a no-no\nx[c(-1, 3)]\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in x[c(-1, 3)]: only 0's may be mixed with negative subscripts\n```\n\n\n:::\n:::\n\n\n**Logical Vectors**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(TRUE, TRUE, FALSE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\nx[x < 3]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2\n```\n\n\n:::\n\n```{.r .cell-code}\ncond <- x > 2.5\nx[cond]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3.3 4.4\n```\n\n\n:::\n:::\n\n\n- **Recyling rules** applies when the two vectors are of different lengths\n- the shorter of the two is recycled to the length of the longer\n- Easy to understand if x or y is 1, best to avoid other lengths\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(F, T)] # equivalent to: x[c(FALSE, TRUE, FALSE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 4.4\n```\n\n\n:::\n:::\n\n\n**Missing values (NA)**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Missing values in index will also return NA in output\nx[c(NA, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] NA 2.2 NA 4.4\n```\n\n\n:::\n:::\n\n\n**Nothing**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# returns the original vector\nx[]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 3.3 4.4\n```\n\n\n:::\n:::\n\n\n**Zero**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# returns a zero-length vector\nx[0]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> numeric(0)\n```\n\n\n:::\n:::\n\n\n**Character vectors**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# if name, you can use to return matched elements\n(y <- setNames(x, letters[1:4]))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c d \n#> 1.1 2.2 3.3 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\ny[c(\"d\", \"b\", \"a\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> d b a \n#> 4.4 2.2 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\n# Like integer indices, you can repeat indices\ny[c(\"a\", \"a\", \"a\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a a a \n#> 1.1 1.1 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\n# When subsetting with [, names are always matched exactly\nz <- c(abc = 1, def = 2)\nz\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> abc def \n#> 1 2\n```\n\n\n:::\n\n```{.r .cell-code}\nz[c(\"a\", \"d\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <NA> <NA> \n#> NA NA\n```\n\n\n:::\n:::\n\n\n### Lists\n\n- Subsetting works the same way\n- `[` always returns a list\n- `[[` and `$` let you pull elements out of a list\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_list <- list(a = c(T, F), b = letters[5:15], c = 100:108)\nmy_list\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] TRUE FALSE\n#> \n#> $b\n#> [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n#> \n#> $c\n#> [1] 100 101 102 103 104 105 106 107 108\n```\n\n\n:::\n:::\n\n\n**Return a (named) list**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl1 <- my_list[2]\nl1\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $b\n#> [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n```\n\n\n:::\n:::\n\n\n**Return a vector**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl2 <- my_list[[2]]\nl2\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n```\n\n\n:::\n\n```{.r .cell-code}\nl2b <- my_list$b\nl2b\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n```\n\n\n:::\n:::\n\n\n**Return a specific element**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl3 <- my_list[[2]][3]\nl3\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"g\"\n```\n\n\n:::\n\n```{.r .cell-code}\nl4 <- my_list[['b']][3]\nl4\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"g\"\n```\n\n\n:::\n\n```{.r .cell-code}\nl4b <- my_list$b[3]\nl4b\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"g\"\n```\n\n\n:::\n:::\n\n\n**Visual Representation**\n\n \n\nSee this stackoverflow article for more detailed information about the differences: https://stackoverflow.com/questions/1169456/the-difference-between-bracket-and-double-bracket-for-accessing-the-el\n\n### Matrices and arrays\n\nYou can subset higher dimensional structures in three ways:\n\n- with multiple vectors\n- with a single vector\n- with a matrix\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- matrix(1:12, nrow = 3)\ncolnames(a) <- c(\"A\", \"B\", \"C\", \"D\")\n\n# single row\na[1, ]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> A B C D \n#> 1 4 7 10\n```\n\n\n:::\n\n```{.r .cell-code}\n# single column\na[, 1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\n# single element\na[1, 1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> A \n#> 1\n```\n\n\n:::\n\n```{.r .cell-code}\n# two rows from two columns\na[1:2, 3:4]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> C D\n#> [1,] 7 10\n#> [2,] 8 11\n```\n\n\n:::\n\n```{.r .cell-code}\na[c(TRUE, FALSE, TRUE), c(\"B\", \"A\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> B A\n#> [1,] 4 1\n#> [2,] 6 3\n```\n\n\n:::\n\n```{.r .cell-code}\n# zero index and negative index\na[0, -2]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> A C D\n```\n\n\n:::\n:::\n\n\n**Subset a matrix with a matrix**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb <- matrix(1:4, nrow = 2)\nb\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2]\n#> [1,] 1 3\n#> [2,] 2 4\n```\n\n\n:::\n\n```{.r .cell-code}\na[b]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 7 11\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvals <- outer(1:5, 1:5, FUN = \"paste\", sep = \",\")\nvals\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2] [,3] [,4] [,5] \n#> [1,] \"1,1\" \"1,2\" \"1,3\" \"1,4\" \"1,5\"\n#> [2,] \"2,1\" \"2,2\" \"2,3\" \"2,4\" \"2,5\"\n#> [3,] \"3,1\" \"3,2\" \"3,3\" \"3,4\" \"3,5\"\n#> [4,] \"4,1\" \"4,2\" \"4,3\" \"4,4\" \"4,5\"\n#> [5,] \"5,1\" \"5,2\" \"5,3\" \"5,4\" \"5,5\"\n```\n\n\n:::\n\n```{.r .cell-code}\nselect <- matrix(ncol = 2, byrow = TRUE, \n c(1, 1,\n 3, 1,\n 2, 4))\nselect\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2]\n#> [1,] 1 1\n#> [2,] 3 1\n#> [3,] 2 4\n```\n\n\n:::\n\n```{.r .cell-code}\nvals[select]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"1,1\" \"3,1\" \"2,4\"\n```\n\n\n:::\n:::\n\n\nMatrices and arrays are just special vectors; can subset with a single vector\n(arrays in R stored column wise)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvals[c(3, 15, 16, 17)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"3,1\" \"5,3\" \"1,4\" \"2,4\"\n```\n\n\n:::\n:::\n\n\n### Data frames and tibbles\n\nData frames act like both lists and matrices\n\n- When subsetting with a single index, they behave like lists and index the columns, so `df[1:2]` selects the first two columns.\n- When subsetting with two indices, they behave like matrices, so `df[1:3, ]` selects the first three rows (and all the columns).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(palmerpenguins)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> \n#> Attaching package: 'palmerpenguins'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> The following objects are masked from 'package:datasets':\n#> \n#> penguins, penguins_raw\n```\n\n\n:::\n\n```{.r .cell-code}\npenguins <- penguins\n\n# single index selects first two columns\ntwo_cols <- penguins[2:3] # or penguins[c(2,3)]\nhead(two_cols)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 6 × 2\n#> island bill_length_mm\n#> <fct> <dbl>\n#> 1 Torgersen 39.1\n#> 2 Torgersen 39.5\n#> 3 Torgersen 40.3\n#> 4 Torgersen NA \n#> 5 Torgersen 36.7\n#> 6 Torgersen 39.3\n```\n\n\n:::\n\n```{.r .cell-code}\n# equivalent to the above code\nsame_two_cols <- penguins[c(\"island\", \"bill_length_mm\")]\nhead(same_two_cols)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 6 × 2\n#> island bill_length_mm\n#> <fct> <dbl>\n#> 1 Torgersen 39.1\n#> 2 Torgersen 39.5\n#> 3 Torgersen 40.3\n#> 4 Torgersen NA \n#> 5 Torgersen 36.7\n#> 6 Torgersen 39.3\n```\n\n\n:::\n\n```{.r .cell-code}\n# two indices separated by comma (first two rows of 3rd and 4th columns)\npenguins[1:2, 3:4]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 2\n#> bill_length_mm bill_depth_mm\n#> <dbl> <dbl>\n#> 1 39.1 18.7\n#> 2 39.5 17.4\n```\n\n\n:::\n\n```{.r .cell-code}\n# Can't do this...\npenguins[[3:4]][c(1:4)]\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error:\n#> ! The `j` argument of `[[.tbl_df()` can't be a vector of length 2 as of\n#> tibble 3.0.0.\n#> ℹ Recursive subsetting is deprecated for tibbles.\n```\n\n\n:::\n\n```{.r .cell-code}\n# ...but this works...\npenguins[[3]][c(1:4)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 39.1 39.5 40.3 NA\n```\n\n\n:::\n\n```{.r .cell-code}\n# ...or this equivalent...\npenguins$bill_length_mm[1:4]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 39.1 39.5 40.3 NA\n```\n\n\n:::\n:::\n\n\nSubsetting a tibble with `[` always returns a tibble\n\n### Preserving dimensionality\n\n- Data frames and tibbles behave differently\n- tibble will default to preserve dimensionality, data frames do not\n- this can lead to unexpected behavior and code breaking in the future\n- Use `drop = FALSE` to preserve dimensionality when subsetting a data frame or use tibbles\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntb <- tibble::tibble(a = 1:2, b = 1:2)\n\n# returns tibble\nstr(tb[, \"a\"])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tibble [2 × 1] (S3: tbl_df/tbl/data.frame)\n#> $ a: int [1:2] 1 2\n```\n\n\n:::\n\n```{.r .cell-code}\ntb[, \"a\"] # equivalent to tb[, \"a\", drop = FALSE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 1\n#> a\n#> <int>\n#> 1 1\n#> 2 2\n```\n\n\n:::\n\n```{.r .cell-code}\n# returns integer vector\n# str(tb[, \"a\", drop = TRUE])\ntb[, \"a\", drop = TRUE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(a = 1:2, b = 1:2)\n\n# returns integer vector\n# str(df[, \"a\"])\ndf[, \"a\"]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2\n```\n\n\n:::\n\n```{.r .cell-code}\n# returns data frame with one column\n# str(df[, \"a\", drop = FALSE])\ndf[, \"a\", drop = FALSE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a\n#> 1 1\n#> 2 2\n```\n\n\n:::\n:::\n\n**Factors**\n\nFactor subsetting drop argument controls whether or not levels (rather than dimensions) are preserved.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nz <- factor(c(\"a\", \"b\", \"c\"))\nz[1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] a\n#> Levels: a b c\n```\n\n\n:::\n\n```{.r .cell-code}\nz[1, drop = TRUE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] a\n#> Levels: a\n```\n\n\n:::\n:::\n\n\n## Selecting a single element\n\n`[[` and `$` are used to extract single elements (note: a vector can be a single element)\n\n### `[[]]`\n\nBecause `[[]]` can return only a single item, you must use it with either a single positive integer or a single string. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(1:3, \"a\", 4:6)\nx[[1]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n:::\n\n\nHadley Wickham recommends using `[[]]` with atomic vectors whenever you want to extract a single value to reinforce the expectation that you are getting and setting individual values. \n\n### `$`\n\n- `x$y` is equivalent to `x[[\"y\"]]`\n\nthe `$` operator doesn't work with stored vals\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvar <- \"cyl\"\n\n# Doesn't work - mtcars$var translated to mtcars[[\"var\"]]\nmtcars$var\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\n# Instead use [[\nmtcars[[var]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4\n```\n\n\n:::\n:::\n\n\n`$` allows partial matching, `[[]]` does not\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(abc = 1)\nx$a\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n\n```{.r .cell-code}\nx[[\"a\"]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\nHadley advises to change Global settings:\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(warnPartialMatchDollar = TRUE)\nx$a\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in x$a: partial match of 'a' to 'abc'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n:::\n\n\ntibbles don't have this behavior\n\n\n::: {.cell}\n\n```{.r .cell-code}\npenguins$s\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: Unknown or uninitialised column: `s`.\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n### missing and out of bound indices\n\n- Due to the inconsistency of how R handles such indices, `purrr::pluck()` and `purrr::chuck()` are recommended\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(\n a = list(1, 2, 3),\n b = list(3, 4, 5)\n)\npurrr::pluck(x, \"a\", 1)\n# [1] 1\npurrr::pluck(x, \"c\", 1)\n# NULL\npurrr::pluck(x, \"c\", 1, .default = NA)\n# [1] NA\n```\n:::\n\n\n### `@` and `slot()`\n- `@` is `$` for S4 objects (to be revisited in Chapter 15)\n\n- `slot()` is `[[ ]]` for S4 objects\n\n## Subsetting and Assignment\n\n- Subsetting can be combined with assignment to edit values\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(\"Tigers\", \"Royals\", \"White Sox\", \"Twins\", \"Indians\")\n\nx[5] <- \"Guardians\"\n\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Tigers\" \"Royals\" \"White Sox\" \"Twins\" \"Guardians\"\n```\n\n\n:::\n:::\n\n\n- length of the subset and assignment vector should be the same to avoid recycling\n\nYou can use NULL to remove a component\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(a = 1, b = 2)\nx[[\"b\"]] <- NULL\nstr(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 1\n#> $ a: num 1\n```\n\n\n:::\n:::\n\n\nSubsetting with nothing can preserve structure of original object\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# mtcars[] <- lapply(mtcars, as.integer)\n# is.data.frame(mtcars)\n# [1] TRUE\n# mtcars <- lapply(mtcars, as.integer)\n#> is.data.frame(mtcars)\n# [1] FALSE\n```\n:::\n\n\n## Applications\n\nApplications copied from cohort 2 slide\n\n### Lookup tables (character subsetting)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(\"m\", \"f\", \"u\", \"f\", \"f\", \"m\", \"m\")\nlookup <- c(m = \"Male\", f = \"Female\", u = NA)\nlookup[x]\n# m f u f f m m \n# \"Male\" \"Female\" NA \"Female\" \"Female\" \"Male\" \"Male\"\n```\n:::\n\n\n### Matching and merging by hand (integer subsetting)\n\n- The `match()` function allows merging a vector with a table\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngrades <- c(\"D\", \"A\", \"C\", \"B\", \"F\")\ninfo <- data.frame(\n grade = c(\"A\", \"B\", \"C\", \"D\", \"F\"),\n desc = c(\"Excellent\", \"Very Good\", \"Average\", \"Fair\", \"Poor\"),\n fail = c(F, F, F, F, T)\n)\nid <- match(grades, info$grade)\nid\n# [1] 3 2 2 1 3\ninfo[id, ]\n# grade desc fail\n# 4 D Fair FALSE\n# 1 A Excellent FALSE\n# 3 C Average FALSE\n# 2 B Very Good FALSE\n# 5 F Poor TRUE\n```\n:::\n\n\n### Random samples and bootstrapping (integer subsetting)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# mtcars[sample(nrow(mtcars), 3), ] # use replace = TRUE to replace\n# mpg cyl disp hp drat wt qsec vs am gear carb\n# Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2\n# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4\n# Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4\n```\n:::\n\n\n### Ordering (integer subsetting)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# mtcars[order(mtcars$mpg), ]\n# mpg cyl disp hp drat wt qsec vs am gear carb\n# Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4\n# Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4\n# Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4\n# Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4\n# Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4\n# Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8\n# ...\n```\n:::\n\n\n### Expanding aggregated counts (integer subsetting)\n\n- We can expand a count column by using `rep()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- tibble::tibble(x = c(\"Amy\", \"Julie\", \"Brian\"), n = c(2, 1, 3))\ndf[rep(1:nrow(df), df$n), ]\n# A tibble: 6 x 2\n# x n\n# <chr> <dbl>\n# 1 Amy 2\n# 2 Amy 2\n# 3 Julie 1\n# 4 Brian 3\n# 5 Brian 3\n# 6 Brian 3\n```\n:::\n\n\n### Removing columns from data frames (character)\n\n- We can remove a column by subsetting, which does not change the object\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf[, 1]\n# A tibble: 3 x 1\n# x \n# <chr>\n# 1 Amy \n# 2 Julie\n# 3 Brian\n```\n:::\n\n\n- We can also delete the column using `NULL`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf$n <- NULL\ndf\n# A tibble: 3 x 1\n# x \n# <chr>\n# 1 Amy \n# 2 Julie\n# 3 Brian\n```\n:::\n\n\n### Selecting rows based on a condition (logical subsetting)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# mtcars[mtcars$gear == 5, ]\n# mpg cyl disp hp drat wt qsec vs am gear carb\n# Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2\n# Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2\n# Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4\n# Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6\n# Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8\n```\n:::\n\n\n### Boolean algebra versus sets (logical and integer)\n\n- `which()` gives the indices of a Boolean vector\n\n\n::: {.cell}\n\n```{.r .cell-code}\n(x1 <- 1:10 %% 2 == 0) # 1-10 divisible by 2\n# [1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE\n(x2 <- which(x1))\n# [1] 2 4 6 8 10\n(y1 <- 1:10 %% 5 == 0) # 1-10 divisible by 5\n# [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE\n(y2 <- which(y1))\n# [1] 5 10\nx1 & y1\n# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE\n```\n:::\n\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/eLMpCc0t1cg\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/Mhq-TX4eA64\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/CvvXkXiF3Ig\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/Hxghhpe9fYs\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/qtUgKhw39Yo\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/-WjBA6yqW0Q\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:36:02\tArthur Shaw:\tTIL that the subset operator has parameters. Thanks, Trevin!\n00:38:55\tVaibhav Janve:\tits interesting that carriage \"a\" has two set of wheels instread of 4. I wonder that choice is because its atomic.\n00:40:44\tArthur Shaw:\t@Vaibhav, because the load is lighter, the carriage needs fewer axles? ;) I agree: it's a confusing graphical choice.\n00:41:11\tVaibhav Janve:\tlol\n01:05:53\tVaibhav Janve:\tThank you Trevin!\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/LBU-Ew_IM7A\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<iframe src=\"https://www.youtube.com/embed/W9CoQ15NlOc\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n\n<summary>Meeting chat log</summary>\n```\n00:40:18\tRyan Honomichl:\tWhat type of vector does each of the following calls to ifelse() return?\n\n* \"ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE.\"\n00:42:11\tRyan Honomichl:\t\"I recommend assigning the results of an if statement only when the entire expression fits on one line; otherwise it tends to be hard to read\"\n00:42:46\tRyan Honomichl:\t* When you use the single argument form without an `else` statement, `if` invisibly returns NULL if the condition is FALSE. \n\n- Since functions like c() and paste() drop NULL inputs, this allows for a compact expression of certain idioms\n00:54:15\tcollinberke:\thttps://docs.google.com/spreadsheets/d/1ScrbEw_-vB9DruaJhjtVY8HLQmuNPqyWeOOjmG6OY1M/edit?usp=sharing\n00:58:46\tcollinberke:\thttps://www.youtube.com/@safe4democracy/videos\n```\n</details>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: Subsetting\n---\n\n## Learning objectives:\n\n- Learn about the 6 ways to subset atomic vectors\n- Learn about the 3 subsetting operators: `[[`, `[`, and `$`\n- Learn how subsetting works with different vector types\n- Learn how subsetting can be combined with assignment\n\n## Selecting multiple elements\n\n### Atomic Vectors\n\n- 6 ways to subset atomic vectors\n\nLet's take a look with an example vector.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1.1, 2.2, 3.3, 4.4)\n```\n:::\n\n\n**Positive integer indices**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# return elements at specified positions which can be out of order\nx[c(4, 1)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4.4 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\n# duplicate indices return duplicate values\nx[c(2, 2)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 2.2\n```\n\n\n:::\n\n```{.r .cell-code}\n# real numbers truncate to integers\n# so this behaves as if it is x[c(3, 3)]\nx[c(3.2, 3.8)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3.3 3.3\n```\n\n\n:::\n:::\n\n\n**Negative integer indices**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n### excludes elements at specified positions\nx[-c(1, 3)] # same as x[c(-1, -3)] or x[c(2, 4)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\n### mixing positive and negative is a no-no\nx[c(-1, 3)]\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in x[c(-1, 3)]: only 0's may be mixed with negative subscripts\n```\n\n\n:::\n:::\n\n\n**Logical Vectors**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(TRUE, TRUE, FALSE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\nx[x < 3]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2\n```\n\n\n:::\n\n```{.r .cell-code}\ncond <- x > 2.5\nx[cond]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3.3 4.4\n```\n\n\n:::\n:::\n\n\n- **Recyling rules** applies when the two vectors are of different lengths\n- the shorter of the two is recycled to the length of the longer\n- Easy to understand if x or y is 1, best to avoid other lengths\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(F, T)] # equivalent to: x[c(FALSE, TRUE, FALSE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 4.4\n```\n\n\n:::\n:::\n\n\n**Missing values (NA)**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Missing values in index will also return NA in output\nx[c(NA, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] NA 2.2 NA 4.4\n```\n\n\n:::\n:::\n\n\n**Nothing**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# returns the original vector\nx[]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 3.3 4.4\n```\n\n\n:::\n:::\n\n\n**Zero**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# returns a zero-length vector\nx[0]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> numeric(0)\n```\n\n\n:::\n:::\n\n\n**Character vectors**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# if name, you can use to return matched elements\n(y <- setNames(x, letters[1:4]))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c d \n#> 1.1 2.2 3.3 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\ny[c(\"d\", \"b\", \"a\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> d b a \n#> 4.4 2.2 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\n# Like integer indices, you can repeat indices\ny[c(\"a\", \"a\", \"a\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a a a \n#> 1.1 1.1 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\n# When subsetting with [, names are always matched exactly\nz <- c(abc = 1, def = 2)\nz\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> abc def \n#> 1 2\n```\n\n\n:::\n\n```{.r .cell-code}\nz[c(\"a\", \"d\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <NA> <NA> \n#> NA NA\n```\n\n\n:::\n:::\n\n\n### Lists\n\n- Subsetting works the same way\n- `[` always returns a list\n- `[[` and `$` let you pull elements out of a list\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_list <- list(a = c(T, F), b = letters[5:15], c = 100:108)\nmy_list\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] TRUE FALSE\n#> \n#> $b\n#> [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n#> \n#> $c\n#> [1] 100 101 102 103 104 105 106 107 108\n```\n\n\n:::\n:::\n\n\n**Return a (named) list**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl1 <- my_list[2]\nl1\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $b\n#> [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n```\n\n\n:::\n:::\n\n\n**Return a vector**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl2 <- my_list[[2]]\nl2\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n```\n\n\n:::\n\n```{.r .cell-code}\nl2b <- my_list$b\nl2b\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n```\n\n\n:::\n:::\n\n\n**Return a specific element**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl3 <- my_list[[2]][3]\nl3\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"g\"\n```\n\n\n:::\n\n```{.r .cell-code}\nl4 <- my_list[['b']][3]\nl4\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"g\"\n```\n\n\n:::\n\n```{.r .cell-code}\nl4b <- my_list$b[3]\nl4b\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"g\"\n```\n\n\n:::\n:::\n\n\n**Visual Representation**\n\n \n\nSee this stackoverflow article for more detailed information about the differences: https://stackoverflow.com/questions/1169456/the-difference-between-bracket-and-double-bracket-for-accessing-the-el\n\n### Matrices and arrays\n\nYou can subset higher dimensional structures in three ways:\n\n- with multiple vectors\n- with a single vector\n- with a matrix\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- matrix(1:12, nrow = 3)\ncolnames(a) <- c(\"A\", \"B\", \"C\", \"D\")\n\n# single row\na[1, ]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> A B C D \n#> 1 4 7 10\n```\n\n\n:::\n\n```{.r .cell-code}\n# single column\na[, 1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\n# single element\na[1, 1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> A \n#> 1\n```\n\n\n:::\n\n```{.r .cell-code}\n# two rows from two columns\na[1:2, 3:4]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> C D\n#> [1,] 7 10\n#> [2,] 8 11\n```\n\n\n:::\n\n```{.r .cell-code}\na[c(TRUE, FALSE, TRUE), c(\"B\", \"A\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> B A\n#> [1,] 4 1\n#> [2,] 6 3\n```\n\n\n:::\n\n```{.r .cell-code}\n# zero index and negative index\na[0, -2]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> A C D\n```\n\n\n:::\n:::\n\n\n**Subset a matrix with a matrix**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb <- matrix(1:4, nrow = 2)\nb\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2]\n#> [1,] 1 3\n#> [2,] 2 4\n```\n\n\n:::\n\n```{.r .cell-code}\na[b]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 7 11\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvals <- outer(1:5, 1:5, FUN = \"paste\", sep = \",\")\nvals\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2] [,3] [,4] [,5] \n#> [1,] \"1,1\" \"1,2\" \"1,3\" \"1,4\" \"1,5\"\n#> [2,] \"2,1\" \"2,2\" \"2,3\" \"2,4\" \"2,5\"\n#> [3,] \"3,1\" \"3,2\" \"3,3\" \"3,4\" \"3,5\"\n#> [4,] \"4,1\" \"4,2\" \"4,3\" \"4,4\" \"4,5\"\n#> [5,] \"5,1\" \"5,2\" \"5,3\" \"5,4\" \"5,5\"\n```\n\n\n:::\n\n```{.r .cell-code}\nselect <- matrix(ncol = 2, byrow = TRUE, \n c(1, 1,\n 3, 1,\n 2, 4))\nselect\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2]\n#> [1,] 1 1\n#> [2,] 3 1\n#> [3,] 2 4\n```\n\n\n:::\n\n```{.r .cell-code}\nvals[select]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"1,1\" \"3,1\" \"2,4\"\n```\n\n\n:::\n:::\n\n\nMatrices and arrays are just special vectors; can subset with a single vector\n(arrays in R stored column wise)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvals[c(3, 15, 16, 17)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"3,1\" \"5,3\" \"1,4\" \"2,4\"\n```\n\n\n:::\n:::\n\n\n### Data frames and tibbles\n\nData frames act like both lists and matrices\n\n- When subsetting with a single index, they behave like lists and index the columns, so `df[1:2]` selects the first two columns.\n- When subsetting with two indices, they behave like matrices, so `df[1:3, ]` selects the first three rows (and all the columns).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(palmerpenguins)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> \n#> Attaching package: 'palmerpenguins'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> The following objects are masked from 'package:datasets':\n#> \n#> penguins, penguins_raw\n```\n\n\n:::\n\n```{.r .cell-code}\npenguins <- penguins\n\n# single index selects first two columns\ntwo_cols <- penguins[2:3] # or penguins[c(2,3)]\nhead(two_cols)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 6 × 2\n#> island bill_length_mm\n#> <fct> <dbl>\n#> 1 Torgersen 39.1\n#> 2 Torgersen 39.5\n#> 3 Torgersen 40.3\n#> 4 Torgersen NA \n#> 5 Torgersen 36.7\n#> 6 Torgersen 39.3\n```\n\n\n:::\n\n```{.r .cell-code}\n# equivalent to the above code\nsame_two_cols <- penguins[c(\"island\", \"bill_length_mm\")]\nhead(same_two_cols)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 6 × 2\n#> island bill_length_mm\n#> <fct> <dbl>\n#> 1 Torgersen 39.1\n#> 2 Torgersen 39.5\n#> 3 Torgersen 40.3\n#> 4 Torgersen NA \n#> 5 Torgersen 36.7\n#> 6 Torgersen 39.3\n```\n\n\n:::\n\n```{.r .cell-code}\n# two indices separated by comma (first two rows of 3rd and 4th columns)\npenguins[1:2, 3:4]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 2\n#> bill_length_mm bill_depth_mm\n#> <dbl> <dbl>\n#> 1 39.1 18.7\n#> 2 39.5 17.4\n```\n\n\n:::\n\n```{.r .cell-code}\n# Can't do this...\npenguins[[3:4]][c(1:4)]\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error:\n#> ! The `j` argument of `[[.tbl_df()` can't be a vector of length 2 as of\n#> tibble 3.0.0.\n#> ℹ Recursive subsetting is deprecated for tibbles.\n```\n\n\n:::\n\n```{.r .cell-code}\n# ...but this works...\npenguins[[3]][c(1:4)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 39.1 39.5 40.3 NA\n```\n\n\n:::\n\n```{.r .cell-code}\n# ...or this equivalent...\npenguins$bill_length_mm[1:4]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 39.1 39.5 40.3 NA\n```\n\n\n:::\n:::\n\n\nSubsetting a tibble with `[` always returns a tibble\n\n### Preserving dimensionality\n\n- Data frames and tibbles behave differently\n- tibble will default to preserve dimensionality, data frames do not\n- this can lead to unexpected behavior and code breaking in the future\n- Use `drop = FALSE` to preserve dimensionality when subsetting a data frame or use tibbles\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntb <- tibble::tibble(a = 1:2, b = 1:2)\n\n# returns tibble\nstr(tb[, \"a\"])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tibble [2 × 1] (S3: tbl_df/tbl/data.frame)\n#> $ a: int [1:2] 1 2\n```\n\n\n:::\n\n```{.r .cell-code}\ntb[, \"a\"] # equivalent to tb[, \"a\", drop = FALSE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 1\n#> a\n#> <int>\n#> 1 1\n#> 2 2\n```\n\n\n:::\n\n```{.r .cell-code}\n# returns integer vector\n# str(tb[, \"a\", drop = TRUE])\ntb[, \"a\", drop = TRUE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(a = 1:2, b = 1:2)\n\n# returns integer vector\n# str(df[, \"a\"])\ndf[, \"a\"]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2\n```\n\n\n:::\n\n```{.r .cell-code}\n# returns data frame with one column\n# str(df[, \"a\", drop = FALSE])\ndf[, \"a\", drop = FALSE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a\n#> 1 1\n#> 2 2\n```\n\n\n:::\n:::\n\n**Factors**\n\nFactor subsetting drop argument controls whether or not levels (rather than dimensions) are preserved.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nz <- factor(c(\"a\", \"b\", \"c\"))\nz[1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] a\n#> Levels: a b c\n```\n\n\n:::\n\n```{.r .cell-code}\nz[1, drop = TRUE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] a\n#> Levels: a\n```\n\n\n:::\n:::\n\n\n## Selecting a single element\n\n`[[` and `$` are used to extract single elements (note: a vector can be a single element)\n\n### `[[]]`\n\nBecause `[[]]` can return only a single item, you must use it with either a single positive integer or a single string. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(1:3, \"a\", 4:6)\nx[[1]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n:::\n\n\nHadley Wickham recommends using `[[]]` with atomic vectors whenever you want to extract a single value to reinforce the expectation that you are getting and setting individual values. \n\n### `$`\n\n- `x$y` is equivalent to `x[[\"y\"]]`\n\nthe `$` operator doesn't work with stored vals\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvar <- \"cyl\"\n\n# Doesn't work - mtcars$var translated to mtcars[[\"var\"]]\nmtcars$var\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\n# Instead use [[\nmtcars[[var]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4\n```\n\n\n:::\n:::\n\n\n`$` allows partial matching, `[[]]` does not\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(abc = 1)\nx$a\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n\n```{.r .cell-code}\nx[[\"a\"]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\nHadley advises to change Global settings:\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(warnPartialMatchDollar = TRUE)\nx$a\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in x$a: partial match of 'a' to 'abc'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n:::\n\n\ntibbles don't have this behavior\n\n\n::: {.cell}\n\n```{.r .cell-code}\npenguins$s\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: Unknown or uninitialised column: `s`.\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n### missing and out of bound indices\n\n- Due to the inconsistency of how R handles such indices, `purrr::pluck()` and `purrr::chuck()` are recommended\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(\n a = list(1, 2, 3),\n b = list(3, 4, 5)\n)\npurrr::pluck(x, \"a\", 1)\n# [1] 1\npurrr::pluck(x, \"c\", 1)\n# NULL\npurrr::pluck(x, \"c\", 1, .default = NA)\n# [1] NA\n```\n:::\n\n\n### `@` and `slot()`\n- `@` is `$` for S4 objects (to be revisited in Chapter 15)\n\n- `slot()` is `[[ ]]` for S4 objects\n\n## Subsetting and Assignment\n\n- Subsetting can be combined with assignment to edit values\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(\"Tigers\", \"Royals\", \"White Sox\", \"Twins\", \"Indians\")\n\nx[5] <- \"Guardians\"\n\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Tigers\" \"Royals\" \"White Sox\" \"Twins\" \"Guardians\"\n```\n\n\n:::\n:::\n\n\n- length of the subset and assignment vector should be the same to avoid recycling\n\nYou can use NULL to remove a component\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(a = 1, b = 2)\nx[[\"b\"]] <- NULL\nstr(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 1\n#> $ a: num 1\n```\n\n\n:::\n:::\n\n\nSubsetting with nothing can preserve structure of original object\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# mtcars[] <- lapply(mtcars, as.integer)\n# is.data.frame(mtcars)\n# [1] TRUE\n# mtcars <- lapply(mtcars, as.integer)\n#> is.data.frame(mtcars)\n# [1] FALSE\n```\n:::\n\n\n## Applications\n\nApplications copied from cohort 2 slide\n\n### Lookup tables (character subsetting)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(\"m\", \"f\", \"u\", \"f\", \"f\", \"m\", \"m\")\nlookup <- c(m = \"Male\", f = \"Female\", u = NA)\nlookup[x]\n# m f u f f m m \n# \"Male\" \"Female\" NA \"Female\" \"Female\" \"Male\" \"Male\"\n```\n:::\n\n\n### Matching and merging by hand (integer subsetting)\n\n- The `match()` function allows merging a vector with a table\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngrades <- c(\"D\", \"A\", \"C\", \"B\", \"F\")\ninfo <- data.frame(\n grade = c(\"A\", \"B\", \"C\", \"D\", \"F\"),\n desc = c(\"Excellent\", \"Very Good\", \"Average\", \"Fair\", \"Poor\"),\n fail = c(F, F, F, F, T)\n)\nid <- match(grades, info$grade)\nid\n# [1] 3 2 2 1 3\ninfo[id, ]\n# grade desc fail\n# 4 D Fair FALSE\n# 1 A Excellent FALSE\n# 3 C Average FALSE\n# 2 B Very Good FALSE\n# 5 F Poor TRUE\n```\n:::\n\n\n### Random samples and bootstrapping (integer subsetting)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# mtcars[sample(nrow(mtcars), 3), ] # use replace = TRUE to replace\n# mpg cyl disp hp drat wt qsec vs am gear carb\n# Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2\n# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4\n# Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4\n```\n:::\n\n\n### Ordering (integer subsetting)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# mtcars[order(mtcars$mpg), ]\n# mpg cyl disp hp drat wt qsec vs am gear carb\n# Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4\n# Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4\n# Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4\n# Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4\n# Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4\n# Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8\n# ...\n```\n:::\n\n\n### Expanding aggregated counts (integer subsetting)\n\n- We can expand a count column by using `rep()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- tibble::tibble(x = c(\"Amy\", \"Julie\", \"Brian\"), n = c(2, 1, 3))\ndf[rep(1:nrow(df), df$n), ]\n# A tibble: 6 x 2\n# x n\n# <chr> <dbl>\n# 1 Amy 2\n# 2 Amy 2\n# 3 Julie 1\n# 4 Brian 3\n# 5 Brian 3\n# 6 Brian 3\n```\n:::\n\n\n### Removing columns from data frames (character)\n\n- We can remove a column by subsetting, which does not change the object\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf[, 1]\n# A tibble: 3 x 1\n# x \n# <chr>\n# 1 Amy \n# 2 Julie\n# 3 Brian\n```\n:::\n\n\n- We can also delete the column using `NULL`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf$n <- NULL\ndf\n# A tibble: 3 x 1\n# x \n# <chr>\n# 1 Amy \n# 2 Julie\n# 3 Brian\n```\n:::\n\n\n### Selecting rows based on a condition (logical subsetting)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# mtcars[mtcars$gear == 5, ]\n# mpg cyl disp hp drat wt qsec vs am gear carb\n# Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2\n# Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2\n# Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4\n# Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6\n# Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8\n```\n:::\n\n\n### Boolean algebra versus sets (logical and integer)\n\n- `which()` gives the indices of a Boolean vector\n\n\n::: {.cell}\n\n```{.r .cell-code}\n(x1 <- 1:10 %% 2 == 0) # 1-10 divisible by 2\n# [1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE\n(x2 <- which(x1))\n# [1] 2 4 6 8 10\n(y1 <- 1:10 %% 5 == 0) # 1-10 divisible by 5\n# [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE\n(y2 <- which(y1))\n# [1] 5 10\nx1 & y1\n# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE\n```\n:::\n\n", + "supporting": [ + "04_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/05/execute-results/html.json b/_freeze/slides/05/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "4832252d8573ca7203301b8e1bf043cf", + "hash": "80a70ab33aa94f43d33430a9c9b397e1", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Control flow\n---\n\n## Learning objectives:\n\n- Learn the **tools** for controlling flow of execution.\n\n- Learn some technical pitfalls and (perhaps lesser known) useful features.\n\n\n::: {.cell layout-align=\"left\"}\n::: {.cell-output-display}\n{fig-align='left' width=518}\n:::\n:::\n\n\n::: {.cell layout-align=\"right\"}\n::: {.cell-output-display}\n{fig-align='right' width=520}\n:::\n:::\n\n\n---\n\n## Introduction\n\nThere are two main groups of flow control tools: **choices** and **loops**: \n\n- Choices (`if`, `switch`, `ifelse`, `dplyr::if_else`, `dplyr::case_when`) allow you to run different code depending on the input. \n \n- Loops (`for`, `while`, `repeat`) allow you to repeatedly run code \n\n\n---\n\n\n## Choices\n\n\n\n`if()` and `else`\n\nUse `if` to specify a block of code to be executed, if a specified condition is true. Use `else` to specify a block of code to be executed, if the same condition is false. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nif (condition) true_action\nif (condition) true_action else false_action\n```\n:::\n\n\n(Note braces are only *needed* for compound expressions)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nif (test_expression) { \n true_action\n} else {\n false_action\n}\n```\n:::\n\n\n\nCan be expanded to more alternatives:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nif (test_expression) { \n true_action\n} else if (other_test_expression) {\n other_action\n} else {\n false_action\n}\n```\n:::\n\n\n\n## Exercise {-}\nWhy does this work?\n```\nx <- 1:10\nif (length(x)) \"not empty\" else \"empty\"\n#> [1] \"not empty\"\n\nx <- numeric()\nif (length(x)) \"not empty\" else \"empty\"\n#> [1] \"empty\"\n```\n\n`if` returns a value which can be assigned\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx1 <- if (TRUE) 1 else 2\nx2 <- if (FALSE) 1 else 2\n\nc(x1, x2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2\n```\n\n\n:::\n:::\n\n\nThe book recommends assigning the results of an if statement only when the entire expression fits on one line; otherwise it tends to be hard to read.\n\n\n## Single if without else {-}\n\nWhen you use the single argument form without an else statement, if invisibly (Section 6.7.2) returns NULL if the condition is FALSE. Since functions like c() and paste() drop NULL inputs, this allows for a compact expression of certain idioms:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngreet <- function(name, birthday = FALSE) {\n paste0(\n \"Hi \", name,\n if (birthday) \" and HAPPY BIRTHDAY\"\n )\n}\ngreet(\"Maria\", FALSE)\n#> [1] \"Hi Maria\"\ngreet(\"Jaime\", TRUE)\n#> [1] \"Hi Jaime and HAPPY BIRTHDAY\"\n```\n:::\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nformat_lane_text <- function(number){\n\n paste0(\n number,\n \" lane\",\n if (number > 1) \"s\",\n \" of sequencing\"\n )\n}\n\nformat_lane_text(1)\n#> [1] \"1 lane of sequencing\"\nformat_lane_text(4)\n#> [1] \"4 lanes of sequencing\"\n```\n:::\n\n\n\n\n\n## Invalid inputs {-}\n\n- *Condition* must evaluate to a *single* `TRUE` or `FALSE`\n\nA single number gets coerced to a logical type. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nif (56) 1\n#> [1] 1\nif (0.3) 1\n#> [1] 1\nif (0) 1\n```\n:::\n\n\nIf the condition cannot evaluate to a *single* `TRUE` or `FALSE`, an error is (usually) produced.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nif (\"text\") 1\n#> Error in if (\"text\") 1: argument is not interpretable as logical\nif (\"true\") 1 \n#> 1\nif (numeric()) 1\n#> Error in if (numeric()) 1: argument is of length zero\nif (NULL) 1\n#> Error in if (NULL) 1 : argument is of length zero\nif (NA) 1\n#> Error in if (NA) 1: missing value where TRUE/FALSE needed\n```\n:::\n\n\n\nException is a logical vector of length greater than 1, which only generates a warning, unless you have `_R_CHECK_LENGTH_1_CONDITION_` set to `TRUE`. \nThis seems to have been the default since R-4.2.0\n\n\n::: {.cell}\n\n```{.r .cell-code}\nif (c(TRUE, FALSE)) 1\n#>Error in if (c(TRUE, FALSE)) 1 : the condition has length > 1\n```\n:::\n\n\n## Vectorized choices {-}\n\n- `ifelse()` is a vectorized version of `if`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 1:10\nifelse(x %% 5 == 0, \"XXX\", as.character(x))\n#> [1] \"1\" \"2\" \"3\" \"4\" \"XXX\" \"6\" \"7\" \"8\" \"9\" \"XXX\"\n\nifelse(x %% 2 == 0, \"even\", \"odd\")\n#> [1] \"odd\" \"even\" \"odd\" \"even\" \"odd\" \"even\" \"odd\" \"even\" \"odd\" \"even\"\n```\n:::\n\n\n- `dplyr::if_else()`\n\n- Book recommends only using `ifelse()` \"only when the yes and no vectors are the same type as it is otherwise hard to predict the output type.\" \n\n- `dplyr::if_else()` enforces this recommendation.\n\n**For example:**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nifelse(c(TRUE,TRUE,FALSE),\"a\",3)\n#> [1] \"a\" \"a\" \"3\"\ndplyr::if_else(c(TRUE,TRUE,FALSE),\"a\",3)\n#> Error in `dplyr::if_else()`:\n#> ! `false` must be a character vector, not a double vector.\n```\n:::\n\n \n## Switch {-}\n\nRather then string together multiple if - else if chains, you can often use `switch`.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncentre <- function(x, type) {\n switch(type,\n mean = mean(x),\n median = median(x),\n trimmed = mean(x, trim = .1),\n stop(\"Invalid `type` value\")\n )\n}\n```\n:::\n\n\nLast component should always throw an error, as unmatched inputs would otherwise invisibly return NULL.\nBook recommends to only use character inputs for `switch()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvec <- c(1:20,50:55)\ncentre(vec, \"mean\")\n#> [1] 20.19231\ncentre(vec, \"median\")\n#> [1] 13.5\ncentre(vec, \"trimmed\")\n#> [1] 18.77273\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(123)\nx <- rlnorm(100)\n\ncenters <- data.frame(type = c('mean', 'median', 'trimmed'))\ncenters$value = sapply(centers$type, \\(t){centre(x,t)})\n\nrequire(ggplot2)\nggplot(data = data.frame(x), aes(x))+\n geom_density()+\n geom_vline(data = centers, \n mapping = aes(color = type, xintercept = value), \n linewidth=0.5,linetype=\"dashed\") +\n xlim(-1,10)+\n theme_bw()\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n\nExample from book of \"falling through\" to next value\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlegs <- function(x) {\n switch(x,\n cow = ,\n horse = ,\n dog = 4,\n human = ,\n chicken = 2,\n plant = 0,\n stop(\"Unknown input\")\n )\n}\nlegs(\"cow\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 4\nlegs(\"dog\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 4\n```\n:::\n\n\n\n\n\n## Using `dplyr::case_when` {-}\n\n- `case_when` is a more general `if_else` and can be used often in place of multiple chained `if_else` or sapply'ing `switch`.\n\n- It uses a special syntax to allow any number of condition-vector pairs:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(123)\nx <- rlnorm(100)\n\ncenters <- data.frame(type = c('mean', 'median', 'trimmed'))\n\ncenters$value = dplyr::case_when(\n centers$type == 'mean' ~ mean(x),\n centers$type == 'median' ~ median(x),\n centers$type == 'trimmed' ~ mean(x, trim = 0.1),\n .default = 1000\n )\n\ncenters\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> type value\n#> 1 mean 1.652545\n#> 2 median 1.063744\n#> 3 trimmed 1.300568\n```\n\n\n:::\n:::\n\n\n \n\n## Loops\n\n- Iteration over a elements of a vector\n\n`for (item in vector) perform_action`\n\n**First example**\n\n::: {.cell}\n\n```{.r .cell-code}\nfor(i in 1:5) {\n print(1:i)\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n#> [1] 1 2\n#> [1] 1 2 3\n#> [1] 1 2 3 4\n#> [1] 1 2 3 4 5\n```\n\n\n:::\n\n```{.r .cell-code}\nx <- numeric(length=5L)\ndf <- data.frame(x=1:5)\n\nfor(i in 1:5) {\n df$y[[i]] <- i+1\n}\n```\n:::\n\n\n\n**Second example**: terminate a *for loop* earlier\n\n- `next` skips rest of current iteration\n- `break` exits the loop entirely\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfor (i in 1:10) {\n if (i < 3) \n next\n\n print(i)\n \n if (i >= 5)\n break\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3\n#> [1] 4\n#> [1] 5\n```\n\n\n:::\n:::\n\n\n## Exercise {-}\n\nWhen the following code is evaluated, what can you say about the vector being iterated?\n```\nxs <- c(1, 2, 3)\nfor (x in xs) {\n xs <- c(xs, x * 2)\n}\nxs\n#> [1] 1 2 3 2 4 6\n```\n\n## Pitfalls {-}\n\n- Preallocate output containers to avoid *slow* code. \n\n- Beware that `1:length(v)` when `v` has length 0 results in a iterating backwards over `1:0`, probably not what is intended. Use `seq_along(v)` instead.\n\n- When iterating over S3 vectors, use `[[]]` yourself to avoid stripping attributes. \n\n```\nxs <- as.Date(c(\"2020-01-01\", \"2010-01-01\"))\nfor (x in xs) {\n print(x)\n}\n#> [1] 18262\n#> [1] 14610\n```\nvs. \n```\nfor (i in seq_along(xs)) {\n print(xs[[i]])\n}\n#> [1] \"2020-01-01\"\n#> [1] \"2010-01-01\"\n```\n\n## Related tools {-}\n\n- `while(condition) action`: performs action while condition is TRUE.\n- `repeat(action)`: repeats action forever (i.e. until it encounters break).\n\n- Note that `for` can be rewritten as `while` and while can be rewritten as `repeat` (this goes in one direction only!); *however*:\n\n>Good practice is to use the least-flexible solution to a problem, so you should use `for` wherever possible.\nBUT you shouldn't even use for loops for data analysis tasks as `map()` and `apply()` already provide *less flexible* solutions to most problems. (More in Chapter 9.)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfor (i in 1:5) {\n print(i)\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n#> [1] 2\n#> [1] 3\n#> [1] 4\n#> [1] 5\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx_option <- function(x) {\n switch(x,\n a = \"option 1\",\n b = \"option 2\",\n c = \"option 3\"#,\n #stop(\"Invalid `x` value\")\n )\n}\n```\n:::\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ni <- 1\n\nwhile(i <=5 ) {\n print(i)\n i <- i+1\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n#> [1] 2\n#> [1] 3\n#> [1] 4\n#> [1] 5\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ni <- 1\n\nrepeat {\n print(i)\n i <- i+1\n if (i > 5) break\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n#> [1] 2\n#> [1] 3\n#> [1] 4\n#> [1] 5\n```\n\n\n:::\n:::\n\n\n\n---\n\n\n\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/96eY6YS_3hU\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/x5I_uHnMxIk\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/u6UMGWDuxDE\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/G4YOvwsSw2Q\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/AZwJjsl8xiI\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/wg2QZ3rMIqM\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:16:34\tFederica Gazzelloni:\thttps://github.com/r4ds/bookclub-Advanced_R\n00:22:28\tFederica Gazzelloni:\thttps://stackoverflow.com/questions/50646133/dplyr-if-else-vs-base-r-ifelse\n00:26:20\tTrevin:\tcase_when() is great, makes it easy to read\n00:54:01\tTrevin:\tout[I, ]\n00:54:14\tTrevin:\tout[i, ]\n00:55:03\tTrevin:\tI think you have to specify number of rows and columns before..\n00:55:30\tTrevin:\titerations = 10\n variables = 2\n\n output <- matrix(ncol=variables, nrow=iterations)\n00:55:43\tTrevin:\thttps://stackoverflow.com/questions/13442461/populating-a-data-frame-in-r-in-a-loop\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/W9CoQ15NlOc\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n\n<summary>Meeting chat log</summary>\n```\n00:40:18\tRyan Honomichl:\tWhat type of vector does each of the following calls to ifelse() return?\n\n* \"ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE.\"\n00:42:11\tRyan Honomichl:\t\"I recommend assigning the results of an if statement only when the entire expression fits on one line; otherwise it tends to be hard to read\"\n00:42:46\tRyan Honomichl:\t* When you use the single argument form without an `else` statement, `if` invisibly returns NULL if the condition is FALSE. \n\n- Since functions like c() and paste() drop NULL inputs, this allows for a compact expression of certain idioms\n00:54:15\tcollinberke:\thttps://docs.google.com/spreadsheets/d/1ScrbEw_-vB9DruaJhjtVY8HLQmuNPqyWeOOjmG6OY1M/edit?usp=sharing\n00:58:46\tcollinberke:\thttps://www.youtube.com/@safe4democracy/videos\n```\n</details>\n", + "markdown": "---\nengine: knitr\ntitle: Control flow\n---\n\n## Learning objectives:\n\n- Learn the **tools** for controlling flow of execution.\n\n- Learn some technical pitfalls and (perhaps lesser known) useful features.\n\n\n::: {.cell layout-align=\"left\"}\n::: {.cell-output-display}\n{fig-align='left' width=518}\n:::\n:::\n\n\n::: {.cell layout-align=\"right\"}\n::: {.cell-output-display}\n{fig-align='right' width=520}\n:::\n:::\n\n\n---\n\n## Introduction\n\nThere are two main groups of flow control tools: **choices** and **loops**: \n\n- Choices (`if`, `switch`, `ifelse`, `dplyr::if_else`, `dplyr::case_when`) allow you to run different code depending on the input. \n \n- Loops (`for`, `while`, `repeat`) allow you to repeatedly run code \n\n\n---\n\n\n## Choices\n\n\n\n`if()` and `else`\n\nUse `if` to specify a block of code to be executed, if a specified condition is true. Use `else` to specify a block of code to be executed, if the same condition is false. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nif (condition) true_action\nif (condition) true_action else false_action\n```\n:::\n\n\n(Note braces are only *needed* for compound expressions)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nif (test_expression) { \n true_action\n} else {\n false_action\n}\n```\n:::\n\n\n\nCan be expanded to more alternatives:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nif (test_expression) { \n true_action\n} else if (other_test_expression) {\n other_action\n} else {\n false_action\n}\n```\n:::\n\n\n\n## Exercise {-}\nWhy does this work?\n```\nx <- 1:10\nif (length(x)) \"not empty\" else \"empty\"\n#> [1] \"not empty\"\n\nx <- numeric()\nif (length(x)) \"not empty\" else \"empty\"\n#> [1] \"empty\"\n```\n\n`if` returns a value which can be assigned\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx1 <- if (TRUE) 1 else 2\nx2 <- if (FALSE) 1 else 2\n\nc(x1, x2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2\n```\n\n\n:::\n:::\n\n\nThe book recommends assigning the results of an if statement only when the entire expression fits on one line; otherwise it tends to be hard to read.\n\n\n## Single if without else {-}\n\nWhen you use the single argument form without an else statement, if invisibly (Section 6.7.2) returns NULL if the condition is FALSE. Since functions like c() and paste() drop NULL inputs, this allows for a compact expression of certain idioms:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngreet <- function(name, birthday = FALSE) {\n paste0(\n \"Hi \", name,\n if (birthday) \" and HAPPY BIRTHDAY\"\n )\n}\ngreet(\"Maria\", FALSE)\n#> [1] \"Hi Maria\"\ngreet(\"Jaime\", TRUE)\n#> [1] \"Hi Jaime and HAPPY BIRTHDAY\"\n```\n:::\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nformat_lane_text <- function(number){\n\n paste0(\n number,\n \" lane\",\n if (number > 1) \"s\",\n \" of sequencing\"\n )\n}\n\nformat_lane_text(1)\n#> [1] \"1 lane of sequencing\"\nformat_lane_text(4)\n#> [1] \"4 lanes of sequencing\"\n```\n:::\n\n\n\n\n\n## Invalid inputs {-}\n\n- *Condition* must evaluate to a *single* `TRUE` or `FALSE`\n\nA single number gets coerced to a logical type. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nif (56) 1\n#> [1] 1\nif (0.3) 1\n#> [1] 1\nif (0) 1\n```\n:::\n\n\nIf the condition cannot evaluate to a *single* `TRUE` or `FALSE`, an error is (usually) produced.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nif (\"text\") 1\n#> Error in if (\"text\") 1: argument is not interpretable as logical\nif (\"true\") 1 \n#> 1\nif (numeric()) 1\n#> Error in if (numeric()) 1: argument is of length zero\nif (NULL) 1\n#> Error in if (NULL) 1 : argument is of length zero\nif (NA) 1\n#> Error in if (NA) 1: missing value where TRUE/FALSE needed\n```\n:::\n\n\n\nException is a logical vector of length greater than 1, which only generates a warning, unless you have `_R_CHECK_LENGTH_1_CONDITION_` set to `TRUE`. \nThis seems to have been the default since R-4.2.0\n\n\n::: {.cell}\n\n```{.r .cell-code}\nif (c(TRUE, FALSE)) 1\n#>Error in if (c(TRUE, FALSE)) 1 : the condition has length > 1\n```\n:::\n\n\n## Vectorized choices {-}\n\n- `ifelse()` is a vectorized version of `if`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 1:10\nifelse(x %% 5 == 0, \"XXX\", as.character(x))\n#> [1] \"1\" \"2\" \"3\" \"4\" \"XXX\" \"6\" \"7\" \"8\" \"9\" \"XXX\"\n\nifelse(x %% 2 == 0, \"even\", \"odd\")\n#> [1] \"odd\" \"even\" \"odd\" \"even\" \"odd\" \"even\" \"odd\" \"even\" \"odd\" \"even\"\n```\n:::\n\n\n- `dplyr::if_else()`\n\n- Book recommends only using `ifelse()` \"only when the yes and no vectors are the same type as it is otherwise hard to predict the output type.\" \n\n- `dplyr::if_else()` enforces this recommendation.\n\n**For example:**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nifelse(c(TRUE,TRUE,FALSE),\"a\",3)\n#> [1] \"a\" \"a\" \"3\"\ndplyr::if_else(c(TRUE,TRUE,FALSE),\"a\",3)\n#> Error in `dplyr::if_else()`:\n#> ! `false` must be a character vector, not a double vector.\n```\n:::\n\n \n## Switch {-}\n\nRather then string together multiple if - else if chains, you can often use `switch`.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncentre <- function(x, type) {\n switch(type,\n mean = mean(x),\n median = median(x),\n trimmed = mean(x, trim = .1),\n stop(\"Invalid `type` value\")\n )\n}\n```\n:::\n\n\nLast component should always throw an error, as unmatched inputs would otherwise invisibly return NULL.\nBook recommends to only use character inputs for `switch()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvec <- c(1:20,50:55)\ncentre(vec, \"mean\")\n#> [1] 20.19231\ncentre(vec, \"median\")\n#> [1] 13.5\ncentre(vec, \"trimmed\")\n#> [1] 18.77273\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(123)\nx <- rlnorm(100)\n\ncenters <- data.frame(type = c('mean', 'median', 'trimmed'))\ncenters$value = sapply(centers$type, \\(t){centre(x,t)})\n\nrequire(ggplot2)\nggplot(data = data.frame(x), aes(x))+\n geom_density()+\n geom_vline(data = centers, \n mapping = aes(color = type, xintercept = value), \n linewidth=0.5,linetype=\"dashed\") +\n xlim(-1,10)+\n theme_bw()\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n\nExample from book of \"falling through\" to next value\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlegs <- function(x) {\n switch(x,\n cow = ,\n horse = ,\n dog = 4,\n human = ,\n chicken = 2,\n plant = 0,\n stop(\"Unknown input\")\n )\n}\nlegs(\"cow\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 4\nlegs(\"dog\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 4\n```\n:::\n\n\n\n\n\n## Using `dplyr::case_when` {-}\n\n- `case_when` is a more general `if_else` and can be used often in place of multiple chained `if_else` or sapply'ing `switch`.\n\n- It uses a special syntax to allow any number of condition-vector pairs:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(123)\nx <- rlnorm(100)\n\ncenters <- data.frame(type = c('mean', 'median', 'trimmed'))\n\ncenters$value = dplyr::case_when(\n centers$type == 'mean' ~ mean(x),\n centers$type == 'median' ~ median(x),\n centers$type == 'trimmed' ~ mean(x, trim = 0.1),\n .default = 1000\n )\n\ncenters\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> type value\n#> 1 mean 1.652545\n#> 2 median 1.063744\n#> 3 trimmed 1.300568\n```\n\n\n:::\n:::\n\n\n \n\n## Loops\n\n- Iteration over a elements of a vector\n\n`for (item in vector) perform_action`\n\n**First example**\n\n::: {.cell}\n\n```{.r .cell-code}\nfor(i in 1:5) {\n print(1:i)\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n#> [1] 1 2\n#> [1] 1 2 3\n#> [1] 1 2 3 4\n#> [1] 1 2 3 4 5\n```\n\n\n:::\n\n```{.r .cell-code}\nx <- numeric(length=5L)\ndf <- data.frame(x=1:5)\n\nfor(i in 1:5) {\n df$y[[i]] <- i+1\n}\n```\n:::\n\n\n\n**Second example**: terminate a *for loop* earlier\n\n- `next` skips rest of current iteration\n- `break` exits the loop entirely\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfor (i in 1:10) {\n if (i < 3) \n next\n\n print(i)\n \n if (i >= 5)\n break\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3\n#> [1] 4\n#> [1] 5\n```\n\n\n:::\n:::\n\n\n## Exercise {-}\n\nWhen the following code is evaluated, what can you say about the vector being iterated?\n```\nxs <- c(1, 2, 3)\nfor (x in xs) {\n xs <- c(xs, x * 2)\n}\nxs\n#> [1] 1 2 3 2 4 6\n```\n\n## Pitfalls {-}\n\n- Preallocate output containers to avoid *slow* code. \n\n- Beware that `1:length(v)` when `v` has length 0 results in a iterating backwards over `1:0`, probably not what is intended. Use `seq_along(v)` instead.\n\n- When iterating over S3 vectors, use `[[]]` yourself to avoid stripping attributes. \n\n```\nxs <- as.Date(c(\"2020-01-01\", \"2010-01-01\"))\nfor (x in xs) {\n print(x)\n}\n#> [1] 18262\n#> [1] 14610\n```\nvs. \n```\nfor (i in seq_along(xs)) {\n print(xs[[i]])\n}\n#> [1] \"2020-01-01\"\n#> [1] \"2010-01-01\"\n```\n\n## Related tools {-}\n\n- `while(condition) action`: performs action while condition is TRUE.\n- `repeat(action)`: repeats action forever (i.e. until it encounters break).\n\n- Note that `for` can be rewritten as `while` and while can be rewritten as `repeat` (this goes in one direction only!); *however*:\n\n>Good practice is to use the least-flexible solution to a problem, so you should use `for` wherever possible.\nBUT you shouldn't even use for loops for data analysis tasks as `map()` and `apply()` already provide *less flexible* solutions to most problems. (More in Chapter 9.)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfor (i in 1:5) {\n print(i)\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n#> [1] 2\n#> [1] 3\n#> [1] 4\n#> [1] 5\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx_option <- function(x) {\n switch(x,\n a = \"option 1\",\n b = \"option 2\",\n c = \"option 3\"#,\n #stop(\"Invalid `x` value\")\n )\n}\n```\n:::\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ni <- 1\n\nwhile(i <=5 ) {\n print(i)\n i <- i+1\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n#> [1] 2\n#> [1] 3\n#> [1] 4\n#> [1] 5\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ni <- 1\n\nrepeat {\n print(i)\n i <- i+1\n if (i > 5) break\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n#> [1] 2\n#> [1] 3\n#> [1] 4\n#> [1] 5\n```\n\n\n:::\n:::\n\n", "supporting": [ "05_files" ], diff --git a/_freeze/slides/05/figure-html/unnamed-chunk-16-1.png b/_freeze/slides/05/figure-html/unnamed-chunk-16-1.png Binary files differ. diff --git a/_freeze/slides/06/execute-results/html.json b/_freeze/slides/06/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "7b44c118f36a5ad2df2f9d673709df32", + "hash": "583568141f76bf50bb670a205c5884b6", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Functions\n---\n\n## Learning objectives:\n\n- How to make functions in R\n- What are the parts of a function\n- Nested functions\n\n## How to make a simple function in R\n\nLet's imagine the structure of a function:\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n{fig-align='center' width=133}\n:::\n:::\n\n\n\n**Function components**\n\nFunctions have three parts, `formals()`, `body()`, and `environment()`.\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n\n```{=html}\n<div id=\"htmlwidget-3af49631d9fe5bef303e\" style=\"width:100%;height:100%;\" class=\"DiagrammeR html-widget\"></div>\n<script type=\"application/json\" data-for=\"htmlwidget-3af49631d9fe5bef303e\">{\"x\":{\"diagram\":\"\\n graph LR\\n A{formals}-->B(body)\\n B-->C(environment)\\nstyle A fill:#bbf,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5\\n\"},\"evals\":[],\"jsHooks\":[]}</script>\n```\n\n:::\n:::\n\n\n**Example**\n\n\n::: {.cell}\n\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncoffee_ratings%>%slice(1:3)%>%select(1:5)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 5\n#> total_cup_points species owner country_of_origin farm_name \n#> <dbl> <chr> <chr> <chr> <chr> \n#> 1 90.6 Arabica metad plc Ethiopia \"metad pl…\n#> 2 89.9 Arabica metad plc Ethiopia \"metad pl…\n#> 3 89.8 Arabica grounds for health admin Guatemala \"san marc…\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\navg_points <- function(species){\n # this function is for calculating the mean\n avg <- coffee_ratings %>% \n filter(species == species) %>% \n summarise(mean = mean(total_cup_points))\n \n return(avg)\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\navg_points(\"Arabica\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 1 × 1\n#> mean\n#> <dbl>\n#> 1 82.1\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nformals(avg_points)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $species\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbody(avg_points)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> {\n#> avg <- coffee_ratings %>% filter(species == species) %>% \n#> summarise(mean = mean(total_cup_points))\n#> return(avg)\n#> }\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nenvironment(avg_points)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: R_GlobalEnv>\n```\n\n\n:::\n:::\n\n\nFunctions uses attributes, one attribute used by base R is `srcref`, short for **source reference**. It points to the source code used to create the function. It contains code comments and other formatting.\n\n::: {.cell}\n\n```{.r .cell-code}\nattr(avg_points, \"srcref\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n## Primitive functions\n\nAre the core function in base R, such as `sum()`\n\n::: {.cell}\n\n```{.r .cell-code}\nsum\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (..., na.rm = FALSE) .Primitive(\"sum\")\n```\n\n\n:::\n:::\n\n\nType of primitives:\n\n- builtin \n- special\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(sum)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"builtin\"\n```\n\n\n:::\n:::\n\n\n\nThese core functions have components to NULL.\n\n## Anonymous functions\n\nIf you don't provide a name to a function\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlapply(mtcars%>%select(mpg,cyl), function(x) length(unique(x)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $mpg\n#> [1] 25\n#> \n#> $cyl\n#> [1] 3\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvector_len <- function(x) {\n length(unique(x))\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlapply(mtcars%>%select(mpg,cyl), vector_len)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $mpg\n#> [1] 25\n#> \n#> $cyl\n#> [1] 3\n```\n\n\n:::\n:::\n\n\n**Invoking a function**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nargs <- unique(coffee_ratings$species) %>% \n `[[`(1) %>% \n as.list()\n \n \n \ndo.call(avg_points, args)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 1 × 1\n#> mean\n#> <dbl>\n#> 1 82.1\n```\n\n\n:::\n:::\n\n\n\n\n## Function composition\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsquare <- function(x) x^2\ndeviation <- function(x) x - mean(x)\nx <- runif(100)\nsqrt(mean(square(deviation(x))))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0.2933335\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nout <- deviation(x)\nout <- square(out)\nout <- mean(out)\nout <- sqrt(out)\nout\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0.2933335\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx %>%\n deviation() %>%\n square() %>%\n mean() %>%\n sqrt()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0.2933335\n```\n\n\n:::\n:::\n\n\n## More about functions insights\n\n### Lexical scoping\n\n**Rules**\n\n- Name masking\n- Functions versus variables\n- A fresh start\n- Dynamic lookup\n\n\n**Debugging**\n\nThis function \n\n::: {.cell}\n\n```{.r .cell-code}\ng12 <- function() x + 1\nx <- 15\ng12()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 16\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncodetools::findGlobals(g12)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"+\" \"x\"\n```\n\n\n:::\n:::\n\n\nYou can change the function’s environment to an environment which contains nothing:\n\n::: {.cell}\n\n```{.r .cell-code}\n# environment(g12) <- emptyenv()\n# g12()\n# Error in x + 1 : could not find function \"+\"\n```\n:::\n\n\n\n### ... (dot-dot-dot)\n\n**Example**\n\n\n::: {.cell}\n\n```{.r .cell-code}\ni01 <- function(y, z) {\n list(y = y, z = z)\n}\ni02 <- function(x, ...) {\n i01(...)\n}\nstr(i02(x = 1, y = 2, z = 3))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 2\n#> $ y: num 2\n#> $ z: num 3\n```\n\n\n:::\n\n```{.r .cell-code}\n#> List of 2\n#> $ y: num 2\n#> $ z: num 3\n```\n:::\n\n\n### Exiting a function\n\n1. Implicit or explicit returns\n\n2. Invisibility (`<-` most famous function that returns an invisible value)\n\n3. `stop()` to stop a function with an error.\n\n4. Exit handlers\n\n\n### Function forms\n\n>Everything that exists is an object.\nEverything that happens is a function call.\n— John Chambers\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n{width=584}\n:::\n:::\n\n\n---\n\n\n## Case Study: SIR model function\n\nThis is an interesting example taken from a course on Coursera: [Infectious disease modelling-ICL](https://www.coursera.org/specializations/infectious-disease-modelling)\n\nThe purpose of this example is to show how to make a model passing through making a function.\n\nFirst we need to load some useful libraries:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(deSolve)\nlibrary(reshape2)\n```\n:::\n\n\n\nThen set the model inputs:\n\n- population size (N)\n- number of susceptable (S)\n- infected (I)\n- recovered (R)\n\nAnd add the model parameters:\n\n- infection rate ($\\beta$)\n- recovery rate ($\\gamma$)\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nN<- 100000 # population\n\nstate_values<- c(S = N -1, # susceptible\n I = 1, # infected\n R = 0) # recovered\n\nparameters<- c(beta = 1/2, # infection rate days^-1\n gamma = 1/4) # recovery rate days^-1\n```\n:::\n\n\n\nThen we set the **time** as an important factor, which defines the length of time we are looking at this model run. It is intended as the time range in which the infections spread out, let’s say that we are aiming to investigate an infection period of 100 days.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntimes<- seq(0, 100, by = 1)\n```\n:::\n\n\nFinally, we set up the **SIR model**, the susceptable, infected and recovered model. How do we do that is passing the paramenters through a function of the time, and state.\n\nWithin the model function we calculate one more paramenter, the **force of infection**: $\\lambda$ \n\n\n::: {.cell}\n\n```{.r .cell-code}\nsir_model<- function(time, state, parameters){\n with(as.list(c(state, parameters)),{\n N<- S + I + R\n lambda = beta * I/N # force of infection\n dS<- - lambda * S \n dI<- lambda * S - gamma * I\n dR<- gamma * I\n return(list(c(dS,dI,dR)))\n })\n}\n```\n:::\n\n\n\nOnce we have our **SIR model** function ready, we can calculate the **output** of the model, with the help of the function `ode()` from {deSolve} package.\n\n\n::: {.cell}\n\n```{.r .cell-code}\noutput<- as.data.frame(ode(y = state_values,\n times = times,\n func = sir_model,\n parms = parameters))\noutput %>% head\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> time S I R\n#> 1 0 99999.00 1.000000 0.0000000\n#> 2 1 99998.43 1.284018 0.2840252\n#> 3 2 99997.70 1.648696 0.6487171\n#> 4 3 99996.77 2.116939 1.1169863\n#> 5 4 99995.56 2.718152 1.7182450\n#> 6 5 99994.02 3.490086 2.4902600\n```\n\n\n:::\n:::\n\n\nIn addition to our builtin SIR model function we can have a look at:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n?deSolve::ode()\n```\n:::\n\n\nIt solves **Ordinary Differential Equations**. \n\n\n::: {.cell}\n\n```{.r .cell-code}\ndeSolve:::ode\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (y, times, func, parms, method = c(\"lsoda\", \"lsode\", \n#> \"lsodes\", \"lsodar\", \"vode\", \"daspk\", \"euler\", \"rk4\", \"ode23\", \n#> \"ode45\", \"radau\", \"bdf\", \"bdf_d\", \"adams\", \"impAdams\", \"impAdams_d\", \n#> \"iteration\"), ...) \n#> {\n#> if (is.null(method)) \n#> method <- \"lsoda\"\n#> if (is.list(method)) {\n#> if (!inherits(method, \"rkMethod\")) \n#> stop(\"'method' should be given as string or as a list of class 'rkMethod'\")\n#> out <- rk(y, times, func, parms, method = method, ...)\n#> }\n#> else if (is.function(method)) \n#> out <- method(y, times, func, parms, ...)\n#> else if (is.complex(y)) \n#> out <- switch(match.arg(method), vode = zvode(y, times, \n#> func, parms, ...), bdf = zvode(y, times, func, parms, \n#> mf = 22, ...), bdf_d = zvode(y, times, func, parms, \n#> mf = 23, ...), adams = zvode(y, times, func, parms, \n#> mf = 10, ...), impAdams = zvode(y, times, func, parms, \n#> mf = 12, ...), impAdams_d = zvode(y, times, func, \n#> parms, mf = 13, ...))\n#> else out <- switch(match.arg(method), lsoda = lsoda(y, times, \n#> func, parms, ...), vode = vode(y, times, func, parms, \n#> ...), lsode = lsode(y, times, func, parms, ...), lsodes = lsodes(y, \n#> times, func, parms, ...), lsodar = lsodar(y, times, func, \n#> parms, ...), daspk = daspk(y, times, func, parms, ...), \n#> euler = rk(y, times, func, parms, method = \"euler\", ...), \n#> rk4 = rk(y, times, func, parms, method = \"rk4\", ...), \n#> ode23 = rk(y, times, func, parms, method = \"ode23\", ...), \n#> ode45 = rk(y, times, func, parms, method = \"ode45\", ...), \n#> radau = radau(y, times, func, parms, ...), bdf = lsode(y, \n#> times, func, parms, mf = 22, ...), bdf_d = lsode(y, \n#> times, func, parms, mf = 23, ...), adams = lsode(y, \n#> times, func, parms, mf = 10, ...), impAdams = lsode(y, \n#> times, func, parms, mf = 12, ...), impAdams_d = lsode(y, \n#> times, func, parms, mf = 13, ...), iteration = iteration(y, \n#> times, func, parms, ...))\n#> return(out)\n#> }\n#> <bytecode: 0x0000015da4cbceb0>\n#> <environment: namespace:deSolve>\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmethods(\"ode\")\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in .S3methods(generic.function, class, envir, all.names = all.names, :\n#> function 'ode' appears not to be S3 generic; found functions that look like S3\n#> methods\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] ode.1D ode.2D ode.3D ode.band\n#> see '?methods' for accessing help and source code\n```\n\n\n:::\n:::\n\n\n\n---\n\nWith the help of the {reshape2} package we use the function `melt()` to reshape the output:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmelt(output,id=\"time\") %>% head\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> time variable value\n#> 1 0 S 99999.00\n#> 2 1 S 99998.43\n#> 3 2 S 99997.70\n#> 4 3 S 99996.77\n#> 5 4 S 99995.56\n#> 6 5 S 99994.02\n```\n\n\n:::\n:::\n\n\n\nThe same as usign `pivot_longer()` function.\n\n::: {.cell}\n\n```{.r .cell-code}\noutput%>%\n pivot_longer(cols = c(\"S\",\"I\",\"R\"),\n names_to=\"variable\",\n values_to=\"values\") %>%\n arrange(desc(variable)) %>%\n head\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 6 × 3\n#> time variable values\n#> <dbl> <chr> <dbl>\n#> 1 0 S 99999 \n#> 2 1 S 99998.\n#> 3 2 S 99998.\n#> 4 3 S 99997.\n#> 5 4 S 99996.\n#> 6 5 S 99994.\n```\n\n\n:::\n:::\n\n\n---\n\n\nBefore to proceed with the visualization of the SIR model output we do a bit of investigations.\n\n**What if we want to see how `melt()` function works?**\n\n**What instruments we can use to see inside the function and understand how it works?**\n\n\nUsing just the function name **melt** or `structure()` function with *melt* as an argument, we obtain the same output. To select just the argument of the function we can do `args(melt)`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreshape2:::melt\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (data, ..., na.rm = FALSE, value.name = \"value\") \n#> {\n#> UseMethod(\"melt\", data)\n#> }\n#> <bytecode: 0x0000015dabe62738>\n#> <environment: namespace:reshape2>\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbody(melt)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> {\n#> UseMethod(\"melt\", data)\n#> }\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nformals(melt)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $data\n#> \n#> \n#> $...\n#> \n#> \n#> $na.rm\n#> [1] FALSE\n#> \n#> $value.name\n#> [1] \"value\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nenvironment(melt)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: namespace:reshape2>\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(melt)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"closure\"\n```\n\n\n:::\n:::\n\n\n> \"R functions simulate a closure by keeping an explicit reference to the environment that was active when the function was defined.\"\n\n\nref: [closures](https://www.r-bloggers.com/2015/03/using-closures-as-objects-in-r/)\n\n\n\nTry with `methods()`, or `print(methods(melt))`: Non-visible functions are asterisked!\n\n> The S3 method name is followed by an asterisk * if the method definition is not exported from the package namespace in which the method is defined.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmethods(\"melt\", data)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] melt.array* melt.data.frame* melt.default* melt.list* \n#> [5] melt.matrix* melt.table* \n#> see '?methods' for accessing help and source code\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmethods(class=\"table\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] [ aperm as.data.frame as_tibble Axis \n#> [6] coerce initialize lines melt plot \n#> [11] points print show slotsFromS3 summary \n#> [16] tail \n#> see '?methods' for accessing help and source code\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhelp(UseMethod)\n```\n:::\n\n\n\nWe can access to some of the above calls with `getAnywhere()`, for example here is done for \"melt.data.frame\":\n\n::: {.cell}\n\n```{.r .cell-code}\ngetAnywhere(\"melt.data.frame\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> A single object matching 'melt.data.frame' was found\n#> It was found in the following places\n#> registered S3 method for melt from namespace reshape2\n#> namespace:reshape2\n#> with value\n#> \n#> function (data, id.vars, measure.vars, variable.name = \"variable\", \n#> ..., na.rm = FALSE, value.name = \"value\", factorsAsStrings = TRUE) \n#> {\n#> vars <- melt_check(data, id.vars, measure.vars, variable.name, \n#> value.name)\n#> id.ind <- match(vars$id, names(data))\n#> measure.ind <- match(vars$measure, names(data))\n#> if (!length(measure.ind)) {\n#> return(data[id.vars])\n#> }\n#> args <- normalize_melt_arguments(data, measure.ind, factorsAsStrings)\n#> measure.attributes <- args$measure.attributes\n#> factorsAsStrings <- args$factorsAsStrings\n#> valueAsFactor <- \"factor\" %in% measure.attributes$class\n#> df <- melt_dataframe(data, as.integer(id.ind - 1), as.integer(measure.ind - \n#> 1), as.character(variable.name), as.character(value.name), \n#> as.pairlist(measure.attributes), as.logical(factorsAsStrings), \n#> as.logical(valueAsFactor))\n#> if (na.rm) {\n#> return(df[!is.na(df[[value.name]]), ])\n#> }\n#> else {\n#> return(df)\n#> }\n#> }\n#> <bytecode: 0x0000015da1c4f580>\n#> <environment: namespace:reshape2>\n```\n\n\n:::\n:::\n\n\n\nReferences:\n\n- [stackoverflow article](https://stackoverflow.com/questions/11173683/how-can-i-read-the-source-code-for-an-r-function)\n- [Rnews bulletin: R Help Desk](https://www.r-project.org/doc/Rnews/Rnews_2006-4.pdf)\n\n\n\n---\n\nGoing back to out model output visualization.\n\n\n::: {.cell}\n\n```{.r .cell-code}\noutput_full<- melt(output,id=\"time\")\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\noutput_full$proportion<- output_full$value/sum(state_values)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(data = output, aes(x = time, y = I)) + \n geom_line() + \n xlab(\"Time(days)\") +\n ylab(\"Number of Infected\") + \n labs(\"SIR Model: prevalence of infection\")\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(output_full, aes(time, proportion, color = variable, group = variable)) + \n geom_line() + \n xlab(\"Time(days)\") +\n ylab(\"Prevalence\") + \n labs(color = \"Compartment\", title = \"SIR Model\")\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n\n\n\n---\n\n\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/UwzGhMndWzs\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/51PMEM4Efb8\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/Vwuo-e_Ir0s\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/lg5rzOU6lsg\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/q8K0Jl5hiV0\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/BPd6-G9e32I\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:01:11\tOluwafemi Oyedele:\tHi, Good evening\n00:01:22\tFederica Gazzelloni:\tHello!\n00:43:19\tFederica Gazzelloni:\thttps://r4ds.github.io/bookclub-Advanced_R/QandA/docs/welcome.html\n00:52:48\tPriyanka:\tsounds good actually\n00:52:59\tFederica Gazzelloni:\t👍🏻\n```\n</details>\n\n<iframe src=\"https://www.youtube.com/embed/GCDXXkBQrGk\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:09:30\tOluwafemi Oyedele:\tHi, Good evening\n00:10:41\tFederica Gazzelloni:\tHi\n00:14:40\tFederica Gazzelloni:\tthat's great!\n00:54:24\tTrevin:\tAlso, sorry if you are repeating 🙂\n00:54:52\tArthur Shaw:\t@ryan, thank you so much for the awesome synthesis! Could you share your reference list? I'd love to dive more deeply into the material you presented.\n00:57:02\tRyan Metcalf:\thttps://cran.r-project.org/doc/manuals/r-release/R-lang.pdf\n00:59:32\tTrevin:\thttps://github.com/COHHIO/RmData\n01:01:48\tRyan Metcalf:\thttps://mastering-shiny.org/\n01:02:02\tRyan Metcalf:\thttps://engineering-shiny.org/\n01:02:15\tArthur Shaw:\t@trevin, if you get bored with beepr, move to BRRR ;)\n01:02:16\tArthur Shaw:\thttps://github.com/brooke-watson/BRRR\n01:09:27\tRyan Metcalf:\tThis is amazing Trevin! I'll take a closer look. Is it ok to reach out to you with any questions?\n01:09:43\tTrevin:\tYeah, feel free to reach out\n```\n</details>\n\n<iframe src=\"https://www.youtube.com/embed/NaiQa_u-j1k\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:05:34\tTrevin:\tI didn't catch that\n00:06:02\tpriyanka gagneja:\ti won't be presenting I said .. so you two have the stage\n00:08:39\tFederica Gazzelloni:\tno worries\n00:08:46\tFederica Gazzelloni:\tnext time you do it\n00:08:56\tFederica Gazzelloni:\tdid you sign up?\n00:09:45\tTrevin:\tDiscord is free: https://discord.gg/rstudioconf2022\n00:10:04\tTrevin:\tFree stream link: https://www.rstudio.com/conference/stream\n00:24:32\tArthur Shaw:\tMaybe silly question: is the magrittr pipe an infix function?\n00:32:15\tTrevin:\thttps://colinfay.me/playing-r-infix-functions/\n00:33:23\tArthur Shaw:\tMaybe another example of an infix function: lubridate's `%within%`\n00:33:47\tTrevin:\tThat's a good one too ^\n00:33:55\tpriyanka gagneja:\tyes within would be good.\n00:40:13\tArthur Shaw:\tno\n00:49:50\tArthur Shaw:\tSorry for dropping in and out. My WiFi router is having issues today--maybe is failing.\n01:08:59\tTrevin:\tLooking forward to it 🙂\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/tz2z9l41IhU\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n\n<summary>Meeting chat log</summary>\n```\n00:31:54\tRonald Legere:\thttps://en.wikipedia.org/wiki/First-class_function\n00:42:55\tRonald Legere:\thttps://en.wikipedia.org/wiki/Immediately_invoked_function_expression\n```\n</details>\n\n<iframe src=\"https://www.youtube.com/embed/AbdcI57vbcg\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n\n<summary>Meeting chat log</summary>\n```\n00:54:02\tRon:\tBook gives this simple example of when you might want to use prefix form of an infix operator: lapply(list(1:3, 4:5), `+`, 3)\n00:56:49\tcollinberke:\thttps://colinfay.me/playing-r-infix-functions/#:~:text=What%20are%20infix%20functions%3F,%2C%20%2B%20%2C%20and%20so%20on.\n01:07:36\tRon:\tx[3] <- 33\n01:07:51\tRon:\t`[<-`(x,3,value =33)\n```\n</details>\n", + "markdown": "---\nengine: knitr\ntitle: Functions\n---\n\n## Learning objectives:\n\n- How to make functions in R\n- What are the parts of a function\n- Nested functions\n\n## How to make a simple function in R\n\nLet's imagine the structure of a function:\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n{fig-align='center' width=133}\n:::\n:::\n\n\n\n**Function components**\n\nFunctions have three parts, `formals()`, `body()`, and `environment()`.\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n\n```{=html}\n<div id=\"htmlwidget-401a8f3d7780b76c2b3b\" style=\"width:100%;height:100%;\" class=\"DiagrammeR html-widget\"></div>\n<script type=\"application/json\" data-for=\"htmlwidget-401a8f3d7780b76c2b3b\">{\"x\":{\"diagram\":\"\\n graph LR\\n A{formals}-->B(body)\\n B-->C(environment)\\nstyle A fill:#bbf,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5\\n\"},\"evals\":[],\"jsHooks\":[]}</script>\n```\n\n:::\n:::\n\n\n**Example**\n\n\n::: {.cell}\n\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncoffee_ratings%>%slice(1:3)%>%select(1:5)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 5\n#> total_cup_points species owner country_of_origin farm_name \n#> <dbl> <chr> <chr> <chr> <chr> \n#> 1 90.6 Arabica metad plc Ethiopia \"metad pl…\n#> 2 89.9 Arabica metad plc Ethiopia \"metad pl…\n#> 3 89.8 Arabica grounds for health admin Guatemala \"san marc…\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\navg_points <- function(species){\n # this function is for calculating the mean\n avg <- coffee_ratings %>% \n filter(species == species) %>% \n summarise(mean = mean(total_cup_points))\n \n return(avg)\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\navg_points(\"Arabica\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 1 × 1\n#> mean\n#> <dbl>\n#> 1 82.1\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nformals(avg_points)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $species\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbody(avg_points)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> {\n#> avg <- coffee_ratings %>% filter(species == species) %>% \n#> summarise(mean = mean(total_cup_points))\n#> return(avg)\n#> }\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nenvironment(avg_points)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: R_GlobalEnv>\n```\n\n\n:::\n:::\n\n\nFunctions uses attributes, one attribute used by base R is `srcref`, short for **source reference**. It points to the source code used to create the function. It contains code comments and other formatting.\n\n::: {.cell}\n\n```{.r .cell-code}\nattr(avg_points, \"srcref\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n## Primitive functions\n\nAre the core function in base R, such as `sum()`\n\n::: {.cell}\n\n```{.r .cell-code}\nsum\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (..., na.rm = FALSE) .Primitive(\"sum\")\n```\n\n\n:::\n:::\n\n\nType of primitives:\n\n- builtin \n- special\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(sum)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"builtin\"\n```\n\n\n:::\n:::\n\n\n\nThese core functions have components to NULL.\n\n## Anonymous functions\n\nIf you don't provide a name to a function\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlapply(mtcars%>%select(mpg,cyl), function(x) length(unique(x)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $mpg\n#> [1] 25\n#> \n#> $cyl\n#> [1] 3\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvector_len <- function(x) {\n length(unique(x))\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlapply(mtcars%>%select(mpg,cyl), vector_len)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $mpg\n#> [1] 25\n#> \n#> $cyl\n#> [1] 3\n```\n\n\n:::\n:::\n\n\n**Invoking a function**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nargs <- unique(coffee_ratings$species) %>% \n `[[`(1) %>% \n as.list()\n \n \n \ndo.call(avg_points, args)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 1 × 1\n#> mean\n#> <dbl>\n#> 1 82.1\n```\n\n\n:::\n:::\n\n\n\n\n## Function composition\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsquare <- function(x) x^2\ndeviation <- function(x) x - mean(x)\nx <- runif(100)\nsqrt(mean(square(deviation(x))))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0.2841442\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nout <- deviation(x)\nout <- square(out)\nout <- mean(out)\nout <- sqrt(out)\nout\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0.2841442\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx %>%\n deviation() %>%\n square() %>%\n mean() %>%\n sqrt()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0.2841442\n```\n\n\n:::\n:::\n\n\n## More about functions insights\n\n### Lexical scoping\n\n**Rules**\n\n- Name masking\n- Functions versus variables\n- A fresh start\n- Dynamic lookup\n\n\n**Debugging**\n\nThis function \n\n::: {.cell}\n\n```{.r .cell-code}\ng12 <- function() x + 1\nx <- 15\ng12()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 16\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncodetools::findGlobals(g12)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"+\" \"x\"\n```\n\n\n:::\n:::\n\n\nYou can change the function’s environment to an environment which contains nothing:\n\n::: {.cell}\n\n```{.r .cell-code}\n# environment(g12) <- emptyenv()\n# g12()\n# Error in x + 1 : could not find function \"+\"\n```\n:::\n\n\n\n### ... (dot-dot-dot)\n\n**Example**\n\n\n::: {.cell}\n\n```{.r .cell-code}\ni01 <- function(y, z) {\n list(y = y, z = z)\n}\ni02 <- function(x, ...) {\n i01(...)\n}\nstr(i02(x = 1, y = 2, z = 3))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 2\n#> $ y: num 2\n#> $ z: num 3\n```\n\n\n:::\n\n```{.r .cell-code}\n#> List of 2\n#> $ y: num 2\n#> $ z: num 3\n```\n:::\n\n\n### Exiting a function\n\n1. Implicit or explicit returns\n\n2. Invisibility (`<-` most famous function that returns an invisible value)\n\n3. `stop()` to stop a function with an error.\n\n4. Exit handlers\n\n\n### Function forms\n\n>Everything that exists is an object.\nEverything that happens is a function call.\n— John Chambers\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n{width=584}\n:::\n:::\n\n\n---\n\n\n## Case Study: SIR model function\n\nThis is an interesting example taken from a course on Coursera: [Infectious disease modelling-ICL](https://www.coursera.org/specializations/infectious-disease-modelling)\n\nThe purpose of this example is to show how to make a model passing through making a function.\n\nFirst we need to load some useful libraries:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(deSolve)\nlibrary(reshape2)\n```\n:::\n\n\n\nThen set the model inputs:\n\n- population size (N)\n- number of susceptable (S)\n- infected (I)\n- recovered (R)\n\nAnd add the model parameters:\n\n- infection rate ($\\beta$)\n- recovery rate ($\\gamma$)\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nN<- 100000 # population\n\nstate_values<- c(S = N -1, # susceptible\n I = 1, # infected\n R = 0) # recovered\n\nparameters<- c(beta = 1/2, # infection rate days^-1\n gamma = 1/4) # recovery rate days^-1\n```\n:::\n\n\n\nThen we set the **time** as an important factor, which defines the length of time we are looking at this model run. It is intended as the time range in which the infections spread out, let’s say that we are aiming to investigate an infection period of 100 days.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntimes<- seq(0, 100, by = 1)\n```\n:::\n\n\nFinally, we set up the **SIR model**, the susceptable, infected and recovered model. How do we do that is passing the paramenters through a function of the time, and state.\n\nWithin the model function we calculate one more paramenter, the **force of infection**: $\\lambda$ \n\n\n::: {.cell}\n\n```{.r .cell-code}\nsir_model<- function(time, state, parameters){\n with(as.list(c(state, parameters)),{\n N<- S + I + R\n lambda = beta * I/N # force of infection\n dS<- - lambda * S \n dI<- lambda * S - gamma * I\n dR<- gamma * I\n return(list(c(dS,dI,dR)))\n })\n}\n```\n:::\n\n\n\nOnce we have our **SIR model** function ready, we can calculate the **output** of the model, with the help of the function `ode()` from {deSolve} package.\n\n\n::: {.cell}\n\n```{.r .cell-code}\noutput<- as.data.frame(ode(y = state_values,\n times = times,\n func = sir_model,\n parms = parameters))\noutput %>% head\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> time S I R\n#> 1 0 99999.00 1.000000 0.0000000\n#> 2 1 99998.43 1.284018 0.2840252\n#> 3 2 99997.70 1.648696 0.6487171\n#> 4 3 99996.77 2.116939 1.1169863\n#> 5 4 99995.56 2.718152 1.7182450\n#> 6 5 99994.02 3.490086 2.4902600\n```\n\n\n:::\n:::\n\n\nIn addition to our builtin SIR model function we can have a look at:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n?deSolve::ode()\n```\n:::\n\n\nIt solves **Ordinary Differential Equations**. \n\n\n::: {.cell}\n\n```{.r .cell-code}\ndeSolve:::ode\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (y, times, func, parms, method = c(\"lsoda\", \"lsode\", \n#> \"lsodes\", \"lsodar\", \"vode\", \"daspk\", \"euler\", \"rk4\", \"ode23\", \n#> \"ode45\", \"radau\", \"bdf\", \"bdf_d\", \"adams\", \"impAdams\", \"impAdams_d\", \n#> \"iteration\"), ...) \n#> {\n#> if (is.null(method)) \n#> method <- \"lsoda\"\n#> if (is.list(method)) {\n#> if (!inherits(method, \"rkMethod\")) \n#> stop(\"'method' should be given as string or as a list of class 'rkMethod'\")\n#> out <- rk(y, times, func, parms, method = method, ...)\n#> }\n#> else if (is.function(method)) \n#> out <- method(y, times, func, parms, ...)\n#> else if (is.complex(y)) \n#> out <- switch(match.arg(method), vode = zvode(y, times, \n#> func, parms, ...), bdf = zvode(y, times, func, parms, \n#> mf = 22, ...), bdf_d = zvode(y, times, func, parms, \n#> mf = 23, ...), adams = zvode(y, times, func, parms, \n#> mf = 10, ...), impAdams = zvode(y, times, func, parms, \n#> mf = 12, ...), impAdams_d = zvode(y, times, func, \n#> parms, mf = 13, ...))\n#> else out <- switch(match.arg(method), lsoda = lsoda(y, times, \n#> func, parms, ...), vode = vode(y, times, func, parms, \n#> ...), lsode = lsode(y, times, func, parms, ...), lsodes = lsodes(y, \n#> times, func, parms, ...), lsodar = lsodar(y, times, func, \n#> parms, ...), daspk = daspk(y, times, func, parms, ...), \n#> euler = rk(y, times, func, parms, method = \"euler\", ...), \n#> rk4 = rk(y, times, func, parms, method = \"rk4\", ...), \n#> ode23 = rk(y, times, func, parms, method = \"ode23\", ...), \n#> ode45 = rk(y, times, func, parms, method = \"ode45\", ...), \n#> radau = radau(y, times, func, parms, ...), bdf = lsode(y, \n#> times, func, parms, mf = 22, ...), bdf_d = lsode(y, \n#> times, func, parms, mf = 23, ...), adams = lsode(y, \n#> times, func, parms, mf = 10, ...), impAdams = lsode(y, \n#> times, func, parms, mf = 12, ...), impAdams_d = lsode(y, \n#> times, func, parms, mf = 13, ...), iteration = iteration(y, \n#> times, func, parms, ...))\n#> return(out)\n#> }\n#> <bytecode: 0x0000022b02095d58>\n#> <environment: namespace:deSolve>\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmethods(\"ode\")\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in .S3methods(generic.function, class, envir, all.names = all.names, :\n#> function 'ode' appears not to be S3 generic; found functions that look like S3\n#> methods\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] ode.1D ode.2D ode.3D ode.band\n#> see '?methods' for accessing help and source code\n```\n\n\n:::\n:::\n\n\n\n---\n\nWith the help of the {reshape2} package we use the function `melt()` to reshape the output:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmelt(output,id=\"time\") %>% head\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> time variable value\n#> 1 0 S 99999.00\n#> 2 1 S 99998.43\n#> 3 2 S 99997.70\n#> 4 3 S 99996.77\n#> 5 4 S 99995.56\n#> 6 5 S 99994.02\n```\n\n\n:::\n:::\n\n\n\nThe same as usign `pivot_longer()` function.\n\n::: {.cell}\n\n```{.r .cell-code}\noutput%>%\n pivot_longer(cols = c(\"S\",\"I\",\"R\"),\n names_to=\"variable\",\n values_to=\"values\") %>%\n arrange(desc(variable)) %>%\n head\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 6 × 3\n#> time variable values\n#> <dbl> <chr> <dbl>\n#> 1 0 S 99999 \n#> 2 1 S 99998.\n#> 3 2 S 99998.\n#> 4 3 S 99997.\n#> 5 4 S 99996.\n#> 6 5 S 99994.\n```\n\n\n:::\n:::\n\n\n---\n\n\nBefore to proceed with the visualization of the SIR model output we do a bit of investigations.\n\n**What if we want to see how `melt()` function works?**\n\n**What instruments we can use to see inside the function and understand how it works?**\n\n\nUsing just the function name **melt** or `structure()` function with *melt* as an argument, we obtain the same output. To select just the argument of the function we can do `args(melt)`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreshape2:::melt\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (data, ..., na.rm = FALSE, value.name = \"value\") \n#> {\n#> UseMethod(\"melt\", data)\n#> }\n#> <bytecode: 0x0000022b08d810e8>\n#> <environment: namespace:reshape2>\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbody(melt)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> {\n#> UseMethod(\"melt\", data)\n#> }\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nformals(melt)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $data\n#> \n#> \n#> $...\n#> \n#> \n#> $na.rm\n#> [1] FALSE\n#> \n#> $value.name\n#> [1] \"value\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nenvironment(melt)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: namespace:reshape2>\n```\n\n\n:::\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(melt)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"closure\"\n```\n\n\n:::\n:::\n\n\n> \"R functions simulate a closure by keeping an explicit reference to the environment that was active when the function was defined.\"\n\n\nref: [closures](https://www.r-bloggers.com/2015/03/using-closures-as-objects-in-r/)\n\n\n\nTry with `methods()`, or `print(methods(melt))`: Non-visible functions are asterisked!\n\n> The S3 method name is followed by an asterisk * if the method definition is not exported from the package namespace in which the method is defined.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmethods(\"melt\", data)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] melt.array* melt.data.frame* melt.default* melt.list* \n#> [5] melt.matrix* melt.table* \n#> see '?methods' for accessing help and source code\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmethods(class=\"table\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] [ aperm as.data.frame as_tibble Axis \n#> [6] coerce initialize lines melt plot \n#> [11] points print show slotsFromS3 summary \n#> [16] tail \n#> see '?methods' for accessing help and source code\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhelp(UseMethod)\n```\n:::\n\n\n\nWe can access to some of the above calls with `getAnywhere()`, for example here is done for \"melt.data.frame\":\n\n::: {.cell}\n\n```{.r .cell-code}\ngetAnywhere(\"melt.data.frame\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> A single object matching 'melt.data.frame' was found\n#> It was found in the following places\n#> registered S3 method for melt from namespace reshape2\n#> namespace:reshape2\n#> with value\n#> \n#> function (data, id.vars, measure.vars, variable.name = \"variable\", \n#> ..., na.rm = FALSE, value.name = \"value\", factorsAsStrings = TRUE) \n#> {\n#> vars <- melt_check(data, id.vars, measure.vars, variable.name, \n#> value.name)\n#> id.ind <- match(vars$id, names(data))\n#> measure.ind <- match(vars$measure, names(data))\n#> if (!length(measure.ind)) {\n#> return(data[id.vars])\n#> }\n#> args <- normalize_melt_arguments(data, measure.ind, factorsAsStrings)\n#> measure.attributes <- args$measure.attributes\n#> factorsAsStrings <- args$factorsAsStrings\n#> valueAsFactor <- \"factor\" %in% measure.attributes$class\n#> df <- melt_dataframe(data, as.integer(id.ind - 1), as.integer(measure.ind - \n#> 1), as.character(variable.name), as.character(value.name), \n#> as.pairlist(measure.attributes), as.logical(factorsAsStrings), \n#> as.logical(valueAsFactor))\n#> if (na.rm) {\n#> return(df[!is.na(df[[value.name]]), ])\n#> }\n#> else {\n#> return(df)\n#> }\n#> }\n#> <bytecode: 0x0000022afff9dd58>\n#> <environment: namespace:reshape2>\n```\n\n\n:::\n:::\n\n\n\nReferences:\n\n- [stackoverflow article](https://stackoverflow.com/questions/11173683/how-can-i-read-the-source-code-for-an-r-function)\n- [Rnews bulletin: R Help Desk](https://www.r-project.org/doc/Rnews/Rnews_2006-4.pdf)\n\n\n\n---\n\nGoing back to out model output visualization.\n\n\n::: {.cell}\n\n```{.r .cell-code}\noutput_full<- melt(output,id=\"time\")\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\noutput_full$proportion<- output_full$value/sum(state_values)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(data = output, aes(x = time, y = I)) + \n geom_line() + \n xlab(\"Time(days)\") +\n ylab(\"Number of Infected\") + \n labs(\"SIR Model: prevalence of infection\")\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(output_full, aes(time, proportion, color = variable, group = variable)) + \n geom_line() + \n xlab(\"Time(days)\") +\n ylab(\"Prevalence\") + \n labs(color = \"Compartment\", title = \"SIR Model\")\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n", "supporting": [ "06_files" ], diff --git a/_freeze/slides/06/figure-html/unnamed-chunk-44-1.png b/_freeze/slides/06/figure-html/unnamed-chunk-44-1.png Binary files differ. diff --git a/_freeze/slides/06/figure-html/unnamed-chunk-45-1.png b/_freeze/slides/06/figure-html/unnamed-chunk-45-1.png Binary files differ. diff --git a/_freeze/slides/07/execute-results/html.json b/_freeze/slides/07/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "3e6c5d41a5ab585daada3a45abb9ebf6", + "hash": "2f1e06c5bb778afb8a0eac49f88ba610", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Environments\n---\n\n## Learning objectives:\n\n- THESE ARE NICE TO HAVE BUT NOT ABSOLUTELY NECESSARY\n\n## SLIDE 1\n\n- ADD SLIDES AS SECTIONS (`##`).\n- TRY TO KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF.\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/mk7iu1-P8ZU\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/syRMRYKN30k\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/fW7Di01gLhw\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/Aw_Q7PMYJkA\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/tuafimbMyKk\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/buUaaOu89EQ\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:08:41\tArthur Shaw:\tHello, everyone!\n00:21:31\tFederica Gazzelloni:\t?walk: Apply a function to each element of a list or atomic vector\n00:23:15\tFederica Gazzelloni:\t?caller_env: Get properties of the current or caller frame\n00:24:56\tTrevin:\tpurrr::walk(x, function(x, ce, ace = rlang::caller_env()) {\n .ce <- rlang::caller_env()\n message(\"Internal: \")\n print(.ce)\n message(\"Argument: \")\n print(ace)\n message(\"External: \")\n print(ce)\n message(\"Internal: \",paste0(ls(.ce), collapse = \"\\n\"))\n message(\"Argument: \",paste0(ls(ace), collapse = \"\\n\"))\n message(\"External: \",paste0(ls(ce), collapse = \"\\n\"))\n}, ce = rlang::caller_env())\n00:29:39\tFederica Gazzelloni:\t??iwalk: Apply a function to each element of a vector, and its index\n00:35:30\tArthur Shaw:\thttps://magrittr.tidyverse.org/reference/tee.html\n00:36:05\tFederica Gazzelloni:\t?`%T>%`\n00:46:59\tTrevin:\t?eval\n01:06:03\tFederica Gazzelloni:\thttps://cran.r-project.org/web/packages/withr/index.html\n01:09:21\tFederica Gazzelloni:\thttps://github.com/r-lib/withr\n01:10:38\tTrevin:\tI'm okay if we meet next week\n01:10:53\tOluwafemi Oyedele:\tI am ok with next week\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/PUXrijnsWy0\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary>Meeting chat log</summary>\n```\n00:06:49\tRyan Honomichl:\thttps://r4ds.github.io/bookclub-Advanced_R/QandA/docs/environments.html\n```\n</details>\n\n<iframe src=\"https://www.youtube.com/embed/6xECnY4ro48\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary>Meeting chat log</summary>\n```\n00:14:44\tcollinberke:\thttps://ivelasq.rbind.io/blog/macos-rig/index.html\n00:21:10\tcollinberke:\thttps://github.com/tidyverse/dplyr/blob/main/NAMESPACE\n01:00:21\tcollinberke:\thttps://r4ds.hadley.nz/iteration.html\n```\n</details>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: Environments\n---\n\n## Learning objectives:\n\n- THESE ARE NICE TO HAVE BUT NOT ABSOLUTELY NECESSARY\n\n## SLIDE 1\n\n- ADD SLIDES AS SECTIONS (`##`).\n- TRY TO KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF.\n", + "supporting": [ + "07_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/08/execute-results/html.json b/_freeze/slides/08/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "bf0085f1b11b3881ad010e2ab3ff06d8", + "hash": "4732d46753d0ca29c20f998ad3e58e75", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Conditions\n---\n\n## Learning objectives:\n\n- What conditions are\n- How to use them\n\n## Introduction\n\nWhat are conditions? Problems that happen in functions:\n\n- Error\n- Warning\n- Message\n\nAs a function author, one can signal them--that is, say there's a problem.\n\nAs a function consumer, one can handle them--for example, react or ignore.\n\n## Signalling conditions\n\n### Types of conditions\n\nThree types of conditions:\n\n- ❌ **Errors.** Problem arose, and the function cannot continue. \n- ⚠️ **Warnings.** Problem arose, but the function can continue, if only partially.\n- 💬 **Messages.** Something happened, and the user should know.\n\n### ❌ Errors\n\nHow to throw errors\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# with base R\nstop(\"... in the name of love...\")\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: ... in the name of love...\n```\n\n\n:::\n\n```{.r .cell-code}\n# with rlang\nrlang::abort(\"...before you break my heart...\")\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error:\n#> ! ...before you break my heart...\n```\n\n\n:::\n\n```{.r .cell-code}\n# with base R; without call\nstop(\"... think it o-o-over...\", call. = FALSE)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: ... think it o-o-over...\n```\n\n\n:::\n:::\n\nComposing error messages\n\n- Mechanics.\n - `stop()` pastes together arguments\n\n::: {.cell}\n\n```{.r .cell-code}\nsome_val <- 1\nstop(\"Your value is: \", some_val, call. = FALSE)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: Your value is: 1\n```\n\n\n:::\n:::\n\n - `abort()` requires `{glue}`\n\n::: {.cell}\n\n```{.r .cell-code}\nsome_val <- 1\nrlang::abort(glue::glue(\"Your value is: {some_val}\"))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error:\n#> ! Your value is: 1\n```\n\n\n:::\n:::\n\n- Style. See [here](http://style.tidyverse.org/error-messages.html).\n\n### ⚠️ Warnings\n\nMay have multiple warnings per call\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwarn <- function() {\n warning(\"This is your first warning\")\n warning(\"This is your second warning\")\n warning(\"This is your LAST warning\")\n}\n```\n:::\n\n\nPrint all warnings once call is complete.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwarn()\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in warn(): This is your first warning\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in warn(): This is your second warning\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in warn(): This is your LAST warning\n```\n\n\n:::\n:::\n\n\nLike errors, `warning()` has\n\n- a call argument\n- an `{rlang}` analog\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# base R\n# ... with call (implicitly .call = TRUE)\nwarning(\"Warning\")\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: Warning\n```\n\n\n:::\n\n```{.r .cell-code}\n# ... with call suppressed\nwarning(\"Warning\", call. = FALSE)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: Warning\n```\n\n\n:::\n\n```{.r .cell-code}\n# rlang\n# note: call suppressed by default\nrlang::warn(\"Warning\")\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: Warning\n```\n\n\n:::\n:::\n\n\n(Hadley's) advice on usage:\n\n- Err on the side of errors. In other words, error rather than warn.\n- But warnings make sense in a few cases:\n - Function is being deprecated. Warn that it is reaching end of life.\n - Function is reasonably sure to recover from issue.\n\n### 💬 Messages\n\nMechanics:\n\n- Issued immediately\n- Do not have a call argument\n\nStyle:\n\nMessages are best when they inform about:\n\n- Default arguments\n- Status updates of for functions used primarily for side-effects (e.g., interaction with web API, file downloaded, etc.)\n- Progress of long-running process (in the absence of a status bar).\n- Package loading message (e.g., attaching package, objects masked)\n\n## Ignoring conditions\n\nA few ways:\n\n- `try()`\n- `suppressWarnings()`\n- `suppressMessages()`\n\n### `try()`\n\nWhat it does:\n\n- Displays error\n- But continues execution after error\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbad_log <- function(x) {\n try(log(x))\n 10\n}\n\nbad_log(\"bad\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Error in log(x) : non-numeric argument to mathematical function\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 10\n```\n\n\n:::\n:::\n\n\nBetter ways to react to/recover from errors:\n\n1. Use `tryCatch()` to \"catch\" the error and perform a different action in the event of an error.\n1. Set a default value inside the call. See below.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndefault <- NULL\ntry(default <- read.csv(\"possibly-bad-input.csv\"), silent = TRUE)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in file(file, \"rt\"): cannot open file 'possibly-bad-input.csv': No such\n#> file or directory\n```\n\n\n:::\n:::\n\n\n\n### `suppressWarnings()`, `suppressMessages()`\n\nWhat it does:\n\n- Supresses all warnings (messages)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# suppress warnings (from our `warn()` function above)\nsuppressWarnings(warn())\n\n# suppress messages\nmany_messages <- function() {\n message(\"Message 1\")\n message(\"Message 2\")\n message(\"Message 3\")\n}\n\nsuppressMessages(many_messages())\n```\n:::\n\n\n## Handling conditions\n\nEvery condition has a default behavior:\n\n- ❌ Errors halt execution\n- ⚠️ Warnings are collected during execution and displayed in bulk after execution\n- 💬 Messages are displayed immediately\n\nCondition handlers allow one to change that behavior (within the scope of a function).\n\nTwo handler functions:\n\n- `tryCatch()`\n- `withCallingHandlers()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# try to run `code_to_try_to_run`\n# if (error) condition is signalled, fun some other code\ntryCatch(\n error = function(cnd) {\n # code to run when error is thrown\n },\n code_to_try_to_run\n)\n\n# try to `code_to_try_to_run`\n# if condition is signalled, run code corresponding to condition type\nwithCallingHandlers(\n warning = function(cnd) {\n # code to run when warning is signalled\n },\n message = function(cnd) {\n # code to run when message is signalled\n },\n code_to_try_to_run\n)\n```\n:::\n\n\n\n### Condition objects\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# catch a condition\ncnd <- rlang::catch_cnd(stop(\"An error\"))\n# inspect it\nstr(cnd)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 2\n#> $ message: chr \"An error\"\n#> $ call : language force(expr)\n#> - attr(*, \"class\")= chr [1:3] \"simpleError\" \"error\" \"condition\"\n```\n\n\n:::\n:::\n\n\nThe standard components\n\n- `message`. The error message. To extract it, use `conditionMessage(cnd)`.\n- `call`. The function call that triggered the condition. To extract it, use `conditionCall(cnd)`.\n\nBut custom conditions may contain other components.\n\n### Exiting handlers\n\nIf a condition is signalled, this type of handler controls what code to run before exiting the function call. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nf3 <- function(x) {\n tryCatch(\n # if error signalled, return NA\n error = function(cnd) NA,\n # try to run log\n log(x)\n )\n}\n\nf3(\"x\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] NA\n```\n\n\n:::\n:::\n\n\nWhen a condition is signalled, control moves to the handler and never returns to the original code.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntryCatch(\n message = function(cnd) \"There\",\n {\n message(\"Here\")\n stop(\"This code is never run!\")\n }\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"There\"\n```\n\n\n:::\n:::\n\n\nThe `tryCatch()` exit handler has one final argument: `finally`. This is run regardless of the condition of the original code. This is often used for clean-up.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# try to write text to disk\n# if an error is signalled--for example, `path` does not exist\n# or if no condition is signalled\n# that is in both cases, the code block in `finally` is executed\npath <- tempfile()\ntryCatch(\n {\n writeLines(\"Hi!\", path)\n # ...\n },\n finally = {\n # always run\n unlink(path)\n }\n)\n```\n:::\n\n\n### Calling handlers\n\nDefinition by verbal comparison:\n\n- With exit handlers, code exits the normal flow once a condition is signalled\n- With calling handlers, code continues in the normal flow once control is returned by the handler.\n\nDefinition by code comparison:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# with an exit handler, control moves to the handler once condition signalled and does not move back\ntryCatch(\n message = function(cnd) cat(\"Caught a message!\\n\"), \n {\n message(\"Someone there?\")\n message(\"Why, yes!\")\n }\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Caught a message!\n```\n\n\n:::\n\n```{.r .cell-code}\n# with a calling handler, control moves first to the handler and the moves back to the main code\nwithCallingHandlers(\n message = function(cnd) cat(\"Caught a message!\\n\"), \n {\n message(\"Someone there?\")\n message(\"Why, yes!\")\n }\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Caught a message!\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Someone there?\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Caught a message!\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Why, yes!\n```\n\n\n:::\n:::\n\n\n### By default, conditions propagate\n\nLet's suppose that there are nested handlers. If a condition is signalled in the child, it propagates to its parent handler(s).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Bubbles all the way up to default handler which generates the message\nwithCallingHandlers(\n message = function(cnd) cat(\"Level 2\\n\"),\n withCallingHandlers(\n message = function(cnd) cat(\"Level 1\\n\"),\n message(\"Hello\")\n )\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Level 1\n#> Level 2\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Hello\n```\n\n\n:::\n\n```{.r .cell-code}\n# Bubbles up to tryCatch\ntryCatch(\n message = function(cnd) cat(\"Level 2\\n\"),\n withCallingHandlers(\n message = function(cnd) cat(\"Level 1\\n\"),\n message(\"Hello\")\n )\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Level 1\n#> Level 2\n```\n\n\n:::\n:::\n\n\n### But conditions can be muffled\n\nIf one wants to \"muffle\" the siginal, one needs to use `rlang::cnd_muffle()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Muffles the default handler which prints the messages\nwithCallingHandlers(\n message = function(cnd) {\n cat(\"Level 2\\n\")\n rlang::cnd_muffle(cnd)\n },\n withCallingHandlers(\n message = function(cnd) cat(\"Level 1\\n\"),\n message(\"Hello\")\n )\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Level 1\n#> Level 2\n```\n\n\n:::\n\n```{.r .cell-code}\n# Muffles level 2 handler and the default handler\nwithCallingHandlers(\n message = function(cnd) cat(\"Level 2\\n\"),\n withCallingHandlers(\n message = function(cnd) {\n cat(\"Level 1\\n\")\n rlang::cnd_muffle(cnd)\n },\n message(\"Hello\")\n )\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Level 1\n```\n\n\n:::\n:::\n\n\n### Call stacks\n\nCall stacks of exiting and calling handlers differ.\n\nWhy? \n\n> Calling handlers are called in the context of the call that signalled the condition\n> exiting handlers are called in the context of the call to tryCatch()\n\nTo see this, consider how the call stacks differ for a toy example.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# create a function\nf <- function() g()\ng <- function() h()\nh <- function() message\n\n# call stack of calling handlers\nwithCallingHandlers(f(), message = function(cnd) {\n lobstr::cst()\n rlang::cnd_muffle(cnd)\n})\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (..., domain = NULL, appendLF = TRUE) \n#> {\n#> cond <- if (...length() == 1L && inherits(..1, \"condition\")) {\n#> if (nargs() > 1L) \n#> warning(\"additional arguments ignored in message()\")\n#> ..1\n#> }\n#> else {\n#> msg <- .makeMessage(..., domain = domain, appendLF = appendLF)\n#> call <- sys.call()\n#> simpleMessage(msg, call)\n#> }\n#> defaultHandler <- function(c) {\n#> cat(conditionMessage(c), file = stderr(), sep = \"\")\n#> }\n#> withRestarts({\n#> signalCondition(cond)\n#> defaultHandler(cond)\n#> }, muffleMessage = function() NULL)\n#> invisible()\n#> }\n#> <bytecode: 0x000001c80b980b98>\n#> <environment: namespace:base>\n```\n\n\n:::\n\n```{.r .cell-code}\n# call stack of exit handlers\ntryCatch(f(), message = function(cnd) lobstr::cst())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (..., domain = NULL, appendLF = TRUE) \n#> {\n#> cond <- if (...length() == 1L && inherits(..1, \"condition\")) {\n#> if (nargs() > 1L) \n#> warning(\"additional arguments ignored in message()\")\n#> ..1\n#> }\n#> else {\n#> msg <- .makeMessage(..., domain = domain, appendLF = appendLF)\n#> call <- sys.call()\n#> simpleMessage(msg, call)\n#> }\n#> defaultHandler <- function(c) {\n#> cat(conditionMessage(c), file = stderr(), sep = \"\")\n#> }\n#> withRestarts({\n#> signalCondition(cond)\n#> defaultHandler(cond)\n#> }, muffleMessage = function() NULL)\n#> invisible()\n#> }\n#> <bytecode: 0x000001c80b980b98>\n#> <environment: namespace:base>\n```\n\n\n:::\n\n```{.r .cell-code}\ntryCatch(f(), message = function(cnd) lobstr::cst())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (..., domain = NULL, appendLF = TRUE) \n#> {\n#> cond <- if (...length() == 1L && inherits(..1, \"condition\")) {\n#> if (nargs() > 1L) \n#> warning(\"additional arguments ignored in message()\")\n#> ..1\n#> }\n#> else {\n#> msg <- .makeMessage(..., domain = domain, appendLF = appendLF)\n#> call <- sys.call()\n#> simpleMessage(msg, call)\n#> }\n#> defaultHandler <- function(c) {\n#> cat(conditionMessage(c), file = stderr(), sep = \"\")\n#> }\n#> withRestarts({\n#> signalCondition(cond)\n#> defaultHandler(cond)\n#> }, muffleMessage = function() NULL)\n#> invisible()\n#> }\n#> <bytecode: 0x000001c80b980b98>\n#> <environment: namespace:base>\n```\n\n\n:::\n:::\n\n\n## Custom conditions\n\n### Motivation\n\nThe `base::log()` function provides a minimal error message.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlog(letters)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in log(letters): non-numeric argument to mathematical function\n```\n\n\n:::\n\n```{.r .cell-code}\nlog(1:10, base = letters)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in log(1:10, base = letters): non-numeric argument to mathematical function\n```\n\n\n:::\n:::\n\n\nOne could make a more informative error message about which argument is problematic.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_log <- function(x, base = exp(1)) {\n if (!is.numeric(x)) {\n rlang::abort(paste0(\n \"`x` must be a numeric vector; not \", typeof(x), \".\"\n ))\n }\n if (!is.numeric(base)) {\n rlang::abort(paste0(\n \"`base` must be a numeric vector; not \", typeof(base), \".\"\n ))\n }\n\n base::log(x, base = base)\n}\n```\n:::\n\n\nConsider the difference:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_log(letters)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `my_log()`:\n#> ! `x` must be a numeric vector; not character.\n```\n\n\n:::\n\n```{.r .cell-code}\nmy_log(1:10, base = letters)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `my_log()`:\n#> ! `base` must be a numeric vector; not character.\n```\n\n\n:::\n:::\n\n\n\n### Signalling\n\nCreate a helper function to describe errors:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nabort_bad_argument <- function(arg, must, not = NULL) {\n msg <- glue::glue(\"`{arg}` must {must}\")\n if (!is.null(not)) {\n not <- typeof(not)\n msg <- glue::glue(\"{msg}; not {not}.\")\n }\n \n rlang::abort(\n \"error_bad_argument\", # <- this is the (error) class, I believe\n message = msg, \n arg = arg, \n must = must, \n not = not\n )\n}\n```\n:::\n\n\nRewrite the log function to use this helper function:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_log <- function(x, base = exp(1)) {\n if (!is.numeric(x)) {\n abort_bad_argument(\"x\", must = \"be numeric\", not = x)\n }\n if (!is.numeric(base)) {\n abort_bad_argument(\"base\", must = \"be numeric\", not = base)\n }\n\n base::log(x, base = base)\n}\n```\n:::\n\n\nSee the result for the end user:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_log(letters)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `abort_bad_argument()`:\n#> ! `x` must be numeric; not character.\n```\n\n\n:::\n\n```{.r .cell-code}\nmy_log(1:10, base = letters)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `abort_bad_argument()`:\n#> ! `base` must be numeric; not character.\n```\n\n\n:::\n:::\n\n\n### Handling\n\nUse class of condition object to allow for different handling of different types of errors\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntryCatch(\n error_bad_argument = function(cnd) \"bad_argument\",\n error = function(cnd) \"other error\",\n my_log(\"a\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"bad_argument\"\n```\n\n\n:::\n:::\n\n\nBut note that the first handler that matches any of the signal's class, potentially in a vector of signal classes, will get control. So put the most specific handlers first.\n\n## Applications\n\nSee [the sub-section in the book](https://adv-r.hadley.nz/conditions.html#condition-applications) for excellent examples.\n\n## Resources\n\n- Conditions articles in rlang vignettes: \n - [Including function calls in error messages](https://rlang.r-lib.org/reference/topic-error-call.html)\n - [Including contextual information with error chains](https://rlang.r-lib.org/reference/topic-error-chaining.html)\n - [Formatting messages with cli](https://rlang.r-lib.org/reference/topic-condition-formatting.html)\n- [Other resources](https://github.com/rstudio-conf-2022/pkg-dev-masterclass/blob/main/materials/5-error-resources.md) from error message segment of rstudio::conf(2022) workshop \"Package Development Masterclass\"\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/mwiNe083DLU\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/ZFUr7YRSu2o\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/UZhrsVz6wi0\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<iframe src=\"https://www.youtube.com/embed/Wt7p71_BuYY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/WinIo5mrUZo\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/VFs-2sl5C70\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/VwmrbPUQY1k\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:19:16\tTrevin:\thttps://style.tidyverse.org/error-messages.html\n00:20:14\tTrevin:\tMore on errors in the design guide: https://design.tidyverse.org/\n01:14:27\tFederica Gazzelloni:\tmore info here: https://colinfay.me/learn-shiny-production/\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/t1N6XdidvNo\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n\n<summary>Meeting chat log</summary>\n```\n00:34:09\tRon:\tSomeone did: https://cran.r-project.org/web/packages/comprehenr/vignettes/Introduction.html\n00:47:58\tcollinberke:\thttps://purrr.tidyverse.org/reference/safely.html\n00:48:24\tRon:\tit's a function operator !\n00:49:37\tRon:\t\\(x) length(unique(x) is not too verbose though\n00:49:39\tRon:\t;)\n01:06:50\tcollinberke:\thttps://colinfay.me/purrr-mappers/\n01:07:45\tcollinberke:\thttps://colinfay.me/purrr-web-mining/\n```\n</details>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: Conditions\n---\n\n## Learning objectives:\n\n- What conditions are\n- How to use them\n\n## Introduction\n\nWhat are conditions? Problems that happen in functions:\n\n- Error\n- Warning\n- Message\n\nAs a function author, one can signal them--that is, say there's a problem.\n\nAs a function consumer, one can handle them--for example, react or ignore.\n\n## Signalling conditions\n\n### Types of conditions\n\nThree types of conditions:\n\n- ❌ **Errors.** Problem arose, and the function cannot continue. \n- ⚠️ **Warnings.** Problem arose, but the function can continue, if only partially.\n- 💬 **Messages.** Something happened, and the user should know.\n\n### ❌ Errors\n\nHow to throw errors\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# with base R\nstop(\"... in the name of love...\")\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: ... in the name of love...\n```\n\n\n:::\n\n```{.r .cell-code}\n# with rlang\nrlang::abort(\"...before you break my heart...\")\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error:\n#> ! ...before you break my heart...\n```\n\n\n:::\n\n```{.r .cell-code}\n# with base R; without call\nstop(\"... think it o-o-over...\", call. = FALSE)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: ... think it o-o-over...\n```\n\n\n:::\n:::\n\nComposing error messages\n\n- Mechanics.\n - `stop()` pastes together arguments\n\n::: {.cell}\n\n```{.r .cell-code}\nsome_val <- 1\nstop(\"Your value is: \", some_val, call. = FALSE)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: Your value is: 1\n```\n\n\n:::\n:::\n\n - `abort()` requires `{glue}`\n\n::: {.cell}\n\n```{.r .cell-code}\nsome_val <- 1\nrlang::abort(glue::glue(\"Your value is: {some_val}\"))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error:\n#> ! Your value is: 1\n```\n\n\n:::\n:::\n\n- Style. See [here](http://style.tidyverse.org/error-messages.html).\n\n### ⚠️ Warnings\n\nMay have multiple warnings per call\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwarn <- function() {\n warning(\"This is your first warning\")\n warning(\"This is your second warning\")\n warning(\"This is your LAST warning\")\n}\n```\n:::\n\n\nPrint all warnings once call is complete.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwarn()\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in warn(): This is your first warning\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in warn(): This is your second warning\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in warn(): This is your LAST warning\n```\n\n\n:::\n:::\n\n\nLike errors, `warning()` has\n\n- a call argument\n- an `{rlang}` analog\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# base R\n# ... with call (implicitly .call = TRUE)\nwarning(\"Warning\")\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: Warning\n```\n\n\n:::\n\n```{.r .cell-code}\n# ... with call suppressed\nwarning(\"Warning\", call. = FALSE)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: Warning\n```\n\n\n:::\n\n```{.r .cell-code}\n# rlang\n# note: call suppressed by default\nrlang::warn(\"Warning\")\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: Warning\n```\n\n\n:::\n:::\n\n\n(Hadley's) advice on usage:\n\n- Err on the side of errors. In other words, error rather than warn.\n- But warnings make sense in a few cases:\n - Function is being deprecated. Warn that it is reaching end of life.\n - Function is reasonably sure to recover from issue.\n\n### 💬 Messages\n\nMechanics:\n\n- Issued immediately\n- Do not have a call argument\n\nStyle:\n\nMessages are best when they inform about:\n\n- Default arguments\n- Status updates of for functions used primarily for side-effects (e.g., interaction with web API, file downloaded, etc.)\n- Progress of long-running process (in the absence of a status bar).\n- Package loading message (e.g., attaching package, objects masked)\n\n## Ignoring conditions\n\nA few ways:\n\n- `try()`\n- `suppressWarnings()`\n- `suppressMessages()`\n\n### `try()`\n\nWhat it does:\n\n- Displays error\n- But continues execution after error\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbad_log <- function(x) {\n try(log(x))\n 10\n}\n\nbad_log(\"bad\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Error in log(x) : non-numeric argument to mathematical function\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 10\n```\n\n\n:::\n:::\n\n\nBetter ways to react to/recover from errors:\n\n1. Use `tryCatch()` to \"catch\" the error and perform a different action in the event of an error.\n1. Set a default value inside the call. See below.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndefault <- NULL\ntry(default <- read.csv(\"possibly-bad-input.csv\"), silent = TRUE)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in file(file, \"rt\"): cannot open file 'possibly-bad-input.csv': No such\n#> file or directory\n```\n\n\n:::\n:::\n\n\n\n### `suppressWarnings()`, `suppressMessages()`\n\nWhat it does:\n\n- Supresses all warnings (messages)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# suppress warnings (from our `warn()` function above)\nsuppressWarnings(warn())\n\n# suppress messages\nmany_messages <- function() {\n message(\"Message 1\")\n message(\"Message 2\")\n message(\"Message 3\")\n}\n\nsuppressMessages(many_messages())\n```\n:::\n\n\n## Handling conditions\n\nEvery condition has a default behavior:\n\n- ❌ Errors halt execution\n- ⚠️ Warnings are collected during execution and displayed in bulk after execution\n- 💬 Messages are displayed immediately\n\nCondition handlers allow one to change that behavior (within the scope of a function).\n\nTwo handler functions:\n\n- `tryCatch()`\n- `withCallingHandlers()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# try to run `code_to_try_to_run`\n# if (error) condition is signalled, fun some other code\ntryCatch(\n error = function(cnd) {\n # code to run when error is thrown\n },\n code_to_try_to_run\n)\n\n# try to `code_to_try_to_run`\n# if condition is signalled, run code corresponding to condition type\nwithCallingHandlers(\n warning = function(cnd) {\n # code to run when warning is signalled\n },\n message = function(cnd) {\n # code to run when message is signalled\n },\n code_to_try_to_run\n)\n```\n:::\n\n\n\n### Condition objects\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# catch a condition\ncnd <- rlang::catch_cnd(stop(\"An error\"))\n# inspect it\nstr(cnd)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 2\n#> $ message: chr \"An error\"\n#> $ call : language force(expr)\n#> - attr(*, \"class\")= chr [1:3] \"simpleError\" \"error\" \"condition\"\n```\n\n\n:::\n:::\n\n\nThe standard components\n\n- `message`. The error message. To extract it, use `conditionMessage(cnd)`.\n- `call`. The function call that triggered the condition. To extract it, use `conditionCall(cnd)`.\n\nBut custom conditions may contain other components.\n\n### Exiting handlers\n\nIf a condition is signalled, this type of handler controls what code to run before exiting the function call. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nf3 <- function(x) {\n tryCatch(\n # if error signalled, return NA\n error = function(cnd) NA,\n # try to run log\n log(x)\n )\n}\n\nf3(\"x\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] NA\n```\n\n\n:::\n:::\n\n\nWhen a condition is signalled, control moves to the handler and never returns to the original code.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntryCatch(\n message = function(cnd) \"There\",\n {\n message(\"Here\")\n stop(\"This code is never run!\")\n }\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"There\"\n```\n\n\n:::\n:::\n\n\nThe `tryCatch()` exit handler has one final argument: `finally`. This is run regardless of the condition of the original code. This is often used for clean-up.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# try to write text to disk\n# if an error is signalled--for example, `path` does not exist\n# or if no condition is signalled\n# that is in both cases, the code block in `finally` is executed\npath <- tempfile()\ntryCatch(\n {\n writeLines(\"Hi!\", path)\n # ...\n },\n finally = {\n # always run\n unlink(path)\n }\n)\n```\n:::\n\n\n### Calling handlers\n\nDefinition by verbal comparison:\n\n- With exit handlers, code exits the normal flow once a condition is signalled\n- With calling handlers, code continues in the normal flow once control is returned by the handler.\n\nDefinition by code comparison:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# with an exit handler, control moves to the handler once condition signalled and does not move back\ntryCatch(\n message = function(cnd) cat(\"Caught a message!\\n\"), \n {\n message(\"Someone there?\")\n message(\"Why, yes!\")\n }\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Caught a message!\n```\n\n\n:::\n\n```{.r .cell-code}\n# with a calling handler, control moves first to the handler and the moves back to the main code\nwithCallingHandlers(\n message = function(cnd) cat(\"Caught a message!\\n\"), \n {\n message(\"Someone there?\")\n message(\"Why, yes!\")\n }\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Caught a message!\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Someone there?\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Caught a message!\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Why, yes!\n```\n\n\n:::\n:::\n\n\n### By default, conditions propagate\n\nLet's suppose that there are nested handlers. If a condition is signalled in the child, it propagates to its parent handler(s).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Bubbles all the way up to default handler which generates the message\nwithCallingHandlers(\n message = function(cnd) cat(\"Level 2\\n\"),\n withCallingHandlers(\n message = function(cnd) cat(\"Level 1\\n\"),\n message(\"Hello\")\n )\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Level 1\n#> Level 2\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Hello\n```\n\n\n:::\n\n```{.r .cell-code}\n# Bubbles up to tryCatch\ntryCatch(\n message = function(cnd) cat(\"Level 2\\n\"),\n withCallingHandlers(\n message = function(cnd) cat(\"Level 1\\n\"),\n message(\"Hello\")\n )\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Level 1\n#> Level 2\n```\n\n\n:::\n:::\n\n\n### But conditions can be muffled\n\nIf one wants to \"muffle\" the siginal, one needs to use `rlang::cnd_muffle()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Muffles the default handler which prints the messages\nwithCallingHandlers(\n message = function(cnd) {\n cat(\"Level 2\\n\")\n rlang::cnd_muffle(cnd)\n },\n withCallingHandlers(\n message = function(cnd) cat(\"Level 1\\n\"),\n message(\"Hello\")\n )\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Level 1\n#> Level 2\n```\n\n\n:::\n\n```{.r .cell-code}\n# Muffles level 2 handler and the default handler\nwithCallingHandlers(\n message = function(cnd) cat(\"Level 2\\n\"),\n withCallingHandlers(\n message = function(cnd) {\n cat(\"Level 1\\n\")\n rlang::cnd_muffle(cnd)\n },\n message(\"Hello\")\n )\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Level 1\n```\n\n\n:::\n:::\n\n\n### Call stacks\n\nCall stacks of exiting and calling handlers differ.\n\nWhy? \n\n> Calling handlers are called in the context of the call that signalled the condition\n> exiting handlers are called in the context of the call to tryCatch()\n\nTo see this, consider how the call stacks differ for a toy example.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# create a function\nf <- function() g()\ng <- function() h()\nh <- function() message\n\n# call stack of calling handlers\nwithCallingHandlers(f(), message = function(cnd) {\n lobstr::cst()\n rlang::cnd_muffle(cnd)\n})\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (..., domain = NULL, appendLF = TRUE) \n#> {\n#> cond <- if (...length() == 1L && inherits(..1, \"condition\")) {\n#> if (nargs() > 1L) \n#> warning(\"additional arguments ignored in message()\")\n#> ..1\n#> }\n#> else {\n#> msg <- .makeMessage(..., domain = domain, appendLF = appendLF)\n#> call <- sys.call()\n#> simpleMessage(msg, call)\n#> }\n#> defaultHandler <- function(c) {\n#> cat(conditionMessage(c), file = stderr(), sep = \"\")\n#> }\n#> withRestarts({\n#> signalCondition(cond)\n#> defaultHandler(cond)\n#> }, muffleMessage = function() NULL)\n#> invisible()\n#> }\n#> <bytecode: 0x000002ba6a952b98>\n#> <environment: namespace:base>\n```\n\n\n:::\n\n```{.r .cell-code}\n# call stack of exit handlers\ntryCatch(f(), message = function(cnd) lobstr::cst())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (..., domain = NULL, appendLF = TRUE) \n#> {\n#> cond <- if (...length() == 1L && inherits(..1, \"condition\")) {\n#> if (nargs() > 1L) \n#> warning(\"additional arguments ignored in message()\")\n#> ..1\n#> }\n#> else {\n#> msg <- .makeMessage(..., domain = domain, appendLF = appendLF)\n#> call <- sys.call()\n#> simpleMessage(msg, call)\n#> }\n#> defaultHandler <- function(c) {\n#> cat(conditionMessage(c), file = stderr(), sep = \"\")\n#> }\n#> withRestarts({\n#> signalCondition(cond)\n#> defaultHandler(cond)\n#> }, muffleMessage = function() NULL)\n#> invisible()\n#> }\n#> <bytecode: 0x000002ba6a952b98>\n#> <environment: namespace:base>\n```\n\n\n:::\n\n```{.r .cell-code}\ntryCatch(f(), message = function(cnd) lobstr::cst())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (..., domain = NULL, appendLF = TRUE) \n#> {\n#> cond <- if (...length() == 1L && inherits(..1, \"condition\")) {\n#> if (nargs() > 1L) \n#> warning(\"additional arguments ignored in message()\")\n#> ..1\n#> }\n#> else {\n#> msg <- .makeMessage(..., domain = domain, appendLF = appendLF)\n#> call <- sys.call()\n#> simpleMessage(msg, call)\n#> }\n#> defaultHandler <- function(c) {\n#> cat(conditionMessage(c), file = stderr(), sep = \"\")\n#> }\n#> withRestarts({\n#> signalCondition(cond)\n#> defaultHandler(cond)\n#> }, muffleMessage = function() NULL)\n#> invisible()\n#> }\n#> <bytecode: 0x000002ba6a952b98>\n#> <environment: namespace:base>\n```\n\n\n:::\n:::\n\n\n## Custom conditions\n\n### Motivation\n\nThe `base::log()` function provides a minimal error message.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlog(letters)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in log(letters): non-numeric argument to mathematical function\n```\n\n\n:::\n\n```{.r .cell-code}\nlog(1:10, base = letters)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in log(1:10, base = letters): non-numeric argument to mathematical function\n```\n\n\n:::\n:::\n\n\nOne could make a more informative error message about which argument is problematic.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_log <- function(x, base = exp(1)) {\n if (!is.numeric(x)) {\n rlang::abort(paste0(\n \"`x` must be a numeric vector; not \", typeof(x), \".\"\n ))\n }\n if (!is.numeric(base)) {\n rlang::abort(paste0(\n \"`base` must be a numeric vector; not \", typeof(base), \".\"\n ))\n }\n\n base::log(x, base = base)\n}\n```\n:::\n\n\nConsider the difference:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_log(letters)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `my_log()`:\n#> ! `x` must be a numeric vector; not character.\n```\n\n\n:::\n\n```{.r .cell-code}\nmy_log(1:10, base = letters)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `my_log()`:\n#> ! `base` must be a numeric vector; not character.\n```\n\n\n:::\n:::\n\n\n\n### Signalling\n\nCreate a helper function to describe errors:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nabort_bad_argument <- function(arg, must, not = NULL) {\n msg <- glue::glue(\"`{arg}` must {must}\")\n if (!is.null(not)) {\n not <- typeof(not)\n msg <- glue::glue(\"{msg}; not {not}.\")\n }\n \n rlang::abort(\n \"error_bad_argument\", # <- this is the (error) class, I believe\n message = msg, \n arg = arg, \n must = must, \n not = not\n )\n}\n```\n:::\n\n\nRewrite the log function to use this helper function:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_log <- function(x, base = exp(1)) {\n if (!is.numeric(x)) {\n abort_bad_argument(\"x\", must = \"be numeric\", not = x)\n }\n if (!is.numeric(base)) {\n abort_bad_argument(\"base\", must = \"be numeric\", not = base)\n }\n\n base::log(x, base = base)\n}\n```\n:::\n\n\nSee the result for the end user:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_log(letters)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `abort_bad_argument()`:\n#> ! `x` must be numeric; not character.\n```\n\n\n:::\n\n```{.r .cell-code}\nmy_log(1:10, base = letters)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `abort_bad_argument()`:\n#> ! `base` must be numeric; not character.\n```\n\n\n:::\n:::\n\n\n### Handling\n\nUse class of condition object to allow for different handling of different types of errors\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntryCatch(\n error_bad_argument = function(cnd) \"bad_argument\",\n error = function(cnd) \"other error\",\n my_log(\"a\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"bad_argument\"\n```\n\n\n:::\n:::\n\n\nBut note that the first handler that matches any of the signal's class, potentially in a vector of signal classes, will get control. So put the most specific handlers first.\n\n## Applications\n\nSee [the sub-section in the book](https://adv-r.hadley.nz/conditions.html#condition-applications) for excellent examples.\n\n## Resources\n\n- Conditions articles in rlang vignettes: \n - [Including function calls in error messages](https://rlang.r-lib.org/reference/topic-error-call.html)\n - [Including contextual information with error chains](https://rlang.r-lib.org/reference/topic-error-chaining.html)\n - [Formatting messages with cli](https://rlang.r-lib.org/reference/topic-condition-formatting.html)\n- [Other resources](https://github.com/rstudio-conf-2022/pkg-dev-masterclass/blob/main/materials/5-error-resources.md) from error message segment of rstudio::conf(2022) workshop \"Package Development Masterclass\"\n", + "supporting": [ + "08_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/09/execute-results/html.json b/_freeze/slides/09/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "5539c20384fa3742d2965a9507d112e9", + "hash": "991db3288a65eb2cbeef3e20bc63271f", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Functionals\n---\n\n## Learning objectives:\n\n- Define functionals.\n- Use the `purrr::map()` family of functionals.\n- Use the `purrr::walk()` family of functionals.\n- Use the `purrr::reduce()` and `purrr::accumulate()` family of functionals.\n- Use `purrr::safely()` and `purrr::possibly()` to deal with failure.\n\n9.1. **Introduction**\n\n9.2. **map()**\n\n9.3. **purrr** style\n\n9.4. **map_** variants\n\n9.5. **reduce()** and **accumulate** family of functions\n\n- Some functions that weren't covered\n\n\n## What are functionals {-}\n\n## Introduction \n\n__Functionals__ are functions that take function as input and return a vector as output. Functionals that you probably have used before are: `apply()`, `lapply()` or `tapply()`. \n\n\n- alternatives to loops\n\n- a functional is better than a `for` loop is better than `while` is better than `repeat`\n\n\n### Benefits {-}\n\n\n- encourages function logic to be separated from iteration logic\n\n- can collapse into vectors/data frames easily\n\n\n## Map\n\n`map()` has two arguments, a vector and a function. It performs the function on each element of the vector and returns a list. We can also pass in some additional argument into the function.\n\n\n::: {.cell}\n::: {.cell-output-display}\n{width=448}\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsimple_map <- function(x, f, ...) {\nout <- vector(\"list\", length(x))\nfor (i in seq_along(x)) {\nout[[i]] <- f(x[[i]], ...)\n}\nout\n}\n```\n:::\n\n\n## Benefit of using the map function in purrr {-}\n\n- `purrr::map()` is equivalent to `lapply()`\n\n- returns a list and is the most general\n\n- the length of the input == the length of the output\n\n- `map()` is more flexible, with additional arguments allowed\n\n- `map()` has a host of extensions\n\n\n\n\n::: {.cell}\n\n:::\n\n\n## Atomic vectors {-}\n\n\n- has 4 variants to return atomic vectors\n - `map_chr()`\n - `map_dbl()`\n - `map_int()`\n - `map_lgl()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntriple <- function(x) x * 3\nmap(.x=1:3, .f=triple)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 3\n#> \n#> [[2]]\n#> [1] 6\n#> \n#> [[3]]\n#> [1] 9\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(.x=1:3, .f=triple)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3 6 9\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_lgl(.x=c(1, NA, 3), .f=is.na)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE TRUE FALSE\n```\n\n\n:::\n:::\n\n\n## Anonymous functions and shortcuts {-}\n\n **Anonymous functions** \n\n::: {.cell}\n\n```{.r .cell-code}\nmap_dbl(.x=mtcars, .f=function(x) mean(x, na.rm = TRUE)) |> \n head()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt \n#> 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250\n```\n\n\n:::\n:::\n\n\n- the \"twiddle\" uses a twiddle `~` to set a formula\n- can use `.x` to reference the input `map(.x = ..., .f = )`\n\n::: {.cell}\n\n```{.r .cell-code}\nmap_dbl(.x=mtcars, .f=~mean(.x, na.rm = TRUE))\n```\n:::\n\n\n- can be simplified further as\n\n::: {.cell}\n\n```{.r .cell-code}\nmap_dbl(.x=mtcars, .f=mean, na.rm = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec \n#> 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250 17.848750 \n#> vs am gear carb \n#> 0.437500 0.406250 3.687500 2.812500\n```\n\n\n:::\n:::\n\n\n- what happens when we try a handful of variants of the task at hand? (how many unique values are there for each variable?)\n\nNote that `.x` is the **name** of the first argument in `map()` (`.f` is the name of the second argument).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# the task\nmap_dbl(mtcars, function(x) length(unique(x)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb \n#> 25 3 27 22 22 29 30 2 2 3 6\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, function(unicorn) length(unique(unicorn)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb \n#> 25 3 27 22 22 29 30 2 2 3 6\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, ~length(unique(.x)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb \n#> 25 3 27 22 22 29 30 2 2 3 6\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, ~length(unique(..1)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb \n#> 25 3 27 22 22 29 30 2 2 3 6\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, ~length(unique(.)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb \n#> 25 3 27 22 22 29 30 2 2 3 6\n```\n\n\n:::\n\n```{.r .cell-code}\n# not the task\nmap_dbl(mtcars, length)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb \n#> 32 32 32 32 32 32 32 32 32 32 32\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, length(unique))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb \n#> 21.00 6.00 160.00 110.00 3.90 2.62 16.46 0.00 1.00 4.00 4.00\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb \n#> 21.00 6.00 160.00 110.00 3.90 2.62 16.46 0.00 1.00 4.00 4.00\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#error\nmap_dbl(mtcars, length(unique()))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in unique.default(): argument \"x\" is missing, with no default\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, ~length(unique(x)))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `map_dbl()`:\n#> ℹ In index: 1.\n#> ℹ With name: mpg.\n#> Caused by error in `.f()`:\n#> ! object 'x' not found\n```\n\n\n:::\n:::\n\n\n\n## Modify {-}\n\nSometimes we might want the output to be the same as the input, then in that case we can use the modify function rather than map\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(x=1:3,y=6:4)\n\nmap(df, .f=~.x*3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $x\n#> [1] 3 6 9\n#> \n#> $y\n#> [1] 18 15 12\n```\n\n\n:::\n\n```{.r .cell-code}\nmodify(.x=df,.f=~.x*3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x y\n#> 1 3 18\n#> 2 6 15\n#> 3 9 12\n```\n\n\n:::\n:::\n\n\nNote that `modify()` always returns the same type of output (which is not necessarily true with `map()`). Additionally, `modify()` does not actually change the value of `df`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x y\n#> 1 1 6\n#> 2 2 5\n#> 3 3 4\n```\n\n\n:::\n:::\n\n\n\n## `purrr` style\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmtcars |> \n map(head, 20) |> # pull first 20 of each column\n map_dbl(mean) |> # mean of each vector\n head()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt \n#> 20.13000 6.20000 233.93000 136.20000 3.54500 3.39845\n```\n\n\n:::\n:::\n\n\nAn example from `tidytuesday`\n\n::: {.cell}\n\n```{.r .cell-code}\ntt <- tidytuesdayR::tt_load(\"2020-06-30\")\n\n# filter data & exclude columns with lost of nulls\nlist_df <- \n map(\n .x = tt[1:3], \n .f = \n ~ .x |> \n filter(issue <= 152 | issue > 200) |> \n mutate(timeframe = ifelse(issue <= 152, \"first 5 years\", \"last 5 years\")) |> \n select_if(~mean(is.na(.x)) < 0.2) \n )\n\n\n# write to global environment\niwalk(\n .x = list_df,\n .f = ~ assign(x = .y, value = .x, envir = globalenv())\n)\n```\n:::\n\n\n## `map_*()` variants \n\nThere are many variants\n\n\n\n\n## `map2_*()` {-}\n\n- raise each value `.x` by 2\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmap_dbl(\n .x = 1:5, \n .f = function(x) x ^ 2\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 4 9 16 25\n```\n\n\n:::\n:::\n\n\n- raise each value `.x` by another value `.y`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmap2_dbl(\n .x = 1:5, \n .y = 2:6, \n .f = ~ (.x ^ .y)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 8 81 1024 15625\n```\n\n\n:::\n:::\n\n\n\n## The benefit of using the map over apply family of function {-}\n- It is written in C\n- It preserves names\n- We always know the return value type\n- We can apply the function for multiple input values\n- We can pass additional arguments into the function\n\n\n## `walk()` {-}\n\n\n- We use `walk()` when we want to call a function for it side effect(s) rather than its return value, like generating plots, `write.csv()`, or `ggsave()`. If you don't want a return value, `map()` will print more info than you may want.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmap(1:3, ~cat(.x, \"\\n\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 1 \n#> 2 \n#> 3\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> NULL\n#> \n#> [[2]]\n#> NULL\n#> \n#> [[3]]\n#> NULL\n```\n\n\n:::\n:::\n\n\n- for these cases, use `walk()` instead\n\n::: {.cell}\n\n```{.r .cell-code}\nwalk(1:3, ~cat(.x, \"\\n\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 1 \n#> 2 \n#> 3\n```\n\n\n:::\n:::\n\n\n`cat()` does have a result, it's just usually returned invisibly.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncat(\"hello\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> hello\n```\n\n\n:::\n\n```{.r .cell-code}\n(cat(\"hello\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> hello\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n\nWe can use `pwalk()` to save a list of plot to disk. Note that the \"p\" in `pwalk()` means that we have more than 1 (or 2) variables to pipe into the function. Also note that the name of the first argument in all of the \"p\" functions is now `.l` (instead of `.x`).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplots <- mtcars |> \n split(mtcars$cyl) |> \n map(~ggplot(.x, aes(mpg,wt)) +\n geom_point())\n\npaths <- stringr::str_c(names(plots), '.png')\n\npwalk(.l = list(paths,plots), .f = ggsave, path = tempdir())\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Saving 7 x 5 in image\n#> Saving 7 x 5 in image\n#> Saving 7 x 5 in image\n```\n\n\n:::\n\n```{.r .cell-code}\npmap(.l = list(paths,plots), .f = ggsave, path = tempdir())\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Saving 7 x 5 in image\n#> Saving 7 x 5 in image\n#> Saving 7 x 5 in image\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] \"C:\\\\Users\\\\jonth\\\\AppData\\\\Local\\\\Temp\\\\RtmpcH2AWr/4.png\"\n#> \n#> [[2]]\n#> [1] \"C:\\\\Users\\\\jonth\\\\AppData\\\\Local\\\\Temp\\\\RtmpcH2AWr/6.png\"\n#> \n#> [[3]]\n#> [1] \"C:\\\\Users\\\\jonth\\\\AppData\\\\Local\\\\Temp\\\\RtmpcH2AWr/8.png\"\n```\n\n\n:::\n:::\n\n\n- walk, walk2 and pwalk all invisibly return .x the first argument. This makes them suitable for use in the middle of pipelines.\n\n- note: I don't think that it is \"`.x`\" (or \"`.l`\") that they are returning invisibly. But I'm not sure what it is. Hadley says:\n\n> purrr provides the walk family of functions that ignore the return values of the `.f` and instead return `.x` invisibly.\n\nBut not in the first `cat()` example, it is the `NULL` values that get returned invisibly (those aren't the same as `.x`).\n\n## `imap()` {-}\n\n- `imap()` is like `map2()`except that `.y` is derived from `names(.x)` if named or `seq_along(.x)` if not.\n\n- These two produce the same result\n\n\n::: {.cell}\n\n```{.r .cell-code}\nimap_chr(.x = mtcars, \n .f = ~ paste(.y, \"has a mean of\", round(mean(.x), 1))) |> \nhead()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl \n#> \"mpg has a mean of 20.1\" \"cyl has a mean of 6.2\" \n#> disp hp \n#> \"disp has a mean of 230.7\" \"hp has a mean of 146.7\" \n#> drat wt \n#> \"drat has a mean of 3.6\" \"wt has a mean of 3.2\"\n```\n\n\n:::\n\n```{.r .cell-code}\nmap2_chr(.x = mtcars, \n .y = names(mtcars),\n .f = ~ paste(.y, \"has a mean of\", round(mean(.x), 1))) |> \nhead()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl \n#> \"mpg has a mean of 20.1\" \"cyl has a mean of 6.2\" \n#> disp hp \n#> \"disp has a mean of 230.7\" \"hp has a mean of 146.7\" \n#> drat wt \n#> \"drat has a mean of 3.6\" \"wt has a mean of 3.2\"\n```\n\n\n:::\n:::\n\n\n\n## `pmap()` {-}\n\n- you can pass a named list or dataframe as arguments to a function\n\n- for example `runif()` has the parameters `n`, `min` and `max`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nparams <- tibble::tribble(\n ~ n, ~ min, ~ max,\n 1L, 1, 10,\n 2L, 10, 100,\n 3L, 100, 1000\n)\n\npmap(params, runif)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 3.436116\n#> \n#> [[2]]\n#> [1] 14.32734 25.54193\n#> \n#> [[3]]\n#> [1] 515.0551 359.4206 565.1489\n```\n\n\n:::\n:::\n\n\n- could also be\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlist(\n n = 1:3, \n min = 10 ^ (0:2), \n max = 10 ^ (1:3)\n) |> \npmap(runif)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 1.099975\n#> \n#> [[2]]\n#> [1] 29.59029 39.59901\n#> \n#> [[3]]\n#> [1] 675.4716 229.3796 538.4306\n```\n\n\n:::\n:::\n\n\n- I like to use `expand_grid()` when I want all possible parameter combinations.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpand_grid(n = 1:3,\n min = 10 ^ (0:1),\n max = 10 ^ (1:2))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 12 × 3\n#> n min max\n#> <int> <dbl> <dbl>\n#> 1 1 1 10\n#> 2 1 1 100\n#> 3 1 10 10\n#> 4 1 10 100\n#> 5 2 1 10\n#> 6 2 1 100\n#> 7 2 10 10\n#> 8 2 10 100\n#> 9 3 1 10\n#> 10 3 1 100\n#> 11 3 10 10\n#> 12 3 10 100\n```\n\n\n:::\n\n```{.r .cell-code}\nexpand_grid(n = 1:3,\n min = 10 ^ (0:1),\n max = 10 ^ (1:2)) |> \npmap(runif)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 6.885943\n#> \n#> [[2]]\n#> [1] 6.636844\n#> \n#> [[3]]\n#> [1] 10\n#> \n#> [[4]]\n#> [1] 71.0812\n#> \n#> [[5]]\n#> [1] 9.260550 8.135717\n#> \n#> [[6]]\n#> [1] 62.45852 30.95404\n#> \n#> [[7]]\n#> [1] 10 10\n#> \n#> [[8]]\n#> [1] 64.86900 49.33175\n#> \n#> [[9]]\n#> [1] 3.953245 7.092795 2.996488\n#> \n#> [[10]]\n#> [1] 68.47693 78.00939 21.72014\n#> \n#> [[11]]\n#> [1] 10 10 10\n#> \n#> [[12]]\n#> [1] 39.80917 28.25112 93.72982\n```\n\n\n:::\n:::\n\n\n\n\n## `reduce()` family\n\nThe `reduce()` function is a powerful functional that allows you to abstract away from a sequence of functions that are applied in a fixed direction.\n\n`reduce()` takes a vector as its first argument, a function as its second argument, and an optional `.init` argument last. It will then apply the function repeatedly to the vector until there is only a single element left.\n\n(Hint: start at the top of the image and read down.)\n\n\n::: {.cell}\n::: {.cell-output-display}\n{width=508}\n:::\n:::\n\n\n\nLet me really quickly demonstrate `reduce()` in action.\n\nSay you wanted to add up the numbers 1 through 5 using only the plus operator `+`. You could do something like:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n1 + 2 + 3 + 4 + 5\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 15\n```\n\n\n:::\n:::\n\n\nWhich is the same as:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreduce(1:5, `+`)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 15\n```\n\n\n:::\n:::\n\n\nAnd if you want the start value to be something that is not the first argument of the vector, pass that value to the .init argument:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nidentical(\n 0.5 + 1 + 2 + 3 + 4 + 5,\n reduce(1:5, `+`, .init = 0.5)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n## ggplot2 example with reduce {-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(mtcars, aes(hp, mpg)) + \n geom_point(size = 8, alpha = .5, color = \"yellow\") +\n geom_point(size = 4, alpha = .5, color = \"red\") +\n geom_point(size = 2, alpha = .5, color = \"blue\")\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\nLet us use the `reduce()` function. Note that `reduce2()` takes two arguments, but the first value (`..1`) is given by the `.init` value.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreduce2(\n c(8, 4, 2),\n c(\"yellow\", \"red\", \"blue\"),\n ~ ..1 + geom_point(size = ..2, alpha = .5, color = ..3),\n .init = ggplot(mtcars, aes(hp, mpg))\n)\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- list(age=tibble(name='john',age=30),\n sex=tibble(name=c('john','mary'),sex=c('M','F'),\n trt=tibble(name='Mary',treatment='A')))\n\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $age\n#> # A tibble: 1 × 2\n#> name age\n#> <chr> <dbl>\n#> 1 john 30\n#> \n#> $sex\n#> # A tibble: 2 × 3\n#> name sex trt$name $treatment\n#> <chr> <chr> <chr> <chr> \n#> 1 john M Mary A \n#> 2 mary F Mary A\n```\n\n\n:::\n\n```{.r .cell-code}\ndf |> reduce(.f = full_join)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Joining with `by = join_by(name)`\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 4\n#> name age sex trt$name $treatment\n#> <chr> <dbl> <chr> <chr> <chr> \n#> 1 john 30 M Mary A \n#> 2 mary NA F Mary A\n```\n\n\n:::\n\n```{.r .cell-code}\nreduce(.x = df,.f = full_join)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Joining with `by = join_by(name)`\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 4\n#> name age sex trt$name $treatment\n#> <chr> <dbl> <chr> <chr> <chr> \n#> 1 john 30 M Mary A \n#> 2 mary NA F Mary A\n```\n\n\n:::\n:::\n\n\n- to see all intermediate steps, use **accumulate()**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(1234)\naccumulate(1:5, `+`)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 3 6 10 15\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\naccumulate2(\n c(8, 4, 2),\n c(\"yellow\", \"red\", \"blue\"),\n ~ ..1 + geom_point(size = ..2, alpha = .5, color = ..3),\n .init = ggplot(mtcars, aes(hp, mpg))\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n```\n\n\n:::\n\n::: {.cell-output-display}\n{width=672}\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> \n#> [[2]]\n```\n\n\n:::\n\n::: {.cell-output-display}\n{width=672}\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> \n#> [[3]]\n```\n\n\n:::\n\n::: {.cell-output-display}\n{width=672}\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> \n#> [[4]]\n```\n\n\n:::\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n\n## `map_df*()` variants {-}\n\n- `map_dfr()` = row bind the results\n\n- `map_dfc()` = column bind the results\n\n- Note that `map_dfr()` has been superseded by `map() |> list_rbind()` and `map_dfc()` has been superseded by `map() |> list_cbind()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncol_stats <- function(n) {\n head(mtcars, n) |> \n summarise_all(mean) |> \n mutate_all(floor) |> \n mutate(n = paste(\"N =\", n))\n}\n\nmap((1:2) * 10, col_stats)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> mpg cyl disp hp drat wt qsec vs am gear carb n\n#> 1 20 5 208 122 3 3 18 0 0 3 2 N = 10\n#> \n#> [[2]]\n#> mpg cyl disp hp drat wt qsec vs am gear carb n\n#> 1 20 6 233 136 3 3 18 0 0 3 2 N = 20\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dfr((1:2) * 10, col_stats)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb n\n#> 1 20 5 208 122 3 3 18 0 0 3 2 N = 10\n#> 2 20 6 233 136 3 3 18 0 0 3 2 N = 20\n```\n\n\n:::\n\n```{.r .cell-code}\nmap((1:2) * 10, col_stats) |> list_rbind()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb n\n#> 1 20 5 208 122 3 3 18 0 0 3 2 N = 10\n#> 2 20 6 233 136 3 3 18 0 0 3 2 N = 20\n```\n\n\n:::\n:::\n\n\n---\n\n## `pluck()` {-}\n\n- `pluck()` will pull a single element from a list\n\nI like the example from the book because the starting object is not particularly easy to work with (as many JSON objects might not be).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_list <- list(\n list(-1, x = 1, y = c(2), z = \"a\"),\n list(-2, x = 4, y = c(5, 6), z = \"b\"),\n list(-3, x = 8, y = c(9, 10, 11))\n)\nmy_list\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [[1]][[1]]\n#> [1] -1\n#> \n#> [[1]]$x\n#> [1] 1\n#> \n#> [[1]]$y\n#> [1] 2\n#> \n#> [[1]]$z\n#> [1] \"a\"\n#> \n#> \n#> [[2]]\n#> [[2]][[1]]\n#> [1] -2\n#> \n#> [[2]]$x\n#> [1] 4\n#> \n#> [[2]]$y\n#> [1] 5 6\n#> \n#> [[2]]$z\n#> [1] \"b\"\n#> \n#> \n#> [[3]]\n#> [[3]][[1]]\n#> [1] -3\n#> \n#> [[3]]$x\n#> [1] 8\n#> \n#> [[3]]$y\n#> [1] 9 10 11\n```\n\n\n:::\n:::\n\n\nNotice that the \"first element\" means something different in standard `pluck()` versus `map`ped `pluck()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\npluck(my_list, 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] -1\n#> \n#> $x\n#> [1] 1\n#> \n#> $y\n#> [1] 2\n#> \n#> $z\n#> [1] \"a\"\n```\n\n\n:::\n\n```{.r .cell-code}\nmap(my_list, pluck, 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] -1\n#> \n#> [[2]]\n#> [1] -2\n#> \n#> [[3]]\n#> [1] -3\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(my_list, pluck, 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -1 -2 -3\n```\n\n\n:::\n:::\n\n\nThe `map()` functions also have shortcuts for extracting elements from vectors (powered by `purrr::pluck()`). Note that `map(my_list, 3)` is a shortcut for `map(my_list, pluck, 3)`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Select by name\nmap_dbl(my_list, \"x\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 4 8\n```\n\n\n:::\n\n```{.r .cell-code}\n# Or by position\nmap_dbl(my_list, 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -1 -2 -3\n```\n\n\n:::\n\n```{.r .cell-code}\n# Or by both\nmap_dbl(my_list, list(\"y\", 1))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2 5 9\n```\n\n\n:::\n\n```{.r .cell-code}\n# You'll get an error if you try to retrieve an inside item that doesn't have \n# a consistent format and you want a numeric output\nmap_dbl(my_list, list(\"y\"))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `map_dbl()`:\n#> ℹ In index: 2.\n#> Caused by error:\n#> ! Result must be length 1, not 2.\n```\n\n\n:::\n\n```{.r .cell-code}\n# You'll get an error if a component doesn't exist:\nmap_chr(my_list, \"z\")\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `map_chr()`:\n#> ℹ In index: 3.\n#> Caused by error:\n#> ! Result must be length 1, not 0.\n```\n\n\n:::\n\n```{.r .cell-code}\n#> Error: Result 3 must be a single string, not NULL of length 0\n\n# Unless you supply a .default value\nmap_chr(my_list, \"z\", .default = NA)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"a\" \"b\" NA\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] \"a\" \"b\" NA\n```\n:::\n\n\n\n## Not covered: `flatten()` {-}\n\n- `flatten()` will turn a list of lists into a simpler vector.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_list <-\n list(\n a = 1:3,\n b = list(1:3)\n )\n\nmy_list\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] 1 2 3\n#> \n#> $b\n#> $b[[1]]\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_if(my_list, is.list, pluck)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] 1 2 3\n#> \n#> $b\n#> $b[[1]]\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_if(my_list, is.list, flatten_int)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] 1 2 3\n#> \n#> $b\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_if(my_list, is.list, flatten_int) |> \n flatten_int()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3 1 2 3\n```\n\n\n:::\n:::\n\n\n## Dealing with Failures {-}\n\n## Safely {-}\n\n`safely()` is an adverb. It takes a function (a verb) and returns a modified version. In this case, the modified function will never throw an error. Instead it always returns a list with two elements.\n\n- `result` is the original result. If there is an error this will be NULL\n\n- `error` is an error object. If the operation was successful the \"`error`\" will be NULL.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nA <- list(1, 10, \"a\")\n\nmap(.x = A, .f = safely(log))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [[1]]$result\n#> [1] 0\n#> \n#> [[1]]$error\n#> NULL\n#> \n#> \n#> [[2]]\n#> [[2]]$result\n#> [1] 2.302585\n#> \n#> [[2]]$error\n#> NULL\n#> \n#> \n#> [[3]]\n#> [[3]]$result\n#> NULL\n#> \n#> [[3]]$error\n#> <simpleError in .Primitive(\"log\")(x, base): non-numeric argument to mathematical function>\n```\n\n\n:::\n:::\n\n\n## Possibly {-}\n\n`possibly()` always succeeds. It is simpler than `safely()`, because you can give it a default value to return when there is an error.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nA <- list(1,10,\"a\")\n\nmap_dbl(.x = A, .f = possibly(log, otherwise = NA_real_) )\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0.000000 2.302585 NA\n```\n\n\n:::\n:::\n\n\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/o0a6aJ4kCkU\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/YrZ13_4vUMw\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/DUHXo527mHs\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/SpDpmhW62Ns\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/tYqFMtmhmiI\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/HmDlvnp6uNQ\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:15:49\tMatt Dupree:\tdid anyone else lose audio?\n00:15:59\tFederica Gazzelloni:\tnot me\n00:16:02\tArthur Shaw:\tNot me either\n00:16:04\tTrevin:\tokay for me\n00:16:27\tMatt Dupree:\tgonna try rejoining\n00:43:14\tMatt Dupree:\toh i didn't know they invisibly returned .x! That's useful!\n00:48:29\tArthur Shaw:\tVery cool trick !\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/t1N6XdidvNo\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n\n<summary>Meeting chat log</summary>\n```\n00:34:09\tRon:\tSomeone did: https://cran.r-project.org/web/packages/comprehenr/vignettes/Introduction.html\n00:47:58\tcollinberke:\thttps://purrr.tidyverse.org/reference/safely.html\n00:48:24\tRon:\tit's a function operator !\n00:49:37\tRon:\t\\(x) length(unique(x) is not too verbose though\n00:49:39\tRon:\t;)\n01:06:50\tcollinberke:\thttps://colinfay.me/purrr-mappers/\n01:07:45\tcollinberke:\thttps://colinfay.me/purrr-web-mining/\n```\n</details>\n\n<iframe src=\"https://www.youtube.com/embed/6gY3KZWYC00\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n", + "markdown": "---\nengine: knitr\ntitle: Functionals\n---\n\n## Learning objectives:\n\n- Define functionals.\n- Use the `purrr::map()` family of functionals.\n- Use the `purrr::walk()` family of functionals.\n- Use the `purrr::reduce()` and `purrr::accumulate()` family of functionals.\n- Use `purrr::safely()` and `purrr::possibly()` to deal with failure.\n\n9.1. **Introduction**\n\n9.2. **map()**\n\n9.3. **purrr** style\n\n9.4. **map_** variants\n\n9.5. **reduce()** and **accumulate** family of functions\n\n- Some functions that weren't covered\n\n\n## What are functionals {-}\n\n## Introduction \n\n__Functionals__ are functions that take function as input and return a vector as output. Functionals that you probably have used before are: `apply()`, `lapply()` or `tapply()`. \n\n\n- alternatives to loops\n\n- a functional is better than a `for` loop is better than `while` is better than `repeat`\n\n\n### Benefits {-}\n\n\n- encourages function logic to be separated from iteration logic\n\n- can collapse into vectors/data frames easily\n\n\n## Map\n\n`map()` has two arguments, a vector and a function. It performs the function on each element of the vector and returns a list. We can also pass in some additional argument into the function.\n\n\n::: {.cell}\n::: {.cell-output-display}\n{width=448}\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsimple_map <- function(x, f, ...) {\nout <- vector(\"list\", length(x))\nfor (i in seq_along(x)) {\nout[[i]] <- f(x[[i]], ...)\n}\nout\n}\n```\n:::\n\n\n## Benefit of using the map function in purrr {-}\n\n- `purrr::map()` is equivalent to `lapply()`\n\n- returns a list and is the most general\n\n- the length of the input == the length of the output\n\n- `map()` is more flexible, with additional arguments allowed\n\n- `map()` has a host of extensions\n\n\n\n\n::: {.cell}\n\n:::\n\n\n## Atomic vectors {-}\n\n\n- has 4 variants to return atomic vectors\n - `map_chr()`\n - `map_dbl()`\n - `map_int()`\n - `map_lgl()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntriple <- function(x) x * 3\nmap(.x=1:3, .f=triple)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 3\n#> \n#> [[2]]\n#> [1] 6\n#> \n#> [[3]]\n#> [1] 9\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(.x=1:3, .f=triple)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3 6 9\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_lgl(.x=c(1, NA, 3), .f=is.na)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE TRUE FALSE\n```\n\n\n:::\n:::\n\n\n## Anonymous functions and shortcuts {-}\n\n **Anonymous functions** \n\n::: {.cell}\n\n```{.r .cell-code}\nmap_dbl(.x=mtcars, .f=function(x) mean(x, na.rm = TRUE)) |> \n head()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt \n#> 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250\n```\n\n\n:::\n:::\n\n\n- the \"twiddle\" uses a twiddle `~` to set a formula\n- can use `.x` to reference the input `map(.x = ..., .f = )`\n\n::: {.cell}\n\n```{.r .cell-code}\nmap_dbl(.x=mtcars, .f=~mean(.x, na.rm = TRUE))\n```\n:::\n\n\n- can be simplified further as\n\n::: {.cell}\n\n```{.r .cell-code}\nmap_dbl(.x=mtcars, .f=mean, na.rm = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec \n#> 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250 17.848750 \n#> vs am gear carb \n#> 0.437500 0.406250 3.687500 2.812500\n```\n\n\n:::\n:::\n\n\n- what happens when we try a handful of variants of the task at hand? (how many unique values are there for each variable?)\n\nNote that `.x` is the **name** of the first argument in `map()` (`.f` is the name of the second argument).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# the task\nmap_dbl(mtcars, function(x) length(unique(x)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb \n#> 25 3 27 22 22 29 30 2 2 3 6\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, function(unicorn) length(unique(unicorn)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb \n#> 25 3 27 22 22 29 30 2 2 3 6\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, ~length(unique(.x)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb \n#> 25 3 27 22 22 29 30 2 2 3 6\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, ~length(unique(..1)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb \n#> 25 3 27 22 22 29 30 2 2 3 6\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, ~length(unique(.)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb \n#> 25 3 27 22 22 29 30 2 2 3 6\n```\n\n\n:::\n\n```{.r .cell-code}\n# not the task\nmap_dbl(mtcars, length)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb \n#> 32 32 32 32 32 32 32 32 32 32 32\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, length(unique))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb \n#> 21.00 6.00 160.00 110.00 3.90 2.62 16.46 0.00 1.00 4.00 4.00\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb \n#> 21.00 6.00 160.00 110.00 3.90 2.62 16.46 0.00 1.00 4.00 4.00\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#error\nmap_dbl(mtcars, length(unique()))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in unique.default(): argument \"x\" is missing, with no default\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, ~length(unique(x)))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `map_dbl()`:\n#> ℹ In index: 1.\n#> ℹ With name: mpg.\n#> Caused by error in `.f()`:\n#> ! object 'x' not found\n```\n\n\n:::\n:::\n\n\n\n## Modify {-}\n\nSometimes we might want the output to be the same as the input, then in that case we can use the modify function rather than map\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(x=1:3,y=6:4)\n\nmap(df, .f=~.x*3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $x\n#> [1] 3 6 9\n#> \n#> $y\n#> [1] 18 15 12\n```\n\n\n:::\n\n```{.r .cell-code}\nmodify(.x=df,.f=~.x*3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x y\n#> 1 3 18\n#> 2 6 15\n#> 3 9 12\n```\n\n\n:::\n:::\n\n\nNote that `modify()` always returns the same type of output (which is not necessarily true with `map()`). Additionally, `modify()` does not actually change the value of `df`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x y\n#> 1 1 6\n#> 2 2 5\n#> 3 3 4\n```\n\n\n:::\n:::\n\n\n\n## `purrr` style\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmtcars |> \n map(head, 20) |> # pull first 20 of each column\n map_dbl(mean) |> # mean of each vector\n head()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt \n#> 20.13000 6.20000 233.93000 136.20000 3.54500 3.39845\n```\n\n\n:::\n:::\n\n\nAn example from `tidytuesday`\n\n::: {.cell}\n\n```{.r .cell-code}\ntt <- tidytuesdayR::tt_load(\"2020-06-30\")\n\n# filter data & exclude columns with lost of nulls\nlist_df <- \n map(\n .x = tt[1:3], \n .f = \n ~ .x |> \n filter(issue <= 152 | issue > 200) |> \n mutate(timeframe = ifelse(issue <= 152, \"first 5 years\", \"last 5 years\")) |> \n select_if(~mean(is.na(.x)) < 0.2) \n )\n\n\n# write to global environment\niwalk(\n .x = list_df,\n .f = ~ assign(x = .y, value = .x, envir = globalenv())\n)\n```\n:::\n\n\n## `map_*()` variants \n\nThere are many variants\n\n\n\n\n## `map2_*()` {-}\n\n- raise each value `.x` by 2\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmap_dbl(\n .x = 1:5, \n .f = function(x) x ^ 2\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 4 9 16 25\n```\n\n\n:::\n:::\n\n\n- raise each value `.x` by another value `.y`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmap2_dbl(\n .x = 1:5, \n .y = 2:6, \n .f = ~ (.x ^ .y)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 8 81 1024 15625\n```\n\n\n:::\n:::\n\n\n\n## The benefit of using the map over apply family of function {-}\n- It is written in C\n- It preserves names\n- We always know the return value type\n- We can apply the function for multiple input values\n- We can pass additional arguments into the function\n\n\n## `walk()` {-}\n\n\n- We use `walk()` when we want to call a function for it side effect(s) rather than its return value, like generating plots, `write.csv()`, or `ggsave()`. If you don't want a return value, `map()` will print more info than you may want.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmap(1:3, ~cat(.x, \"\\n\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 1 \n#> 2 \n#> 3\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> NULL\n#> \n#> [[2]]\n#> NULL\n#> \n#> [[3]]\n#> NULL\n```\n\n\n:::\n:::\n\n\n- for these cases, use `walk()` instead\n\n::: {.cell}\n\n```{.r .cell-code}\nwalk(1:3, ~cat(.x, \"\\n\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 1 \n#> 2 \n#> 3\n```\n\n\n:::\n:::\n\n\n`cat()` does have a result, it's just usually returned invisibly.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncat(\"hello\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> hello\n```\n\n\n:::\n\n```{.r .cell-code}\n(cat(\"hello\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> hello\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n\nWe can use `pwalk()` to save a list of plot to disk. Note that the \"p\" in `pwalk()` means that we have more than 1 (or 2) variables to pipe into the function. Also note that the name of the first argument in all of the \"p\" functions is now `.l` (instead of `.x`).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplots <- mtcars |> \n split(mtcars$cyl) |> \n map(~ggplot(.x, aes(mpg,wt)) +\n geom_point())\n\npaths <- stringr::str_c(names(plots), '.png')\n\npwalk(.l = list(paths,plots), .f = ggsave, path = tempdir())\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Saving 7 x 5 in image\n#> Saving 7 x 5 in image\n#> Saving 7 x 5 in image\n```\n\n\n:::\n\n```{.r .cell-code}\npmap(.l = list(paths,plots), .f = ggsave, path = tempdir())\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Saving 7 x 5 in image\n#> Saving 7 x 5 in image\n#> Saving 7 x 5 in image\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] \"C:\\\\Users\\\\jonth\\\\AppData\\\\Local\\\\Temp\\\\RtmpSMVSZz/4.png\"\n#> \n#> [[2]]\n#> [1] \"C:\\\\Users\\\\jonth\\\\AppData\\\\Local\\\\Temp\\\\RtmpSMVSZz/6.png\"\n#> \n#> [[3]]\n#> [1] \"C:\\\\Users\\\\jonth\\\\AppData\\\\Local\\\\Temp\\\\RtmpSMVSZz/8.png\"\n```\n\n\n:::\n:::\n\n\n- walk, walk2 and pwalk all invisibly return .x the first argument. This makes them suitable for use in the middle of pipelines.\n\n- note: I don't think that it is \"`.x`\" (or \"`.l`\") that they are returning invisibly. But I'm not sure what it is. Hadley says:\n\n> purrr provides the walk family of functions that ignore the return values of the `.f` and instead return `.x` invisibly.\n\nBut not in the first `cat()` example, it is the `NULL` values that get returned invisibly (those aren't the same as `.x`).\n\n## `imap()` {-}\n\n- `imap()` is like `map2()`except that `.y` is derived from `names(.x)` if named or `seq_along(.x)` if not.\n\n- These two produce the same result\n\n\n::: {.cell}\n\n```{.r .cell-code}\nimap_chr(.x = mtcars, \n .f = ~ paste(.y, \"has a mean of\", round(mean(.x), 1))) |> \nhead()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl \n#> \"mpg has a mean of 20.1\" \"cyl has a mean of 6.2\" \n#> disp hp \n#> \"disp has a mean of 230.7\" \"hp has a mean of 146.7\" \n#> drat wt \n#> \"drat has a mean of 3.6\" \"wt has a mean of 3.2\"\n```\n\n\n:::\n\n```{.r .cell-code}\nmap2_chr(.x = mtcars, \n .y = names(mtcars),\n .f = ~ paste(.y, \"has a mean of\", round(mean(.x), 1))) |> \nhead()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl \n#> \"mpg has a mean of 20.1\" \"cyl has a mean of 6.2\" \n#> disp hp \n#> \"disp has a mean of 230.7\" \"hp has a mean of 146.7\" \n#> drat wt \n#> \"drat has a mean of 3.6\" \"wt has a mean of 3.2\"\n```\n\n\n:::\n:::\n\n\n\n## `pmap()` {-}\n\n- you can pass a named list or dataframe as arguments to a function\n\n- for example `runif()` has the parameters `n`, `min` and `max`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nparams <- tibble::tribble(\n ~ n, ~ min, ~ max,\n 1L, 1, 10,\n 2L, 10, 100,\n 3L, 100, 1000\n)\n\npmap(params, runif)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 9.52234\n#> \n#> [[2]]\n#> [1] 49.53679 46.47017\n#> \n#> [[3]]\n#> [1] 488.8100 796.6801 282.7772\n```\n\n\n:::\n:::\n\n\n- could also be\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlist(\n n = 1:3, \n min = 10 ^ (0:2), \n max = 10 ^ (1:3)\n) |> \npmap(runif)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 5.246834\n#> \n#> [[2]]\n#> [1] 73.39068 30.57879\n#> \n#> [[3]]\n#> [1] 169.6667 950.1126 820.9357\n```\n\n\n:::\n:::\n\n\n- I like to use `expand_grid()` when I want all possible parameter combinations.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpand_grid(n = 1:3,\n min = 10 ^ (0:1),\n max = 10 ^ (1:2))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 12 × 3\n#> n min max\n#> <int> <dbl> <dbl>\n#> 1 1 1 10\n#> 2 1 1 100\n#> 3 1 10 10\n#> 4 1 10 100\n#> 5 2 1 10\n#> 6 2 1 100\n#> 7 2 10 10\n#> 8 2 10 100\n#> 9 3 1 10\n#> 10 3 1 100\n#> 11 3 10 10\n#> 12 3 10 100\n```\n\n\n:::\n\n```{.r .cell-code}\nexpand_grid(n = 1:3,\n min = 10 ^ (0:1),\n max = 10 ^ (1:2)) |> \npmap(runif)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 9.474848\n#> \n#> [[2]]\n#> [1] 10.63548\n#> \n#> [[3]]\n#> [1] 10\n#> \n#> [[4]]\n#> [1] 92.44257\n#> \n#> [[5]]\n#> [1] 7.165047 6.201947\n#> \n#> [[6]]\n#> [1] 64.79074 16.54110\n#> \n#> [[7]]\n#> [1] 10 10\n#> \n#> [[8]]\n#> [1] 62.12314 52.31713\n#> \n#> [[9]]\n#> [1] 6.806213 5.541865 8.580469\n#> \n#> [[10]]\n#> [1] 7.10806 51.56879 85.70133\n#> \n#> [[11]]\n#> [1] 10 10 10\n#> \n#> [[12]]\n#> [1] 74.48871 11.65879 58.31278\n```\n\n\n:::\n:::\n\n\n\n\n## `reduce()` family\n\nThe `reduce()` function is a powerful functional that allows you to abstract away from a sequence of functions that are applied in a fixed direction.\n\n`reduce()` takes a vector as its first argument, a function as its second argument, and an optional `.init` argument last. It will then apply the function repeatedly to the vector until there is only a single element left.\n\n(Hint: start at the top of the image and read down.)\n\n\n::: {.cell}\n::: {.cell-output-display}\n{width=508}\n:::\n:::\n\n\n\nLet me really quickly demonstrate `reduce()` in action.\n\nSay you wanted to add up the numbers 1 through 5 using only the plus operator `+`. You could do something like:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n1 + 2 + 3 + 4 + 5\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 15\n```\n\n\n:::\n:::\n\n\nWhich is the same as:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreduce(1:5, `+`)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 15\n```\n\n\n:::\n:::\n\n\nAnd if you want the start value to be something that is not the first argument of the vector, pass that value to the .init argument:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nidentical(\n 0.5 + 1 + 2 + 3 + 4 + 5,\n reduce(1:5, `+`, .init = 0.5)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n## ggplot2 example with reduce {-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(mtcars, aes(hp, mpg)) + \n geom_point(size = 8, alpha = .5, color = \"yellow\") +\n geom_point(size = 4, alpha = .5, color = \"red\") +\n geom_point(size = 2, alpha = .5, color = \"blue\")\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\nLet us use the `reduce()` function. Note that `reduce2()` takes two arguments, but the first value (`..1`) is given by the `.init` value.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreduce2(\n c(8, 4, 2),\n c(\"yellow\", \"red\", \"blue\"),\n ~ ..1 + geom_point(size = ..2, alpha = .5, color = ..3),\n .init = ggplot(mtcars, aes(hp, mpg))\n)\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- list(age=tibble(name='john',age=30),\n sex=tibble(name=c('john','mary'),sex=c('M','F'),\n trt=tibble(name='Mary',treatment='A')))\n\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $age\n#> # A tibble: 1 × 2\n#> name age\n#> <chr> <dbl>\n#> 1 john 30\n#> \n#> $sex\n#> # A tibble: 2 × 3\n#> name sex trt$name $treatment\n#> <chr> <chr> <chr> <chr> \n#> 1 john M Mary A \n#> 2 mary F Mary A\n```\n\n\n:::\n\n```{.r .cell-code}\ndf |> reduce(.f = full_join)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Joining with `by = join_by(name)`\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 4\n#> name age sex trt$name $treatment\n#> <chr> <dbl> <chr> <chr> <chr> \n#> 1 john 30 M Mary A \n#> 2 mary NA F Mary A\n```\n\n\n:::\n\n```{.r .cell-code}\nreduce(.x = df,.f = full_join)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Joining with `by = join_by(name)`\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 4\n#> name age sex trt$name $treatment\n#> <chr> <dbl> <chr> <chr> <chr> \n#> 1 john 30 M Mary A \n#> 2 mary NA F Mary A\n```\n\n\n:::\n:::\n\n\n- to see all intermediate steps, use **accumulate()**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(1234)\naccumulate(1:5, `+`)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 3 6 10 15\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\naccumulate2(\n c(8, 4, 2),\n c(\"yellow\", \"red\", \"blue\"),\n ~ ..1 + geom_point(size = ..2, alpha = .5, color = ..3),\n .init = ggplot(mtcars, aes(hp, mpg))\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n```\n\n\n:::\n\n::: {.cell-output-display}\n{width=672}\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> \n#> [[2]]\n```\n\n\n:::\n\n::: {.cell-output-display}\n{width=672}\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> \n#> [[3]]\n```\n\n\n:::\n\n::: {.cell-output-display}\n{width=672}\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> \n#> [[4]]\n```\n\n\n:::\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n\n## `map_df*()` variants {-}\n\n- `map_dfr()` = row bind the results\n\n- `map_dfc()` = column bind the results\n\n- Note that `map_dfr()` has been superseded by `map() |> list_rbind()` and `map_dfc()` has been superseded by `map() |> list_cbind()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncol_stats <- function(n) {\n head(mtcars, n) |> \n summarise_all(mean) |> \n mutate_all(floor) |> \n mutate(n = paste(\"N =\", n))\n}\n\nmap((1:2) * 10, col_stats)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> mpg cyl disp hp drat wt qsec vs am gear carb n\n#> 1 20 5 208 122 3 3 18 0 0 3 2 N = 10\n#> \n#> [[2]]\n#> mpg cyl disp hp drat wt qsec vs am gear carb n\n#> 1 20 6 233 136 3 3 18 0 0 3 2 N = 20\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dfr((1:2) * 10, col_stats)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb n\n#> 1 20 5 208 122 3 3 18 0 0 3 2 N = 10\n#> 2 20 6 233 136 3 3 18 0 0 3 2 N = 20\n```\n\n\n:::\n\n```{.r .cell-code}\nmap((1:2) * 10, col_stats) |> list_rbind()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp drat wt qsec vs am gear carb n\n#> 1 20 5 208 122 3 3 18 0 0 3 2 N = 10\n#> 2 20 6 233 136 3 3 18 0 0 3 2 N = 20\n```\n\n\n:::\n:::\n\n\n---\n\n## `pluck()` {-}\n\n- `pluck()` will pull a single element from a list\n\nI like the example from the book because the starting object is not particularly easy to work with (as many JSON objects might not be).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_list <- list(\n list(-1, x = 1, y = c(2), z = \"a\"),\n list(-2, x = 4, y = c(5, 6), z = \"b\"),\n list(-3, x = 8, y = c(9, 10, 11))\n)\nmy_list\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [[1]][[1]]\n#> [1] -1\n#> \n#> [[1]]$x\n#> [1] 1\n#> \n#> [[1]]$y\n#> [1] 2\n#> \n#> [[1]]$z\n#> [1] \"a\"\n#> \n#> \n#> [[2]]\n#> [[2]][[1]]\n#> [1] -2\n#> \n#> [[2]]$x\n#> [1] 4\n#> \n#> [[2]]$y\n#> [1] 5 6\n#> \n#> [[2]]$z\n#> [1] \"b\"\n#> \n#> \n#> [[3]]\n#> [[3]][[1]]\n#> [1] -3\n#> \n#> [[3]]$x\n#> [1] 8\n#> \n#> [[3]]$y\n#> [1] 9 10 11\n```\n\n\n:::\n:::\n\n\nNotice that the \"first element\" means something different in standard `pluck()` versus `map`ped `pluck()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\npluck(my_list, 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] -1\n#> \n#> $x\n#> [1] 1\n#> \n#> $y\n#> [1] 2\n#> \n#> $z\n#> [1] \"a\"\n```\n\n\n:::\n\n```{.r .cell-code}\nmap(my_list, pluck, 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] -1\n#> \n#> [[2]]\n#> [1] -2\n#> \n#> [[3]]\n#> [1] -3\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(my_list, pluck, 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -1 -2 -3\n```\n\n\n:::\n:::\n\n\nThe `map()` functions also have shortcuts for extracting elements from vectors (powered by `purrr::pluck()`). Note that `map(my_list, 3)` is a shortcut for `map(my_list, pluck, 3)`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Select by name\nmap_dbl(my_list, \"x\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 4 8\n```\n\n\n:::\n\n```{.r .cell-code}\n# Or by position\nmap_dbl(my_list, 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -1 -2 -3\n```\n\n\n:::\n\n```{.r .cell-code}\n# Or by both\nmap_dbl(my_list, list(\"y\", 1))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2 5 9\n```\n\n\n:::\n\n```{.r .cell-code}\n# You'll get an error if you try to retrieve an inside item that doesn't have \n# a consistent format and you want a numeric output\nmap_dbl(my_list, list(\"y\"))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `map_dbl()`:\n#> ℹ In index: 2.\n#> Caused by error:\n#> ! Result must be length 1, not 2.\n```\n\n\n:::\n\n```{.r .cell-code}\n# You'll get an error if a component doesn't exist:\nmap_chr(my_list, \"z\")\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `map_chr()`:\n#> ℹ In index: 3.\n#> Caused by error:\n#> ! Result must be length 1, not 0.\n```\n\n\n:::\n\n```{.r .cell-code}\n#> Error: Result 3 must be a single string, not NULL of length 0\n\n# Unless you supply a .default value\nmap_chr(my_list, \"z\", .default = NA)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"a\" \"b\" NA\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] \"a\" \"b\" NA\n```\n:::\n\n\n\n## Not covered: `flatten()` {-}\n\n- `flatten()` will turn a list of lists into a simpler vector.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_list <-\n list(\n a = 1:3,\n b = list(1:3)\n )\n\nmy_list\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] 1 2 3\n#> \n#> $b\n#> $b[[1]]\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_if(my_list, is.list, pluck)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] 1 2 3\n#> \n#> $b\n#> $b[[1]]\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_if(my_list, is.list, flatten_int)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] 1 2 3\n#> \n#> $b\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_if(my_list, is.list, flatten_int) |> \n flatten_int()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3 1 2 3\n```\n\n\n:::\n:::\n\n\n## Dealing with Failures {-}\n\n## Safely {-}\n\n`safely()` is an adverb. It takes a function (a verb) and returns a modified version. In this case, the modified function will never throw an error. Instead it always returns a list with two elements.\n\n- `result` is the original result. If there is an error this will be NULL\n\n- `error` is an error object. If the operation was successful the \"`error`\" will be NULL.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nA <- list(1, 10, \"a\")\n\nmap(.x = A, .f = safely(log))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [[1]]$result\n#> [1] 0\n#> \n#> [[1]]$error\n#> NULL\n#> \n#> \n#> [[2]]\n#> [[2]]$result\n#> [1] 2.302585\n#> \n#> [[2]]$error\n#> NULL\n#> \n#> \n#> [[3]]\n#> [[3]]$result\n#> NULL\n#> \n#> [[3]]$error\n#> <simpleError in .Primitive(\"log\")(x, base): non-numeric argument to mathematical function>\n```\n\n\n:::\n:::\n\n\n## Possibly {-}\n\n`possibly()` always succeeds. It is simpler than `safely()`, because you can give it a default value to return when there is an error.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nA <- list(1,10,\"a\")\n\nmap_dbl(.x = A, .f = possibly(log, otherwise = NA_real_) )\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0.000000 2.302585 NA\n```\n\n\n:::\n:::\n\n", "supporting": [ "09_files" ], diff --git a/_freeze/slides/09/figure-html/unnamed-chunk-28-1.png b/_freeze/slides/09/figure-html/unnamed-chunk-28-1.png Binary files differ. diff --git a/_freeze/slides/09/figure-html/unnamed-chunk-29-1.png b/_freeze/slides/09/figure-html/unnamed-chunk-29-1.png Binary files differ. diff --git a/_freeze/slides/09/figure-html/unnamed-chunk-32-1.png b/_freeze/slides/09/figure-html/unnamed-chunk-32-1.png Binary files differ. diff --git a/_freeze/slides/09/figure-html/unnamed-chunk-32-2.png b/_freeze/slides/09/figure-html/unnamed-chunk-32-2.png Binary files differ. diff --git a/_freeze/slides/09/figure-html/unnamed-chunk-32-3.png b/_freeze/slides/09/figure-html/unnamed-chunk-32-3.png Binary files differ. diff --git a/_freeze/slides/09/figure-html/unnamed-chunk-32-4.png b/_freeze/slides/09/figure-html/unnamed-chunk-32-4.png Binary files differ. diff --git a/_freeze/slides/10/execute-results/html.json b/_freeze/slides/10/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "2e08d39a04b5d027cbc27b6e0102c450", + "hash": "3855cbc6f04c2a735a53df74125f5988", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Function factories\n---\n\n## Learning objectives:\n\n- Understand what a function factory is\n- Recognise how function factories work\n- Learn about non-obvious combination of function features\n- Generate a family of functions from data\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(rlang)\nlibrary(ggplot2)\nlibrary(scales)\n```\n:::\n\n\n\n## What is a function factory?\n\n\nA **function factory** is a function that makes (returns) functions\n\nFactory made function are **manufactured functions**.\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n{fig-align='center' fig-alt='https://epsis.com/no/operations-centers-focus-on-ways-of-working/' width=512}\n:::\n:::\n\n\n\n\n## How does a function factory work?\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n{fig-align='center' width=130}\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\npower1 <- function(exp) {\n function(x) {\n x ^ exp\n }\n}\n\nsquare <- power1(2)\ncube <- power1(3)\n```\n:::\n\n`power1()` is the function factory and `square()` and `cube()` are manufactured functions.\n\n## Important to remember\n\n1. R has First-class functions (can be created with `function()` and `<-`)\n\n> R functions are objects in their own right, a language property often called “first-class functions” \n> -- [Section 6.2.3](https://adv-r.hadley.nz/functions.html?q=first%20class#first-class-functions)\n\n2. Functions capture (enclose) environment in which they are created\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- function(x) function(y) x + y\nfn_env(f) # The function f()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: R_GlobalEnv>\n```\n\n\n:::\n\n```{.r .cell-code}\nfn_env(f()) # The function created by f()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: 0x000001c9f325a8f8>\n```\n\n\n:::\n:::\n\n\n3. Functions create a new environment on each run\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- function(x) {\n function() x + 1\n}\nff <- f(1)\nff()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n\n```{.r .cell-code}\nff()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n:::\n\n\n\n## Fundamentals - Environment\n\n- Environment when function is created defines arguments in the function\n- Use `env_print(fun)` and `fn_env()` to explore\n\n\n::: {.cell}\n\n```{.r .cell-code}\nenv_print(square)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: 0x000001c9f4370070>\n#> Parent: <environment: global>\n#> Bindings:\n#> • exp: <lazy>\n```\n\n\n:::\n\n```{.r .cell-code}\nfn_env(square)$exp\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n:::\n\n\n{width=50% fig-align=center}\n\n## Fundamentals - Forcing\n\n- Lazy evaluation means arguments only evaluated when used\n- \"[can] lead to a real head-scratcher of a bug\"\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 2\nsquare <- power1(x)\nx <- 3\nsquare(4)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 64\n```\n\n\n:::\n:::\n\n\n- *Only applies if passing object as argument*\n- Here argument `2` evaluated when function called\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsquare <- power1(2)\nx <- 3\nsquare(4)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 16\n```\n\n\n:::\n:::\n\n\nSo use `force()`! (Unless you want it to change with the `x` in the parent environment)\n\n## Forcing - Reiterated\n\nOnly required if the argument is **not** evaluated before the new function is created:\n\n::: {.cell}\n\n```{.r .cell-code}\npower1 <- function(exp) {\n stopifnot(is.numeric(exp))\n function(x) x ^ exp\n}\n\nx <- 2\nsquare <- power1(x)\nx <- 3\nsquare(4)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 16\n```\n\n\n:::\n:::\n\n\n## Fundamentals - Stateful functions\n\nBecause\n\n- The enclosing environment is unique and constant, and\n- We have `<<-` (super assignment)\n\nWe can *change* that enclosing environment and keep track of that state\nacross iterations (!)\n\n- `<-` Assignment in *current* environment\n- `<<-` Assignment in *parent* environment\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnew_counter <- function() {\n i <- 0 \n function() {\n i <<- i + 1 # second assignment (super assignment)\n i\n }\n}\n\ncounter_one <- new_counter()\ncounter_two <- new_counter()\nc(counter_one(), counter_one(), counter_one())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nc(counter_two(), counter_two(), counter_two())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n:::\n\n\n\n> \"As soon as your function starts managing the state of multiple variables, it’s better to switch to R6\"\n\n## Fundamentals - Garbage collection\n\n- Because environment is attached to (enclosed by) function, temporary objects\ndon't go away.\n\n**Cleaning up** using `rm()` inside a function:\n\n::: {.cell}\n\n```{.r .cell-code}\nf_dirty <- function(n) {\n x <- runif(n)\n m <- mean(x)\n function() m\n}\n\nf_clean <- function(n) {\n x <- runif(n)\n m <- mean(x)\n rm(x) # <---- Important part!\n function() m\n}\n\nlobstr::obj_size(f_dirty(1e6))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8.00 MB\n```\n\n\n:::\n\n```{.r .cell-code}\nlobstr::obj_size(f_clean(1e6))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 504 B\n```\n\n\n:::\n:::\n\n\n\n## Useful Examples - Histograms and binwidth\n\n**Useful when...** \n\n- You need to pass a function \n- You don't want to have to re-write the function every time \n (the *default* behaviour of the function should be flexible)\n\n\nFor example, these bins are not appropriate\n\n::: {.cell}\n\n```{.r .cell-code}\nsd <- c(1, 5, 15)\nn <- 100\ndf <- data.frame(x = rnorm(3 * n, sd = sd), sd = rep(sd, n))\n\nggplot(df, aes(x)) + \n geom_histogram(binwidth = 2) + \n facet_wrap(~ sd, scales = \"free_x\") + \n labs(x = NULL)\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\nWe could just make a function...\n\n::: {.cell}\n\n```{.r .cell-code}\nbinwidth_bins <- function(x) (max(x) - min(x)) / 20\n\nggplot(df, aes(x = x)) + \n geom_histogram(binwidth = binwidth_bins) + \n facet_wrap(~ sd, scales = \"free_x\") + \n labs(x = NULL)\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\nBut if we want to change the number of bins (20) we'd have to re-write the function\neach time.\n\nIf we use a factory, we don't have to do that.\n\n::: {.cell}\n\n```{.r .cell-code}\nbinwidth_bins <- function(n) {\n force(n)\n function(x) (max(x) - min(x)) / n\n}\n\nggplot(df, aes(x = x)) + \n geom_histogram(binwidth = binwidth_bins(20)) + \n facet_wrap(~ sd, scales = \"free_x\") + \n labs(x = NULL, title = \"20 bins\")\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n\n```{.r .cell-code}\nggplot(df, aes(x = x)) + \n geom_histogram(binwidth = binwidth_bins(5)) + \n facet_wrap(~ sd, scales = \"free_x\") + \n labs(x = NULL, title = \"5 bins\")\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n> Similar benefit in Box-cox example\n\n## Useful Examples - Wrapper\n\n**Useful when...**\n\n- You want to create a function that wraps a bunch of other functions\n\nFor example, `ggsave()` wraps a bunch of different graphics device functions: \n\n\n::: {.cell}\n\n```{.r .cell-code}\n# (Even more simplified)\nplot_dev <- function(ext, dpi = 96) {\n force(dpi)\n \n switch(\n ext,\n svg = function(filename, ...) svglite::svglite(file = filename, ...),\n png = function(...) grDevices::png(..., res = dpi, units = \"in\"),\n jpg = ,\n jpeg = function(...) grDevices::jpeg(..., res = dpi, units = \"in\"),\n stop(\"Unknown graphics extension: \", ext, call. = FALSE)\n )\n}\n```\n:::\n\n\nThen `ggsave()` uses\n\n```\nggsave <- function(...) {\n dev <- plot_dev(device, filename, dpi = dpi)\n ...\n dev(filename = filename, width = dim[1], height = dim[2], bg = bg, ...)\n ...\n}\n```\n\nOtherwise, would have to do something like like a bunch of if/else statements.\n\n## Useful Examples - Optimizing\n\n**Useful when...**\n\n- Want to pass function on to `optimise()`/`optimize()`\n- Want to perform pre-computations to speed things up\n- Want to re-use this for other datasets\n\n(*Skipping to final results from section*)\n\nHere, using MLE want to to find the most likely value of lambda for a Poisson distribution\nand this data.\n\n::: {.cell}\n\n```{.r .cell-code}\nx1 <- c(41, 30, 31, 38, 29, 24, 30, 29, 31, 38)\n```\n:::\n\n\nWe'll create a function that creates a lambda assessment function for a given \ndata set.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nll_poisson <- function(x) {\n n <- length(x)\n sum_x <- sum(x)\n c <- sum(lfactorial(x))\n\n function(lambda) {\n log(lambda) * sum_x - n * lambda - c\n }\n}\n```\n:::\n\n\nWe can use this on different data sets, but here use ours `x1`\n\n::: {.cell}\n\n```{.r .cell-code}\nll <- ll_poisson(x1)\nll(10) # Log-probility of a lambda = 10\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -183.6405\n```\n\n\n:::\n:::\n\n\nUse `optimise()` rather than trial and error\n\n::: {.cell}\n\n```{.r .cell-code}\noptimise(ll, c(0, 100), maximum = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $maximum\n#> [1] 32.09999\n#> \n#> $objective\n#> [1] -30.26755\n```\n\n\n:::\n:::\n\n\nResult: Highest log-probability is -30.3, best lambda is 32.1\n\n\n## Function factories + functionals\n\nCombine functionals and function factories to turn data into many functions.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames <- list(\n square = 2, \n cube = 3, \n root = 1/2, \n cuberoot = 1/3, \n reciprocal = -1\n)\nfuns <- purrr::map(names, power1)\nnames(funs)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"square\" \"cube\" \"root\" \"cuberoot\" \"reciprocal\"\n```\n\n\n:::\n\n```{.r .cell-code}\nfuns$root(64)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 8\n```\n\n\n:::\n\n```{.r .cell-code}\nfuns$square(3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 9\n```\n\n\n:::\n:::\n\n\nAvoid the prefix with\n\n- `with()` - `with(funs, root(100))`\n - Temporary, clear, short-term\n- `attach()` - `attach(funs)` / `detach(funs)`\n - Added to search path (like package function), cannot be overwritten, but can be attached multiple times!\n- `rlang::env_bind` - `env_bind(globalenv(), !!!funs)` / `env_unbind(gloablenv(), names(funs))`\n - Added to global env (like created function), can be overwritten\n\n<!--\n## EXTRA - Previous set of slides\n\nGraphical factories **useful function factories**, such as:\n\n1. Labelling with:\n\n * formatter functions\n \n\n::: {.cell}\n\n```{.r .cell-code}\ny <- c(12345, 123456, 1234567)\ncomma_format()(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"12,345\" \"123,456\" \"1,234,567\"\n```\n\n\n:::\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnumber_format(scale = 1e-3, suffix = \" K\")(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"12 K\" \"123 K\" \"1 235 K\"\n```\n\n\n:::\n:::\n\nThey are more commonly used inside a ggplot:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\na_ggplot_object + \n scale_y_continuous(\n labels = comma_format()\n)\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n2. Using binwidth in facet histograms\n\n * binwidth_bins\n \n\n::: {.cell}\n\n```{.r .cell-code}\nbinwidth_bins <- function(n) {\n force(n)\n \n function(x) {\n (max(x) - min(x)) / n\n }\n}\n```\n:::\n\n \nOr use a concatenation of this typr of detecting number of bins functions:\n\n - nclass.Sturges()\n - nclass.scott()\n - nclass.FD()\n \n\n::: {.cell}\n\n```{.r .cell-code}\nbase_bins <- function(type) {\n fun <- switch(type,\n Sturges = nclass.Sturges,\n scott = nclass.scott,\n FD = nclass.FD,\n stop(\"Unknown type\", call. = FALSE)\n )\n \n function(x) {\n (max(x) - min(x)) / fun(x)\n }\n}\n```\n:::\n\n \n\n3. Internals:\n\n * ggplot2:::plot_dev()\n\n\n## Non-obvious combinations\n\n\n- The **Box-Cox** transformation.\n- **Bootstrap** resampling.\n- **Maximum likelihood** estimation.\n\n\n### Statistical factories\n\nThe **Box-Cox** transformation towards normality:\n\n::: {.cell}\n\n```{.r .cell-code}\nboxcox1 <- function(x, lambda) {\n stopifnot(length(lambda) == 1)\n \n if (lambda == 0) {\n log(x)\n } else {\n (x ^ lambda - 1) / lambda\n }\n}\n```\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nboxcox2 <- function(lambda) {\n if (lambda == 0) {\n function(x) log(x)\n } else {\n function(x) (x ^ lambda - 1) / lambda\n }\n}\n\nstat_boxcox <- function(lambda) {\n stat_function(aes(colour = lambda), fun = boxcox2(lambda), size = 1)\n}\n\nplot1 <- ggplot(data.frame(x = c(0, 5)), aes(x)) + \n lapply(c(0.5, 1, 1.5), stat_boxcox) + \n scale_colour_viridis_c(limits = c(0, 1.5))\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.\n#> ℹ Please use `linewidth` instead.\n```\n\n\n:::\n\n```{.r .cell-code}\n# visually, log() does seem to make sense as the transformation\n# for lambda = 0; as values get smaller and smaller, the function\n# gets close and closer to a log transformation\nplot2 <- ggplot(data.frame(x = c(0.01, 1)), aes(x)) + \n lapply(c(0.5, 0.25, 0.1, 0), stat_boxcox) + \n scale_colour_viridis_c(limits = c(0, 1.5))\nlibrary(patchwork)\nplot1+plot2\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n**Bootstrap generators**\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nboot_permute <- function(df, var) {\n n <- nrow(df)\n force(var)\n \n function() {\n col <- df[[var]]\n col[sample(n, replace = TRUE)]\n }\n}\n\nboot_mtcars1 <- boot_permute(mtcars, \"mpg\")\nhead(boot_mtcars1())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 30.4 16.4 15.8 14.7 10.4 21.4\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 16.4 22.8 22.8 22.8 16.4 19.2\nhead(boot_mtcars1())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 32.4 26.0 17.3 13.3 26.0 18.1\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 17.8 18.7 30.4 30.4 16.4 21.0\n```\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\nboot_model <- function(df, formula) {\n mod <- lm(formula, data = df)\n fitted <- unname(fitted(mod))\n resid <- unname(resid(mod))\n rm(mod)\n\n function() {\n fitted + sample(resid)\n }\n} \n\nboot_mtcars2 <- boot_model(mtcars, mpg ~ wt)\nhead(boot_mtcars2())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 25.38590 24.26973 22.27495 20.96952 15.69478 19.96660\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 25.0 24.0 21.7 19.2 24.9 16.0\nhead(boot_mtcars2())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 27.44635 23.66597 30.86703 18.21963 21.36451 15.06639\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 27.4 21.0 20.3 19.4 16.3 21.3\n```\n:::\n\n\n**Maximum likelihood estimation**\n\n$$P(\\lambda,x)=\\prod_{i=1}^{n}\\frac{\\lambda^{x_i}e^{-\\lambda}}{x_i!}$$\n\n::: {.cell}\n\n```{.r .cell-code}\nlprob_poisson <- function(lambda, x) {\n n <- length(x)\n (log(lambda) * sum(x)) - (n * lambda) - sum(lfactorial(x))\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx1 <- c(41, 30, 31, 38, 29, 24, 30, 29, 31, 38)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlprob_poisson(10, x1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -183.6405\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] -184\nlprob_poisson(20, x1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -61.14028\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] -61.1\nlprob_poisson(30, x1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -30.98598\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] -31\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nll_poisson1 <- function(x) {\n n <- length(x)\n\n function(lambda) {\n log(lambda) * sum(x) - n * lambda - sum(lfactorial(x))\n }\n}\n```\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\nll_poisson2 <- function(x) {\n n <- length(x)\n sum_x <- sum(x)\n c <- sum(lfactorial(x))\n\n function(lambda) {\n log(lambda) * sum_x - n * lambda - c\n }\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nll1 <- ll_poisson2(x1)\n\nll1(10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -183.6405\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] -184\nll1(20)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -61.14028\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] -61.1\nll1(30)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -30.98598\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] -31\n```\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptimise(ll1, c(0, 100), maximum = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $maximum\n#> [1] 32.09999\n#> \n#> $objective\n#> [1] -30.26755\n```\n\n\n:::\n\n```{.r .cell-code}\n#> $maximum\n#> [1] 32.1\n#> \n#> $objective\n#> [1] -30.3\n```\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptimise(lprob_poisson, c(0, 100), x = x1, maximum = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $maximum\n#> [1] 32.09999\n#> \n#> $objective\n#> [1] -30.26755\n```\n\n\n:::\n\n```{.r .cell-code}\n#> $maximum\n#> [1] 32.1\n#> \n#> $objective\n#> [1] -30.3\n```\n:::\n\n\n## Function factory applications\n\n\nCombine functionals and function factories to turn data into many functions.\n\n### Function factories + functionals\n\n::: {.cell}\n\n```{.r .cell-code}\nnames <- list(\n square = 2, \n cube = 3, \n root = 1/2, \n cuberoot = 1/3, \n reciprocal = -1\n)\nfuns <- purrr::map(names, power1)\n\nfuns$root(64)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 8\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 8\nfuns$root\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (x) \n#> x^exp\n#> <bytecode: 0x000001c9f32287e8>\n#> <environment: 0x000001c9f4276658>\n```\n\n\n:::\n\n```{.r .cell-code}\n#> function(x) {\n#> x ^ exp\n#> }\n#> <bytecode: 0x7fe85512a410>\n#> <environment: 0x7fe85b21f190>\n```\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith(funs, root(100))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 10\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 10\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nattach(funs)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> The following objects are masked _by_ .GlobalEnv:\n#> \n#> cube, square\n```\n\n\n:::\n\n```{.r .cell-code}\n#> The following objects are masked _by_ .GlobalEnv:\n#> \n#> cube, square\nroot(100)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 10\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 10\ndetach(funs)\n```\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrlang::env_bind(globalenv(), !!!funs)\nroot(100)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 10\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 10\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrlang::env_unbind(globalenv(), names(funs))\n```\n:::\n\n\n\n-->\n\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/enI5Ynq6olI\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/U-CoF7MCik0\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/qgn7WTITnNs\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/GHp2W4JxVaY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/8TGXjzi0n0o\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/FUoYwYFqT7Q\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n01:02:25\tTrevin:\tI'm good with combining 👍\n01:02:57\tOluwafemi Oyedele:\tI agree with combining the chapter!!!\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/7GLyO3IntgE\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n", + "markdown": "---\nengine: knitr\ntitle: Function factories\n---\n\n## Learning objectives:\n\n- Understand what a function factory is\n- Recognise how function factories work\n- Learn about non-obvious combination of function features\n- Generate a family of functions from data\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(rlang)\nlibrary(ggplot2)\nlibrary(scales)\n```\n:::\n\n\n\n## What is a function factory?\n\n\nA **function factory** is a function that makes (returns) functions\n\nFactory made function are **manufactured functions**.\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n{fig-align='center' fig-alt='https://epsis.com/no/operations-centers-focus-on-ways-of-working/' width=512}\n:::\n:::\n\n\n\n\n## How does a function factory work?\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n{fig-align='center' width=130}\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\npower1 <- function(exp) {\n function(x) {\n x ^ exp\n }\n}\n\nsquare <- power1(2)\ncube <- power1(3)\n```\n:::\n\n`power1()` is the function factory and `square()` and `cube()` are manufactured functions.\n\n## Important to remember\n\n1. R has First-class functions (can be created with `function()` and `<-`)\n\n> R functions are objects in their own right, a language property often called “first-class functions” \n> -- [Section 6.2.3](https://adv-r.hadley.nz/functions.html?q=first%20class#first-class-functions)\n\n2. Functions capture (enclose) environment in which they are created\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- function(x) function(y) x + y\nfn_env(f) # The function f()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: R_GlobalEnv>\n```\n\n\n:::\n\n```{.r .cell-code}\nfn_env(f()) # The function created by f()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: 0x0000029fbf09a6d0>\n```\n\n\n:::\n:::\n\n\n3. Functions create a new environment on each run\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- function(x) {\n function() x + 1\n}\nff <- f(1)\nff()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n\n```{.r .cell-code}\nff()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n:::\n\n\n\n## Fundamentals - Environment\n\n- Environment when function is created defines arguments in the function\n- Use `env_print(fun)` and `fn_env()` to explore\n\n\n::: {.cell}\n\n```{.r .cell-code}\nenv_print(square)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: 0x0000029fc0169900>\n#> Parent: <environment: global>\n#> Bindings:\n#> • exp: <lazy>\n```\n\n\n:::\n\n```{.r .cell-code}\nfn_env(square)$exp\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n:::\n\n\n{width=50% fig-align=center}\n\n## Fundamentals - Forcing\n\n- Lazy evaluation means arguments only evaluated when used\n- \"[can] lead to a real head-scratcher of a bug\"\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 2\nsquare <- power1(x)\nx <- 3\nsquare(4)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 64\n```\n\n\n:::\n:::\n\n\n- *Only applies if passing object as argument*\n- Here argument `2` evaluated when function called\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsquare <- power1(2)\nx <- 3\nsquare(4)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 16\n```\n\n\n:::\n:::\n\n\nSo use `force()`! (Unless you want it to change with the `x` in the parent environment)\n\n## Forcing - Reiterated\n\nOnly required if the argument is **not** evaluated before the new function is created:\n\n::: {.cell}\n\n```{.r .cell-code}\npower1 <- function(exp) {\n stopifnot(is.numeric(exp))\n function(x) x ^ exp\n}\n\nx <- 2\nsquare <- power1(x)\nx <- 3\nsquare(4)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 16\n```\n\n\n:::\n:::\n\n\n## Fundamentals - Stateful functions\n\nBecause\n\n- The enclosing environment is unique and constant, and\n- We have `<<-` (super assignment)\n\nWe can *change* that enclosing environment and keep track of that state\nacross iterations (!)\n\n- `<-` Assignment in *current* environment\n- `<<-` Assignment in *parent* environment\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnew_counter <- function() {\n i <- 0 \n function() {\n i <<- i + 1 # second assignment (super assignment)\n i\n }\n}\n\ncounter_one <- new_counter()\ncounter_two <- new_counter()\nc(counter_one(), counter_one(), counter_one())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nc(counter_two(), counter_two(), counter_two())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n:::\n\n\n\n> \"As soon as your function starts managing the state of multiple variables, it’s better to switch to R6\"\n\n## Fundamentals - Garbage collection\n\n- Because environment is attached to (enclosed by) function, temporary objects\ndon't go away.\n\n**Cleaning up** using `rm()` inside a function:\n\n::: {.cell}\n\n```{.r .cell-code}\nf_dirty <- function(n) {\n x <- runif(n)\n m <- mean(x)\n function() m\n}\n\nf_clean <- function(n) {\n x <- runif(n)\n m <- mean(x)\n rm(x) # <---- Important part!\n function() m\n}\n\nlobstr::obj_size(f_dirty(1e6))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 8.00 MB\n```\n\n\n:::\n\n```{.r .cell-code}\nlobstr::obj_size(f_clean(1e6))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 504 B\n```\n\n\n:::\n:::\n\n\n\n## Useful Examples - Histograms and binwidth\n\n**Useful when...** \n\n- You need to pass a function \n- You don't want to have to re-write the function every time \n (the *default* behaviour of the function should be flexible)\n\n\nFor example, these bins are not appropriate\n\n::: {.cell}\n\n```{.r .cell-code}\nsd <- c(1, 5, 15)\nn <- 100\ndf <- data.frame(x = rnorm(3 * n, sd = sd), sd = rep(sd, n))\n\nggplot(df, aes(x)) + \n geom_histogram(binwidth = 2) + \n facet_wrap(~ sd, scales = \"free_x\") + \n labs(x = NULL)\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\nWe could just make a function...\n\n::: {.cell}\n\n```{.r .cell-code}\nbinwidth_bins <- function(x) (max(x) - min(x)) / 20\n\nggplot(df, aes(x = x)) + \n geom_histogram(binwidth = binwidth_bins) + \n facet_wrap(~ sd, scales = \"free_x\") + \n labs(x = NULL)\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\nBut if we want to change the number of bins (20) we'd have to re-write the function\neach time.\n\nIf we use a factory, we don't have to do that.\n\n::: {.cell}\n\n```{.r .cell-code}\nbinwidth_bins <- function(n) {\n force(n)\n function(x) (max(x) - min(x)) / n\n}\n\nggplot(df, aes(x = x)) + \n geom_histogram(binwidth = binwidth_bins(20)) + \n facet_wrap(~ sd, scales = \"free_x\") + \n labs(x = NULL, title = \"20 bins\")\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n\n```{.r .cell-code}\nggplot(df, aes(x = x)) + \n geom_histogram(binwidth = binwidth_bins(5)) + \n facet_wrap(~ sd, scales = \"free_x\") + \n labs(x = NULL, title = \"5 bins\")\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n> Similar benefit in Box-cox example\n\n## Useful Examples - Wrapper\n\n**Useful when...**\n\n- You want to create a function that wraps a bunch of other functions\n\nFor example, `ggsave()` wraps a bunch of different graphics device functions: \n\n\n::: {.cell}\n\n```{.r .cell-code}\n# (Even more simplified)\nplot_dev <- function(ext, dpi = 96) {\n force(dpi)\n \n switch(\n ext,\n svg = function(filename, ...) svglite::svglite(file = filename, ...),\n png = function(...) grDevices::png(..., res = dpi, units = \"in\"),\n jpg = ,\n jpeg = function(...) grDevices::jpeg(..., res = dpi, units = \"in\"),\n stop(\"Unknown graphics extension: \", ext, call. = FALSE)\n )\n}\n```\n:::\n\n\nThen `ggsave()` uses\n\n```\nggsave <- function(...) {\n dev <- plot_dev(device, filename, dpi = dpi)\n ...\n dev(filename = filename, width = dim[1], height = dim[2], bg = bg, ...)\n ...\n}\n```\n\nOtherwise, would have to do something like like a bunch of if/else statements.\n\n## Useful Examples - Optimizing\n\n**Useful when...**\n\n- Want to pass function on to `optimise()`/`optimize()`\n- Want to perform pre-computations to speed things up\n- Want to re-use this for other datasets\n\n(*Skipping to final results from section*)\n\nHere, using MLE want to to find the most likely value of lambda for a Poisson distribution\nand this data.\n\n::: {.cell}\n\n```{.r .cell-code}\nx1 <- c(41, 30, 31, 38, 29, 24, 30, 29, 31, 38)\n```\n:::\n\n\nWe'll create a function that creates a lambda assessment function for a given \ndata set.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nll_poisson <- function(x) {\n n <- length(x)\n sum_x <- sum(x)\n c <- sum(lfactorial(x))\n\n function(lambda) {\n log(lambda) * sum_x - n * lambda - c\n }\n}\n```\n:::\n\n\nWe can use this on different data sets, but here use ours `x1`\n\n::: {.cell}\n\n```{.r .cell-code}\nll <- ll_poisson(x1)\nll(10) # Log-probility of a lambda = 10\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -183.6405\n```\n\n\n:::\n:::\n\n\nUse `optimise()` rather than trial and error\n\n::: {.cell}\n\n```{.r .cell-code}\noptimise(ll, c(0, 100), maximum = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $maximum\n#> [1] 32.09999\n#> \n#> $objective\n#> [1] -30.26755\n```\n\n\n:::\n:::\n\n\nResult: Highest log-probability is -30.3, best lambda is 32.1\n\n\n## Function factories + functionals\n\nCombine functionals and function factories to turn data into many functions.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames <- list(\n square = 2, \n cube = 3, \n root = 1/2, \n cuberoot = 1/3, \n reciprocal = -1\n)\nfuns <- purrr::map(names, power1)\nnames(funs)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"square\" \"cube\" \"root\" \"cuberoot\" \"reciprocal\"\n```\n\n\n:::\n\n```{.r .cell-code}\nfuns$root(64)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 8\n```\n\n\n:::\n\n```{.r .cell-code}\nfuns$square(3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 9\n```\n\n\n:::\n:::\n\n\nAvoid the prefix with\n\n- `with()` - `with(funs, root(100))`\n - Temporary, clear, short-term\n- `attach()` - `attach(funs)` / `detach(funs)`\n - Added to search path (like package function), cannot be overwritten, but can be attached multiple times!\n- `rlang::env_bind` - `env_bind(globalenv(), !!!funs)` / `env_unbind(gloablenv(), names(funs))`\n - Added to global env (like created function), can be overwritten\n\n<!--\n## EXTRA - Previous set of slides\n\nGraphical factories **useful function factories**, such as:\n\n1. Labelling with:\n\n * formatter functions\n \n\n::: {.cell}\n\n```{.r .cell-code}\ny <- c(12345, 123456, 1234567)\ncomma_format()(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"12,345\" \"123,456\" \"1,234,567\"\n```\n\n\n:::\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnumber_format(scale = 1e-3, suffix = \" K\")(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"12 K\" \"123 K\" \"1 235 K\"\n```\n\n\n:::\n:::\n\nThey are more commonly used inside a ggplot:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\na_ggplot_object + \n scale_y_continuous(\n labels = comma_format()\n)\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n2. Using binwidth in facet histograms\n\n * binwidth_bins\n \n\n::: {.cell}\n\n```{.r .cell-code}\nbinwidth_bins <- function(n) {\n force(n)\n \n function(x) {\n (max(x) - min(x)) / n\n }\n}\n```\n:::\n\n \nOr use a concatenation of this typr of detecting number of bins functions:\n\n - nclass.Sturges()\n - nclass.scott()\n - nclass.FD()\n \n\n::: {.cell}\n\n```{.r .cell-code}\nbase_bins <- function(type) {\n fun <- switch(type,\n Sturges = nclass.Sturges,\n scott = nclass.scott,\n FD = nclass.FD,\n stop(\"Unknown type\", call. = FALSE)\n )\n \n function(x) {\n (max(x) - min(x)) / fun(x)\n }\n}\n```\n:::\n\n \n\n3. Internals:\n\n * ggplot2:::plot_dev()\n\n\n## Non-obvious combinations\n\n\n- The **Box-Cox** transformation.\n- **Bootstrap** resampling.\n- **Maximum likelihood** estimation.\n\n\n### Statistical factories\n\nThe **Box-Cox** transformation towards normality:\n\n::: {.cell}\n\n```{.r .cell-code}\nboxcox1 <- function(x, lambda) {\n stopifnot(length(lambda) == 1)\n \n if (lambda == 0) {\n log(x)\n } else {\n (x ^ lambda - 1) / lambda\n }\n}\n```\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nboxcox2 <- function(lambda) {\n if (lambda == 0) {\n function(x) log(x)\n } else {\n function(x) (x ^ lambda - 1) / lambda\n }\n}\n\nstat_boxcox <- function(lambda) {\n stat_function(aes(colour = lambda), fun = boxcox2(lambda), size = 1)\n}\n\nplot1 <- ggplot(data.frame(x = c(0, 5)), aes(x)) + \n lapply(c(0.5, 1, 1.5), stat_boxcox) + \n scale_colour_viridis_c(limits = c(0, 1.5))\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.\n#> ℹ Please use `linewidth` instead.\n```\n\n\n:::\n\n```{.r .cell-code}\n# visually, log() does seem to make sense as the transformation\n# for lambda = 0; as values get smaller and smaller, the function\n# gets close and closer to a log transformation\nplot2 <- ggplot(data.frame(x = c(0.01, 1)), aes(x)) + \n lapply(c(0.5, 0.25, 0.1, 0), stat_boxcox) + \n scale_colour_viridis_c(limits = c(0, 1.5))\nlibrary(patchwork)\nplot1+plot2\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n**Bootstrap generators**\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nboot_permute <- function(df, var) {\n n <- nrow(df)\n force(var)\n \n function() {\n col <- df[[var]]\n col[sample(n, replace = TRUE)]\n }\n}\n\nboot_mtcars1 <- boot_permute(mtcars, \"mpg\")\nhead(boot_mtcars1())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 21.5 19.2 16.4 10.4 22.8 24.4\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 16.4 22.8 22.8 22.8 16.4 19.2\nhead(boot_mtcars1())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 32.4 22.8 19.2 10.4 22.8 18.1\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 17.8 18.7 30.4 30.4 16.4 21.0\n```\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\nboot_model <- function(df, formula) {\n mod <- lm(formula, data = df)\n fitted <- unname(fitted(mod))\n resid <- unname(resid(mod))\n rm(mod)\n\n function() {\n fitted + sample(resid)\n }\n} \n\nboot_mtcars2 <- boot_model(mtcars, mpg ~ wt)\nhead(boot_mtcars2())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 24.48367 22.27620 26.63215 20.96952 25.77286 16.51064\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 25.0 24.0 21.7 19.2 24.9 16.0\nhead(boot_mtcars2())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 29.70459 23.66597 25.75283 26.08372 25.77286 14.88789\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 27.4 21.0 20.3 19.4 16.3 21.3\n```\n:::\n\n\n**Maximum likelihood estimation**\n\n$$P(\\lambda,x)=\\prod_{i=1}^{n}\\frac{\\lambda^{x_i}e^{-\\lambda}}{x_i!}$$\n\n::: {.cell}\n\n```{.r .cell-code}\nlprob_poisson <- function(lambda, x) {\n n <- length(x)\n (log(lambda) * sum(x)) - (n * lambda) - sum(lfactorial(x))\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx1 <- c(41, 30, 31, 38, 29, 24, 30, 29, 31, 38)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlprob_poisson(10, x1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -183.6405\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] -184\nlprob_poisson(20, x1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -61.14028\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] -61.1\nlprob_poisson(30, x1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -30.98598\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] -31\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nll_poisson1 <- function(x) {\n n <- length(x)\n\n function(lambda) {\n log(lambda) * sum(x) - n * lambda - sum(lfactorial(x))\n }\n}\n```\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\nll_poisson2 <- function(x) {\n n <- length(x)\n sum_x <- sum(x)\n c <- sum(lfactorial(x))\n\n function(lambda) {\n log(lambda) * sum_x - n * lambda - c\n }\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nll1 <- ll_poisson2(x1)\n\nll1(10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -183.6405\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] -184\nll1(20)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -61.14028\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] -61.1\nll1(30)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -30.98598\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] -31\n```\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptimise(ll1, c(0, 100), maximum = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $maximum\n#> [1] 32.09999\n#> \n#> $objective\n#> [1] -30.26755\n```\n\n\n:::\n\n```{.r .cell-code}\n#> $maximum\n#> [1] 32.1\n#> \n#> $objective\n#> [1] -30.3\n```\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptimise(lprob_poisson, c(0, 100), x = x1, maximum = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $maximum\n#> [1] 32.09999\n#> \n#> $objective\n#> [1] -30.26755\n```\n\n\n:::\n\n```{.r .cell-code}\n#> $maximum\n#> [1] 32.1\n#> \n#> $objective\n#> [1] -30.3\n```\n:::\n\n\n## Function factory applications\n\n\nCombine functionals and function factories to turn data into many functions.\n\n### Function factories + functionals\n\n::: {.cell}\n\n```{.r .cell-code}\nnames <- list(\n square = 2, \n cube = 3, \n root = 1/2, \n cuberoot = 1/3, \n reciprocal = -1\n)\nfuns <- purrr::map(names, power1)\n\nfuns$root(64)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 8\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 8\nfuns$root\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (x) \n#> x^exp\n#> <bytecode: 0x0000029fbed27b60>\n#> <environment: 0x0000029fbf4bb9a0>\n```\n\n\n:::\n\n```{.r .cell-code}\n#> function(x) {\n#> x ^ exp\n#> }\n#> <bytecode: 0x7fe85512a410>\n#> <environment: 0x7fe85b21f190>\n```\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith(funs, root(100))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 10\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 10\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nattach(funs)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> The following objects are masked _by_ .GlobalEnv:\n#> \n#> cube, square\n```\n\n\n:::\n\n```{.r .cell-code}\n#> The following objects are masked _by_ .GlobalEnv:\n#> \n#> cube, square\nroot(100)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 10\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 10\ndetach(funs)\n```\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrlang::env_bind(globalenv(), !!!funs)\nroot(100)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 10\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] 10\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrlang::env_unbind(globalenv(), names(funs))\n```\n:::\n\n\n\n-->\n", "supporting": [ "10_files" ], diff --git a/_freeze/slides/10/figure-html/10-22-1.png b/_freeze/slides/10/figure-html/10-22-1.png Binary files differ. diff --git a/_freeze/slides/10/figure-html/unnamed-chunk-18-1.png b/_freeze/slides/10/figure-html/unnamed-chunk-18-1.png Binary files differ. diff --git a/_freeze/slides/10/figure-html/unnamed-chunk-6-1.png b/_freeze/slides/10/figure-html/unnamed-chunk-6-1.png Binary files differ. diff --git a/_freeze/slides/10/figure-html/unnamed-chunk-7-1.png b/_freeze/slides/10/figure-html/unnamed-chunk-7-1.png Binary files differ. diff --git a/_freeze/slides/10/figure-html/unnamed-chunk-8-1.png b/_freeze/slides/10/figure-html/unnamed-chunk-8-1.png Binary files differ. diff --git a/_freeze/slides/10/figure-html/unnamed-chunk-8-2.png b/_freeze/slides/10/figure-html/unnamed-chunk-8-2.png Binary files differ. diff --git a/_freeze/slides/11/execute-results/html.json b/_freeze/slides/11/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "eab6be481d2e45b9f6991b49fe05991c", + "hash": "8b328be22c5a8793824ab48fed89ce11", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Function operators\n---\n\n## Learning objectives:\n\n- Define function operator\n- Explore some existing function operators\n- Make our own function operator\n\n\n\n## Introduction \n\n<!--- ```{r 10-01, fig.align='center',fig.dim=\"50%\", fig.cap=\"Credits: 2001-2003 Michael P.Frank (https://slideplayer.com/slide/17666226/)\",echo=FALSE}\n\nknitr::include_graphics(\"images/11-maths_example.png\")\n``` --->\n\n- A **function operator** is a function that takes one (or more) functions as input and returns a function as output.\n\n- Function operators are a special case of **function factories**, since they return functions.\n\n- They are often used to wrap an existing function to provide additional capability, similar to python's **decorators**. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nchatty <- function(f) {\n force(f)\n \n function(x, ...) {\n res <- f(x, ...)\n cat(\"Processing \", x, \"\\n\", sep = \"\")\n res\n }\n}\n\nf <- function(x) x ^ 2\ns <- c(3, 2, 1)\n\npurrr::map_dbl(s, chatty(f))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Processing 3\n#> Processing 2\n#> Processing 1\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 9 4 1\n```\n\n\n:::\n:::\n\n\n## Existing function operators \n\nTwo function operator examples are `purrr:safely()` and `memoise::memoise()`. These can be found in `purr` and `memoise`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(purrr)\nlibrary(memoise)\n```\n:::\n\n\n## purrr::safely {-}\n\nCapturing Errors: turns errors into data!\n\n \n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(\n c(0.512, 0.165, 0.717),\n c(0.064, 0.781, 0.427),\n c(0.890, 0.785, 0.495),\n \"oops\"\n)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmap_dbl(x, sum)\n#> Error in .Primitive(\"sum\")(..., na.rm = na.rm): invalid 'type' (character) of\n#> argument\n```\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# note use of map (not map_dbl), safely returns a lisst\n\nout <- map(x, safely(sum))\nstr(transpose(out))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 2\n#> $ result:List of 4\n#> ..$ : num 1.39\n#> ..$ : num 1.27\n#> ..$ : num 2.17\n#> ..$ : NULL\n#> $ error :List of 4\n#> ..$ : NULL\n#> ..$ : NULL\n#> ..$ : NULL\n#> ..$ :List of 2\n#> .. ..$ message: chr \"invalid 'type' (character) of argument\"\n#> .. ..$ call : language .Primitive(\"sum\")(..., na.rm = na.rm)\n#> .. ..- attr(*, \"class\")= chr [1:3] \"simpleError\" \"error\" \"condition\"\n```\n\n\n:::\n:::\n\n \n\n\n## Other `purrr` function operators {-}\n\n<!---\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n{fig-align='center' width=350}\n:::\n:::\n\n--->\n\n> purrr comes with three other function operators in a similar vein:\n\n \n possibly(): returns a default value when there’s an error. It provides no way to tell if an error occured or not, so it’s best reserved for cases when there’s some obvious sentinel value (like NA).\n\n quietly(): turns output, messages, and warning side-effects into output, message, and warning components of the output.\n\n auto_browser(): automatically executes browser() inside the function when there’s an error.\n\n\n## memoise::memoise {-}\n\nCaching computations: avoid repeated computations!\n \n\n\n::: {.cell}\n\n```{.r .cell-code}\nslow_function <- function(x) {\n Sys.sleep(1)\n x * 10 * runif(1)\n}\nsystem.time(print(slow_function(1)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 5.929748\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.00 0.00 1.01\n```\n\n\n:::\n\n```{.r .cell-code}\nsystem.time(print(slow_function(1)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 8.725902\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.00 0.00 1.01\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfast_function <- memoise::memoise(slow_function)\nsystem.time(print(fast_function(1)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 6.995218\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0 0 1\n```\n\n\n:::\n\n```{.r .cell-code}\nsystem.time(print(fast_function(1)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 6.995218\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.00 0.00 0.01\n```\n\n\n:::\n:::\n\n\n> Be careful about memoising impure functions! \n\n## Exercise {-}\n\nHow does `safely()` work? \nThe source code looks like this:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsafely\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (.f, otherwise = NULL, quiet = TRUE) \n#> {\n#> .f <- as_mapper(.f)\n#> force(otherwise)\n#> check_bool(quiet)\n#> function(...) capture_error(.f(...), otherwise, quiet)\n#> }\n#> <bytecode: 0x0000023950891460>\n#> <environment: namespace:purrr>\n```\n\n\n:::\n:::\n\n\nThe real work is done in `capture_error` which is defined in the package **namespace**. We can access it with the `:::` operator. (Could also grab it from the function's environment.)\n\n\n::: {.cell}\n\n```{.r .cell-code}\npurrr:::capture_error\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (code, otherwise = NULL, quiet = TRUE) \n#> {\n#> tryCatch(list(result = code, error = NULL), error = function(e) {\n#> if (!quiet) {\n#> message(\"Error: \", conditionMessage(e))\n#> }\n#> list(result = otherwise, error = e)\n#> })\n#> }\n#> <bytecode: 0x0000023950926580>\n#> <environment: namespace:purrr>\n```\n\n\n:::\n:::\n\n\n## Case study: make your own function operator\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nurls <- c(\n \"adv-r\" = \"https://adv-r.hadley.nz\", \n \"r4ds\" = \"http://r4ds.had.co.nz/\"\n # and many many more\n)\npath <- paste(tempdir(), names(urls), \".html\")\n\nwalk2(urls, path, download.file, quiet = TRUE)\n```\n:::\n\n\n\nHere we make a function operator that add a little delay in reading each page:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndelay_by <- function(f, amount) {\n force(f)\n force(amount)\n \n function(...) {\n Sys.sleep(amount)\n f(...)\n }\n}\nsystem.time(runif(100))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0 0 0\n```\n\n\n:::\n\n```{.r .cell-code}\nsystem.time(delay_by(runif, 0.1)(100))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.00 0.00 0.11\n```\n\n\n:::\n:::\n\n\n\nAnd another to add a dot after nth invocation:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndot_every <- function(f, n) {\n force(f)\n force(n)\n \n i <- 0\n function(...) {\n i <<- i + 1\n if (i %% n == 0) cat(\".\")\n f(...)\n }\n}\n\nwalk(1:100, dot_every(runif, 10))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> ..........\n```\n\n\n:::\n:::\n\n\nCan now use both of these function operators to express our desired result:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwalk2(\n urls, path, \n download.file %>% dot_every(10) %>% delay_by(0.1), \n quiet = TRUE\n)\n```\n:::\n\n\n## Exercise {-}\n\n2) Should you memoise file.download? Why or why not?\n\n \n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/zzUY03gt_pA\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/fD1QJB2pHik\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/Re6y5CQzwG4\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/rVooJFdbePs\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/XOurCfeJLGc\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/EPs57es2MsE\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:01:42\tOluwafemi Oyedele:\tHi, Good Evening !!!\n00:05:52\tArthur Shaw:\t@federica, love the hex stickers behind you. All from rstudio::conf?\n00:07:12\tArthur Shaw:\tI tried doing the same. I had a hard time ordering them. I also thought I'd make the stickers into magnets so that I could rearrange them in future.\n00:48:34\tOluwafemi Oyedele:\tThank you !!!\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/WDehjcuc7xs\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n\n<summary>Meeting chat log</summary>\n```\n00:18:21\tcollinberke: Jenny Bryan debugging: https://www.youtube.com/watch?v=vgYS-F8opgE\n00:31:10\tcollinberke: https://purrr.tidyverse.org/reference/slowly.html\n00:47:43\tRobert Hilly: By guys!\n```\n</details>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: Function operators\n---\n\n## Learning objectives:\n\n- Define function operator\n- Explore some existing function operators\n- Make our own function operator\n\n\n\n## Introduction \n\n<!--- ```{r 10-01, fig.align='center',fig.dim=\"50%\", fig.cap=\"Credits: 2001-2003 Michael P.Frank (https://slideplayer.com/slide/17666226/)\",echo=FALSE}\n\nknitr::include_graphics(\"images/11-maths_example.png\")\n``` --->\n\n- A **function operator** is a function that takes one (or more) functions as input and returns a function as output.\n\n- Function operators are a special case of **function factories**, since they return functions.\n\n- They are often used to wrap an existing function to provide additional capability, similar to python's **decorators**. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nchatty <- function(f) {\n force(f)\n \n function(x, ...) {\n res <- f(x, ...)\n cat(\"Processing \", x, \"\\n\", sep = \"\")\n res\n }\n}\n\nf <- function(x) x ^ 2\ns <- c(3, 2, 1)\n\npurrr::map_dbl(s, chatty(f))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Processing 3\n#> Processing 2\n#> Processing 1\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 9 4 1\n```\n\n\n:::\n:::\n\n\n## Existing function operators \n\nTwo function operator examples are `purrr:safely()` and `memoise::memoise()`. These can be found in `purr` and `memoise`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(purrr)\nlibrary(memoise)\n```\n:::\n\n\n## purrr::safely {-}\n\nCapturing Errors: turns errors into data!\n\n \n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(\n c(0.512, 0.165, 0.717),\n c(0.064, 0.781, 0.427),\n c(0.890, 0.785, 0.495),\n \"oops\"\n)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmap_dbl(x, sum)\n#> Error in .Primitive(\"sum\")(..., na.rm = na.rm): invalid 'type' (character) of\n#> argument\n```\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# note use of map (not map_dbl), safely returns a lisst\n\nout <- map(x, safely(sum))\nstr(transpose(out))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 2\n#> $ result:List of 4\n#> ..$ : num 1.39\n#> ..$ : num 1.27\n#> ..$ : num 2.17\n#> ..$ : NULL\n#> $ error :List of 4\n#> ..$ : NULL\n#> ..$ : NULL\n#> ..$ : NULL\n#> ..$ :List of 2\n#> .. ..$ message: chr \"invalid 'type' (character) of argument\"\n#> .. ..$ call : language .Primitive(\"sum\")(..., na.rm = na.rm)\n#> .. ..- attr(*, \"class\")= chr [1:3] \"simpleError\" \"error\" \"condition\"\n```\n\n\n:::\n:::\n\n \n\n\n## Other `purrr` function operators {-}\n\n<!---\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n{fig-align='center' width=350}\n:::\n:::\n\n--->\n\n> purrr comes with three other function operators in a similar vein:\n\n \n possibly(): returns a default value when there’s an error. It provides no way to tell if an error occured or not, so it’s best reserved for cases when there’s some obvious sentinel value (like NA).\n\n quietly(): turns output, messages, and warning side-effects into output, message, and warning components of the output.\n\n auto_browser(): automatically executes browser() inside the function when there’s an error.\n\n\n## memoise::memoise {-}\n\nCaching computations: avoid repeated computations!\n \n\n\n::: {.cell}\n\n```{.r .cell-code}\nslow_function <- function(x) {\n Sys.sleep(1)\n x * 10 * runif(1)\n}\nsystem.time(print(slow_function(1)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3.054764\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.02 0.00 1.02\n```\n\n\n:::\n\n```{.r .cell-code}\nsystem.time(print(slow_function(1)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3.644634\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.00 0.00 1.03\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfast_function <- memoise::memoise(slow_function)\nsystem.time(print(fast_function(1)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 7.451108\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.00 0.00 1.01\n```\n\n\n:::\n\n```{.r .cell-code}\nsystem.time(print(fast_function(1)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 7.451108\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.00 0.00 0.01\n```\n\n\n:::\n:::\n\n\n> Be careful about memoising impure functions! \n\n## Exercise {-}\n\nHow does `safely()` work? \nThe source code looks like this:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsafely\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (.f, otherwise = NULL, quiet = TRUE) \n#> {\n#> .f <- as_mapper(.f)\n#> force(otherwise)\n#> check_bool(quiet)\n#> function(...) capture_error(.f(...), otherwise, quiet)\n#> }\n#> <bytecode: 0x00000183867aacf0>\n#> <environment: namespace:purrr>\n```\n\n\n:::\n:::\n\n\nThe real work is done in `capture_error` which is defined in the package **namespace**. We can access it with the `:::` operator. (Could also grab it from the function's environment.)\n\n\n::: {.cell}\n\n```{.r .cell-code}\npurrr:::capture_error\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (code, otherwise = NULL, quiet = TRUE) \n#> {\n#> tryCatch(list(result = code, error = NULL), error = function(e) {\n#> if (!quiet) {\n#> message(\"Error: \", conditionMessage(e))\n#> }\n#> list(result = otherwise, error = e)\n#> })\n#> }\n#> <bytecode: 0x0000018386817b60>\n#> <environment: namespace:purrr>\n```\n\n\n:::\n:::\n\n\n## Case study: make your own function operator\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nurls <- c(\n \"adv-r\" = \"https://adv-r.hadley.nz\", \n \"r4ds\" = \"http://r4ds.had.co.nz/\"\n # and many many more\n)\npath <- paste(tempdir(), names(urls), \".html\")\n\nwalk2(urls, path, download.file, quiet = TRUE)\n```\n:::\n\n\n\nHere we make a function operator that add a little delay in reading each page:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndelay_by <- function(f, amount) {\n force(f)\n force(amount)\n \n function(...) {\n Sys.sleep(amount)\n f(...)\n }\n}\nsystem.time(runif(100))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0 0 0\n```\n\n\n:::\n\n```{.r .cell-code}\nsystem.time(delay_by(runif, 0.1)(100))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.00 0.00 0.11\n```\n\n\n:::\n:::\n\n\n\nAnd another to add a dot after nth invocation:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndot_every <- function(f, n) {\n force(f)\n force(n)\n \n i <- 0\n function(...) {\n i <<- i + 1\n if (i %% n == 0) cat(\".\")\n f(...)\n }\n}\n\nwalk(1:100, dot_every(runif, 10))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> ..........\n```\n\n\n:::\n:::\n\n\nCan now use both of these function operators to express our desired result:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwalk2(\n urls, path, \n download.file %>% dot_every(10) %>% delay_by(0.1), \n quiet = TRUE\n)\n```\n:::\n\n\n## Exercise {-}\n\n2) Should you memoise file.download? Why or why not?\n", + "supporting": [ + "11_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/12/execute-results/html.json b/_freeze/slides/12/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "9a73a284af57bc1c1ac271995bcd1308", + "hash": "a94a54d500276d02c4752c2293baa2b5", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Base types\n---\n\n## Learning objectives:\n\n- Understand what OOP means--at the very least for R\n- Know how to discern an object's nature--base or OO--and type\n\n\n\n<details>\n<summary>Session Info</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"DiagrammeR\")\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nutils::sessionInfo()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> R version 4.5.1 (2025-06-13 ucrt)\n#> Platform: x86_64-w64-mingw32/x64\n#> Running under: Windows 11 x64 (build 26100)\n#> \n#> Matrix products: default\n#> LAPACK version 3.12.1\n#> \n#> locale:\n#> [1] LC_COLLATE=English_United States.utf8 \n#> [2] LC_CTYPE=English_United States.utf8 \n#> [3] LC_MONETARY=English_United States.utf8\n#> [4] LC_NUMERIC=C \n#> [5] LC_TIME=English_United States.utf8 \n#> \n#> time zone: America/Chicago\n#> tzcode source: internal\n#> \n#> attached base packages:\n#> [1] stats graphics grDevices utils datasets methods base \n#> \n#> other attached packages:\n#> [1] DiagrammeR_1.0.11\n#> \n#> loaded via a namespace (and not attached):\n#> [1] digest_0.6.37 RColorBrewer_1.1-3 R6_2.6.1 fastmap_1.2.0 \n#> [5] xfun_0.52 magrittr_2.0.3 glue_1.8.0 knitr_1.50 \n#> [9] htmltools_0.5.8.1 rmarkdown_2.29 cli_3.6.5 visNetwork_2.1.2 \n#> [13] compiler_4.5.1 tools_4.5.1 evaluate_1.0.4 yaml_2.3.10 \n#> [17] rlang_1.1.6 jsonlite_2.0.0 htmlwidgets_1.6.4 keyring_1.4.1\n```\n\n\n:::\n:::\n\n\n</details>\n\n\n## Why OOP is hard in R\n\n- Multiple OOP systems exist: S3, R6, S4, and (now/soon) S7.\n- Multiple preferences: some users prefer one system; others, another.\n- R's OOP systems are different enough that prior OOP experience may not transfer well.\n\n[](https://xkcd.com/927/)\n\n\n## OOP: Big Ideas\n\n1. **Polymorphism.** Function has a single interface (outside), but contains (inside) several class-specific implementations.\n\n::: {.cell}\n\n```{.r .cell-code}\n# imagine a function with object x as an argument\n# from the outside, users interact with the same function\n# but inside the function, there are provisions to deal with objects of different classes\nsome_function <- function(x) {\n if is.numeric(x) {\n # implementation for numeric x\n } else if is.character(x) {\n # implementation for character x\n } ...\n}\n```\n:::\n\n\n<details>\n<summary>Example of polymorphism</summary>\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# data frame\nsummary(mtcars[,1:4])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp \n#> Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0 \n#> 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5 \n#> Median :19.20 Median :6.000 Median :196.3 Median :123.0 \n#> Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7 \n#> 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0 \n#> Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0\n```\n\n\n:::\n\n```{.r .cell-code}\n# statistical model\nlin_fit <- lm(mpg ~ hp, data = mtcars)\nsummary(lin_fit)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> \n#> Call:\n#> lm(formula = mpg ~ hp, data = mtcars)\n#> \n#> Residuals:\n#> Min 1Q Median 3Q Max \n#> -5.7121 -2.1122 -0.8854 1.5819 8.2360 \n#> \n#> Coefficients:\n#> Estimate Std. Error t value Pr(>|t|) \n#> (Intercept) 30.09886 1.63392 18.421 < 2e-16 ***\n#> hp -0.06823 0.01012 -6.742 1.79e-07 ***\n#> ---\n#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n#> \n#> Residual standard error: 3.863 on 30 degrees of freedom\n#> Multiple R-squared: 0.6024,\tAdjusted R-squared: 0.5892 \n#> F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07\n```\n\n\n:::\n:::\n\n\n</details>\n\n2. **Encapsulation.** Function \"encapsulates\"--that is, encloses in an inviolate capsule--both data and how it acts on data. Think of a REST API: a client interacts with with an API only through a set of discrete endpoints (i.e., things to get or set), but the server does not otherwise give access to its internal workings or state. Like with an API, this creates a separation of concerns: OOP functions take inputs and yield results; users only consume those results.\n\n## OOP: Properties\n\n### Objects have class\n\n- Class defines:\n - Method (i.e., what can be done with object)\n - Fields (i.e., data that defines an instance of the class)\n- Objects are an instance of a class\n\n### Class is inherited\n\n- Class is defined:\n - By an object's class (e.g., ordered factor)\n - By the parent of the object's class (e.g., factor)\n- Inheritance matters for method dispatch\n - If a method is defined for an object's class, use that method\n - If an object doesn't have a method, use the method of the parent class\n - The process of finding a method, is called dispatch\n\n## OOP in R: Two Paradigms\n\n**1. Encapsulated OOP**\n\n- Objects \"encapsulate\"\n - Methods (i.e., what can be done)\n - Fields (i.e., data on which things are done)\n- Calls communicate this encapsulation, since form follows function\n - Form: `object.method(arg1, arg2)`\n - Function: for `object`, apply `method` for `object`'s class with arguments `arg1` and `arg2`\n\n**2. Functional OOP**\n\n- Methods belong to \"generic\" functions\n- From the outside, look like regular functions: `generic(object, arg2, arg3)`\n- From the inside, components are also functions\n\n### Concept Map\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n```{=html}\n<div class=\"DiagrammeR html-widget html-fill-item\" id=\"htmlwidget-d01e44d26b063c9aa5cc\" style=\"width:100%;height:464px;\"></div>\n<script type=\"application/json\" data-for=\"htmlwidget-d01e44d26b063c9aa5cc\">{\"x\":{\"diagram\":\"\\ngraph LR\\n\\nOOP --> encapsulated_OOP\\nOOP --> functional_OOP\\n\\nfunctional_OOP --> S3\\nfunctional_OOP --> S4\\n\\nencapsulated_OOP --> R6\\nencapsulated_OOP --> RC\\n\"},\"evals\":[],\"jsHooks\":[]}</script>\n```\n\n:::\n:::\n\n\n<details>\n<summary>Mermaid code</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nDiagrammeR::mermaid(\"\ngraph LR\n\nOOP --> encapsulated_OOP\nOOP --> functional_OOP\n\nfunctional_OOP --> S3\nfunctional_OOP --> S4\n\nencapsulated_OOP --> R6\nencapsulated_OOP --> RC\n\")\n```\n:::\n\n</details>\n\n## OOP in base R\n\n- **S3**\n - Paradigm: functional OOP\n - Noteworthy: R's first OOP system\n - Use case: low-cost solution for common problems\n - Downsides: no guarantees\n- **S4**\n - Paradigm: functional OOP\n - Noteworthy: rewrite of S3, used by `Bioconductor`\n - Use case: \"more guarantees and greater encapsulation\" than S3\n - Downsides: higher setup cost than S3\n- **RC**\n - Paradigm: encapsulated OOP\n - Noteworthy: special type of S4 object is mutable--in other words, that can be modified in place (instead of R's usual copy-on-modify behavior)\n - Use cases: problems that are hard to tackle with functional OOP (in S3 and S4)\n - Downsides: harder to reason about (because of modify-in-place logic)\n\n## OOP in packages\n\n- **R6**\n - Paradigm: encapsulated OOP\n - Noteworthy: resolves issues with RC\n- **R7**\n - Paradigm: functional OOP\n - Noteworthy: \n - best parts of S3 and S4\n - ease of S3\n - power of S4\n - See more in [rstudio::conf(2022) talk](https://www.rstudio.com/conference/2022/talks/introduction-to-r7/)\n- **R.oo**\n - Paradigm: hybrid functional and encapsulated (?)\n- **proto**\n - Paradigm: prototype OOP\n - Noteworthy: OOP style used in `ggplot2`\n\n## How can you tell if an object is base or OOP?\n\n### Functions\n\nTwo functions:\n\n- `base::is.object()`, which yields TRUE/FALSE about whether is OOP object\n- `sloop::otype()`, which says what type of object type: `\"base\"`, `\"S3\"`, etc.\n\nAn few examples:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Example 1: a base object\nis.object(1:10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n\n```{.r .cell-code}\nsloop::otype(1:10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"base\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# Example 2: an OO object\nis.object(mtcars)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nsloop::otype(mtcars)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"S3\"\n```\n\n\n:::\n:::\n\n\n### sloop\n\n* **S** **L**anguage **O**bject-**O**riented **P**rogramming\n\n[](https://en.wikipedia.org/wiki/Sloop_John_B)\n\n### Class\n\nOO objects have a \"class\" attribute:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# base object has no class\nattr(1:10, \"class\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\n# OO object has one or more classes\nattr(mtcars, \"class\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"data.frame\"\n```\n\n\n:::\n:::\n\n\n## What about types?\n\nOnly OO objects have a \"class\" attribute, but every object--whether base or OO--has class\n\n### Vectors\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(NULL)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"NULL\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(c(\"a\", \"b\", \"c\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"character\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(1L)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(1i)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"complex\"\n```\n\n\n:::\n:::\n\n\n\n### Functions\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# \"normal\" function\nmy_fun <- function(x) { x + 1 }\ntypeof(my_fun)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"closure\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# internal function\ntypeof(`[`)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"special\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# primitive function\ntypeof(sum) \n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"builtin\"\n```\n\n\n:::\n:::\n\n\n### Environments\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(globalenv())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"environment\"\n```\n\n\n:::\n:::\n\n\n\n### S4\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmle_obj <- stats4::mle(function(x = 1) (x - 2) ^ 2)\ntypeof(mle_obj)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"S4\"\n```\n\n\n:::\n:::\n\n\n\n### Language components\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(quote(a))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"symbol\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(quote(a + 1))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"language\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(formals(my_fun))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"pairlist\"\n```\n\n\n:::\n:::\n\n\n### Concept Map\n\n\n\n<details>\n<summary>Sankey graph code</summary>\n\nThe graph above was made with [SankeyMATIC](https://sankeymatic.com/)\n\n```\n// toggle \"Show Values\"\n// set Default Flow Colors from \"each flow's Source\"\n\nbase\\ntypes [8] vectors\nbase\\ntypes [3] functions\nbase\\ntypes [1] environments\nbase\\ntypes [1] S4 OOP\nbase\\ntypes [3] language\\ncomponents\nbase\\ntypes [6] C components\n\nvectors [1] NULL\nvectors [1] logical\nvectors [1] integer\nvectors [1] double\nvectors [1] complex\nvectors [1] character\nvectors [1] list\nvectors [1] raw\n\nfunctions [1] closure\nfunctions [1] special\nfunctions [1] builtin\n\nenvironments [1] environment\n\nS4 OOP [1] S4\n\nlanguage\\ncomponents [1] symbol\nlanguage\\ncomponents [1] language\nlanguage\\ncomponents [1] pairlist\n\nC components [1] externalptr\nC components [1] weakref\nC components [1] bytecode\nC components [1] promise\nC components [1] ...\nC components [1] any\n```\n\n</details>\n\n## Be careful about the numeric type\n\n1. Often \"numeric\" is treated as synonymous for double:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# create a double and integeger objects\none <- 1\noneL <- 1L\ntypeof(one)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(oneL)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# check their type after as.numeric()\none |> as.numeric() |> typeof()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n\n```{.r .cell-code}\noneL |> as.numeric() |> typeof()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n:::\n\n\n2. In S3 and S4, \"numeric\" is taken as either integer or double, when choosing methods:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsloop::s3_class(1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\" \"numeric\"\n```\n\n\n:::\n\n```{.r .cell-code}\nsloop::s3_class(1L)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\" \"numeric\"\n```\n\n\n:::\n:::\n\n\n3. `is.numeric()` tests whether an object behaves like a number\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(factor(\"x\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nis.numeric(factor(\"x\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n:::\n\n\nBut Advanced R consistently uses numeric to mean integer or double type.\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/Fy3JF5Em6qY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/9GkgNC15EAw\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/IL6iJhAsZAY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/4la5adcWwKE\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/NeHtEGab1Og\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/rfidR7tI_nQ\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:35:02\tTrevin:\tsloop (“sail the seas of OOP”)\n00:42:40\tRyan Metcalf:\tAwesome input Trevin! I jumped to the vignette, but didn't see the reference directly.\n01:00:01\tTrevin:\tIf you're interested there may be a new “R Packages\" cohort starting up soon (also a new version of the book coming out soonish as well?)\n01:08:23\tOluwafemi Oyedele:\tThank you !!!\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/mOpmvc9h_4M\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n\n<summary>Meeting chat log</summary>\n```\n00:35:43\tStone: base::InternalMethods\n00:48:04\tcollinberke: https://cran.r-project.org/doc/manuals/R-exts.html\n```\n</details>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: Base types\n---\n\n## Learning objectives:\n\n- Understand what OOP means--at the very least for R\n- Know how to discern an object's nature--base or OO--and type\n\n\n\n<details>\n<summary>Session Info</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"DiagrammeR\")\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nutils::sessionInfo()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> R version 4.5.1 (2025-06-13 ucrt)\n#> Platform: x86_64-w64-mingw32/x64\n#> Running under: Windows 11 x64 (build 26100)\n#> \n#> Matrix products: default\n#> LAPACK version 3.12.1\n#> \n#> locale:\n#> [1] LC_COLLATE=English_United States.utf8 \n#> [2] LC_CTYPE=English_United States.utf8 \n#> [3] LC_MONETARY=English_United States.utf8\n#> [4] LC_NUMERIC=C \n#> [5] LC_TIME=English_United States.utf8 \n#> \n#> time zone: America/Chicago\n#> tzcode source: internal\n#> \n#> attached base packages:\n#> [1] stats graphics grDevices utils datasets methods base \n#> \n#> other attached packages:\n#> [1] DiagrammeR_1.0.11\n#> \n#> loaded via a namespace (and not attached):\n#> [1] digest_0.6.37 RColorBrewer_1.1-3 R6_2.6.1 fastmap_1.2.0 \n#> [5] xfun_0.52 magrittr_2.0.3 glue_1.8.0 knitr_1.50 \n#> [9] htmltools_0.5.8.1 rmarkdown_2.29 cli_3.6.5 visNetwork_2.1.2 \n#> [13] compiler_4.5.1 tools_4.5.1 evaluate_1.0.4 yaml_2.3.10 \n#> [17] rlang_1.1.6 jsonlite_2.0.0 htmlwidgets_1.6.4 keyring_1.4.1\n```\n\n\n:::\n:::\n\n\n</details>\n\n\n## Why OOP is hard in R\n\n- Multiple OOP systems exist: S3, R6, S4, and (now/soon) S7.\n- Multiple preferences: some users prefer one system; others, another.\n- R's OOP systems are different enough that prior OOP experience may not transfer well.\n\n[](https://xkcd.com/927/)\n\n\n## OOP: Big Ideas\n\n1. **Polymorphism.** Function has a single interface (outside), but contains (inside) several class-specific implementations.\n\n::: {.cell}\n\n```{.r .cell-code}\n# imagine a function with object x as an argument\n# from the outside, users interact with the same function\n# but inside the function, there are provisions to deal with objects of different classes\nsome_function <- function(x) {\n if is.numeric(x) {\n # implementation for numeric x\n } else if is.character(x) {\n # implementation for character x\n } ...\n}\n```\n:::\n\n\n<details>\n<summary>Example of polymorphism</summary>\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# data frame\nsummary(mtcars[,1:4])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mpg cyl disp hp \n#> Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0 \n#> 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5 \n#> Median :19.20 Median :6.000 Median :196.3 Median :123.0 \n#> Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7 \n#> 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0 \n#> Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0\n```\n\n\n:::\n\n```{.r .cell-code}\n# statistical model\nlin_fit <- lm(mpg ~ hp, data = mtcars)\nsummary(lin_fit)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> \n#> Call:\n#> lm(formula = mpg ~ hp, data = mtcars)\n#> \n#> Residuals:\n#> Min 1Q Median 3Q Max \n#> -5.7121 -2.1122 -0.8854 1.5819 8.2360 \n#> \n#> Coefficients:\n#> Estimate Std. Error t value Pr(>|t|) \n#> (Intercept) 30.09886 1.63392 18.421 < 2e-16 ***\n#> hp -0.06823 0.01012 -6.742 1.79e-07 ***\n#> ---\n#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n#> \n#> Residual standard error: 3.863 on 30 degrees of freedom\n#> Multiple R-squared: 0.6024,\tAdjusted R-squared: 0.5892 \n#> F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07\n```\n\n\n:::\n:::\n\n\n</details>\n\n2. **Encapsulation.** Function \"encapsulates\"--that is, encloses in an inviolate capsule--both data and how it acts on data. Think of a REST API: a client interacts with with an API only through a set of discrete endpoints (i.e., things to get or set), but the server does not otherwise give access to its internal workings or state. Like with an API, this creates a separation of concerns: OOP functions take inputs and yield results; users only consume those results.\n\n## OOP: Properties\n\n### Objects have class\n\n- Class defines:\n - Method (i.e., what can be done with object)\n - Fields (i.e., data that defines an instance of the class)\n- Objects are an instance of a class\n\n### Class is inherited\n\n- Class is defined:\n - By an object's class (e.g., ordered factor)\n - By the parent of the object's class (e.g., factor)\n- Inheritance matters for method dispatch\n - If a method is defined for an object's class, use that method\n - If an object doesn't have a method, use the method of the parent class\n - The process of finding a method, is called dispatch\n\n## OOP in R: Two Paradigms\n\n**1. Encapsulated OOP**\n\n- Objects \"encapsulate\"\n - Methods (i.e., what can be done)\n - Fields (i.e., data on which things are done)\n- Calls communicate this encapsulation, since form follows function\n - Form: `object.method(arg1, arg2)`\n - Function: for `object`, apply `method` for `object`'s class with arguments `arg1` and `arg2`\n\n**2. Functional OOP**\n\n- Methods belong to \"generic\" functions\n- From the outside, look like regular functions: `generic(object, arg2, arg3)`\n- From the inside, components are also functions\n\n### Concept Map\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n```{=html}\n<div class=\"DiagrammeR html-widget html-fill-item\" id=\"htmlwidget-f19ffbf1ff0c903a1236\" style=\"width:100%;height:464px;\"></div>\n<script type=\"application/json\" data-for=\"htmlwidget-f19ffbf1ff0c903a1236\">{\"x\":{\"diagram\":\"\\ngraph LR\\n\\nOOP --> encapsulated_OOP\\nOOP --> functional_OOP\\n\\nfunctional_OOP --> S3\\nfunctional_OOP --> S4\\n\\nencapsulated_OOP --> R6\\nencapsulated_OOP --> RC\\n\"},\"evals\":[],\"jsHooks\":[]}</script>\n```\n\n:::\n:::\n\n\n<details>\n<summary>Mermaid code</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nDiagrammeR::mermaid(\"\ngraph LR\n\nOOP --> encapsulated_OOP\nOOP --> functional_OOP\n\nfunctional_OOP --> S3\nfunctional_OOP --> S4\n\nencapsulated_OOP --> R6\nencapsulated_OOP --> RC\n\")\n```\n:::\n\n</details>\n\n## OOP in base R\n\n- **S3**\n - Paradigm: functional OOP\n - Noteworthy: R's first OOP system\n - Use case: low-cost solution for common problems\n - Downsides: no guarantees\n- **S4**\n - Paradigm: functional OOP\n - Noteworthy: rewrite of S3, used by `Bioconductor`\n - Use case: \"more guarantees and greater encapsulation\" than S3\n - Downsides: higher setup cost than S3\n- **RC**\n - Paradigm: encapsulated OOP\n - Noteworthy: special type of S4 object is mutable--in other words, that can be modified in place (instead of R's usual copy-on-modify behavior)\n - Use cases: problems that are hard to tackle with functional OOP (in S3 and S4)\n - Downsides: harder to reason about (because of modify-in-place logic)\n\n## OOP in packages\n\n- **R6**\n - Paradigm: encapsulated OOP\n - Noteworthy: resolves issues with RC\n- **R7**\n - Paradigm: functional OOP\n - Noteworthy: \n - best parts of S3 and S4\n - ease of S3\n - power of S4\n - See more in [rstudio::conf(2022) talk](https://www.rstudio.com/conference/2022/talks/introduction-to-r7/)\n- **R.oo**\n - Paradigm: hybrid functional and encapsulated (?)\n- **proto**\n - Paradigm: prototype OOP\n - Noteworthy: OOP style used in `ggplot2`\n\n## How can you tell if an object is base or OOP?\n\n### Functions\n\nTwo functions:\n\n- `base::is.object()`, which yields TRUE/FALSE about whether is OOP object\n- `sloop::otype()`, which says what type of object type: `\"base\"`, `\"S3\"`, etc.\n\nAn few examples:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Example 1: a base object\nis.object(1:10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n\n```{.r .cell-code}\nsloop::otype(1:10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"base\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# Example 2: an OO object\nis.object(mtcars)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nsloop::otype(mtcars)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"S3\"\n```\n\n\n:::\n:::\n\n\n### sloop\n\n* **S** **L**anguage **O**bject-**O**riented **P**rogramming\n\n[](https://en.wikipedia.org/wiki/Sloop_John_B)\n\n### Class\n\nOO objects have a \"class\" attribute:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# base object has no class\nattr(1:10, \"class\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\n# OO object has one or more classes\nattr(mtcars, \"class\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"data.frame\"\n```\n\n\n:::\n:::\n\n\n## What about types?\n\nOnly OO objects have a \"class\" attribute, but every object--whether base or OO--has class\n\n### Vectors\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(NULL)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"NULL\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(c(\"a\", \"b\", \"c\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"character\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(1L)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(1i)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"complex\"\n```\n\n\n:::\n:::\n\n\n\n### Functions\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# \"normal\" function\nmy_fun <- function(x) { x + 1 }\ntypeof(my_fun)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"closure\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# internal function\ntypeof(`[`)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"special\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# primitive function\ntypeof(sum) \n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"builtin\"\n```\n\n\n:::\n:::\n\n\n### Environments\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(globalenv())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"environment\"\n```\n\n\n:::\n:::\n\n\n\n### S4\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmle_obj <- stats4::mle(function(x = 1) (x - 2) ^ 2)\ntypeof(mle_obj)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"S4\"\n```\n\n\n:::\n:::\n\n\n\n### Language components\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(quote(a))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"symbol\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(quote(a + 1))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"language\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(formals(my_fun))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"pairlist\"\n```\n\n\n:::\n:::\n\n\n### Concept Map\n\n\n\n<details>\n<summary>Sankey graph code</summary>\n\nThe graph above was made with [SankeyMATIC](https://sankeymatic.com/)\n\n```\n// toggle \"Show Values\"\n// set Default Flow Colors from \"each flow's Source\"\n\nbase\\ntypes [8] vectors\nbase\\ntypes [3] functions\nbase\\ntypes [1] environments\nbase\\ntypes [1] S4 OOP\nbase\\ntypes [3] language\\ncomponents\nbase\\ntypes [6] C components\n\nvectors [1] NULL\nvectors [1] logical\nvectors [1] integer\nvectors [1] double\nvectors [1] complex\nvectors [1] character\nvectors [1] list\nvectors [1] raw\n\nfunctions [1] closure\nfunctions [1] special\nfunctions [1] builtin\n\nenvironments [1] environment\n\nS4 OOP [1] S4\n\nlanguage\\ncomponents [1] symbol\nlanguage\\ncomponents [1] language\nlanguage\\ncomponents [1] pairlist\n\nC components [1] externalptr\nC components [1] weakref\nC components [1] bytecode\nC components [1] promise\nC components [1] ...\nC components [1] any\n```\n\n</details>\n\n## Be careful about the numeric type\n\n1. Often \"numeric\" is treated as synonymous for double:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# create a double and integeger objects\none <- 1\noneL <- 1L\ntypeof(one)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(oneL)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# check their type after as.numeric()\none |> as.numeric() |> typeof()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n\n```{.r .cell-code}\noneL |> as.numeric() |> typeof()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n:::\n\n\n2. In S3 and S4, \"numeric\" is taken as either integer or double, when choosing methods:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsloop::s3_class(1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\" \"numeric\"\n```\n\n\n:::\n\n```{.r .cell-code}\nsloop::s3_class(1L)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\" \"numeric\"\n```\n\n\n:::\n:::\n\n\n3. `is.numeric()` tests whether an object behaves like a number\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(factor(\"x\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nis.numeric(factor(\"x\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n:::\n\n\nBut Advanced R consistently uses numeric to mean integer or double type.\n", + "supporting": [ + "12_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/13/execute-results/html.json b/_freeze/slides/13/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "0fdb13f9b281420c36a022c290d2416c", + "hash": "9b125f32ffe5d76475c3174fbd6c7c25", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: S3\n---\n\n# Introduction\n\n## Basics\n\n- Has class\n- Uses a generic function to decide on method\n - method = implementation for a specific class\n - dispatch = process of searching for right method\n\n## Classes\n\n**Theory:**\n\nWhat is class?\n\n - No formal definition in S3\n - Simply set class attribute\n\nHow to set class?\n\n - At time of object creation\n - After object creation\n \n\n::: {.cell}\n\n```{.r .cell-code}\n# at time of object creation\nx <- structure(list(), class = \"my_class\")\n\n# after object creation\nx <- list()\nclass(x) <- \"my_class\"\n```\n:::\n\n\nSome advice on style:\n\n - Rules: Can be any string\n - Advice: Consider using/including package name to avoid collision with name of another class (e.g., `blob`, which defines a single class; haven has `labelled` and `haven_labelled`)\n - Convention: letters and `_`; avoid `.` since it might be confused as separator between generic and class name\n\n**Practice:**\n\nHow to compose a class in practice?\n\n- **Constructor**, which helps the developer create new object of target class. Provide always.\n- **Validator**, which checks that values in constructor are valid. May not be necessary for simple classes.\n- **Helper**, which helps users create new objects of target class. May be relevant only for user-facing classes.\n\n### Constructors\n\nHelp developers construct an object of the target class:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnew_difftime <- function(x = double(), units = \"secs\") {\n # check inputs\n # issue generic system error if unexpected type or value\n stopifnot(is.double(x))\n units <- match.arg(units, c(\"secs\", \"mins\", \"hours\", \"days\", \"weeks\"))\n\n # construct instance of target class\n structure(x,\n class = \"difftime\",\n units = units\n )\n}\n```\n:::\n\n\n### Validators\n\nContrast a constructor, aimed at quickly creating instances of a class, which only checks type of inputs ...\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnew_factor <- function(x = integer(), levels = character()) {\n stopifnot(is.integer(x))\n stopifnot(is.character(levels))\n\n structure(\n x,\n levels = levels,\n class = \"factor\"\n )\n}\n\n# error messages are for system default and developer-facing\nnew_factor(1:5, \"a\")\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in as.character.factor(x): malformed factor\n```\n\n\n:::\n:::\n\n\n\n... with a validator, aimed at emitting errors if inputs pose problems, which makes more expensive checks\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvalidate_factor <- function(x) {\n values <- unclass(x)\n levels <- attr(x, \"levels\")\n\n if (!all(!is.na(values) & values > 0)) {\n stop(\n \"All `x` values must be non-missing and greater than zero\",\n call. = FALSE\n )\n }\n\n if (length(levels) < max(values)) {\n stop(\n \"There must be at least as many `levels` as possible values in `x`\",\n call. = FALSE\n )\n }\n\n x\n}\n\n# error messages are informative and user-facing\nvalidate_factor(new_factor(1:5, \"a\"))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: There must be at least as many `levels` as possible values in `x`\n```\n\n\n:::\n:::\n\n\nMaybe there is a typo in the `validate_factor()` function? Do the integers need to start at 1 and be consecutive? \n\n* If not, then `length(levels) < max(values)` should be `length(levels) < length(values)`, right?\n* If so, why do the integers need to start at 1 and be consecutive? And if they need to be as such, we should tell the user, right?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvalidate_factor(new_factor(1:3, levels = c(\"a\", \"b\", \"c\")))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] a b c\n#> Levels: a b c\n```\n\n\n:::\n\n```{.r .cell-code}\nvalidate_factor(new_factor(10:12, levels = c(\"a\", \"b\", \"c\")))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: There must be at least as many `levels` as possible values in `x`\n```\n\n\n:::\n:::\n\n\n\n### Helpers\n\nSome desired virtues:\n\n- Have the same name as the class\n- Call the constructor and validator, if the latter exists.\n- Issue error informative, user-facing error messages\n- Adopt thoughtful/useful defaults or type conversion\n\n\nExercise 5 in 13.3.4\n\nQ: Read the documentation for `utils::as.roman()`. How would you write a constructor for this class? Does it need a validator? What might a helper do?\n\nA: This function transforms numeric input into Roman numbers. It is built on the integer type, which results in the following constructor.\n \n \n\n::: {.cell}\n\n```{.r .cell-code}\nnew_roman <- function(x = integer()) {\n stopifnot(is.integer(x))\n structure(x, class = \"roman\")\n}\n```\n:::\n\n\nThe documentation tells us, that only values between 1 and 3899 are uniquely represented, which we then include in our validation function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvalidate_roman <- function(x) {\n values <- unclass(x)\n \n if (any(values < 1 | values > 3899)) {\n stop(\n \"Roman numbers must fall between 1 and 3899.\",\n call. = FALSE\n )\n }\n x\n}\n```\n:::\n\n\nFor convenience, we allow the user to also pass real values to a helper function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nroman <- function(x = integer()) {\n x <- as.integer(x)\n \n validate_roman(new_roman(x))\n}\n\n# Test\nroman(c(1, 753, 2024))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] I DCCLIII MMXXIV\n```\n\n\n:::\n\n```{.r .cell-code}\nroman(0)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: Roman numbers must fall between 1 and 3899.\n```\n\n\n:::\n:::\n\n\n\n\n## Generics and methods\n\n**Generic functions:**\n\n- Consist of a call to `UseMethod()`\n- Pass arguments from the generic to the dispatched method \"auto-magically\"\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_new_generic <- function(x) {\n UseMethod(\"my_new_generic\")\n}\n```\n:::\n\n\n### Method dispatch\n\n- `UseMethod()` creates a vector of method names\n- Dispatch \n - Examines all methods in the vector\n - Selects a method\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- Sys.Date()\nsloop::s3_dispatch(print(x))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> => print.Date\n#> * print.default\n```\n\n\n:::\n:::\n\n\n### Finding methods\n\nWhile `sloop::s3_dispatch()` gives the specific method selected for a specific call, on can see the methods defined:\n\n- For a generic\n\n::: {.cell}\n\n```{.r .cell-code}\nsloop::s3_methods_generic(\"mean\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 7 × 4\n#> generic class visible source \n#> <chr> <chr> <lgl> <chr> \n#> 1 mean Date TRUE base \n#> 2 mean default TRUE base \n#> 3 mean difftime TRUE base \n#> 4 mean POSIXct TRUE base \n#> 5 mean POSIXlt TRUE base \n#> 6 mean quosure FALSE registered S3method\n#> 7 mean vctrs_vctr FALSE registered S3method\n```\n\n\n:::\n:::\n\n- For a class\n\n::: {.cell}\n\n```{.r .cell-code}\nsloop::s3_methods_class(\"ordered\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 4 × 4\n#> generic class visible source \n#> <chr> <chr> <lgl> <chr> \n#> 1 as.data.frame ordered TRUE base \n#> 2 Ops ordered TRUE base \n#> 3 relevel ordered FALSE registered S3method\n#> 4 Summary ordered TRUE base\n```\n\n\n:::\n:::\n\n\n### Creating methods\n\nTwo rules:\n\n- Only write a method if you own the generic. Otherwise, bad manners.\n- Method must have same arguments as its generic--with one important exception: `...`\n\n**Example from text:**\n\nI thought it would be good for us to work through this problem.\n\n> Carefully read the documentation for `UseMethod()` and explain why the following code returns the results that it does. What two usual rules of function evaluation does `UseMethod()` violate?\n\n\n::: {.cell}\n\n```{.r .cell-code}\ng <- function(x) {\n x <- 10\n y <- 10\n UseMethod(\"g\")\n}\ng.default <- function(x) c(x = x, y = y)\n\nx <- 1\ny <- 1\ng(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x y \n#> 1 1\n```\n\n\n:::\n\n```{.r .cell-code}\ng.default(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x y \n#> 1 1\n```\n\n\n:::\n:::\n\n\n\n\n**Examples caught in the wild:**\n\n- [`haven::zap_label`](https://github.com/tidyverse/haven/blob/main/R/zap_label.R), which removes column labels\n- [`dplyr::mutate`](https://github.com/tidyverse/dplyr/blob/main/R/mutate.R)\n- [`tidyr::pivot_longer`](https://github.com/tidyverse/tidyr/blob/main/R/pivot-long.R)\n\n## Object styles\n\n## Inheritance\n\nThree ideas:\n\n1. Class is a vector of classes\n\n::: {.cell}\n\n```{.r .cell-code}\nclass(ordered(\"x\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"ordered\" \"factor\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(Sys.time())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"POSIXct\" \"POSIXt\"\n```\n\n\n:::\n:::\n\n2. Dispatch moves through class vector until it finds a defined method\n\n::: {.cell}\n\n```{.r .cell-code}\nsloop::s3_dispatch(print(ordered(\"x\")))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> print.ordered\n#> => print.factor\n#> * print.default\n```\n\n\n:::\n:::\n\n3. Method can delegate to another method via `NextMethod()`, which is indicated by `->` as below:\n\n::: {.cell}\n\n```{.r .cell-code}\nsloop::s3_dispatch(ordered(\"x\")[1])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [.ordered\n#> => [.factor\n#> [.default\n#> -> [ (internal)\n```\n\n\n:::\n:::\n\n\n### `NextMethod()`\n\nConsider `secret` class that masks each character of the input with `x` in output\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnew_secret <- function(x = double()) {\n stopifnot(is.double(x))\n structure(x, class = \"secret\")\n}\n\nprint.secret <- function(x, ...) {\n print(strrep(\"x\", nchar(x)))\n invisible(x)\n}\n\ny <- new_secret(c(15, 1, 456))\ny\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"xx\" \"x\" \"xxx\"\n```\n\n\n:::\n:::\n\n\nNotice that the `[` method is problematic in that it does not preserve the `secret` class. Additionally, it returns `15` as the first element instead of `xx`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsloop::s3_dispatch(y[1])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [.secret\n#> [.default\n#> => [ (internal)\n```\n\n\n:::\n\n```{.r .cell-code}\ny[1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 15\n```\n\n\n:::\n:::\n\n\nFix this with a `[.secret` method:\n\nThe first fix (not run) is inefficient because it creates a copy of `y`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# not run\n`[.secret` <- function(x, i) {\n x <- unclass(x)\n new_secret(x[i])\n}\n```\n:::\n\n\n`NextMethod()` is more efficient.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n`[.secret` <- function(x, i) {\n # first, dispatch to `[`\n # then, coerce subset value to `secret` class\n new_secret(NextMethod())\n}\n```\n:::\n\n\nNotice that `[.secret` is selected for dispatch, but that the method delegates to the internal `[`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsloop::s3_dispatch(y[1])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> => [.secret\n#> [.default\n#> -> [ (internal)\n```\n\n\n:::\n\n```{.r .cell-code}\ny[1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"xx\"\n```\n\n\n:::\n:::\n\n\n\n### Allowing subclassing\n\nContinue the example above to have a `supersecret` subclass that hides even the number of characters in the input (e.g., `123` -> `xxxxx`, 12345678 -> `xxxxx`, 1 -> `xxxxx`).\n\nTo allow for this subclass, the constructor function needs to include two additional arguments:\n\n- `...` for passing an arbitrary set of arguments to different subclasses\n- `class` for defining the subclass\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnew_secret <- function(x, ..., class = character()) {\n stopifnot(is.double(x))\n\n structure(\n x,\n ...,\n class = c(class, \"secret\")\n )\n}\n```\n:::\n\n\nTo create the subclass, simply invoke the parent class constructor inside of the subclass constructor:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnew_supersecret <- function(x) {\n new_secret(x, class = \"supersecret\")\n}\n\nprint.supersecret <- function(x, ...) {\n print(rep(\"xxxxx\", length(x)))\n invisible(x)\n}\n```\n:::\n\n\nBut this means the subclass inherits all parent methods and needs to overwrite all parent methods with subclass methods that return the sublclass rather than the parent class.\n\nThere's no easy solution to this problem in base R.\n\nThere is a solution in the vectors package: `vctrs::vec_restore()`\n\n<!-- TODO: read docs/vignettes to be able to summarize how this works -->\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/Fy3JF5Em6qY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/9GkgNC15EAw\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/q7lFXSLdC1g\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<iframe src=\"https://www.youtube.com/embed/2rHS_urTGFg\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/4la5adcWwKE\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<iframe src=\"https://www.youtube.com/embed/eTCT2O58GYM\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/NeHtEGab1Og\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/vzbl2o-MEeQ\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:05:30\tOluwafemi Oyedele:\tHi everyone, Good Evening !!!\n00:09:44\tTrevin:\tI agree Arthur, need to look at that package some more\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/zNLx4q8TCKQ\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: S3\n---\n\n# Introduction\n\n## Basics\n\n- Has class\n- Uses a generic function to decide on method\n - method = implementation for a specific class\n - dispatch = process of searching for right method\n\n## Classes\n\n**Theory:**\n\nWhat is class?\n\n - No formal definition in S3\n - Simply set class attribute\n\nHow to set class?\n\n - At time of object creation\n - After object creation\n \n\n::: {.cell}\n\n```{.r .cell-code}\n# at time of object creation\nx <- structure(list(), class = \"my_class\")\n\n# after object creation\nx <- list()\nclass(x) <- \"my_class\"\n```\n:::\n\n\nSome advice on style:\n\n - Rules: Can be any string\n - Advice: Consider using/including package name to avoid collision with name of another class (e.g., `blob`, which defines a single class; haven has `labelled` and `haven_labelled`)\n - Convention: letters and `_`; avoid `.` since it might be confused as separator between generic and class name\n\n**Practice:**\n\nHow to compose a class in practice?\n\n- **Constructor**, which helps the developer create new object of target class. Provide always.\n- **Validator**, which checks that values in constructor are valid. May not be necessary for simple classes.\n- **Helper**, which helps users create new objects of target class. May be relevant only for user-facing classes.\n\n### Constructors\n\nHelp developers construct an object of the target class:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnew_difftime <- function(x = double(), units = \"secs\") {\n # check inputs\n # issue generic system error if unexpected type or value\n stopifnot(is.double(x))\n units <- match.arg(units, c(\"secs\", \"mins\", \"hours\", \"days\", \"weeks\"))\n\n # construct instance of target class\n structure(x,\n class = \"difftime\",\n units = units\n )\n}\n```\n:::\n\n\n### Validators\n\nContrast a constructor, aimed at quickly creating instances of a class, which only checks type of inputs ...\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnew_factor <- function(x = integer(), levels = character()) {\n stopifnot(is.integer(x))\n stopifnot(is.character(levels))\n\n structure(\n x,\n levels = levels,\n class = \"factor\"\n )\n}\n\n# error messages are for system default and developer-facing\nnew_factor(1:5, \"a\")\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in as.character.factor(x): malformed factor\n```\n\n\n:::\n:::\n\n\n\n... with a validator, aimed at emitting errors if inputs pose problems, which makes more expensive checks\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvalidate_factor <- function(x) {\n values <- unclass(x)\n levels <- attr(x, \"levels\")\n\n if (!all(!is.na(values) & values > 0)) {\n stop(\n \"All `x` values must be non-missing and greater than zero\",\n call. = FALSE\n )\n }\n\n if (length(levels) < max(values)) {\n stop(\n \"There must be at least as many `levels` as possible values in `x`\",\n call. = FALSE\n )\n }\n\n x\n}\n\n# error messages are informative and user-facing\nvalidate_factor(new_factor(1:5, \"a\"))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: There must be at least as many `levels` as possible values in `x`\n```\n\n\n:::\n:::\n\n\nMaybe there is a typo in the `validate_factor()` function? Do the integers need to start at 1 and be consecutive? \n\n* If not, then `length(levels) < max(values)` should be `length(levels) < length(values)`, right?\n* If so, why do the integers need to start at 1 and be consecutive? And if they need to be as such, we should tell the user, right?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvalidate_factor(new_factor(1:3, levels = c(\"a\", \"b\", \"c\")))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] a b c\n#> Levels: a b c\n```\n\n\n:::\n\n```{.r .cell-code}\nvalidate_factor(new_factor(10:12, levels = c(\"a\", \"b\", \"c\")))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: There must be at least as many `levels` as possible values in `x`\n```\n\n\n:::\n:::\n\n\n\n### Helpers\n\nSome desired virtues:\n\n- Have the same name as the class\n- Call the constructor and validator, if the latter exists.\n- Issue error informative, user-facing error messages\n- Adopt thoughtful/useful defaults or type conversion\n\n\nExercise 5 in 13.3.4\n\nQ: Read the documentation for `utils::as.roman()`. How would you write a constructor for this class? Does it need a validator? What might a helper do?\n\nA: This function transforms numeric input into Roman numbers. It is built on the integer type, which results in the following constructor.\n \n \n\n::: {.cell}\n\n```{.r .cell-code}\nnew_roman <- function(x = integer()) {\n stopifnot(is.integer(x))\n structure(x, class = \"roman\")\n}\n```\n:::\n\n\nThe documentation tells us, that only values between 1 and 3899 are uniquely represented, which we then include in our validation function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvalidate_roman <- function(x) {\n values <- unclass(x)\n \n if (any(values < 1 | values > 3899)) {\n stop(\n \"Roman numbers must fall between 1 and 3899.\",\n call. = FALSE\n )\n }\n x\n}\n```\n:::\n\n\nFor convenience, we allow the user to also pass real values to a helper function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nroman <- function(x = integer()) {\n x <- as.integer(x)\n \n validate_roman(new_roman(x))\n}\n\n# Test\nroman(c(1, 753, 2024))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] I DCCLIII MMXXIV\n```\n\n\n:::\n\n```{.r .cell-code}\nroman(0)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: Roman numbers must fall between 1 and 3899.\n```\n\n\n:::\n:::\n\n\n\n\n## Generics and methods\n\n**Generic functions:**\n\n- Consist of a call to `UseMethod()`\n- Pass arguments from the generic to the dispatched method \"auto-magically\"\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_new_generic <- function(x) {\n UseMethod(\"my_new_generic\")\n}\n```\n:::\n\n\n### Method dispatch\n\n- `UseMethod()` creates a vector of method names\n- Dispatch \n - Examines all methods in the vector\n - Selects a method\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- Sys.Date()\nsloop::s3_dispatch(print(x))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> => print.Date\n#> * print.default\n```\n\n\n:::\n:::\n\n\n### Finding methods\n\nWhile `sloop::s3_dispatch()` gives the specific method selected for a specific call, on can see the methods defined:\n\n- For a generic\n\n::: {.cell}\n\n```{.r .cell-code}\nsloop::s3_methods_generic(\"mean\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 7 × 4\n#> generic class visible source \n#> <chr> <chr> <lgl> <chr> \n#> 1 mean Date TRUE base \n#> 2 mean default TRUE base \n#> 3 mean difftime TRUE base \n#> 4 mean POSIXct TRUE base \n#> 5 mean POSIXlt TRUE base \n#> 6 mean quosure FALSE registered S3method\n#> 7 mean vctrs_vctr FALSE registered S3method\n```\n\n\n:::\n:::\n\n- For a class\n\n::: {.cell}\n\n```{.r .cell-code}\nsloop::s3_methods_class(\"ordered\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 4 × 4\n#> generic class visible source \n#> <chr> <chr> <lgl> <chr> \n#> 1 as.data.frame ordered TRUE base \n#> 2 Ops ordered TRUE base \n#> 3 relevel ordered FALSE registered S3method\n#> 4 Summary ordered TRUE base\n```\n\n\n:::\n:::\n\n\n### Creating methods\n\nTwo rules:\n\n- Only write a method if you own the generic. Otherwise, bad manners.\n- Method must have same arguments as its generic--with one important exception: `...`\n\n**Example from text:**\n\nI thought it would be good for us to work through this problem.\n\n> Carefully read the documentation for `UseMethod()` and explain why the following code returns the results that it does. What two usual rules of function evaluation does `UseMethod()` violate?\n\n\n::: {.cell}\n\n```{.r .cell-code}\ng <- function(x) {\n x <- 10\n y <- 10\n UseMethod(\"g\")\n}\ng.default <- function(x) c(x = x, y = y)\n\nx <- 1\ny <- 1\ng(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x y \n#> 1 1\n```\n\n\n:::\n\n```{.r .cell-code}\ng.default(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x y \n#> 1 1\n```\n\n\n:::\n:::\n\n\n\n\n**Examples caught in the wild:**\n\n- [`haven::zap_label`](https://github.com/tidyverse/haven/blob/main/R/zap_label.R), which removes column labels\n- [`dplyr::mutate`](https://github.com/tidyverse/dplyr/blob/main/R/mutate.R)\n- [`tidyr::pivot_longer`](https://github.com/tidyverse/tidyr/blob/main/R/pivot-long.R)\n\n## Object styles\n\n## Inheritance\n\nThree ideas:\n\n1. Class is a vector of classes\n\n::: {.cell}\n\n```{.r .cell-code}\nclass(ordered(\"x\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"ordered\" \"factor\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(Sys.time())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"POSIXct\" \"POSIXt\"\n```\n\n\n:::\n:::\n\n2. Dispatch moves through class vector until it finds a defined method\n\n::: {.cell}\n\n```{.r .cell-code}\nsloop::s3_dispatch(print(ordered(\"x\")))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> print.ordered\n#> => print.factor\n#> * print.default\n```\n\n\n:::\n:::\n\n3. Method can delegate to another method via `NextMethod()`, which is indicated by `->` as below:\n\n::: {.cell}\n\n```{.r .cell-code}\nsloop::s3_dispatch(ordered(\"x\")[1])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [.ordered\n#> => [.factor\n#> [.default\n#> -> [ (internal)\n```\n\n\n:::\n:::\n\n\n### `NextMethod()`\n\nConsider `secret` class that masks each character of the input with `x` in output\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnew_secret <- function(x = double()) {\n stopifnot(is.double(x))\n structure(x, class = \"secret\")\n}\n\nprint.secret <- function(x, ...) {\n print(strrep(\"x\", nchar(x)))\n invisible(x)\n}\n\ny <- new_secret(c(15, 1, 456))\ny\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"xx\" \"x\" \"xxx\"\n```\n\n\n:::\n:::\n\n\nNotice that the `[` method is problematic in that it does not preserve the `secret` class. Additionally, it returns `15` as the first element instead of `xx`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsloop::s3_dispatch(y[1])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [.secret\n#> [.default\n#> => [ (internal)\n```\n\n\n:::\n\n```{.r .cell-code}\ny[1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 15\n```\n\n\n:::\n:::\n\n\nFix this with a `[.secret` method:\n\nThe first fix (not run) is inefficient because it creates a copy of `y`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# not run\n`[.secret` <- function(x, i) {\n x <- unclass(x)\n new_secret(x[i])\n}\n```\n:::\n\n\n`NextMethod()` is more efficient.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n`[.secret` <- function(x, i) {\n # first, dispatch to `[`\n # then, coerce subset value to `secret` class\n new_secret(NextMethod())\n}\n```\n:::\n\n\nNotice that `[.secret` is selected for dispatch, but that the method delegates to the internal `[`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsloop::s3_dispatch(y[1])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> => [.secret\n#> [.default\n#> -> [ (internal)\n```\n\n\n:::\n\n```{.r .cell-code}\ny[1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"xx\"\n```\n\n\n:::\n:::\n\n\n\n### Allowing subclassing\n\nContinue the example above to have a `supersecret` subclass that hides even the number of characters in the input (e.g., `123` -> `xxxxx`, 12345678 -> `xxxxx`, 1 -> `xxxxx`).\n\nTo allow for this subclass, the constructor function needs to include two additional arguments:\n\n- `...` for passing an arbitrary set of arguments to different subclasses\n- `class` for defining the subclass\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnew_secret <- function(x, ..., class = character()) {\n stopifnot(is.double(x))\n\n structure(\n x,\n ...,\n class = c(class, \"secret\")\n )\n}\n```\n:::\n\n\nTo create the subclass, simply invoke the parent class constructor inside of the subclass constructor:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnew_supersecret <- function(x) {\n new_secret(x, class = \"supersecret\")\n}\n\nprint.supersecret <- function(x, ...) {\n print(rep(\"xxxxx\", length(x)))\n invisible(x)\n}\n```\n:::\n\n\nBut this means the subclass inherits all parent methods and needs to overwrite all parent methods with subclass methods that return the sublclass rather than the parent class.\n\nThere's no easy solution to this problem in base R.\n\nThere is a solution in the vectors package: `vctrs::vec_restore()`\n\n<!-- TODO: read docs/vignettes to be able to summarize how this works -->\n", + "supporting": [ + "13_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/14/execute-results/html.json b/_freeze/slides/14/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "f9c4d7a7e938351bf82fa64599b981e9", + "hash": "f015c42b2d4738df55458227dca78b47", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: R6\n---\n\n## Learning objectives:\n\n\n\n- Discuss how to construct a R6 class.\n- Overview the different mechanisms of a R6 class (e.g. initialization, print, public, private, and active fields and methods).\n- Observe various examples using R6's mechanisms to create R6 classes, objects, fields, and methods.\n- Observe the consequences of R6's reference semantics.\n- Review the book's arguments on the use of R6 over reference classes.\n\n## A review of OOP\n\n\n\n* **A PIE**\n\n## Introducing R6 \n\n\n\n* R6 classes are not built into base.\n * It is a separate [package](https://r6.r-lib.org/).\n * You have to install and attach to use.\n * If R6 objects are used in a package, it needs to be specified as a dependency in the `DESCRIPTION` file.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ninstall.packages(\"R6\")\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(R6)\n```\n:::\n\n\n* R6 classes have two special properties:\n 1. Uses an encapsulated OOP paradigm.\n * Methods belong to objects, not generics.\n * They follow the form `object$method()` for calling fields and methods.\n 2. R6 objects are mutable.\n * Modified in place.\n * They follow reference semantics.\n* R6 is similar to OOP in other languages.\n* However, its use can lead ton non-idiomatic R code.\n * Tradeoffs - follows an OOP paradigm but sacrafice what users are use to. \n * [Microsoft365R](https://github.com/Azure/Microsoft365R).\n\n## Constructing an R6 class, the basics\n\n* Really simple to do, just use the `R6::R6Class()` function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nAccumulator <- R6Class(\"Accumulator\", list(\n sum = 0,\n add = function(x = 1) {\n self$sum <- self$sum + x\n invisible(self)\n }\n))\n```\n:::\n\n\n* Two important arguments:\n 1. `classname` - A string used to name the class (not needed but suggested)\n 2. `public` - A list of methods (functions) and fields (anything else)\n* Suggested style conventions to follow:\n * Class name should follow `UpperCamelCase`.\n * Methods and fields should use `snake_case`.\n * Always assign the result of a `R6Class()` into a variable with the same name as the class.\n* You can use `self$` to access methods and fields of the current object.\n\n## Constructing an R6 object\n\n* Just use `$new()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- Accumulator$new()\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx$add(4)\nx$sum\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4\n```\n\n\n:::\n:::\n\n\n## R6 objects and method chaining\n\n* All side-effect R6 methods should return `self` invisibly.\n* This allows for method chaining.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx$add(10)$add(10)$sum\n# [1] 24\n```\n:::\n\n\n* To improve readability:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Method chaining\nx$\n add(10)$\n add(10)$\n sum\n# [1] 44\n```\n:::\n\n\n## R6 useful methods\n\n* `$print()` - Modifies the default printing method.\n * `$print()` should always return `invisible(self)`.\n* `$initialize()` - Overides the default behaviour of `$new()`.\n * Also provides a space to validate inputs.\n\n## Constructing a bank account class\n\n\n::: {.cell}\n\n```{.r .cell-code}\nBankAccount <- R6Class(\"BankAccount\", list(\n owner = NULL,\n type = NULL,\n balance = 0,\n initialize = function(owner, type) {\n stopifnot(is.character(owner), length(owner) == 1)\n stopifnot(is.character(type), length(type) == 1)\n },\n deposit = function(amount) {\n self$balance <- self$balance + amount\n invisible(self)\n },\n withdraw = function(amount) {\n self$balance <- self$balance - amount\n invisible(self)\n }\n))\n```\n:::\n\n\n## Simple transactions\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncollinsavings <- BankAccount$new(\"Collin\", type = \"Savings\")\ncollinsavings$deposit(10)\ncollinsavings\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <BankAccount>\n#> Public:\n#> balance: 10\n#> clone: function (deep = FALSE) \n#> deposit: function (amount) \n#> initialize: function (owner, type) \n#> owner: NULL\n#> type: NULL\n#> withdraw: function (amount)\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncollinsavings$withdraw(10)\ncollinsavings\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <BankAccount>\n#> Public:\n#> balance: 0\n#> clone: function (deep = FALSE) \n#> deposit: function (amount) \n#> initialize: function (owner, type) \n#> owner: NULL\n#> type: NULL\n#> withdraw: function (amount)\n```\n\n\n:::\n:::\n\n\n## Modifying the `$print()` method \n\n\n::: {.cell}\n\n```{.r .cell-code}\nBankAccount <- R6Class(\"BankAccount\", list(\n owner = NULL,\n type = NULL,\n balance = 0,\n initialize = function(owner, type) {\n stopifnot(is.character(owner), length(owner) == 1)\n stopifnot(is.character(type), length(type) == 1)\n\n self$owner <- owner\n self$type <- type\n },\n deposit = function(amount) {\n self$balance <- self$balance + amount\n invisible(self)\n },\n withdraw = function(amount) {\n self$balance <- self$balance - amount\n invisible(self)\n },\n print = function(...) {\n cat(\"Account owner: \", self$owner, \"\\n\", sep = \"\")\n cat(\"Account type: \", self$type, \"\\n\", sep = \"\")\n cat(\" Balance: \", self$balance, \"\\n\", sep = \"\")\n invisible(self)\n }\n))\n```\n:::\n\n\n* Important point: Methods are bound to individual objects.\n * Reference semantics vs. copy-on-modify.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncollinsavings\n\nhadleychecking <- BankAccount$new(\"Hadley\", type = \"Checking\")\n\nhadleychecking\n```\n:::\n\n\n## How does this work? \n\n* [Winston Chang's 2017 useR talk](https://www.youtube.com/watch?v=3GEFd8rZQgY&list=WL&index=11)\n\n* [R6 objects are just environments with a particular structure.](https://youtu.be/3GEFd8rZQgY?t=759)\n \n\n\n## Adding methods after class creation\n\n* Use `$set()` to add methods after creation.\n* Keep in mind methods added with `$set()` are only available with new objects.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nAccumulator <- R6Class(\"Accumulator\")\nAccumlator$set(\"public\", \"sum\", 0)\nAccumulator$set(\"public\", \"add\", function(x = 1) {\n self$sum <- self$sum + x\n invisible(self)\n})\n```\n:::\n\n\n## Inheritance\n\n* To inherit behaviour from an existing class, provide the class object via the `inherit` argument.\n* This example also provides a good example on how to [debug]() an R6 class.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nBankAccountOverDraft <- R6Class(\"BankAccountOverDraft\",\n inherit = BankAccount,\n public = list(\n withdraw = function(amount) {\n if ((self$balance - amount) < 0) {\n stop(\"Overdraft\")\n }\n # self$balance() <- self$withdraw()\n self$balance <- self$balance - amount\n invisible(self)\n }\n )\n)\n```\n:::\n\n\n### Future instances debugging\n\n\n::: {.cell}\n\n```{.r .cell-code}\nBankAccountOverDraft$debug(\"withdraw\")\nx <- BankAccountOverDraft$new(\"x\", type = \"Savings\")\nx$withdraw(20)\n\n# Turn debugging off\nBankAccountOverDraft$undebug(\"withdraw\")\n```\n:::\n\n\n### Individual object debugging\n\n* Use the `debug()` function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- BankAccountOverDraft$new(\"x\", type = \"Savings\")\n# Turn on debugging\ndebug(x$withdraw)\nx$withdraw(10)\n\n# Turn off debugging\nundebug(x$withdraw)\nx$withdraw(5)\n```\n:::\n\n\n### Test out our debugged class\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncollinsavings <- BankAccountOverDraft$new(\"Collin\", type = \"Savings\")\ncollinsavings\ncollinsavings$withdraw(10)\ncollinsavings\ncollinsavings$deposit(5)\ncollinsavings\ncollinsavings$withdraw(5)\n```\n:::\n\n\n## Introspection\n\n* Every R6 object has an S3 class that reflects its hierarchy of R6 classes.\n* Use the `class()` function to determine class (and all classes it inherits from).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nclass(collinsavings)\n```\n:::\n\n\n* You can also list all methods and fields of an R6 object with `names()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames(collinsavings)\n```\n:::\n\n\n## Controlling access\n\n* R6 provides two other arguments:\n * `private` - create fields and methods only available from within the class.\n * `active` - allows you to use accessor functions to define dynamic or active fields.\n\n## Privacy\n\n* Private fields and methods - elements that can only be accessed from within the class, not from the outside.\n* We need to know two things to use private elements:\n 1. `private`'s interface is just like `public`'s interface.\n * List of methods (functions) and fields (everything else).\n 2. You use `private$` instead of `self$`\n * You cannot access private fields or methods outside of the class.\n* Why might you want to keep your methods and fields private?\n * You'll want to be clear what is ok for others to access, especially if you have a complex system of classes.\n * It's easier to refactor private fields and methods, as you know others are not relying on it.\n\n## Active fields\n\n* Active fields allow you to define components that look like fields from the outside, but are defined with functions, like methods.\n* Implemented using active bindings.\n* Each active binding is a function that takes a single argument `value`.\n* Great when used in conjunction with private fields.\n * This allows for additional checks.\n * For example, we can use them to make a read-only field and to validate inputs.\n\n## Adding a read-only bank account number\n\n\n::: {.cell}\n\n```{.r .cell-code}\nBankAccount <- R6Class(\"BankAccount\", public = list(\n owner = NULL,\n type = NULL,\n balance = 0,\n initialize = function(owner, type, acct_num = NULL) {\n private$acct_num <- acct_num\n self$owner <- owner\n self$type <- type\n },\n deposit = function(amount) {\n self$balance <- self$balance + amount\n invisible(self)\n },\n withdraw = function(amount) {\n self$balance <- self$balance - amount\n invisible(self)\n },\n print = function(...) {\n cat(\"Account owner: \", self$owner, \"\\n\", sep = \"\")\n cat(\"Account type: \", self$type, \"\\n\", sep = \"\")\n cat(\"Account #: \", private$acct_num, \"\\n\", sep = \"\")\n cat(\" Balance: \", self$balance, \"\\n\", sep = \"\")\n invisible(self)\n }\n ),\n private = list(\n acct_num = NULL\n ),\n active = list(\n create_acct_num = function(value) {\n if (is.null(private$acct_num)) {\n private$acct_num <- ids::uuid()\n } else {\n stop(\"`$acct_num` already assigned\")\n }\n }\n )\n)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncollinsavings <- BankAccount$new(\"Collin\", type = \"Savings\")\ncollinsavings$create_acct_num\n# Stops because account number is assigned\ncollinsavings$create_acct_num()\ncollinsavings$print()\n```\n:::\n\n\n## How does an active field work?\n\n* Not sold on this, as I don't know if `active` gets its own environment. \n * Any ideas?\n\n\n\n## Reference semantics\n\n* Big difference to note about R6 objects in relation to other objects:\n * R6 objects have reference semantics.\n* The primary consequence of reference semantics is that objects are not copied when modified.\n* If you want to copy an R6 object, you need to use `$clone`.\n* There are some other less obvious consequences:\n * It's harder to reason about code that uses R6 objects, as you need more context.\n * Think about when an R6 object is deleted, you can use `$finalize()` to clean up after yourself.\n * If one of the fields is an R6 object, you must create it inside `$initialize()`, not `R6Class()`\n\n## R6 makes it harder to reason about code\n\n* Reference semantics makes code harder to reason about.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(a = 1)\ny <- list(b = 2)\n\n# Here we know the final line only modifies z\nz <- f(x, y)\n\n# vs.\n\nx <- List$new(a = 1)\ny <- List$new(b = 2)\n\n# If x or y is a method, we don't know if it modifies\n# something other than z. Is this a limitation of\n# abstraction?\nz <- f(x, y)\n```\n:::\n\n\n* I understand the basics, but not necessarily the tradeoffs.\n * Anyone care to fill me in?\n * Is this a limitation of abstraction?\n\n## Better sense of what's going on by looking at a finalizer\n\n* Since R6 objects are not copied-on-modified, so they are only deleted once.\n* We can use this characteristic to complement our `$initialize()` with a `$finalize()` method.\n * i.e., to clean up after we delete an R6 object.\n * This could be a way to close a database connection.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nTemporaryFile <- R6Class(\"TemporaryFile\", list(\n path = NULL,\n initialize = function() {\n self$path <- tempfile()\n },\n finalize = function() {\n message(\"Cleaning up \", self$path)\n unlink(self$path)\n }\n))\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntf <- TemporaryFile$new()\n# The finalizer will clean up, once the R6 object is deleted.\nrm(tf)\n```\n:::\n\n\n## Consequences of R6 fields\n\n* If you use an R6 class as the default value of a field, it will be shared across all instances of the object.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nTemporaryDatabase <- R6Class(\"TemporaryDatabase\", list(\n con = NULL,\n file = TemporaryFile$new(),\n initialize = function() {\n self$con <- DBI::dbConnect(RSQLite::SQLite(), path = file$path)\n },\n finalize = function() {\n DBI::dbDisconnect(self$con)\n }\n))\n\ndb_a <- TemporaryDatabase$new()\ndb_b <- TemporaryDatabase$new()\n\ndb_a$file$path == db_b$file$path\n#> [1] TRUE\n```\n:::\n\n\n* To fix this, we need to move the class method call to `$intialize()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nTemporaryDatabase <- R6Class(\"TemporaryDatabase\", list(\n con = NULL,\n file = NULL,\n initialize = function() {\n self$file <- TemporaryFile$new()\n self$con <- DBI::dbConnect(RSQLite::SQLite(), path = file$path)\n },\n finalize = function() {\n DBI::dbDisconnect(self$con)\n }\n))\n\ndb_a <- TemporaryDatabase$new()\ndb_b <- TemporaryDatabase$new()\n\ndb_a$file$path == db_b$file$path\n#> [1] FALSE\n```\n:::\n\n\n## Why use R6?\n\n* Book mentions R6 is similar to the built-in reference classes.\n* Then why use R6?\n* R6 is simpler. \n * RC requires you to understand S4.\n* [Comprehensive documentation](https://r6.r-lib.org/articles/Introduction.html).\n* Simpler mechanisms for cross-package subclassing, which just works.\n* R6 separates public and private fields in separate environments, RC stacks everything in the same environment. \n* [R6 is faster](https://r6.r-lib.org/articles/Performance.html).\n* RC is tied to R, so any bug fixes need a newer version of R.\n * This is especially important if you're writing packages that need to work with multiple R versions.\n* R6 and RC are similar, so if you need RC, it will only require a small amount of additional effort to learn RC.\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/hPjaOdprgow\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/LVkDJ28XJUE\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/rCjQTbQ22qc\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<iframe src=\"https://www.youtube.com/embed/ii6xhOzT_HQ\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/i_z6pHavhX0\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/NXmlqK2LxWw\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/EuTubeJ1VUw\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:11:34\tTrevin:\thttps://engineering-shiny.org/common-app-caveats.html?q=R6#using-r6-as-data-storage\n00:39:36\tFederica Gazzelloni:\tnew R7: https://rconsortium.github.io/OOP-WG/\n00:40:04\tFederica Gazzelloni:\tR7 designed to be a successor to S3 and S4\n00:40:40\tFederica Gazzelloni:\tR6: https://r6.r-lib.org/articles/Introduction.html\n00:52:44\tTrevin:\thttps://advanced-r-solutions.rbind.io/r6.html#controlling-access\n01:00:34\tFederica Gazzelloni:\tinteresting: https://r-craft.org/r-news/object-oriented-programming-oop-in-r-with-r6-the-complete-guide/\n01:01:58\tTrevin:\thttps://hadley.shinyapps.io/cran-downloads/\n01:02:33\tOluwafemi Oyedele:\tThank you !!!\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/Q4FA0BB_PeY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n\n<summary>Meeting chat log</summary>\n```\n00:06:57\tRon Legere:\thttps://arxiv.org/abs/2303.12712\n00:07:07\tRon Legere:\t^^ shows some of the power and limitations\n00:39:41\tcollinberke:\thttps://www.youtube.com/watch?v=3GEFd8rZQgY&list=WL&index=11\n00:49:20\tiPhone:\tSorry fellas need to jump early. See you next week!\n01:05:21\tcollinberke:\thttps://github.com/r4ds/bookclub-advr\n01:09:30\tRon Legere:\tmakeActiveBinding\n```\n</details>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: R6\n---\n\n## Learning objectives:\n\n\n\n- Discuss how to construct a R6 class.\n- Overview the different mechanisms of a R6 class (e.g. initialization, print, public, private, and active fields and methods).\n- Observe various examples using R6's mechanisms to create R6 classes, objects, fields, and methods.\n- Observe the consequences of R6's reference semantics.\n- Review the book's arguments on the use of R6 over reference classes.\n\n## A review of OOP\n\n\n\n* **A PIE**\n\n## Introducing R6 \n\n\n\n* R6 classes are not built into base.\n * It is a separate [package](https://r6.r-lib.org/).\n * You have to install and attach to use.\n * If R6 objects are used in a package, it needs to be specified as a dependency in the `DESCRIPTION` file.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ninstall.packages(\"R6\")\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(R6)\n```\n:::\n\n\n* R6 classes have two special properties:\n 1. Uses an encapsulated OOP paradigm.\n * Methods belong to objects, not generics.\n * They follow the form `object$method()` for calling fields and methods.\n 2. R6 objects are mutable.\n * Modified in place.\n * They follow reference semantics.\n* R6 is similar to OOP in other languages.\n* However, its use can lead ton non-idiomatic R code.\n * Tradeoffs - follows an OOP paradigm but sacrafice what users are use to. \n * [Microsoft365R](https://github.com/Azure/Microsoft365R).\n\n## Constructing an R6 class, the basics\n\n* Really simple to do, just use the `R6::R6Class()` function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nAccumulator <- R6Class(\"Accumulator\", list(\n sum = 0,\n add = function(x = 1) {\n self$sum <- self$sum + x\n invisible(self)\n }\n))\n```\n:::\n\n\n* Two important arguments:\n 1. `classname` - A string used to name the class (not needed but suggested)\n 2. `public` - A list of methods (functions) and fields (anything else)\n* Suggested style conventions to follow:\n * Class name should follow `UpperCamelCase`.\n * Methods and fields should use `snake_case`.\n * Always assign the result of a `R6Class()` into a variable with the same name as the class.\n* You can use `self$` to access methods and fields of the current object.\n\n## Constructing an R6 object\n\n* Just use `$new()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- Accumulator$new()\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx$add(4)\nx$sum\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4\n```\n\n\n:::\n:::\n\n\n## R6 objects and method chaining\n\n* All side-effect R6 methods should return `self` invisibly.\n* This allows for method chaining.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx$add(10)$add(10)$sum\n# [1] 24\n```\n:::\n\n\n* To improve readability:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Method chaining\nx$\n add(10)$\n add(10)$\n sum\n# [1] 44\n```\n:::\n\n\n## R6 useful methods\n\n* `$print()` - Modifies the default printing method.\n * `$print()` should always return `invisible(self)`.\n* `$initialize()` - Overides the default behaviour of `$new()`.\n * Also provides a space to validate inputs.\n\n## Constructing a bank account class\n\n\n::: {.cell}\n\n```{.r .cell-code}\nBankAccount <- R6Class(\"BankAccount\", list(\n owner = NULL,\n type = NULL,\n balance = 0,\n initialize = function(owner, type) {\n stopifnot(is.character(owner), length(owner) == 1)\n stopifnot(is.character(type), length(type) == 1)\n },\n deposit = function(amount) {\n self$balance <- self$balance + amount\n invisible(self)\n },\n withdraw = function(amount) {\n self$balance <- self$balance - amount\n invisible(self)\n }\n))\n```\n:::\n\n\n## Simple transactions\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncollinsavings <- BankAccount$new(\"Collin\", type = \"Savings\")\ncollinsavings$deposit(10)\ncollinsavings\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <BankAccount>\n#> Public:\n#> balance: 10\n#> clone: function (deep = FALSE) \n#> deposit: function (amount) \n#> initialize: function (owner, type) \n#> owner: NULL\n#> type: NULL\n#> withdraw: function (amount)\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncollinsavings$withdraw(10)\ncollinsavings\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <BankAccount>\n#> Public:\n#> balance: 0\n#> clone: function (deep = FALSE) \n#> deposit: function (amount) \n#> initialize: function (owner, type) \n#> owner: NULL\n#> type: NULL\n#> withdraw: function (amount)\n```\n\n\n:::\n:::\n\n\n## Modifying the `$print()` method \n\n\n::: {.cell}\n\n```{.r .cell-code}\nBankAccount <- R6Class(\"BankAccount\", list(\n owner = NULL,\n type = NULL,\n balance = 0,\n initialize = function(owner, type) {\n stopifnot(is.character(owner), length(owner) == 1)\n stopifnot(is.character(type), length(type) == 1)\n\n self$owner <- owner\n self$type <- type\n },\n deposit = function(amount) {\n self$balance <- self$balance + amount\n invisible(self)\n },\n withdraw = function(amount) {\n self$balance <- self$balance - amount\n invisible(self)\n },\n print = function(...) {\n cat(\"Account owner: \", self$owner, \"\\n\", sep = \"\")\n cat(\"Account type: \", self$type, \"\\n\", sep = \"\")\n cat(\" Balance: \", self$balance, \"\\n\", sep = \"\")\n invisible(self)\n }\n))\n```\n:::\n\n\n* Important point: Methods are bound to individual objects.\n * Reference semantics vs. copy-on-modify.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncollinsavings\n\nhadleychecking <- BankAccount$new(\"Hadley\", type = \"Checking\")\n\nhadleychecking\n```\n:::\n\n\n## How does this work? \n\n* [Winston Chang's 2017 useR talk](https://www.youtube.com/watch?v=3GEFd8rZQgY&list=WL&index=11)\n\n* [R6 objects are just environments with a particular structure.](https://youtu.be/3GEFd8rZQgY?t=759)\n \n\n\n## Adding methods after class creation\n\n* Use `$set()` to add methods after creation.\n* Keep in mind methods added with `$set()` are only available with new objects.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nAccumulator <- R6Class(\"Accumulator\")\nAccumlator$set(\"public\", \"sum\", 0)\nAccumulator$set(\"public\", \"add\", function(x = 1) {\n self$sum <- self$sum + x\n invisible(self)\n})\n```\n:::\n\n\n## Inheritance\n\n* To inherit behaviour from an existing class, provide the class object via the `inherit` argument.\n* This example also provides a good example on how to [debug]() an R6 class.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nBankAccountOverDraft <- R6Class(\"BankAccountOverDraft\",\n inherit = BankAccount,\n public = list(\n withdraw = function(amount) {\n if ((self$balance - amount) < 0) {\n stop(\"Overdraft\")\n }\n # self$balance() <- self$withdraw()\n self$balance <- self$balance - amount\n invisible(self)\n }\n )\n)\n```\n:::\n\n\n### Future instances debugging\n\n\n::: {.cell}\n\n```{.r .cell-code}\nBankAccountOverDraft$debug(\"withdraw\")\nx <- BankAccountOverDraft$new(\"x\", type = \"Savings\")\nx$withdraw(20)\n\n# Turn debugging off\nBankAccountOverDraft$undebug(\"withdraw\")\n```\n:::\n\n\n### Individual object debugging\n\n* Use the `debug()` function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- BankAccountOverDraft$new(\"x\", type = \"Savings\")\n# Turn on debugging\ndebug(x$withdraw)\nx$withdraw(10)\n\n# Turn off debugging\nundebug(x$withdraw)\nx$withdraw(5)\n```\n:::\n\n\n### Test out our debugged class\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncollinsavings <- BankAccountOverDraft$new(\"Collin\", type = \"Savings\")\ncollinsavings\ncollinsavings$withdraw(10)\ncollinsavings\ncollinsavings$deposit(5)\ncollinsavings\ncollinsavings$withdraw(5)\n```\n:::\n\n\n## Introspection\n\n* Every R6 object has an S3 class that reflects its hierarchy of R6 classes.\n* Use the `class()` function to determine class (and all classes it inherits from).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nclass(collinsavings)\n```\n:::\n\n\n* You can also list all methods and fields of an R6 object with `names()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames(collinsavings)\n```\n:::\n\n\n## Controlling access\n\n* R6 provides two other arguments:\n * `private` - create fields and methods only available from within the class.\n * `active` - allows you to use accessor functions to define dynamic or active fields.\n\n## Privacy\n\n* Private fields and methods - elements that can only be accessed from within the class, not from the outside.\n* We need to know two things to use private elements:\n 1. `private`'s interface is just like `public`'s interface.\n * List of methods (functions) and fields (everything else).\n 2. You use `private$` instead of `self$`\n * You cannot access private fields or methods outside of the class.\n* Why might you want to keep your methods and fields private?\n * You'll want to be clear what is ok for others to access, especially if you have a complex system of classes.\n * It's easier to refactor private fields and methods, as you know others are not relying on it.\n\n## Active fields\n\n* Active fields allow you to define components that look like fields from the outside, but are defined with functions, like methods.\n* Implemented using active bindings.\n* Each active binding is a function that takes a single argument `value`.\n* Great when used in conjunction with private fields.\n * This allows for additional checks.\n * For example, we can use them to make a read-only field and to validate inputs.\n\n## Adding a read-only bank account number\n\n\n::: {.cell}\n\n```{.r .cell-code}\nBankAccount <- R6Class(\"BankAccount\", public = list(\n owner = NULL,\n type = NULL,\n balance = 0,\n initialize = function(owner, type, acct_num = NULL) {\n private$acct_num <- acct_num\n self$owner <- owner\n self$type <- type\n },\n deposit = function(amount) {\n self$balance <- self$balance + amount\n invisible(self)\n },\n withdraw = function(amount) {\n self$balance <- self$balance - amount\n invisible(self)\n },\n print = function(...) {\n cat(\"Account owner: \", self$owner, \"\\n\", sep = \"\")\n cat(\"Account type: \", self$type, \"\\n\", sep = \"\")\n cat(\"Account #: \", private$acct_num, \"\\n\", sep = \"\")\n cat(\" Balance: \", self$balance, \"\\n\", sep = \"\")\n invisible(self)\n }\n ),\n private = list(\n acct_num = NULL\n ),\n active = list(\n create_acct_num = function(value) {\n if (is.null(private$acct_num)) {\n private$acct_num <- ids::uuid()\n } else {\n stop(\"`$acct_num` already assigned\")\n }\n }\n )\n)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncollinsavings <- BankAccount$new(\"Collin\", type = \"Savings\")\ncollinsavings$create_acct_num\n# Stops because account number is assigned\ncollinsavings$create_acct_num()\ncollinsavings$print()\n```\n:::\n\n\n## How does an active field work?\n\n* Not sold on this, as I don't know if `active` gets its own environment. \n * Any ideas?\n\n\n\n## Reference semantics\n\n* Big difference to note about R6 objects in relation to other objects:\n * R6 objects have reference semantics.\n* The primary consequence of reference semantics is that objects are not copied when modified.\n* If you want to copy an R6 object, you need to use `$clone`.\n* There are some other less obvious consequences:\n * It's harder to reason about code that uses R6 objects, as you need more context.\n * Think about when an R6 object is deleted, you can use `$finalize()` to clean up after yourself.\n * If one of the fields is an R6 object, you must create it inside `$initialize()`, not `R6Class()`\n\n## R6 makes it harder to reason about code\n\n* Reference semantics makes code harder to reason about.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(a = 1)\ny <- list(b = 2)\n\n# Here we know the final line only modifies z\nz <- f(x, y)\n\n# vs.\n\nx <- List$new(a = 1)\ny <- List$new(b = 2)\n\n# If x or y is a method, we don't know if it modifies\n# something other than z. Is this a limitation of\n# abstraction?\nz <- f(x, y)\n```\n:::\n\n\n* I understand the basics, but not necessarily the tradeoffs.\n * Anyone care to fill me in?\n * Is this a limitation of abstraction?\n\n## Better sense of what's going on by looking at a finalizer\n\n* Since R6 objects are not copied-on-modified, so they are only deleted once.\n* We can use this characteristic to complement our `$initialize()` with a `$finalize()` method.\n * i.e., to clean up after we delete an R6 object.\n * This could be a way to close a database connection.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nTemporaryFile <- R6Class(\"TemporaryFile\", list(\n path = NULL,\n initialize = function() {\n self$path <- tempfile()\n },\n finalize = function() {\n message(\"Cleaning up \", self$path)\n unlink(self$path)\n }\n))\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntf <- TemporaryFile$new()\n# The finalizer will clean up, once the R6 object is deleted.\nrm(tf)\n```\n:::\n\n\n## Consequences of R6 fields\n\n* If you use an R6 class as the default value of a field, it will be shared across all instances of the object.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nTemporaryDatabase <- R6Class(\"TemporaryDatabase\", list(\n con = NULL,\n file = TemporaryFile$new(),\n initialize = function() {\n self$con <- DBI::dbConnect(RSQLite::SQLite(), path = file$path)\n },\n finalize = function() {\n DBI::dbDisconnect(self$con)\n }\n))\n\ndb_a <- TemporaryDatabase$new()\ndb_b <- TemporaryDatabase$new()\n\ndb_a$file$path == db_b$file$path\n#> [1] TRUE\n```\n:::\n\n\n* To fix this, we need to move the class method call to `$intialize()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nTemporaryDatabase <- R6Class(\"TemporaryDatabase\", list(\n con = NULL,\n file = NULL,\n initialize = function() {\n self$file <- TemporaryFile$new()\n self$con <- DBI::dbConnect(RSQLite::SQLite(), path = file$path)\n },\n finalize = function() {\n DBI::dbDisconnect(self$con)\n }\n))\n\ndb_a <- TemporaryDatabase$new()\ndb_b <- TemporaryDatabase$new()\n\ndb_a$file$path == db_b$file$path\n#> [1] FALSE\n```\n:::\n\n\n## Why use R6?\n\n* Book mentions R6 is similar to the built-in reference classes.\n* Then why use R6?\n* R6 is simpler. \n * RC requires you to understand S4.\n* [Comprehensive documentation](https://r6.r-lib.org/articles/Introduction.html).\n* Simpler mechanisms for cross-package subclassing, which just works.\n* R6 separates public and private fields in separate environments, RC stacks everything in the same environment. \n* [R6 is faster](https://r6.r-lib.org/articles/Performance.html).\n* RC is tied to R, so any bug fixes need a newer version of R.\n * This is especially important if you're writing packages that need to work with multiple R versions.\n* R6 and RC are similar, so if you need RC, it will only require a small amount of additional effort to learn RC.\n", + "supporting": [ + "14_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/15/execute-results/html.json b/_freeze/slides/15/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "43b26ede4f7107301b64a506a25f3778", + "hash": "21ef572cb9871c3d1d642b297469a824", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: S4\n---\n\n## Introduction\n\nObject consists of:\n\n- Slots. Like fields in R6.\n- Methods. Accessed through generics. Dispatched to particular methods.\n\nUses functions to define classes and their methods:\n\n- `setClass()`. Define class and its components. \n- `setGenerics()`. Define generic functions. Used to dispatch.\n- `setMethods()`. Define methods\n\n## Basics overview\n\n### Set class\n\nDefine the class:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetClass(\"Person\", \n slots = c(\n name = \"character\", \n age = \"numeric\"\n )\n)\n```\n:::\n\n\nCreate an instance of the class\n\n\n::: {.cell}\n\n```{.r .cell-code}\njohn <- new(\"Person\", name = \"John Smith\", age = NA_real_)\n```\n:::\n\n\n\n### Set generics\n\nDefine generic functions for setting and getting the age slot\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# get the value\nsetGeneric(\"age\", function(x) standardGeneric(\"age\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"age\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# set the value\nsetGeneric(\"age<-\", function(x, value) standardGeneric(\"age<-\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"age<-\"\n```\n\n\n:::\n:::\n\n\n### Set methods\n\nDefine methods for the generics:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# get the value\nsetMethod(\"age\", \"Person\", function(x) x@age)\n# set the value\nsetMethod(\"age<-\", \"Person\", function(x, value) {\n x@age <- value\n x\n})\n\n# set the value\nage(john) <- 50\n# get the value\nage(john)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 50\n```\n\n\n:::\n:::\n\n\nTo give a flavor, there is only one method per slot. In more realistic cases, there might be several methods.\n\n## Details on defining the class\n\n### Inheritance\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetClass(\"Employee\", \n contains = \"Person\", \n slots = c(\n boss = \"Person\"\n ),\n prototype = list(\n boss = new(\"Person\")\n )\n)\n```\n:::\n\n\n### Instantiation\n\nCreate an instance of the class at two levels:\n\n- For developer (you): `methods::new()`\n- For user: constructor function\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# how user constructs an instance\nPerson <- function(name, age = NA) {\n age <- as.double(age)\n \n # how the developer constructs an instance\n new(\"Person\", name = name, age = age)\n}\n\nPerson(\"Someone\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> An object of class \"Person\"\n#> Slot \"name\":\n#> [1] \"Someone\"\n#> \n#> Slot \"age\":\n#> [1] NA\n```\n\n\n:::\n:::\n\n\n### Validation\n\nS4 objects\n\n- Check class of slot at creation\n\n::: {.cell}\n\n```{.r .cell-code}\nPerson(mtcars)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in validObject(.Object): invalid class \"Person\" object: invalid object for slot \"name\" in class \"Person\": got class \"data.frame\", should be or extend class \"character\"\n```\n\n\n:::\n:::\n\n\n- Do **not** check other things\n\n::: {.cell}\n\n```{.r .cell-code}\nPerson(\"Hadley\", age = c(30, 37))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> An object of class \"Person\"\n#> Slot \"name\":\n#> [1] \"Hadley\"\n#> \n#> Slot \"age\":\n#> [1] 30 37\n```\n\n\n:::\n:::\n\n\nThat's where validation comes in--at two stages:\n\n1. At creation\n2. At modification\n\n#### At creation\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetValidity(\"Person\", function(object) {\n if (length(object@name) != length(object@age)) {\n \"@name and @age must be same length\"\n } else {\n TRUE\n }\n})\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Class \"Person\" [in \".GlobalEnv\"]\n#> \n#> Slots:\n#> \n#> Name: name age\n#> Class: character numeric\n```\n\n\n:::\n\n```{.r .cell-code}\nPerson(\"Hadley\", age = c(30, 37))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in validObject(.Object): invalid class \"Person\" object: @name and @age must be same length\n```\n\n\n:::\n:::\n\n\n#### At modification\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# get value\nsetGeneric(\"name\", function(x) standardGeneric(\"name\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"name\"\n```\n\n\n:::\n\n```{.r .cell-code}\nsetMethod(\"name\", \"Person\", function(x) x@name)\n\n# set value--and assess whether resulting object is valid\nsetGeneric(\"name<-\", function(x, value) standardGeneric(\"name<-\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"name<-\"\n```\n\n\n:::\n\n```{.r .cell-code}\nsetMethod(\"name<-\", \"Person\", function(x, value) {\n x@name <- value\n validObject(x)\n x\n})\n\n# normal name; no problem\nname(john) <- \"Jon Smythe\"\nname(john)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Jon Smythe\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# invalid name; error thrown\nname(john) <- letters\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in validObject(x): invalid class \"Person\" object: @name and @age must be same length\n```\n\n\n:::\n:::\n\n\n\n## Details on generics and methods\n\n### Dictate dispatch via signature\n\nSpecify function arguments to be used in determining method.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetGeneric(\"myGeneric\", \n function(x, ..., verbose = TRUE) standardGeneric(\"myGeneric\"),\n signature = \"x\"\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"myGeneric\"\n```\n\n\n:::\n:::\n\n\n### Define generics\n\nGeneral form:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetMethod(\"myGeneric\", \"Person\", function(x) {\n # method implementation\n})\n```\n:::\n\n\nExample to print object:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetMethod(\"show\", \"Person\", function(object) {\n cat(is(object)[[1]], \"\\n\",\n \" Name: \", object@name, \"\\n\",\n \" Age: \", object@age, \"\\n\",\n sep = \"\"\n )\n})\njohn\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Person\n#> Name: Jon Smythe\n#> Age: 50\n```\n\n\n:::\n:::\n\n\nExample to access slot:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetGeneric(\"name\", function(x) standardGeneric(\"name\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"name\"\n```\n\n\n:::\n\n```{.r .cell-code}\nsetMethod(\"name\", \"Person\", function(x) x@name)\n\nname(john)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Jon Smythe\"\n```\n\n\n:::\n:::\n\n\n\nThis is how end users should access slots.\n\n## Example: `lubridate::period()`\n\n### Define the class\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetClass(\"Period\",\n # inherits from these classes\n contains = c(\"Timespan\", \"numeric\"),\n # has slots for time components\n slots = c(\n year = \"numeric\", \n month = \"numeric\", \n day = \"numeric\",\n hour = \"numeric\", \n minute = \"numeric\"\n ),\n # defines prototype as period of zero duration for all slots\n prototype = prototype(year = 0, month = 0, day = 0, hour = 0, minute = 0),\n # check validity with `check_period` function; see section below\n validity = check_period\n)\n```\n:::\n\n\nSee source code [here](https://github.com/tidyverse/lubridate/blob/0bb49b21c88736240219dc67e7ed0eb3df15d9b1/R/periods.r#L90)\n\n### Validate object\n\nCheck whether object is valid--notably if all arugments have the same length and are integers.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_period <- function(object) {\n # start with an empty vector of error messages\n errors <- character()\n\n # check length of object's data\n length(object@.Data) -> n\n # check length of each slot\n lengths <- c(\n length(object@year), \n length(object@month),\n length(object@day), \n length(object@hour), \n length(object@minute)\n )\n\n # if length of any slot is different than overall length, compose error message\n if (any(lengths != n)) {\n msg <- paste(\"Inconsistent lengths: year = \", lengths[1],\n \", month = \", lengths[2],\n \", day = \", lengths[3],\n \", hour = \", lengths[4],\n \", minute = \", lengths[5],\n \", second = \", n,\n sep = \"\"\n )\n # add just-composed error to vector of error messages\n errors <- c(errors, msg)\n }\n\n values <- c(object@year, object@month, object@day, object@hour, object@minute)\n values <- na.omit(values)\n if (sum(values - trunc(values))) {\n msg <- \"periods must have integer values\"\n errors <- c(errors, msg)\n }\n\n if (length(errors) == 0) {\n TRUE\n } else {\n errors\n }\n}\n```\n:::\n\n\n\nSee source code [here](https://github.com/tidyverse/lubridate/blob/0bb49b21c88736240219dc67e7ed0eb3df15d9b1/R/periods.r#L6).\n\n### Set methods\n\nShow period:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#' @export\nsetMethod(\"show\", signature(object = \"Period\"), function(object) {\n if (length(object@.Data) == 0) {\n cat(\"<Period[0]>\\n\")\n } else {\n print(format(object))\n }\n})\n\n#' @export\nformat.Period <- function(x, ...) {\n if (length(x) == 0) {\n return(character())\n }\n\n show <- paste(\n x@year, \"y \", x@month, \"m \", x@day, \"d \",\n x@hour, \"H \", x@minute, \"M \", x@.Data, \"S\",\n sep = \"\"\n )\n start <- regexpr(\"[-1-9]|(0\\\\.)\", show)\n show <- ifelse(start > 0, substr(show, start, nchar(show)), \"0S\")\n\n show[is.na(x)] <- NA\n show\n}\n```\n:::\n\n\nSee source code [here](https://github.com/tidyverse/lubridate/blob/0bb49b21c88736240219dc67e7ed0eb3df15d9b1/R/periods.r#L195)\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/a1jzpWiksyA\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/bzo37PHCM1I\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/WWnJ5Cl-aTE\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<iframe src=\"https://www.youtube.com/embed/_byYFTQHp1Y\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/M8Poajmj-HU\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/unNfE1fDFEY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/q1-QUFJsbLA\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/puvaJtv9gQw\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n\n<summary>Meeting chat log</summary>\n```\n01:09:37\tRon Legere:\thttps://en.wikipedia.org/wiki/Composition_over_inheritance\n```\n</details>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: S4\n---\n\n## Introduction\n\nObject consists of:\n\n- Slots. Like fields in R6.\n- Methods. Accessed through generics. Dispatched to particular methods.\n\nUses functions to define classes and their methods:\n\n- `setClass()`. Define class and its components. \n- `setGenerics()`. Define generic functions. Used to dispatch.\n- `setMethods()`. Define methods\n\n## Basics overview\n\n### Set class\n\nDefine the class:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetClass(\"Person\", \n slots = c(\n name = \"character\", \n age = \"numeric\"\n )\n)\n```\n:::\n\n\nCreate an instance of the class\n\n\n::: {.cell}\n\n```{.r .cell-code}\njohn <- new(\"Person\", name = \"John Smith\", age = NA_real_)\n```\n:::\n\n\n\n### Set generics\n\nDefine generic functions for setting and getting the age slot\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# get the value\nsetGeneric(\"age\", function(x) standardGeneric(\"age\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"age\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# set the value\nsetGeneric(\"age<-\", function(x, value) standardGeneric(\"age<-\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"age<-\"\n```\n\n\n:::\n:::\n\n\n### Set methods\n\nDefine methods for the generics:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# get the value\nsetMethod(\"age\", \"Person\", function(x) x@age)\n# set the value\nsetMethod(\"age<-\", \"Person\", function(x, value) {\n x@age <- value\n x\n})\n\n# set the value\nage(john) <- 50\n# get the value\nage(john)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 50\n```\n\n\n:::\n:::\n\n\nTo give a flavor, there is only one method per slot. In more realistic cases, there might be several methods.\n\n## Details on defining the class\n\n### Inheritance\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetClass(\"Employee\", \n contains = \"Person\", \n slots = c(\n boss = \"Person\"\n ),\n prototype = list(\n boss = new(\"Person\")\n )\n)\n```\n:::\n\n\n### Instantiation\n\nCreate an instance of the class at two levels:\n\n- For developer (you): `methods::new()`\n- For user: constructor function\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# how user constructs an instance\nPerson <- function(name, age = NA) {\n age <- as.double(age)\n \n # how the developer constructs an instance\n new(\"Person\", name = name, age = age)\n}\n\nPerson(\"Someone\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> An object of class \"Person\"\n#> Slot \"name\":\n#> [1] \"Someone\"\n#> \n#> Slot \"age\":\n#> [1] NA\n```\n\n\n:::\n:::\n\n\n### Validation\n\nS4 objects\n\n- Check class of slot at creation\n\n::: {.cell}\n\n```{.r .cell-code}\nPerson(mtcars)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in validObject(.Object): invalid class \"Person\" object: invalid object for slot \"name\" in class \"Person\": got class \"data.frame\", should be or extend class \"character\"\n```\n\n\n:::\n:::\n\n\n- Do **not** check other things\n\n::: {.cell}\n\n```{.r .cell-code}\nPerson(\"Hadley\", age = c(30, 37))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> An object of class \"Person\"\n#> Slot \"name\":\n#> [1] \"Hadley\"\n#> \n#> Slot \"age\":\n#> [1] 30 37\n```\n\n\n:::\n:::\n\n\nThat's where validation comes in--at two stages:\n\n1. At creation\n2. At modification\n\n#### At creation\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetValidity(\"Person\", function(object) {\n if (length(object@name) != length(object@age)) {\n \"@name and @age must be same length\"\n } else {\n TRUE\n }\n})\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Class \"Person\" [in \".GlobalEnv\"]\n#> \n#> Slots:\n#> \n#> Name: name age\n#> Class: character numeric\n```\n\n\n:::\n\n```{.r .cell-code}\nPerson(\"Hadley\", age = c(30, 37))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in validObject(.Object): invalid class \"Person\" object: @name and @age must be same length\n```\n\n\n:::\n:::\n\n\n#### At modification\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# get value\nsetGeneric(\"name\", function(x) standardGeneric(\"name\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"name\"\n```\n\n\n:::\n\n```{.r .cell-code}\nsetMethod(\"name\", \"Person\", function(x) x@name)\n\n# set value--and assess whether resulting object is valid\nsetGeneric(\"name<-\", function(x, value) standardGeneric(\"name<-\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"name<-\"\n```\n\n\n:::\n\n```{.r .cell-code}\nsetMethod(\"name<-\", \"Person\", function(x, value) {\n x@name <- value\n validObject(x)\n x\n})\n\n# normal name; no problem\nname(john) <- \"Jon Smythe\"\nname(john)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Jon Smythe\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# invalid name; error thrown\nname(john) <- letters\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in validObject(x): invalid class \"Person\" object: @name and @age must be same length\n```\n\n\n:::\n:::\n\n\n\n## Details on generics and methods\n\n### Dictate dispatch via signature\n\nSpecify function arguments to be used in determining method.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetGeneric(\"myGeneric\", \n function(x, ..., verbose = TRUE) standardGeneric(\"myGeneric\"),\n signature = \"x\"\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"myGeneric\"\n```\n\n\n:::\n:::\n\n\n### Define generics\n\nGeneral form:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetMethod(\"myGeneric\", \"Person\", function(x) {\n # method implementation\n})\n```\n:::\n\n\nExample to print object:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetMethod(\"show\", \"Person\", function(object) {\n cat(is(object)[[1]], \"\\n\",\n \" Name: \", object@name, \"\\n\",\n \" Age: \", object@age, \"\\n\",\n sep = \"\"\n )\n})\njohn\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Person\n#> Name: Jon Smythe\n#> Age: 50\n```\n\n\n:::\n:::\n\n\nExample to access slot:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetGeneric(\"name\", function(x) standardGeneric(\"name\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"name\"\n```\n\n\n:::\n\n```{.r .cell-code}\nsetMethod(\"name\", \"Person\", function(x) x@name)\n\nname(john)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Jon Smythe\"\n```\n\n\n:::\n:::\n\n\n\nThis is how end users should access slots.\n\n## Example: `lubridate::period()`\n\n### Define the class\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetClass(\"Period\",\n # inherits from these classes\n contains = c(\"Timespan\", \"numeric\"),\n # has slots for time components\n slots = c(\n year = \"numeric\", \n month = \"numeric\", \n day = \"numeric\",\n hour = \"numeric\", \n minute = \"numeric\"\n ),\n # defines prototype as period of zero duration for all slots\n prototype = prototype(year = 0, month = 0, day = 0, hour = 0, minute = 0),\n # check validity with `check_period` function; see section below\n validity = check_period\n)\n```\n:::\n\n\nSee source code [here](https://github.com/tidyverse/lubridate/blob/0bb49b21c88736240219dc67e7ed0eb3df15d9b1/R/periods.r#L90)\n\n### Validate object\n\nCheck whether object is valid--notably if all arugments have the same length and are integers.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncheck_period <- function(object) {\n # start with an empty vector of error messages\n errors <- character()\n\n # check length of object's data\n length(object@.Data) -> n\n # check length of each slot\n lengths <- c(\n length(object@year), \n length(object@month),\n length(object@day), \n length(object@hour), \n length(object@minute)\n )\n\n # if length of any slot is different than overall length, compose error message\n if (any(lengths != n)) {\n msg <- paste(\"Inconsistent lengths: year = \", lengths[1],\n \", month = \", lengths[2],\n \", day = \", lengths[3],\n \", hour = \", lengths[4],\n \", minute = \", lengths[5],\n \", second = \", n,\n sep = \"\"\n )\n # add just-composed error to vector of error messages\n errors <- c(errors, msg)\n }\n\n values <- c(object@year, object@month, object@day, object@hour, object@minute)\n values <- na.omit(values)\n if (sum(values - trunc(values))) {\n msg <- \"periods must have integer values\"\n errors <- c(errors, msg)\n }\n\n if (length(errors) == 0) {\n TRUE\n } else {\n errors\n }\n}\n```\n:::\n\n\n\nSee source code [here](https://github.com/tidyverse/lubridate/blob/0bb49b21c88736240219dc67e7ed0eb3df15d9b1/R/periods.r#L6).\n\n### Set methods\n\nShow period:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#' @export\nsetMethod(\"show\", signature(object = \"Period\"), function(object) {\n if (length(object@.Data) == 0) {\n cat(\"<Period[0]>\\n\")\n } else {\n print(format(object))\n }\n})\n\n#' @export\nformat.Period <- function(x, ...) {\n if (length(x) == 0) {\n return(character())\n }\n\n show <- paste(\n x@year, \"y \", x@month, \"m \", x@day, \"d \",\n x@hour, \"H \", x@minute, \"M \", x@.Data, \"S\",\n sep = \"\"\n )\n start <- regexpr(\"[-1-9]|(0\\\\.)\", show)\n show <- ifelse(start > 0, substr(show, start, nchar(show)), \"0S\")\n\n show[is.na(x)] <- NA\n show\n}\n```\n:::\n\n\nSee source code [here](https://github.com/tidyverse/lubridate/blob/0bb49b21c88736240219dc67e7ed0eb3df15d9b1/R/periods.r#L195)\n", + "supporting": [ + "15_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/16/execute-results/html.json b/_freeze/slides/16/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "5b2c395d798d8572d368eef6180abcbc", + "hash": "34d644ad4ea8b8873e90265633e52f27", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Trade-offs\n---\n\n## Learning objectives:\n\n- Understand the Trade-offs between S3, R6 and S4\n\n- Brief intro to S7 (the object system formerly known as R7)\n\n\n## Introduction {-}\n\n* We have three OOP systems introduced so far (S3, S4, R6) \n\n* At the current time (pre - S7?) Hadley recommends use S3 by default: It's simple and widely used throughout base R and CRAN.\n\n* If you have experience in other languages, *Resist* the temptation to use R6 even though it will feel more familiar!\n\n\n## S4 versus S3 {-}\n\n**Which functional object system to use, S3 or S4? **\n\n- **S3** is a simple and flexible system.\n \n - Good for small teams who need flexibility and immediate payoffs.\n \n - Commonly used throughout base R and CRAN \n \n - Flexibility can cause problems, more complex systems might require formal conventions\n \n\n- **S4** is a more formal, strict system. \n\n - Good for large projects and large teams\n \n - Used by Bioconductor project\n \n - Requires significant up front investment in design, but payoff is a robust system that enforces conventions.\n \n - S4 documentation is challenging to use. \n \n\n\n## R6 versus S3 {-}\n\n**R6** is built on **encapsulated objects**, rather than generic functions. \n\n\n**Big differences: general trade-offs**\n\n* A generic is a regular function so it lives in the global namespace. An R6 method belongs to an object so it lives in a local namespace. This influences how we think about naming.\n\n* R6's reference semantics allow methods to simultaneously return a value and modify an object. This solves a painful problem called \"threading state\".\n\n* You invoke an R6 method using `$`, which is an infix operator. If you set up your methods correctly you can use chains of method calls as an alternative to the pipe.\n\n## Namespacing {-}\n\n**Where methods are found?**\n\n- in S3, **Generic functions** are **global** and live in the **global namespace**\n\n - Advantage: Uniform API: `summary`, `print`, `predict` etc.\n \n - Disadvantage: Must be careful about creating new methods! Homonyms must be avoided, don't define `plot(bank_heist)`\n \n\n- in R6, **Encapsulated methods** are **local**: objects with a **scope**\n\n - Advantage: No problems with homonyms: meaning of `bank_heist$plot()` is clear and unambiguous.\n \n - Disadvantage: Lack of a uniform API, except by convention.\n \n\n## Threading state {-}\n\n\nIn S3 the challenge is to return a value and modify the object. \n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnew_stack <- function(items = list()) {\n structure(list(items = items), class = \"stack\")\n}\n\npush <- function(x, y) {\n x$items <- c(x$items, list(y))\n x\n}\n```\n:::\n\n\nNo problem with that, but what about when we want to pop a value? We need to return two things.\n\n\n::: {.cell}\n\n```{.r .cell-code}\npop <- function(x) {\n n <- length(x$items)\n \n item <- x$items[[n]]\n x$items <- x$items[-n]\n \n list(item = item, x = x)\n}\n```\n:::\n\n\nThe usage is a bit awkward:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ns <- new_stack()\ns <- push(s, 10)\ns <- push(s, 20)\n\nout <- pop(s)\n# Update state:\ns <- out$x\n\nprint(out$item)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 20\n```\n\n\n:::\n:::\n\n\n\nIn python and other languages we have structured binding to make this less awkward. R has the {zeallot} package. For more, see this vignette:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvignette('unpacking-assignment')\n```\n:::\n\n\nHowever, this is all easier in R6 due to the reference semantics!\n\n\n::: {.cell}\n\n```{.r .cell-code}\nStack <- R6::R6Class(\"Stack\", list(\n items = list(),\n push = function(x) {\n self$items <- c(self$items, x)\n invisible(self)\n },\n pop = function() {\n item <- self$items[[self$length()]]\n self$items <- self$items[-self$length()]\n item\n },\n length = function() {\n length(self$items)\n }\n))\n\ns <- Stack$new()\ns$push(10)\ns$push(20)\ns$pop()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 20\n```\n\n\n:::\n:::\n\n\n\n## Method chaining {-}\n\nUseful to compose functions from left-to-right.\n\nUse of the operators:\n\n- S3: `|>` or `%>%`\n\n- R6: `$`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ns$push(44)$push(32)$pop()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 32\n```\n\n\n:::\n:::\n\n\n\n## Umm... what about S7 ? {-}\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n:::\n:::\n\n\n### Primary references: {-}\n\n* Docs: <https://rconsortium.github.io/S7/>\n\n* Talk by Hadley Wickham <https://www.youtube.com/watch?v=P3FxCvSueag>\n\n## S7 briefly {-}\n\n* S7 is a 'better' version of S3 with some of the 'strictness' of S4.\n\n```\n\"A little bit more complex then S3, with almost all of the features, \nall of the payoff of S4\" - rstudio conf 2022, Hadley Wickham\n```\n* S3 + S4 = S7\n\n* Compatible with S3: S7 objects are S3 objects! Can even extend an S3 object with S7\n\n* Somewhat compatible with S4, see [compatability vignette](https://rconsortium.github.io/S7/articles/compatibility.html) for details. \n\n* Helpful error messages! \n\n* Note that it was previously called R7, but it was changed to \"S7\" to better reflect that it is functional not encapsulated! \n\n## Abbreviated introduction based on the vignette {-}\n\nTo install (it's now on CRAN): \n\n::: {.cell}\n\n```{.r .cell-code}\ninstall.packages(\"S7\")\n```\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(S7)\ndog <- new_class(\"dog\", properties = list(\n name = class_character,\n age = class_numeric\n))\ndog\n\n\n#> <S7_class>\n#> @ name : dog\n#> @ parent: <S7_object>\n#> @ properties:\n#> $ name: <character> \n#> $ age : <integer> or <double>\n```\n:::\n\n\nNote the `class_character`, these are S7 classes corresponding to the base classes.\n\nNow to use it to create an object of class _dog_:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlola <- dog(name = \"Lola\", age = 11)\nlola\n\n#> <dog>\n#> @ name: chr \"Lola\"\n#> @ age : num 11\n```\n:::\n\n\nProperties can be set/read with `@`, with automatic validation ('safety rails') based on the type!\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlola@age <- 12\nlola@age\n\n#> 12\n\nlola@age <- \"twelve\"\n\n#> Error: <dog>@age must be <integer> or <double>, not <character>\n```\n:::\n\n\nNote the helpful error message!\n\nLike S3 (and S4) S7 has generics, implemented with `new_generic` and `method` for particular methods:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nspeak <- new_generic(\"speak\", \"x\")\n\nmethod(speak, dog) <- function(x) {\n \"Woof\"\n}\n \nspeak(lola)\n\n#> [1] \"Woof\"\n```\n:::\n\n\nIf we have another class, we can implement the generic for that too:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncat <- new_class(\"cat\", properties = list(\n name = class_character,\n age = class_double\n))\nmethod(speak, cat) <- function(x) {\n \"Meow\"\n}\n\nfluffy <- cat(name = \"Fluffy\", age = 5)\nspeak(fluffy)\n\n#> [1] \"Meow\"\n```\n:::\n\n\nHelpful messages:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nspeak\n\n#> <S7_generic> speak(x, ...) with 2 methods:\n#> 1: method(speak, cat)\n#> 2: method(speak, dog)\n```\n:::\n\n\n\n\"most usage of S7 with S3 will just work\"\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmethod(print, cat) <- function(...) {\n print(\"I am a cat.\")\n}\n\nprint(fluffy)\n#> \"I am a cat\"\n```\n:::\n\n\n*For validators, inheritance, dynamic properties and more, see the [vignette!](https://rconsortium.github.io/S7/articles/S7.html)*\n\n\n## So... switch to S7 ? {-}\n\n$$\n\\huge\n\\textbf{Soon}^{tm}\n$$\n\n* Not yet... still in development! \n\n* But consider trying it out:\n\n * To stay ahead of the curve... S7 will be integrated into base R someday!\n \n * To contribute feedback to the S7 team!\n\n * To get \"almost all\" of the benefits of S4 without the complexity ! \n \n* In particular, if you have a new project that might require the complexity of S4, consider S7 instead!\n\n## OOP system comparison {-}\n\n| Characteristic | S3 | S4 | S7 | R6 |\n|-------|------|------|------|------|\n| _Package_ | base R | base R | [S7](https://rconsortium.github.io/S7/) | [R6](https://r6.r-lib.org/) |\n| _Programming type_ | Functional | Functional | Functional | Encapulated |\n| _Complexity_ | Low | High | Medium | High |\n| _Payoff_ | Low | High | High | High |\n| _Team size_ | Small | Small-large | Large | ? |\n| _Namespace_ | Global | Global? | Global? | Local |\n| _Modify in place_ | No | No | No | Yes |\n| _Method chaining_ | `|>` | `|>`? | `|>`? | `$` |\n| _Get/set component_ | `$` | `@` | `@` | `$` |\n| _Create class_ | `class()` or `structure()` with `class` argument | `setClass()` | `new_class()` | `R6Class()` |\n| _Create validator_ | `function()` | `setValidity()` or `validator` argument in `setClass()` | `validator` argument in `new_class()` | `$validate()` |\n| _Create generic_ | `UseMethod()` | `setGeneric()` | `new_generic()` | NA |\n| _Create method_ | `function()` assigned to `generic.method` | `setMethod()` | `method()` | `R6Class()` |\n| _Create object_ | `class()` or `structure()` with `class` argument or constructor function | `new()` | Use registered method function | `$new()` |\n| _Additional components_ | attributes | slots | properties | |\n| | | | | |\n\n## Meeting Videos {-}\n\n### Cohort 1 {-}\n\n<iframe src=\"https://www.youtube.com/embed/W1uc8HbyZvI\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2 {-}\n\n<iframe src=\"https://www.youtube.com/embed/bzo37PHCM1I\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3 {-}\n\n<iframe src=\"https://www.youtube.com/embed/_byYFTQHp1Y\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4 {-}\n\n<iframe src=\"https://www.youtube.com/embed/vdKDPBcOc6Y\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5 {-}\n\n<iframe src=\"https://www.youtube.com/embed/3EvqtVYTFVM\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6 {-}\n\n<iframe src=\"https://www.youtube.com/embed/vEButxFIvLw\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:11:36\tOluwafemi Oyedele:\tI have not built anything with them!!!\n00:16:31\tArthur Shaw:\thttps://cran.r-project.org/web/packages/sp/index.html\n00:19:05\tArthur Shaw:\tApparently Hadley asked the same question we're asking several years ago: https://stackoverflow.com/questions/5437238/which-packages-make-good-use-of-s4-objects\n00:19:16\tTrevin:\tHA\n00:23:54\tTrevin:\tYour audio is breaking up Federica\n01:06:58\tFederica Gazzelloni:\thttps://mastering-shiny.org/reactive-motivation.html?q=R6#event-driven\n01:07:37\tFederica Gazzelloni:\thttps://engineering-shiny.org/common-app-caveats.html?q=R6#using-r6-as-data-storage\n01:10:52\tOluwafemi Oyedele:\tThank you !!!\n```\n</details>\n\n### Cohort 7 {-}\n\n<iframe src=\"https://www.youtube.com/embed/2vxnzqWp-OU\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: Trade-offs\n---\n\n## Learning objectives:\n\n- Understand the Trade-offs between S3, R6 and S4\n\n- Brief intro to S7 (the object system formerly known as R7)\n\n\n## Introduction {-}\n\n* We have three OOP systems introduced so far (S3, S4, R6) \n\n* At the current time (pre - S7?) Hadley recommends use S3 by default: It's simple and widely used throughout base R and CRAN.\n\n* If you have experience in other languages, *Resist* the temptation to use R6 even though it will feel more familiar!\n\n\n## S4 versus S3 {-}\n\n**Which functional object system to use, S3 or S4? **\n\n- **S3** is a simple and flexible system.\n \n - Good for small teams who need flexibility and immediate payoffs.\n \n - Commonly used throughout base R and CRAN \n \n - Flexibility can cause problems, more complex systems might require formal conventions\n \n\n- **S4** is a more formal, strict system. \n\n - Good for large projects and large teams\n \n - Used by Bioconductor project\n \n - Requires significant up front investment in design, but payoff is a robust system that enforces conventions.\n \n - S4 documentation is challenging to use. \n \n\n\n## R6 versus S3 {-}\n\n**R6** is built on **encapsulated objects**, rather than generic functions. \n\n\n**Big differences: general trade-offs**\n\n* A generic is a regular function so it lives in the global namespace. An R6 method belongs to an object so it lives in a local namespace. This influences how we think about naming.\n\n* R6's reference semantics allow methods to simultaneously return a value and modify an object. This solves a painful problem called \"threading state\".\n\n* You invoke an R6 method using `$`, which is an infix operator. If you set up your methods correctly you can use chains of method calls as an alternative to the pipe.\n\n## Namespacing {-}\n\n**Where methods are found?**\n\n- in S3, **Generic functions** are **global** and live in the **global namespace**\n\n - Advantage: Uniform API: `summary`, `print`, `predict` etc.\n \n - Disadvantage: Must be careful about creating new methods! Homonyms must be avoided, don't define `plot(bank_heist)`\n \n\n- in R6, **Encapsulated methods** are **local**: objects with a **scope**\n\n - Advantage: No problems with homonyms: meaning of `bank_heist$plot()` is clear and unambiguous.\n \n - Disadvantage: Lack of a uniform API, except by convention.\n \n\n## Threading state {-}\n\n\nIn S3 the challenge is to return a value and modify the object. \n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnew_stack <- function(items = list()) {\n structure(list(items = items), class = \"stack\")\n}\n\npush <- function(x, y) {\n x$items <- c(x$items, list(y))\n x\n}\n```\n:::\n\n\nNo problem with that, but what about when we want to pop a value? We need to return two things.\n\n\n::: {.cell}\n\n```{.r .cell-code}\npop <- function(x) {\n n <- length(x$items)\n \n item <- x$items[[n]]\n x$items <- x$items[-n]\n \n list(item = item, x = x)\n}\n```\n:::\n\n\nThe usage is a bit awkward:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ns <- new_stack()\ns <- push(s, 10)\ns <- push(s, 20)\n\nout <- pop(s)\n# Update state:\ns <- out$x\n\nprint(out$item)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 20\n```\n\n\n:::\n:::\n\n\n\nIn python and other languages we have structured binding to make this less awkward. R has the {zeallot} package. For more, see this vignette:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvignette('unpacking-assignment')\n```\n:::\n\n\nHowever, this is all easier in R6 due to the reference semantics!\n\n\n::: {.cell}\n\n```{.r .cell-code}\nStack <- R6::R6Class(\"Stack\", list(\n items = list(),\n push = function(x) {\n self$items <- c(self$items, x)\n invisible(self)\n },\n pop = function() {\n item <- self$items[[self$length()]]\n self$items <- self$items[-self$length()]\n item\n },\n length = function() {\n length(self$items)\n }\n))\n\ns <- Stack$new()\ns$push(10)\ns$push(20)\ns$pop()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 20\n```\n\n\n:::\n:::\n\n\n\n## Method chaining {-}\n\nUseful to compose functions from left-to-right.\n\nUse of the operators:\n\n- S3: `|>` or `%>%`\n\n- R6: `$`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ns$push(44)$push(32)$pop()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 32\n```\n\n\n:::\n:::\n\n\n\n## Umm... what about S7 ? {-}\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n:::\n:::\n\n\n### Primary references: {-}\n\n* Docs: <https://rconsortium.github.io/S7/>\n\n* Talk by Hadley Wickham <https://www.youtube.com/watch?v=P3FxCvSueag>\n\n## S7 briefly {-}\n\n* S7 is a 'better' version of S3 with some of the 'strictness' of S4.\n\n```\n\"A little bit more complex then S3, with almost all of the features, \nall of the payoff of S4\" - rstudio conf 2022, Hadley Wickham\n```\n* S3 + S4 = S7\n\n* Compatible with S3: S7 objects are S3 objects! Can even extend an S3 object with S7\n\n* Somewhat compatible with S4, see [compatability vignette](https://rconsortium.github.io/S7/articles/compatibility.html) for details. \n\n* Helpful error messages! \n\n* Note that it was previously called R7, but it was changed to \"S7\" to better reflect that it is functional not encapsulated! \n\n## Abbreviated introduction based on the vignette {-}\n\nTo install (it's now on CRAN): \n\n::: {.cell}\n\n```{.r .cell-code}\ninstall.packages(\"S7\")\n```\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(S7)\ndog <- new_class(\"dog\", properties = list(\n name = class_character,\n age = class_numeric\n))\ndog\n\n\n#> <S7_class>\n#> @ name : dog\n#> @ parent: <S7_object>\n#> @ properties:\n#> $ name: <character> \n#> $ age : <integer> or <double>\n```\n:::\n\n\nNote the `class_character`, these are S7 classes corresponding to the base classes.\n\nNow to use it to create an object of class _dog_:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlola <- dog(name = \"Lola\", age = 11)\nlola\n\n#> <dog>\n#> @ name: chr \"Lola\"\n#> @ age : num 11\n```\n:::\n\n\nProperties can be set/read with `@`, with automatic validation ('safety rails') based on the type!\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlola@age <- 12\nlola@age\n\n#> 12\n\nlola@age <- \"twelve\"\n\n#> Error: <dog>@age must be <integer> or <double>, not <character>\n```\n:::\n\n\nNote the helpful error message!\n\nLike S3 (and S4) S7 has generics, implemented with `new_generic` and `method` for particular methods:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nspeak <- new_generic(\"speak\", \"x\")\n\nmethod(speak, dog) <- function(x) {\n \"Woof\"\n}\n \nspeak(lola)\n\n#> [1] \"Woof\"\n```\n:::\n\n\nIf we have another class, we can implement the generic for that too:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncat <- new_class(\"cat\", properties = list(\n name = class_character,\n age = class_double\n))\nmethod(speak, cat) <- function(x) {\n \"Meow\"\n}\n\nfluffy <- cat(name = \"Fluffy\", age = 5)\nspeak(fluffy)\n\n#> [1] \"Meow\"\n```\n:::\n\n\nHelpful messages:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nspeak\n\n#> <S7_generic> speak(x, ...) with 2 methods:\n#> 1: method(speak, cat)\n#> 2: method(speak, dog)\n```\n:::\n\n\n\n\"most usage of S7 with S3 will just work\"\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmethod(print, cat) <- function(...) {\n print(\"I am a cat.\")\n}\n\nprint(fluffy)\n#> \"I am a cat\"\n```\n:::\n\n\n*For validators, inheritance, dynamic properties and more, see the [vignette!](https://rconsortium.github.io/S7/articles/S7.html)*\n\n\n## So... switch to S7 ? {-}\n\n$$\n\\huge\n\\textbf{Soon}^{tm}\n$$\n\n* Not yet... still in development! \n\n* But consider trying it out:\n\n * To stay ahead of the curve... S7 will be integrated into base R someday!\n \n * To contribute feedback to the S7 team!\n\n * To get \"almost all\" of the benefits of S4 without the complexity ! \n \n* In particular, if you have a new project that might require the complexity of S4, consider S7 instead!\n\n## OOP system comparison {-}\n\n| Characteristic | S3 | S4 | S7 | R6 |\n|-------|------|------|------|------|\n| _Package_ | base R | base R | [S7](https://rconsortium.github.io/S7/) | [R6](https://r6.r-lib.org/) |\n| _Programming type_ | Functional | Functional | Functional | Encapulated |\n| _Complexity_ | Low | High | Medium | High |\n| _Payoff_ | Low | High | High | High |\n| _Team size_ | Small | Small-large | Large | ? |\n| _Namespace_ | Global | Global? | Global? | Local |\n| _Modify in place_ | No | No | No | Yes |\n| _Method chaining_ | `|>` | `|>`? | `|>`? | `$` |\n| _Get/set component_ | `$` | `@` | `@` | `$` |\n| _Create class_ | `class()` or `structure()` with `class` argument | `setClass()` | `new_class()` | `R6Class()` |\n| _Create validator_ | `function()` | `setValidity()` or `validator` argument in `setClass()` | `validator` argument in `new_class()` | `$validate()` |\n| _Create generic_ | `UseMethod()` | `setGeneric()` | `new_generic()` | NA |\n| _Create method_ | `function()` assigned to `generic.method` | `setMethod()` | `method()` | `R6Class()` |\n| _Create object_ | `class()` or `structure()` with `class` argument or constructor function | `new()` | Use registered method function | `$new()` |\n| _Additional components_ | attributes | slots | properties | |\n| | | | | |\n", + "supporting": [ + "16_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/17/execute-results/html.json b/_freeze/slides/17/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "d9c903f2034652d77786901cec427b4c", + "hash": "7befea3ac5a9457dd6cb99fed880c7dd", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Big picture\n---\n\n## Learning objectives:\n\n- Become familiar with some metaprogramming principals and how they relate to each other\n- Review vocabulary associated with metaprogramming\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(rlang)\nlibrary(lobstr)\n```\n:::\n\n\n\n## Code is data\n\n- **expression** - Captured code (*call*, *symbol*, *constant*, or *pairlist*)\n- Use `rlang::expr()`[^1] to capture code directly\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr(mean(x, na.rm = TRUE))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mean(x, na.rm = TRUE)\n```\n\n\n:::\n:::\n\n\n- Use `rlang::enexpr()` to capture code indirectly\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncapture_it <- function(x) { # 'automatically quotes first argument'\n enexpr(x)\n}\ncapture_it(a + b + c)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a + b + c\n```\n\n\n:::\n:::\n\n\n- 'Captured' code can be modified (like a list)!\n - First element is the function, next elements are the arguments\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- expr(f(x = 1, y = 2))\nnames(f)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"\" \"x\" \"y\"\n```\n\n\n:::\n\n```{.r .cell-code}\nff <- fff <- f # Create two copies\n\nff$z <- 3 # Add an argument to one\nfff[[2]] <- NULL # Remove an argument from another\n\nf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> f(x = 1, y = 2)\n```\n\n\n:::\n\n```{.r .cell-code}\nff\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> f(x = 1, y = 2, z = 3)\n```\n\n\n:::\n\n```{.r .cell-code}\nfff\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> f(y = 2)\n```\n\n\n:::\n:::\n\n\n> More on this next week!\n\n[^1]: Equivalent to `base::bquote()`\n\n## Code is a tree\n\n- **Abstract syntax tree** (AST) - Almost every language represents code as a tree\n- Use `lobstr::ast()` to inspect these code trees\n\n\n::: {.cell}\n\n```{.r .cell-code}\nast(f1(f2(a, b), f3(1)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─f1 \n#> ├─█─f2 \n#> │ ├─a \n#> │ └─b \n#> └─█─f3 \n#> └─1\n```\n\n\n:::\n\n```{.r .cell-code}\nast(1 + 2 * 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`+` \n#> ├─1 \n#> └─█─`*` \n#> ├─2 \n#> └─3\n```\n\n\n:::\n:::\n\n\n\n## Code can generate code\n\n- `rlang::call2()` creates function call\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncall2(\"f\", 1, 2, 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> f(1, 2, 3)\n```\n\n\n:::\n:::\n\n\n- Going backwards from the tree, can use functions to create calls\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncall2(\"f1\", call2(\"f2\", \"a\", \"b\"), call2(\"f3\", 1))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> f1(f2(\"a\", \"b\"), f3(1))\n```\n\n\n:::\n\n```{.r .cell-code}\ncall2(\"+\", 1, call2(\"*\", 2, 3))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 1 + 2 * 3\n```\n\n\n:::\n:::\n\n\n- `!!` bang-bang - **unquote operator**\n - inserts previously defined expressions into the current one\n\n\n::: {.cell}\n\n```{.r .cell-code}\nxx <- expr(x + x)\nyy <- expr(y + y)\nexpr(xx / yy) # Nope!\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> xx/yy\n```\n\n\n:::\n\n```{.r .cell-code}\nexpr(!!xx / !!yy) # Yup!\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> (x + x)/(y + y)\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncv <- function(var) {\n var <- enexpr(var) # Get user's expression\n expr(sd(!!var) / mean(!!var)) # Insert user's expression\n}\n\ncv(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> sd(x)/mean(x)\n```\n\n\n:::\n\n```{.r .cell-code}\ncv(x + y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> sd(x + y)/mean(x + y)\n```\n\n\n:::\n:::\n\n\n- Avoid `paste()` for building code\n - Problems with non-syntactic names and precedence among expressions\n\n> \"You might think this is an esoteric concern, but not worrying about it when generating SQL code in web applications led to SQL injection attacks that have collectively cost billions of dollars.\"\n\n## Evaluation runs code\n\n- **evaluate** - run/execute an expression\n- need both expression and environment\n- `eval()` uses current environment if not set\n- manual evaluation means you can tweak the environment!\n\n\n::: {.cell}\n\n```{.r .cell-code}\nxy <- expr(x + y)\n\neval(xy, env(x = 1, y = 10))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 11\n```\n\n\n:::\n\n```{.r .cell-code}\neval(xy, env(x = 2, y = 100))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 102\n```\n\n\n:::\n:::\n\n\n\n## Customizing evaluations with functions\n- Can also bind names to functions in supplied environment\n- Allows overriding function behaviour\n- This is how dplyr generates SQL for working with databases\n\nFor example...\n\n::: {.cell}\n\n```{.r .cell-code}\nstring_math <- function(x) {\n e <- env(\n caller_env(),\n `+` = function(x, y) paste(x, y),\n `*` = function(x, y) strrep(x, y)\n )\n\n eval(enexpr(x), e)\n}\n\ncohort <- 9\nstring_math(\"Hello\" + \"cohort\" + cohort)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Hello cohort 9\"\n```\n\n\n:::\n\n```{.r .cell-code}\nstring_math((\"dslc\" + \"is\" + \"awesome---\") * cohort)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"dslc is awesome---dslc is awesome---dslc is awesome---dslc is awesome---dslc is awesome---dslc is awesome---dslc is awesome---dslc is awesome---dslc is awesome---\"\n```\n\n\n:::\n:::\n\n\n\n## Customizing evaluation with data\n\n- Look for variables inside data frame\n- **Data mask** - typically a data frame\n- use `rlang::eval_tidy()` rather than `eval()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(x = 1:5, y = sample(5))\neval_tidy(expr(x + y), df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4 7 4 6 9\n```\n\n\n:::\n:::\n\n\nCatch user input with `enexpr()`...\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith2 <- function(df, expr) {\n eval_tidy(enexpr(expr), df)\n}\n\nwith2(df, x + y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4 7 4 6 9\n```\n\n\n:::\n:::\n\n\nBut there's a bug!\n\n- Evaluates in environment inside `with2()`, but the expression likely refers\n to objects in the Global environment\n \n\n::: {.cell}\n\n```{.r .cell-code}\nwith2 <- function(df, expr) {\n a <- 1000\n eval_tidy(enexpr(expr), df)\n}\n\ndf <- data.frame(x = 1:3)\na <- 10\nwith2(df, x + a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1001 1002 1003\n```\n\n\n:::\n:::\n\n\n- Solved with Quosures...\n \n## Quosures\n\n- **Quosures** bundles expression with an environment\n- Use `enquo()` instead of `enexpr()` (with `eval_tidy()`)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith2 <- function(df, expr) {\n a <- 1000\n eval_tidy(enquo(expr), df)\n}\n\ndf <- data.frame(x = 1:3)\na <- 10\nwith2(df, x + a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 11 12 13\n```\n\n\n:::\n:::\n\n\n> \"Whenever you use a data mask, you must always use `enquo()` instead of `enexpr()`.\n\nThis comes back in Chapter 20.\n\n### Which environment is bundled?\n- The environment where the expression is created (i.e. the parent of where\n `enquo()` is called)\n\nHere, the global environment\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith2 <- function(df, expr) {\n a <- 1000\n eq <- enquo(expr)\n message(\"with2() Parent/Calling environment: \")\n print(rlang::caller_env())\n message(\"with2() environment: \")\n print(rlang::current_env())\n message(\"Quosure details: \")\n print(eq) # Print the details of the quosure\n eval_tidy(eq, df)\n}\n\na <- 10000\ndf <- data.frame(x = 1:3)\nwith2(df, x + a)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> with2() Parent/Calling environment:\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: R_GlobalEnv>\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> with2() environment:\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: 0x000001c9abd33460>\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Quosure details:\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <quosure>\n#> expr: ^x + a\n#> env: global\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 10001 10002 10003\n```\n\n\n:::\n:::\n\n\n\nHere, the `fun1()` environment\n\n::: {.cell}\n\n```{.r .cell-code}\nfun1 <- function(df) {\n a <- 10\n message(\"fun1() Parent/Calling environment: \")\n print(rlang::caller_env())\n message(\"fun1() environment: \")\n print(rlang::current_env())\n with2(df, x + a)\n}\n\na <- 10000\ndf <- data.frame(x = 1:3)\nfun1(df)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> fun1() Parent/Calling environment:\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: R_GlobalEnv>\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> fun1() environment:\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: 0x000001c9a4d3e0e8>\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> with2() Parent/Calling environment:\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: 0x000001c9a4d3e0e8>\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> with2() environment:\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: 0x000001c9a4d50b60>\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Quosure details:\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <quosure>\n#> expr: ^x + a\n#> env: 0x000001c9a4d3e0e8\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 11 12 13\n```\n\n\n:::\n:::\n\n\n\n\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/10gRbFMoh7g\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/vKKDU6x3BE8\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/5RLCRFli6QI\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/9MDC12hgOWQ\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/FSm2_TJmhm0\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/Ddd_43gw8nA\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:32:31\tOluwafemi Oyedele:\tWhen should eval_tidy() be used instead of eval()?\nbase::eval() is sufficient for simple evaluation. Use eval_tidy() when you'd like to support expressions referring to the .data pronoun, or when you need to support quosures.\n00:37:08\tTrevin (he/him):\thttps://rlang.r-lib.org/reference/topic-defuse.html\n00:38:38\tFederica Gazzelloni:\thttps://rlang.r-lib.org/reference/eval_tidy.html\n00:39:57\tArthur Shaw:\tTidy eval book: https://bookdown.dongzhuoer.com/tidyverse/tidyeval/\n00:40:14\tArthur Shaw:\tAlso very useful resource: https://dplyr.tidyverse.org/articles/programming.html\n00:40:28\tTrevin (he/him):\thttps://ggplot2.tidyverse.org/reference/aes.html\n00:40:37\tFederica Gazzelloni:\thttps://ggplot2.tidyverse.org/reference/tidyeval.html\n00:41:22\tOluwafemi Oyedele:\tIt is Tidyverse design\n00:49:13\tFederica Gazzelloni:\thttps://www.youtube.com/watch?v=2NixH3QAerQ&list=PL3x6DOfs2NGi9lH7q-phZlPrl6HKXYDbn&index=15\n00:50:13\tFederica Gazzelloni:\tMinute: 17:04\n00:54:03\tFederica Gazzelloni:\tcon <- DBI::dbConnect(RSQLite::SQLite(), filename = \":memory:\")\n00:54:18\tFederica Gazzelloni:\tDBI::dbDisconnect(con)\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/MX2vNlvIUFo\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n\n<summary>Meeting chat log</summary>\n```\n00:11:09\tRyan Honomichl:\thttps://medium.com/analytics-vidhya/become-a-better-r-programmer-with-the-awesome-lobstr-package-af97fcd22602\n00:33:03\tRyan Honomichl:\thttps://rlang.r-lib.org/reference/enquo.html\n00:37:30\tRyan Honomichl:\thttps://rlang.r-lib.org/reference/topic-multiple-columns.html\n00:41:00\tRyan Honomichl:\tbrb\n00:44:37\tRon Legere:\thttps://www.rdocumentation.org/packages/srvyr/versions/1.2.0\n00:44:58\tRon Legere:\thttp://gdfe.co/srvyr/\n00:51:51\tStone:\thttps://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html\n```\n</details>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: Big picture\n---\n\n## Learning objectives:\n\n- Become familiar with some metaprogramming principals and how they relate to each other\n- Review vocabulary associated with metaprogramming\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(rlang)\nlibrary(lobstr)\n```\n:::\n\n\n\n## Code is data\n\n- **expression** - Captured code (*call*, *symbol*, *constant*, or *pairlist*)\n- Use `rlang::expr()`[^1] to capture code directly\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr(mean(x, na.rm = TRUE))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mean(x, na.rm = TRUE)\n```\n\n\n:::\n:::\n\n\n- Use `rlang::enexpr()` to capture code indirectly\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncapture_it <- function(x) { # 'automatically quotes first argument'\n enexpr(x)\n}\ncapture_it(a + b + c)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a + b + c\n```\n\n\n:::\n:::\n\n\n- 'Captured' code can be modified (like a list)!\n - First element is the function, next elements are the arguments\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- expr(f(x = 1, y = 2))\nnames(f)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"\" \"x\" \"y\"\n```\n\n\n:::\n\n```{.r .cell-code}\nff <- fff <- f # Create two copies\n\nff$z <- 3 # Add an argument to one\nfff[[2]] <- NULL # Remove an argument from another\n\nf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> f(x = 1, y = 2)\n```\n\n\n:::\n\n```{.r .cell-code}\nff\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> f(x = 1, y = 2, z = 3)\n```\n\n\n:::\n\n```{.r .cell-code}\nfff\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> f(y = 2)\n```\n\n\n:::\n:::\n\n\n> More on this next week!\n\n[^1]: Equivalent to `base::bquote()`\n\n## Code is a tree\n\n- **Abstract syntax tree** (AST) - Almost every language represents code as a tree\n- Use `lobstr::ast()` to inspect these code trees\n\n\n::: {.cell}\n\n```{.r .cell-code}\nast(f1(f2(a, b), f3(1)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─f1 \n#> ├─█─f2 \n#> │ ├─a \n#> │ └─b \n#> └─█─f3 \n#> └─1\n```\n\n\n:::\n\n```{.r .cell-code}\nast(1 + 2 * 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`+` \n#> ├─1 \n#> └─█─`*` \n#> ├─2 \n#> └─3\n```\n\n\n:::\n:::\n\n\n\n## Code can generate code\n\n- `rlang::call2()` creates function call\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncall2(\"f\", 1, 2, 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> f(1, 2, 3)\n```\n\n\n:::\n:::\n\n\n- Going backwards from the tree, can use functions to create calls\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncall2(\"f1\", call2(\"f2\", \"a\", \"b\"), call2(\"f3\", 1))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> f1(f2(\"a\", \"b\"), f3(1))\n```\n\n\n:::\n\n```{.r .cell-code}\ncall2(\"+\", 1, call2(\"*\", 2, 3))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 1 + 2 * 3\n```\n\n\n:::\n:::\n\n\n- `!!` bang-bang - **unquote operator**\n - inserts previously defined expressions into the current one\n\n\n::: {.cell}\n\n```{.r .cell-code}\nxx <- expr(x + x)\nyy <- expr(y + y)\nexpr(xx / yy) # Nope!\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> xx/yy\n```\n\n\n:::\n\n```{.r .cell-code}\nexpr(!!xx / !!yy) # Yup!\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> (x + x)/(y + y)\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncv <- function(var) {\n var <- enexpr(var) # Get user's expression\n expr(sd(!!var) / mean(!!var)) # Insert user's expression\n}\n\ncv(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> sd(x)/mean(x)\n```\n\n\n:::\n\n```{.r .cell-code}\ncv(x + y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> sd(x + y)/mean(x + y)\n```\n\n\n:::\n:::\n\n\n- Avoid `paste()` for building code\n - Problems with non-syntactic names and precedence among expressions\n\n> \"You might think this is an esoteric concern, but not worrying about it when generating SQL code in web applications led to SQL injection attacks that have collectively cost billions of dollars.\"\n\n## Evaluation runs code\n\n- **evaluate** - run/execute an expression\n- need both expression and environment\n- `eval()` uses current environment if not set\n- manual evaluation means you can tweak the environment!\n\n\n::: {.cell}\n\n```{.r .cell-code}\nxy <- expr(x + y)\n\neval(xy, env(x = 1, y = 10))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 11\n```\n\n\n:::\n\n```{.r .cell-code}\neval(xy, env(x = 2, y = 100))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 102\n```\n\n\n:::\n:::\n\n\n\n## Customizing evaluations with functions\n- Can also bind names to functions in supplied environment\n- Allows overriding function behaviour\n- This is how dplyr generates SQL for working with databases\n\nFor example...\n\n::: {.cell}\n\n```{.r .cell-code}\nstring_math <- function(x) {\n e <- env(\n caller_env(),\n `+` = function(x, y) paste(x, y),\n `*` = function(x, y) strrep(x, y)\n )\n\n eval(enexpr(x), e)\n}\n\ncohort <- 9\nstring_math(\"Hello\" + \"cohort\" + cohort)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Hello cohort 9\"\n```\n\n\n:::\n\n```{.r .cell-code}\nstring_math((\"dslc\" + \"is\" + \"awesome---\") * cohort)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"dslc is awesome---dslc is awesome---dslc is awesome---dslc is awesome---dslc is awesome---dslc is awesome---dslc is awesome---dslc is awesome---dslc is awesome---\"\n```\n\n\n:::\n:::\n\n\n\n## Customizing evaluation with data\n\n- Look for variables inside data frame\n- **Data mask** - typically a data frame\n- use `rlang::eval_tidy()` rather than `eval()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(x = 1:5, y = sample(5))\neval_tidy(expr(x + y), df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2 4 6 9 9\n```\n\n\n:::\n:::\n\n\nCatch user input with `enexpr()`...\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith2 <- function(df, expr) {\n eval_tidy(enexpr(expr), df)\n}\n\nwith2(df, x + y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2 4 6 9 9\n```\n\n\n:::\n:::\n\n\nBut there's a bug!\n\n- Evaluates in environment inside `with2()`, but the expression likely refers\n to objects in the Global environment\n \n\n::: {.cell}\n\n```{.r .cell-code}\nwith2 <- function(df, expr) {\n a <- 1000\n eval_tidy(enexpr(expr), df)\n}\n\ndf <- data.frame(x = 1:3)\na <- 10\nwith2(df, x + a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1001 1002 1003\n```\n\n\n:::\n:::\n\n\n- Solved with Quosures...\n \n## Quosures\n\n- **Quosures** bundles expression with an environment\n- Use `enquo()` instead of `enexpr()` (with `eval_tidy()`)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith2 <- function(df, expr) {\n a <- 1000\n eval_tidy(enquo(expr), df)\n}\n\ndf <- data.frame(x = 1:3)\na <- 10\nwith2(df, x + a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 11 12 13\n```\n\n\n:::\n:::\n\n\n> \"Whenever you use a data mask, you must always use `enquo()` instead of `enexpr()`.\n\nThis comes back in Chapter 20.\n\n### Which environment is bundled?\n- The environment where the expression is created (i.e. the parent of where\n `enquo()` is called)\n\nHere, the global environment\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith2 <- function(df, expr) {\n a <- 1000\n eq <- enquo(expr)\n message(\"with2() Parent/Calling environment: \")\n print(rlang::caller_env())\n message(\"with2() environment: \")\n print(rlang::current_env())\n message(\"Quosure details: \")\n print(eq) # Print the details of the quosure\n eval_tidy(eq, df)\n}\n\na <- 10000\ndf <- data.frame(x = 1:3)\nwith2(df, x + a)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> with2() Parent/Calling environment:\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: R_GlobalEnv>\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> with2() environment:\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: 0x0000018dede5ddd0>\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Quosure details:\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <quosure>\n#> expr: ^x + a\n#> env: global\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 10001 10002 10003\n```\n\n\n:::\n:::\n\n\n\nHere, the `fun1()` environment\n\n::: {.cell}\n\n```{.r .cell-code}\nfun1 <- function(df) {\n a <- 10\n message(\"fun1() Parent/Calling environment: \")\n print(rlang::caller_env())\n message(\"fun1() environment: \")\n print(rlang::current_env())\n with2(df, x + a)\n}\n\na <- 10000\ndf <- data.frame(x = 1:3)\nfun1(df)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> fun1() Parent/Calling environment:\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: R_GlobalEnv>\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> fun1() environment:\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: 0x0000018df3e748f8>\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> with2() Parent/Calling environment:\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: 0x0000018df3e748f8>\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> with2() environment:\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: 0x0000018df3ebc698>\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Quosure details:\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <quosure>\n#> expr: ^x + a\n#> env: 0x0000018df3e748f8\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 11 12 13\n```\n\n\n:::\n:::\n\n", + "supporting": [ + "17_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/18/execute-results/html.json b/_freeze/slides/18/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "8f6dfb169503d8c433fcd9640d517068", + "hash": "8eddf7bc07df6df9746c96fcb4301db8", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Expressions\n---\n\n## Learning objectives:\n\n* Understand the idea of the abstract syntax tree (AST). \n* Discuss the data structures that underlie the AST:\n * Constants\n * Symbols\n * Calls\n* Explore the idea behind parsing.\n* Explore some details of R's grammar.\n* Discuss the use or recursive functions to compute on the language.\n* Work with three other more specialized data structures:\n * Pairlists\n * Missing arguments\n * Expression vectors\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(rlang)\nlibrary(lobstr)\n```\n:::\n\n\n## Introduction\n\n> To compute on the language, we first need to understand its structure.\n\n* This requires a few things:\n * New vocabulary.\n * New tools to inspect and modify expressions.\n * Approach the use of the language with new ways of thinking.\n* One of the first new ways of thinking is the distinction between an operation and its result.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- x * 10\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: object 'x' not found\n```\n\n\n:::\n:::\n\n\n* We can capture the intent of the code without executing it using the rlang package.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nz <- rlang::expr(y <- x * 10)\n\nz\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> y <- x * 10\n```\n\n\n:::\n:::\n\n\n* We can then evaluate the expression using the **base::eval** function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 4\n\nbase::eval(expr(y <- x * 10))\n\ny\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 40\n```\n\n\n:::\n:::\n\n\n### Evaluating multiple expressions \n\n* The function `expression()` allows for multiple expressions, and in some ways it acts similarly to the way files are `source()`d in. That is, we `eval()`uate all of the expressions at once.\n\n* `expression()` returns a vector and can be passed to `eval()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nz <- expression(x <- 4, x * 10)\n\neval(z)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 40\n```\n\n\n:::\n\n```{.r .cell-code}\nis.atomic(z)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n\n```{.r .cell-code}\nis.vector(z)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n* `exprs()` does not evaluate everything at once. To evaluate each expression, the individual expressions must be evaluated in a loop.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfor (i in exprs(x <- 4, x * 10)) {\nprint(i)\nprint(eval(i))\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x <- 4\n#> [1] 4\n#> x * 10\n#> [1] 40\n```\n\n\n:::\n:::\n\n\n## Abstract Syntax Tree (AST)\n\n* Expressions are objects that capture the structure of code without evaluating it.\n* Expressions are also called abstract syntax trees (ASTs) because the structure of code is hierarchical and can be naturally represented as a tree. \n* Understanding this tree structure is crucial for inspecting and modifying expressions.\n * Branches = Calls\n * Leaves = Symbols and constants\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf(x, \"y\", 1)\n```\n:::\n\n\n\n\n### With `lobstr::ast():`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(f(x, \"y\", 1))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─f \n#> ├─x \n#> ├─\"y\" \n#> └─1\n```\n\n\n:::\n:::\n\n\n* Some functions might also contain more calls like the example below:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf(g(1, 2), h(3, 4, i())):\n```\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(f(g(1, 2), h(3, 4, i())))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─f \n#> ├─█─g \n#> │ ├─1 \n#> │ └─2 \n#> └─█─h \n#> ├─3 \n#> ├─4 \n#> └─█─i\n```\n\n\n:::\n:::\n\n* Read the **hand-drawn diagrams** from left-to-right (ignoring vertical position)\n* Read the **lobstr-drawn diagrams** from top-to-bottom (ignoring horizontal position).\n* The depth within the tree is determined by the nesting of function calls. \n* Depth also determines evaluation order, **as evaluation generally proceeds from deepest-to-shallowest, but this is not guaranteed because of lazy evaluation**.\n\n### Infix calls\n\n> Every call in R can be written in tree form because any call can be written in prefix form.\n\nAn infix operator is a function where the function name is placed between its arguments. Prefix form is when then function name comes before the arguments, which are enclosed in parentheses. [Note that the name infix comes from the words prefix and suffix.]\n\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- x * 10\n`<-`(y, `*`(x, 10))\n```\n:::\n\n\n* A characteristic of the language is that infix functions can always be written as prefix functions; therefore, all function calls can be represented using an AST.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(y <- x * 10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`<-` \n#> ├─y \n#> └─█─`*` \n#> ├─x \n#> └─10\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(`<-`(y, `*`(x, 10)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`<-` \n#> ├─y \n#> └─█─`*` \n#> ├─x \n#> └─10\n```\n\n\n:::\n:::\n\n\n* There is no difference between the ASTs for the infix version vs the prefix version, and if you generate an expression with prefix calls, R will still print it in infix form:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrlang::expr(`<-`(y, `*`(x, 10)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> y <- x * 10\n```\n\n\n:::\n:::\n\n\n## Expression \n\n* Collectively, the data structures present in the AST are called expressions.\n* These include:\n 1. Constants\n 2. Symbols\n 3. Calls \n 4. Pairlists\n\n### Constants\n\n* Scalar constants are the simplest component of the AST. \n* A constant is either **NULL** or a **length-1** atomic vector (or scalar) \n * e.g., `TRUE`, `1L`, `2.5`, `\"x\"`, or `\"hello\"`. \n* We can test for a constant with `rlang::is_syntactic_literal()`.\n* Constants are self-quoting in the sense that the expression used to represent a constant is the same constant:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nidentical(expr(TRUE), TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nidentical(expr(1), 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nidentical(expr(2L), 2L)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nidentical(expr(\"x\"), \"x\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nidentical(expr(\"hello\"), \"hello\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n### Symbols\n\n* A symbol represents the name of an object.\n * `x`\n * `mtcars`\n * `mean`\n* In base R, the terms symbol and name are used interchangeably (i.e., `is.name()` is identical to `is.symbol()`), but this book used symbol consistently because **\"name\"** has many other meanings.\n* You can create a symbol in two ways: \n 1. by capturing code that references an object with `expr()`.\n 2. turning a string into a symbol with `rlang::sym()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsym(\"x\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x\n```\n\n\n:::\n:::\n\n\n* A symbol can be turned back into a string with `as.character()` or `rlang::as_string()`. \n* `as_string()` has the advantage of clearly signalling that you’ll get a character vector of length 1.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nas_string(expr(x))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"x\"\n```\n\n\n:::\n:::\n\n\n* We can recognize a symbol because it is printed without quotes\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x\n```\n\n\n:::\n:::\n\n\n* `str()` tells you that it is a symbol, and `is.symbol()` is TRUE:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(expr(x))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> symbol x\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.symbol(expr(x))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n* The symbol type is not vectorised, i.e., a symbol is always length 1. \n* If you want multiple symbols, you’ll need to put them in a list, using `rlang::syms()`.\n\nNote that `as_string()` will not work on expressions which are not symbols.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nas_string(expr(x+y))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `as_string()`:\n#> ! Can't convert a call to a string.\n```\n\n\n:::\n:::\n\n\n\n### Calls\n\n* A call object represents a captured function call. \n* Call objects are a special type of list. \n * The first component specifies the function to call (usually a symbol, i.e., the name fo the function). \n * The remaining elements are the arguments for that call. \n* Call objects create branches in the AST, because calls can be nested inside other calls.\n* You can identify a call object when printed because it looks just like a function call. \n* Confusingly `typeof()` and `str()` print language for call objects (where we might expect it to return that it is a \"call\" object), but `is.call()` returns TRUE:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(read.table(\"important.csv\", row.names = FALSE))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─read.table \n#> ├─\"important.csv\" \n#> └─row.names = FALSE\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- expr(read.table(\"important.csv\", row.names = FALSE))\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"language\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.call(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n### Subsetting\n\n* Calls generally behave like lists.\n* Since they are list-like, you can use standard subsetting tools. \n* The first element of the call object is the function to call, which is usually a symbol:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[[1]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> read.table\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.symbol(x[[1]])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n* The remainder of the elements are the arguments:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.symbol(x[-1])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n\n```{.r .cell-code}\nas.list(x[-1])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] \"important.csv\"\n#> \n#> $row.names\n#> [1] FALSE\n```\n\n\n:::\n:::\n\n* We can extract individual arguments with [[ or, if named, $:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[[2]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"important.csv\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx$row.names\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n:::\n\n\n* We can determine the number of arguments in a call object by subtracting 1 from its length:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlength(x) - 1\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n:::\n\n\n* Extracting specific arguments from calls is challenging because of R’s flexible rules for argument matching:\n * It could potentially be in any location, with the full name, with an abbreviated name, or with no name. \n\n* To work around this problem, you can use `rlang::call_standardise()` which standardizes all arguments to use the full name:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrlang::call_standardise(x)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: `call_standardise()` is deprecated as of rlang 0.4.11\n#> This warning is displayed once every 8 hours.\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> read.table(file = \"important.csv\", row.names = FALSE)\n```\n\n\n:::\n:::\n\n\n* But If the function uses ... it’s not possible to standardise all arguments.\n* Calls can be modified in the same way as lists:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx$header <- TRUE\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> read.table(\"important.csv\", row.names = FALSE, header = TRUE)\n```\n\n\n:::\n:::\n\n\n### Function position\n\n* The first element of the call object is the function position. This contains the function that will be called when the object is evaluated, and is usually a symbol.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(foo())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─foo\n```\n\n\n:::\n:::\n\n\n* While R allows you to surround the name of the function with quotes, the parser converts it to a symbol:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(\"foo\"())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─foo\n```\n\n\n:::\n:::\n\n\n* However, sometimes the function doesn’t exist in the current environment and you need to do some computation to retrieve it: \n * For example, if the function is in another package, is a method of an R6 object, or is created by a function factory. In this case, the function position will be occupied by another call:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(pkg::foo(1))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─█─`::` \n#> │ ├─pkg \n#> │ └─foo \n#> └─1\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(obj$foo(1))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─█─`$` \n#> │ ├─obj \n#> │ └─foo \n#> └─1\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(foo(1)(2))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─█─foo \n#> │ └─1 \n#> └─2\n```\n\n\n:::\n:::\n\n\n\n\n### Constructing\n\n* You can construct a call object from its components using `rlang::call2()`. \n* The first argument is the name of the function to call (either as a string, a symbol, or another call).\n* The remaining arguments will be passed along to the call:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncall2(\"mean\", x = expr(x), na.rm = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mean(x = x, na.rm = TRUE)\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncall2(expr(base::mean), x = expr(x), na.rm = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> base::mean(x = x, na.rm = TRUE)\n```\n\n\n:::\n:::\n\n\n* Infix calls created in this way still print as usual.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncall2(\"<-\", expr(x), 10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x <- 10\n```\n\n\n:::\n:::\n\n\n## Parsing and grammar\n\n* **Parsing** - The process by which a computer language takes a string and constructs an expression. Parsing is governed by a set of rules known as a grammar. \n* We are going to use `lobstr::ast()` to explore some of the details of R’s grammar, and then show how you can transform back and forth between expressions and strings.\n* **Operator precedence** - Conventions used by the programming language to resolve ambiguity.\n* Infix functions introduce two sources of ambiguity.\n* The first source of ambiguity arises from infix functions: what does 1 + 2 * 3 yield? Do you get 9 (i.e., (1 + 2) * 3), or 7 (i.e., 1 + (2 * 3))? In other words, which of the two possible parse trees below does R use?\n\n\n\n* Programming languages use conventions called operator precedence to resolve this ambiguity. We can use `ast()` to see what R does:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(1 + 2 * 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`+` \n#> ├─1 \n#> └─█─`*` \n#> ├─2 \n#> └─3\n```\n\n\n:::\n:::\n\n\n* PEMDAS (or BEDMAS or BODMAS, depending on where in the world you grew up) is pretty clear on what to do. Other operator precedence isn't as clear. \n* There’s one particularly surprising case in R: \n * ! has a much lower precedence (i.e., it binds less tightly) than you might expect. \n * This allows you to write useful operations like:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(!x %in% y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`!` \n#> └─█─`%in%` \n#> ├─x \n#> └─y\n```\n\n\n:::\n:::\n\n* **R has over 30 infix operators divided into 18 precedence** groups. \n* While the details are described in `?Syntax`, very few people have memorized the complete ordering.\n* If there’s any confusion, use parentheses!\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# override PEMDAS\nlobstr::ast((1 + 2) * 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`*` \n#> ├─█─`(` \n#> │ └─█─`+` \n#> │ ├─1 \n#> │ └─2 \n#> └─3\n```\n\n\n:::\n:::\n\n\n### Associativity\n\n* The second source of ambiguity is introduced by repeated usage of the same infix function. \n\n\n::: {.cell}\n\n```{.r .cell-code}\n1 + 2 + 3\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 6\n```\n\n\n:::\n\n```{.r .cell-code}\n# What does R do first?\n(1 + 2) + 3\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 6\n```\n\n\n:::\n\n```{.r .cell-code}\n# or\n1 + (2 + 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 6\n```\n\n\n:::\n:::\n\n\n* In this case it doesn't matter. Other places it might, like in `ggplot2`. \n\n* In R, most operators are left-associative, i.e., the operations on the left are evaluated first:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(1 + 2 + 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`+` \n#> ├─█─`+` \n#> │ ├─1 \n#> │ └─2 \n#> └─3\n```\n\n\n:::\n:::\n\n\n* There are two exceptions to the left-associative rule:\n 1. exponentiation\n 2. assignment\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(2 ^ 2 ^ 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`^` \n#> ├─2 \n#> └─█─`^` \n#> ├─2 \n#> └─3\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(x <- y <- z)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`<-` \n#> ├─x \n#> └─█─`<-` \n#> ├─y \n#> └─z\n```\n\n\n:::\n:::\n\n\n### Parsing and deparsing\n\n* Parsing - turning characters you've typed into an AST (i.e., from strings to expressions).\n* R usually takes care of parsing code for us. \n* But occasionally you have code stored as a string, and you want to parse it yourself. \n* You can do so using `rlang::parse_expr()`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx1 <- \"y <- x + 10\"\nx1\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"y <- x + 10\"\n```\n\n\n:::\n\n```{.r .cell-code}\nis.call(x1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx2 <- rlang::parse_expr(x1)\nx2\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> y <- x + 10\n```\n\n\n:::\n\n```{.r .cell-code}\nis.call(x2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n* `parse_expr()` always returns a single expression.\n* If you have multiple expression separated by `;` or `,`, you’ll need to use `rlang::parse_exprs()` which is the plural version of `rlang::parse_expr()`. It returns a list of expressions:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx3 <- \"a <- 1; a + 1\"\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrlang::parse_exprs(x3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> a <- 1\n#> \n#> [[2]]\n#> a + 1\n```\n\n\n:::\n:::\n\n\n* If you find yourself parsing strings into expressions often, **quasiquotation** may be a safer approach.\n * More about quasiquaotation in Chapter 19.\n* The inverse of parsing is deparsing.\n* **Deparsing** - given an expression, you want the string that would generate it. \n* Deparsing happens automatically when you print an expression.\n* You can get the string with `rlang::expr_text()`:\n* Parsing and deparsing are not symmetric.\n * Parsing creates the AST which means that we lose backticks around ordinary names, comments, and whitespace.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncat(expr_text(expr({\n # This is a comment\n x <- `x` + 1\n})))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> {\n#> x <- x + 1\n#> }\n```\n\n\n:::\n:::\n\n\n## Using the AST to solve more complicated problems\n\n* Here we focus on what we learned to perform recursion on the AST.\n* Two parts of a recursive function:\n * Recursive case: handles the nodes in the tree. Typically, you’ll do something to each child of a node, usually calling the recursive function again, and then combine the results back together again. For expressions, you’ll need to handle calls and pairlists (function arguments).\n * Base case: handles the leaves of the tree. The base cases ensure that the function eventually terminates, by solving the simplest cases directly. For expressions, you need to handle symbols and constants in the base case.\n\n\n### Two helper functions\n\n* First, we need an `epxr_type()` function to return the type of expression element as a string.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr_type <- function(x) {\n if (rlang::is_syntactic_literal(x)) {\n \"constant\"\n } else if (is.symbol(x)) {\n \"symbol\"\n } else if (is.call(x)) {\n \"call\"\n } else if (is.pairlist(x)) {\n \"pairlist\"\n } else {\n typeof(x)\n }\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr_type(expr(\"a\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"constant\"\n```\n\n\n:::\n\n```{.r .cell-code}\nexpr_type(expr(x))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"symbol\"\n```\n\n\n:::\n\n```{.r .cell-code}\nexpr_type(expr(f(1, 2)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"call\"\n```\n\n\n:::\n:::\n\n\n* Second, we need a wrapper function to handle exceptions.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nswitch_expr <- function(x, ...) {\n switch(expr_type(x),\n ...,\n stop(\"Don't know how to handle type \", typeof(x), call. = FALSE)\n )\n}\n```\n:::\n\n\n* Lastly, we can write a basic template that walks the AST using the `switch()` statement.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrecurse_call <- function(x) {\n switch_expr(x,\n # Base cases\n symbol = ,\n constant = ,\n\n # Recursive cases\n call = ,\n pairlist =\n )\n}\n```\n:::\n\n\n### Specific use cases for `recurse_call()`\n\n### Example 1: Finding F and T\n\n* Using `F` and `T` in our code rather than `FALSE` and `TRUE` is bad practice.\n* Say we want to walk the AST to find times when we use `F` and `T`.\n* Start off by finding the type of `T` vs `TRUE`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr_type(expr(TRUE))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"constant\"\n```\n\n\n:::\n\n```{.r .cell-code}\nexpr_type(expr(T))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"symbol\"\n```\n\n\n:::\n:::\n\n\n* With this knowledge, we can now write the base cases of our recursive function.\n* The logic is as follows:\n * A constant is never a logical abbreviation and a symbol is an abbreviation if it is \"F\" or \"T\":\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlogical_abbr_rec <- function(x) {\n switch_expr(x,\n constant = FALSE,\n symbol = as_string(x) %in% c(\"F\", \"T\")\n )\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlogical_abbr_rec(expr(TRUE))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n\n```{.r .cell-code}\nlogical_abbr_rec(expr(T))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n* It's best practice to write another wrapper, assuming every input you receive will be an expression.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlogical_abbr <- function(x) {\n logical_abbr_rec(enexpr(x))\n}\n\nlogical_abbr(T)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nlogical_abbr(FALSE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n:::\n\n\n#### Next step: code for the recursive cases\n\n* Here we want to do the same thing for calls and for pairlists.\n* Here's the logic: recursively apply the function to each subcomponent, and return `TRUE` if any subcomponent contains a logical abbreviation.\n* This is simplified by using the `purrr::some()` function, which iterates over a list and returns `TRUE` if the predicate function is true for any element.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlogical_abbr_rec <- function(x) {\n switch_expr(x,\n # Base cases\n constant = FALSE,\n symbol = as_string(x) %in% c(\"F\", \"T\"),\n # Recursive cases\n call = ,\n # Are we sure this is the correct function to use?\n # Why not logical_abbr_rec?\n pairlist = purrr::some(x, logical_abbr_rec)\n )\n}\n\nlogical_abbr(mean(x, na.rm = T))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nlogical_abbr(function(x, na.rm = T) FALSE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n### Example 2: Finding all variables created by assignment\n\n* Listing all the variables is a little more complicated. \n* Figure out what assignment looks like based on the AST.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nast(x <- 10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`<-` \n#> ├─x \n#> └─10\n```\n\n\n:::\n:::\n\n\n* Now we need to decide what data structure we're going to use for the results.\n * Easiest thing will be to return a character vector.\n * We would need to use a list if we wanted to return symbols.\n\n### Dealing with the base cases\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfind_assign_rec <- function(x) {\n switch_expr(x,\n constant = ,\n symbol = character()\n )\n}\nfind_assign <- function(x) find_assign_rec(enexpr(x))\n\nfind_assign(\"x\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> character(0)\n```\n\n\n:::\n\n```{.r .cell-code}\nfind_assign(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> character(0)\n```\n\n\n:::\n:::\n\n\n### Dealing with the recursive cases\n\n* Here is the function to flatten pairlists.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nflat_map_chr <- function(.x, .f, ...) {\n purrr::flatten_chr(purrr::map(.x, .f, ...))\n}\n\nflat_map_chr(letters[1:3], ~ rep(., sample(3, 1)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"a\" \"b\" \"b\" \"c\" \"c\"\n```\n\n\n:::\n:::\n\n\n* Here is the code needed to identify calls.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfind_assign_rec <- function(x) {\n switch_expr(x,\n # Base cases\n constant = ,\n symbol = character(),\n\n # Recursive cases\n pairlist = flat_map_chr(as.list(x), find_assign_rec),\n call = {\n if (is_call(x, \"<-\")) {\n as_string(x[[2]])\n } else {\n flat_map_chr(as.list(x), find_assign_rec)\n }\n }\n )\n}\n\nfind_assign(a <- 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"a\"\n```\n\n\n:::\n\n```{.r .cell-code}\nfind_assign({\n a <- 1\n {\n b <- 2\n }\n})\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"a\" \"b\"\n```\n\n\n:::\n:::\n\n\n### Make the function more robust\n\n* Throw cases at it that we think might break the function. \n* Write a function to handle these cases.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfind_assign_call <- function(x) {\n if (is_call(x, \"<-\") && is_symbol(x[[2]])) {\n lhs <- as_string(x[[2]])\n children <- as.list(x)[-1]\n } else {\n lhs <- character()\n children <- as.list(x)\n }\n\n c(lhs, flat_map_chr(children, find_assign_rec))\n}\n\nfind_assign_rec <- function(x) {\n switch_expr(x,\n # Base cases\n constant = ,\n symbol = character(),\n\n # Recursive cases\n pairlist = flat_map_chr(x, find_assign_rec),\n call = find_assign_call(x)\n )\n}\n\nfind_assign(a <- b <- c <- 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"a\" \"b\" \"c\"\n```\n\n\n:::\n\n```{.r .cell-code}\nfind_assign(system.time(x <- print(y <- 5)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"x\" \"y\"\n```\n\n\n:::\n:::\n\n\n* This approach certainly is more complicated, but it's important to start simple and move up.\n\n## Specialised data structures\n\n* Pairlists\n* Missing arguments \n* Expression vectors\n\n### Pairlists\n\n* Pairlists are a remnant of R’s past and have been replaced by lists almost everywhere. \n* The only place you are likely to see pairlists in R is when working with calls to the function, as the formal arguments to a function are stored in a pairlist:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- expr(function(x, y = 10) x + y)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nargs <- f[[2]]\nargs\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $x\n#> \n#> \n#> $y\n#> [1] 10\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(args)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"pairlist\"\n```\n\n\n:::\n:::\n\n* Fortunately, whenever you encounter a pairlist, you can treat it just like a regular list:\n\n\n::: {.cell}\n\n```{.r .cell-code}\npl <- pairlist(x = 1, y = 2)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlength(pl)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\npl$x\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n:::\n\n\n### Missing arguments\n\n* Empty symbols\n* To create an empty symbol, you need to use `missing_arg()` or `expr()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmissing_arg()\n```\n\n```{.r .cell-code}\ntypeof(missing_arg())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"symbol\"\n```\n\n\n:::\n:::\n\n\n* Empty symbols don't print anything.\n * To check, we need to use `rlang::is_missing()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis_missing(missing_arg())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n* These are usually present in function formals:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- expr(function(x, y = 10) x + y)\n\nargs <- f[[2]]\n\n\nis_missing(args[[1]])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n### Expression vectors\n\n* An expression vector is just a list of expressions.\n * The only difference is that calling `eval()` on an expression evaluates each individual expression. \n * Instead, it might be more advantageous to use a list of expressions.\n\n* Expression vectors are only produced by two base functions: \n `expression()` and `parse()`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexp1 <- parse(text = c(\" \nx <- 4\nx\n\"))\nexp1\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> expression(x <- 4, x)\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexp2 <- expression(x <- 4, x)\nexp2\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> expression(x <- 4, x)\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(exp1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"expression\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(exp2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"expression\"\n```\n\n\n:::\n:::\n\n\n\n- Like calls and pairlists, expression vectors behave like lists:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlength(exp1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n\n```{.r .cell-code}\nexp1[[1]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x <- 4\n```\n\n\n:::\n:::\n\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/2NixH3QAerQ\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/mYOUgzoRcjI\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/5RLCRFli6QI\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<iframe src=\"https://www.youtube.com/embed/F8df5PMNC8Y\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/tSVBlAP5DIY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/Jc_R4yFsYeE\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/K8w28ee3CR8\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/XPs-TI4BYjk\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<iframe src=\"https://www.youtube.com/embed/8LPw_VTBsmQ\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary>Meeting chat log</summary>\n```\n00:50:48\tStone:\thttps://www.r-bloggers.com/2018/10/quasiquotation-in-r-via-bquote/\n00:58:26\tiPhone:\tSee ya next week!\n```\n</details>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: Expressions\n---\n\n## Learning objectives:\n\n* Understand the idea of the abstract syntax tree (AST). \n* Discuss the data structures that underlie the AST:\n * Constants\n * Symbols\n * Calls\n* Explore the idea behind parsing.\n* Explore some details of R's grammar.\n* Discuss the use or recursive functions to compute on the language.\n* Work with three other more specialized data structures:\n * Pairlists\n * Missing arguments\n * Expression vectors\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(rlang)\nlibrary(lobstr)\n```\n:::\n\n\n## Introduction\n\n> To compute on the language, we first need to understand its structure.\n\n* This requires a few things:\n * New vocabulary.\n * New tools to inspect and modify expressions.\n * Approach the use of the language with new ways of thinking.\n* One of the first new ways of thinking is the distinction between an operation and its result.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- x * 10\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: object 'x' not found\n```\n\n\n:::\n:::\n\n\n* We can capture the intent of the code without executing it using the rlang package.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nz <- rlang::expr(y <- x * 10)\n\nz\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> y <- x * 10\n```\n\n\n:::\n:::\n\n\n* We can then evaluate the expression using the **base::eval** function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 4\n\nbase::eval(expr(y <- x * 10))\n\ny\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 40\n```\n\n\n:::\n:::\n\n\n### Evaluating multiple expressions \n\n* The function `expression()` allows for multiple expressions, and in some ways it acts similarly to the way files are `source()`d in. That is, we `eval()`uate all of the expressions at once.\n\n* `expression()` returns a vector and can be passed to `eval()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nz <- expression(x <- 4, x * 10)\n\neval(z)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 40\n```\n\n\n:::\n\n```{.r .cell-code}\nis.atomic(z)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n\n```{.r .cell-code}\nis.vector(z)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n* `exprs()` does not evaluate everything at once. To evaluate each expression, the individual expressions must be evaluated in a loop.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfor (i in exprs(x <- 4, x * 10)) {\nprint(i)\nprint(eval(i))\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x <- 4\n#> [1] 4\n#> x * 10\n#> [1] 40\n```\n\n\n:::\n:::\n\n\n## Abstract Syntax Tree (AST)\n\n* Expressions are objects that capture the structure of code without evaluating it.\n* Expressions are also called abstract syntax trees (ASTs) because the structure of code is hierarchical and can be naturally represented as a tree. \n* Understanding this tree structure is crucial for inspecting and modifying expressions.\n * Branches = Calls\n * Leaves = Symbols and constants\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf(x, \"y\", 1)\n```\n:::\n\n\n\n\n### With `lobstr::ast():`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(f(x, \"y\", 1))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─f \n#> ├─x \n#> ├─\"y\" \n#> └─1\n```\n\n\n:::\n:::\n\n\n* Some functions might also contain more calls like the example below:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf(g(1, 2), h(3, 4, i())):\n```\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(f(g(1, 2), h(3, 4, i())))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─f \n#> ├─█─g \n#> │ ├─1 \n#> │ └─2 \n#> └─█─h \n#> ├─3 \n#> ├─4 \n#> └─█─i\n```\n\n\n:::\n:::\n\n* Read the **hand-drawn diagrams** from left-to-right (ignoring vertical position)\n* Read the **lobstr-drawn diagrams** from top-to-bottom (ignoring horizontal position).\n* The depth within the tree is determined by the nesting of function calls. \n* Depth also determines evaluation order, **as evaluation generally proceeds from deepest-to-shallowest, but this is not guaranteed because of lazy evaluation**.\n\n### Infix calls\n\n> Every call in R can be written in tree form because any call can be written in prefix form.\n\nAn infix operator is a function where the function name is placed between its arguments. Prefix form is when then function name comes before the arguments, which are enclosed in parentheses. [Note that the name infix comes from the words prefix and suffix.]\n\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- x * 10\n`<-`(y, `*`(x, 10))\n```\n:::\n\n\n* A characteristic of the language is that infix functions can always be written as prefix functions; therefore, all function calls can be represented using an AST.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(y <- x * 10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`<-` \n#> ├─y \n#> └─█─`*` \n#> ├─x \n#> └─10\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(`<-`(y, `*`(x, 10)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`<-` \n#> ├─y \n#> └─█─`*` \n#> ├─x \n#> └─10\n```\n\n\n:::\n:::\n\n\n* There is no difference between the ASTs for the infix version vs the prefix version, and if you generate an expression with prefix calls, R will still print it in infix form:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrlang::expr(`<-`(y, `*`(x, 10)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> y <- x * 10\n```\n\n\n:::\n:::\n\n\n## Expression \n\n* Collectively, the data structures present in the AST are called expressions.\n* These include:\n 1. Constants\n 2. Symbols\n 3. Calls \n 4. Pairlists\n\n### Constants\n\n* Scalar constants are the simplest component of the AST. \n* A constant is either **NULL** or a **length-1** atomic vector (or scalar) \n * e.g., `TRUE`, `1L`, `2.5`, `\"x\"`, or `\"hello\"`. \n* We can test for a constant with `rlang::is_syntactic_literal()`.\n* Constants are self-quoting in the sense that the expression used to represent a constant is the same constant:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nidentical(expr(TRUE), TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nidentical(expr(1), 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nidentical(expr(2L), 2L)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nidentical(expr(\"x\"), \"x\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nidentical(expr(\"hello\"), \"hello\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n### Symbols\n\n* A symbol represents the name of an object.\n * `x`\n * `mtcars`\n * `mean`\n* In base R, the terms symbol and name are used interchangeably (i.e., `is.name()` is identical to `is.symbol()`), but this book used symbol consistently because **\"name\"** has many other meanings.\n* You can create a symbol in two ways: \n 1. by capturing code that references an object with `expr()`.\n 2. turning a string into a symbol with `rlang::sym()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsym(\"x\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x\n```\n\n\n:::\n:::\n\n\n* A symbol can be turned back into a string with `as.character()` or `rlang::as_string()`. \n* `as_string()` has the advantage of clearly signalling that you’ll get a character vector of length 1.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nas_string(expr(x))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"x\"\n```\n\n\n:::\n:::\n\n\n* We can recognize a symbol because it is printed without quotes\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x\n```\n\n\n:::\n:::\n\n\n* `str()` tells you that it is a symbol, and `is.symbol()` is TRUE:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(expr(x))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> symbol x\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.symbol(expr(x))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n* The symbol type is not vectorised, i.e., a symbol is always length 1. \n* If you want multiple symbols, you’ll need to put them in a list, using `rlang::syms()`.\n\nNote that `as_string()` will not work on expressions which are not symbols.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nas_string(expr(x+y))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `as_string()`:\n#> ! Can't convert a call to a string.\n```\n\n\n:::\n:::\n\n\n\n### Calls\n\n* A call object represents a captured function call. \n* Call objects are a special type of list. \n * The first component specifies the function to call (usually a symbol, i.e., the name fo the function). \n * The remaining elements are the arguments for that call. \n* Call objects create branches in the AST, because calls can be nested inside other calls.\n* You can identify a call object when printed because it looks just like a function call. \n* Confusingly `typeof()` and `str()` print language for call objects (where we might expect it to return that it is a \"call\" object), but `is.call()` returns TRUE:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(read.table(\"important.csv\", row.names = FALSE))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─read.table \n#> ├─\"important.csv\" \n#> └─row.names = FALSE\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- expr(read.table(\"important.csv\", row.names = FALSE))\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"language\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.call(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n### Subsetting\n\n* Calls generally behave like lists.\n* Since they are list-like, you can use standard subsetting tools. \n* The first element of the call object is the function to call, which is usually a symbol:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[[1]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> read.table\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.symbol(x[[1]])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n* The remainder of the elements are the arguments:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.symbol(x[-1])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n\n```{.r .cell-code}\nas.list(x[-1])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] \"important.csv\"\n#> \n#> $row.names\n#> [1] FALSE\n```\n\n\n:::\n:::\n\n* We can extract individual arguments with [[ or, if named, $:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[[2]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"important.csv\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx$row.names\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n:::\n\n\n* We can determine the number of arguments in a call object by subtracting 1 from its length:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlength(x) - 1\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n:::\n\n\n* Extracting specific arguments from calls is challenging because of R’s flexible rules for argument matching:\n * It could potentially be in any location, with the full name, with an abbreviated name, or with no name. \n\n* To work around this problem, you can use `rlang::call_standardise()` which standardizes all arguments to use the full name:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrlang::call_standardise(x)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: `call_standardise()` is deprecated as of rlang 0.4.11\n#> This warning is displayed once every 8 hours.\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> read.table(file = \"important.csv\", row.names = FALSE)\n```\n\n\n:::\n:::\n\n\n* But If the function uses ... it’s not possible to standardise all arguments.\n* Calls can be modified in the same way as lists:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx$header <- TRUE\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> read.table(\"important.csv\", row.names = FALSE, header = TRUE)\n```\n\n\n:::\n:::\n\n\n### Function position\n\n* The first element of the call object is the function position. This contains the function that will be called when the object is evaluated, and is usually a symbol.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(foo())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─foo\n```\n\n\n:::\n:::\n\n\n* While R allows you to surround the name of the function with quotes, the parser converts it to a symbol:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(\"foo\"())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─foo\n```\n\n\n:::\n:::\n\n\n* However, sometimes the function doesn’t exist in the current environment and you need to do some computation to retrieve it: \n * For example, if the function is in another package, is a method of an R6 object, or is created by a function factory. In this case, the function position will be occupied by another call:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(pkg::foo(1))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─█─`::` \n#> │ ├─pkg \n#> │ └─foo \n#> └─1\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(obj$foo(1))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─█─`$` \n#> │ ├─obj \n#> │ └─foo \n#> └─1\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(foo(1)(2))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─█─foo \n#> │ └─1 \n#> └─2\n```\n\n\n:::\n:::\n\n\n\n\n### Constructing\n\n* You can construct a call object from its components using `rlang::call2()`. \n* The first argument is the name of the function to call (either as a string, a symbol, or another call).\n* The remaining arguments will be passed along to the call:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncall2(\"mean\", x = expr(x), na.rm = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> mean(x = x, na.rm = TRUE)\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncall2(expr(base::mean), x = expr(x), na.rm = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> base::mean(x = x, na.rm = TRUE)\n```\n\n\n:::\n:::\n\n\n* Infix calls created in this way still print as usual.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncall2(\"<-\", expr(x), 10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x <- 10\n```\n\n\n:::\n:::\n\n\n## Parsing and grammar\n\n* **Parsing** - The process by which a computer language takes a string and constructs an expression. Parsing is governed by a set of rules known as a grammar. \n* We are going to use `lobstr::ast()` to explore some of the details of R’s grammar, and then show how you can transform back and forth between expressions and strings.\n* **Operator precedence** - Conventions used by the programming language to resolve ambiguity.\n* Infix functions introduce two sources of ambiguity.\n* The first source of ambiguity arises from infix functions: what does 1 + 2 * 3 yield? Do you get 9 (i.e., (1 + 2) * 3), or 7 (i.e., 1 + (2 * 3))? In other words, which of the two possible parse trees below does R use?\n\n\n\n* Programming languages use conventions called operator precedence to resolve this ambiguity. We can use `ast()` to see what R does:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(1 + 2 * 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`+` \n#> ├─1 \n#> └─█─`*` \n#> ├─2 \n#> └─3\n```\n\n\n:::\n:::\n\n\n* PEMDAS (or BEDMAS or BODMAS, depending on where in the world you grew up) is pretty clear on what to do. Other operator precedence isn't as clear. \n* There’s one particularly surprising case in R: \n * ! has a much lower precedence (i.e., it binds less tightly) than you might expect. \n * This allows you to write useful operations like:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(!x %in% y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`!` \n#> └─█─`%in%` \n#> ├─x \n#> └─y\n```\n\n\n:::\n:::\n\n* **R has over 30 infix operators divided into 18 precedence** groups. \n* While the details are described in `?Syntax`, very few people have memorized the complete ordering.\n* If there’s any confusion, use parentheses!\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# override PEMDAS\nlobstr::ast((1 + 2) * 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`*` \n#> ├─█─`(` \n#> │ └─█─`+` \n#> │ ├─1 \n#> │ └─2 \n#> └─3\n```\n\n\n:::\n:::\n\n\n### Associativity\n\n* The second source of ambiguity is introduced by repeated usage of the same infix function. \n\n\n::: {.cell}\n\n```{.r .cell-code}\n1 + 2 + 3\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 6\n```\n\n\n:::\n\n```{.r .cell-code}\n# What does R do first?\n(1 + 2) + 3\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 6\n```\n\n\n:::\n\n```{.r .cell-code}\n# or\n1 + (2 + 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 6\n```\n\n\n:::\n:::\n\n\n* In this case it doesn't matter. Other places it might, like in `ggplot2`. \n\n* In R, most operators are left-associative, i.e., the operations on the left are evaluated first:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(1 + 2 + 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`+` \n#> ├─█─`+` \n#> │ ├─1 \n#> │ └─2 \n#> └─3\n```\n\n\n:::\n:::\n\n\n* There are two exceptions to the left-associative rule:\n 1. exponentiation\n 2. assignment\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(2 ^ 2 ^ 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`^` \n#> ├─2 \n#> └─█─`^` \n#> ├─2 \n#> └─3\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(x <- y <- z)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`<-` \n#> ├─x \n#> └─█─`<-` \n#> ├─y \n#> └─z\n```\n\n\n:::\n:::\n\n\n### Parsing and deparsing\n\n* Parsing - turning characters you've typed into an AST (i.e., from strings to expressions).\n* R usually takes care of parsing code for us. \n* But occasionally you have code stored as a string, and you want to parse it yourself. \n* You can do so using `rlang::parse_expr()`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx1 <- \"y <- x + 10\"\nx1\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"y <- x + 10\"\n```\n\n\n:::\n\n```{.r .cell-code}\nis.call(x1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx2 <- rlang::parse_expr(x1)\nx2\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> y <- x + 10\n```\n\n\n:::\n\n```{.r .cell-code}\nis.call(x2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n* `parse_expr()` always returns a single expression.\n* If you have multiple expression separated by `;` or `,`, you’ll need to use `rlang::parse_exprs()` which is the plural version of `rlang::parse_expr()`. It returns a list of expressions:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx3 <- \"a <- 1; a + 1\"\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrlang::parse_exprs(x3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> a <- 1\n#> \n#> [[2]]\n#> a + 1\n```\n\n\n:::\n:::\n\n\n* If you find yourself parsing strings into expressions often, **quasiquotation** may be a safer approach.\n * More about quasiquaotation in Chapter 19.\n* The inverse of parsing is deparsing.\n* **Deparsing** - given an expression, you want the string that would generate it. \n* Deparsing happens automatically when you print an expression.\n* You can get the string with `rlang::expr_text()`:\n* Parsing and deparsing are not symmetric.\n * Parsing creates the AST which means that we lose backticks around ordinary names, comments, and whitespace.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncat(expr_text(expr({\n # This is a comment\n x <- `x` + 1\n})))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> {\n#> x <- x + 1\n#> }\n```\n\n\n:::\n:::\n\n\n## Using the AST to solve more complicated problems\n\n* Here we focus on what we learned to perform recursion on the AST.\n* Two parts of a recursive function:\n * Recursive case: handles the nodes in the tree. Typically, you’ll do something to each child of a node, usually calling the recursive function again, and then combine the results back together again. For expressions, you’ll need to handle calls and pairlists (function arguments).\n * Base case: handles the leaves of the tree. The base cases ensure that the function eventually terminates, by solving the simplest cases directly. For expressions, you need to handle symbols and constants in the base case.\n\n\n### Two helper functions\n\n* First, we need an `epxr_type()` function to return the type of expression element as a string.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr_type <- function(x) {\n if (rlang::is_syntactic_literal(x)) {\n \"constant\"\n } else if (is.symbol(x)) {\n \"symbol\"\n } else if (is.call(x)) {\n \"call\"\n } else if (is.pairlist(x)) {\n \"pairlist\"\n } else {\n typeof(x)\n }\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr_type(expr(\"a\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"constant\"\n```\n\n\n:::\n\n```{.r .cell-code}\nexpr_type(expr(x))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"symbol\"\n```\n\n\n:::\n\n```{.r .cell-code}\nexpr_type(expr(f(1, 2)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"call\"\n```\n\n\n:::\n:::\n\n\n* Second, we need a wrapper function to handle exceptions.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nswitch_expr <- function(x, ...) {\n switch(expr_type(x),\n ...,\n stop(\"Don't know how to handle type \", typeof(x), call. = FALSE)\n )\n}\n```\n:::\n\n\n* Lastly, we can write a basic template that walks the AST using the `switch()` statement.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrecurse_call <- function(x) {\n switch_expr(x,\n # Base cases\n symbol = ,\n constant = ,\n\n # Recursive cases\n call = ,\n pairlist =\n )\n}\n```\n:::\n\n\n### Specific use cases for `recurse_call()`\n\n### Example 1: Finding F and T\n\n* Using `F` and `T` in our code rather than `FALSE` and `TRUE` is bad practice.\n* Say we want to walk the AST to find times when we use `F` and `T`.\n* Start off by finding the type of `T` vs `TRUE`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr_type(expr(TRUE))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"constant\"\n```\n\n\n:::\n\n```{.r .cell-code}\nexpr_type(expr(T))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"symbol\"\n```\n\n\n:::\n:::\n\n\n* With this knowledge, we can now write the base cases of our recursive function.\n* The logic is as follows:\n * A constant is never a logical abbreviation and a symbol is an abbreviation if it is \"F\" or \"T\":\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlogical_abbr_rec <- function(x) {\n switch_expr(x,\n constant = FALSE,\n symbol = as_string(x) %in% c(\"F\", \"T\")\n )\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlogical_abbr_rec(expr(TRUE))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n\n```{.r .cell-code}\nlogical_abbr_rec(expr(T))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n* It's best practice to write another wrapper, assuming every input you receive will be an expression.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlogical_abbr <- function(x) {\n logical_abbr_rec(enexpr(x))\n}\n\nlogical_abbr(T)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nlogical_abbr(FALSE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n:::\n\n\n#### Next step: code for the recursive cases\n\n* Here we want to do the same thing for calls and for pairlists.\n* Here's the logic: recursively apply the function to each subcomponent, and return `TRUE` if any subcomponent contains a logical abbreviation.\n* This is simplified by using the `purrr::some()` function, which iterates over a list and returns `TRUE` if the predicate function is true for any element.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlogical_abbr_rec <- function(x) {\n switch_expr(x,\n # Base cases\n constant = FALSE,\n symbol = as_string(x) %in% c(\"F\", \"T\"),\n # Recursive cases\n call = ,\n # Are we sure this is the correct function to use?\n # Why not logical_abbr_rec?\n pairlist = purrr::some(x, logical_abbr_rec)\n )\n}\n\nlogical_abbr(mean(x, na.rm = T))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nlogical_abbr(function(x, na.rm = T) FALSE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n### Example 2: Finding all variables created by assignment\n\n* Listing all the variables is a little more complicated. \n* Figure out what assignment looks like based on the AST.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nast(x <- 10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─`<-` \n#> ├─x \n#> └─10\n```\n\n\n:::\n:::\n\n\n* Now we need to decide what data structure we're going to use for the results.\n * Easiest thing will be to return a character vector.\n * We would need to use a list if we wanted to return symbols.\n\n### Dealing with the base cases\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfind_assign_rec <- function(x) {\n switch_expr(x,\n constant = ,\n symbol = character()\n )\n}\nfind_assign <- function(x) find_assign_rec(enexpr(x))\n\nfind_assign(\"x\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> character(0)\n```\n\n\n:::\n\n```{.r .cell-code}\nfind_assign(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> character(0)\n```\n\n\n:::\n:::\n\n\n### Dealing with the recursive cases\n\n* Here is the function to flatten pairlists.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nflat_map_chr <- function(.x, .f, ...) {\n purrr::flatten_chr(purrr::map(.x, .f, ...))\n}\n\nflat_map_chr(letters[1:3], ~ rep(., sample(3, 1)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"a\" \"a\" \"b\" \"b\" \"c\" \"c\"\n```\n\n\n:::\n:::\n\n\n* Here is the code needed to identify calls.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfind_assign_rec <- function(x) {\n switch_expr(x,\n # Base cases\n constant = ,\n symbol = character(),\n\n # Recursive cases\n pairlist = flat_map_chr(as.list(x), find_assign_rec),\n call = {\n if (is_call(x, \"<-\")) {\n as_string(x[[2]])\n } else {\n flat_map_chr(as.list(x), find_assign_rec)\n }\n }\n )\n}\n\nfind_assign(a <- 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"a\"\n```\n\n\n:::\n\n```{.r .cell-code}\nfind_assign({\n a <- 1\n {\n b <- 2\n }\n})\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"a\" \"b\"\n```\n\n\n:::\n:::\n\n\n### Make the function more robust\n\n* Throw cases at it that we think might break the function. \n* Write a function to handle these cases.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfind_assign_call <- function(x) {\n if (is_call(x, \"<-\") && is_symbol(x[[2]])) {\n lhs <- as_string(x[[2]])\n children <- as.list(x)[-1]\n } else {\n lhs <- character()\n children <- as.list(x)\n }\n\n c(lhs, flat_map_chr(children, find_assign_rec))\n}\n\nfind_assign_rec <- function(x) {\n switch_expr(x,\n # Base cases\n constant = ,\n symbol = character(),\n\n # Recursive cases\n pairlist = flat_map_chr(x, find_assign_rec),\n call = find_assign_call(x)\n )\n}\n\nfind_assign(a <- b <- c <- 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"a\" \"b\" \"c\"\n```\n\n\n:::\n\n```{.r .cell-code}\nfind_assign(system.time(x <- print(y <- 5)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"x\" \"y\"\n```\n\n\n:::\n:::\n\n\n* This approach certainly is more complicated, but it's important to start simple and move up.\n\n## Specialised data structures\n\n* Pairlists\n* Missing arguments \n* Expression vectors\n\n### Pairlists\n\n* Pairlists are a remnant of R’s past and have been replaced by lists almost everywhere. \n* The only place you are likely to see pairlists in R is when working with calls to the function, as the formal arguments to a function are stored in a pairlist:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- expr(function(x, y = 10) x + y)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nargs <- f[[2]]\nargs\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $x\n#> \n#> \n#> $y\n#> [1] 10\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(args)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"pairlist\"\n```\n\n\n:::\n:::\n\n* Fortunately, whenever you encounter a pairlist, you can treat it just like a regular list:\n\n\n::: {.cell}\n\n```{.r .cell-code}\npl <- pairlist(x = 1, y = 2)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlength(pl)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\npl$x\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n:::\n\n\n### Missing arguments\n\n* Empty symbols\n* To create an empty symbol, you need to use `missing_arg()` or `expr()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmissing_arg()\n```\n\n```{.r .cell-code}\ntypeof(missing_arg())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"symbol\"\n```\n\n\n:::\n:::\n\n\n* Empty symbols don't print anything.\n * To check, we need to use `rlang::is_missing()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis_missing(missing_arg())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n* These are usually present in function formals:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- expr(function(x, y = 10) x + y)\n\nargs <- f[[2]]\n\n\nis_missing(args[[1]])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n### Expression vectors\n\n* An expression vector is just a list of expressions.\n * The only difference is that calling `eval()` on an expression evaluates each individual expression. \n * Instead, it might be more advantageous to use a list of expressions.\n\n* Expression vectors are only produced by two base functions: \n `expression()` and `parse()`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexp1 <- parse(text = c(\" \nx <- 4\nx\n\"))\nexp1\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> expression(x <- 4, x)\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexp2 <- expression(x <- 4, x)\nexp2\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> expression(x <- 4, x)\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(exp1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"expression\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(exp2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"expression\"\n```\n\n\n:::\n:::\n\n\n\n- Like calls and pairlists, expression vectors behave like lists:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlength(exp1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n\n```{.r .cell-code}\nexp1[[1]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x <- 4\n```\n\n\n:::\n:::\n\n", + "supporting": [ + "18_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/19/execute-results/html.json b/_freeze/slides/19/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "7b7bf3e55f7ac2baaa0ee5e2007f377b", + "hash": "29629fbcf0e368d59b034407d4b9669c", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Quasiquotation\n---\n\n## Learning objectives:\n\n- What quasiquotation means\n- Why it's important\n- Learn some practical uses\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(rlang)\nlibrary(purrr)\n```\n:::\n\n\n## Introduction\n\nThree pillars of *tidy* evaluation\n\n 1. Quasiquotation\n 2. Quosures (chapter 20)\n 3. Data masks (Chapter 20)\n\n**Quasiquotation = quotation + unquotation**\n\n- **Quote.** Capture unevaluated expression... (\"defuse\") \n- **Unquote.** Evaluate selections of quoted expression! (\"inject\")\n- Functions that use these features are said to use Non-standard evaluation (NSE)\n- Note: related to Lisp macros, and also exists in other languages with Lisp heritage, e.g. Julia\n\n> On it's own, Quasiquotation good for programming, but combined with other tools, \n> important for data analysis.\n\n## Motivation\n\nSimple *concrete* example:\n\n`cement()` is a function that works like `paste()` but doesn't need need quotes\n\n(Think of automatically adding 'quotes' to the arguments)\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncement <- function(...) {\n args <- ensyms(...)\n paste(purrr::map(args, as_string), collapse = \" \")\n}\n\ncement(Good, morning, Hadley)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Good morning Hadley\"\n```\n\n\n:::\n:::\n\n\nWhat if we wanted to use variables? What is an object and what should be quoted?\n\nThis is where 'unquoting' comes in!\n\n\n::: {.cell}\n\n```{.r .cell-code}\nname <- \"Bob\"\ncement(Good, afternoon, !!name) # Bang-bang!\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Good afternoon Bob\"\n```\n\n\n:::\n:::\n\n\n## Vocabulary {-}\n\nCan think of `cement()` and `paste()` as being 'mirror-images' of each other.\n\n- `paste()` - define what to quote - **Evaluates** arguments\n- `cement()` - define what to unquote - **Quotes** arguments\n\n**Quoting function** similar to, but more precise than, **Non-standard evaluation (NSE)**\n\n- Tidyverse functions - e.g., `dplyr::mutate()`, `tidyr::pivot_longer()`\n- Base functions - e.g., `library()`, `subset()`, `with()`\n\n**Quoting function** arguments cannot be evaluated outside of function:\n\n::: {.cell}\n\n```{.r .cell-code}\ncement(Good, afternoon, Cohort) # No problem\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Good afternoon Cohort\"\n```\n\n\n:::\n\n```{.r .cell-code}\nGood # Error!\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: object 'Good' not found\n```\n\n\n:::\n:::\n\n\n**Non-quoting (standard) function** arguments can be evaluated:\n\n::: {.cell}\n\n```{.r .cell-code}\npaste(\"Good\", \"afternoon\", \"Cohort\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Good afternoon Cohort\"\n```\n\n\n:::\n\n```{.r .cell-code}\n\"Good\"\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Good\"\n```\n\n\n:::\n:::\n\n\n\n## Quoting\n\n**Capture expressions without evaluating them**\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in body[[col]][rows][!is.na(result)] <- omit_na(result): number of\n#> items to replace is not a multiple of replacement length\n```\n\n\n:::\n\n::: {.cell-output-display}\n\n```{=html}\n<div id=\"mswiutofea\" style=\"padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;\">\n<style>#mswiutofea table {\n font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';\n -webkit-font-smoothing: antialiased;\n -moz-osx-font-smoothing: grayscale;\n}\n\n#mswiutofea thead, #mswiutofea tbody, #mswiutofea tfoot, #mswiutofea tr, #mswiutofea td, #mswiutofea th {\n border-style: none;\n}\n\n#mswiutofea p {\n margin: 0;\n padding: 0;\n}\n\n#mswiutofea .gt_table {\n display: table;\n border-collapse: collapse;\n line-height: normal;\n margin-left: auto;\n margin-right: auto;\n color: #333333;\n font-size: 16px;\n font-weight: normal;\n font-style: normal;\n background-color: #FFFFFF;\n width: auto;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #A8A8A8;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #A8A8A8;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n}\n\n#mswiutofea .gt_caption {\n padding-top: 4px;\n padding-bottom: 4px;\n}\n\n#mswiutofea .gt_title {\n color: #333333;\n font-size: 125%;\n font-weight: initial;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-color: #FFFFFF;\n border-bottom-width: 0;\n}\n\n#mswiutofea .gt_subtitle {\n color: #333333;\n font-size: 85%;\n font-weight: initial;\n padding-top: 3px;\n padding-bottom: 5px;\n padding-left: 5px;\n padding-right: 5px;\n border-top-color: #FFFFFF;\n border-top-width: 0;\n}\n\n#mswiutofea .gt_heading {\n background-color: #FFFFFF;\n text-align: center;\n border-bottom-color: #FFFFFF;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n}\n\n#mswiutofea .gt_bottom_border {\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#mswiutofea .gt_col_headings {\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n}\n\n#mswiutofea .gt_col_heading {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: normal;\n text-transform: inherit;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: bottom;\n padding-top: 5px;\n padding-bottom: 6px;\n padding-left: 5px;\n padding-right: 5px;\n overflow-x: hidden;\n}\n\n#mswiutofea .gt_column_spanner_outer {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: normal;\n text-transform: inherit;\n padding-top: 0;\n padding-bottom: 0;\n padding-left: 4px;\n padding-right: 4px;\n}\n\n#mswiutofea .gt_column_spanner_outer:first-child {\n padding-left: 0;\n}\n\n#mswiutofea .gt_column_spanner_outer:last-child {\n padding-right: 0;\n}\n\n#mswiutofea .gt_column_spanner {\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n vertical-align: bottom;\n padding-top: 5px;\n padding-bottom: 5px;\n overflow-x: hidden;\n display: inline-block;\n width: 100%;\n}\n\n#mswiutofea .gt_spanner_row {\n border-bottom-style: hidden;\n}\n\n#mswiutofea .gt_group_heading {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: bold;\n text-transform: inherit;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: middle;\n text-align: left;\n}\n\n#mswiutofea .gt_empty_group_heading {\n padding: 0.5px;\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: bold;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n vertical-align: middle;\n}\n\n#mswiutofea .gt_from_md > :first-child {\n margin-top: 0;\n}\n\n#mswiutofea .gt_from_md > :last-child {\n margin-bottom: 0;\n}\n\n#mswiutofea .gt_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n margin: 10px;\n border-top-style: solid;\n border-top-width: 1px;\n border-top-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: middle;\n overflow-x: hidden;\n}\n\n#mswiutofea .gt_stub {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n text-transform: inherit;\n border-right-style: solid;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#mswiutofea .gt_stub_row_group {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n text-transform: inherit;\n border-right-style: solid;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n padding-left: 5px;\n padding-right: 5px;\n vertical-align: top;\n}\n\n#mswiutofea .gt_row_group_first td {\n border-top-width: 2px;\n}\n\n#mswiutofea .gt_row_group_first th {\n border-top-width: 2px;\n}\n\n#mswiutofea .gt_summary_row {\n color: #333333;\n background-color: #FFFFFF;\n text-transform: inherit;\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#mswiutofea .gt_first_summary_row {\n border-top-style: solid;\n border-top-color: #D3D3D3;\n}\n\n#mswiutofea .gt_first_summary_row.thick {\n border-top-width: 2px;\n}\n\n#mswiutofea .gt_last_summary_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#mswiutofea .gt_grand_summary_row {\n color: #333333;\n background-color: #FFFFFF;\n text-transform: inherit;\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#mswiutofea .gt_first_grand_summary_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-top-style: double;\n border-top-width: 6px;\n border-top-color: #D3D3D3;\n}\n\n#mswiutofea .gt_last_grand_summary_row_top {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-style: double;\n border-bottom-width: 6px;\n border-bottom-color: #D3D3D3;\n}\n\n#mswiutofea .gt_striped {\n background-color: rgba(128, 128, 128, 0.05);\n}\n\n#mswiutofea .gt_table_body {\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#mswiutofea .gt_footnotes {\n color: #333333;\n background-color: #FFFFFF;\n border-bottom-style: none;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n}\n\n#mswiutofea .gt_footnote {\n margin: 0px;\n font-size: 90%;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#mswiutofea .gt_sourcenotes {\n color: #333333;\n background-color: #FFFFFF;\n border-bottom-style: none;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n}\n\n#mswiutofea .gt_sourcenote {\n font-size: 90%;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#mswiutofea .gt_left {\n text-align: left;\n}\n\n#mswiutofea .gt_center {\n text-align: center;\n}\n\n#mswiutofea .gt_right {\n text-align: right;\n font-variant-numeric: tabular-nums;\n}\n\n#mswiutofea .gt_font_normal {\n font-weight: normal;\n}\n\n#mswiutofea .gt_font_bold {\n font-weight: bold;\n}\n\n#mswiutofea .gt_font_italic {\n font-style: italic;\n}\n\n#mswiutofea .gt_super {\n font-size: 65%;\n}\n\n#mswiutofea .gt_footnote_marks {\n font-size: 75%;\n vertical-align: 0.4em;\n position: initial;\n}\n\n#mswiutofea .gt_asterisk {\n font-size: 100%;\n vertical-align: 0;\n}\n\n#mswiutofea .gt_indent_1 {\n text-indent: 5px;\n}\n\n#mswiutofea .gt_indent_2 {\n text-indent: 10px;\n}\n\n#mswiutofea .gt_indent_3 {\n text-indent: 15px;\n}\n\n#mswiutofea .gt_indent_4 {\n text-indent: 20px;\n}\n\n#mswiutofea .gt_indent_5 {\n text-indent: 25px;\n}\n\n#mswiutofea .katex-display {\n display: inline-flex !important;\n margin-bottom: 0.75em !important;\n}\n\n#mswiutofea div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {\n height: 0px !important;\n}\n</style>\n<table class=\"gt_table\" style=\"table-layout:fixed;\" data-quarto-disable-processing=\"false\" data-quarto-bootstrap=\"false\">\n <colgroup>\n <col style=\"width:100px;\"/>\n <col/>\n <col/>\n </colgroup>\n <thead>\n <tr class=\"gt_col_headings\">\n <th class=\"gt_col_heading gt_columns_bottom_border gt_left\" rowspan=\"1\" colspan=\"1\" style=\"text-align: center; font-weight: bold;\" scope=\"col\" id=\"t\"></th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" style=\"text-align: center; font-weight: bold;\" scope=\"col\" id=\"Developer\">Developer</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" style=\"text-align: center; font-weight: bold;\" scope=\"col\" id=\"User\">User</th>\n </tr>\n </thead>\n <tbody class=\"gt_table_body\">\n <tr class=\"gt_group_heading_row\">\n <th colspan=\"3\" class=\"gt_group_heading\" scope=\"colgroup\" id=\"Expression (Quasiquotation)\">Expression (Quasiquotation)</th>\n </tr>\n <tr class=\"gt_row_group_first\"><td headers=\"Expression (Quasiquotation) t\" class=\"gt_row gt_left\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"T25l\"><span class='gt_from_md'>One</span></span></td>\n<td headers=\"Expression (Quasiquotation) Developer\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGV4cHIoKWA=\"><span class='gt_from_md'><code>expr()</code></span></span></td>\n<td headers=\"Expression (Quasiquotation) User\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGVuZXhwcigpYA==\"><span class='gt_from_md'><code>enexpr()</code></span></span></td></tr>\n <tr><td headers=\"Expression (Quasiquotation) t\" class=\"gt_row gt_left\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #000000;\"><span data-qmd-base64=\"TWFueQ==\"><span class='gt_from_md'>Many</span></span></td>\n<td headers=\"Expression (Quasiquotation) Developer\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGV4cHJzKClg\"><span class='gt_from_md'><code>exprs()</code></span></span></td>\n<td headers=\"Expression (Quasiquotation) User\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGVuZXhwcnMoKWA=\"><span class='gt_from_md'><code>enexprs()</code></span></span></td></tr>\n <tr class=\"gt_group_heading_row\">\n <th colspan=\"3\" class=\"gt_group_heading\" scope=\"colgroup\" id=\"Symbol (Quasiquotation)\">Symbol (Quasiquotation)</th>\n </tr>\n <tr class=\"gt_row_group_first\"><td headers=\"Symbol (Quasiquotation) t\" class=\"gt_row gt_left\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"T25l\"><span class='gt_from_md'>One</span></span></td>\n<td headers=\"Symbol (Quasiquotation) Developer\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGV4cHIoKWA=\"><span class='gt_from_md'><code>expr()</code></span></span></td>\n<td headers=\"Symbol (Quasiquotation) User\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGVuc3ltKClg\"><span class='gt_from_md'><code>ensym()</code></span></span></td></tr>\n <tr><td headers=\"Symbol (Quasiquotation) t\" class=\"gt_row gt_left\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"TWFueQ==\"><span class='gt_from_md'>Many</span></span></td>\n<td headers=\"Symbol (Quasiquotation) Developer\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGV4cHJzKClg\"><span class='gt_from_md'><code>exprs()</code></span></span></td>\n<td headers=\"Symbol (Quasiquotation) User\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGVuc3ltcygpYA==\"><span class='gt_from_md'><code>ensyms()</code></span></span></td></tr>\n <tr class=\"gt_group_heading_row\">\n <th colspan=\"3\" class=\"gt_group_heading\" scope=\"colgroup\" id=\"R Base (Quotation)\">R Base (Quotation)</th>\n </tr>\n <tr class=\"gt_row_group_first\"><td headers=\"R Base (Quotation) t\" class=\"gt_row gt_left\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"T25l\"><span class='gt_from_md'>One</span></span></td>\n<td headers=\"R Base (Quotation) Developer\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"YHF1b3RlKClg\"><span class='gt_from_md'><code>quote()</code></span></span></td>\n<td headers=\"R Base (Quotation) User\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGFsaXN0KClg\"><span class='gt_from_md'><code>alist()</code></span></span></td></tr>\n <tr><td headers=\"R Base (Quotation) t\" class=\"gt_row gt_left\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #000000;\"><span data-qmd-base64=\"TWFueQ==\"><span class='gt_from_md'>Many</span></span></td>\n<td headers=\"R Base (Quotation) Developer\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #000000;\"><span data-qmd-base64=\"YHN1YnN0aXR1dGUoKWA=\"><span class='gt_from_md'><code>substitute()</code></span></span></td>\n<td headers=\"R Base (Quotation) User\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGFzLmxpc3Qoc3Vic3RpdHV0ZSguLi4oKSkpYA==\"><span class='gt_from_md'><code>as.list(substitute(...()))</code></span></span></td></tr>\n </tbody>\n \n \n</table>\n</div>\n```\n\n:::\n:::\n\n\n- Non-base functions are from **rlang**\n- **Developer** - From you, direct, fixed, interactive\n- **User** - From the user, indirect, varying, programmatic\n\nAlso: \n\n- `bquote()` provides a limited form of quasiquotation\n- `~`, the formula, is a quoting function (see [Section 20.3.4](https://adv-r.hadley.nz/evaluation.html#quosure-impl))\n\n### `expr()` and `exprs()` {-}\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr(x + y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x + y\n```\n\n\n:::\n\n```{.r .cell-code}\nexprs(exp1 = x + y, exp2 = x * y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $exp1\n#> x + y\n#> \n#> $exp2\n#> x * y\n```\n\n\n:::\n:::\n\n\n### `enexpr()`^[`enexpr()` = **en**rich `expr()`] and `enexprs()` {-}\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- function(x) enexpr(x)\nf(a + b + c)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a + b + c\n```\n\n\n:::\n\n```{.r .cell-code}\nf2 <- function(x, y) enexprs(exp1 = x, exp2 = y)\nf2(x = a + b, y = c + d)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $exp1\n#> a + b\n#> \n#> $exp2\n#> c + d\n```\n\n\n:::\n:::\n\n\n### `ensym()` and `ensyms()` {-}\n\n- **[Remember](https://adv-r.hadley.nz/expressions.html#symbols):** Symbol represents the name of an object. Can only be length 1.\n- These are stricter than `enexpr/s()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- function(x) ensym(x)\nf(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a\n```\n\n\n:::\n\n```{.r .cell-code}\nf2 <- function(x, y) ensyms(sym1 = x, sym2 = y)\nf2(x = a, y = \"b\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $sym1\n#> a\n#> \n#> $sym2\n#> b\n```\n\n\n:::\n:::\n\n\n\n## Unquoting\n\n**Selectively evaluate parts of an expression**\n\n- Merges ASTs with template\n- 1 argument `!!` (**unquote**, **bang-bang**)\n - Unquoting a *function call* evaluates and returns results\n - Unquoting a *function (name)* replaces the function (alternatively use `call2()`)\n- \\>1 arguments `!!!` (**unquote-splice**, **bang-bang-bang**, **triple bang**)\n- `!!` and `!!!` only work like this inside quoting function using rlang\n\n### Basic unquoting {-}\n\n**One argument**\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- expr(a + b)\ny <- expr(c / d)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr(f(x, y)) # No unquoting\n#> f(x, y)\nexpr(f(!!x, !!y)) # Unquoting\n#> f(a + b, c/d)\n```\n:::\n\n\n**Multiple arguments**\n\n::: {.cell}\n\n```{.r .cell-code}\nz <- exprs(a + b, c + d)\nw <- exprs(exp1 = a + b, exp2 = c + d)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr(f(z)) # No unquoting\n#> f(z)\nexpr(f(!!!z)) # Unquoting\n#> f(a + b, c + d)\nexpr(f(!!!w)) # Unquoting when named\n#> f(exp1 = a + b, exp2 = c + d)\n```\n:::\n\n\n\n### Special usages or cases {-}\n\nFor example, get the AST of an expression\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(x)\n#> x\nlobstr::ast(!!x)\n#> █─`+` \n#> ├─a \n#> └─b\n```\n:::\n\n\n\nUnquote *function call*\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr(f(!!mean(c(100, 200, 300)), y))\n#> f(200, y)\n```\n:::\n\n\nUnquote *function*\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- expr(sd)\nexpr((!!f)(x))\n#> sd(x)\nexpr((!!f)(!!x + !!y))\n#> sd(a + b + c/d)\n```\n:::\n\n\n## Non-quoting\n\nOnly `bquote()` provides a limited form of quasiquotation.\n\nThe rest of base selectively uses or does not use quoting (rather than unquoting). \n\nFour basic forms of quoting/non-quoting:\n\n1. **Pair of functions** - Quoting and non-quoting\n - e.g., `$` (quoting) and `[[` (non-quoting)\n2. **Pair of Arguments** - Quoting and non-quoting\n - e.g., `rm(...)` (quoting) and `rm(list = c(...))` (non-quoting)\n3. **Arg to control quoting**\n - e.g., `library(rlang)` (quoting) and `library(pkg, character.only = TRUE)` (where `pkg <- \"rlang\"`)\n4. **Quote if evaluation fails**\n - `help(var)` - Quote, show help for var\n - `help(var)` (where `var <- \"mean\"`) - No quote, show help for mean\n - `help(var)` (where `var <- 10`) - Quote fails, show help for var\n\n\n## ... (dot-dot-dot) [When using ... with quoting]\n\n- Sometimes need to supply an *arbitrary* list of expressions or arguments in a function (`...`)\n- But need a way to use these when we don't necessarily have the names\n- Remember `!!` and `!!!` only work with functions that use rlang\n- Can use `list2(...)` to turn `...` into \"tidy dots\" which *can* be unquoted and spliced\n- Require `list2()` if going to be passing or using `!!` or `!!!` in `...`\n- `list2()` is a wrapper around `dots_list()` with the most common defaults\n\n**No need for `list2()`**\n\n::: {.cell}\n\n```{.r .cell-code}\nd <- function(...) data.frame(list(...))\nd(x = c(1:3), y = c(2, 4, 6))\n#> x y\n#> 1 1 2\n#> 2 2 4\n#> 3 3 6\n```\n:::\n\n\n**Require `list2()`**\n\n::: {.cell}\n\n```{.r .cell-code}\nvars <- list(x = c(1:3), y = c(2, 4, 6))\nd(!!!vars)\n#> Error in !vars: invalid argument type\nd2 <- function(...) data.frame(list2(...))\nd2(!!!vars)\n#> x y\n#> 1 1 2\n#> 2 2 4\n#> 3 3 6\n# Same result but x and y evaluated later\nvars_expr <- exprs(x = c(1:3), y = c(2, 4, 6))\nd2(!!!vars_expr) \n#> x y\n#> 1 1 2\n#> 2 2 4\n#> 3 3 6\n```\n:::\n\n\nGetting argument names (symbols) from variables\n\n::: {.cell}\n\n```{.r .cell-code}\nnm <- \"z\"\nval <- letters[1:4]\nd2(x = 1:4, !!nm := val)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x z\n#> 1 1 a\n#> 2 2 b\n#> 3 3 c\n#> 4 4 d\n```\n\n\n:::\n:::\n\n\n## `exec()` [Making your own ...] {-}\n\nWhat if your function doesn't have tidy dots?\n\n\nCan't use `!!` or `:=` if doesn't support rlang or dynamic dots\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_mean <- function(x, arg_name, arg_val) {\n mean(x, !!arg_name := arg_val)\n}\n\nmy_mean(c(NA, 1:10), arg_name = \"na.rm\", arg_val = TRUE) \n#> Error in `my_mean()`:\n#> ! `:=` can only be used within dynamic dots.\n```\n:::\n\n\nLet's use the ... from `exec()`\n\n::: {.cell}\n\n```{.r .cell-code}\nexec(.fn, ..., .env = caller_env())\n```\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_mean <- function(x, arg_name, arg_val) {\n exec(\"mean\", x, !!arg_name := arg_val)\n}\n\nmy_mean(c(NA, 1:10), arg_name = \"na.rm\", arg_val = TRUE) \n#> [1] 5.5\n```\n:::\n\n\nNote that you do not unquote `arg_val`.\n \nAlso `exec` is useful for mapping over a list of functions:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(runif(10), NA)\nfuns <- c(\"mean\", \"median\", \"sd\")\npurrr::map_dbl(funs, exec, x, na.rm = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0.3631060 0.2929449 0.3164028\n```\n\n\n:::\n:::\n\n\n \n## Base R `do.call` {-}\n\n`do.call(what, args)`\n\n- `what` is a function to call\n- `args` is a list of arguments to pass to the function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnrow(mtcars)\n#> [1] 32\nmtcars3 <- do.call(\"rbind\", list(mtcars, mtcars, mtcars))\nnrow(mtcars3)\n#> [1] 96\n```\n:::\n\n \n\n### Exercise 19.5.5 #1 {-}\n\nOne way to implement `exec` is shown here: Describe how it works. What are the key ideas?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexec_ <- function(f, ..., .env = caller_env()){\n args <- list2(...)\n do.call(f, args, envir = .env)\n}\n```\n:::\n\n\n## Case Studies (side note)\n\nSometimes you want to run a bunch of models, without having to copy/paste each one.\n\nBUT, you also want the summary function to show the appropriate model call, \nnot one with hidden variables (e.g., `lm(y ~ x, data = data)`). \n\nWe can achieve this by building expressions and unquoting as needed:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(purrr)\n\nvars <- data.frame(x = c(\"hp\", \"hp\"),\n y = c(\"mpg\", \"cyl\"))\n\nx_sym <- syms(vars$x)\ny_sym <- syms(vars$y)\n\nformulae <- map2(x_sym, y_sym, \\(x, y) expr(!!y ~ !!x))\nformulae\n#> [[1]]\n#> mpg ~ hp\n#> \n#> [[2]]\n#> cyl ~ hp\nmodels <- map(formulae, \\(f) expr(lm(!!f, data = mtcars)))\nsummary(eval(models[[1]]))\n#> \n#> Call:\n#> lm(formula = mpg ~ hp, data = mtcars)\n#> \n#> Residuals:\n#> Min 1Q Median 3Q Max \n#> -5.7121 -2.1122 -0.8854 1.5819 8.2360 \n#> \n#> Coefficients:\n#> Estimate Std. Error t value Pr(>|t|) \n#> (Intercept) 30.09886 1.63392 18.421 < 2e-16 ***\n#> hp -0.06823 0.01012 -6.742 1.79e-07 ***\n#> ---\n#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n#> \n#> Residual standard error: 3.863 on 30 degrees of freedom\n#> Multiple R-squared: 0.6024,\tAdjusted R-squared: 0.5892 \n#> F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07\n```\n:::\n\n\nAs a function:\n\n::: {.cell}\n\n```{.r .cell-code}\nlm_df <- function(df, data) {\n x_sym <- map(df$x, as.symbol)\n y_sym <- map(df$y, as.symbol)\n data <- enexpr(data)\n \n formulae <- map2(x_sym, y_sym, \\(x, y) expr(!!y ~ !!x))\n models <- map(formulae, \\(f) expr(lm(!!f, !!data)))\n \n map(models, \\(m) summary(eval(m)))\n}\n\nvars <- data.frame(x = c(\"hp\", \"hp\"),\n y = c(\"mpg\", \"cyl\"))\nlm_df(vars, data = mtcars)\n#> [[1]]\n#> \n#> Call:\n#> lm(formula = mpg ~ hp, data = mtcars)\n#> \n#> Residuals:\n#> Min 1Q Median 3Q Max \n#> -5.7121 -2.1122 -0.8854 1.5819 8.2360 \n#> \n#> Coefficients:\n#> Estimate Std. Error t value Pr(>|t|) \n#> (Intercept) 30.09886 1.63392 18.421 < 2e-16 ***\n#> hp -0.06823 0.01012 -6.742 1.79e-07 ***\n#> ---\n#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n#> \n#> Residual standard error: 3.863 on 30 degrees of freedom\n#> Multiple R-squared: 0.6024,\tAdjusted R-squared: 0.5892 \n#> F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07\n#> \n#> \n#> [[2]]\n#> \n#> Call:\n#> lm(formula = cyl ~ hp, data = mtcars)\n#> \n#> Residuals:\n#> Min 1Q Median 3Q Max \n#> -2.27078 -0.74879 -0.06417 0.63512 1.74067 \n#> \n#> Coefficients:\n#> Estimate Std. Error t value Pr(>|t|) \n#> (Intercept) 3.006795 0.425485 7.067 7.41e-08 ***\n#> hp 0.021684 0.002635 8.229 3.48e-09 ***\n#> ---\n#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n#> \n#> Residual standard error: 1.006 on 30 degrees of freedom\n#> Multiple R-squared: 0.693,\tAdjusted R-squared: 0.6827 \n#> F-statistic: 67.71 on 1 and 30 DF, p-value: 3.478e-09\n```\n:::\n\n\n\n\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/tbByqsRRvdE\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/IXE21pR8EJ0\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/gxSpz6IePLg\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/aniKrZrr4aU\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/klcpEb5ZBSM\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/OBodjc80y-E\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n01:02:07\tTrevin:\tYeah, that was a great workshop\n01:02:18\tTrevin:\tGlad they posted the resources online\n01:06:39\tTrevin:\tThank you!\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/8LPw_VTBsmQ\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary>Meeting chat log</summary>\n```\n00:50:48\tStone:\thttps://www.r-bloggers.com/2018/10/quasiquotation-in-r-via-bquote/\n00:58:26\tiPhone:\tSee ya next week!\n```\n</details>\n\n<iframe src=\"https://www.youtube.com/embed/g77Jfl_xrXM\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary>Meeting chat log</summary>\n```\n00:55:22\tcollinberke:\thttps://rlang.r-lib.org/reference/embrace-operator.html?q=enquo#under-the-hood\n```\n</details>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: Quasiquotation\n---\n\n## Learning objectives:\n\n- What quasiquotation means\n- Why it's important\n- Learn some practical uses\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(rlang)\nlibrary(purrr)\n```\n:::\n\n\n## Introduction\n\nThree pillars of *tidy* evaluation\n\n 1. Quasiquotation\n 2. Quosures (chapter 20)\n 3. Data masks (Chapter 20)\n\n**Quasiquotation = quotation + unquotation**\n\n- **Quote.** Capture unevaluated expression... (\"defuse\") \n- **Unquote.** Evaluate selections of quoted expression! (\"inject\")\n- Functions that use these features are said to use Non-standard evaluation (NSE)\n- Note: related to Lisp macros, and also exists in other languages with Lisp heritage, e.g. Julia\n\n> On it's own, Quasiquotation good for programming, but combined with other tools, \n> important for data analysis.\n\n## Motivation\n\nSimple *concrete* example:\n\n`cement()` is a function that works like `paste()` but doesn't need need quotes\n\n(Think of automatically adding 'quotes' to the arguments)\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncement <- function(...) {\n args <- ensyms(...)\n paste(purrr::map(args, as_string), collapse = \" \")\n}\n\ncement(Good, morning, Hadley)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Good morning Hadley\"\n```\n\n\n:::\n:::\n\n\nWhat if we wanted to use variables? What is an object and what should be quoted?\n\nThis is where 'unquoting' comes in!\n\n\n::: {.cell}\n\n```{.r .cell-code}\nname <- \"Bob\"\ncement(Good, afternoon, !!name) # Bang-bang!\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Good afternoon Bob\"\n```\n\n\n:::\n:::\n\n\n## Vocabulary {-}\n\nCan think of `cement()` and `paste()` as being 'mirror-images' of each other.\n\n- `paste()` - define what to quote - **Evaluates** arguments\n- `cement()` - define what to unquote - **Quotes** arguments\n\n**Quoting function** similar to, but more precise than, **Non-standard evaluation (NSE)**\n\n- Tidyverse functions - e.g., `dplyr::mutate()`, `tidyr::pivot_longer()`\n- Base functions - e.g., `library()`, `subset()`, `with()`\n\n**Quoting function** arguments cannot be evaluated outside of function:\n\n::: {.cell}\n\n```{.r .cell-code}\ncement(Good, afternoon, Cohort) # No problem\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Good afternoon Cohort\"\n```\n\n\n:::\n\n```{.r .cell-code}\nGood # Error!\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: object 'Good' not found\n```\n\n\n:::\n:::\n\n\n**Non-quoting (standard) function** arguments can be evaluated:\n\n::: {.cell}\n\n```{.r .cell-code}\npaste(\"Good\", \"afternoon\", \"Cohort\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Good afternoon Cohort\"\n```\n\n\n:::\n\n```{.r .cell-code}\n\"Good\"\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Good\"\n```\n\n\n:::\n:::\n\n\n\n## Quoting\n\n**Capture expressions without evaluating them**\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in body[[col]][rows][!is.na(result)] <- omit_na(result): number of\n#> items to replace is not a multiple of replacement length\n```\n\n\n:::\n\n::: {.cell-output-display}\n\n```{=html}\n<div id=\"daqhfwlbgd\" style=\"padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;\">\n<style>#daqhfwlbgd table {\n font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';\n -webkit-font-smoothing: antialiased;\n -moz-osx-font-smoothing: grayscale;\n}\n\n#daqhfwlbgd thead, #daqhfwlbgd tbody, #daqhfwlbgd tfoot, #daqhfwlbgd tr, #daqhfwlbgd td, #daqhfwlbgd th {\n border-style: none;\n}\n\n#daqhfwlbgd p {\n margin: 0;\n padding: 0;\n}\n\n#daqhfwlbgd .gt_table {\n display: table;\n border-collapse: collapse;\n line-height: normal;\n margin-left: auto;\n margin-right: auto;\n color: #333333;\n font-size: 16px;\n font-weight: normal;\n font-style: normal;\n background-color: #FFFFFF;\n width: auto;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #A8A8A8;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #A8A8A8;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n}\n\n#daqhfwlbgd .gt_caption {\n padding-top: 4px;\n padding-bottom: 4px;\n}\n\n#daqhfwlbgd .gt_title {\n color: #333333;\n font-size: 125%;\n font-weight: initial;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-color: #FFFFFF;\n border-bottom-width: 0;\n}\n\n#daqhfwlbgd .gt_subtitle {\n color: #333333;\n font-size: 85%;\n font-weight: initial;\n padding-top: 3px;\n padding-bottom: 5px;\n padding-left: 5px;\n padding-right: 5px;\n border-top-color: #FFFFFF;\n border-top-width: 0;\n}\n\n#daqhfwlbgd .gt_heading {\n background-color: #FFFFFF;\n text-align: center;\n border-bottom-color: #FFFFFF;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n}\n\n#daqhfwlbgd .gt_bottom_border {\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#daqhfwlbgd .gt_col_headings {\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n}\n\n#daqhfwlbgd .gt_col_heading {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: normal;\n text-transform: inherit;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: bottom;\n padding-top: 5px;\n padding-bottom: 6px;\n padding-left: 5px;\n padding-right: 5px;\n overflow-x: hidden;\n}\n\n#daqhfwlbgd .gt_column_spanner_outer {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: normal;\n text-transform: inherit;\n padding-top: 0;\n padding-bottom: 0;\n padding-left: 4px;\n padding-right: 4px;\n}\n\n#daqhfwlbgd .gt_column_spanner_outer:first-child {\n padding-left: 0;\n}\n\n#daqhfwlbgd .gt_column_spanner_outer:last-child {\n padding-right: 0;\n}\n\n#daqhfwlbgd .gt_column_spanner {\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n vertical-align: bottom;\n padding-top: 5px;\n padding-bottom: 5px;\n overflow-x: hidden;\n display: inline-block;\n width: 100%;\n}\n\n#daqhfwlbgd .gt_spanner_row {\n border-bottom-style: hidden;\n}\n\n#daqhfwlbgd .gt_group_heading {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: bold;\n text-transform: inherit;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: middle;\n text-align: left;\n}\n\n#daqhfwlbgd .gt_empty_group_heading {\n padding: 0.5px;\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: bold;\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n vertical-align: middle;\n}\n\n#daqhfwlbgd .gt_from_md > :first-child {\n margin-top: 0;\n}\n\n#daqhfwlbgd .gt_from_md > :last-child {\n margin-bottom: 0;\n}\n\n#daqhfwlbgd .gt_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n margin: 10px;\n border-top-style: solid;\n border-top-width: 1px;\n border-top-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 1px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 1px;\n border-right-color: #D3D3D3;\n vertical-align: middle;\n overflow-x: hidden;\n}\n\n#daqhfwlbgd .gt_stub {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n text-transform: inherit;\n border-right-style: solid;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#daqhfwlbgd .gt_stub_row_group {\n color: #333333;\n background-color: #FFFFFF;\n font-size: 100%;\n font-weight: initial;\n text-transform: inherit;\n border-right-style: solid;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n padding-left: 5px;\n padding-right: 5px;\n vertical-align: top;\n}\n\n#daqhfwlbgd .gt_row_group_first td {\n border-top-width: 2px;\n}\n\n#daqhfwlbgd .gt_row_group_first th {\n border-top-width: 2px;\n}\n\n#daqhfwlbgd .gt_summary_row {\n color: #333333;\n background-color: #FFFFFF;\n text-transform: inherit;\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#daqhfwlbgd .gt_first_summary_row {\n border-top-style: solid;\n border-top-color: #D3D3D3;\n}\n\n#daqhfwlbgd .gt_first_summary_row.thick {\n border-top-width: 2px;\n}\n\n#daqhfwlbgd .gt_last_summary_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#daqhfwlbgd .gt_grand_summary_row {\n color: #333333;\n background-color: #FFFFFF;\n text-transform: inherit;\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#daqhfwlbgd .gt_first_grand_summary_row {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-top-style: double;\n border-top-width: 6px;\n border-top-color: #D3D3D3;\n}\n\n#daqhfwlbgd .gt_last_grand_summary_row_top {\n padding-top: 8px;\n padding-bottom: 8px;\n padding-left: 5px;\n padding-right: 5px;\n border-bottom-style: double;\n border-bottom-width: 6px;\n border-bottom-color: #D3D3D3;\n}\n\n#daqhfwlbgd .gt_striped {\n background-color: rgba(128, 128, 128, 0.05);\n}\n\n#daqhfwlbgd .gt_table_body {\n border-top-style: solid;\n border-top-width: 2px;\n border-top-color: #D3D3D3;\n border-bottom-style: solid;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n}\n\n#daqhfwlbgd .gt_footnotes {\n color: #333333;\n background-color: #FFFFFF;\n border-bottom-style: none;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n}\n\n#daqhfwlbgd .gt_footnote {\n margin: 0px;\n font-size: 90%;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#daqhfwlbgd .gt_sourcenotes {\n color: #333333;\n background-color: #FFFFFF;\n border-bottom-style: none;\n border-bottom-width: 2px;\n border-bottom-color: #D3D3D3;\n border-left-style: none;\n border-left-width: 2px;\n border-left-color: #D3D3D3;\n border-right-style: none;\n border-right-width: 2px;\n border-right-color: #D3D3D3;\n}\n\n#daqhfwlbgd .gt_sourcenote {\n font-size: 90%;\n padding-top: 4px;\n padding-bottom: 4px;\n padding-left: 5px;\n padding-right: 5px;\n}\n\n#daqhfwlbgd .gt_left {\n text-align: left;\n}\n\n#daqhfwlbgd .gt_center {\n text-align: center;\n}\n\n#daqhfwlbgd .gt_right {\n text-align: right;\n font-variant-numeric: tabular-nums;\n}\n\n#daqhfwlbgd .gt_font_normal {\n font-weight: normal;\n}\n\n#daqhfwlbgd .gt_font_bold {\n font-weight: bold;\n}\n\n#daqhfwlbgd .gt_font_italic {\n font-style: italic;\n}\n\n#daqhfwlbgd .gt_super {\n font-size: 65%;\n}\n\n#daqhfwlbgd .gt_footnote_marks {\n font-size: 75%;\n vertical-align: 0.4em;\n position: initial;\n}\n\n#daqhfwlbgd .gt_asterisk {\n font-size: 100%;\n vertical-align: 0;\n}\n\n#daqhfwlbgd .gt_indent_1 {\n text-indent: 5px;\n}\n\n#daqhfwlbgd .gt_indent_2 {\n text-indent: 10px;\n}\n\n#daqhfwlbgd .gt_indent_3 {\n text-indent: 15px;\n}\n\n#daqhfwlbgd .gt_indent_4 {\n text-indent: 20px;\n}\n\n#daqhfwlbgd .gt_indent_5 {\n text-indent: 25px;\n}\n\n#daqhfwlbgd .katex-display {\n display: inline-flex !important;\n margin-bottom: 0.75em !important;\n}\n\n#daqhfwlbgd div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {\n height: 0px !important;\n}\n</style>\n<table class=\"gt_table\" style=\"table-layout:fixed;\" data-quarto-disable-processing=\"false\" data-quarto-bootstrap=\"false\">\n <colgroup>\n <col style=\"width:100px;\"/>\n <col/>\n <col/>\n </colgroup>\n <thead>\n <tr class=\"gt_col_headings\">\n <th class=\"gt_col_heading gt_columns_bottom_border gt_left\" rowspan=\"1\" colspan=\"1\" style=\"text-align: center; font-weight: bold;\" scope=\"col\" id=\"t\"></th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" style=\"text-align: center; font-weight: bold;\" scope=\"col\" id=\"Developer\">Developer</th>\n <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" style=\"text-align: center; font-weight: bold;\" scope=\"col\" id=\"User\">User</th>\n </tr>\n </thead>\n <tbody class=\"gt_table_body\">\n <tr class=\"gt_group_heading_row\">\n <th colspan=\"3\" class=\"gt_group_heading\" scope=\"colgroup\" id=\"Expression (Quasiquotation)\">Expression (Quasiquotation)</th>\n </tr>\n <tr class=\"gt_row_group_first\"><td headers=\"Expression (Quasiquotation) t\" class=\"gt_row gt_left\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"T25l\"><span class='gt_from_md'>One</span></span></td>\n<td headers=\"Expression (Quasiquotation) Developer\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGV4cHIoKWA=\"><span class='gt_from_md'><code>expr()</code></span></span></td>\n<td headers=\"Expression (Quasiquotation) User\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGVuZXhwcigpYA==\"><span class='gt_from_md'><code>enexpr()</code></span></span></td></tr>\n <tr><td headers=\"Expression (Quasiquotation) t\" class=\"gt_row gt_left\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #000000;\"><span data-qmd-base64=\"TWFueQ==\"><span class='gt_from_md'>Many</span></span></td>\n<td headers=\"Expression (Quasiquotation) Developer\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGV4cHJzKClg\"><span class='gt_from_md'><code>exprs()</code></span></span></td>\n<td headers=\"Expression (Quasiquotation) User\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGVuZXhwcnMoKWA=\"><span class='gt_from_md'><code>enexprs()</code></span></span></td></tr>\n <tr class=\"gt_group_heading_row\">\n <th colspan=\"3\" class=\"gt_group_heading\" scope=\"colgroup\" id=\"Symbol (Quasiquotation)\">Symbol (Quasiquotation)</th>\n </tr>\n <tr class=\"gt_row_group_first\"><td headers=\"Symbol (Quasiquotation) t\" class=\"gt_row gt_left\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"T25l\"><span class='gt_from_md'>One</span></span></td>\n<td headers=\"Symbol (Quasiquotation) Developer\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGV4cHIoKWA=\"><span class='gt_from_md'><code>expr()</code></span></span></td>\n<td headers=\"Symbol (Quasiquotation) User\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGVuc3ltKClg\"><span class='gt_from_md'><code>ensym()</code></span></span></td></tr>\n <tr><td headers=\"Symbol (Quasiquotation) t\" class=\"gt_row gt_left\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"TWFueQ==\"><span class='gt_from_md'>Many</span></span></td>\n<td headers=\"Symbol (Quasiquotation) Developer\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGV4cHJzKClg\"><span class='gt_from_md'><code>exprs()</code></span></span></td>\n<td headers=\"Symbol (Quasiquotation) User\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGVuc3ltcygpYA==\"><span class='gt_from_md'><code>ensyms()</code></span></span></td></tr>\n <tr class=\"gt_group_heading_row\">\n <th colspan=\"3\" class=\"gt_group_heading\" scope=\"colgroup\" id=\"R Base (Quotation)\">R Base (Quotation)</th>\n </tr>\n <tr class=\"gt_row_group_first\"><td headers=\"R Base (Quotation) t\" class=\"gt_row gt_left\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"T25l\"><span class='gt_from_md'>One</span></span></td>\n<td headers=\"R Base (Quotation) Developer\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"YHF1b3RlKClg\"><span class='gt_from_md'><code>quote()</code></span></span></td>\n<td headers=\"R Base (Quotation) User\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: solid; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: hidden; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGFsaXN0KClg\"><span class='gt_from_md'><code>alist()</code></span></span></td></tr>\n <tr><td headers=\"R Base (Quotation) t\" class=\"gt_row gt_left\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #000000;\"><span data-qmd-base64=\"TWFueQ==\"><span class='gt_from_md'>Many</span></span></td>\n<td headers=\"R Base (Quotation) Developer\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #000000;\"><span data-qmd-base64=\"YHN1YnN0aXR1dGUoKWA=\"><span class='gt_from_md'><code>substitute()</code></span></span></td>\n<td headers=\"R Base (Quotation) User\" class=\"gt_row gt_center\" style=\"border-left-width: 1px; border-left-style: hidden; border-left-color: #000000; border-right-width: 1px; border-right-style: hidden; border-right-color: #000000; border-top-width: 1px; border-top-style: hidden; border-top-color: #000000; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: #000000;\"><span data-qmd-base64=\"YGFzLmxpc3Qoc3Vic3RpdHV0ZSguLi4oKSkpYA==\"><span class='gt_from_md'><code>as.list(substitute(...()))</code></span></span></td></tr>\n </tbody>\n \n \n</table>\n</div>\n```\n\n:::\n:::\n\n\n- Non-base functions are from **rlang**\n- **Developer** - From you, direct, fixed, interactive\n- **User** - From the user, indirect, varying, programmatic\n\nAlso: \n\n- `bquote()` provides a limited form of quasiquotation\n- `~`, the formula, is a quoting function (see [Section 20.3.4](https://adv-r.hadley.nz/evaluation.html#quosure-impl))\n\n### `expr()` and `exprs()` {-}\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr(x + y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x + y\n```\n\n\n:::\n\n```{.r .cell-code}\nexprs(exp1 = x + y, exp2 = x * y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $exp1\n#> x + y\n#> \n#> $exp2\n#> x * y\n```\n\n\n:::\n:::\n\n\n### `enexpr()`^[`enexpr()` = **en**rich `expr()`] and `enexprs()` {-}\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- function(x) enexpr(x)\nf(a + b + c)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a + b + c\n```\n\n\n:::\n\n```{.r .cell-code}\nf2 <- function(x, y) enexprs(exp1 = x, exp2 = y)\nf2(x = a + b, y = c + d)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $exp1\n#> a + b\n#> \n#> $exp2\n#> c + d\n```\n\n\n:::\n:::\n\n\n### `ensym()` and `ensyms()` {-}\n\n- **[Remember](https://adv-r.hadley.nz/expressions.html#symbols):** Symbol represents the name of an object. Can only be length 1.\n- These are stricter than `enexpr/s()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- function(x) ensym(x)\nf(a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a\n```\n\n\n:::\n\n```{.r .cell-code}\nf2 <- function(x, y) ensyms(sym1 = x, sym2 = y)\nf2(x = a, y = \"b\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $sym1\n#> a\n#> \n#> $sym2\n#> b\n```\n\n\n:::\n:::\n\n\n\n## Unquoting\n\n**Selectively evaluate parts of an expression**\n\n- Merges ASTs with template\n- 1 argument `!!` (**unquote**, **bang-bang**)\n - Unquoting a *function call* evaluates and returns results\n - Unquoting a *function (name)* replaces the function (alternatively use `call2()`)\n- \\>1 arguments `!!!` (**unquote-splice**, **bang-bang-bang**, **triple bang**)\n- `!!` and `!!!` only work like this inside quoting function using rlang\n\n### Basic unquoting {-}\n\n**One argument**\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- expr(a + b)\ny <- expr(c / d)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr(f(x, y)) # No unquoting\n#> f(x, y)\nexpr(f(!!x, !!y)) # Unquoting\n#> f(a + b, c/d)\n```\n:::\n\n\n**Multiple arguments**\n\n::: {.cell}\n\n```{.r .cell-code}\nz <- exprs(a + b, c + d)\nw <- exprs(exp1 = a + b, exp2 = c + d)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr(f(z)) # No unquoting\n#> f(z)\nexpr(f(!!!z)) # Unquoting\n#> f(a + b, c + d)\nexpr(f(!!!w)) # Unquoting when named\n#> f(exp1 = a + b, exp2 = c + d)\n```\n:::\n\n\n\n### Special usages or cases {-}\n\nFor example, get the AST of an expression\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(x)\n#> x\nlobstr::ast(!!x)\n#> █─`+` \n#> ├─a \n#> └─b\n```\n:::\n\n\n\nUnquote *function call*\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr(f(!!mean(c(100, 200, 300)), y))\n#> f(200, y)\n```\n:::\n\n\nUnquote *function*\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- expr(sd)\nexpr((!!f)(x))\n#> sd(x)\nexpr((!!f)(!!x + !!y))\n#> sd(a + b + c/d)\n```\n:::\n\n\n## Non-quoting\n\nOnly `bquote()` provides a limited form of quasiquotation.\n\nThe rest of base selectively uses or does not use quoting (rather than unquoting). \n\nFour basic forms of quoting/non-quoting:\n\n1. **Pair of functions** - Quoting and non-quoting\n - e.g., `$` (quoting) and `[[` (non-quoting)\n2. **Pair of Arguments** - Quoting and non-quoting\n - e.g., `rm(...)` (quoting) and `rm(list = c(...))` (non-quoting)\n3. **Arg to control quoting**\n - e.g., `library(rlang)` (quoting) and `library(pkg, character.only = TRUE)` (where `pkg <- \"rlang\"`)\n4. **Quote if evaluation fails**\n - `help(var)` - Quote, show help for var\n - `help(var)` (where `var <- \"mean\"`) - No quote, show help for mean\n - `help(var)` (where `var <- 10`) - Quote fails, show help for var\n\n\n## ... (dot-dot-dot) [When using ... with quoting]\n\n- Sometimes need to supply an *arbitrary* list of expressions or arguments in a function (`...`)\n- But need a way to use these when we don't necessarily have the names\n- Remember `!!` and `!!!` only work with functions that use rlang\n- Can use `list2(...)` to turn `...` into \"tidy dots\" which *can* be unquoted and spliced\n- Require `list2()` if going to be passing or using `!!` or `!!!` in `...`\n- `list2()` is a wrapper around `dots_list()` with the most common defaults\n\n**No need for `list2()`**\n\n::: {.cell}\n\n```{.r .cell-code}\nd <- function(...) data.frame(list(...))\nd(x = c(1:3), y = c(2, 4, 6))\n#> x y\n#> 1 1 2\n#> 2 2 4\n#> 3 3 6\n```\n:::\n\n\n**Require `list2()`**\n\n::: {.cell}\n\n```{.r .cell-code}\nvars <- list(x = c(1:3), y = c(2, 4, 6))\nd(!!!vars)\n#> Error in !vars: invalid argument type\nd2 <- function(...) data.frame(list2(...))\nd2(!!!vars)\n#> x y\n#> 1 1 2\n#> 2 2 4\n#> 3 3 6\n# Same result but x and y evaluated later\nvars_expr <- exprs(x = c(1:3), y = c(2, 4, 6))\nd2(!!!vars_expr) \n#> x y\n#> 1 1 2\n#> 2 2 4\n#> 3 3 6\n```\n:::\n\n\nGetting argument names (symbols) from variables\n\n::: {.cell}\n\n```{.r .cell-code}\nnm <- \"z\"\nval <- letters[1:4]\nd2(x = 1:4, !!nm := val)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x z\n#> 1 1 a\n#> 2 2 b\n#> 3 3 c\n#> 4 4 d\n```\n\n\n:::\n:::\n\n\n## `exec()` [Making your own ...] {-}\n\nWhat if your function doesn't have tidy dots?\n\n\nCan't use `!!` or `:=` if doesn't support rlang or dynamic dots\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_mean <- function(x, arg_name, arg_val) {\n mean(x, !!arg_name := arg_val)\n}\n\nmy_mean(c(NA, 1:10), arg_name = \"na.rm\", arg_val = TRUE) \n#> Error in `my_mean()`:\n#> ! `:=` can only be used within dynamic dots.\n```\n:::\n\n\nLet's use the ... from `exec()`\n\n::: {.cell}\n\n```{.r .cell-code}\nexec(.fn, ..., .env = caller_env())\n```\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_mean <- function(x, arg_name, arg_val) {\n exec(\"mean\", x, !!arg_name := arg_val)\n}\n\nmy_mean(c(NA, 1:10), arg_name = \"na.rm\", arg_val = TRUE) \n#> [1] 5.5\n```\n:::\n\n\nNote that you do not unquote `arg_val`.\n \nAlso `exec` is useful for mapping over a list of functions:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(runif(10), NA)\nfuns <- c(\"mean\", \"median\", \"sd\")\npurrr::map_dbl(funs, exec, x, na.rm = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0.4445205 0.4886247 0.3166360\n```\n\n\n:::\n:::\n\n\n \n## Base R `do.call` {-}\n\n`do.call(what, args)`\n\n- `what` is a function to call\n- `args` is a list of arguments to pass to the function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnrow(mtcars)\n#> [1] 32\nmtcars3 <- do.call(\"rbind\", list(mtcars, mtcars, mtcars))\nnrow(mtcars3)\n#> [1] 96\n```\n:::\n\n \n\n### Exercise 19.5.5 #1 {-}\n\nOne way to implement `exec` is shown here: Describe how it works. What are the key ideas?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexec_ <- function(f, ..., .env = caller_env()){\n args <- list2(...)\n do.call(f, args, envir = .env)\n}\n```\n:::\n\n\n## Case Studies (side note)\n\nSometimes you want to run a bunch of models, without having to copy/paste each one.\n\nBUT, you also want the summary function to show the appropriate model call, \nnot one with hidden variables (e.g., `lm(y ~ x, data = data)`). \n\nWe can achieve this by building expressions and unquoting as needed:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(purrr)\n\nvars <- data.frame(x = c(\"hp\", \"hp\"),\n y = c(\"mpg\", \"cyl\"))\n\nx_sym <- syms(vars$x)\ny_sym <- syms(vars$y)\n\nformulae <- map2(x_sym, y_sym, \\(x, y) expr(!!y ~ !!x))\nformulae\n#> [[1]]\n#> mpg ~ hp\n#> \n#> [[2]]\n#> cyl ~ hp\nmodels <- map(formulae, \\(f) expr(lm(!!f, data = mtcars)))\nsummary(eval(models[[1]]))\n#> \n#> Call:\n#> lm(formula = mpg ~ hp, data = mtcars)\n#> \n#> Residuals:\n#> Min 1Q Median 3Q Max \n#> -5.7121 -2.1122 -0.8854 1.5819 8.2360 \n#> \n#> Coefficients:\n#> Estimate Std. Error t value Pr(>|t|) \n#> (Intercept) 30.09886 1.63392 18.421 < 2e-16 ***\n#> hp -0.06823 0.01012 -6.742 1.79e-07 ***\n#> ---\n#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n#> \n#> Residual standard error: 3.863 on 30 degrees of freedom\n#> Multiple R-squared: 0.6024,\tAdjusted R-squared: 0.5892 \n#> F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07\n```\n:::\n\n\nAs a function:\n\n::: {.cell}\n\n```{.r .cell-code}\nlm_df <- function(df, data) {\n x_sym <- map(df$x, as.symbol)\n y_sym <- map(df$y, as.symbol)\n data <- enexpr(data)\n \n formulae <- map2(x_sym, y_sym, \\(x, y) expr(!!y ~ !!x))\n models <- map(formulae, \\(f) expr(lm(!!f, !!data)))\n \n map(models, \\(m) summary(eval(m)))\n}\n\nvars <- data.frame(x = c(\"hp\", \"hp\"),\n y = c(\"mpg\", \"cyl\"))\nlm_df(vars, data = mtcars)\n#> [[1]]\n#> \n#> Call:\n#> lm(formula = mpg ~ hp, data = mtcars)\n#> \n#> Residuals:\n#> Min 1Q Median 3Q Max \n#> -5.7121 -2.1122 -0.8854 1.5819 8.2360 \n#> \n#> Coefficients:\n#> Estimate Std. Error t value Pr(>|t|) \n#> (Intercept) 30.09886 1.63392 18.421 < 2e-16 ***\n#> hp -0.06823 0.01012 -6.742 1.79e-07 ***\n#> ---\n#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n#> \n#> Residual standard error: 3.863 on 30 degrees of freedom\n#> Multiple R-squared: 0.6024,\tAdjusted R-squared: 0.5892 \n#> F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07\n#> \n#> \n#> [[2]]\n#> \n#> Call:\n#> lm(formula = cyl ~ hp, data = mtcars)\n#> \n#> Residuals:\n#> Min 1Q Median 3Q Max \n#> -2.27078 -0.74879 -0.06417 0.63512 1.74067 \n#> \n#> Coefficients:\n#> Estimate Std. Error t value Pr(>|t|) \n#> (Intercept) 3.006795 0.425485 7.067 7.41e-08 ***\n#> hp 0.021684 0.002635 8.229 3.48e-09 ***\n#> ---\n#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n#> \n#> Residual standard error: 1.006 on 30 degrees of freedom\n#> Multiple R-squared: 0.693,\tAdjusted R-squared: 0.6827 \n#> F-statistic: 67.71 on 1 and 30 DF, p-value: 3.478e-09\n```\n:::\n\n", + "supporting": [ + "19_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/20/execute-results/html.json b/_freeze/slides/20/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "76ffa7bffe7d3780f5454ae5d6beca80", + "hash": "4b17f38460385558b331471436bda17a", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Evaluation\n---\n\n## Learning objectives:\n\n- Learn evaluation basics\n- Learn about **quosures** and **data mask**\n- Understand tidy evaluation\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(rlang)\nlibrary(purrr)\n```\n:::\n\n\n## A bit of a recap\n\n- Metaprogramming: To separate our description of the action from the action itself - Separate the code from its evaluation.\n- Quasiquotation: combine code written by the *function's author* with code written by the *function's user*.\n - Unquotation: it gives the *user* the ability to evaluate parts of a quoted argument.\n - Evaluation: it gives the *developer* the ability to evluated quoted expression in custom environments.\n\n**Tidy evaluation**: quasiquotation, quosures and data masks\n\n## Evaluation basics \n\nWe use `eval()` to evaluate, run, or execute expressions. It requires two arguments: \n\n- `expr`: the object to evaluate, either an expression or a symbol.\n- `env`: the environment in which to evaluate the expression or where to look for the values. \nDefaults to current env.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsumexpr <- expr(x + y)\nx <- 10\ny <- 40\neval(sumexpr)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 50\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\neval(sumexpr, envir = env(x = 1000, y = 10))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1010\n```\n\n\n:::\n:::\n\n\n\n## Application: reimplementing `source()`\n\nWhat do we need?\n\n- Read the file being sourced. \n- Parse its expressions (quote them?)\n- Evaluate each expression saving the results \n- Return the results\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsource2 <- function(path, env = caller_env()) {\n file <- paste(readLines(path, warn = FALSE), collapse = \"\\n\")\n exprs <- parse_exprs(file)\n\n res <- NULL\n for (i in seq_along(exprs)) {\n res <- eval(exprs[[i]], env)\n }\n\n invisible(res)\n}\n```\n:::\n\n\nThe real source is much more complex.\n\n## Quosures\n\n**quosures** are a data structure from `rlang` containing both and expression and an environment\n\n*Quoting* + *closure* because it quotes the expression and encloses the environment.\n\nThree ways to create them:\n\n- Used mostly for learning: `new_quosure()`, creates a quosure from its components.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nq1 <- rlang::new_quosure(expr(x + y), \n env(x = 1, y = 10))\n```\n:::\n\n\nWith a quosure, we can use `eval_tidy()` directly. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nrlang::eval_tidy(q1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 11\n```\n\n\n:::\n:::\n\n\nAnd get its components\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrlang::get_expr(q1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x + y\n```\n\n\n:::\n\n```{.r .cell-code}\nrlang::get_env(q1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: 0x00000226d60330a8>\n```\n\n\n:::\n:::\n\n\nOr set them\n\n\n::: {.cell}\n\n```{.r .cell-code}\nq1 <- set_env(q1, env(x = 3, y = 4))\neval_tidy(q1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 7\n```\n\n\n:::\n:::\n\n\n\n- Used in the real world: `enquo()` o `enquos()`, to capture user supplied expressions. They take the environment from where they're created. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nfoo <- function(x) enquo(x)\nquo_foo <- foo(a + b)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_expr(quo_foo)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a + b\n```\n\n\n:::\n\n```{.r .cell-code}\nget_env(quo_foo)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: R_GlobalEnv>\n```\n\n\n:::\n:::\n\n\n- Almost never used: `quo()` and `quos()`, to match to `expr()` and `exprs()`.\n\n## Quosures and `...`\n\nQuosures are just a convenience, but they are essential when it comes to working with `...`, because you can have each argument from `...` associated with a different environment. \n\n\n::: {.cell}\n\n```{.r .cell-code}\ng <- function(...) {\n ## Creating our quosures from ...\n enquos(...)\n}\n\ncreateQuos <- function(...) {\n ## symbol from the function environment\n x <- 1\n g(..., f = x)\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## symbol from the global environment\nx <- 0\nqs <- createQuos(global = x)\nqs\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <list_of<quosure>>\n#> \n#> $global\n#> <quosure>\n#> expr: ^x\n#> env: global\n#> \n#> $f\n#> <quosure>\n#> expr: ^x\n#> env: 0x00000226d3a61a88\n```\n\n\n:::\n:::\n\n\n## Other facts about quosures\n\nFormulas were the inspiration for closures because they also capture an expression and an environment\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- ~runif(3)\nstr(f)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Class 'formula' language ~runif(3)\n#> ..- attr(*, \".Environment\")=<environment: R_GlobalEnv>\n```\n\n\n:::\n:::\n\n\nThere was an early version of tidy evaluation with formulas, but there's no easy way to implement quasiquotation with them. \n\nThey are actually call objects \n\n\n::: {.cell}\n\n```{.r .cell-code}\nq4 <- new_quosure(expr(x + y + z))\nclass(q4)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"quosure\" \"formula\"\n```\n\n\n:::\n\n```{.r .cell-code}\nis.call(q4)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\nwith an attribute to store the environment\n\n\n::: {.cell}\n\n```{.r .cell-code}\nattr(q4, \".Environment\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: R_GlobalEnv>\n```\n\n\n:::\n:::\n\n\n\n**Nested quosures**\n\nWith quosiquotation we can embed quosures in expressions. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nq2 <- new_quosure(expr(x), env(x = 1))\nq3 <- new_quosure(expr(x), env(x = 100))\n\nnq <- expr(!!q2 + !!q3)\n```\n:::\n\n\nAnd evaluate them \n\n\n::: {.cell}\n\n```{.r .cell-code}\neval_tidy(nq)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 101\n```\n\n\n:::\n:::\n\n\nBut for printing it's better to use `expr_print(x)` \n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr_print(nq)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> (^x) + (^x)\n```\n\n\n:::\n\n```{.r .cell-code}\nnq\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> (~x) + ~x\n```\n\n\n:::\n:::\n\n\n## Data mask\n\nA data frame where the evaluated code will look first for its variable definitions. \n\nUsed in packages like dplyr and ggplot. \n\nTo use it we need to supply the data mask as a second argument to `eval_tidy()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nq1 <- new_quosure(expr(x * y), env(x = 100))\ndf <- data.frame(y = 1:10)\n\neval_tidy(q1, df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 100 200 300 400 500 600 700 800 900 1000\n```\n\n\n:::\n:::\n\n\nEverything together, in one function. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith2 <- function(data, expr) {\n expr <- enquo(expr)\n eval_tidy(expr, data)\n}\n```\n:::\n\n\nBut we need to create the objects that are not part of our data mask\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 100\nwith2(df, x * y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 100 200 300 400 500 600 700 800 900 1000\n```\n\n\n:::\n:::\n\n\nAlso doable with `base::eval()` instead of `rlang::eval_tidy()` but we have to use `base::substitute()` instead of `enquo()` (like we did for `enexpr()`) and we need to specify the environment.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith3 <- function(data, expr) {\n expr <- substitute(expr)\n eval(expr, data, caller_env())\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith3(df, x*y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 100 200 300 400 500 600 700 800 900 1000\n```\n\n\n:::\n:::\n\n\n## Pronouns: .data$ and .env$\n\n**Ambiguity!!**\n\nAn object value can come from the env or from the data mask\n\n\n::: {.cell}\n\n```{.r .cell-code}\nq1 <- new_quosure(expr(x * y + x), env = env(x = 1))\ndf <- data.frame(y = 1:5, \n x = 10)\n\neval_tidy(q1, df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 20 30 40 50 60\n```\n\n\n:::\n:::\n\n\nWe use pronouns: \n\n- `.data$x`: `x` from the data mask\n- `.env$x`: `x` from the environment\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nq1 <- new_quosure(expr(.data$x * y + .env$x), env = env(x = 1))\neval_tidy(q1, df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 11 21 31 41 51\n```\n\n\n:::\n:::\n\n\n## Application: reimplementing `base::subset()`\n\n`base::subset()` works like `dplyr::filter()`: it selects rows of a data frame given an expression. \n\nWhat do we need?\n\n- Quote the expression to filter\n- Figure out which rows in the data frame pass the filter\n- Subset the data frame\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsubset2 <- function(data, rows) {\n rows <- enquo(rows)\n rows_val <- eval_tidy(rows, data)\n stopifnot(is.logical(rows_val))\n\n data[rows_val, , drop = FALSE]\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 2, 4, 1))\n\n# Shorthand for sample_df[sample_df$b == sample_df$c, ]\nsubset2(sample_df, b == c)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c\n#> 1 1 5 5\n#> 5 5 1 1\n```\n\n\n:::\n:::\n\n\n## Using tidy evaluation\n\nMost of the time we might not call it directly, but call a function that uses `eval_tidy()` (becoming developer AND user)\n\n**Use case**: resample and subset\n\nWe have a function that resamples a dataset: \n\n\n::: {.cell}\n\n```{.r .cell-code}\nresample <- function(df, n) {\n idx <- sample(nrow(df), n, replace = TRUE)\n df[idx, , drop = FALSE]\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nresample(sample_df, 10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c\n#> 2 2 4 3\n#> 4 4 2 4\n#> 1 1 5 5\n#> 4.1 4 2 4\n#> 2.1 2 4 3\n#> 4.2 4 2 4\n#> 5 5 1 1\n#> 5.1 5 1 1\n#> 4.3 4 2 4\n#> 4.4 4 2 4\n```\n\n\n:::\n:::\n\n\nBut we also want to use subset and we want to create a function that allow us to resample and subset (with `subset2()`) in a single step. \n\nFirst attempt: \n\n\n::: {.cell}\n\n```{.r .cell-code}\nsubsample <- function(df, cond, n = nrow(df)) {\n df <- subset2(df, cond)\n resample(df, n)\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsubsample(sample_df, b == c, 10)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: object 'b' not found\n```\n\n\n:::\n:::\n\n\nWhat happened? \n\n`subsample()` doesn't quote any arguments and `cond` is evaluated normally\n\nSo we have to quote `cond` and unquote it when we pass it to `subset2()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsubsample <- function(df, cond, n = nrow(df)) {\n cond <- enquo(cond)\n\n df <- subset2(df, !!cond)\n resample(df, n)\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsubsample(sample_df, b == c, 10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c\n#> 1 1 5 5\n#> 1.1 1 5 5\n#> 5 5 1 1\n#> 1.2 1 5 5\n#> 5.1 5 1 1\n#> 5.2 5 1 1\n#> 5.3 5 1 1\n#> 1.3 1 5 5\n#> 5.4 5 1 1\n#> 5.5 5 1 1\n```\n\n\n:::\n:::\n\n\n**Be careful!**, potential ambiguity:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nthreshold_x <- function(df, val) {\n subset2(df, x >= val)\n}\n```\n:::\n\n\nWhat would happen if `x` exists in the calling environment but doesn't exist in `df`? Or if `val` also exists in `df`?\n\nSo, as developers of `threshold_x()` and users of `subset2()`, we have to add some pronouns:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nthreshold_x <- function(df, val) {\n subset2(df, .data$x >= .env$val)\n}\n```\n:::\n\n\n\nJust remember: \n\n> As a general rule of thumb, as a function author it’s your responsibility \n> to avoid ambiguity with any expressions that you create; \n> it’s the user’s responsibility to avoid ambiguity in expressions that they create.\n\n\n## Base evaluation\n\nCheck 20.6 in the book!\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/4En_Ypvtjqw\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/ewHAlVwCGtY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/0K1vyiV8_qo\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/kfwjJDuyN8U\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/WzfD9GK6nCI\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/8FT2BA18Ghg\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n01:00:42\tTrevin:\tThey just want to help you present that’s all\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/g77Jfl_xrXM\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary>Meeting chat log</summary>\n```\n00:55:22\tcollinberke:\thttps://rlang.r-lib.org/reference/embrace-operator.html?q=enquo#under-the-hood\n```\n</details>\n\n<iframe src=\"https://www.youtube.com/embed/wPLrafScijE\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: Evaluation\n---\n\n## Learning objectives:\n\n- Learn evaluation basics\n- Learn about **quosures** and **data mask**\n- Understand tidy evaluation\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(rlang)\nlibrary(purrr)\n```\n:::\n\n\n## A bit of a recap\n\n- Metaprogramming: To separate our description of the action from the action itself - Separate the code from its evaluation.\n- Quasiquotation: combine code written by the *function's author* with code written by the *function's user*.\n - Unquotation: it gives the *user* the ability to evaluate parts of a quoted argument.\n - Evaluation: it gives the *developer* the ability to evluated quoted expression in custom environments.\n\n**Tidy evaluation**: quasiquotation, quosures and data masks\n\n## Evaluation basics \n\nWe use `eval()` to evaluate, run, or execute expressions. It requires two arguments: \n\n- `expr`: the object to evaluate, either an expression or a symbol.\n- `env`: the environment in which to evaluate the expression or where to look for the values. \nDefaults to current env.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsumexpr <- expr(x + y)\nx <- 10\ny <- 40\neval(sumexpr)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 50\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\neval(sumexpr, envir = env(x = 1000, y = 10))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1010\n```\n\n\n:::\n:::\n\n\n\n## Application: reimplementing `source()`\n\nWhat do we need?\n\n- Read the file being sourced. \n- Parse its expressions (quote them?)\n- Evaluate each expression saving the results \n- Return the results\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsource2 <- function(path, env = caller_env()) {\n file <- paste(readLines(path, warn = FALSE), collapse = \"\\n\")\n exprs <- parse_exprs(file)\n\n res <- NULL\n for (i in seq_along(exprs)) {\n res <- eval(exprs[[i]], env)\n }\n\n invisible(res)\n}\n```\n:::\n\n\nThe real source is much more complex.\n\n## Quosures\n\n**quosures** are a data structure from `rlang` containing both and expression and an environment\n\n*Quoting* + *closure* because it quotes the expression and encloses the environment.\n\nThree ways to create them:\n\n- Used mostly for learning: `new_quosure()`, creates a quosure from its components.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nq1 <- rlang::new_quosure(expr(x + y), \n env(x = 1, y = 10))\n```\n:::\n\n\nWith a quosure, we can use `eval_tidy()` directly. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nrlang::eval_tidy(q1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 11\n```\n\n\n:::\n:::\n\n\nAnd get its components\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrlang::get_expr(q1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x + y\n```\n\n\n:::\n\n```{.r .cell-code}\nrlang::get_env(q1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: 0x000001f161f6d698>\n```\n\n\n:::\n:::\n\n\nOr set them\n\n\n::: {.cell}\n\n```{.r .cell-code}\nq1 <- set_env(q1, env(x = 3, y = 4))\neval_tidy(q1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 7\n```\n\n\n:::\n:::\n\n\n\n- Used in the real world: `enquo()` o `enquos()`, to capture user supplied expressions. They take the environment from where they're created. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nfoo <- function(x) enquo(x)\nquo_foo <- foo(a + b)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_expr(quo_foo)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a + b\n```\n\n\n:::\n\n```{.r .cell-code}\nget_env(quo_foo)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: R_GlobalEnv>\n```\n\n\n:::\n:::\n\n\n- Almost never used: `quo()` and `quos()`, to match to `expr()` and `exprs()`.\n\n## Quosures and `...`\n\nQuosures are just a convenience, but they are essential when it comes to working with `...`, because you can have each argument from `...` associated with a different environment. \n\n\n::: {.cell}\n\n```{.r .cell-code}\ng <- function(...) {\n ## Creating our quosures from ...\n enquos(...)\n}\n\ncreateQuos <- function(...) {\n ## symbol from the function environment\n x <- 1\n g(..., f = x)\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## symbol from the global environment\nx <- 0\nqs <- createQuos(global = x)\nqs\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <list_of<quosure>>\n#> \n#> $global\n#> <quosure>\n#> expr: ^x\n#> env: global\n#> \n#> $f\n#> <quosure>\n#> expr: ^x\n#> env: 0x000001f15fc6f2e0\n```\n\n\n:::\n:::\n\n\n## Other facts about quosures\n\nFormulas were the inspiration for closures because they also capture an expression and an environment\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- ~runif(3)\nstr(f)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Class 'formula' language ~runif(3)\n#> ..- attr(*, \".Environment\")=<environment: R_GlobalEnv>\n```\n\n\n:::\n:::\n\n\nThere was an early version of tidy evaluation with formulas, but there's no easy way to implement quasiquotation with them. \n\nThey are actually call objects \n\n\n::: {.cell}\n\n```{.r .cell-code}\nq4 <- new_quosure(expr(x + y + z))\nclass(q4)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"quosure\" \"formula\"\n```\n\n\n:::\n\n```{.r .cell-code}\nis.call(q4)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\nwith an attribute to store the environment\n\n\n::: {.cell}\n\n```{.r .cell-code}\nattr(q4, \".Environment\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <environment: R_GlobalEnv>\n```\n\n\n:::\n:::\n\n\n\n**Nested quosures**\n\nWith quosiquotation we can embed quosures in expressions. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nq2 <- new_quosure(expr(x), env(x = 1))\nq3 <- new_quosure(expr(x), env(x = 100))\n\nnq <- expr(!!q2 + !!q3)\n```\n:::\n\n\nAnd evaluate them \n\n\n::: {.cell}\n\n```{.r .cell-code}\neval_tidy(nq)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 101\n```\n\n\n:::\n:::\n\n\nBut for printing it's better to use `expr_print(x)` \n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpr_print(nq)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> (^x) + (^x)\n```\n\n\n:::\n\n```{.r .cell-code}\nnq\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> (~x) + ~x\n```\n\n\n:::\n:::\n\n\n## Data mask\n\nA data frame where the evaluated code will look first for its variable definitions. \n\nUsed in packages like dplyr and ggplot. \n\nTo use it we need to supply the data mask as a second argument to `eval_tidy()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nq1 <- new_quosure(expr(x * y), env(x = 100))\ndf <- data.frame(y = 1:10)\n\neval_tidy(q1, df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 100 200 300 400 500 600 700 800 900 1000\n```\n\n\n:::\n:::\n\n\nEverything together, in one function. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith2 <- function(data, expr) {\n expr <- enquo(expr)\n eval_tidy(expr, data)\n}\n```\n:::\n\n\nBut we need to create the objects that are not part of our data mask\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 100\nwith2(df, x * y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 100 200 300 400 500 600 700 800 900 1000\n```\n\n\n:::\n:::\n\n\nAlso doable with `base::eval()` instead of `rlang::eval_tidy()` but we have to use `base::substitute()` instead of `enquo()` (like we did for `enexpr()`) and we need to specify the environment.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith3 <- function(data, expr) {\n expr <- substitute(expr)\n eval(expr, data, caller_env())\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith3(df, x*y)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 100 200 300 400 500 600 700 800 900 1000\n```\n\n\n:::\n:::\n\n\n## Pronouns: .data$ and .env$\n\n**Ambiguity!!**\n\nAn object value can come from the env or from the data mask\n\n\n::: {.cell}\n\n```{.r .cell-code}\nq1 <- new_quosure(expr(x * y + x), env = env(x = 1))\ndf <- data.frame(y = 1:5, \n x = 10)\n\neval_tidy(q1, df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 20 30 40 50 60\n```\n\n\n:::\n:::\n\n\nWe use pronouns: \n\n- `.data$x`: `x` from the data mask\n- `.env$x`: `x` from the environment\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nq1 <- new_quosure(expr(.data$x * y + .env$x), env = env(x = 1))\neval_tidy(q1, df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 11 21 31 41 51\n```\n\n\n:::\n:::\n\n\n## Application: reimplementing `base::subset()`\n\n`base::subset()` works like `dplyr::filter()`: it selects rows of a data frame given an expression. \n\nWhat do we need?\n\n- Quote the expression to filter\n- Figure out which rows in the data frame pass the filter\n- Subset the data frame\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsubset2 <- function(data, rows) {\n rows <- enquo(rows)\n rows_val <- eval_tidy(rows, data)\n stopifnot(is.logical(rows_val))\n\n data[rows_val, , drop = FALSE]\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 2, 4, 1))\n\n# Shorthand for sample_df[sample_df$b == sample_df$c, ]\nsubset2(sample_df, b == c)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c\n#> 1 1 5 5\n#> 5 5 1 1\n```\n\n\n:::\n:::\n\n\n## Using tidy evaluation\n\nMost of the time we might not call it directly, but call a function that uses `eval_tidy()` (becoming developer AND user)\n\n**Use case**: resample and subset\n\nWe have a function that resamples a dataset: \n\n\n::: {.cell}\n\n```{.r .cell-code}\nresample <- function(df, n) {\n idx <- sample(nrow(df), n, replace = TRUE)\n df[idx, , drop = FALSE]\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nresample(sample_df, 10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c\n#> 4 4 2 4\n#> 3 3 3 2\n#> 1 1 5 5\n#> 1.1 1 5 5\n#> 3.1 3 3 2\n#> 5 5 1 1\n#> 5.1 5 1 1\n#> 3.2 3 3 2\n#> 5.2 5 1 1\n#> 4.1 4 2 4\n```\n\n\n:::\n:::\n\n\nBut we also want to use subset and we want to create a function that allow us to resample and subset (with `subset2()`) in a single step. \n\nFirst attempt: \n\n\n::: {.cell}\n\n```{.r .cell-code}\nsubsample <- function(df, cond, n = nrow(df)) {\n df <- subset2(df, cond)\n resample(df, n)\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsubsample(sample_df, b == c, 10)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error: object 'b' not found\n```\n\n\n:::\n:::\n\n\nWhat happened? \n\n`subsample()` doesn't quote any arguments and `cond` is evaluated normally\n\nSo we have to quote `cond` and unquote it when we pass it to `subset2()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsubsample <- function(df, cond, n = nrow(df)) {\n cond <- enquo(cond)\n\n df <- subset2(df, !!cond)\n resample(df, n)\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsubsample(sample_df, b == c, 10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c\n#> 5 5 1 1\n#> 5.1 5 1 1\n#> 5.2 5 1 1\n#> 1 1 5 5\n#> 5.3 5 1 1\n#> 1.1 1 5 5\n#> 5.4 5 1 1\n#> 5.5 5 1 1\n#> 1.2 1 5 5\n#> 1.3 1 5 5\n```\n\n\n:::\n:::\n\n\n**Be careful!**, potential ambiguity:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nthreshold_x <- function(df, val) {\n subset2(df, x >= val)\n}\n```\n:::\n\n\nWhat would happen if `x` exists in the calling environment but doesn't exist in `df`? Or if `val` also exists in `df`?\n\nSo, as developers of `threshold_x()` and users of `subset2()`, we have to add some pronouns:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nthreshold_x <- function(df, val) {\n subset2(df, .data$x >= .env$val)\n}\n```\n:::\n\n\n\nJust remember: \n\n> As a general rule of thumb, as a function author it’s your responsibility \n> to avoid ambiguity with any expressions that you create; \n> it’s the user’s responsibility to avoid ambiguity in expressions that they create.\n\n\n## Base evaluation\n\nCheck 20.6 in the book!\n", + "supporting": [ + "20_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/21/execute-results/html.json b/_freeze/slides/21/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "067e821481bf80586dcb97aa1113e54f", + "hash": "c596baeea77da01b0a3fcca05af29a1f", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Translating R code\n---\n\n## Learning objectives:\n\n- Build DSL (domain specific languages) to aid interoperability between R, HTML and LaTeX\n- Reinforce metaprogramming concepts (expressions, quasiquotation, evaluation)\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n```{=html}\n<div class=\"DiagrammeR html-widget html-fill-item\" id=\"htmlwidget-ef79d3ab7e051da14c53\" style=\"width:100%;height:464px;\"></div>\n<script type=\"application/json\" data-for=\"htmlwidget-ef79d3ab7e051da14c53\">{\"x\":{\"diagram\":\"\\ngraph LR\\n\\nexpressions --> R\\nquasiquotation --> R\\nevaluation --> R\\n\\nR --> HTML\\nR --> LaTeX\\n\"},\"evals\":[],\"jsHooks\":[]}</script>\n```\n\n:::\n:::\n\n\n<details>\n<summary>Mermaid code</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nDiagrammeR::mermaid(\"\ngraph LR\n\nexpressions --> R\nquasiquotation --> R\nevaluation --> R\n\nR --> HTML\nR --> LaTeX\n\")\n```\n:::\n\n\n</details>\n\n<details>\n<summary>Session Info</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(DiagrammeR) #for Mermaid flowchart\nlibrary(lobstr) #abstract syntax trees\nlibrary(purrr) #functional programming\nlibrary(rlang) #tidy evaluation\n\n# from section 18.5\nexpr_type <- function(x) {\n if (rlang::is_syntactic_literal(x)) {\n \"constant\"\n } else if (is.symbol(x)) {\n \"symbol\"\n } else if (is.call(x)) {\n \"call\"\n } else if (is.pairlist(x)) {\n \"pairlist\"\n } else {\n typeof(x)\n }\n}\nflat_map_chr <- function(.x, .f, ...) {\n purrr::flatten_chr(purrr::map(.x, .f, ...))\n}\nswitch_expr <- function(x, ...) {\n switch(expr_type(x),\n ...,\n stop(\"Don't know how to handle type \", typeof(x), call. = FALSE)\n )\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nutils::sessionInfo()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> R version 4.5.1 (2025-06-13 ucrt)\n#> Platform: x86_64-w64-mingw32/x64\n#> Running under: Windows 11 x64 (build 26100)\n#> \n#> Matrix products: default\n#> LAPACK version 3.12.1\n#> \n#> locale:\n#> [1] LC_COLLATE=English_United States.utf8 \n#> [2] LC_CTYPE=English_United States.utf8 \n#> [3] LC_MONETARY=English_United States.utf8\n#> [4] LC_NUMERIC=C \n#> [5] LC_TIME=English_United States.utf8 \n#> \n#> time zone: America/Chicago\n#> tzcode source: internal\n#> \n#> attached base packages:\n#> [1] stats graphics grDevices utils datasets methods base \n#> \n#> other attached packages:\n#> [1] rlang_1.1.6 purrr_1.1.0 lobstr_1.1.2 DiagrammeR_1.0.11\n#> \n#> loaded via a namespace (and not attached):\n#> [1] digest_0.6.37 RColorBrewer_1.1-3 R6_2.6.1 fastmap_1.2.0 \n#> [5] xfun_0.52 magrittr_2.0.3 glue_1.8.0 knitr_1.50 \n#> [9] htmltools_0.5.8.1 rmarkdown_2.29 lifecycle_1.0.4 cli_3.6.5 \n#> [13] visNetwork_2.1.2 vctrs_0.6.5 compiler_4.5.1 tools_4.5.1 \n#> [17] evaluate_1.0.4 yaml_2.3.10 jsonlite_2.0.0 htmlwidgets_1.6.4 \n#> [21] keyring_1.4.1\n```\n\n\n:::\n:::\n\n\n</details>\n\n## Case Study: MCQ\n\nWe are going to use R code to generate HTML or LaTeX to produce multiple-choice questions such as\n\n### Pop Quiz!\n\n1. What is the **derivative** of $f(x) = 1 + 2\\cos(3\\pi x + 4)$?\n\n a. $f'(x) = 6\\pi\\sin(3\\pi x + 4)$\n b. $f'(x) = -6\\pi\\sin(3\\pi x + 4)$\n c. $f'(x) = 24\\pi\\sin(3\\pi x + 4)$\n d. $f'(x) = -24\\pi\\sin(3\\pi x + 4)$\n\n\n\n\n\n---\n\nAs developers, we may be asking ourselves:\n\n* What are the expressions?\n* What are the symbols?\n* Will we have to quote inputs from the user (math teacher)?\n\n\n## HTML\n\nWe are trying to produce\n\n```{}\n<body>\n <h1 id = 'pop_quiz'>Pop Quiz</h1>\n <ol>\n <li>What is the <b>derivative</b> of $f(x) = 1 + 2\\cos(3\\pi x + 4)$?</li>\n <ol>\n <li>$f'(x) = 6\\pi\\sin(3\\pi x + 4)$</li>\n <li>$f'(x) = -6\\pi\\sin(3\\pi x + 4)$</li>\n <li>$f'(x) = 24\\pi\\sin(3\\pi x + 4)$</li>\n <li>$f'(x) = -24\\pi\\sin(3\\pi x + 4)$</li>\n </ol>\n </ol>\n <img src = 'calculus_cat.png' width = '100' height = '100' />\n</body>\n```\n\nusing DSL\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith_html(\n body(\n h1(\"Pop quiz!\", id = \"pop_quiz\"),\n ol(\n li(\"What is the \", b(\"derivative\"), \"of $f(x) = 1 + 2cos(3pi x + 4)$?\"),\n ol(\n li(\"$f'(x) = 6pi*sin(3pi x + 4)$\"),\n li(\"$f'(x) = -6pi*sin(3pi x + 4)$\"),\n li(\"$f'(x) = 24pi*sin(3pi x + 4)$\"),\n li(\"$f'(x) = -24pi*sin(3pi x + 4)$\")\n )\n ),\n img(src = \"images/translating/calculus_cat.png\", width = 100, height = 100)\n )\n)\n```\n:::\n\n\nIn particular,\n\n* **tags** such as `<b></b>` have *attributes*\n* **void tags** such as `<img />`\n* special characters: `&`, `<`, and `>`\n\n\n<details>\n<summary>HTML verification</summary>\n\n```{=html}\n<body>\n <h1 id = 'pop_quiz'>Pop Quiz</h1>\n <ol>\n <li>What is the <b>derivative</b> of $f(x) = 1 + 2\\cos(3\\pi x + 4)$?</li>\n <ol>\n <li>$f'(x) = 6\\pi\\sin(3\\pi x + 4)$</li>\n <li>$f'(x) = -6\\pi\\sin(3\\pi x + 4)$</li>\n <li>$f'(x) = 24\\pi\\sin(3\\pi x + 4)$</li>\n <li>$f'(x) = -24\\pi\\sin(3\\pi x + 4)$</li>\n </ol>\n </ol>\n <img src = 'images/translating/calculus_cat.png' width = '100' height = '100' />\n</body>\n```\n\n</details>\n\n\n## Escaping\n\n* need to escape `&`, `<`, and `>`\n* don't \"double escape\"\n* leave HTML alone\n\n### S3 Class\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhtml <- function(x) structure(x, class = \"advr_html\")\n\n#dispatch\nprint.advr_html <- function(x, ...) {\n out <- paste0(\"<HTML> \", x)\n cat(paste(strwrap(out), collapse = \"\\n\"), \"\\n\", sep = \"\")\n}\n```\n:::\n\n\n### Generic\n\n\n::: {.cell}\n\n```{.r .cell-code}\nescape <- function(x) UseMethod(\"escape\")\nescape.character <- function(x) {\n x <- gsub(\"&\", \"&\", x)\n x <- gsub(\"<\", \"<\", x)\n x <- gsub(\">\", \">\", x)\n html(x)\n}\nescape.advr_html <- function(x) x\n```\n:::\n\n\n### Checks\n\n\n::: {.cell}\n\n```{.r .cell-code}\nescape(\"This is some text.\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <HTML> This is some text.\n```\n\n\n:::\n\n```{.r .cell-code}\nescape(\"x > 1 & y < 2\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <HTML> x > 1 & y < 2\n```\n\n\n:::\n\n```{.r .cell-code}\nescape(escape(\"This is some text. 1 > 2\")) #double escape\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <HTML> This is some text. 1 > 2\n```\n\n\n:::\n\n```{.r .cell-code}\nescape(html(\"<hr />\")) #already html\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <HTML> <hr />\n```\n\n\n:::\n:::\n\n\n\n## Named Components\n\n```{}\nli(\"What is the \", b(\"derivative\"), \"of $f(x) = 1 + 2\\cos(3\\pi x + 4)$?\")\n```\n\n* aiming to classify `li` and `b` as **named components**\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndots_partition <- function(...) {\n dots <- list2(...)\n \n if (is.null(names(dots))) {\n is_named <- rep(FALSE, length(dots))\n} else {\n is_named <- names(dots) != \"\"\n}\n \n list(\n named = dots[is_named],\n unnamed = dots[!is_named]\n )\n}\n```\n:::\n\n\n### Check\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(dots_partition(company = \"Posit\",\n software = \"RStudio\",\n \"DSLC\",\n \"Cohort 9\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 2\n#> $ named :List of 2\n#> ..$ company : chr \"Posit\"\n#> ..$ software: chr \"RStudio\"\n#> $ unnamed:List of 2\n#> ..$ : chr \"DSLC\"\n#> ..$ : chr \"Cohort 9\"\n```\n\n\n:::\n:::\n\n\n<details>\n<summary>HTML Attributes</summary>\n\nFound among the textbook's [source code](https://github.com/hadley/adv-r/blob/master/dsl-html-attributes.r)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhtml_attributes <- function(list) {\n if (length(list) == 0) return(\"\")\n\n attr <- map2_chr(names(list), list, html_attribute)\n paste0(\" \", unlist(attr), collapse = \"\")\n}\nhtml_attribute <- function(name, value = NULL) {\n if (length(value) == 0) return(name) # for attributes with no value\n if (length(value) != 1) stop(\"`value` must be NULL or length 1\")\n\n if (is.logical(value)) {\n # Convert T and F to true and false\n value <- tolower(value)\n } else {\n value <- escape_attr(value)\n }\n paste0(name, \"='\", value, \"'\")\n}\nescape_attr <- function(x) {\n x <- escape.character(x)\n x <- gsub(\"\\'\", ''', x)\n x <- gsub(\"\\\"\", '"', x)\n x <- gsub(\"\\r\", '
', x)\n x <- gsub(\"\\n\", '
', x)\n x\n}\n```\n:::\n\n\n\n</details>\n\n\n## Tags (calls)\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntag <- function(tag) {\n new_function(\n exprs(... = ), #arguments of new function\n expr({ #body of the new function\n \n #classify tags as named components\n dots <- dots_partition(...)\n \n #focus on named components as the tags\n attribs <- html_attributes(dots$named)\n \n # otherwise, nested code\n children <- map_chr(dots$unnamed, escape)\n\n # paste brackets, tag names, and attributes together\n # then unquote user arguments\n html(paste0(\n !!paste0(\"<\", tag), attribs, \">\",\n paste(children, collapse = \"\"),\n !!paste0(\"</\", tag, \">\")\n ))\n }),\n caller_env() #return the environment\n )\n}\n```\n:::\n\n\n<details>\n<summary>Void tags</summary>\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvoid_tag <- function(tag) {\n new_function(\n exprs(... = ), #allows for missing arguments\n expr({\n dots <- dots_partition(...)\n \n # error check\n if (length(dots$unnamed) > 0) {\n abort(!!paste0(\"<\", tag, \"> must not have unnamed arguments\"))\n }\n attribs <- html_attributes(dots$named)\n\n html(paste0(!!paste0(\"<\", tag), attribs, \" />\"))\n }),\n caller_env()\n )\n}\n```\n:::\n\n\n</details>\n\n### Checks\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntag(\"ol\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (...) \n#> {\n#> dots <- dots_partition(...)\n#> attribs <- html_attributes(dots$named)\n#> children <- map_chr(dots$unnamed, escape)\n#> html(paste0(\"<ol\", attribs, \">\", paste(children, collapse = \"\"), \n#> \"</ol>\"))\n#> }\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nimg <- void_tag(\"img\")\n```\n:::\n\n\n\n```{.r .cell-code}\nimg()\n```\n\n<HTML> <img />\n\n\n::: {.cell}\n\n```{.r .cell-code}\nimg(src = \"images/translating/calculus_cat.png\",\n width = 100,\n height = 100)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <HTML> <img src='images/translating/calculus_cat.png' width='100'\n#> height='100' />\n```\n\n\n:::\n:::\n\n\n\n## Tags (processing)\n\n<details>\n<summary>Venn Diagram</summary>\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntags <- c(\"a\", \"abbr\", \"address\", \"article\", \"aside\", \"audio\",\n \"b\",\"bdi\", \"bdo\", \"blockquote\", \"body\", \"button\", \"canvas\",\n \"caption\",\"cite\", \"code\", \"colgroup\", \"data\", \"datalist\",\n \"dd\", \"del\",\"details\", \"dfn\", \"div\", \"dl\", \"dt\", \"em\",\n \"eventsource\",\"fieldset\", \"figcaption\", \"figure\", \"footer\",\n \"form\", \"h1\", \"h2\", \"h3\", \"h4\", \"h5\", \"h6\", \"head\", \"header\",\n \"hgroup\", \"html\", \"i\",\"iframe\", \"ins\", \"kbd\", \"label\",\n \"legend\", \"li\", \"mark\", \"map\",\"menu\", \"meter\", \"nav\",\n \"noscript\", \"object\", \"ol\", \"optgroup\", \"option\", \"output\",\n \"p\", \"pre\", \"progress\", \"q\", \"ruby\", \"rp\",\"rt\", \"s\", \"samp\",\n \"script\", \"section\", \"select\", \"small\", \"span\", \"strong\",\n \"style\", \"sub\", \"summary\", \"sup\", \"table\", \"tbody\", \"td\",\n \"textarea\", \"tfoot\", \"th\", \"thead\", \"time\", \"title\", \"tr\",\n \"u\", \"ul\", \"var\", \"video\"\n)\n\nvoid_tags <- c(\"area\", \"base\", \"br\", \"col\", \"command\", \"embed\",\n \"hr\", \"img\", \"input\", \"keygen\", \"link\", \"meta\", \"param\",\n \"source\", \"track\", \"wbr\"\n)\n```\n:::\n\n\n</details>\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhtml_tags <- c(\n tags |> #list of tag names from HTML\n set_names() |> #named variable to avoid reserved words!\n map(tag), #make them function calls\n void_tags |>\n set_names() |>\n map(void_tag)\n)\n```\n:::\n\n\n\n### Example\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhtml_tags$ol(\n html_tags$li(\"What is the \", \n html_tags$b(\"derivative\"),\n \"of $f(x) = 1 + 2cos(3pi x + 4)$?\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <HTML> <ol><li>What is the <b>derivative</b>of $f(x) = 1 + 2cos(3pi x +\n#> 4)$?</li></ol>\n```\n\n\n:::\n:::\n\n\n\n## Bringing the HTML Together\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith_html <- function(code) {\n eval_tidy(enquo(code), html_tags)\n}\n```\n:::\n\n\n### Main Example\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith_html(\n body(\n h1(\"Pop quiz!\", id = \"pop_quiz\"),\n ol(\n li(\"What is the \", b(\"derivative\"), \"of $f(x) = 1 + 2cos(3pi x + 4)$?\"),\n ol(\n li(\"$f'(x) = 6pi*sin(3pi x + 4)$\"),\n li(\"$f'(x) = -6pi*sin(3pi x + 4)$\"),\n li(\"$f'(x) = 24pi*sin(3pi x + 4)$\"),\n li(\"$f'(x) = -24pi*sin(3pi x + 4)$\")\n )\n ),\n img(src = \"images/translating/calculus_cat.png\", width = 100, height = 100)\n )\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <HTML> <body><h1 id='pop_quiz'>Pop quiz!</h1><ol><li>What is the\n#> <b>derivative</b>of $f(x) = 1 + 2cos(3pi x + 4)$?</li><ol><li>$f'(x) =\n#> 6pi*sin(3pi x + 4)$</li><li>$f'(x) = -6pi*sin(3pi x +\n#> 4)$</li><li>$f'(x) = 24pi*sin(3pi x + 4)$</li><li>$f'(x) =\n#> -24pi*sin(3pi x + 4)$</li></ol></ol><img\n#> src='images/translating/calculus_cat.png' width='100' height='100'\n#> /></body>\n```\n\n\n:::\n:::\n\n\n### Check\n\n```{=html}\n<h1 id='pop_quiz'>Pop quiz!</h1><ol><li>What is the <b>derivative</b> of $f(x) = 1 + 2cos(3pi x + 4)$?</li><ol><li>$f'(x) = 6pi*sin(3pi x + 4)$</li><li>$f'(x) = -6pi*sin(3pi x + 4)$</li><li>$f'(x) = 24pi*sin(3pi x + 4)$</li><li>$f'(x) = -24pi*sin(3pi x + 4)$</li></ol></ol><img src='images/translating/calculus_cat.png' width='100' height='100' />\n```\n\n\n## LaTeX\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlatex <- function(x) structure(x, class = \"advr_latex\")\nprint.advr_latex <- function(x) { cat(\"<LATEX> \", x, \"\\n\", sep = \"\") }\n```\n:::\n\n\n### to_math\n\n\n::: {.cell}\n\n```{.r .cell-code}\nto_math <- function(x) {\n expr <- enexpr(x)\n latex( #return LaTeX code\n eval_bare( #eval_bare to ensure use of latex environment \n expr, #expression (not quosure)\n latex_env(expr) #need to define latex_env\n ))\n}\n```\n:::\n\n\n## Known Symbols\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngreek_letters <- c(\n \"alpha\", \"beta\", \"chi\", \"delta\", \"Delta\", \"epsilon\", \"eta\", \n\"gamma\", \"Gamma\", \"iota\", \"kappa\", \"lambda\", \"Lambda\", \"mu\", \n\"nu\", \"omega\", \"Omega\", \"phi\", \"Phi\", \"pi\", \"Pi\", \"psi\", \"Psi\", \n\"rho\", \"sigma\", \"Sigma\", \"tau\", \"theta\", \"Theta\", \"upsilon\", \n\"Upsilon\", \"varepsilon\", \"varphi\", \"varrho\", \"vartheta\", \"xi\", \n\"Xi\", \"zeta\"\n)\n\ngreek_env <- rlang::as_environment(\n rlang::set_names(\n paste0(\"\\\\\", greek_letters), #latex values\n greek_letters #R names\n )\n)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(as.list(greek_env))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 38\n#> $ zeta : chr \"\\\\zeta\"\n#> $ Xi : chr \"\\\\Xi\"\n#> $ xi : chr \"\\\\xi\"\n#> $ vartheta : chr \"\\\\vartheta\"\n#> $ varrho : chr \"\\\\varrho\"\n#> $ varphi : chr \"\\\\varphi\"\n#> $ varepsilon: chr \"\\\\varepsilon\"\n#> $ Upsilon : chr \"\\\\Upsilon\"\n#> $ upsilon : chr \"\\\\upsilon\"\n#> $ Theta : chr \"\\\\Theta\"\n#> $ theta : chr \"\\\\theta\"\n#> $ tau : chr \"\\\\tau\"\n#> $ Sigma : chr \"\\\\Sigma\"\n#> $ sigma : chr \"\\\\sigma\"\n#> $ rho : chr \"\\\\rho\"\n#> $ Psi : chr \"\\\\Psi\"\n#> $ psi : chr \"\\\\psi\"\n#> $ Pi : chr \"\\\\Pi\"\n#> $ pi : chr \"\\\\pi\"\n#> $ Phi : chr \"\\\\Phi\"\n#> $ phi : chr \"\\\\phi\"\n#> $ Omega : chr \"\\\\Omega\"\n#> $ omega : chr \"\\\\omega\"\n#> $ nu : chr \"\\\\nu\"\n#> $ mu : chr \"\\\\mu\"\n#> $ Lambda : chr \"\\\\Lambda\"\n#> $ lambda : chr \"\\\\lambda\"\n#> $ kappa : chr \"\\\\kappa\"\n#> $ iota : chr \"\\\\iota\"\n#> $ Gamma : chr \"\\\\Gamma\"\n#> $ gamma : chr \"\\\\gamma\"\n#> $ eta : chr \"\\\\eta\"\n#> $ epsilon : chr \"\\\\epsilon\"\n#> $ Delta : chr \"\\\\Delta\"\n#> $ delta : chr \"\\\\delta\"\n#> $ chi : chr \"\\\\chi\"\n#> $ beta : chr \"\\\\beta\"\n#> $ alpha : chr \"\\\\alpha\"\n```\n\n\n:::\n:::\n\n\n\n## Known Functions\n\n### Unary Operations\n\n\n::: {.cell}\n\n```{.r .cell-code}\nunary_op <- function(left, right) {\n new_function(\n exprs(e1 = ),\n expr(\n paste0(!!left, e1, !!right)\n ),\n caller_env()\n )\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#example\nunary_op(\"\\\\sqrt{\", \"}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (e1) \n#> paste0(\"\\\\sqrt{\", e1, \"}\")\n```\n\n\n:::\n:::\n\n\n### Binary Operations\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbinary_op <- function(sep) {\n new_function(\n exprs(e1 = , e2 = ),\n expr(\n paste0(e1, !!sep, e2)\n ),\n caller_env()\n )\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#example\nbinary_op(\"+\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (e1, e2) \n#> paste0(e1, \"+\", e2)\n```\n\n\n:::\n:::\n\n\n<details>\n<summary>Even more LaTeX syntax</summary>\n\n\n::: {.cell}\n\n```{.r .cell-code}\nknown_func_env <- child_env(\n .parent = empty_env(),\n \n # Binary operators\n `+` = binary_op(\" + \"),\n `-` = binary_op(\" - \"),\n `*` = binary_op(\" * \"),\n `/` = binary_op(\" / \"),\n `^` = binary_op(\"^\"),\n `[` = binary_op(\"_\"),\n\n # Grouping\n `{` = unary_op(\"\\\\left{ \", \" \\\\right}\"),\n `(` = unary_op(\"\\\\left( \", \" \\\\right)\"),\n paste = paste,\n\n # Other math functions\n sqrt = unary_op(\"\\\\sqrt{\", \"}\"),\n sin = unary_op(\"\\\\sin(\", \")\"),\n cos = unary_op(\"\\\\cos(\", \")\"),\n tan = unary_op(\"\\\\tan(\", \")\"),\n log = unary_op(\"\\\\log(\", \")\"),\n abs = unary_op(\"\\\\left| \", \"\\\\right| \"),\n frac = function(a, b) {\n paste0(\"\\\\frac{\", a, \"}{\", b, \"}\")\n },\n\n # Labelling\n hat = unary_op(\"\\\\hat{\", \"}\"),\n tilde = unary_op(\"\\\\tilde{\", \"}\")\n)\n```\n:::\n\n\n</details>\n\n\n## Unknown Symbols\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames_grabber <- function(x) {\n switch_expr(x,\n constant = character(),\n symbol = as.character(x),\n call = flat_map_chr(as.list(x[-1]), names_grabber)\n ) |>\n unique()\n}\n```\n:::\n\n\n$$x + y + f(a, b, c, 10)$$\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames_grabber(expr(x + y + f(a, b, c, 10)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"x\" \"y\" \"a\" \"b\" \"c\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(expr(x + y + f(a, b, c, 10)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─expr \n#> └─█─`+` \n#> ├─█─`+` \n#> │ ├─x \n#> │ └─y \n#> └─█─f \n#> ├─a \n#> ├─b \n#> ├─c \n#> └─10\n```\n\n\n:::\n:::\n\n\n\n## Unknown Functions\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncalls_grabber <- function(x) {\n switch_expr(x,\n constant = ,\n symbol = character(),\n call = {\n fname <- as.character(x[[1]])\n children <- flat_map_chr(as.list(x[-1]), calls_grabber)\n c(fname, children)\n }\n ) |>\n unique()\n}\n```\n:::\n\n\n$$f(g + b, c, d(a))$$\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames_grabber(expr(f(g + b, c, d(a))))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"g\" \"b\" \"c\" \"a\"\n```\n\n\n:::\n\n```{.r .cell-code}\ncalls_grabber(expr(f(g + b, c, d(a))))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"f\" \"+\" \"d\"\n```\n\n\n:::\n\n```{.r .cell-code}\nlobstr::ast(expr(f(g + b, c, d(a))))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─expr \n#> └─█─f \n#> ├─█─`+` \n#> │ ├─g \n#> │ └─b \n#> ├─c \n#> └─█─d \n#> └─a\n```\n\n\n:::\n:::\n\n\n---\n\n\n::: {.cell}\n\n```{.r .cell-code}\nseek_closure <- function(op) {\n # change math font for function names\n # apply ending parenthesis\n new_function(\n exprs(... = ),\n expr({\n contents <- paste(..., collapse = \", \")\n paste0(!!paste0(\"\\\\mathrm{\", op, \"}(\"), contents, \")\")\n })\n )\n}\n```\n:::\n\n\n## Bringing the LaTeX Together\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlatex_env <- function(expr) {\n \n # Unknown Functions\n calls <- calls_grabber(expr)\n call_list <- map(set_names(calls), seek_closure)\n call_env <- as_environment(call_list)\n\n # Known Functions\n known_func_env <- env_clone(known_func_env, call_env)\n\n # Unknown Symbols\n names <- names_grabber(expr)\n symbol_env <- as_environment(set_names(names), parent = known_func_env)\n\n # Known symbols\n greek_env <- env_clone(greek_env, parent = symbol_env)\n greek_env\n}\n\nto_math <- function(x) {\n expr <- enexpr(x)\n latex( #return LaTeX code\n eval_bare( #eval_bare to ensure use of latex environment \n expr, #expression (not quosure)\n latex_env(expr) #need to define latex_env\n ))\n}\n```\n:::\n\n\n### Check\n\n\n::: {.cell}\n\n```{.r .cell-code}\nto_math(sin(pi) + f(a))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <LATEX> \\sin(\\pi) + \\mathrm{f}(a)\n```\n\n\n:::\n:::\n\n\n## Finishing the Example\n\n(TO DO)\n\n\n\n\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/fixyitpXrwY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<iframe src=\"https://www.youtube.com/embed/h3RNPyhIjas\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/pj0hTW1CtbI\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n(no video)\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/0TclsXa085Y\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/v_dkrIEdmKE\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/_-uwFjO5CyM\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:30:16\tArthur Shaw:\thttps://www.w3schools.com/html/html_entities.asp\n00:32:29\tArthur Shaw:\tBeta symbol in HTML: Β\n00:56:55\tArthur Shaw:\thttps://dbplyr.tidyverse.org/articles/translation-function.html\n00:57:48\tArthur Shaw:\thttps://dtplyr.tidyverse.org/index.html\n00:58:43\tArthur Shaw:\thttps://dtplyr.tidyverse.org/articles/translation.html\n```\n</details>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: Translating R code\n---\n\n## Learning objectives:\n\n- Build DSL (domain specific languages) to aid interoperability between R, HTML and LaTeX\n- Reinforce metaprogramming concepts (expressions, quasiquotation, evaluation)\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n```{=html}\n<div class=\"DiagrammeR html-widget html-fill-item\" id=\"htmlwidget-4e2b51612b3b0cac2d87\" style=\"width:100%;height:464px;\"></div>\n<script type=\"application/json\" data-for=\"htmlwidget-4e2b51612b3b0cac2d87\">{\"x\":{\"diagram\":\"\\ngraph LR\\n\\nexpressions --> R\\nquasiquotation --> R\\nevaluation --> R\\n\\nR --> HTML\\nR --> LaTeX\\n\"},\"evals\":[],\"jsHooks\":[]}</script>\n```\n\n:::\n:::\n\n\n<details>\n<summary>Mermaid code</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nDiagrammeR::mermaid(\"\ngraph LR\n\nexpressions --> R\nquasiquotation --> R\nevaluation --> R\n\nR --> HTML\nR --> LaTeX\n\")\n```\n:::\n\n\n</details>\n\n<details>\n<summary>Session Info</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(DiagrammeR) #for Mermaid flowchart\nlibrary(lobstr) #abstract syntax trees\nlibrary(purrr) #functional programming\nlibrary(rlang) #tidy evaluation\n\n# from section 18.5\nexpr_type <- function(x) {\n if (rlang::is_syntactic_literal(x)) {\n \"constant\"\n } else if (is.symbol(x)) {\n \"symbol\"\n } else if (is.call(x)) {\n \"call\"\n } else if (is.pairlist(x)) {\n \"pairlist\"\n } else {\n typeof(x)\n }\n}\nflat_map_chr <- function(.x, .f, ...) {\n purrr::flatten_chr(purrr::map(.x, .f, ...))\n}\nswitch_expr <- function(x, ...) {\n switch(expr_type(x),\n ...,\n stop(\"Don't know how to handle type \", typeof(x), call. = FALSE)\n )\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nutils::sessionInfo()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> R version 4.5.1 (2025-06-13 ucrt)\n#> Platform: x86_64-w64-mingw32/x64\n#> Running under: Windows 11 x64 (build 26100)\n#> \n#> Matrix products: default\n#> LAPACK version 3.12.1\n#> \n#> locale:\n#> [1] LC_COLLATE=English_United States.utf8 \n#> [2] LC_CTYPE=English_United States.utf8 \n#> [3] LC_MONETARY=English_United States.utf8\n#> [4] LC_NUMERIC=C \n#> [5] LC_TIME=English_United States.utf8 \n#> \n#> time zone: America/Chicago\n#> tzcode source: internal\n#> \n#> attached base packages:\n#> [1] stats graphics grDevices utils datasets methods base \n#> \n#> other attached packages:\n#> [1] rlang_1.1.6 purrr_1.1.0 lobstr_1.1.2 DiagrammeR_1.0.11\n#> \n#> loaded via a namespace (and not attached):\n#> [1] digest_0.6.37 RColorBrewer_1.1-3 R6_2.6.1 fastmap_1.2.0 \n#> [5] xfun_0.52 magrittr_2.0.3 glue_1.8.0 knitr_1.50 \n#> [9] htmltools_0.5.8.1 rmarkdown_2.29 lifecycle_1.0.4 cli_3.6.5 \n#> [13] visNetwork_2.1.2 vctrs_0.6.5 compiler_4.5.1 tools_4.5.1 \n#> [17] evaluate_1.0.4 yaml_2.3.10 jsonlite_2.0.0 htmlwidgets_1.6.4 \n#> [21] keyring_1.4.1\n```\n\n\n:::\n:::\n\n\n</details>\n\n## Case Study: MCQ\n\nWe are going to use R code to generate HTML or LaTeX to produce multiple-choice questions such as\n\n### Pop Quiz!\n\n1. What is the **derivative** of $f(x) = 1 + 2\\cos(3\\pi x + 4)$?\n\n a. $f'(x) = 6\\pi\\sin(3\\pi x + 4)$\n b. $f'(x) = -6\\pi\\sin(3\\pi x + 4)$\n c. $f'(x) = 24\\pi\\sin(3\\pi x + 4)$\n d. $f'(x) = -24\\pi\\sin(3\\pi x + 4)$\n\n\n\n\n\n---\n\nAs developers, we may be asking ourselves:\n\n* What are the expressions?\n* What are the symbols?\n* Will we have to quote inputs from the user (math teacher)?\n\n\n## HTML\n\nWe are trying to produce\n\n```{}\n<body>\n <h1 id = 'pop_quiz'>Pop Quiz</h1>\n <ol>\n <li>What is the <b>derivative</b> of $f(x) = 1 + 2\\cos(3\\pi x + 4)$?</li>\n <ol>\n <li>$f'(x) = 6\\pi\\sin(3\\pi x + 4)$</li>\n <li>$f'(x) = -6\\pi\\sin(3\\pi x + 4)$</li>\n <li>$f'(x) = 24\\pi\\sin(3\\pi x + 4)$</li>\n <li>$f'(x) = -24\\pi\\sin(3\\pi x + 4)$</li>\n </ol>\n </ol>\n <img src = 'calculus_cat.png' width = '100' height = '100' />\n</body>\n```\n\nusing DSL\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith_html(\n body(\n h1(\"Pop quiz!\", id = \"pop_quiz\"),\n ol(\n li(\"What is the \", b(\"derivative\"), \"of $f(x) = 1 + 2cos(3pi x + 4)$?\"),\n ol(\n li(\"$f'(x) = 6pi*sin(3pi x + 4)$\"),\n li(\"$f'(x) = -6pi*sin(3pi x + 4)$\"),\n li(\"$f'(x) = 24pi*sin(3pi x + 4)$\"),\n li(\"$f'(x) = -24pi*sin(3pi x + 4)$\")\n )\n ),\n img(src = \"images/translating/calculus_cat.png\", width = 100, height = 100)\n )\n)\n```\n:::\n\n\nIn particular,\n\n* **tags** such as `<b></b>` have *attributes*\n* **void tags** such as `<img />`\n* special characters: `&`, `<`, and `>`\n\n\n<details>\n<summary>HTML verification</summary>\n\n```{=html}\n<body>\n <h1 id = 'pop_quiz'>Pop Quiz</h1>\n <ol>\n <li>What is the <b>derivative</b> of $f(x) = 1 + 2\\cos(3\\pi x + 4)$?</li>\n <ol>\n <li>$f'(x) = 6\\pi\\sin(3\\pi x + 4)$</li>\n <li>$f'(x) = -6\\pi\\sin(3\\pi x + 4)$</li>\n <li>$f'(x) = 24\\pi\\sin(3\\pi x + 4)$</li>\n <li>$f'(x) = -24\\pi\\sin(3\\pi x + 4)$</li>\n </ol>\n </ol>\n <img src = 'images/translating/calculus_cat.png' width = '100' height = '100' />\n</body>\n```\n\n</details>\n\n\n## Escaping\n\n* need to escape `&`, `<`, and `>`\n* don't \"double escape\"\n* leave HTML alone\n\n### S3 Class\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhtml <- function(x) structure(x, class = \"advr_html\")\n\n#dispatch\nprint.advr_html <- function(x, ...) {\n out <- paste0(\"<HTML> \", x)\n cat(paste(strwrap(out), collapse = \"\\n\"), \"\\n\", sep = \"\")\n}\n```\n:::\n\n\n### Generic\n\n\n::: {.cell}\n\n```{.r .cell-code}\nescape <- function(x) UseMethod(\"escape\")\nescape.character <- function(x) {\n x <- gsub(\"&\", \"&\", x)\n x <- gsub(\"<\", \"<\", x)\n x <- gsub(\">\", \">\", x)\n html(x)\n}\nescape.advr_html <- function(x) x\n```\n:::\n\n\n### Checks\n\n\n::: {.cell}\n\n```{.r .cell-code}\nescape(\"This is some text.\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <HTML> This is some text.\n```\n\n\n:::\n\n```{.r .cell-code}\nescape(\"x > 1 & y < 2\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <HTML> x > 1 & y < 2\n```\n\n\n:::\n\n```{.r .cell-code}\nescape(escape(\"This is some text. 1 > 2\")) #double escape\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <HTML> This is some text. 1 > 2\n```\n\n\n:::\n\n```{.r .cell-code}\nescape(html(\"<hr />\")) #already html\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <HTML> <hr />\n```\n\n\n:::\n:::\n\n\n\n## Named Components\n\n```{}\nli(\"What is the \", b(\"derivative\"), \"of $f(x) = 1 + 2\\cos(3\\pi x + 4)$?\")\n```\n\n* aiming to classify `li` and `b` as **named components**\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndots_partition <- function(...) {\n dots <- list2(...)\n \n if (is.null(names(dots))) {\n is_named <- rep(FALSE, length(dots))\n} else {\n is_named <- names(dots) != \"\"\n}\n \n list(\n named = dots[is_named],\n unnamed = dots[!is_named]\n )\n}\n```\n:::\n\n\n### Check\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(dots_partition(company = \"Posit\",\n software = \"RStudio\",\n \"DSLC\",\n \"Cohort 9\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 2\n#> $ named :List of 2\n#> ..$ company : chr \"Posit\"\n#> ..$ software: chr \"RStudio\"\n#> $ unnamed:List of 2\n#> ..$ : chr \"DSLC\"\n#> ..$ : chr \"Cohort 9\"\n```\n\n\n:::\n:::\n\n\n<details>\n<summary>HTML Attributes</summary>\n\nFound among the textbook's [source code](https://github.com/hadley/adv-r/blob/master/dsl-html-attributes.r)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhtml_attributes <- function(list) {\n if (length(list) == 0) return(\"\")\n\n attr <- map2_chr(names(list), list, html_attribute)\n paste0(\" \", unlist(attr), collapse = \"\")\n}\nhtml_attribute <- function(name, value = NULL) {\n if (length(value) == 0) return(name) # for attributes with no value\n if (length(value) != 1) stop(\"`value` must be NULL or length 1\")\n\n if (is.logical(value)) {\n # Convert T and F to true and false\n value <- tolower(value)\n } else {\n value <- escape_attr(value)\n }\n paste0(name, \"='\", value, \"'\")\n}\nescape_attr <- function(x) {\n x <- escape.character(x)\n x <- gsub(\"\\'\", ''', x)\n x <- gsub(\"\\\"\", '"', x)\n x <- gsub(\"\\r\", '
', x)\n x <- gsub(\"\\n\", '
', x)\n x\n}\n```\n:::\n\n\n\n</details>\n\n\n## Tags (calls)\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntag <- function(tag) {\n new_function(\n exprs(... = ), #arguments of new function\n expr({ #body of the new function\n \n #classify tags as named components\n dots <- dots_partition(...)\n \n #focus on named components as the tags\n attribs <- html_attributes(dots$named)\n \n # otherwise, nested code\n children <- map_chr(dots$unnamed, escape)\n\n # paste brackets, tag names, and attributes together\n # then unquote user arguments\n html(paste0(\n !!paste0(\"<\", tag), attribs, \">\",\n paste(children, collapse = \"\"),\n !!paste0(\"</\", tag, \">\")\n ))\n }),\n caller_env() #return the environment\n )\n}\n```\n:::\n\n\n<details>\n<summary>Void tags</summary>\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvoid_tag <- function(tag) {\n new_function(\n exprs(... = ), #allows for missing arguments\n expr({\n dots <- dots_partition(...)\n \n # error check\n if (length(dots$unnamed) > 0) {\n abort(!!paste0(\"<\", tag, \"> must not have unnamed arguments\"))\n }\n attribs <- html_attributes(dots$named)\n\n html(paste0(!!paste0(\"<\", tag), attribs, \" />\"))\n }),\n caller_env()\n )\n}\n```\n:::\n\n\n</details>\n\n### Checks\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntag(\"ol\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (...) \n#> {\n#> dots <- dots_partition(...)\n#> attribs <- html_attributes(dots$named)\n#> children <- map_chr(dots$unnamed, escape)\n#> html(paste0(\"<ol\", attribs, \">\", paste(children, collapse = \"\"), \n#> \"</ol>\"))\n#> }\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nimg <- void_tag(\"img\")\n```\n:::\n\n\n\n```{.r .cell-code}\nimg()\n```\n\n<HTML> <img />\n\n\n::: {.cell}\n\n```{.r .cell-code}\nimg(src = \"images/translating/calculus_cat.png\",\n width = 100,\n height = 100)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <HTML> <img src='images/translating/calculus_cat.png' width='100'\n#> height='100' />\n```\n\n\n:::\n:::\n\n\n\n## Tags (processing)\n\n<details>\n<summary>Venn Diagram</summary>\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntags <- c(\"a\", \"abbr\", \"address\", \"article\", \"aside\", \"audio\",\n \"b\",\"bdi\", \"bdo\", \"blockquote\", \"body\", \"button\", \"canvas\",\n \"caption\",\"cite\", \"code\", \"colgroup\", \"data\", \"datalist\",\n \"dd\", \"del\",\"details\", \"dfn\", \"div\", \"dl\", \"dt\", \"em\",\n \"eventsource\",\"fieldset\", \"figcaption\", \"figure\", \"footer\",\n \"form\", \"h1\", \"h2\", \"h3\", \"h4\", \"h5\", \"h6\", \"head\", \"header\",\n \"hgroup\", \"html\", \"i\",\"iframe\", \"ins\", \"kbd\", \"label\",\n \"legend\", \"li\", \"mark\", \"map\",\"menu\", \"meter\", \"nav\",\n \"noscript\", \"object\", \"ol\", \"optgroup\", \"option\", \"output\",\n \"p\", \"pre\", \"progress\", \"q\", \"ruby\", \"rp\",\"rt\", \"s\", \"samp\",\n \"script\", \"section\", \"select\", \"small\", \"span\", \"strong\",\n \"style\", \"sub\", \"summary\", \"sup\", \"table\", \"tbody\", \"td\",\n \"textarea\", \"tfoot\", \"th\", \"thead\", \"time\", \"title\", \"tr\",\n \"u\", \"ul\", \"var\", \"video\"\n)\n\nvoid_tags <- c(\"area\", \"base\", \"br\", \"col\", \"command\", \"embed\",\n \"hr\", \"img\", \"input\", \"keygen\", \"link\", \"meta\", \"param\",\n \"source\", \"track\", \"wbr\"\n)\n```\n:::\n\n\n</details>\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhtml_tags <- c(\n tags |> #list of tag names from HTML\n set_names() |> #named variable to avoid reserved words!\n map(tag), #make them function calls\n void_tags |>\n set_names() |>\n map(void_tag)\n)\n```\n:::\n\n\n\n### Example\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhtml_tags$ol(\n html_tags$li(\"What is the \", \n html_tags$b(\"derivative\"),\n \"of $f(x) = 1 + 2cos(3pi x + 4)$?\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <HTML> <ol><li>What is the <b>derivative</b>of $f(x) = 1 + 2cos(3pi x +\n#> 4)$?</li></ol>\n```\n\n\n:::\n:::\n\n\n\n## Bringing the HTML Together\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith_html <- function(code) {\n eval_tidy(enquo(code), html_tags)\n}\n```\n:::\n\n\n### Main Example\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwith_html(\n body(\n h1(\"Pop quiz!\", id = \"pop_quiz\"),\n ol(\n li(\"What is the \", b(\"derivative\"), \"of $f(x) = 1 + 2cos(3pi x + 4)$?\"),\n ol(\n li(\"$f'(x) = 6pi*sin(3pi x + 4)$\"),\n li(\"$f'(x) = -6pi*sin(3pi x + 4)$\"),\n li(\"$f'(x) = 24pi*sin(3pi x + 4)$\"),\n li(\"$f'(x) = -24pi*sin(3pi x + 4)$\")\n )\n ),\n img(src = \"images/translating/calculus_cat.png\", width = 100, height = 100)\n )\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <HTML> <body><h1 id='pop_quiz'>Pop quiz!</h1><ol><li>What is the\n#> <b>derivative</b>of $f(x) = 1 + 2cos(3pi x + 4)$?</li><ol><li>$f'(x) =\n#> 6pi*sin(3pi x + 4)$</li><li>$f'(x) = -6pi*sin(3pi x +\n#> 4)$</li><li>$f'(x) = 24pi*sin(3pi x + 4)$</li><li>$f'(x) =\n#> -24pi*sin(3pi x + 4)$</li></ol></ol><img\n#> src='images/translating/calculus_cat.png' width='100' height='100'\n#> /></body>\n```\n\n\n:::\n:::\n\n\n### Check\n\n```{=html}\n<h1 id='pop_quiz'>Pop quiz!</h1><ol><li>What is the <b>derivative</b> of $f(x) = 1 + 2cos(3pi x + 4)$?</li><ol><li>$f'(x) = 6pi*sin(3pi x + 4)$</li><li>$f'(x) = -6pi*sin(3pi x + 4)$</li><li>$f'(x) = 24pi*sin(3pi x + 4)$</li><li>$f'(x) = -24pi*sin(3pi x + 4)$</li></ol></ol><img src='images/translating/calculus_cat.png' width='100' height='100' />\n```\n\n\n## LaTeX\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlatex <- function(x) structure(x, class = \"advr_latex\")\nprint.advr_latex <- function(x) { cat(\"<LATEX> \", x, \"\\n\", sep = \"\") }\n```\n:::\n\n\n### to_math\n\n\n::: {.cell}\n\n```{.r .cell-code}\nto_math <- function(x) {\n expr <- enexpr(x)\n latex( #return LaTeX code\n eval_bare( #eval_bare to ensure use of latex environment \n expr, #expression (not quosure)\n latex_env(expr) #need to define latex_env\n ))\n}\n```\n:::\n\n\n## Known Symbols\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngreek_letters <- c(\n \"alpha\", \"beta\", \"chi\", \"delta\", \"Delta\", \"epsilon\", \"eta\", \n\"gamma\", \"Gamma\", \"iota\", \"kappa\", \"lambda\", \"Lambda\", \"mu\", \n\"nu\", \"omega\", \"Omega\", \"phi\", \"Phi\", \"pi\", \"Pi\", \"psi\", \"Psi\", \n\"rho\", \"sigma\", \"Sigma\", \"tau\", \"theta\", \"Theta\", \"upsilon\", \n\"Upsilon\", \"varepsilon\", \"varphi\", \"varrho\", \"vartheta\", \"xi\", \n\"Xi\", \"zeta\"\n)\n\ngreek_env <- rlang::as_environment(\n rlang::set_names(\n paste0(\"\\\\\", greek_letters), #latex values\n greek_letters #R names\n )\n)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(as.list(greek_env))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 38\n#> $ zeta : chr \"\\\\zeta\"\n#> $ Xi : chr \"\\\\Xi\"\n#> $ xi : chr \"\\\\xi\"\n#> $ vartheta : chr \"\\\\vartheta\"\n#> $ varrho : chr \"\\\\varrho\"\n#> $ varphi : chr \"\\\\varphi\"\n#> $ varepsilon: chr \"\\\\varepsilon\"\n#> $ Upsilon : chr \"\\\\Upsilon\"\n#> $ upsilon : chr \"\\\\upsilon\"\n#> $ Theta : chr \"\\\\Theta\"\n#> $ theta : chr \"\\\\theta\"\n#> $ tau : chr \"\\\\tau\"\n#> $ Sigma : chr \"\\\\Sigma\"\n#> $ sigma : chr \"\\\\sigma\"\n#> $ rho : chr \"\\\\rho\"\n#> $ Psi : chr \"\\\\Psi\"\n#> $ psi : chr \"\\\\psi\"\n#> $ Pi : chr \"\\\\Pi\"\n#> $ pi : chr \"\\\\pi\"\n#> $ Phi : chr \"\\\\Phi\"\n#> $ phi : chr \"\\\\phi\"\n#> $ Omega : chr \"\\\\Omega\"\n#> $ omega : chr \"\\\\omega\"\n#> $ nu : chr \"\\\\nu\"\n#> $ mu : chr \"\\\\mu\"\n#> $ Lambda : chr \"\\\\Lambda\"\n#> $ lambda : chr \"\\\\lambda\"\n#> $ kappa : chr \"\\\\kappa\"\n#> $ iota : chr \"\\\\iota\"\n#> $ Gamma : chr \"\\\\Gamma\"\n#> $ gamma : chr \"\\\\gamma\"\n#> $ eta : chr \"\\\\eta\"\n#> $ epsilon : chr \"\\\\epsilon\"\n#> $ Delta : chr \"\\\\Delta\"\n#> $ delta : chr \"\\\\delta\"\n#> $ chi : chr \"\\\\chi\"\n#> $ beta : chr \"\\\\beta\"\n#> $ alpha : chr \"\\\\alpha\"\n```\n\n\n:::\n:::\n\n\n\n## Known Functions\n\n### Unary Operations\n\n\n::: {.cell}\n\n```{.r .cell-code}\nunary_op <- function(left, right) {\n new_function(\n exprs(e1 = ),\n expr(\n paste0(!!left, e1, !!right)\n ),\n caller_env()\n )\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#example\nunary_op(\"\\\\sqrt{\", \"}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (e1) \n#> paste0(\"\\\\sqrt{\", e1, \"}\")\n```\n\n\n:::\n:::\n\n\n### Binary Operations\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbinary_op <- function(sep) {\n new_function(\n exprs(e1 = , e2 = ),\n expr(\n paste0(e1, !!sep, e2)\n ),\n caller_env()\n )\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#example\nbinary_op(\"+\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (e1, e2) \n#> paste0(e1, \"+\", e2)\n```\n\n\n:::\n:::\n\n\n<details>\n<summary>Even more LaTeX syntax</summary>\n\n\n::: {.cell}\n\n```{.r .cell-code}\nknown_func_env <- child_env(\n .parent = empty_env(),\n \n # Binary operators\n `+` = binary_op(\" + \"),\n `-` = binary_op(\" - \"),\n `*` = binary_op(\" * \"),\n `/` = binary_op(\" / \"),\n `^` = binary_op(\"^\"),\n `[` = binary_op(\"_\"),\n\n # Grouping\n `{` = unary_op(\"\\\\left{ \", \" \\\\right}\"),\n `(` = unary_op(\"\\\\left( \", \" \\\\right)\"),\n paste = paste,\n\n # Other math functions\n sqrt = unary_op(\"\\\\sqrt{\", \"}\"),\n sin = unary_op(\"\\\\sin(\", \")\"),\n cos = unary_op(\"\\\\cos(\", \")\"),\n tan = unary_op(\"\\\\tan(\", \")\"),\n log = unary_op(\"\\\\log(\", \")\"),\n abs = unary_op(\"\\\\left| \", \"\\\\right| \"),\n frac = function(a, b) {\n paste0(\"\\\\frac{\", a, \"}{\", b, \"}\")\n },\n\n # Labelling\n hat = unary_op(\"\\\\hat{\", \"}\"),\n tilde = unary_op(\"\\\\tilde{\", \"}\")\n)\n```\n:::\n\n\n</details>\n\n\n## Unknown Symbols\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames_grabber <- function(x) {\n switch_expr(x,\n constant = character(),\n symbol = as.character(x),\n call = flat_map_chr(as.list(x[-1]), names_grabber)\n ) |>\n unique()\n}\n```\n:::\n\n\n$$x + y + f(a, b, c, 10)$$\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames_grabber(expr(x + y + f(a, b, c, 10)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"x\" \"y\" \"a\" \"b\" \"c\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ast(expr(x + y + f(a, b, c, 10)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─expr \n#> └─█─`+` \n#> ├─█─`+` \n#> │ ├─x \n#> │ └─y \n#> └─█─f \n#> ├─a \n#> ├─b \n#> ├─c \n#> └─10\n```\n\n\n:::\n:::\n\n\n\n## Unknown Functions\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncalls_grabber <- function(x) {\n switch_expr(x,\n constant = ,\n symbol = character(),\n call = {\n fname <- as.character(x[[1]])\n children <- flat_map_chr(as.list(x[-1]), calls_grabber)\n c(fname, children)\n }\n ) |>\n unique()\n}\n```\n:::\n\n\n$$f(g + b, c, d(a))$$\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames_grabber(expr(f(g + b, c, d(a))))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"g\" \"b\" \"c\" \"a\"\n```\n\n\n:::\n\n```{.r .cell-code}\ncalls_grabber(expr(f(g + b, c, d(a))))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"f\" \"+\" \"d\"\n```\n\n\n:::\n\n```{.r .cell-code}\nlobstr::ast(expr(f(g + b, c, d(a))))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> █─expr \n#> └─█─f \n#> ├─█─`+` \n#> │ ├─g \n#> │ └─b \n#> ├─c \n#> └─█─d \n#> └─a\n```\n\n\n:::\n:::\n\n\n---\n\n\n::: {.cell}\n\n```{.r .cell-code}\nseek_closure <- function(op) {\n # change math font for function names\n # apply ending parenthesis\n new_function(\n exprs(... = ),\n expr({\n contents <- paste(..., collapse = \", \")\n paste0(!!paste0(\"\\\\mathrm{\", op, \"}(\"), contents, \")\")\n })\n )\n}\n```\n:::\n\n\n## Bringing the LaTeX Together\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlatex_env <- function(expr) {\n \n # Unknown Functions\n calls <- calls_grabber(expr)\n call_list <- map(set_names(calls), seek_closure)\n call_env <- as_environment(call_list)\n\n # Known Functions\n known_func_env <- env_clone(known_func_env, call_env)\n\n # Unknown Symbols\n names <- names_grabber(expr)\n symbol_env <- as_environment(set_names(names), parent = known_func_env)\n\n # Known symbols\n greek_env <- env_clone(greek_env, parent = symbol_env)\n greek_env\n}\n\nto_math <- function(x) {\n expr <- enexpr(x)\n latex( #return LaTeX code\n eval_bare( #eval_bare to ensure use of latex environment \n expr, #expression (not quosure)\n latex_env(expr) #need to define latex_env\n ))\n}\n```\n:::\n\n\n### Check\n\n\n::: {.cell}\n\n```{.r .cell-code}\nto_math(sin(pi) + f(a))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <LATEX> \\sin(\\pi) + \\mathrm{f}(a)\n```\n\n\n:::\n:::\n\n\n## Finishing the Example\n\n(TO DO)\n", + "supporting": [ + "21_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/22/execute-results/html.json b/_freeze/slides/22/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "b9f62429ee002b7e43157ec353c42485", + "hash": "c41d31acf029963308762ac94aeb9b3e", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Debugging\n---\n\n## Learning objectives:\n\n- General strategy for finding and fixing errors.\n\n- Explore the `traceback()` function to locate exactly where an error occurred\n\n- Explore how to pause the execution of a function and launch environment where we can interactively explore what’s happening\n\n- Explore debugging when you’re running code non-interactively\n\n- Explore non-error problems that occasionally also need debugging\n\n## Introduction {-}\n\n> Finding bug in code, is a process of confirming the many things that we believe are true — until we find one which is not true.\n\n**—Norm Matloff**\n\n> Debugging is like being the detective in a crime movie where you're also the murderer. \n\n**-Filipe Fortes**\n\n### Strategies for finding and fixing errors {-}\n\n#### Google! {-}\nWhenever you see an error message, start by googling it. We can automate this process with the [{errorist}](https://github.com/coatless-rpkg/errorist) and [{searcher}](https://github.com/coatless-rpkg/searcher) packages. \n\n#### Make it repeatable {-}\nTo find the root cause of an error, you’re going to need to execute the code many times as you consider and reject hypotheses. It’s worth some upfront investment to make the problem both easy and fast to reproduce.\n\n#### Figure out where it is {-}\nTo find the bug, adopt the scientific method: **generate hypotheses**, **design experiments to test them**, and **record your results**. This may seem like a lot of work, but a systematic approach will end up saving you time. \n\n#### Fix it and test it {-}\nOnce you’ve found the bug, you need to figure out how to fix it and to check that the fix actually worked. It’s very useful to have automated tests in place. \n\n## Locating errors {-}\nThe most important tool for finding errors is `traceback()`, which shows you the sequence of calls (also known as the **call stack**) that lead to the error.\n\n- Here’s a simple example where `f()` calls `g()` calls `h()` calls `i()`, which checks if its argument is numeric:\n\n\nWhen we run `f(\"a\")` code in RStudio we see:\n\n\n\n\nIf you click **“Show traceback”** you see:\n\n\n\nYou read the `traceback()` output from bottom to top: the initial call is `f()`, which calls `g()`, then `h()`, then `i()`, which triggers the error. \n\n## Lazy evaluation {-}\nOne drawback to `traceback()` is that it always **linearises** the call tree, which can be confusing if there is much lazy evaluation involved. For example, take the following example where the error happens when evaluating the first argument to `f()`:\n\n\n\n\n\nNote: `rlang::with_abort()` is no longer an exported object from 'namespace:rlang'. There is an [open issue](https://github.com/hadley/adv-r/issues/1740) about a fix for the chapter but no drop-in replacement.\n\n\n## Interactive debugger {-}\nEnter the interactive debugger is wwith RStudio’s **“Rerun with Debug”** tool. This reruns the command that created the error, pausing execution where the error occurred. Otherwise, you can insert a call to `browser()` where you want to pause, and re-run the function. \n\n\n\n`browser()` is just a regular function call which means that you can run it conditionally by wrapping it in an `if` statement:\n\n\n\n\n\n\n## `browser()` commands {-}\n`browser()` provides a few special commands. \n\n\n\n- Next, `n`: executes the next step in the function.\n\n- Step into, or `s`: works like next, but if the next step is a function, it will step into that function so you can explore it interactively.\n\n- Finish, or `f`: finishes execution of the current loop or function.\n\n- Continue, `c`: leaves interactive debugging and continues regular execution of the function. \n- Stop, `Q`: stops debugging, terminates the function, and returns to the global workspace. \n\n\n## Alternatives {-}\nThere are three alternatives to using `browser()`: setting breakpoints in RStudio, `options(error = recover)`, and `debug()` and other related functions.\n\n## Breakpoints {-}\nIn RStudio, you can set a breakpoint by clicking to the left of the line number, or pressing **Shift + F9.** There are two small downsides to breakpoints:\n\n- There are a few unusual situations in which breakpoints will not work. [Read breakpoint troubleshooting for more details](https://support.posit.co/hc/en-us/articles/200534337-Breakpoint-Troubleshooting)\n\n- RStudio currently does not support conditional breakpoints.\n\n## `recover()` {-}\nWhen you set `options(error = recover)`, when you get an error, you’ll get an interactive prompt that displays the traceback and gives you the ability to interactively debug inside any of the frames:\n\n\nYou can return to default error handling with `options(error = NULL)`.\n\n## `debug()` {-}\n\nAnother approach is to call a function that inserts the `browser()` call:\n\n- `debug()` inserts a browser statement in the first line of the specified function. undebug() removes it. \n\n- `utils::setBreakpoint()` works similarly, but instead of taking a function name, it takes a file name and line number and finds the appropriate function for you.\n\n\n## Call stack {-}\nThe call stacks printed by `traceback()`, `browser()` & `where`, and `recover()` are not consistent. \n\n\n\nRStudio displays calls in the same order as `traceback()`. rlang functions use the same ordering and numbering as `recover()`, but also use indenting to reinforce the hierarchy of calls.\n\n## Non-interactive debugging {-}\n\nWhen you can’t explore interactively...\n\n### `callr::r()` {-}\n\n`callr::r(f, list(1, 2))` calls `f(1, 2)` in a fresh session to help diagnose:\n\n- Is the global environment different? Have you loaded different packages? Are objects left from previous sessions causing differences?\n\n- Is the working directory different?\n\n- Is the `PATH` environment variable different?\n\n- Is the `R_LIBS` environment variable different?\n\n### `dump.frames()` {-}\n\n`dump.frames()` is the equivalent to `recover()` for non-interactive code.\n\n\n\n### Print debugging {-}\n\nInsert numerous print statements to precisely locate the problem, and see the values of important variables. Print debugging is particularly useful for compiled code.\n\n\n\n\n### RMarkdown {-}\n\n- If you’re knitting the file using RStudio, switch to calling `rmarkdown::render(\"path/to/file.Rmd\")` instead to run the code in the current session. \n\n- For interactive debugging, you’ll need to call `sink()` in the error handler. For example, to use `recover()` with RMarkdown, you’d put the following code in your setup block:\n\n{height=\"110\"}\n\n\n\n## Non-error failures {-}\nThere are other ways for a function to fail apart from throwing an error:\n\n- A function may generate an unexpected warning. Convert warnings into errors with `options(warn = 2)` and use the the call stack.\n\n- A function may generate an unexpected message. The removal of `with_abort()` from {rlang} breaks this solution.\n\n- A function might never return. \n\n- The worst scenario is that your code might crash R completely, leaving you with no way to interactively debug your code. This indicates a bug in compiled (C or C++) code.\n\n## Link to some useful resources on debugging {-}\n\n- Jenny Bryan's [\"Object of type closure is not subsettable\"](https://github.com/jennybc/debugging#readme) talk from rstudio::conf 2020\n\n- Jenny Bryan and Jim Hester's book: [\"What They Forgot to Teach You About R\"](https://rstats.wtf/debugging-r) Ch12\n\n- Hadley's video on a [minimal reprex for a shiny app](https://www.youtube.com/watch?v=9w8ANOAlWy4) \n\n## Meeting Videos {-}\n\n### Cohort 1 {-}\n\n<iframe src=\"https://www.youtube.com/embed/ROMefwMuqXU\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2 {-}\n\n<iframe src=\"https://www.youtube.com/embed/N43p4txxxlY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3 {-}\n\n<iframe src=\"https://www.youtube.com/embed/Jdb00nepeWQ\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4 {-}\n\n<iframe src=\"https://www.youtube.com/embed/tOql7ZD6P58\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5 {-}\n\n<iframe src=\"https://www.youtube.com/embed/EqsSWUQ6ZW0\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6 {-}\n\n<iframe src=\"https://www.youtube.com/embed/YvT-knh1baA\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:12:43\tTrevin Flickinger:\tHello everyone!\n00:13:03\tOluwafemi Oyedele:\tHello, Good evening!!!\n00:22:10\tTrevin Flickinger:\tMy connection is slow so I’ll be in the chat\n00:32:45\tTrevin Flickinger:\tIf you start with “continue” it should error out after the first call\n00:56:18\tTrevin Flickinger:\tSys.frame(-1) shows it goes back one frame\n00:59:55\tfg:\tthanks\n01:03:11\tArthur Shaw:\tAnyone else lose the presentation?\n01:03:20\tfg:\tyes\n01:03:22\tfg:\t?\n01:04:26\tTrevin Flickinger:\tI thought that was my internet connection\n01:05:07\tTrevin Flickinger:\tThank you!\n01:08:42\tTrevin Flickinger:\tI need to use debug( ) more as well\n01:10:15\tTrevin Flickinger:\t21st works for me\n01:10:29\tOluwafemi Oyedele:\tSame here!!!\n```\n</details>\n\n### Cohort 7 {-}\n\n<iframe src=\"https://www.youtube.com/embed/T_uFW9xXoJk\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: Debugging\n---\n\n## Learning objectives:\n\n- General strategy for finding and fixing errors.\n\n- Explore the `traceback()` function to locate exactly where an error occurred\n\n- Explore how to pause the execution of a function and launch environment where we can interactively explore what’s happening\n\n- Explore debugging when you’re running code non-interactively\n\n- Explore non-error problems that occasionally also need debugging\n\n## Introduction {-}\n\n> Finding bug in code, is a process of confirming the many things that we believe are true — until we find one which is not true.\n\n**—Norm Matloff**\n\n> Debugging is like being the detective in a crime movie where you're also the murderer. \n\n**-Filipe Fortes**\n\n### Strategies for finding and fixing errors {-}\n\n#### Google! {-}\nWhenever you see an error message, start by googling it. We can automate this process with the [{errorist}](https://github.com/coatless-rpkg/errorist) and [{searcher}](https://github.com/coatless-rpkg/searcher) packages. \n\n#### Make it repeatable {-}\nTo find the root cause of an error, you’re going to need to execute the code many times as you consider and reject hypotheses. It’s worth some upfront investment to make the problem both easy and fast to reproduce.\n\n#### Figure out where it is {-}\nTo find the bug, adopt the scientific method: **generate hypotheses**, **design experiments to test them**, and **record your results**. This may seem like a lot of work, but a systematic approach will end up saving you time. \n\n#### Fix it and test it {-}\nOnce you’ve found the bug, you need to figure out how to fix it and to check that the fix actually worked. It’s very useful to have automated tests in place. \n\n## Locating errors {-}\nThe most important tool for finding errors is `traceback()`, which shows you the sequence of calls (also known as the **call stack**) that lead to the error.\n\n- Here’s a simple example where `f()` calls `g()` calls `h()` calls `i()`, which checks if its argument is numeric:\n\n\nWhen we run `f(\"a\")` code in RStudio we see:\n\n\n\n\nIf you click **“Show traceback”** you see:\n\n\n\nYou read the `traceback()` output from bottom to top: the initial call is `f()`, which calls `g()`, then `h()`, then `i()`, which triggers the error. \n\n## Lazy evaluation {-}\nOne drawback to `traceback()` is that it always **linearises** the call tree, which can be confusing if there is much lazy evaluation involved. For example, take the following example where the error happens when evaluating the first argument to `f()`:\n\n\n\n\n\nNote: `rlang::with_abort()` is no longer an exported object from 'namespace:rlang'. There is an [open issue](https://github.com/hadley/adv-r/issues/1740) about a fix for the chapter but no drop-in replacement.\n\n\n## Interactive debugger {-}\nEnter the interactive debugger is wwith RStudio’s **“Rerun with Debug”** tool. This reruns the command that created the error, pausing execution where the error occurred. Otherwise, you can insert a call to `browser()` where you want to pause, and re-run the function. \n\n\n\n`browser()` is just a regular function call which means that you can run it conditionally by wrapping it in an `if` statement:\n\n\n\n\n\n\n## `browser()` commands {-}\n`browser()` provides a few special commands. \n\n\n\n- Next, `n`: executes the next step in the function.\n\n- Step into, or `s`: works like next, but if the next step is a function, it will step into that function so you can explore it interactively.\n\n- Finish, or `f`: finishes execution of the current loop or function.\n\n- Continue, `c`: leaves interactive debugging and continues regular execution of the function. \n- Stop, `Q`: stops debugging, terminates the function, and returns to the global workspace. \n\n\n## Alternatives {-}\nThere are three alternatives to using `browser()`: setting breakpoints in RStudio, `options(error = recover)`, and `debug()` and other related functions.\n\n## Breakpoints {-}\nIn RStudio, you can set a breakpoint by clicking to the left of the line number, or pressing **Shift + F9.** There are two small downsides to breakpoints:\n\n- There are a few unusual situations in which breakpoints will not work. [Read breakpoint troubleshooting for more details](https://support.posit.co/hc/en-us/articles/200534337-Breakpoint-Troubleshooting)\n\n- RStudio currently does not support conditional breakpoints.\n\n## `recover()` {-}\nWhen you set `options(error = recover)`, when you get an error, you’ll get an interactive prompt that displays the traceback and gives you the ability to interactively debug inside any of the frames:\n\n\nYou can return to default error handling with `options(error = NULL)`.\n\n## `debug()` {-}\n\nAnother approach is to call a function that inserts the `browser()` call:\n\n- `debug()` inserts a browser statement in the first line of the specified function. undebug() removes it. \n\n- `utils::setBreakpoint()` works similarly, but instead of taking a function name, it takes a file name and line number and finds the appropriate function for you.\n\n\n## Call stack {-}\nThe call stacks printed by `traceback()`, `browser()` & `where`, and `recover()` are not consistent. \n\n\n\nRStudio displays calls in the same order as `traceback()`. rlang functions use the same ordering and numbering as `recover()`, but also use indenting to reinforce the hierarchy of calls.\n\n## Non-interactive debugging {-}\n\nWhen you can’t explore interactively...\n\n### `callr::r()` {-}\n\n`callr::r(f, list(1, 2))` calls `f(1, 2)` in a fresh session to help diagnose:\n\n- Is the global environment different? Have you loaded different packages? Are objects left from previous sessions causing differences?\n\n- Is the working directory different?\n\n- Is the `PATH` environment variable different?\n\n- Is the `R_LIBS` environment variable different?\n\n### `dump.frames()` {-}\n\n`dump.frames()` is the equivalent to `recover()` for non-interactive code.\n\n\n\n### Print debugging {-}\n\nInsert numerous print statements to precisely locate the problem, and see the values of important variables. Print debugging is particularly useful for compiled code.\n\n\n\n\n### RMarkdown {-}\n\n- If you’re knitting the file using RStudio, switch to calling `rmarkdown::render(\"path/to/file.Rmd\")` instead to run the code in the current session. \n\n- For interactive debugging, you’ll need to call `sink()` in the error handler. For example, to use `recover()` with RMarkdown, you’d put the following code in your setup block:\n\n{height=\"110\"}\n\n\n\n## Non-error failures {-}\nThere are other ways for a function to fail apart from throwing an error:\n\n- A function may generate an unexpected warning. Convert warnings into errors with `options(warn = 2)` and use the the call stack.\n\n- A function may generate an unexpected message. The removal of `with_abort()` from {rlang} breaks this solution.\n\n- A function might never return. \n\n- The worst scenario is that your code might crash R completely, leaving you with no way to interactively debug your code. This indicates a bug in compiled (C or C++) code.\n\n## Link to some useful resources on debugging {-}\n\n- Jenny Bryan's [\"Object of type closure is not subsettable\"](https://github.com/jennybc/debugging#readme) talk from rstudio::conf 2020\n\n- Jenny Bryan and Jim Hester's book: [\"What They Forgot to Teach You About R\"](https://rstats.wtf/debugging-r) Ch12\n\n- Hadley's video on a [minimal reprex for a shiny app](https://www.youtube.com/watch?v=9w8ANOAlWy4) \n", + "supporting": [ + "22_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/23/execute-results/html.json b/_freeze/slides/23/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "7cfd05e332b54b87cde2a852cde1f19b", + "hash": "498fd72af1822effcb92021a43a9d7c2", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Measuring performance\n---\n\n## Learning objectives:\n\n- Understand how to improve your code for making it faster\n- Learn what are the tools for improving your code\n- Test how to profile your code\n\n\n## Introduction\n\n> \"Before you can make your code faster, you first need to figure out what’s making it slow.\"\n\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n{fig-align='center' width=348}\n:::\n:::\n\n\n\n- **profile** your code: measure the run-time of each line of code using realistic inputs\n- **experiment** with alternatives to find faster code\n- **microbenchmark** to measure the difference in performance.\n\n\n\n## Profiling\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(profvis)\nlibrary(bench)\n```\n:::\n\n\n\nThe tool to use is a **profiler**, it allows for **sampling** the code performance through stopping the execution of code every few milliseconds and recording all the steps.\n\nExample:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- function() {\n pause(0.1)\n g()\n h()\n}\ng <- function() {\n pause(0.1)\n h()\n}\nh <- function() {\n pause(0.1)\n}\n```\n:::\n\n\nProfile the execution of f():\n\n profvis::pause() is used instead of Sys.sleep()\n profile f(), with utils::Rprof()\n \n\n::: {.cell}\n\n```{.r .cell-code}\ntmp <- tempfile()\nRprof(tmp, interval = 0.1)\nf()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\nRprof(NULL)\nwriteLines(readLines(tmp))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> sample.interval=100000\n#> \"pause\" \"f\" \"eval\" \"eval\" \"withVisible\" \"withCallingHandlers\" \"eval\" \"eval\" \"with_handlers\" \"doWithOneRestart\" \"withOneRestart\" \"withRestartList\" \"doWithOneRestart\" \"withOneRestart\" \"withRestartList\" \"withRestarts\" \"evaluate::evaluate\" \"evaluate\" \"in_dir\" \"in_input_dir\" \"eng_r\" \"block_exec\" \"call_block\" \"process_group\" \"withCallingHandlers\" \"xfun:::handle_error\" \"process_file\" \"knitr::knit\" \"rmarkdown::render\" \"execute\" \".main\" \n#> \"pause\" \"g\" \"f\" \"eval\" \"eval\" \"withVisible\" \"withCallingHandlers\" \"eval\" \"eval\" \"with_handlers\" \"doWithOneRestart\" \"withOneRestart\" \"withRestartList\" \"doWithOneRestart\" \"withOneRestart\" \"withRestartList\" \"withRestarts\" \"evaluate::evaluate\" \"evaluate\" \"in_dir\" \"in_input_dir\" \"eng_r\" \"block_exec\" \"call_block\" \"process_group\" \"withCallingHandlers\" \"xfun:::handle_error\" \"process_file\" \"knitr::knit\" \"rmarkdown::render\" \"execute\" \".main\" \n#> \"pause\" \"h\" \"g\" \"f\" \"eval\" \"eval\" \"withVisible\" \"withCallingHandlers\" \"eval\" \"eval\" \"with_handlers\" \"doWithOneRestart\" \"withOneRestart\" \"withRestartList\" \"doWithOneRestart\" \"withOneRestart\" \"withRestartList\" \"withRestarts\" \"evaluate::evaluate\" \"evaluate\" \"in_dir\" \"in_input_dir\" \"eng_r\" \"block_exec\" \"call_block\" \"process_group\" \"withCallingHandlers\" \"xfun:::handle_error\" \"process_file\" \"knitr::knit\" \"rmarkdown::render\" \"execute\" \".main\" \n#> \"pause\" \"h\" \"f\" \"eval\" \"eval\" \"withVisible\" \"withCallingHandlers\" \"eval\" \"eval\" \"with_handlers\" \"doWithOneRestart\" \"withOneRestart\" \"withRestartList\" \"doWithOneRestart\" \"withOneRestart\" \"withRestartList\" \"withRestarts\" \"evaluate::evaluate\" \"evaluate\" \"in_dir\" \"in_input_dir\" \"eng_r\" \"block_exec\" \"call_block\" \"process_group\" \"withCallingHandlers\" \"xfun:::handle_error\" \"process_file\" \"knitr::knit\" \"rmarkdown::render\" \"execute\" \".main\"\n```\n\n\n:::\n:::\n\n \n \n**Visualising profiles**\n\nMakes easier to build up a mental model of what you need to change:\n\n profvis::profvis()\n utils::summaryRprof()\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsource(\"scripts/profiling-example.R\")\nprofvis(f())\n```\n\n::: {.cell-output-display}\n\n```{=html}\n<div class=\"profvis html-widget html-fill-item\" id=\"htmlwidget-bada5a5b82a551b731af\" style=\"width:100%;height:600px;\"></div>\n<script type=\"application/json\" data-for=\"htmlwidget-bada5a5b82a551b731af\">{\"x\":{\"message\":{\"prof\":{\"time\":[1,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,8,9,9,9,10,10,10,11,11,11,12,12,12,13,13,13,14,14,14,14,15,15,15,15,16,16,16,16,17,17,17,17,18,18,18,18,19,19,19,19,20,20,20,20,21,21,21,22,22,22,23,23,23,24,24,24,25,25,25,26,26,26],\"depth\":[3,2,1,2,1,2,1,2,1,2,1,2,1,2,1,3,2,1,3,2,1,3,2,1,3,2,1,3,2,1,3,2,1,4,3,2,1,4,3,2,1,4,3,2,1,4,3,2,1,4,3,2,1,4,3,2,1,4,3,2,1,3,2,1,3,2,1,3,2,1,3,2,1,3,2,1,3,2,1],\"label\":[\"Rprof\",\"profvis\",\".main\",\"pause\",\"f\",\"pause\",\"f\",\"pause\",\"f\",\"pause\",\"f\",\"pause\",\"f\",\"pause\",\"f\",\"pause\",\"g\",\"f\",\"pause\",\"g\",\"f\",\"pause\",\"g\",\"f\",\"pause\",\"g\",\"f\",\"pause\",\"g\",\"f\",\"pause\",\"g\",\"f\",\"pause\",\"h\",\"g\",\"f\",\"pause\",\"h\",\"g\",\"f\",\"pause\",\"h\",\"g\",\"f\",\"pause\",\"h\",\"g\",\"f\",\"pause\",\"h\",\"g\",\"f\",\"pause\",\"h\",\"g\",\"f\",\"pause\",\"h\",\"g\",\"f\",\"pause\",\"h\",\"f\",\"pause\",\"h\",\"f\",\"pause\",\"h\",\"f\",\"pause\",\"h\",\"f\",\"pause\",\"h\",\"f\",\"pause\",\"h\",\"f\"],\"filenum\":[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null],\"linenum\":[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null],\"memalloc\":[10.16163635253906,10.16163635253906,10.16163635253906,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625,10.17041015625],\"meminc\":[0,0,0,0.0087738037109375,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],\"filename\":[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null]},\"interval\":10,\"files\":[],\"prof_output\":\"C:\\\\Users\\\\jonth\\\\AppData\\\\Local\\\\Temp\\\\RtmpQBRpZV\\\\file35786b153.prof\",\"highlight\":{\"output\":[\"^output\\\\$\"],\"gc\":[\"^<GC>$\"],\"stacktrace\":[\"^\\\\.\\\\.stacktraceo(n|ff)\\\\.\\\\.$\"]},\"split\":\"h\"}},\"evals\":[],\"jsHooks\":[]}</script>\n```\n\n:::\n:::\n\n\n**Memory profiling and the garbage collector**\n\nProfiling a loop that modifies an existing variable:\n\n::: {.cell}\n\n```{.r .cell-code}\nprofvis::profvis({\n x <- integer()\nfor (i in 1:1e4) {\n x <- c(x, i)\n}\n})\n```\n\n::: {.cell-output-display}\n\n```{=html}\n<div class=\"profvis html-widget html-fill-item\" id=\"htmlwidget-5a6fde18c6928dde5df7\" style=\"width:100%;height:600px;\"></div>\n<script type=\"application/json\" data-for=\"htmlwidget-5a6fde18c6928dde5df7\">{\"x\":{\"message\":{\"prof\":{\"time\":[1,1,2,2,3,3,4,4,5,5],\"depth\":[2,1,2,1,2,1,2,1,2,1],\"label\":[\"c\",\".main\",\"c\",\".main\",\"c\",\".main\",\"c\",\".main\",\"c\",\".main\"],\"filenum\":[null,null,null,null,null,null,null,null,null,null],\"linenum\":[null,null,null,null,null,null,null,null,null,null],\"memalloc\":[42.66594696044922,42.66594696044922,23.44996643066406,23.44996643066406,57.82827758789062,57.82827758789062,42.49221801757812,42.49221801757812,24.62628936767578,24.62628936767578],\"meminc\":[0,0,-19.21598052978516,0,34.37831115722656,0,-15.3360595703125,0,-17.86592864990234,0],\"filename\":[null,null,null,null,null,null,null,null,null,null]},\"interval\":10,\"files\":[],\"prof_output\":\"C:\\\\Users\\\\jonth\\\\AppData\\\\Local\\\\Temp\\\\RtmpQBRpZV\\\\file3578cfde24.prof\",\"highlight\":{\"output\":[\"^output\\\\$\"],\"gc\":[\"^<GC>$\"],\"stacktrace\":[\"^\\\\.\\\\.stacktraceo(n|ff)\\\\.\\\\.$\"]},\"split\":\"h\"}},\"evals\":[],\"jsHooks\":[]}</script>\n```\n\n:::\n:::\n\n\nYou can figure out what is the source of the problem by looking at the memory column. In this case, **copy-on-modify** acts in each iteration of the loop creating another copy of x.\n\n\n**Limitations**\n\n- Profiling does not extend to C code\n- Anonymous functions are hard to figure out\n- Arguments are evaluated inside another function\n\n\n### Exercise\n\n::: {.cell}\n\n```{.r .cell-code}\nprofvis::profvis({\n f <- function(n = 1e5) {\n x <- rep(1, n)\n rm(x)\n}\n},torture = TRUE)\n```\n:::\n\n\n ?rm()\n \n[solution](https://advanced-r-solutions.rbind.io/measuring-performance.html) \n \n## Microbenchmarking\n\n\n*Measurement of the performance of a very small piece of code* is useful for comparing small snippets of code for specific tasks.\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n{fig-align='center' width=159}\n:::\n:::\n\n\n\nThe {bench} package uses a high precision time.\n\n bench::mark()\n \n \n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(bench)\nx <- runif(100)\n(lb <- bench::mark(\n sqrt(x),\n x ^ 0.5\n))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 6\n#> expression min median `itr/sec` mem_alloc `gc/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>\n#> 1 sqrt(x) 200ns 400ns 2112914. 848B 0\n#> 2 x^0.5 2.3µs 2.7µs 350293. 848B 0\n```\n\n\n:::\n:::\n\n- heavily right-skewed distribution\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrequire(ggbeeswarm)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Loading required package: ggbeeswarm\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Loading required package: ggplot2\n```\n\n\n:::\n\n```{.r .cell-code}\nplot(lb)\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n\n## Resources\n\n- [profvis package](https://rstudio.github.io/profvis/)\n- [bench package](https://cran.r-project.org/web/packages/bench/bench.pdf)\n- [solutions](https://advanced-r-solutions.rbind.io/measuring-performance.html)\n\n\n\n\n\n\n\n## Meeting Videos\n\n### Cohort 1\n\n(no video)\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/_zeLDufwTwY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/Jdb00nepeWQ\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/sCso4FAF1DE\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/pOaiDK7J7EE\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/d_pzz_AsoRQ\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/4hngR1c9oP4\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n", + "markdown": "---\nengine: knitr\ntitle: Measuring performance\n---\n\n## Learning objectives:\n\n- Understand how to improve your code for making it faster\n- Learn what are the tools for improving your code\n- Test how to profile your code\n\n\n## Introduction\n\n> \"Before you can make your code faster, you first need to figure out what’s making it slow.\"\n\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n{fig-align='center' width=348}\n:::\n:::\n\n\n\n- **profile** your code: measure the run-time of each line of code using realistic inputs\n- **experiment** with alternatives to find faster code\n- **microbenchmark** to measure the difference in performance.\n\n\n\n## Profiling\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(profvis)\nlibrary(bench)\n```\n:::\n\n\n\nThe tool to use is a **profiler**, it allows for **sampling** the code performance through stopping the execution of code every few milliseconds and recording all the steps.\n\nExample:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf <- function() {\n pause(0.1)\n g()\n h()\n}\ng <- function() {\n pause(0.1)\n h()\n}\nh <- function() {\n pause(0.1)\n}\n```\n:::\n\n\nProfile the execution of f():\n\n profvis::pause() is used instead of Sys.sleep()\n profile f(), with utils::Rprof()\n \n\n::: {.cell}\n\n```{.r .cell-code}\ntmp <- tempfile()\nRprof(tmp, interval = 0.1)\nf()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\nRprof(NULL)\nwriteLines(readLines(tmp))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> sample.interval=100000\n#> \"pause\" \"f\" \"eval\" \"eval\" \"withVisible\" \"withCallingHandlers\" \"eval\" \"eval\" \"with_handlers\" \"doWithOneRestart\" \"withOneRestart\" \"withRestartList\" \"doWithOneRestart\" \"withOneRestart\" \"withRestartList\" \"withRestarts\" \"evaluate::evaluate\" \"evaluate\" \"in_dir\" \"in_input_dir\" \"eng_r\" \"block_exec\" \"call_block\" \"process_group\" \"withCallingHandlers\" \"xfun:::handle_error\" \"process_file\" \"knitr::knit\" \"rmarkdown::render\" \"execute\" \".main\" \n#> \"pause\" \"g\" \"f\" \"eval\" \"eval\" \"withVisible\" \"withCallingHandlers\" \"eval\" \"eval\" \"with_handlers\" \"doWithOneRestart\" \"withOneRestart\" \"withRestartList\" \"doWithOneRestart\" \"withOneRestart\" \"withRestartList\" \"withRestarts\" \"evaluate::evaluate\" \"evaluate\" \"in_dir\" \"in_input_dir\" \"eng_r\" \"block_exec\" \"call_block\" \"process_group\" \"withCallingHandlers\" \"xfun:::handle_error\" \"process_file\" \"knitr::knit\" \"rmarkdown::render\" \"execute\" \".main\" \n#> \"pause\" \"h\" \"g\" \"f\" \"eval\" \"eval\" \"withVisible\" \"withCallingHandlers\" \"eval\" \"eval\" \"with_handlers\" \"doWithOneRestart\" \"withOneRestart\" \"withRestartList\" \"doWithOneRestart\" \"withOneRestart\" \"withRestartList\" \"withRestarts\" \"evaluate::evaluate\" \"evaluate\" \"in_dir\" \"in_input_dir\" \"eng_r\" \"block_exec\" \"call_block\" \"process_group\" \"withCallingHandlers\" \"xfun:::handle_error\" \"process_file\" \"knitr::knit\" \"rmarkdown::render\" \"execute\" \".main\" \n#> \"pause\" \"h\" \"f\" \"eval\" \"eval\" \"withVisible\" \"withCallingHandlers\" \"eval\" \"eval\" \"with_handlers\" \"doWithOneRestart\" \"withOneRestart\" \"withRestartList\" \"doWithOneRestart\" \"withOneRestart\" \"withRestartList\" \"withRestarts\" \"evaluate::evaluate\" \"evaluate\" \"in_dir\" \"in_input_dir\" \"eng_r\" \"block_exec\" \"call_block\" \"process_group\" \"withCallingHandlers\" \"xfun:::handle_error\" \"process_file\" \"knitr::knit\" \"rmarkdown::render\" \"execute\" \".main\"\n```\n\n\n:::\n:::\n\n \n \n**Visualising profiles**\n\nMakes easier to build up a mental model of what you need to change:\n\n profvis::profvis()\n utils::summaryRprof()\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsource(\"scripts/profiling-example.R\")\nprofvis(f())\n```\n\n::: {.cell-output-display}\n\n```{=html}\n<div class=\"profvis html-widget html-fill-item\" id=\"htmlwidget-a76651cfc65320a86296\" style=\"width:100%;height:600px;\"></div>\n<script type=\"application/json\" data-for=\"htmlwidget-a76651cfc65320a86296\">{\"x\":{\"message\":{\"prof\":{\"time\":[1,1,2,2,3,3,4,4,5,5,6,6,7,7,7,8,8,8,9,9,9,10,10,10,11,11,11,12,12,12,13,13,13,14,14,14,14,15,15,15,15,16,16,16,16,17,17,17,17,18,18,18,18,19,19,19,19,20,20,20,21,21,21,22,22,22,23,23,23,24,24,24,25,25,25],\"depth\":[2,1,2,1,2,1,2,1,2,1,2,1,3,2,1,3,2,1,3,2,1,3,2,1,3,2,1,3,2,1,3,2,1,4,3,2,1,4,3,2,1,4,3,2,1,4,3,2,1,4,3,2,1,4,3,2,1,3,2,1,3,2,1,3,2,1,3,2,1,3,2,1,3,2,1],\"label\":[\"pause\",\"f\",\"pause\",\"f\",\"pause\",\"f\",\"pause\",\"f\",\"pause\",\"f\",\"pause\",\"f\",\"pause\",\"g\",\"f\",\"pause\",\"g\",\"f\",\"pause\",\"g\",\"f\",\"pause\",\"g\",\"f\",\"pause\",\"g\",\"f\",\"pause\",\"g\",\"f\",\"pause\",\"g\",\"f\",\"pause\",\"h\",\"g\",\"f\",\"pause\",\"h\",\"g\",\"f\",\"pause\",\"h\",\"g\",\"f\",\"pause\",\"h\",\"g\",\"f\",\"pause\",\"h\",\"g\",\"f\",\"pause\",\"h\",\"g\",\"f\",\"pause\",\"h\",\"f\",\"pause\",\"h\",\"f\",\"pause\",\"h\",\"f\",\"pause\",\"h\",\"f\",\"pause\",\"h\",\"f\",\"pause\",\"h\",\"f\"],\"filenum\":[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null],\"linenum\":[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null],\"memalloc\":[10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281,10.16792297363281],\"meminc\":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],\"filename\":[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null]},\"interval\":10,\"files\":[],\"prof_output\":\"C:\\\\Users\\\\jonth\\\\AppData\\\\Local\\\\Temp\\\\RtmpgFLX7B\\\\file8a2412f12f4c.prof\",\"highlight\":{\"output\":[\"^output\\\\$\"],\"gc\":[\"^<GC>$\"],\"stacktrace\":[\"^\\\\.\\\\.stacktraceo(n|ff)\\\\.\\\\.$\"]},\"split\":\"h\"}},\"evals\":[],\"jsHooks\":[]}</script>\n```\n\n:::\n:::\n\n\n**Memory profiling and the garbage collector**\n\nProfiling a loop that modifies an existing variable:\n\n::: {.cell}\n\n```{.r .cell-code}\nprofvis::profvis({\n x <- integer()\nfor (i in 1:1e4) {\n x <- c(x, i)\n}\n})\n```\n\n::: {.cell-output-display}\n\n```{=html}\n<div class=\"profvis html-widget html-fill-item\" id=\"htmlwidget-a23be7babeaed78bc39a\" style=\"width:100%;height:600px;\"></div>\n<script type=\"application/json\" data-for=\"htmlwidget-a23be7babeaed78bc39a\">{\"x\":{\"message\":{\"prof\":{\"time\":[1,1,1,2,2,3,3,4,4,5,5,6,6,6],\"depth\":[3,2,1,2,1,2,1,2,1,2,1,3,2,1],\"label\":[\"Rprof\",\"profvis::profvis\",\".main\",\"c\",\".main\",\"c\",\".main\",\"c\",\".main\",\"c\",\".main\",\"<GC>\",\"c\",\".main\"],\"filenum\":[null,null,null,null,null,null,null,null,null,null,null,null,null,null],\"linenum\":[null,null,null,null,null,null,null,null,null,null,null,null,null,null],\"memalloc\":[10.79654693603516,10.79654693603516,10.79654693603516,45.85549163818359,45.85549163818359,27.03416442871094,27.03416442871094,54.31883239746094,54.31883239746094,30.6883544921875,30.6883544921875,36.67101287841797,36.67101287841797,36.67101287841797],\"meminc\":[0,0,0,35.05894470214844,0,-18.82132720947266,0,27.28466796875,0,-23.63047790527344,0,5.982658386230469,0,0],\"filename\":[null,null,null,null,null,null,null,null,null,null,null,null,null,null]},\"interval\":10,\"files\":[],\"prof_output\":\"C:\\\\Users\\\\jonth\\\\AppData\\\\Local\\\\Temp\\\\RtmpgFLX7B\\\\file8a24629a5d4c.prof\",\"highlight\":{\"output\":[\"^output\\\\$\"],\"gc\":[\"^<GC>$\"],\"stacktrace\":[\"^\\\\.\\\\.stacktraceo(n|ff)\\\\.\\\\.$\"]},\"split\":\"h\"}},\"evals\":[],\"jsHooks\":[]}</script>\n```\n\n:::\n:::\n\n\nYou can figure out what is the source of the problem by looking at the memory column. In this case, **copy-on-modify** acts in each iteration of the loop creating another copy of x.\n\n\n**Limitations**\n\n- Profiling does not extend to C code\n- Anonymous functions are hard to figure out\n- Arguments are evaluated inside another function\n\n\n### Exercise\n\n::: {.cell}\n\n```{.r .cell-code}\nprofvis::profvis({\n f <- function(n = 1e5) {\n x <- rep(1, n)\n rm(x)\n}\n},torture = TRUE)\n```\n:::\n\n\n ?rm()\n \n[solution](https://advanced-r-solutions.rbind.io/measuring-performance.html) \n \n## Microbenchmarking\n\n\n*Measurement of the performance of a very small piece of code* is useful for comparing small snippets of code for specific tasks.\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n{fig-align='center' width=159}\n:::\n:::\n\n\n\nThe {bench} package uses a high precision time.\n\n bench::mark()\n \n \n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(bench)\nx <- runif(100)\n(lb <- bench::mark(\n sqrt(x),\n x ^ 0.5\n))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 6\n#> expression min median `itr/sec` mem_alloc `gc/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>\n#> 1 sqrt(x) 300ns 400ns 1977692. 848B 0\n#> 2 x^0.5 2.9µs 3µs 321570. 848B 0\n```\n\n\n:::\n:::\n\n- heavily right-skewed distribution\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrequire(ggbeeswarm)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Loading required package: ggbeeswarm\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Loading required package: ggplot2\n```\n\n\n:::\n\n```{.r .cell-code}\nplot(lb)\n```\n\n::: {.cell-output-display}\n{width=672}\n:::\n:::\n\n\n\n## Resources\n\n- [profvis package](https://rstudio.github.io/profvis/)\n- [bench package](https://cran.r-project.org/web/packages/bench/bench.pdf)\n- [solutions](https://advanced-r-solutions.rbind.io/measuring-performance.html)\n", "supporting": [ "23_files" ], diff --git a/_freeze/slides/23/figure-html/unnamed-chunk-10-1.png b/_freeze/slides/23/figure-html/unnamed-chunk-10-1.png Binary files differ. diff --git a/_freeze/slides/24/execute-results/html.json b/_freeze/slides/24/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "afa59feb4bf4c77dbe6b3312d94087e3", + "hash": "355b584146ead093c1e2e038c1794779", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Improving performance\n---\n\n## Overview\n\n1. Code organization\n2. Check for existing solutions\n3. Do as little as possible\n4. Vectorise\n5. Avoid Copies\n\n## Organizing code\n\n- Write a function for each approach\n\n::: {.cell}\n\n```{.r .cell-code}\nmean1 <- function(x) mean(x)\nmean2 <- function(x) sum(x) / length(x)\n```\n:::\n\n- Keep old functions that you've tried, even the failures\n- Generate a representative test case\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(1e5)\n```\n:::\n\n- Use `bench::mark` to compare the different versions (and include unit tests)\n\n::: {.cell}\n\n```{.r .cell-code}\nbench::mark(\n mean1(x),\n mean2(x)\n)[c(\"expression\", \"min\", \"median\", \"itr/sec\", \"n_gc\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 4\n#> expression min median `itr/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl>\n#> 1 mean1(x) 122.6µs 131.7µs 7224.\n#> 2 mean2(x) 60.4µs 67.4µs 13986.\n```\n\n\n:::\n:::\n\n\n## Check for Existing Solution\n- CRAN task views (http://cran.rstudio.com/web/views/)\n- Reverse dependencies of Rcpp (https://cran.r-project.org/web/packages/Rcpp/)\n- Talk to others!\n - Google (rseek)\n - Stackoverflow ([R])\n - https://community.rstudio.com/\n - DSLC community\n\n## Do as little as possible\n- use a function tailored to a more specific type of input or output, or to a more specific problem\n - `rowSums()`, `colSums()`, `rowMeans()`, and `colMeans()` are faster than equivalent invocations that use `apply()` because they are vectorised\n - `vapply()` is faster than `sapply()` because it pre-specifies the output type\n - `any(x == 10)` is much faster than `10 %in% x` because testing equality is simpler than testing set inclusion\n- Some functions coerce their inputs into a specific type. If your input is not the right type, the function has to do extra work\n - e.g. `apply()` will always turn a dataframe into a matrix\n- Other examples\n - `read.csv()`: specify known column types with `colClasses`. (Also consider\n switching to `readr::read_csv()` or `data.table::fread()` which are \n considerably faster than `read.csv()`.)\n\n - `factor()`: specify known levels with `levels`.\n\n - `cut()`: don't generate labels with `labels = FALSE` if you don't need them,\n or, even better, use `findInterval()` as mentioned in the \"see also\" section\n of the documentation.\n \n - `unlist(x, use.names = FALSE)` is much faster than `unlist(x)`.\n\n - `interaction()`: if you only need combinations that exist in the data, use\n `drop = TRUE`.\n \n## Avoiding Method Dispatch\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(1e2)\nbench::mark(\n mean(x),\n mean.default(x)\n)[c(\"expression\", \"min\", \"median\", \"itr/sec\", \"n_gc\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 4\n#> expression min median `itr/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl>\n#> 1 mean(x) 2.4µs 3.1µs 284100.\n#> 2 mean.default(x) 800ns 1.2µs 822341.\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(1e2)\nbench::mark(\n mean(x),\n mean.default(x),\n .Internal(mean(x))\n)[c(\"expression\", \"min\", \"median\", \"itr/sec\", \"n_gc\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 4\n#> expression min median `itr/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl>\n#> 1 mean(x) 2.5µs 3.5µs 270278.\n#> 2 mean.default(x) 900ns 1.2µs 837472.\n#> 3 .Internal(mean(x)) 100ns 200ns 4738440.\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(1e4)\nbench::mark(\n mean(x),\n mean.default(x),\n .Internal(mean(x))\n)[c(\"expression\", \"min\", \"median\", \"itr/sec\", \"n_gc\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 4\n#> expression min median `itr/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl>\n#> 1 mean(x) 14.2µs 16µs 60044.\n#> 2 mean.default(x) 12.7µs 14.6µs 65783.\n#> 3 .Internal(mean(x)) 12µs 13.2µs 75085.\n```\n\n\n:::\n:::\n\n\n## Avoiding Input Coercion\n- `as.data.frame()` is quite slow because it coerces each element into a data frame and then `rbind()`s them together\n- instead, if you have a named list with vectors of equal length, you can directly transform it into a data frame\n\n\n::: {.cell}\n\n```{.r .cell-code}\nquickdf <- function(l) {\n class(l) <- \"data.frame\"\n attr(l, \"row.names\") <- .set_row_names(length(l[[1]]))\n l\n}\nl <- lapply(1:26, function(i) runif(1e3))\nnames(l) <- letters\nbench::mark(\n as.data.frame = as.data.frame(l),\n quick_df = quickdf(l)\n)[c(\"expression\", \"min\", \"median\", \"itr/sec\", \"n_gc\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 4\n#> expression min median `itr/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl>\n#> 1 as.data.frame 538.2µs 686.2µs 1359.\n#> 2 quick_df 3.2µs 4.6µs 199551.\n```\n\n\n:::\n:::\n\n\n*Caveat!* This method is fast because it's dangerous!\n\n## Vectorise\n- vectorisation means finding the existing R function that is implemented in C and most closely applies to your problem\n- Vectorised functions that apply to many scenarios\n - `rowSums()`, `colSums()`, `rowMeans()`, and `colMeans()`\n - Vectorised subsetting can lead to big improvements in speed\n - `cut()` and `findInterval()` for converting continuous variables to categorical\n - Be aware of vectorised functions like `cumsum()` and `diff()`\n - Matrix algebra is a general example of vectorisation\n\n## Avoiding copies\n\n- Whenever you use c(), append(), cbind(), rbind(), or paste() to create a bigger object, R must first allocate space for the new object and then copy the old object to its new home.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrandom_string <- function() {\n paste(sample(letters, 50, replace = TRUE), collapse = \"\")\n}\nstrings10 <- replicate(10, random_string())\nstrings100 <- replicate(100, random_string())\ncollapse <- function(xs) {\n out <- \"\"\n for (x in xs) {\n out <- paste0(out, x)\n }\n out\n}\nbench::mark(\n loop10 = collapse(strings10),\n loop100 = collapse(strings100),\n vec10 = paste(strings10, collapse = \"\"),\n vec100 = paste(strings100, collapse = \"\"),\n check = FALSE\n)[c(\"expression\", \"min\", \"median\", \"itr/sec\", \"n_gc\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 4 × 4\n#> expression min median `itr/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl>\n#> 1 loop10 13.3µs 17.9µs 52625.\n#> 2 loop100 376.7µs 470.5µs 2074.\n#> 3 vec10 2.2µs 2.9µs 328368.\n#> 4 vec100 13.5µs 16.4µs 58208.\n```\n\n\n:::\n:::\n\n\n## Case study: t-test\n\n\n::: {.cell}\n\n```{.r .cell-code}\nm <- 1000\nn <- 50\nX <- matrix(rnorm(m * n, mean = 10, sd = 3), nrow = m)\ngrp <- rep(1:2, each = n / 2)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# formula interface\nsystem.time(\n for (i in 1:m) {\n t.test(X[i, ] ~ grp)$statistic\n }\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.30 0.00 0.33\n```\n\n\n:::\n\n```{.r .cell-code}\n# provide two vectors\nsystem.time(\n for (i in 1:m) {\n t.test(X[i, grp == 1], X[i, grp == 2])$statistic\n }\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.08 0.00 0.10\n```\n\n\n:::\n:::\n\n\nAdd functionality to save values\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncompT <- function(i){\n t.test(X[i, grp == 1], X[i, grp == 2])$statistic\n}\nsystem.time(t1 <- purrr::map_dbl(1:m, compT))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.09 0.00 0.08\n```\n\n\n:::\n:::\n\n\nIf you look at the source code of `stats:::t.test.default()`, you’ll see that it does a lot more than just compute the t-statistic.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Do less work\nmy_t <- function(x, grp) {\n t_stat <- function(x) {\n m <- mean(x)\n n <- length(x)\n var <- sum((x - m) ^ 2) / (n - 1)\n list(m = m, n = n, var = var)\n }\n g1 <- t_stat(x[grp == 1])\n g2 <- t_stat(x[grp == 2])\n se_total <- sqrt(g1$var / g1$n + g2$var / g2$n)\n (g1$m - g2$m) / se_total\n}\nsystem.time(t2 <- purrr::map_dbl(1:m, ~ my_t(X[.,], grp)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.03 0.00 0.01\n```\n\n\n:::\n\n```{.r .cell-code}\nstopifnot(all.equal(t1, t2))\n```\n:::\n\n\nThis gives us a six-fold speed improvement!\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Vectorise it\nrowtstat <- function(X, grp){\n t_stat <- function(X) {\n m <- rowMeans(X)\n n <- ncol(X)\n var <- rowSums((X - m) ^ 2) / (n - 1)\n list(m = m, n = n, var = var)\n }\n g1 <- t_stat(X[, grp == 1])\n g2 <- t_stat(X[, grp == 2])\n se_total <- sqrt(g1$var / g1$n + g2$var / g2$n)\n (g1$m - g2$m) / se_total\n}\nsystem.time(t3 <- rowtstat(X, grp))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.00 0.00 0.02\n```\n\n\n:::\n\n```{.r .cell-code}\nstopifnot(all.equal(t1, t3))\n```\n:::\n\n\n1000 times faster than when we started!\n\n## Other techniques\n* [Read R blogs](http://www.r-bloggers.com/) to see what performance\n problems other people have struggled with, and how they have made their\n code faster.\n\n* Read other R programming books, like The Art of R Programming or Patrick Burns'\n [_R Inferno_](http://www.burns-stat.com/documents/books/the-r-inferno/) to\n learn about common traps.\n\n* Take an algorithms and data structure course to learn some\n well known ways of tackling certain classes of problems. I have heard\n good things about Princeton's\n [Algorithms course](https://www.coursera.org/course/algs4partI) offered on\n Coursera.\n \n* Learn how to parallelise your code. Two places to start are\n Parallel R and Parallel Computing for Data Science\n\n* Read general books about optimisation like Mature optimisation\n or the Pragmatic Programmer\n \n* Read more R code. StackOverflow, R Mailing List, DSLC, GitHub, etc.\n\n## Meeting Videos\n\n### Cohort 1\n\n(no video)\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/fSdAqlkeq6I\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/yCkvUcT7wW8\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/LCaqvuv3JNg\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/pOaiDK7J7EE\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/UaXimKd3vg8\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n<summary> Meeting chat log </summary>\n\n```\n00:24:42\tArthur Shaw:\tI wonder if there's a task view for R Universe: https://r-universe.dev/search/\n01:01:13\tArthur Shaw:\thttps://www.alexejgossmann.com/benchmarking_r/\n01:04:34\tTrevin:\tI agree that the chapter is a good jumping off point. Gonna have to dig into some of the listed resources 😄\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/rOkrHvN8Uqg\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n\n<summary>Meeting chat log</summary>\n```\n00:23:48\tRon Legere:\thttps://www.mathworks.com/help/matlab/matlab_prog/vectorization.html\n```\n</details>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: Improving performance\n---\n\n## Overview\n\n1. Code organization\n2. Check for existing solutions\n3. Do as little as possible\n4. Vectorise\n5. Avoid Copies\n\n## Organizing code\n\n- Write a function for each approach\n\n::: {.cell}\n\n```{.r .cell-code}\nmean1 <- function(x) mean(x)\nmean2 <- function(x) sum(x) / length(x)\n```\n:::\n\n- Keep old functions that you've tried, even the failures\n- Generate a representative test case\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(1e5)\n```\n:::\n\n- Use `bench::mark` to compare the different versions (and include unit tests)\n\n::: {.cell}\n\n```{.r .cell-code}\nbench::mark(\n mean1(x),\n mean2(x)\n)[c(\"expression\", \"min\", \"median\", \"itr/sec\", \"n_gc\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 4\n#> expression min median `itr/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl>\n#> 1 mean1(x) 146.4µs 166.8µs 5839.\n#> 2 mean2(x) 67.1µs 75.7µs 12950.\n```\n\n\n:::\n:::\n\n\n## Check for Existing Solution\n- CRAN task views (http://cran.rstudio.com/web/views/)\n- Reverse dependencies of Rcpp (https://cran.r-project.org/web/packages/Rcpp/)\n- Talk to others!\n - Google (rseek)\n - Stackoverflow ([R])\n - https://community.rstudio.com/\n - DSLC community\n\n## Do as little as possible\n- use a function tailored to a more specific type of input or output, or to a more specific problem\n - `rowSums()`, `colSums()`, `rowMeans()`, and `colMeans()` are faster than equivalent invocations that use `apply()` because they are vectorised\n - `vapply()` is faster than `sapply()` because it pre-specifies the output type\n - `any(x == 10)` is much faster than `10 %in% x` because testing equality is simpler than testing set inclusion\n- Some functions coerce their inputs into a specific type. If your input is not the right type, the function has to do extra work\n - e.g. `apply()` will always turn a dataframe into a matrix\n- Other examples\n - `read.csv()`: specify known column types with `colClasses`. (Also consider\n switching to `readr::read_csv()` or `data.table::fread()` which are \n considerably faster than `read.csv()`.)\n\n - `factor()`: specify known levels with `levels`.\n\n - `cut()`: don't generate labels with `labels = FALSE` if you don't need them,\n or, even better, use `findInterval()` as mentioned in the \"see also\" section\n of the documentation.\n \n - `unlist(x, use.names = FALSE)` is much faster than `unlist(x)`.\n\n - `interaction()`: if you only need combinations that exist in the data, use\n `drop = TRUE`.\n \n## Avoiding Method Dispatch\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(1e2)\nbench::mark(\n mean(x),\n mean.default(x)\n)[c(\"expression\", \"min\", \"median\", \"itr/sec\", \"n_gc\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 4\n#> expression min median `itr/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl>\n#> 1 mean(x) 2.8µs 3.1µs 295383.\n#> 2 mean.default(x) 900ns 1.1µs 839532.\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(1e2)\nbench::mark(\n mean(x),\n mean.default(x),\n .Internal(mean(x))\n)[c(\"expression\", \"min\", \"median\", \"itr/sec\", \"n_gc\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 4\n#> expression min median `itr/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl>\n#> 1 mean(x) 2.7µs 2.9µs 310733.\n#> 2 mean.default(x) 900ns 1.1µs 849192.\n#> 3 .Internal(mean(x)) 100ns 200ns 4495392.\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(1e4)\nbench::mark(\n mean(x),\n mean.default(x),\n .Internal(mean(x))\n)[c(\"expression\", \"min\", \"median\", \"itr/sec\", \"n_gc\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 4\n#> expression min median `itr/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl>\n#> 1 mean(x) 16.6µs 17.3µs 55604.\n#> 2 mean.default(x) 14.4µs 14.9µs 64958.\n#> 3 .Internal(mean(x)) 13.6µs 13.7µs 70368.\n```\n\n\n:::\n:::\n\n\n## Avoiding Input Coercion\n- `as.data.frame()` is quite slow because it coerces each element into a data frame and then `rbind()`s them together\n- instead, if you have a named list with vectors of equal length, you can directly transform it into a data frame\n\n\n::: {.cell}\n\n```{.r .cell-code}\nquickdf <- function(l) {\n class(l) <- \"data.frame\"\n attr(l, \"row.names\") <- .set_row_names(length(l[[1]]))\n l\n}\nl <- lapply(1:26, function(i) runif(1e3))\nnames(l) <- letters\nbench::mark(\n as.data.frame = as.data.frame(l),\n quick_df = quickdf(l)\n)[c(\"expression\", \"min\", \"median\", \"itr/sec\", \"n_gc\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 4\n#> expression min median `itr/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl>\n#> 1 as.data.frame 589.5µs 656.2µs 1447.\n#> 2 quick_df 3.7µs 4.3µs 207929.\n```\n\n\n:::\n:::\n\n\n*Caveat!* This method is fast because it's dangerous!\n\n## Vectorise\n- vectorisation means finding the existing R function that is implemented in C and most closely applies to your problem\n- Vectorised functions that apply to many scenarios\n - `rowSums()`, `colSums()`, `rowMeans()`, and `colMeans()`\n - Vectorised subsetting can lead to big improvements in speed\n - `cut()` and `findInterval()` for converting continuous variables to categorical\n - Be aware of vectorised functions like `cumsum()` and `diff()`\n - Matrix algebra is a general example of vectorisation\n\n## Avoiding copies\n\n- Whenever you use c(), append(), cbind(), rbind(), or paste() to create a bigger object, R must first allocate space for the new object and then copy the old object to its new home.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrandom_string <- function() {\n paste(sample(letters, 50, replace = TRUE), collapse = \"\")\n}\nstrings10 <- replicate(10, random_string())\nstrings100 <- replicate(100, random_string())\ncollapse <- function(xs) {\n out <- \"\"\n for (x in xs) {\n out <- paste0(out, x)\n }\n out\n}\nbench::mark(\n loop10 = collapse(strings10),\n loop100 = collapse(strings100),\n vec10 = paste(strings10, collapse = \"\"),\n vec100 = paste(strings100, collapse = \"\"),\n check = FALSE\n)[c(\"expression\", \"min\", \"median\", \"itr/sec\", \"n_gc\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 4 × 4\n#> expression min median `itr/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl>\n#> 1 loop10 17.1µs 18µs 52743.\n#> 2 loop100 460.3µs 491.5µs 1959.\n#> 3 vec10 2.7µs 2.9µs 317912.\n#> 4 vec100 16.3µs 18.5µs 51297.\n```\n\n\n:::\n:::\n\n\n## Case study: t-test\n\n\n::: {.cell}\n\n```{.r .cell-code}\nm <- 1000\nn <- 50\nX <- matrix(rnorm(m * n, mean = 10, sd = 3), nrow = m)\ngrp <- rep(1:2, each = n / 2)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# formula interface\nsystem.time(\n for (i in 1:m) {\n t.test(X[i, ] ~ grp)$statistic\n }\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.28 0.00 0.33\n```\n\n\n:::\n\n```{.r .cell-code}\n# provide two vectors\nsystem.time(\n for (i in 1:m) {\n t.test(X[i, grp == 1], X[i, grp == 2])$statistic\n }\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.08 0.00 0.08\n```\n\n\n:::\n:::\n\n\nAdd functionality to save values\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncompT <- function(i){\n t.test(X[i, grp == 1], X[i, grp == 2])$statistic\n}\nsystem.time(t1 <- purrr::map_dbl(1:m, compT))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.09 0.00 0.09\n```\n\n\n:::\n:::\n\n\nIf you look at the source code of `stats:::t.test.default()`, you’ll see that it does a lot more than just compute the t-statistic.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Do less work\nmy_t <- function(x, grp) {\n t_stat <- function(x) {\n m <- mean(x)\n n <- length(x)\n var <- sum((x - m) ^ 2) / (n - 1)\n list(m = m, n = n, var = var)\n }\n g1 <- t_stat(x[grp == 1])\n g2 <- t_stat(x[grp == 2])\n se_total <- sqrt(g1$var / g1$n + g2$var / g2$n)\n (g1$m - g2$m) / se_total\n}\nsystem.time(t2 <- purrr::map_dbl(1:m, ~ my_t(X[.,], grp)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.02 0.00 0.01\n```\n\n\n:::\n\n```{.r .cell-code}\nstopifnot(all.equal(t1, t2))\n```\n:::\n\n\nThis gives us a six-fold speed improvement!\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Vectorise it\nrowtstat <- function(X, grp){\n t_stat <- function(X) {\n m <- rowMeans(X)\n n <- ncol(X)\n var <- rowSums((X - m) ^ 2) / (n - 1)\n list(m = m, n = n, var = var)\n }\n g1 <- t_stat(X[, grp == 1])\n g2 <- t_stat(X[, grp == 2])\n se_total <- sqrt(g1$var / g1$n + g2$var / g2$n)\n (g1$m - g2$m) / se_total\n}\nsystem.time(t3 <- rowtstat(X, grp))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> user system elapsed \n#> 0.02 0.00 0.02\n```\n\n\n:::\n\n```{.r .cell-code}\nstopifnot(all.equal(t1, t3))\n```\n:::\n\n\n1000 times faster than when we started!\n\n## Other techniques\n* [Read R blogs](http://www.r-bloggers.com/) to see what performance\n problems other people have struggled with, and how they have made their\n code faster.\n\n* Read other R programming books, like The Art of R Programming or Patrick Burns'\n [_R Inferno_](http://www.burns-stat.com/documents/books/the-r-inferno/) to\n learn about common traps.\n\n* Take an algorithms and data structure course to learn some\n well known ways of tackling certain classes of problems. I have heard\n good things about Princeton's\n [Algorithms course](https://www.coursera.org/course/algs4partI) offered on\n Coursera.\n \n* Learn how to parallelise your code. Two places to start are\n Parallel R and Parallel Computing for Data Science\n\n* Read general books about optimisation like Mature optimisation\n or the Pragmatic Programmer\n \n* Read more R code. StackOverflow, R Mailing List, DSLC, GitHub, etc.\n", + "supporting": [ + "24_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/_freeze/slides/25/execute-results/html.json b/_freeze/slides/25/execute-results/html.json @@ -1,9 +1,11 @@ { - "hash": "e13aade2d23c86d6c088e495237b1d65", + "hash": "842c1c5354574cc08224d4899d85eeba", "result": { "engine": "knitr", - "markdown": "---\nengine: knitr\ntitle: Rewriting R code in C++\n---\n\n## Learning objectives:\n\n- Learn to improve performance by rewriting bottlenecks in C++\n\n- Introduction to the [{Rcpp} package](https://www.rcpp.org/)\n\n## Introduction\n\nIn this chapter we'll learn how to rewrite **R** code in **C++** to make it faster using the **Rcpp package**. The **Rcpp** package makes it simple to connect C++ to R! With C++ you can fix:\n\n- Loops that can't be easily vectorised because subsequent iterations depend on previous ones.\n\n- Recursive functions, or problems which involve calling functions millions of times. The overhead of calling a function in C++ is much lower than in R.\n\n- Problems that require advanced data structures and algorithms that R doesn't provide. Through the **standard template library (STL)**, C++ has efficient implementations of many important data structures, from ordered maps to double-ended queue\n\n<center>Like how?</center>\n\n<center> </center>\n\n<center></center>\n\n \n\n## Getting started with C++\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(Rcpp)\n```\n:::\n\n\nInstall a C++ compiler:\n\n- Rtools, on Windows\n- Xcode, on Mac\n- Sudo apt-get install r-base-dev or similar, on Linux.\n\n\n### First example {-}\n\nRcpp compiling the C++ code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncppFunction('int add(int x, int y, int z) {\n int sum = x + y + z;\n return sum;\n}')\n# add works like a regular R function\nadd\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (x, y, z) \n#> .Call(<pointer: 0x00007ff9d7b215f0>, x, y, z)\n```\n\n\n:::\n\n```{.r .cell-code}\nadd(1, 2, 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 6\n```\n\n\n:::\n:::\n\n\nSome things to note:\n\n\n- The syntax to create a function is different.\n- Types of inputs and outputs must be explicitly declared\n- Use = for assignment, not `<-`.\n- Every statement is terminated by a ;\n- C++ has it's own name for the types we are used to:\n - scalar types are `int`, `double`, `bool` and `String`\n - vector types (for Rcpp) are `IntegerVector`, `NumericVector`, `LogicalVector` and `CharacterVector`\n - Other R types are available in C++: `List`, `Function`, `DataFrame`, and more.\n \n- Explicitly use a `return` statement to return a value from a function.\n\n \n\n## Example with scalar input and output {-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsignR <- function(x) {\n if (x > 0) {\n 1\n } else if (x == 0) {\n 0\n } else {\n -1\n }\n}\n\na <- -0.5\nb <- 0.5\nc <- 0\nsignR(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0\n```\n\n\n:::\n:::\n\n\nTranslation:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncppFunction('int signC(int x) {\n if (x > 0) {\n return 1;\n } else if (x == 0) {\n return 0;\n } else {\n return -1;\n }\n}')\n```\n:::\n\n\n* Note that the `if` syntax is identical! Not everything is different!\n\n## Vector Input, Scalar output:{-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsumR <- function(x) {\n total <- 0\n for (i in seq_along(x)) {\n total <- total + x[i]\n }\n total\n}\n\nx<- runif(100)\nsumR(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 48.57133\n```\n\n\n:::\n:::\n\n\nTranslation:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncppFunction('double sumC(NumericVector x) {\n int n = x.size();\n double total = 0;\n for(int i = 0; i < n; ++i) {\n total += x[i];\n }\n return total;\n}')\n```\n:::\n\n\nSome observations:\n\n- vector indices *start at 0*\n- The for statement has a different syntax: for(init; check; increment)\n- Methods are called with `.`\n- `total += x[i]` is equivalent to `total = total + x[i]`.\n- other in-place operators are `-=`, `*=`, `and /=`\n\n\nTo check for the fastest way we can use:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n?bench::mark\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(1e3)\nbench::mark(\n sum(x),\n sumC(x),\n sumR(x)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 6\n#> expression min median `itr/sec` mem_alloc `gc/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>\n#> 1 sum(x) 700ns 800ns 1176844. 0B 0\n#> 2 sumC(x) 800ns 1.1µs 865688. 0B 0\n#> 3 sumR(x) 13.8µs 19.7µs 48653. 0B 0\n```\n\n\n:::\n:::\n\n\n## Vector input and output {-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\npdistR <- function(x, ys) {\n sqrt((x - ys) ^ 2)\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncppFunction('NumericVector pdistC(double x, NumericVector ys) {\n int n = ys.size();\n NumericVector out(n);\n\n for(int i = 0; i < n; ++i) {\n out[i] = sqrt(pow(ys[i] - x, 2.0));\n }\n return out;\n}')\n```\n:::\n\n\nNote: uses `pow()`, not `^`, for exponentiation\n\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- runif(1e6)\nbench::mark(\n pdistR(0.5, y),\n pdistC(0.5, y)\n)[1:6]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 6\n#> expression min median `itr/sec` mem_alloc `gc/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>\n#> 1 pdistR(0.5, y) 3.23ms 3.69ms 268. 7.63MB 136.\n#> 2 pdistC(0.5, y) 1.66ms 2.08ms 473. 7.63MB 176.\n```\n\n\n:::\n:::\n\n\n## Source your C++ code {-}\n\nSource stand-alone C++ files into R using `sourceCpp()`\n\n\nC++ files have extension `.cpp`\n\n```\n#include <Rcpp.h>\nusing namespace Rcpp;\n```\n\nAnd for each function that you want available within R, you need to prefix it with:\n\n```\n// [[Rcpp::export]]\n```\n\nInside a cpp file you can include `R` code using special comments\n\n```\n/*** R\nrcode here\n*/\n```\n\n\n\n### Example {-}\n\nThis block in Rmarkdown uses `{Rcpp}` as a short hand for engine = \"Rcpp\". \n\n\n::: {.cell}\n\n```{.cpp .cell-code}\n#include <Rcpp.h>\nusing namespace Rcpp;\n\n// [[Rcpp::export]]\ndouble meanC(NumericVector x) {\n int n = x.size();\n double total = 0;\n\n for(int i = 0; i < n; ++i) {\n total += x[i];\n }\n return total / n;\n}\n\n/*** R\nx <- runif(1e5)\nbench::mark(\n mean(x),\n meanC(x)\n)\n*/\n```\n:::\n\n\nNOTE: For some reason although the r code above runs, `knit` doesn't include the output. Why?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(1e5)\nbench::mark(\n mean(x),\n meanC(x)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 6\n#> expression min median `itr/sec` mem_alloc `gc/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>\n#> 1 mean(x) 122.6µs 136.8µs 6910. 0B 0\n#> 2 meanC(x) 40.5µs 50.8µs 18955. 0B 0\n```\n\n\n:::\n:::\n\n\n\n\n## Data frames, functions, and attributes\n\n### Lists and Dataframes {-}\n\nContrived example to illustrate how to access a dataframe from c++:\n\n\n::: {.cell}\n\n```{.cpp .cell-code}\n#include <Rcpp.h>\nusing namespace Rcpp;\n\n// [[Rcpp::export]]\ndouble mpe(List mod) {\n if (!mod.inherits(\"lm\")) stop(\"Input must be a linear model\");\n\n NumericVector resid = as<NumericVector>(mod[\"residuals\"]);\n NumericVector fitted = as<NumericVector>(mod[\"fitted.values\"]);\n\n int n = resid.size();\n double err = 0;\n for(int i = 0; i < n; ++i) {\n err += resid[i] / (fitted[i] + resid[i]);\n }\n return err / n;\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmod <- lm(mpg ~ wt, data = mtcars)\nmpe(mod)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -0.01541615\n```\n\n\n:::\n:::\n\n\n- Note that you must *cast* the values to the required type. C++ needs to know the types in advance.\n\n### Functions {-}\n\n\n::: {.cell}\n\n```{.cpp .cell-code}\n#include <Rcpp.h>\nusing namespace Rcpp;\n\n// [[Rcpp::export]]\nRObject callWithOne(Function f) {\n return f(1);\n}\n```\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncallWithOne(function(x) x + 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n:::\n\n\n\n* Other values can be accessed from c++ including\n\n * attributes (use: `.attr()`. Also `.names()` is alias for name attribute.\n * `Environment`, `DottedPair`, `Language`, `Symbol` , etc. \n\n## Missing values\n\n### Missing values behave differently for C++ scalers{-}\n\n* Scalar NA's in Cpp : `NA_LOGICAL`, `NA_INTEGER`, `NA_REAL`, `NA_STRING`.\n\n* Integers (`int`) stores R NA's as the smallest integer. Better to use length 1 `IntegerVector`\n* Doubles use IEEE 754 NaN , which behaves a bit differently for logical expressions (but ok for math expressions). \n\n\n::: {.cell}\n\n```{.r .cell-code}\nevalCpp(\"NA_REAL || FALSE\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n* Strings are a class from Rcpp, so they handle missing values fine.\n\n* `bool` can only hold two values, so be careful. Consider using vectors of length 1 or coercing to `int`\n\n\n### Vectors\n\n* Vectors are all type introduced by RCpp and know how to handle missing values if you use the specific type for that vector.\n\n\n::: {.cell}\n\n```{.cpp .cell-code}\n#include <Rcpp.h>\nusing namespace Rcpp;\n\n// [[Rcpp::export]]\nList missing_sampler() {\n return List::create(\n NumericVector::create(NA_REAL),\n IntegerVector::create(NA_INTEGER),\n LogicalVector::create(NA_LOGICAL),\n CharacterVector::create(NA_STRING)\n );\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(missing_sampler())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 4\n#> $ : num NA\n#> $ : int NA\n#> $ : logi NA\n#> $ : chr NA\n```\n\n\n:::\n:::\n\n\n## Standard Template Library\n\nSTL provides powerful data structures and algorithms for C++. \n\n### Iterators {-}\n\nIterators are used extensively in the STL to abstract away details of underlying data structures.\n\nIf you an iterator `it`, you can:\n\n- Get the value by 'dereferencing' with `*it`\n- Advance to the next value with `++it`\n- Compare iterators (locations) with `==`\n\n\n### Algorithms {-}\n\n* The real power of iterators comes from using them with STL algorithms. \n \n* A good reference is [https://en.cppreference.com/w/cpp/algorithm]\n\n* Book provides examples using `accumulate` and `upper_buond`\n\n* Another Example:\n\n\n::: {.cell}\n\n```{.cpp .cell-code}\n\n#include <algorithm>\n#include <Rcpp.h>\n\nusing namespace Rcpp;\n \n \n// Explicit iterator version\n \n// [[Rcpp::export]]\nNumericVector square_C_it(NumericVector x){\n NumericVector out(x.size());\n // Each container has its own iterator type\n NumericVector::iterator in_it;\n NumericVector::iterator out_it;\n \n for(in_it = x.begin(), out_it = out.begin(); in_it != x.end(); ++in_it, ++out_it) {\n *out_it = pow(*in_it,2);\n }\n \n return out;\n \n}\n \n \n// Use algorithm 'transform'\n \n// [[Rcpp::export]]\nNumericVector square_C(NumericVector x) {\n \n NumericVector out(x.size());\n \n \n std::transform(x.begin(),x.end(), out.begin(),\n [](double v) -> double { return v*v; });\n return out;\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsquare_C(c(1.0,2.0,3.0))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 4 9\n```\n\n\n:::\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsquare_C_it(c(1.0,2.0,3.0))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 4 9\n```\n\n\n:::\n:::\n\n\n## Data Structures {-}\n\nSTL provides a large set of data structures. Some of the most important:\n\n* `std::vector` - like an `R` vector, except knows how to grow efficiently\n\n* `std::unordered_set` - unique set of values. Ordered version `std::set`. Unordered is more efficient.\n\n* `std::map` - Moslty similar to `R` lists, provide an association between a key and a value. There is also an unordered version. \n\nA quick example illustrating the `map`:\n\n\n::: {.cell}\n\n```{.cpp .cell-code}\n#include <Rcpp.h>\nusing namespace Rcpp;\n\n// [[Rcpp::export]]\nstd::map<double, int> tableC(NumericVector x) {\n // Note the types are <key, value>\n std::map<double, int> counts;\n\n int n = x.size();\n for (int i = 0; i < n; i++) {\n counts[x[i]]++;\n }\n\n return counts;\n}\n```\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nres = tableC(c(1,1,2,1,4,5))\nres\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 1 2 4 5 \n#> 3 1 1 1\n```\n\n\n:::\n:::\n\n\n* Note that the map is converted to a named vector in this case on return\n\n \nTo learn more about the STL data structures see [containers](https://en.cppreference.com/w/cpp/container) at `cppreference`\n\n## Case Studies\n\n\n\nReal life uses of C++ to replace slow R code.\n\n\n## Case study 1: Gibbs sampler {-}\n\nThe [Gibbs sampler](https://en.wikipedia.org/wiki/Gibbs_sampling) is a method for estimating parameters expectations. It is a **MCMC algorithm** that has been adapted to sample from multidimensional target distributions. Gibbs sampling generates a **Markov chain** of samples, each of which is correlated with nearby samples. \n\n[Example blogged by Dirk Eddelbuettel](https://dirk.eddelbuettel.com/blog/2011/07/14/), the R and C++ code is very similar but runs about 20 times faster.\n\n> \"Darren Wilkinson stresses the rather pragmatic aspects of how fast and/or easy it is to write the code, rather than just the mere runtime.\n\n\n<center></center>\n\n\nR code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngibbs_r <- function(N, thin) {\n mat <- matrix(nrow = N, ncol = 2)\n x <- y <- 0\n\n for (i in 1:N) {\n for (j in 1:thin) {\n x <- rgamma(1, 3, y * y + 4)\n y <- rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))\n }\n mat[i, ] <- c(x, y)\n }\n mat\n}\n```\n:::\n\n\nActions to convert R to C++: \n\n- Add type declarations to all variables \n- Use `(` instead of `[` to index into the matrix \n- Subscript the results of `rgamma` and `rnorm` to convert from a vector into a scalar.\n\n\n::: {.cell}\n\n```{.cpp .cell-code}\n#include <Rcpp.h>\nusing namespace Rcpp;\n\n// [[Rcpp::export]]\nNumericMatrix gibbs_cpp(int N, int thin) {\n NumericMatrix mat(N, 2);\n double x = 0, y = 0;\n\n for(int i = 0; i < N; i++) {\n for(int j = 0; j < thin; j++) {\n x = rgamma(1, 3, 1 / (y * y + 4))[0];\n y = rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))[0];\n }\n mat(i, 0) = x;\n mat(i, 1) = y;\n }\n\n return(mat);\n}\n```\n:::\n\n\nChecking who's best:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbench::mark(\n gibbs_r(100, 10),\n gibbs_cpp(100, 10),\n check = FALSE\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 6\n#> expression min median `itr/sec` mem_alloc `gc/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>\n#> 1 gibbs_r(100, 10) 1.37ms 1.88ms 517. 103.22KB 28.9\n#> 2 gibbs_cpp(100, 10) 148.5µs 211.3µs 4600. 1.61KB 29.6\n```\n\n\n:::\n:::\n\n\n## Case study 2: predict a model response from three inputs {-}\n\n[Rcpp is smoking fast for agent based models in data frames](https://gweissman.github.io/post/rcpp-is-smoking-fast-for-agent-based-models-in-data-frames/) by Gary Weissman, MD, MSHP.\n\nStarts with this code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvacc1a <- function(age, female, ily) {\n p <- 0.25 + 0.3 * 1 / (1 - exp(0.04 * age)) + 0.1 * ily\n p <- p * if (female) 1.25 else 0.75\n p <- max(0, p)\n p <- min(1, p)\n p\n}\n```\n:::\n\n\nR code with a for loop:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvacc1 <- function(age, female, ily) {\n n <- length(age)\n out <- numeric(n)\n for (i in seq_len(n)) {\n out[i] <- vacc1a(age[i], female[i], ily[i])\n }\n out\n}\n```\n:::\n\n\nVectorized R code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvacc2 <- function(age, female, ily) {\n p <- 0.25 + 0.3 * 1 / (1 - exp(0.04 * age)) + 0.1 * ily\n p <- p * ifelse(female, 1.25, 0.75)\n p <- pmax(0, p)\n p <- pmin(1, p)\n p\n}\n```\n:::\n\n\nC++:\n\n\n::: {.cell}\n\n```{.cpp .cell-code}\n#include <Rcpp.h>\nusing namespace Rcpp;\n\ndouble vacc3a(double age, bool female, bool ily){\n double p = 0.25 + 0.3 * 1 / (1 - exp(0.04 * age)) + 0.1 * ily;\n p = p * (female ? 1.25 : 0.75);\n p = std::max(p, 0.0);\n p = std::min(p, 1.0);\n return p;\n}\n\n// [[Rcpp::export]]\nNumericVector vacc3(NumericVector age, LogicalVector female, \n LogicalVector ily) {\n int n = age.size();\n NumericVector out(n);\n\n for(int i = 0; i < n; ++i) {\n out[i] = vacc3a(age[i], female[i], ily[i]);\n }\n\n return out;\n}\n```\n:::\n\n\nSample data:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nn <- 1000\nage <- rnorm(n, mean = 50, sd = 10)\nfemale <- sample(c(T, F), n, rep = TRUE)\nily <- sample(c(T, F), n, prob = c(0.8, 0.2), rep = TRUE)\n\nstopifnot(\n all.equal(vacc1(age, female, ily), vacc2(age, female, ily)),\n all.equal(vacc1(age, female, ily), vacc3(age, female, ily))\n)\n```\n:::\n\n\n<center>**Who's faster?**</center>\n<center></center>\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbench::mark(\n vacc1 = vacc1(age, female, ily),\n vacc2 = vacc2(age, female, ily),\n vacc3 = vacc3(age, female, ily)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 6\n#> expression min median `itr/sec` mem_alloc `gc/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>\n#> 1 vacc1 779.8µs 1.01ms 929. 7.86KB 45.6 \n#> 2 vacc2 48.1µs 61.7µs 15618. 146.68KB 31.1 \n#> 3 vacc3 30.8µs 38.7µs 25332. 11.98KB 5.07\n```\n\n\n:::\n:::\n\n\n## Resources\n\n- [Rcpp: Seamless R and C++ Integration](https:\\\\Rcpp.org)\n- [cpp-tutorial](https://www.learncpp.com) is often recommended. Lots of ads though!\n- [cpp-reference](https://en.cppreference.com/w/cpp)\n- [C++20 for Programmers](https://www.pearson.com/en-us/subject-catalog/p/c20-for-programmers-an-objects-natural-approach/P200000000211/9780137570461) is a newer book that covers modern c++ for people who know programming in another language.\n \n## Op Success!\n\n\n\n\n## Meeting Videos\n\n### Cohort 1\n\n<iframe src=\"https://www.youtube.com/embed/2JDeacWl1DM\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<iframe src=\"https://www.youtube.com/embed/sLWCelHpcqc\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 2\n\n<iframe src=\"https://www.youtube.com/embed/rQwOosOJpaY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 3\n\n<iframe src=\"https://www.youtube.com/embed/ZWdIeR1jK9Q\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 4\n\n<iframe src=\"https://www.youtube.com/embed/_K8DKF3Fzes\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 5\n\n<iframe src=\"https://www.youtube.com/embed/nske4iqsgh0\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n### Cohort 6\n\n<iframe src=\"https://www.youtube.com/embed/hyVK08jXiYw\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n\n<summary>Meeting chat log</summary>\n```\n00:10:13\tArthur Shaw:\tDid things freeze for anyone else?\n00:55:40\tFederica Gazzelloni:\thttps://en.cppreference.com/w/cpp/container\n00:57:44\tFederica Gazzelloni:\thttps://dirk.eddelbuettel.com/blog/2011/07/14/\n01:07:33\tTrevin:\tI don’t have experience\n01:07:54\tOluwafemi Oyedele:\tSame here!!!\n01:11:57\tArthur Shaw:\tDoes anyone know any packages that use C++? The one that comes to mind for me is haven, which uses a C++ library\n01:12:30\tTrevin:\tWhen I was looking, one that stood out to me was rstan\n01:13:02\tArthur Shaw:\tReacted to \"When I was looking, ...\" with 👍\n```\n</details>\n\n### Cohort 7\n\n<iframe src=\"https://www.youtube.com/embed/Luu7JsixQgY\" width=\"100%\" height=\"400px\" data-external=\"1\"></iframe>\n\n<details>\n\n<summary>Meeting chat log</summary>\n```\n00:43:02\tGus Lipkin:\tI think I found the definition for `mean`\n\nAn R call goes to *a which then calls the C function *b\n\n*a: https://github.com/wch/r-source/blob/trunk/src/library/base/R/mean.R\n*b: https://github.com/wch/r-source/blob/trunk/src/library/stats/src/cov.c#L207\n\nIt looks like the second pass only happens if `R_FINITE(mean_from_first_pass)` which tries to call `isfinite` from C++ and if it’s not there, it’ll make sure it is a number and is not positive or negative infinity.\n00:49:55\tGus Lipkin:\tI feel bad for dropping in on the last chapter and getting Collin’s thanks 😅 I wish I’d joined sooner.\n```\n</details>\n", - "supporting": [], + "markdown": "---\nengine: knitr\ntitle: Rewriting R code in C++\n---\n\n## Learning objectives:\n\n- Learn to improve performance by rewriting bottlenecks in C++\n\n- Introduction to the [{Rcpp} package](https://www.rcpp.org/)\n\n## Introduction\n\nIn this chapter we'll learn how to rewrite **R** code in **C++** to make it faster using the **Rcpp package**. The **Rcpp** package makes it simple to connect C++ to R! With C++ you can fix:\n\n- Loops that can't be easily vectorised because subsequent iterations depend on previous ones.\n\n- Recursive functions, or problems which involve calling functions millions of times. The overhead of calling a function in C++ is much lower than in R.\n\n- Problems that require advanced data structures and algorithms that R doesn't provide. Through the **standard template library (STL)**, C++ has efficient implementations of many important data structures, from ordered maps to double-ended queue\n\n<center>Like how?</center>\n\n<center> </center>\n\n<center></center>\n\n \n\n## Getting started with C++\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(Rcpp)\n```\n:::\n\n\nInstall a C++ compiler:\n\n- Rtools, on Windows\n- Xcode, on Mac\n- Sudo apt-get install r-base-dev or similar, on Linux.\n\n\n### First example {-}\n\nRcpp compiling the C++ code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncppFunction('int add(int x, int y, int z) {\n int sum = x + y + z;\n return sum;\n}')\n# add works like a regular R function\nadd\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> function (x, y, z) \n#> .Call(<pointer: 0x00007ff9d7e315f0>, x, y, z)\n```\n\n\n:::\n\n```{.r .cell-code}\nadd(1, 2, 3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 6\n```\n\n\n:::\n:::\n\n\nSome things to note:\n\n\n- The syntax to create a function is different.\n- Types of inputs and outputs must be explicitly declared\n- Use = for assignment, not `<-`.\n- Every statement is terminated by a ;\n- C++ has it's own name for the types we are used to:\n - scalar types are `int`, `double`, `bool` and `String`\n - vector types (for Rcpp) are `IntegerVector`, `NumericVector`, `LogicalVector` and `CharacterVector`\n - Other R types are available in C++: `List`, `Function`, `DataFrame`, and more.\n \n- Explicitly use a `return` statement to return a value from a function.\n\n \n\n## Example with scalar input and output {-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsignR <- function(x) {\n if (x > 0) {\n 1\n } else if (x == 0) {\n 0\n } else {\n -1\n }\n}\n\na <- -0.5\nb <- 0.5\nc <- 0\nsignR(c)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0\n```\n\n\n:::\n:::\n\n\nTranslation:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncppFunction('int signC(int x) {\n if (x > 0) {\n return 1;\n } else if (x == 0) {\n return 0;\n } else {\n return -1;\n }\n}')\n```\n:::\n\n\n* Note that the `if` syntax is identical! Not everything is different!\n\n## Vector Input, Scalar output:{-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsumR <- function(x) {\n total <- 0\n for (i in seq_along(x)) {\n total <- total + x[i]\n }\n total\n}\n\nx<- runif(100)\nsumR(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 51.16287\n```\n\n\n:::\n:::\n\n\nTranslation:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncppFunction('double sumC(NumericVector x) {\n int n = x.size();\n double total = 0;\n for(int i = 0; i < n; ++i) {\n total += x[i];\n }\n return total;\n}')\n```\n:::\n\n\nSome observations:\n\n- vector indices *start at 0*\n- The for statement has a different syntax: for(init; check; increment)\n- Methods are called with `.`\n- `total += x[i]` is equivalent to `total = total + x[i]`.\n- other in-place operators are `-=`, `*=`, `and /=`\n\n\nTo check for the fastest way we can use:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n?bench::mark\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(1e3)\nbench::mark(\n sum(x),\n sumC(x),\n sumR(x)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 6\n#> expression min median `itr/sec` mem_alloc `gc/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>\n#> 1 sum(x) 700ns 1.6µs 611939. 0B 0\n#> 2 sumC(x) 900ns 1.7µs 469982. 0B 0\n#> 3 sumR(x) 14.2µs 27.4µs 33865. 0B 0\n```\n\n\n:::\n:::\n\n\n## Vector input and output {-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\npdistR <- function(x, ys) {\n sqrt((x - ys) ^ 2)\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncppFunction('NumericVector pdistC(double x, NumericVector ys) {\n int n = ys.size();\n NumericVector out(n);\n\n for(int i = 0; i < n; ++i) {\n out[i] = sqrt(pow(ys[i] - x, 2.0));\n }\n return out;\n}')\n```\n:::\n\n\nNote: uses `pow()`, not `^`, for exponentiation\n\n\n::: {.cell}\n\n```{.r .cell-code}\ny <- runif(1e6)\nbench::mark(\n pdistR(0.5, y),\n pdistC(0.5, y)\n)[1:6]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 6\n#> expression min median `itr/sec` mem_alloc `gc/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>\n#> 1 pdistR(0.5, y) 4.37ms 8.76ms 119. 7.63MB 63.2\n#> 2 pdistC(0.5, y) 2.26ms 5.7ms 182. 7.63MB 87.5\n```\n\n\n:::\n:::\n\n\n## Source your C++ code {-}\n\nSource stand-alone C++ files into R using `sourceCpp()`\n\n\nC++ files have extension `.cpp`\n\n```\n#include <Rcpp.h>\nusing namespace Rcpp;\n```\n\nAnd for each function that you want available within R, you need to prefix it with:\n\n```\n// [[Rcpp::export]]\n```\n\nInside a cpp file you can include `R` code using special comments\n\n```\n/*** R\nrcode here\n*/\n```\n\n\n\n### Example {-}\n\nThis block in Rmarkdown uses `{Rcpp}` as a short hand for engine = \"Rcpp\". \n\n\n::: {.cell}\n\n```{.cpp .cell-code}\n#include <Rcpp.h>\nusing namespace Rcpp;\n\n// [[Rcpp::export]]\ndouble meanC(NumericVector x) {\n int n = x.size();\n double total = 0;\n\n for(int i = 0; i < n; ++i) {\n total += x[i];\n }\n return total / n;\n}\n\n/*** R\nx <- runif(1e5)\nbench::mark(\n mean(x),\n meanC(x)\n)\n*/\n```\n:::\n\n\nNOTE: For some reason although the r code above runs, `knit` doesn't include the output. Why?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- runif(1e5)\nbench::mark(\n mean(x),\n meanC(x)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 6\n#> expression min median `itr/sec` mem_alloc `gc/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>\n#> 1 mean(x) 122.8µs 234µs 4084. 0B 2.02\n#> 2 meanC(x) 40.4µs 114µs 9270. 0B 0\n```\n\n\n:::\n:::\n\n\n\n\n## Data frames, functions, and attributes\n\n### Lists and Dataframes {-}\n\nContrived example to illustrate how to access a dataframe from c++:\n\n\n::: {.cell}\n\n```{.cpp .cell-code}\n#include <Rcpp.h>\nusing namespace Rcpp;\n\n// [[Rcpp::export]]\ndouble mpe(List mod) {\n if (!mod.inherits(\"lm\")) stop(\"Input must be a linear model\");\n\n NumericVector resid = as<NumericVector>(mod[\"residuals\"]);\n NumericVector fitted = as<NumericVector>(mod[\"fitted.values\"]);\n\n int n = resid.size();\n double err = 0;\n for(int i = 0; i < n; ++i) {\n err += resid[i] / (fitted[i] + resid[i]);\n }\n return err / n;\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmod <- lm(mpg ~ wt, data = mtcars)\nmpe(mod)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -0.01541615\n```\n\n\n:::\n:::\n\n\n- Note that you must *cast* the values to the required type. C++ needs to know the types in advance.\n\n### Functions {-}\n\n\n::: {.cell}\n\n```{.cpp .cell-code}\n#include <Rcpp.h>\nusing namespace Rcpp;\n\n// [[Rcpp::export]]\nRObject callWithOne(Function f) {\n return f(1);\n}\n```\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncallWithOne(function(x) x + 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n:::\n\n\n\n* Other values can be accessed from c++ including\n\n * attributes (use: `.attr()`. Also `.names()` is alias for name attribute.\n * `Environment`, `DottedPair`, `Language`, `Symbol` , etc. \n\n## Missing values\n\n### Missing values behave differently for C++ scalers{-}\n\n* Scalar NA's in Cpp : `NA_LOGICAL`, `NA_INTEGER`, `NA_REAL`, `NA_STRING`.\n\n* Integers (`int`) stores R NA's as the smallest integer. Better to use length 1 `IntegerVector`\n* Doubles use IEEE 754 NaN , which behaves a bit differently for logical expressions (but ok for math expressions). \n\n\n::: {.cell}\n\n```{.r .cell-code}\nevalCpp(\"NA_REAL || FALSE\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n* Strings are a class from Rcpp, so they handle missing values fine.\n\n* `bool` can only hold two values, so be careful. Consider using vectors of length 1 or coercing to `int`\n\n\n### Vectors\n\n* Vectors are all type introduced by RCpp and know how to handle missing values if you use the specific type for that vector.\n\n\n::: {.cell}\n\n```{.cpp .cell-code}\n#include <Rcpp.h>\nusing namespace Rcpp;\n\n// [[Rcpp::export]]\nList missing_sampler() {\n return List::create(\n NumericVector::create(NA_REAL),\n IntegerVector::create(NA_INTEGER),\n LogicalVector::create(NA_LOGICAL),\n CharacterVector::create(NA_STRING)\n );\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(missing_sampler())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 4\n#> $ : num NA\n#> $ : int NA\n#> $ : logi NA\n#> $ : chr NA\n```\n\n\n:::\n:::\n\n\n## Standard Template Library\n\nSTL provides powerful data structures and algorithms for C++. \n\n### Iterators {-}\n\nIterators are used extensively in the STL to abstract away details of underlying data structures.\n\nIf you an iterator `it`, you can:\n\n- Get the value by 'dereferencing' with `*it`\n- Advance to the next value with `++it`\n- Compare iterators (locations) with `==`\n\n\n### Algorithms {-}\n\n* The real power of iterators comes from using them with STL algorithms. \n \n* A good reference is [https://en.cppreference.com/w/cpp/algorithm]\n\n* Book provides examples using `accumulate` and `upper_buond`\n\n* Another Example:\n\n\n::: {.cell}\n\n```{.cpp .cell-code}\n\n#include <algorithm>\n#include <Rcpp.h>\n\nusing namespace Rcpp;\n \n \n// Explicit iterator version\n \n// [[Rcpp::export]]\nNumericVector square_C_it(NumericVector x){\n NumericVector out(x.size());\n // Each container has its own iterator type\n NumericVector::iterator in_it;\n NumericVector::iterator out_it;\n \n for(in_it = x.begin(), out_it = out.begin(); in_it != x.end(); ++in_it, ++out_it) {\n *out_it = pow(*in_it,2);\n }\n \n return out;\n \n}\n \n \n// Use algorithm 'transform'\n \n// [[Rcpp::export]]\nNumericVector square_C(NumericVector x) {\n \n NumericVector out(x.size());\n \n \n std::transform(x.begin(),x.end(), out.begin(),\n [](double v) -> double { return v*v; });\n return out;\n}\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsquare_C(c(1.0,2.0,3.0))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 4 9\n```\n\n\n:::\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsquare_C_it(c(1.0,2.0,3.0))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 4 9\n```\n\n\n:::\n:::\n\n\n## Data Structures {-}\n\nSTL provides a large set of data structures. Some of the most important:\n\n* `std::vector` - like an `R` vector, except knows how to grow efficiently\n\n* `std::unordered_set` - unique set of values. Ordered version `std::set`. Unordered is more efficient.\n\n* `std::map` - Moslty similar to `R` lists, provide an association between a key and a value. There is also an unordered version. \n\nA quick example illustrating the `map`:\n\n\n::: {.cell}\n\n```{.cpp .cell-code}\n#include <Rcpp.h>\nusing namespace Rcpp;\n\n// [[Rcpp::export]]\nstd::map<double, int> tableC(NumericVector x) {\n // Note the types are <key, value>\n std::map<double, int> counts;\n\n int n = x.size();\n for (int i = 0; i < n; i++) {\n counts[x[i]]++;\n }\n\n return counts;\n}\n```\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nres = tableC(c(1,1,2,1,4,5))\nres\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 1 2 4 5 \n#> 3 1 1 1\n```\n\n\n:::\n:::\n\n\n* Note that the map is converted to a named vector in this case on return\n\n \nTo learn more about the STL data structures see [containers](https://en.cppreference.com/w/cpp/container) at `cppreference`\n\n## Case Studies\n\n\n\nReal life uses of C++ to replace slow R code.\n\n\n## Case study 1: Gibbs sampler {-}\n\nThe [Gibbs sampler](https://en.wikipedia.org/wiki/Gibbs_sampling) is a method for estimating parameters expectations. It is a **MCMC algorithm** that has been adapted to sample from multidimensional target distributions. Gibbs sampling generates a **Markov chain** of samples, each of which is correlated with nearby samples. \n\n[Example blogged by Dirk Eddelbuettel](https://dirk.eddelbuettel.com/blog/2011/07/14/), the R and C++ code is very similar but runs about 20 times faster.\n\n> \"Darren Wilkinson stresses the rather pragmatic aspects of how fast and/or easy it is to write the code, rather than just the mere runtime.\n\n\n<center></center>\n\n\nR code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngibbs_r <- function(N, thin) {\n mat <- matrix(nrow = N, ncol = 2)\n x <- y <- 0\n\n for (i in 1:N) {\n for (j in 1:thin) {\n x <- rgamma(1, 3, y * y + 4)\n y <- rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))\n }\n mat[i, ] <- c(x, y)\n }\n mat\n}\n```\n:::\n\n\nActions to convert R to C++: \n\n- Add type declarations to all variables \n- Use `(` instead of `[` to index into the matrix \n- Subscript the results of `rgamma` and `rnorm` to convert from a vector into a scalar.\n\n\n::: {.cell}\n\n```{.cpp .cell-code}\n#include <Rcpp.h>\nusing namespace Rcpp;\n\n// [[Rcpp::export]]\nNumericMatrix gibbs_cpp(int N, int thin) {\n NumericMatrix mat(N, 2);\n double x = 0, y = 0;\n\n for(int i = 0; i < N; i++) {\n for(int j = 0; j < thin; j++) {\n x = rgamma(1, 3, 1 / (y * y + 4))[0];\n y = rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))[0];\n }\n mat(i, 0) = x;\n mat(i, 1) = y;\n }\n\n return(mat);\n}\n```\n:::\n\n\nChecking who's best:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbench::mark(\n gibbs_r(100, 10),\n gibbs_cpp(100, 10),\n check = FALSE\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 6\n#> expression min median `itr/sec` mem_alloc `gc/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>\n#> 1 gibbs_r(100, 10) 1.31ms 3.83ms 283. 103.22KB 18.0\n#> 2 gibbs_cpp(100, 10) 150.1µs 416.3µs 2456. 1.61KB 16.9\n```\n\n\n:::\n:::\n\n\n## Case study 2: predict a model response from three inputs {-}\n\n[Rcpp is smoking fast for agent based models in data frames](https://gweissman.github.io/post/rcpp-is-smoking-fast-for-agent-based-models-in-data-frames/) by Gary Weissman, MD, MSHP.\n\nStarts with this code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvacc1a <- function(age, female, ily) {\n p <- 0.25 + 0.3 * 1 / (1 - exp(0.04 * age)) + 0.1 * ily\n p <- p * if (female) 1.25 else 0.75\n p <- max(0, p)\n p <- min(1, p)\n p\n}\n```\n:::\n\n\nR code with a for loop:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvacc1 <- function(age, female, ily) {\n n <- length(age)\n out <- numeric(n)\n for (i in seq_len(n)) {\n out[i] <- vacc1a(age[i], female[i], ily[i])\n }\n out\n}\n```\n:::\n\n\nVectorized R code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvacc2 <- function(age, female, ily) {\n p <- 0.25 + 0.3 * 1 / (1 - exp(0.04 * age)) + 0.1 * ily\n p <- p * ifelse(female, 1.25, 0.75)\n p <- pmax(0, p)\n p <- pmin(1, p)\n p\n}\n```\n:::\n\n\nC++:\n\n\n::: {.cell}\n\n```{.cpp .cell-code}\n#include <Rcpp.h>\nusing namespace Rcpp;\n\ndouble vacc3a(double age, bool female, bool ily){\n double p = 0.25 + 0.3 * 1 / (1 - exp(0.04 * age)) + 0.1 * ily;\n p = p * (female ? 1.25 : 0.75);\n p = std::max(p, 0.0);\n p = std::min(p, 1.0);\n return p;\n}\n\n// [[Rcpp::export]]\nNumericVector vacc3(NumericVector age, LogicalVector female, \n LogicalVector ily) {\n int n = age.size();\n NumericVector out(n);\n\n for(int i = 0; i < n; ++i) {\n out[i] = vacc3a(age[i], female[i], ily[i]);\n }\n\n return out;\n}\n```\n:::\n\n\nSample data:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nn <- 1000\nage <- rnorm(n, mean = 50, sd = 10)\nfemale <- sample(c(T, F), n, rep = TRUE)\nily <- sample(c(T, F), n, prob = c(0.8, 0.2), rep = TRUE)\n\nstopifnot(\n all.equal(vacc1(age, female, ily), vacc2(age, female, ily)),\n all.equal(vacc1(age, female, ily), vacc3(age, female, ily))\n)\n```\n:::\n\n\n<center>**Who's faster?**</center>\n<center></center>\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbench::mark(\n vacc1 = vacc1(age, female, ily),\n vacc2 = vacc2(age, female, ily),\n vacc3 = vacc3(age, female, ily)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 6\n#> expression min median `itr/sec` mem_alloc `gc/sec`\n#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>\n#> 1 vacc1 750.5µs 1.88ms 546. 7.86KB 28.8 \n#> 2 vacc2 48.4µs 72.1µs 10005. 146.68KB 16.9 \n#> 3 vacc3 30.8µs 41.5µs 15519. 11.98KB 4.06\n```\n\n\n:::\n:::\n\n\n## Resources\n\n- [Rcpp: Seamless R and C++ Integration](https:\\\\Rcpp.org)\n- [cpp-tutorial](https://www.learncpp.com) is often recommended. Lots of ads though!\n- [cpp-reference](https://en.cppreference.com/w/cpp)\n- [C++20 for Programmers](https://www.pearson.com/en-us/subject-catalog/p/c20-for-programmers-an-objects-natural-approach/P200000000211/9780137570461) is a newer book that covers modern c++ for people who know programming in another language.\n \n## Op Success!\n\n\n", + "supporting": [ + "25_files" + ], "filters": [ "rmarkdown/pagebreak.lua" ], diff --git a/extract_vids.R b/extract_vids.R @@ -0,0 +1,58 @@ +chapter_numbers <- 3:25 + +purrr::walk(chapter_numbers, \(chapter_number) { + chapter_number_wide <- stringr::str_pad(chapter_number, 2, pad = "0") + chapter <- readLines(glue::glue("slides/{chapter_number_wide}.Rmd")) + vids_start <- stringr::str_which(chapter, "^## Meeting Videos") + if (length(vids_start)) { + fs::dir_create("videos", chapter_number_wide) + txt <- paste(chapter[vids_start:length(chapter)], collapse = "\n") + vid_sections <- stringr::str_split_1(txt, "\\n###\\s*") |> + stringr::str_subset("^## Meeting Videos", negate = TRUE) + if (length(vid_sections)) { + purrr::walk(vid_sections, \(vid_section) { + section_lines <- stringr::str_split_1(vid_section, "\\n") |> + purrr::keep(~ nchar(.x) > 0) + cohort <- stringr::str_extract(section_lines[[1]], "\\d+") + cohort_wide <- stringr::str_pad(cohort, 2, pad = "0") + youtube_codes <- purrr::discard( + stringr::str_extract( + section_lines, + "https://www\\.youtube\\.com/embed/([^\"]+)", + 1 + ), + is.na + ) + youtube_embeds <- glue::glue( + "{{< video https://www.youtube.com/embed/[youtube_codes] >}}", + .open = "[", + .close = "]" + ) + chat_log <- stringr::str_subset( + section_lines[-1], + "https://www\\.youtube\\.com/embed/", + negate = TRUE + ) |> + paste(collapse = "\n") + + cat( + c( + "---", + glue::glue("title: Cohort {cohort}"), + "---", + youtube_embeds, + chat_log + ) |> + purrr::keep(~ nchar(.x) > 0), + sep = "\n", + file = fs::path( + "videos", + chapter_number_wide, + cohort_wide, + ext = "qmd" + ) + ) + }) + } + } +}) diff --git a/slides/02.qmd b/slides/02.qmd @@ -479,68 +479,3 @@ mem_used() 2. R and OS are lazy and don't reclaim/release memory until it's needed 3. R counts memory from objects, but there are gaps due to those that are deleted -> *memory fragmentation* [less memory actually available they you might think] - - -## Meeting Videos - -### Cohort 1 - -(no video recorded) - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/pCiNj2JRK50")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/-bEXdOoxO_E")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/gcVU_F-L6zY")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/aqcvKox9V0Q")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/O4Oo_qO7SIY")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:16:57 Federica Gazzelloni: cohort 2 video: https://www.youtube.com/watch?v=pCiNj2JRK50 -00:18:39 Federica Gazzelloni: cohort 2 presentation: https://r4ds.github.io/bookclub-Advanced_R/Presentations/Week02/Cohort2_America/Chapter2Slides.html#1 -00:40:24 Arthur Shaw: Just the opposite, Ryan. Very clear presentation! -00:51:54 Trevin: parquet? -00:53:00 Arthur Shaw: We may all be right. {arrow} looks to deal with feather and parquet files: https://arrow.apache.org/docs/r/ -01:00:04 Arthur Shaw: Some questions for future meetings. (1) I find Ryan's use of slides hugely effective in conveying information. Would it be OK if future sessions (optionally) used slides? If so, should/could we commit slides to some folder on the repo? (2) I think reusing the images from Hadley's books really helps understanding and discussion. Is that OK to do? Here I'm thinking about copyright concerns. (If possible, I would rather not redraw variants of Hadley's images.) -01:01:35 Federica Gazzelloni: It's all ok, you can use past presentation, you don't need to push them to the repo, you can use the images from the book -01:07:19 Federica Gazzelloni: Can I use: gc(reset = TRUE) safely? -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/kpAUoGO6elE")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:09:40 Ryan Honomichl: https://drdoane.com/three-deep-truths-about-r/ -00:12:51 Robert Hilly: Be right back -00:36:12 Ryan Honomichl: brb -00:41:18 Ron: I tried mapply and also got different answers -00:41:44 collinberke: Interesting, would like to know more what is going on. -00:49:57 Robert Hilly: simple_map <- function(x, f, ...) { - out <- vector("list", length(x)) - for (i in seq_along(x)) { - out[[i]] <- f(x[[i]], ...) - } - out -} -``` -</details> diff --git a/slides/03.Rmd b/slides/03.Rmd @@ -1949,76 +1949,3 @@ You can make a list-array by assigning dimensions to a list. You can make a matr <details><summary>Answer(s)</summary> Tibbles have an enhanced print method, never coerce strings to factors, and provide stricter subsetting methods. </details> - - - - - - - - - - - - - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/pQ-xDAPEQaw")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/CpLM6SdpTFY")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/9E4RlbW8vxU")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/LCAgxwm5Ydg")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/DrVY6DE9ymY")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/mmcnkIjANps")` - -<details> -<summary>Meeting chat log</summary> - -``` -00:10:18 Oluwafemi Oyedele: Hi, good evening -00:23:31 Federica Gazzelloni: Hi Kiante! -00:24:21 Federica Gazzelloni: Thanks Arthur -00:25:46 Trevin: Welcome Matt! -00:26:02 Matt Dupree: hello! thank you! -00:30:34 Federica Gazzelloni: Hello Matt! -00:30:46 Matt Dupree: hello! -00:38:24 Ryan Metcalf: `rlang::cpl()` = “complex”. For example `0+1i` -00:55:37 Trevin: > two <- c(1,2,3) -> names(two) <- c("one", "two") -> two - one two <NA> - 1 2 3 -00:57:25 Ryan Metcalf: Excellent Trevin. You beat me to the output! Assuming we didn't supply the string, `NA` is entered instead. -01:08:50 Ryan Metcalf: Without further research, this is the "Unix Epoch”. However, varying operating systems use different Epochs. -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/QcdByYHo1ms")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:54:07 Ron: https://www.tidyverse.org/blog/2021/03/clock-0-1-0/ -01:14:39 Robert Hilly: https://www.amazon.com/Effective-Pandas-Patterns-Manipulation-Treading/dp/B09MYXXSFM -``` -</details> diff --git a/slides/04.Rmd b/slides/04.Rmd @@ -521,63 +521,3 @@ df x1 & y1 # [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE ``` - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/eLMpCc0t1cg")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/Mhq-TX4eA64")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/CvvXkXiF3Ig")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/Hxghhpe9fYs")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/qtUgKhw39Yo")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/-WjBA6yqW0Q")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:36:02 Arthur Shaw: TIL that the subset operator has parameters. Thanks, Trevin! -00:38:55 Vaibhav Janve: its interesting that carriage "a" has two set of wheels instread of 4. I wonder that choice is because its atomic. -00:40:44 Arthur Shaw: @Vaibhav, because the load is lighter, the carriage needs fewer axles? ;) I agree: it's a confusing graphical choice. -00:41:11 Vaibhav Janve: lol -01:05:53 Vaibhav Janve: Thank you Trevin! -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/LBU-Ew_IM7A")` - -`r knitr::include_url("https://www.youtube.com/embed/W9CoQ15NlOc")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:40:18 Ryan Honomichl: What type of vector does each of the following calls to ifelse() return? - -* "ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE." -00:42:11 Ryan Honomichl: "I recommend assigning the results of an if statement only when the entire expression fits on one line; otherwise it tends to be hard to read" -00:42:46 Ryan Honomichl: * When you use the single argument form without an `else` statement, `if` invisibly returns NULL if the condition is FALSE. - -- Since functions like c() and paste() drop NULL inputs, this allows for a compact expression of certain idioms -00:54:15 collinberke: https://docs.google.com/spreadsheets/d/1ScrbEw_-vB9DruaJhjtVY8HLQmuNPqyWeOOjmG6OY1M/edit?usp=sharing -00:58:46 collinberke: https://www.youtube.com/@safe4democracy/videos -``` -</details> diff --git a/slides/05.Rmd b/slides/05.Rmd @@ -418,73 +418,3 @@ repeat { } ``` - - ---- - - - - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/96eY6YS_3hU")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/x5I_uHnMxIk")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/u6UMGWDuxDE")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/G4YOvwsSw2Q")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/AZwJjsl8xiI")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/wg2QZ3rMIqM")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:16:34 Federica Gazzelloni: https://github.com/r4ds/bookclub-Advanced_R -00:22:28 Federica Gazzelloni: https://stackoverflow.com/questions/50646133/dplyr-if-else-vs-base-r-ifelse -00:26:20 Trevin: case_when() is great, makes it easy to read -00:54:01 Trevin: out[I, ] -00:54:14 Trevin: out[i, ] -00:55:03 Trevin: I think you have to specify number of rows and columns before.. -00:55:30 Trevin: iterations = 10 - variables = 2 - - output <- matrix(ncol=variables, nrow=iterations) -00:55:43 Trevin: https://stackoverflow.com/questions/13442461/populating-a-data-frame-in-r-in-a-loop -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/W9CoQ15NlOc")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:40:18 Ryan Honomichl: What type of vector does each of the following calls to ifelse() return? - -* "ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE." -00:42:11 Ryan Honomichl: "I recommend assigning the results of an if statement only when the entire expression fits on one line; otherwise it tends to be hard to read" -00:42:46 Ryan Honomichl: * When you use the single argument form without an `else` statement, `if` invisibly returns NULL if the condition is FALSE. - -- Since functions like c() and paste() drop NULL inputs, this allows for a compact expression of certain idioms -00:54:15 collinberke: https://docs.google.com/spreadsheets/d/1ScrbEw_-vB9DruaJhjtVY8HLQmuNPqyWeOOjmG6OY1M/edit?usp=sharing -00:58:46 collinberke: https://www.youtube.com/@safe4democracy/videos -``` -</details> diff --git a/slides/06.Rmd b/slides/06.Rmd @@ -437,120 +437,3 @@ ggplot(output_full, aes(time, proportion, color = variable, group = variable)) + ylab("Prevalence") + labs(color = "Compartment", title = "SIR Model") ``` - - - - ---- - - - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/UwzGhMndWzs")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/51PMEM4Efb8")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/Vwuo-e_Ir0s")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/lg5rzOU6lsg")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/q8K0Jl5hiV0")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/BPd6-G9e32I")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:01:11 Oluwafemi Oyedele: Hi, Good evening -00:01:22 Federica Gazzelloni: Hello! -00:43:19 Federica Gazzelloni: https://r4ds.github.io/bookclub-Advanced_R/QandA/docs/welcome.html -00:52:48 Priyanka: sounds good actually -00:52:59 Federica Gazzelloni: 👍🏻 -``` -</details> - -`r knitr::include_url("https://www.youtube.com/embed/GCDXXkBQrGk")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:09:30 Oluwafemi Oyedele: Hi, Good evening -00:10:41 Federica Gazzelloni: Hi -00:14:40 Federica Gazzelloni: that's great! -00:54:24 Trevin: Also, sorry if you are repeating 🙂 -00:54:52 Arthur Shaw: @ryan, thank you so much for the awesome synthesis! Could you share your reference list? I'd love to dive more deeply into the material you presented. -00:57:02 Ryan Metcalf: https://cran.r-project.org/doc/manuals/r-release/R-lang.pdf -00:59:32 Trevin: https://github.com/COHHIO/RmData -01:01:48 Ryan Metcalf: https://mastering-shiny.org/ -01:02:02 Ryan Metcalf: https://engineering-shiny.org/ -01:02:15 Arthur Shaw: @trevin, if you get bored with beepr, move to BRRR ;) -01:02:16 Arthur Shaw: https://github.com/brooke-watson/BRRR -01:09:27 Ryan Metcalf: This is amazing Trevin! I'll take a closer look. Is it ok to reach out to you with any questions? -01:09:43 Trevin: Yeah, feel free to reach out -``` -</details> - -`r knitr::include_url("https://www.youtube.com/embed/NaiQa_u-j1k")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:05:34 Trevin: I didn't catch that -00:06:02 priyanka gagneja: i won't be presenting I said .. so you two have the stage -00:08:39 Federica Gazzelloni: no worries -00:08:46 Federica Gazzelloni: next time you do it -00:08:56 Federica Gazzelloni: did you sign up? -00:09:45 Trevin: Discord is free: https://discord.gg/rstudioconf2022 -00:10:04 Trevin: Free stream link: https://www.rstudio.com/conference/stream -00:24:32 Arthur Shaw: Maybe silly question: is the magrittr pipe an infix function? -00:32:15 Trevin: https://colinfay.me/playing-r-infix-functions/ -00:33:23 Arthur Shaw: Maybe another example of an infix function: lubridate's `%within%` -00:33:47 Trevin: That's a good one too ^ -00:33:55 priyanka gagneja: yes within would be good. -00:40:13 Arthur Shaw: no -00:49:50 Arthur Shaw: Sorry for dropping in and out. My WiFi router is having issues today--maybe is failing. -01:08:59 Trevin: Looking forward to it 🙂 -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/tz2z9l41IhU")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:31:54 Ronald Legere: https://en.wikipedia.org/wiki/First-class_function -00:42:55 Ronald Legere: https://en.wikipedia.org/wiki/Immediately_invoked_function_expression -``` -</details> - -`r knitr::include_url("https://www.youtube.com/embed/AbdcI57vbcg")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:54:02 Ron: Book gives this simple example of when you might want to use prefix form of an infix operator: lapply(list(1:3, 4:5), `+`, 3) -00:56:49 collinberke: https://colinfay.me/playing-r-infix-functions/#:~:text=What%20are%20infix%20functions%3F,%2C%20%2B%20%2C%20and%20so%20on. -01:07:36 Ron: x[3] <- 33 -01:07:51 Ron: `[<-`(x,3,value =33) -``` -</details> diff --git a/slides/07.Rmd b/slides/07.Rmd @@ -11,81 +11,3 @@ title: Environments - ADD SLIDES AS SECTIONS (`##`). - TRY TO KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF. - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/mk7iu1-P8ZU")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/syRMRYKN30k")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/fW7Di01gLhw")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/Aw_Q7PMYJkA")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/tuafimbMyKk")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/buUaaOu89EQ")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:08:41 Arthur Shaw: Hello, everyone! -00:21:31 Federica Gazzelloni: ?walk: Apply a function to each element of a list or atomic vector -00:23:15 Federica Gazzelloni: ?caller_env: Get properties of the current or caller frame -00:24:56 Trevin: purrr::walk(x, function(x, ce, ace = rlang::caller_env()) { - .ce <- rlang::caller_env() - message("Internal: ") - print(.ce) - message("Argument: ") - print(ace) - message("External: ") - print(ce) - message("Internal: ",paste0(ls(.ce), collapse = "\n")) - message("Argument: ",paste0(ls(ace), collapse = "\n")) - message("External: ",paste0(ls(ce), collapse = "\n")) -}, ce = rlang::caller_env()) -00:29:39 Federica Gazzelloni: ??iwalk: Apply a function to each element of a vector, and its index -00:35:30 Arthur Shaw: https://magrittr.tidyverse.org/reference/tee.html -00:36:05 Federica Gazzelloni: ?`%T>%` -00:46:59 Trevin: ?eval -01:06:03 Federica Gazzelloni: https://cran.r-project.org/web/packages/withr/index.html -01:09:21 Federica Gazzelloni: https://github.com/r-lib/withr -01:10:38 Trevin: I'm okay if we meet next week -01:10:53 Oluwafemi Oyedele: I am ok with next week -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/PUXrijnsWy0")` - -<details> -<summary>Meeting chat log</summary> -``` -00:06:49 Ryan Honomichl: https://r4ds.github.io/bookclub-Advanced_R/QandA/docs/environments.html -``` -</details> - -`r knitr::include_url("https://www.youtube.com/embed/6xECnY4ro48")` - -<details> -<summary>Meeting chat log</summary> -``` -00:14:44 collinberke: https://ivelasq.rbind.io/blog/macos-rig/index.html -00:21:10 collinberke: https://github.com/tidyverse/dplyr/blob/main/NAMESPACE -01:00:21 collinberke: https://r4ds.hadley.nz/iteration.html -``` -</details> diff --git a/slides/08.Rmd b/slides/08.Rmd @@ -494,59 +494,3 @@ See [the sub-section in the book](https://adv-r.hadley.nz/conditions.html#condit - [Including contextual information with error chains](https://rlang.r-lib.org/reference/topic-error-chaining.html) - [Formatting messages with cli](https://rlang.r-lib.org/reference/topic-condition-formatting.html) - [Other resources](https://github.com/rstudio-conf-2022/pkg-dev-masterclass/blob/main/materials/5-error-resources.md) from error message segment of rstudio::conf(2022) workshop "Package Development Masterclass" - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/mwiNe083DLU")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/ZFUr7YRSu2o")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/UZhrsVz6wi0")` - -`r knitr::include_url("https://www.youtube.com/embed/Wt7p71_BuYY")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/WinIo5mrUZo")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/VFs-2sl5C70")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/VwmrbPUQY1k")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:19:16 Trevin: https://style.tidyverse.org/error-messages.html -00:20:14 Trevin: More on errors in the design guide: https://design.tidyverse.org/ -01:14:27 Federica Gazzelloni: more info here: https://colinfay.me/learn-shiny-production/ -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/t1N6XdidvNo")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:34:09 Ron: Someone did: https://cran.r-project.org/web/packages/comprehenr/vignettes/Introduction.html -00:47:58 collinberke: https://purrr.tidyverse.org/reference/safely.html -00:48:24 Ron: it's a function operator ! -00:49:37 Ron: \(x) length(unique(x) is not too verbose though -00:49:39 Ron: ;) -01:06:50 collinberke: https://colinfay.me/purrr-mappers/ -01:07:45 collinberke: https://colinfay.me/purrr-web-mining/ -``` -</details> diff --git a/slides/09.Rmd b/slides/09.Rmd @@ -567,64 +567,3 @@ A <- list(1,10,"a") map_dbl(.x = A, .f = possibly(log, otherwise = NA_real_) ) ``` - - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/o0a6aJ4kCkU")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/YrZ13_4vUMw")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/DUHXo527mHs")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/SpDpmhW62Ns")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/tYqFMtmhmiI")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/HmDlvnp6uNQ")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:15:49 Matt Dupree: did anyone else lose audio? -00:15:59 Federica Gazzelloni: not me -00:16:02 Arthur Shaw: Not me either -00:16:04 Trevin: okay for me -00:16:27 Matt Dupree: gonna try rejoining -00:43:14 Matt Dupree: oh i didn't know they invisibly returned .x! That's useful! -00:48:29 Arthur Shaw: Very cool trick ! -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/t1N6XdidvNo")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:34:09 Ron: Someone did: https://cran.r-project.org/web/packages/comprehenr/vignettes/Introduction.html -00:47:58 collinberke: https://purrr.tidyverse.org/reference/safely.html -00:48:24 Ron: it's a function operator ! -00:49:37 Ron: \(x) length(unique(x) is not too verbose though -00:49:39 Ron: ;) -01:06:50 collinberke: https://colinfay.me/purrr-mappers/ -01:07:45 collinberke: https://colinfay.me/purrr-web-mining/ -``` -</details> - -`r knitr::include_url("https://www.youtube.com/embed/6gY3KZWYC00")` diff --git a/slides/10.Rmd b/slides/10.Rmd @@ -633,43 +633,3 @@ rlang::env_unbind(globalenv(), names(funs)) --> - - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/enI5Ynq6olI")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/U-CoF7MCik0")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/qgn7WTITnNs")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/GHp2W4JxVaY")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/8TGXjzi0n0o")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/FUoYwYFqT7Q")` - -<details> -<summary> Meeting chat log </summary> - -``` -01:02:25 Trevin: I'm good with combining 👍 -01:02:57 Oluwafemi Oyedele: I agree with combining the chapter!!! -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/7GLyO3IntgE")` diff --git a/slides/11.Rmd b/slides/11.Rmd @@ -202,56 +202,3 @@ walk2( ## Exercise {-} 2) Should you memoise file.download? Why or why not? - - - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/zzUY03gt_pA")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/fD1QJB2pHik")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/Re6y5CQzwG4")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/rVooJFdbePs")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/XOurCfeJLGc")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/EPs57es2MsE")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:01:42 Oluwafemi Oyedele: Hi, Good Evening !!! -00:05:52 Arthur Shaw: @federica, love the hex stickers behind you. All from rstudio::conf? -00:07:12 Arthur Shaw: I tried doing the same. I had a hard time ordering them. I also thought I'd make the stickers into magnets so that I could rearrange them in future. -00:48:34 Oluwafemi Oyedele: Thank you !!! -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/WDehjcuc7xs")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:18:21 collinberke: Jenny Bryan debugging: https://www.youtube.com/watch?v=vgYS-F8opgE -00:31:10 collinberke: https://purrr.tidyverse.org/reference/slowly.html -00:47:43 Robert Hilly: By guys! -``` -</details> diff --git a/slides/12.Rmd b/slides/12.Rmd @@ -341,53 +341,3 @@ is.numeric(factor("x")) ``` But Advanced R consistently uses numeric to mean integer or double type. - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/Fy3JF5Em6qY")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/9GkgNC15EAw")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/IL6iJhAsZAY")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/4la5adcWwKE")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/NeHtEGab1Og")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/rfidR7tI_nQ")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:35:02 Trevin: sloop (“sail the seas of OOP”) -00:42:40 Ryan Metcalf: Awesome input Trevin! I jumped to the vignette, but didn't see the reference directly. -01:00:01 Trevin: If you're interested there may be a new “R Packages" cohort starting up soon (also a new version of the book coming out soonish as well?) -01:08:23 Oluwafemi Oyedele: Thank you !!! -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/mOpmvc9h_4M")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:35:43 Stone: base::InternalMethods -00:48:04 collinberke: https://cran.r-project.org/doc/manuals/R-exts.html -``` -</details> diff --git a/slides/13.Rmd b/slides/13.Rmd @@ -375,46 +375,3 @@ There's no easy solution to this problem in base R. There is a solution in the vectors package: `vctrs::vec_restore()` <!-- TODO: read docs/vignettes to be able to summarize how this works --> - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/Fy3JF5Em6qY")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/9GkgNC15EAw")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/q7lFXSLdC1g")` - -`r knitr::include_url("https://www.youtube.com/embed/2rHS_urTGFg")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/4la5adcWwKE")` - -`r knitr::include_url("https://www.youtube.com/embed/eTCT2O58GYM")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/NeHtEGab1Og")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/vzbl2o-MEeQ")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:05:30 Oluwafemi Oyedele: Hi everyone, Good Evening !!! -00:09:44 Trevin: I agree Arthur, need to look at that package some more -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/zNLx4q8TCKQ")` diff --git a/slides/14.Rmd b/slides/14.Rmd @@ -493,63 +493,3 @@ db_a$file$path == db_b$file$path * RC is tied to R, so any bug fixes need a newer version of R. * This is especially important if you're writing packages that need to work with multiple R versions. * R6 and RC are similar, so if you need RC, it will only require a small amount of additional effort to learn RC. - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/hPjaOdprgow")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/LVkDJ28XJUE")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/rCjQTbQ22qc")` - -`r knitr::include_url("https://www.youtube.com/embed/ii6xhOzT_HQ")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/i_z6pHavhX0")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/NXmlqK2LxWw")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/EuTubeJ1VUw")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:11:34 Trevin: https://engineering-shiny.org/common-app-caveats.html?q=R6#using-r6-as-data-storage -00:39:36 Federica Gazzelloni: new R7: https://rconsortium.github.io/OOP-WG/ -00:40:04 Federica Gazzelloni: R7 designed to be a successor to S3 and S4 -00:40:40 Federica Gazzelloni: R6: https://r6.r-lib.org/articles/Introduction.html -00:52:44 Trevin: https://advanced-r-solutions.rbind.io/r6.html#controlling-access -01:00:34 Federica Gazzelloni: interesting: https://r-craft.org/r-news/object-oriented-programming-oop-in-r-with-r6-the-complete-guide/ -01:01:58 Trevin: https://hadley.shinyapps.io/cran-downloads/ -01:02:33 Oluwafemi Oyedele: Thank you !!! -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/Q4FA0BB_PeY")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:06:57 Ron Legere: https://arxiv.org/abs/2303.12712 -00:07:07 Ron Legere: ^^ shows some of the power and limitations -00:39:41 collinberke: https://www.youtube.com/watch?v=3GEFd8rZQgY&list=WL&index=11 -00:49:20 iPhone: Sorry fellas need to jump early. See you next week! -01:05:21 collinberke: https://github.com/r4ds/bookclub-advr -01:09:30 Ron Legere: makeActiveBinding -``` -</details> diff --git a/slides/15.Rmd b/slides/15.Rmd @@ -321,43 +321,3 @@ format.Period <- function(x, ...) { ``` See source code [here](https://github.com/tidyverse/lubridate/blob/0bb49b21c88736240219dc67e7ed0eb3df15d9b1/R/periods.r#L195) - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/a1jzpWiksyA")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/bzo37PHCM1I")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/WWnJ5Cl-aTE")` - -`r knitr::include_url("https://www.youtube.com/embed/_byYFTQHp1Y")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/M8Poajmj-HU")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/unNfE1fDFEY")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/q1-QUFJsbLA")` - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/puvaJtv9gQw")` - -<details> - -<summary>Meeting chat log</summary> -``` -01:09:37 Ron Legere: https://en.wikipedia.org/wiki/Composition_over_inheritance -``` -</details> diff --git a/slides/16.Rmd b/slides/16.Rmd @@ -350,48 +350,3 @@ $$ | _Create object_ | `class()` or `structure()` with `class` argument or constructor function | `new()` | Use registered method function | `$new()` | | _Additional components_ | attributes | slots | properties | | | | | | | | - -## Meeting Videos {-} - -### Cohort 1 {-} - -`r knitr::include_url("https://www.youtube.com/embed/W1uc8HbyZvI")` - -### Cohort 2 {-} - -`r knitr::include_url("https://www.youtube.com/embed/bzo37PHCM1I")` - -### Cohort 3 {-} - -`r knitr::include_url("https://www.youtube.com/embed/_byYFTQHp1Y")` - -### Cohort 4 {-} - -`r knitr::include_url("https://www.youtube.com/embed/vdKDPBcOc6Y")` - -### Cohort 5 {-} - -`r knitr::include_url("https://www.youtube.com/embed/3EvqtVYTFVM")` - -### Cohort 6 {-} - -`r knitr::include_url("https://www.youtube.com/embed/vEButxFIvLw")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:11:36 Oluwafemi Oyedele: I have not built anything with them!!! -00:16:31 Arthur Shaw: https://cran.r-project.org/web/packages/sp/index.html -00:19:05 Arthur Shaw: Apparently Hadley asked the same question we're asking several years ago: https://stackoverflow.com/questions/5437238/which-packages-make-good-use-of-s4-objects -00:19:16 Trevin: HA -00:23:54 Trevin: Your audio is breaking up Federica -01:06:58 Federica Gazzelloni: https://mastering-shiny.org/reactive-motivation.html?q=R6#event-driven -01:07:37 Federica Gazzelloni: https://engineering-shiny.org/common-app-caveats.html?q=R6#using-r6-as-data-storage -01:10:52 Oluwafemi Oyedele: Thank you !!! -``` -</details> - -### Cohort 7 {-} - -`r knitr::include_url("https://www.youtube.com/embed/2vxnzqWp-OU")` diff --git a/slides/17.Rmd b/slides/17.Rmd @@ -242,70 +242,3 @@ a <- 10000 df <- data.frame(x = 1:3) fun1(df) ``` - - - - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/10gRbFMoh7g")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/vKKDU6x3BE8")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/5RLCRFli6QI")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/9MDC12hgOWQ")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/FSm2_TJmhm0")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/Ddd_43gw8nA")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:32:31 Oluwafemi Oyedele: When should eval_tidy() be used instead of eval()? -base::eval() is sufficient for simple evaluation. Use eval_tidy() when you'd like to support expressions referring to the .data pronoun, or when you need to support quosures. -00:37:08 Trevin (he/him): https://rlang.r-lib.org/reference/topic-defuse.html -00:38:38 Federica Gazzelloni: https://rlang.r-lib.org/reference/eval_tidy.html -00:39:57 Arthur Shaw: Tidy eval book: https://bookdown.dongzhuoer.com/tidyverse/tidyeval/ -00:40:14 Arthur Shaw: Also very useful resource: https://dplyr.tidyverse.org/articles/programming.html -00:40:28 Trevin (he/him): https://ggplot2.tidyverse.org/reference/aes.html -00:40:37 Federica Gazzelloni: https://ggplot2.tidyverse.org/reference/tidyeval.html -00:41:22 Oluwafemi Oyedele: It is Tidyverse design -00:49:13 Federica Gazzelloni: https://www.youtube.com/watch?v=2NixH3QAerQ&list=PL3x6DOfs2NGi9lH7q-phZlPrl6HKXYDbn&index=15 -00:50:13 Federica Gazzelloni: Minute: 17:04 -00:54:03 Federica Gazzelloni: con <- DBI::dbConnect(RSQLite::SQLite(), filename = ":memory:") -00:54:18 Federica Gazzelloni: DBI::dbDisconnect(con) -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/MX2vNlvIUFo")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:11:09 Ryan Honomichl: https://medium.com/analytics-vidhya/become-a-better-r-programmer-with-the-awesome-lobstr-package-af97fcd22602 -00:33:03 Ryan Honomichl: https://rlang.r-lib.org/reference/enquo.html -00:37:30 Ryan Honomichl: https://rlang.r-lib.org/reference/topic-multiple-columns.html -00:41:00 Ryan Honomichl: brb -00:44:37 Ron Legere: https://www.rdocumentation.org/packages/srvyr/versions/1.2.0 -00:44:58 Ron Legere: http://gdfe.co/srvyr/ -00:51:51 Stone: https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html -``` -</details> diff --git a/slides/18.Rmd b/slides/18.Rmd @@ -803,45 +803,3 @@ typeof(exp2) length(exp1) exp1[[1]] ``` - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/2NixH3QAerQ")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/mYOUgzoRcjI")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/5RLCRFli6QI")` - -`r knitr::include_url("https://www.youtube.com/embed/F8df5PMNC8Y")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/tSVBlAP5DIY")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/Jc_R4yFsYeE")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/K8w28ee3CR8")` - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/XPs-TI4BYjk")` - -`r knitr::include_url("https://www.youtube.com/embed/8LPw_VTBsmQ")` - -<details> -<summary>Meeting chat log</summary> -``` -00:50:48 Stone: https://www.r-bloggers.com/2018/10/quasiquotation-in-r-via-bquote/ -00:58:26 iPhone: See ya next week! -``` -</details> diff --git a/slides/19.Rmd b/slides/19.Rmd @@ -370,63 +370,3 @@ vars <- data.frame(x = c("hp", "hp"), y = c("mpg", "cyl")) lm_df(vars, data = mtcars) ``` - - - - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/tbByqsRRvdE")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/IXE21pR8EJ0")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/gxSpz6IePLg")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/aniKrZrr4aU")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/klcpEb5ZBSM")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/OBodjc80y-E")` - -<details> -<summary> Meeting chat log </summary> - -``` -01:02:07 Trevin: Yeah, that was a great workshop -01:02:18 Trevin: Glad they posted the resources online -01:06:39 Trevin: Thank you! -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/8LPw_VTBsmQ")` - -<details> -<summary>Meeting chat log</summary> -``` -00:50:48 Stone: https://www.r-bloggers.com/2018/10/quasiquotation-in-r-via-bquote/ -00:58:26 iPhone: See ya next week! -``` -</details> - -`r knitr::include_url("https://www.youtube.com/embed/g77Jfl_xrXM")` - -<details> -<summary>Meeting chat log</summary> -``` -00:55:22 collinberke: https://rlang.r-lib.org/reference/embrace-operator.html?q=enquo#under-the-hood -``` -</details> diff --git a/slides/20.Rmd b/slides/20.Rmd @@ -369,50 +369,3 @@ Just remember: ## Base evaluation Check 20.6 in the book! - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/4En_Ypvtjqw")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/ewHAlVwCGtY")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/0K1vyiV8_qo")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/kfwjJDuyN8U")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/WzfD9GK6nCI")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/8FT2BA18Ghg")` - -<details> -<summary> Meeting chat log </summary> - -``` -01:00:42 Trevin: They just want to help you present that’s all -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/g77Jfl_xrXM")` - -<details> -<summary>Meeting chat log</summary> -``` -00:55:22 collinberke: https://rlang.r-lib.org/reference/embrace-operator.html?q=enquo#under-the-hood -``` -</details> - -`r knitr::include_url("https://www.youtube.com/embed/wPLrafScijE")` diff --git a/slides/21.Rmd b/slides/21.Rmd @@ -679,47 +679,3 @@ to_math(sin(pi) + f(a)) ## Finishing the Example (TO DO) - - - - - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/fixyitpXrwY")` - -`r knitr::include_url("https://www.youtube.com/embed/h3RNPyhIjas")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/pj0hTW1CtbI")` - -### Cohort 3 - -(no video) - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/0TclsXa085Y")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/v_dkrIEdmKE")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/_-uwFjO5CyM")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:30:16 Arthur Shaw: https://www.w3schools.com/html/html_entities.asp -00:32:29 Arthur Shaw: Beta symbol in HTML: Β -00:56:55 Arthur Shaw: https://dbplyr.tidyverse.org/articles/translation-function.html -00:57:48 Arthur Shaw: https://dtplyr.tidyverse.org/index.html -00:58:43 Arthur Shaw: https://dtplyr.tidyverse.org/articles/translation.html -``` -</details> diff --git a/slides/22.Rmd b/slides/22.Rmd @@ -182,54 +182,3 @@ There are other ways for a function to fail apart from throwing an error: - Jenny Bryan and Jim Hester's book: ["What They Forgot to Teach You About R"](https://rstats.wtf/debugging-r) Ch12 - Hadley's video on a [minimal reprex for a shiny app](https://www.youtube.com/watch?v=9w8ANOAlWy4) - -## Meeting Videos {-} - -### Cohort 1 {-} - -`r knitr::include_url("https://www.youtube.com/embed/ROMefwMuqXU")` - -### Cohort 2 {-} - -`r knitr::include_url("https://www.youtube.com/embed/N43p4txxxlY")` - -### Cohort 3 {-} - -`r knitr::include_url("https://www.youtube.com/embed/Jdb00nepeWQ")` - -### Cohort 4 {-} - -`r knitr::include_url("https://www.youtube.com/embed/tOql7ZD6P58")` - -### Cohort 5 {-} - -`r knitr::include_url("https://www.youtube.com/embed/EqsSWUQ6ZW0")` - -### Cohort 6 {-} - -`r knitr::include_url("https://www.youtube.com/embed/YvT-knh1baA")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:12:43 Trevin Flickinger: Hello everyone! -00:13:03 Oluwafemi Oyedele: Hello, Good evening!!! -00:22:10 Trevin Flickinger: My connection is slow so I’ll be in the chat -00:32:45 Trevin Flickinger: If you start with “continue” it should error out after the first call -00:56:18 Trevin Flickinger: Sys.frame(-1) shows it goes back one frame -00:59:55 fg: thanks -01:03:11 Arthur Shaw: Anyone else lose the presentation? -01:03:20 fg: yes -01:03:22 fg: ? -01:04:26 Trevin Flickinger: I thought that was my internet connection -01:05:07 Trevin Flickinger: Thank you! -01:08:42 Trevin Flickinger: I need to use debug( ) more as well -01:10:15 Trevin Flickinger: 21st works for me -01:10:29 Oluwafemi Oyedele: Same here!!! -``` -</details> - -### Cohort 7 {-} - -`r knitr::include_url("https://www.youtube.com/embed/T_uFW9xXoJk")` diff --git a/slides/23.Rmd b/slides/23.Rmd @@ -152,39 +152,3 @@ plot(lb) - [profvis package](https://rstudio.github.io/profvis/) - [bench package](https://cran.r-project.org/web/packages/bench/bench.pdf) - [solutions](https://advanced-r-solutions.rbind.io/measuring-performance.html) - - - - - - - -## Meeting Videos - -### Cohort 1 - -(no video) - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/_zeLDufwTwY")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/Jdb00nepeWQ")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/sCso4FAF1DE")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/pOaiDK7J7EE")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/d_pzz_AsoRQ")` - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/4hngR1c9oP4")` diff --git a/slides/24.Rmd b/slides/24.Rmd @@ -242,51 +242,3 @@ stopifnot(all.equal(t1, t3)) or the Pragmatic Programmer * Read more R code. StackOverflow, R Mailing List, DSLC, GitHub, etc. - -## Meeting Videos - -### Cohort 1 - -(no video) - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/fSdAqlkeq6I")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/yCkvUcT7wW8")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/LCaqvuv3JNg")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/pOaiDK7J7EE")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/UaXimKd3vg8")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:24:42 Arthur Shaw: I wonder if there's a task view for R Universe: https://r-universe.dev/search/ -01:01:13 Arthur Shaw: https://www.alexejgossmann.com/benchmarking_r/ -01:04:34 Trevin: I agree that the chapter is a good jumping off point. Gonna have to dig into some of the listed resources 😄 -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/rOkrHvN8Uqg")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:23:48 Ron Legere: https://www.mathworks.com/help/matlab/matlab_prog/vectorization.html -``` -</details> diff --git a/slides/25.Rmd b/slides/25.Rmd @@ -645,67 +645,3 @@ bench::mark( ## Op Success!  - - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/2JDeacWl1DM")` - -`r knitr::include_url("https://www.youtube.com/embed/sLWCelHpcqc")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/rQwOosOJpaY")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/ZWdIeR1jK9Q")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/_K8DKF3Fzes")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/nske4iqsgh0")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/hyVK08jXiYw")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:10:13 Arthur Shaw: Did things freeze for anyone else? -00:55:40 Federica Gazzelloni: https://en.cppreference.com/w/cpp/container -00:57:44 Federica Gazzelloni: https://dirk.eddelbuettel.com/blog/2011/07/14/ -01:07:33 Trevin: I don’t have experience -01:07:54 Oluwafemi Oyedele: Same here!!! -01:11:57 Arthur Shaw: Does anyone know any packages that use C++? The one that comes to mind for me is haven, which uses a C++ library -01:12:30 Trevin: When I was looking, one that stood out to me was rstan -01:13:02 Arthur Shaw: Reacted to "When I was looking, ..." with 👍 -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/Luu7JsixQgY")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:43:02 Gus Lipkin: I think I found the definition for `mean` - -An R call goes to *a which then calls the C function *b - -*a: https://github.com/wch/r-source/blob/trunk/src/library/base/R/mean.R -*b: https://github.com/wch/r-source/blob/trunk/src/library/stats/src/cov.c#L207 - -It looks like the second pass only happens if `R_FINITE(mean_from_first_pass)` which tries to call `isfinite` from C++ and if it’s not there, it’ll make sure it is a number and is not positive or negative infinity. -00:49:55 Gus Lipkin: I feel bad for dropping in on the last chapter and getting Collin’s thanks 😅 I wish I’d joined sooner. -``` -</details> diff --git a/videos/02/02.qmd b/videos/02/02.qmd @@ -0,0 +1,5 @@ +--- +title: Cohort 2 +--- + +{{< video https://www.youtube.com/embed/pCiNj2JRK50 >}} diff --git a/videos/02/03.qmd b/videos/02/03.qmd @@ -0,0 +1,5 @@ +--- +title: Cohort 3 +--- + +{{< video https://www.youtube.com/embed/-bEXdOoxO_E >}} diff --git a/videos/02/04.qmd b/videos/02/04.qmd @@ -0,0 +1,5 @@ +--- +title: Cohort 4 +--- + +{{< video https://www.youtube.com/embed/gcVU_F-L6zY >}} diff --git a/videos/02/05.qmd b/videos/02/05.qmd @@ -0,0 +1,5 @@ +--- +title: Cohort 5 +--- + +{{< video https://www.youtube.com/embed/aqcvKox9V0Q >}} diff --git a/videos/02/06.qmd b/videos/02/06.qmd @@ -0,0 +1,20 @@ +--- +title: Cohort 6 +--- + +{{< video https://www.youtube.com/embed/O4Oo_qO7SIY >}} + +<details> +<summary> Meeting chat log </summary> + +``` +00:16:57 Federica Gazzelloni: cohort 2 video: https://www.youtube.com/watch?v=pCiNj2JRK50 +00:18:39 Federica Gazzelloni: cohort 2 presentation: https://r4ds.github.io/bookclub-Advanced_R/Presentations/Week02/Cohort2_America/Chapter2Slides.html#1 +00:40:24 Arthur Shaw: Just the opposite, Ryan. Very clear presentation! +00:51:54 Trevin: parquet? +00:53:00 Arthur Shaw: We may all be right. {arrow} looks to deal with feather and parquet files: https://arrow.apache.org/docs/r/ +01:00:04 Arthur Shaw: Some questions for future meetings. (1) I find Ryan's use of slides hugely effective in conveying information. Would it be OK if future sessions (optionally) used slides? If so, should/could we commit slides to some folder on the repo? (2) I think reusing the images from Hadley's books really helps understanding and discussion. Is that OK to do? Here I'm thinking about copyright concerns. (If possible, I would rather not redraw variants of Hadley's images.) +01:01:35 Federica Gazzelloni: It's all ok, you can use past presentation, you don't need to push them to the repo, you can use the images from the book +01:07:19 Federica Gazzelloni: Can I use: gc(reset = TRUE) safely? +``` +</details> diff --git a/videos/02/07.qmd b/videos/02/07.qmd @@ -0,0 +1,24 @@ +--- +title: Cohort 7 +--- + +{{< video https://www.youtube.com/embed/kpAUoGO6elE >}} + +<details> + +<summary>Meeting chat log</summary> +``` +00:09:40 Ryan Honomichl: https://drdoane.com/three-deep-truths-about-r/ +00:12:51 Robert Hilly: Be right back +00:36:12 Ryan Honomichl: brb +00:41:18 Ron: I tried mapply and also got different answers +00:41:44 collinberke: Interesting, would like to know more what is going on. +00:49:57 Robert Hilly: simple_map <- function(x, f, ...) { + out <- vector("list", length(x)) + for (i in seq_along(x)) { + out[[i]] <- f(x[[i]], ...) + } + out +} +``` +</details> diff --git a/videos/03/01.qmd b/videos/03/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/pQ-xDAPEQaw >}} diff --git a/videos/03/02.qmd b/videos/03/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/CpLM6SdpTFY >}} diff --git a/videos/03/03.qmd b/videos/03/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/9E4RlbW8vxU >}} diff --git a/videos/03/04.qmd b/videos/03/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/LCAgxwm5Ydg >}} diff --git a/videos/03/05.qmd b/videos/03/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/DrVY6DE9ymY >}} diff --git a/videos/03/06.qmd b/videos/03/06.qmd @@ -0,0 +1,24 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/mmcnkIjANps >}} +<details> +<summary>Meeting chat log</summary> +``` +00:10:18 Oluwafemi Oyedele: Hi, good evening +00:23:31 Federica Gazzelloni: Hi Kiante! +00:24:21 Federica Gazzelloni: Thanks Arthur +00:25:46 Trevin: Welcome Matt! +00:26:02 Matt Dupree: hello! thank you! +00:30:34 Federica Gazzelloni: Hello Matt! +00:30:46 Matt Dupree: hello! +00:38:24 Ryan Metcalf: `rlang::cpl()` = “complex”. For example `0+1i` +00:55:37 Trevin: > two <- c(1,2,3) +> names(two) <- c("one", "two") +> two + one two <NA> + 1 2 3 +00:57:25 Ryan Metcalf: Excellent Trevin. You beat me to the output! Assuming we didn't supply the string, `NA` is entered instead. +01:08:50 Ryan Metcalf: Without further research, this is the "Unix Epoch”. However, varying operating systems use different Epochs. +``` +</details> diff --git a/videos/03/07.qmd b/videos/03/07.qmd @@ -0,0 +1,11 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/QcdByYHo1ms >}} +<details> +<summary>Meeting chat log</summary> +``` +00:54:07 Ron: https://www.tidyverse.org/blog/2021/03/clock-0-1-0/ +01:14:39 Robert Hilly: https://www.amazon.com/Effective-Pandas-Patterns-Manipulation-Treading/dp/B09MYXXSFM +``` +</details> diff --git a/videos/04/01.qmd b/videos/04/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/eLMpCc0t1cg >}} diff --git a/videos/04/02.qmd b/videos/04/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/Mhq-TX4eA64 >}} diff --git a/videos/04/03.qmd b/videos/04/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/CvvXkXiF3Ig >}} diff --git a/videos/04/04.qmd b/videos/04/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/Hxghhpe9fYs >}} diff --git a/videos/04/05.qmd b/videos/04/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/qtUgKhw39Yo >}} diff --git a/videos/04/06.qmd b/videos/04/06.qmd @@ -0,0 +1,14 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/-WjBA6yqW0Q >}} +<details> +<summary> Meeting chat log </summary> +``` +00:36:02 Arthur Shaw: TIL that the subset operator has parameters. Thanks, Trevin! +00:38:55 Vaibhav Janve: its interesting that carriage "a" has two set of wheels instread of 4. I wonder that choice is because its atomic. +00:40:44 Arthur Shaw: @Vaibhav, because the load is lighter, the carriage needs fewer axles? ;) I agree: it's a confusing graphical choice. +00:41:11 Vaibhav Janve: lol +01:05:53 Vaibhav Janve: Thank you Trevin! +``` +</details> diff --git a/videos/04/07.qmd b/videos/04/07.qmd @@ -0,0 +1,17 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/LBU-Ew_IM7A >}} +{{< video https://www.youtube.com/embed/W9CoQ15NlOc >}} +<details> +<summary>Meeting chat log</summary> +``` +00:40:18 Ryan Honomichl: What type of vector does each of the following calls to ifelse() return? +* "ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE." +00:42:11 Ryan Honomichl: "I recommend assigning the results of an if statement only when the entire expression fits on one line; otherwise it tends to be hard to read" +00:42:46 Ryan Honomichl: * When you use the single argument form without an `else` statement, `if` invisibly returns NULL if the condition is FALSE. +- Since functions like c() and paste() drop NULL inputs, this allows for a compact expression of certain idioms +00:54:15 collinberke: https://docs.google.com/spreadsheets/d/1ScrbEw_-vB9DruaJhjtVY8HLQmuNPqyWeOOjmG6OY1M/edit?usp=sharing +00:58:46 collinberke: https://www.youtube.com/@safe4democracy/videos +``` +</details> diff --git a/videos/05/01.qmd b/videos/05/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/96eY6YS_3hU >}} diff --git a/videos/05/02.qmd b/videos/05/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/x5I_uHnMxIk >}} diff --git a/videos/05/03.qmd b/videos/05/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/u6UMGWDuxDE >}} diff --git a/videos/05/04.qmd b/videos/05/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/G4YOvwsSw2Q >}} diff --git a/videos/05/05.qmd b/videos/05/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/AZwJjsl8xiI >}} diff --git a/videos/05/06.qmd b/videos/05/06.qmd @@ -0,0 +1,19 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/wg2QZ3rMIqM >}} +<details> +<summary> Meeting chat log </summary> +``` +00:16:34 Federica Gazzelloni: https://github.com/r4ds/bookclub-Advanced_R +00:22:28 Federica Gazzelloni: https://stackoverflow.com/questions/50646133/dplyr-if-else-vs-base-r-ifelse +00:26:20 Trevin: case_when() is great, makes it easy to read +00:54:01 Trevin: out[I, ] +00:54:14 Trevin: out[i, ] +00:55:03 Trevin: I think you have to specify number of rows and columns before.. +00:55:30 Trevin: iterations = 10 + variables = 2 + output <- matrix(ncol=variables, nrow=iterations) +00:55:43 Trevin: https://stackoverflow.com/questions/13442461/populating-a-data-frame-in-r-in-a-loop +``` +</details> diff --git a/videos/05/07.qmd b/videos/05/07.qmd @@ -0,0 +1,16 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/W9CoQ15NlOc >}} +<details> +<summary>Meeting chat log</summary> +``` +00:40:18 Ryan Honomichl: What type of vector does each of the following calls to ifelse() return? +* "ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE." +00:42:11 Ryan Honomichl: "I recommend assigning the results of an if statement only when the entire expression fits on one line; otherwise it tends to be hard to read" +00:42:46 Ryan Honomichl: * When you use the single argument form without an `else` statement, `if` invisibly returns NULL if the condition is FALSE. +- Since functions like c() and paste() drop NULL inputs, this allows for a compact expression of certain idioms +00:54:15 collinberke: https://docs.google.com/spreadsheets/d/1ScrbEw_-vB9DruaJhjtVY8HLQmuNPqyWeOOjmG6OY1M/edit?usp=sharing +00:58:46 collinberke: https://www.youtube.com/@safe4democracy/videos +``` +</details> diff --git a/videos/06/01.qmd b/videos/06/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/UwzGhMndWzs >}} diff --git a/videos/06/02.qmd b/videos/06/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/51PMEM4Efb8 >}} diff --git a/videos/06/03.qmd b/videos/06/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/Vwuo-e_Ir0s >}} diff --git a/videos/06/04.qmd b/videos/06/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/lg5rzOU6lsg >}} diff --git a/videos/06/05.qmd b/videos/06/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/q8K0Jl5hiV0 >}} diff --git a/videos/06/06.qmd b/videos/06/06.qmd @@ -0,0 +1,54 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/BPd6-G9e32I >}} +{{< video https://www.youtube.com/embed/GCDXXkBQrGk >}} +{{< video https://www.youtube.com/embed/NaiQa_u-j1k >}} +<details> +<summary> Meeting chat log </summary> +``` +00:01:11 Oluwafemi Oyedele: Hi, Good evening +00:01:22 Federica Gazzelloni: Hello! +00:43:19 Federica Gazzelloni: https://r4ds.github.io/bookclub-Advanced_R/QandA/docs/welcome.html +00:52:48 Priyanka: sounds good actually +00:52:59 Federica Gazzelloni: 👍🏻 +``` +</details> +<details> +<summary> Meeting chat log </summary> +``` +00:09:30 Oluwafemi Oyedele: Hi, Good evening +00:10:41 Federica Gazzelloni: Hi +00:14:40 Federica Gazzelloni: that's great! +00:54:24 Trevin: Also, sorry if you are repeating 🙂 +00:54:52 Arthur Shaw: @ryan, thank you so much for the awesome synthesis! Could you share your reference list? I'd love to dive more deeply into the material you presented. +00:57:02 Ryan Metcalf: https://cran.r-project.org/doc/manuals/r-release/R-lang.pdf +00:59:32 Trevin: https://github.com/COHHIO/RmData +01:01:48 Ryan Metcalf: https://mastering-shiny.org/ +01:02:02 Ryan Metcalf: https://engineering-shiny.org/ +01:02:15 Arthur Shaw: @trevin, if you get bored with beepr, move to BRRR ;) +01:02:16 Arthur Shaw: https://github.com/brooke-watson/BRRR +01:09:27 Ryan Metcalf: This is amazing Trevin! I'll take a closer look. Is it ok to reach out to you with any questions? +01:09:43 Trevin: Yeah, feel free to reach out +``` +</details> +<details> +<summary> Meeting chat log </summary> +``` +00:05:34 Trevin: I didn't catch that +00:06:02 priyanka gagneja: i won't be presenting I said .. so you two have the stage +00:08:39 Federica Gazzelloni: no worries +00:08:46 Federica Gazzelloni: next time you do it +00:08:56 Federica Gazzelloni: did you sign up? +00:09:45 Trevin: Discord is free: https://discord.gg/rstudioconf2022 +00:10:04 Trevin: Free stream link: https://www.rstudio.com/conference/stream +00:24:32 Arthur Shaw: Maybe silly question: is the magrittr pipe an infix function? +00:32:15 Trevin: https://colinfay.me/playing-r-infix-functions/ +00:33:23 Arthur Shaw: Maybe another example of an infix function: lubridate's `%within%` +00:33:47 Trevin: That's a good one too ^ +00:33:55 priyanka gagneja: yes within would be good. +00:40:13 Arthur Shaw: no +00:49:50 Arthur Shaw: Sorry for dropping in and out. My WiFi router is having issues today--maybe is failing. +01:08:59 Trevin: Looking forward to it 🙂 +``` +</details> diff --git a/videos/06/07.qmd b/videos/06/07.qmd @@ -0,0 +1,21 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/tz2z9l41IhU >}} +{{< video https://www.youtube.com/embed/AbdcI57vbcg >}} +<details> +<summary>Meeting chat log</summary> +``` +00:31:54 Ronald Legere: https://en.wikipedia.org/wiki/First-class_function +00:42:55 Ronald Legere: https://en.wikipedia.org/wiki/Immediately_invoked_function_expression +``` +</details> +<details> +<summary>Meeting chat log</summary> +``` +00:54:02 Ron: Book gives this simple example of when you might want to use prefix form of an infix operator: lapply(list(1:3, 4:5), `+`, 3) +00:56:49 collinberke: https://colinfay.me/playing-r-infix-functions/#:~:text=What%20are%20infix%20functions%3F,%2C%20%2B%20%2C%20and%20so%20on. +01:07:36 Ron: x[3] <- 33 +01:07:51 Ron: `[<-`(x,3,value =33) +``` +</details> diff --git a/videos/07/01.qmd b/videos/07/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/mk7iu1-P8ZU >}} diff --git a/videos/07/02.qmd b/videos/07/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/syRMRYKN30k >}} diff --git a/videos/07/03.qmd b/videos/07/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/fW7Di01gLhw >}} diff --git a/videos/07/04.qmd b/videos/07/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/Aw_Q7PMYJkA >}} diff --git a/videos/07/05.qmd b/videos/07/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/tuafimbMyKk >}} diff --git a/videos/07/06.qmd b/videos/07/06.qmd @@ -0,0 +1,32 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/buUaaOu89EQ >}} +<details> +<summary> Meeting chat log </summary> +``` +00:08:41 Arthur Shaw: Hello, everyone! +00:21:31 Federica Gazzelloni: ?walk: Apply a function to each element of a list or atomic vector +00:23:15 Federica Gazzelloni: ?caller_env: Get properties of the current or caller frame +00:24:56 Trevin: purrr::walk(x, function(x, ce, ace = rlang::caller_env()) { + .ce <- rlang::caller_env() + message("Internal: ") + print(.ce) + message("Argument: ") + print(ace) + message("External: ") + print(ce) + message("Internal: ",paste0(ls(.ce), collapse = "\n")) + message("Argument: ",paste0(ls(ace), collapse = "\n")) + message("External: ",paste0(ls(ce), collapse = "\n")) +}, ce = rlang::caller_env()) +00:29:39 Federica Gazzelloni: ??iwalk: Apply a function to each element of a vector, and its index +00:35:30 Arthur Shaw: https://magrittr.tidyverse.org/reference/tee.html +00:36:05 Federica Gazzelloni: ?`%T>%` +00:46:59 Trevin: ?eval +01:06:03 Federica Gazzelloni: https://cran.r-project.org/web/packages/withr/index.html +01:09:21 Federica Gazzelloni: https://github.com/r-lib/withr +01:10:38 Trevin: I'm okay if we meet next week +01:10:53 Oluwafemi Oyedele: I am ok with next week +``` +</details> diff --git a/videos/07/07.qmd b/videos/07/07.qmd @@ -0,0 +1,19 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/PUXrijnsWy0 >}} +{{< video https://www.youtube.com/embed/6xECnY4ro48 >}} +<details> +<summary>Meeting chat log</summary> +``` +00:06:49 Ryan Honomichl: https://r4ds.github.io/bookclub-Advanced_R/QandA/docs/environments.html +``` +</details> +<details> +<summary>Meeting chat log</summary> +``` +00:14:44 collinberke: https://ivelasq.rbind.io/blog/macos-rig/index.html +00:21:10 collinberke: https://github.com/tidyverse/dplyr/blob/main/NAMESPACE +01:00:21 collinberke: https://r4ds.hadley.nz/iteration.html +``` +</details> diff --git a/videos/08/01.qmd b/videos/08/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/mwiNe083DLU >}} diff --git a/videos/08/02.qmd b/videos/08/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/ZFUr7YRSu2o >}} diff --git a/videos/08/03.qmd b/videos/08/03.qmd @@ -0,0 +1,5 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/UZhrsVz6wi0 >}} +{{< video https://www.youtube.com/embed/Wt7p71_BuYY >}} diff --git a/videos/08/04.qmd b/videos/08/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/WinIo5mrUZo >}} diff --git a/videos/08/05.qmd b/videos/08/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/VFs-2sl5C70 >}} diff --git a/videos/08/06.qmd b/videos/08/06.qmd @@ -0,0 +1,12 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/VwmrbPUQY1k >}} +<details> +<summary> Meeting chat log </summary> +``` +00:19:16 Trevin: https://style.tidyverse.org/error-messages.html +00:20:14 Trevin: More on errors in the design guide: https://design.tidyverse.org/ +01:14:27 Federica Gazzelloni: more info here: https://colinfay.me/learn-shiny-production/ +``` +</details> diff --git a/videos/08/07.qmd b/videos/08/07.qmd @@ -0,0 +1,16 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/t1N6XdidvNo >}} +<details> +<summary>Meeting chat log</summary> +``` +00:34:09 Ron: Someone did: https://cran.r-project.org/web/packages/comprehenr/vignettes/Introduction.html +00:47:58 collinberke: https://purrr.tidyverse.org/reference/safely.html +00:48:24 Ron: it's a function operator ! +00:49:37 Ron: \(x) length(unique(x) is not too verbose though +00:49:39 Ron: ;) +01:06:50 collinberke: https://colinfay.me/purrr-mappers/ +01:07:45 collinberke: https://colinfay.me/purrr-web-mining/ +``` +</details> diff --git a/videos/09/01.qmd b/videos/09/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/o0a6aJ4kCkU >}} diff --git a/videos/09/02.qmd b/videos/09/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/YrZ13_4vUMw >}} diff --git a/videos/09/03.qmd b/videos/09/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/DUHXo527mHs >}} diff --git a/videos/09/04.qmd b/videos/09/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/SpDpmhW62Ns >}} diff --git a/videos/09/05.qmd b/videos/09/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/tYqFMtmhmiI >}} diff --git a/videos/09/06.qmd b/videos/09/06.qmd @@ -0,0 +1,16 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/HmDlvnp6uNQ >}} +<details> +<summary> Meeting chat log </summary> +``` +00:15:49 Matt Dupree: did anyone else lose audio? +00:15:59 Federica Gazzelloni: not me +00:16:02 Arthur Shaw: Not me either +00:16:04 Trevin: okay for me +00:16:27 Matt Dupree: gonna try rejoining +00:43:14 Matt Dupree: oh i didn't know they invisibly returned .x! That's useful! +00:48:29 Arthur Shaw: Very cool trick ! +``` +</details> diff --git a/videos/09/07.qmd b/videos/09/07.qmd @@ -0,0 +1,17 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/t1N6XdidvNo >}} +{{< video https://www.youtube.com/embed/6gY3KZWYC00 >}} +<details> +<summary>Meeting chat log</summary> +``` +00:34:09 Ron: Someone did: https://cran.r-project.org/web/packages/comprehenr/vignettes/Introduction.html +00:47:58 collinberke: https://purrr.tidyverse.org/reference/safely.html +00:48:24 Ron: it's a function operator ! +00:49:37 Ron: \(x) length(unique(x) is not too verbose though +00:49:39 Ron: ;) +01:06:50 collinberke: https://colinfay.me/purrr-mappers/ +01:07:45 collinberke: https://colinfay.me/purrr-web-mining/ +``` +</details> diff --git a/videos/10/01.qmd b/videos/10/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/enI5Ynq6olI >}} diff --git a/videos/10/02.qmd b/videos/10/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/U-CoF7MCik0 >}} diff --git a/videos/10/03.qmd b/videos/10/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/qgn7WTITnNs >}} diff --git a/videos/10/04.qmd b/videos/10/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/GHp2W4JxVaY >}} diff --git a/videos/10/05.qmd b/videos/10/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/8TGXjzi0n0o >}} diff --git a/videos/10/06.qmd b/videos/10/06.qmd @@ -0,0 +1,11 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/FUoYwYFqT7Q >}} +<details> +<summary> Meeting chat log </summary> +``` +01:02:25 Trevin: I'm good with combining 👍 +01:02:57 Oluwafemi Oyedele: I agree with combining the chapter!!! +``` +</details> diff --git a/videos/10/07.qmd b/videos/10/07.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/7GLyO3IntgE >}} diff --git a/videos/11/01.qmd b/videos/11/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/zzUY03gt_pA >}} diff --git a/videos/11/02.qmd b/videos/11/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/fD1QJB2pHik >}} diff --git a/videos/11/03.qmd b/videos/11/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/Re6y5CQzwG4 >}} diff --git a/videos/11/04.qmd b/videos/11/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/rVooJFdbePs >}} diff --git a/videos/11/05.qmd b/videos/11/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/XOurCfeJLGc >}} diff --git a/videos/11/06.qmd b/videos/11/06.qmd @@ -0,0 +1,13 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/EPs57es2MsE >}} +<details> +<summary> Meeting chat log </summary> +``` +00:01:42 Oluwafemi Oyedele: Hi, Good Evening !!! +00:05:52 Arthur Shaw: @federica, love the hex stickers behind you. All from rstudio::conf? +00:07:12 Arthur Shaw: I tried doing the same. I had a hard time ordering them. I also thought I'd make the stickers into magnets so that I could rearrange them in future. +00:48:34 Oluwafemi Oyedele: Thank you !!! +``` +</details> diff --git a/videos/11/07.qmd b/videos/11/07.qmd @@ -0,0 +1,12 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/WDehjcuc7xs >}} +<details> +<summary>Meeting chat log</summary> +``` +00:18:21 collinberke: Jenny Bryan debugging: https://www.youtube.com/watch?v=vgYS-F8opgE +00:31:10 collinberke: https://purrr.tidyverse.org/reference/slowly.html +00:47:43 Robert Hilly: By guys! +``` +</details> diff --git a/videos/12/01.qmd b/videos/12/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/Fy3JF5Em6qY >}} diff --git a/videos/12/02.qmd b/videos/12/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/9GkgNC15EAw >}} diff --git a/videos/12/03.qmd b/videos/12/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/IL6iJhAsZAY >}} diff --git a/videos/12/04.qmd b/videos/12/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/4la5adcWwKE >}} diff --git a/videos/12/05.qmd b/videos/12/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/NeHtEGab1Og >}} diff --git a/videos/12/06.qmd b/videos/12/06.qmd @@ -0,0 +1,13 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/rfidR7tI_nQ >}} +<details> +<summary> Meeting chat log </summary> +``` +00:35:02 Trevin: sloop (“sail the seas of OOP”) +00:42:40 Ryan Metcalf: Awesome input Trevin! I jumped to the vignette, but didn't see the reference directly. +01:00:01 Trevin: If you're interested there may be a new “R Packages" cohort starting up soon (also a new version of the book coming out soonish as well?) +01:08:23 Oluwafemi Oyedele: Thank you !!! +``` +</details> diff --git a/videos/12/07.qmd b/videos/12/07.qmd @@ -0,0 +1,11 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/mOpmvc9h_4M >}} +<details> +<summary>Meeting chat log</summary> +``` +00:35:43 Stone: base::InternalMethods +00:48:04 collinberke: https://cran.r-project.org/doc/manuals/R-exts.html +``` +</details> diff --git a/videos/13/01.qmd b/videos/13/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/Fy3JF5Em6qY >}} diff --git a/videos/13/02.qmd b/videos/13/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/9GkgNC15EAw >}} diff --git a/videos/13/03.qmd b/videos/13/03.qmd @@ -0,0 +1,5 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/q7lFXSLdC1g >}} +{{< video https://www.youtube.com/embed/2rHS_urTGFg >}} diff --git a/videos/13/04.qmd b/videos/13/04.qmd @@ -0,0 +1,5 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/4la5adcWwKE >}} +{{< video https://www.youtube.com/embed/eTCT2O58GYM >}} diff --git a/videos/13/05.qmd b/videos/13/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/NeHtEGab1Og >}} diff --git a/videos/13/06.qmd b/videos/13/06.qmd @@ -0,0 +1,11 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/vzbl2o-MEeQ >}} +<details> +<summary> Meeting chat log </summary> +``` +00:05:30 Oluwafemi Oyedele: Hi everyone, Good Evening !!! +00:09:44 Trevin: I agree Arthur, need to look at that package some more +``` +</details> diff --git a/videos/13/07.qmd b/videos/13/07.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/zNLx4q8TCKQ >}} diff --git a/videos/14/01.qmd b/videos/14/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/hPjaOdprgow >}} diff --git a/videos/14/02.qmd b/videos/14/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/LVkDJ28XJUE >}} diff --git a/videos/14/03.qmd b/videos/14/03.qmd @@ -0,0 +1,5 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/rCjQTbQ22qc >}} +{{< video https://www.youtube.com/embed/ii6xhOzT_HQ >}} diff --git a/videos/14/04.qmd b/videos/14/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/i_z6pHavhX0 >}} diff --git a/videos/14/05.qmd b/videos/14/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/NXmlqK2LxWw >}} diff --git a/videos/14/06.qmd b/videos/14/06.qmd @@ -0,0 +1,17 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/EuTubeJ1VUw >}} +<details> +<summary> Meeting chat log </summary> +``` +00:11:34 Trevin: https://engineering-shiny.org/common-app-caveats.html?q=R6#using-r6-as-data-storage +00:39:36 Federica Gazzelloni: new R7: https://rconsortium.github.io/OOP-WG/ +00:40:04 Federica Gazzelloni: R7 designed to be a successor to S3 and S4 +00:40:40 Federica Gazzelloni: R6: https://r6.r-lib.org/articles/Introduction.html +00:52:44 Trevin: https://advanced-r-solutions.rbind.io/r6.html#controlling-access +01:00:34 Federica Gazzelloni: interesting: https://r-craft.org/r-news/object-oriented-programming-oop-in-r-with-r6-the-complete-guide/ +01:01:58 Trevin: https://hadley.shinyapps.io/cran-downloads/ +01:02:33 Oluwafemi Oyedele: Thank you !!! +``` +</details> diff --git a/videos/14/07.qmd b/videos/14/07.qmd @@ -0,0 +1,15 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/Q4FA0BB_PeY >}} +<details> +<summary>Meeting chat log</summary> +``` +00:06:57 Ron Legere: https://arxiv.org/abs/2303.12712 +00:07:07 Ron Legere: ^^ shows some of the power and limitations +00:39:41 collinberke: https://www.youtube.com/watch?v=3GEFd8rZQgY&list=WL&index=11 +00:49:20 iPhone: Sorry fellas need to jump early. See you next week! +01:05:21 collinberke: https://github.com/r4ds/bookclub-advr +01:09:30 Ron Legere: makeActiveBinding +``` +</details> diff --git a/videos/15/01.qmd b/videos/15/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/a1jzpWiksyA >}} diff --git a/videos/15/02.qmd b/videos/15/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/bzo37PHCM1I >}} diff --git a/videos/15/03.qmd b/videos/15/03.qmd @@ -0,0 +1,5 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/WWnJ5Cl-aTE >}} +{{< video https://www.youtube.com/embed/_byYFTQHp1Y >}} diff --git a/videos/15/04.qmd b/videos/15/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/M8Poajmj-HU >}} diff --git a/videos/15/05.qmd b/videos/15/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/unNfE1fDFEY >}} diff --git a/videos/15/06.qmd b/videos/15/06.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/q1-QUFJsbLA >}} diff --git a/videos/15/07.qmd b/videos/15/07.qmd @@ -0,0 +1,10 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/puvaJtv9gQw >}} +<details> +<summary>Meeting chat log</summary> +``` +01:09:37 Ron Legere: https://en.wikipedia.org/wiki/Composition_over_inheritance +``` +</details> diff --git a/videos/16/01.qmd b/videos/16/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/W1uc8HbyZvI >}} diff --git a/videos/16/02.qmd b/videos/16/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/bzo37PHCM1I >}} diff --git a/videos/16/03.qmd b/videos/16/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/_byYFTQHp1Y >}} diff --git a/videos/16/04.qmd b/videos/16/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/vdKDPBcOc6Y >}} diff --git a/videos/16/05.qmd b/videos/16/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/3EvqtVYTFVM >}} diff --git a/videos/16/06.qmd b/videos/16/06.qmd @@ -0,0 +1,17 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/vEButxFIvLw >}} +<details> +<summary> Meeting chat log </summary> +``` +00:11:36 Oluwafemi Oyedele: I have not built anything with them!!! +00:16:31 Arthur Shaw: https://cran.r-project.org/web/packages/sp/index.html +00:19:05 Arthur Shaw: Apparently Hadley asked the same question we're asking several years ago: https://stackoverflow.com/questions/5437238/which-packages-make-good-use-of-s4-objects +00:19:16 Trevin: HA +00:23:54 Trevin: Your audio is breaking up Federica +01:06:58 Federica Gazzelloni: https://mastering-shiny.org/reactive-motivation.html?q=R6#event-driven +01:07:37 Federica Gazzelloni: https://engineering-shiny.org/common-app-caveats.html?q=R6#using-r6-as-data-storage +01:10:52 Oluwafemi Oyedele: Thank you !!! +``` +</details> diff --git a/videos/16/07.qmd b/videos/16/07.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/2vxnzqWp-OU >}} diff --git a/videos/17/01.qmd b/videos/17/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/10gRbFMoh7g >}} diff --git a/videos/17/02.qmd b/videos/17/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/vKKDU6x3BE8 >}} diff --git a/videos/17/03.qmd b/videos/17/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/5RLCRFli6QI >}} diff --git a/videos/17/04.qmd b/videos/17/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/9MDC12hgOWQ >}} diff --git a/videos/17/05.qmd b/videos/17/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/FSm2_TJmhm0 >}} diff --git a/videos/17/06.qmd b/videos/17/06.qmd @@ -0,0 +1,22 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/Ddd_43gw8nA >}} +<details> +<summary> Meeting chat log </summary> +``` +00:32:31 Oluwafemi Oyedele: When should eval_tidy() be used instead of eval()? +base::eval() is sufficient for simple evaluation. Use eval_tidy() when you'd like to support expressions referring to the .data pronoun, or when you need to support quosures. +00:37:08 Trevin (he/him): https://rlang.r-lib.org/reference/topic-defuse.html +00:38:38 Federica Gazzelloni: https://rlang.r-lib.org/reference/eval_tidy.html +00:39:57 Arthur Shaw: Tidy eval book: https://bookdown.dongzhuoer.com/tidyverse/tidyeval/ +00:40:14 Arthur Shaw: Also very useful resource: https://dplyr.tidyverse.org/articles/programming.html +00:40:28 Trevin (he/him): https://ggplot2.tidyverse.org/reference/aes.html +00:40:37 Federica Gazzelloni: https://ggplot2.tidyverse.org/reference/tidyeval.html +00:41:22 Oluwafemi Oyedele: It is Tidyverse design +00:49:13 Federica Gazzelloni: https://www.youtube.com/watch?v=2NixH3QAerQ&list=PL3x6DOfs2NGi9lH7q-phZlPrl6HKXYDbn&index=15 +00:50:13 Federica Gazzelloni: Minute: 17:04 +00:54:03 Federica Gazzelloni: con <- DBI::dbConnect(RSQLite::SQLite(), filename = ":memory:") +00:54:18 Federica Gazzelloni: DBI::dbDisconnect(con) +``` +</details> diff --git a/videos/17/07.qmd b/videos/17/07.qmd @@ -0,0 +1,16 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/MX2vNlvIUFo >}} +<details> +<summary>Meeting chat log</summary> +``` +00:11:09 Ryan Honomichl: https://medium.com/analytics-vidhya/become-a-better-r-programmer-with-the-awesome-lobstr-package-af97fcd22602 +00:33:03 Ryan Honomichl: https://rlang.r-lib.org/reference/enquo.html +00:37:30 Ryan Honomichl: https://rlang.r-lib.org/reference/topic-multiple-columns.html +00:41:00 Ryan Honomichl: brb +00:44:37 Ron Legere: https://www.rdocumentation.org/packages/srvyr/versions/1.2.0 +00:44:58 Ron Legere: http://gdfe.co/srvyr/ +00:51:51 Stone: https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html +``` +</details> diff --git a/videos/18/01.qmd b/videos/18/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/2NixH3QAerQ >}} diff --git a/videos/18/02.qmd b/videos/18/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/mYOUgzoRcjI >}} diff --git a/videos/18/03.qmd b/videos/18/03.qmd @@ -0,0 +1,5 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/5RLCRFli6QI >}} +{{< video https://www.youtube.com/embed/F8df5PMNC8Y >}} diff --git a/videos/18/04.qmd b/videos/18/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/tSVBlAP5DIY >}} diff --git a/videos/18/05.qmd b/videos/18/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/Jc_R4yFsYeE >}} diff --git a/videos/18/06.qmd b/videos/18/06.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/K8w28ee3CR8 >}} diff --git a/videos/18/07.qmd b/videos/18/07.qmd @@ -0,0 +1,12 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/XPs-TI4BYjk >}} +{{< video https://www.youtube.com/embed/8LPw_VTBsmQ >}} +<details> +<summary>Meeting chat log</summary> +``` +00:50:48 Stone: https://www.r-bloggers.com/2018/10/quasiquotation-in-r-via-bquote/ +00:58:26 iPhone: See ya next week! +``` +</details> diff --git a/videos/19/01.qmd b/videos/19/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/tbByqsRRvdE >}} diff --git a/videos/19/02.qmd b/videos/19/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/IXE21pR8EJ0 >}} diff --git a/videos/19/03.qmd b/videos/19/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/gxSpz6IePLg >}} diff --git a/videos/19/04.qmd b/videos/19/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/aniKrZrr4aU >}} diff --git a/videos/19/05.qmd b/videos/19/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/klcpEb5ZBSM >}} diff --git a/videos/19/06.qmd b/videos/19/06.qmd @@ -0,0 +1,12 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/OBodjc80y-E >}} +<details> +<summary> Meeting chat log </summary> +``` +01:02:07 Trevin: Yeah, that was a great workshop +01:02:18 Trevin: Glad they posted the resources online +01:06:39 Trevin: Thank you! +``` +</details> diff --git a/videos/19/07.qmd b/videos/19/07.qmd @@ -0,0 +1,18 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/8LPw_VTBsmQ >}} +{{< video https://www.youtube.com/embed/g77Jfl_xrXM >}} +<details> +<summary>Meeting chat log</summary> +``` +00:50:48 Stone: https://www.r-bloggers.com/2018/10/quasiquotation-in-r-via-bquote/ +00:58:26 iPhone: See ya next week! +``` +</details> +<details> +<summary>Meeting chat log</summary> +``` +00:55:22 collinberke: https://rlang.r-lib.org/reference/embrace-operator.html?q=enquo#under-the-hood +``` +</details> diff --git a/videos/20/01.qmd b/videos/20/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/4En_Ypvtjqw >}} diff --git a/videos/20/02.qmd b/videos/20/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/ewHAlVwCGtY >}} diff --git a/videos/20/03.qmd b/videos/20/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/0K1vyiV8_qo >}} diff --git a/videos/20/04.qmd b/videos/20/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/kfwjJDuyN8U >}} diff --git a/videos/20/05.qmd b/videos/20/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/WzfD9GK6nCI >}} diff --git a/videos/20/06.qmd b/videos/20/06.qmd @@ -0,0 +1,10 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/8FT2BA18Ghg >}} +<details> +<summary> Meeting chat log </summary> +``` +01:00:42 Trevin: They just want to help you present that’s all +``` +</details> diff --git a/videos/20/07.qmd b/videos/20/07.qmd @@ -0,0 +1,11 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/g77Jfl_xrXM >}} +{{< video https://www.youtube.com/embed/wPLrafScijE >}} +<details> +<summary>Meeting chat log</summary> +``` +00:55:22 collinberke: https://rlang.r-lib.org/reference/embrace-operator.html?q=enquo#under-the-hood +``` +</details> diff --git a/videos/21/01.qmd b/videos/21/01.qmd @@ -0,0 +1,5 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/fixyitpXrwY >}} +{{< video https://www.youtube.com/embed/h3RNPyhIjas >}} diff --git a/videos/21/02.qmd b/videos/21/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/pj0hTW1CtbI >}} diff --git a/videos/21/03.qmd b/videos/21/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +(no video) diff --git a/videos/21/04.qmd b/videos/21/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/0TclsXa085Y >}} diff --git a/videos/21/05.qmd b/videos/21/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/v_dkrIEdmKE >}} diff --git a/videos/21/06.qmd b/videos/21/06.qmd @@ -0,0 +1,14 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/_-uwFjO5CyM >}} +<details> +<summary> Meeting chat log </summary> +``` +00:30:16 Arthur Shaw: https://www.w3schools.com/html/html_entities.asp +00:32:29 Arthur Shaw: Beta symbol in HTML: Β +00:56:55 Arthur Shaw: https://dbplyr.tidyverse.org/articles/translation-function.html +00:57:48 Arthur Shaw: https://dtplyr.tidyverse.org/index.html +00:58:43 Arthur Shaw: https://dtplyr.tidyverse.org/articles/translation.html +``` +</details> diff --git a/videos/22/01.qmd b/videos/22/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/ROMefwMuqXU >}} diff --git a/videos/22/02.qmd b/videos/22/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/N43p4txxxlY >}} diff --git a/videos/22/03.qmd b/videos/22/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/Jdb00nepeWQ >}} diff --git a/videos/22/04.qmd b/videos/22/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/tOql7ZD6P58 >}} diff --git a/videos/22/05.qmd b/videos/22/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/EqsSWUQ6ZW0 >}} diff --git a/videos/22/06.qmd b/videos/22/06.qmd @@ -0,0 +1,23 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/YvT-knh1baA >}} +<details> +<summary> Meeting chat log </summary> +``` +00:12:43 Trevin Flickinger: Hello everyone! +00:13:03 Oluwafemi Oyedele: Hello, Good evening!!! +00:22:10 Trevin Flickinger: My connection is slow so I’ll be in the chat +00:32:45 Trevin Flickinger: If you start with “continue” it should error out after the first call +00:56:18 Trevin Flickinger: Sys.frame(-1) shows it goes back one frame +00:59:55 fg: thanks +01:03:11 Arthur Shaw: Anyone else lose the presentation? +01:03:20 fg: yes +01:03:22 fg: ? +01:04:26 Trevin Flickinger: I thought that was my internet connection +01:05:07 Trevin Flickinger: Thank you! +01:08:42 Trevin Flickinger: I need to use debug( ) more as well +01:10:15 Trevin Flickinger: 21st works for me +01:10:29 Oluwafemi Oyedele: Same here!!! +``` +</details> diff --git a/videos/22/07.qmd b/videos/22/07.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/T_uFW9xXoJk >}} diff --git a/videos/23/01.qmd b/videos/23/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +(no video) diff --git a/videos/23/02.qmd b/videos/23/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/_zeLDufwTwY >}} diff --git a/videos/23/03.qmd b/videos/23/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/Jdb00nepeWQ >}} diff --git a/videos/23/04.qmd b/videos/23/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/sCso4FAF1DE >}} diff --git a/videos/23/05.qmd b/videos/23/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/pOaiDK7J7EE >}} diff --git a/videos/23/06.qmd b/videos/23/06.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/d_pzz_AsoRQ >}} diff --git a/videos/23/07.qmd b/videos/23/07.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/4hngR1c9oP4 >}} diff --git a/videos/24/01.qmd b/videos/24/01.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 1 +--- +(no video) diff --git a/videos/24/02.qmd b/videos/24/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/fSdAqlkeq6I >}} diff --git a/videos/24/03.qmd b/videos/24/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/yCkvUcT7wW8 >}} diff --git a/videos/24/04.qmd b/videos/24/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/LCaqvuv3JNg >}} diff --git a/videos/24/05.qmd b/videos/24/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/pOaiDK7J7EE >}} diff --git a/videos/24/06.qmd b/videos/24/06.qmd @@ -0,0 +1,12 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/UaXimKd3vg8 >}} +<details> +<summary> Meeting chat log </summary> +``` +00:24:42 Arthur Shaw: I wonder if there's a task view for R Universe: https://r-universe.dev/search/ +01:01:13 Arthur Shaw: https://www.alexejgossmann.com/benchmarking_r/ +01:04:34 Trevin: I agree that the chapter is a good jumping off point. Gonna have to dig into some of the listed resources 😄 +``` +</details> diff --git a/videos/24/07.qmd b/videos/24/07.qmd @@ -0,0 +1,10 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/rOkrHvN8Uqg >}} +<details> +<summary>Meeting chat log</summary> +``` +00:23:48 Ron Legere: https://www.mathworks.com/help/matlab/matlab_prog/vectorization.html +``` +</details> diff --git a/videos/25/01.qmd b/videos/25/01.qmd @@ -0,0 +1,5 @@ +--- +title: Cohort 1 +--- +{{< video https://www.youtube.com/embed/2JDeacWl1DM >}} +{{< video https://www.youtube.com/embed/sLWCelHpcqc >}} diff --git a/videos/25/02.qmd b/videos/25/02.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 2 +--- +{{< video https://www.youtube.com/embed/rQwOosOJpaY >}} diff --git a/videos/25/03.qmd b/videos/25/03.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 3 +--- +{{< video https://www.youtube.com/embed/ZWdIeR1jK9Q >}} diff --git a/videos/25/04.qmd b/videos/25/04.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 4 +--- +{{< video https://www.youtube.com/embed/_K8DKF3Fzes >}} diff --git a/videos/25/05.qmd b/videos/25/05.qmd @@ -0,0 +1,4 @@ +--- +title: Cohort 5 +--- +{{< video https://www.youtube.com/embed/nske4iqsgh0 >}} diff --git a/videos/25/06.qmd b/videos/25/06.qmd @@ -0,0 +1,17 @@ +--- +title: Cohort 6 +--- +{{< video https://www.youtube.com/embed/hyVK08jXiYw >}} +<details> +<summary>Meeting chat log</summary> +``` +00:10:13 Arthur Shaw: Did things freeze for anyone else? +00:55:40 Federica Gazzelloni: https://en.cppreference.com/w/cpp/container +00:57:44 Federica Gazzelloni: https://dirk.eddelbuettel.com/blog/2011/07/14/ +01:07:33 Trevin: I don’t have experience +01:07:54 Oluwafemi Oyedele: Same here!!! +01:11:57 Arthur Shaw: Does anyone know any packages that use C++? The one that comes to mind for me is haven, which uses a C++ library +01:12:30 Trevin: When I was looking, one that stood out to me was rstan +01:13:02 Arthur Shaw: Reacted to "When I was looking, ..." with 👍 +``` +</details> diff --git a/videos/25/07.qmd b/videos/25/07.qmd @@ -0,0 +1,15 @@ +--- +title: Cohort 7 +--- +{{< video https://www.youtube.com/embed/Luu7JsixQgY >}} +<details> +<summary>Meeting chat log</summary> +``` +00:43:02 Gus Lipkin: I think I found the definition for `mean` +An R call goes to *a which then calls the C function *b +*a: https://github.com/wch/r-source/blob/trunk/src/library/base/R/mean.R +*b: https://github.com/wch/r-source/blob/trunk/src/library/stats/src/cov.c#L207 +It looks like the second pass only happens if `R_FINITE(mean_from_first_pass)` which tries to call `isfinite` from C++ and if it’s not there, it’ll make sure it is a number and is not positive or negative infinity. +00:49:55 Gus Lipkin: I feel bad for dropping in on the last chapter and getting Collin’s thanks 😅 I wish I’d joined sooner. +``` +</details>