Update chapter 4 Subsetting (#90) - bookclub-advr

commit 53cd24d75806f8dbda59b97aff056c7f1d9c44c2
parent 802c1533e2ed1c0736cb9aade05099ae82f83df7
Author: Jon Harmon <jonthegeek@gmail.com>
Date:   Tue,  2 Sep 2025 11:10:03 -0500

Update chapter 4 Subsetting (#90)


Diffstat:
M _freeze/slides/04/execute-results/html.json  | 14 ++++++++------
D slides/04.Rmd  | 523 -------------------------------------------------------------------------------
A slides/04.qmd  | 376 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

3 files changed, 384 insertions(+), 529 deletions(-)
diff --git a/_freeze/slides/04/execute-results/html.json b/_freeze/slides/04/execute-results/html.json
@@ -1,15 +1,17 @@
 {
-  "hash": "706ff898197dfba9ce51c9d83ac97658",
+  "hash": "b5ac0f8cfdf1cfbdbe9f5056fb42d7ee",
   "result": {
     "engine": "knitr",
-    "markdown": "---\nengine: knitr\ntitle: Subsetting\n---\n\n## Learning objectives:\n\n- Learn about the 6 ways to subset atomic vectors\n- Learn about the 3 subsetting operators: `[[`, `[`, and `$`\n- Learn how subsetting works with different vector types\n- Learn how subsetting can be combined with assignment\n\n## Selecting multiple elements\n\n### Atomic Vectors\n\n- 6 ways to subset atomic vectors\n\nLet's take a look with an example vector.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1.1, 2.2, 3.3, 4.4)\n```\n:::\n\n\n**Positive integer indices**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# return elements at specified positions which can be out of order\nx[c(4, 1)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4.4 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\n# duplicate indices return duplicate values\nx[c(2, 2)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 2.2\n```\n\n\n:::\n\n```{.r .cell-code}\n# real numbers truncate to integers\n# so this behaves as if it is x[c(3, 3)]\nx[c(3.2, 3.8)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3.3 3.3\n```\n\n\n:::\n:::\n\n\n**Negative integer indices**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n### excludes elements at specified positions\nx[-c(1, 3)] # same as x[c(-1, -3)] or x[c(2, 4)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\n### mixing positive and negative is a no-no\nx[c(-1, 3)]\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in x[c(-1, 3)]: only 0's may be mixed with negative subscripts\n```\n\n\n:::\n:::\n\n\n**Logical Vectors**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(TRUE, TRUE, FALSE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\nx[x < 3]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2\n```\n\n\n:::\n\n```{.r .cell-code}\ncond <- x > 2.5\nx[cond]\n```\n\n::: {.cell-output 
.cell-output-stdout}\n\n```\n#> [1] 3.3 4.4\n```\n\n\n:::\n:::\n\n\n- **Recyling rules** applies when the two vectors are of different lengths\n- the shorter of the two is recycled to the length of the longer\n- Easy to understand if x or y is 1, best to avoid other lengths\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(F, T)] # equivalent to: x[c(FALSE, TRUE, FALSE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 4.4\n```\n\n\n:::\n:::\n\n\n**Missing values (NA)**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Missing values in index will also return NA in output\nx[c(NA, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1]  NA 2.2  NA 4.4\n```\n\n\n:::\n:::\n\n\n**Nothing**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# returns the original vector\nx[]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 3.3 4.4\n```\n\n\n:::\n:::\n\n\n**Zero**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# returns a zero-length vector\nx[0]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> numeric(0)\n```\n\n\n:::\n:::\n\n\n**Character vectors**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# if name, you can use to return matched elements\n(y <- setNames(x, letters[1:4]))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   a   b   c   d \n#> 1.1 2.2 3.3 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\ny[c(\"d\", \"b\", \"a\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   d   b   a \n#> 4.4 2.2 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\n# Like integer indices, you can repeat indices\ny[c(\"a\", \"a\", \"a\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   a   a   a \n#> 1.1 1.1 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\n# When subsetting with [, names are always matched exactly\nz <- c(abc = 1, def = 2)\nz\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> abc def \n#>   1   2\n```\n\n\n:::\n\n```{.r .cell-code}\nz[c(\"a\", \"d\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <NA> <NA> \n#>   
NA   NA\n```\n\n\n:::\n:::\n\n\n### Lists\n\n- Subsetting works the same way\n- `[` always returns a list\n- `[[` and `$` let you pull elements out of a list\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_list <- list(a = c(T, F), b = letters[5:15], c = 100:108)\nmy_list\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1]  TRUE FALSE\n#> \n#> $b\n#>  [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n#> \n#> $c\n#> [1] 100 101 102 103 104 105 106 107 108\n```\n\n\n:::\n:::\n\n\n**Return a (named) list**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl1 <- my_list[2]\nl1\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $b\n#>  [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n```\n\n\n:::\n:::\n\n\n**Return a vector**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl2 <- my_list[[2]]\nl2\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n```\n\n\n:::\n\n```{.r .cell-code}\nl2b <- my_list$b\nl2b\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n```\n\n\n:::\n:::\n\n\n**Return a specific element**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl3 <- my_list[[2]][3]\nl3\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"g\"\n```\n\n\n:::\n\n```{.r .cell-code}\nl4 <- my_list[['b']][3]\nl4\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"g\"\n```\n\n\n:::\n\n```{.r .cell-code}\nl4b <- my_list$b[3]\nl4b\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"g\"\n```\n\n\n:::\n:::\n\n\n**Visual Representation**\n\n![](images/subsetting/hadley-tweet.png) \n\nSee this stackoverflow article for more detailed information about the differences: https://stackoverflow.com/questions/1169456/the-difference-between-bracket-and-double-bracket-for-accessing-the-el\n\n### Matrices and arrays\n\nYou can subset higher dimensional structures in three 
ways:\n\n- with multiple vectors\n- with a single vector\n- with a matrix\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- matrix(1:12, nrow = 3)\ncolnames(a) <- c(\"A\", \"B\", \"C\", \"D\")\n\n# single row\na[1, ]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  A  B  C  D \n#>  1  4  7 10\n```\n\n\n:::\n\n```{.r .cell-code}\n# single column\na[, 1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\n# single element\na[1, 1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> A \n#> 1\n```\n\n\n:::\n\n```{.r .cell-code}\n# two rows from two columns\na[1:2, 3:4]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>      C  D\n#> [1,] 7 10\n#> [2,] 8 11\n```\n\n\n:::\n\n```{.r .cell-code}\na[c(TRUE, FALSE, TRUE), c(\"B\", \"A\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>      B A\n#> [1,] 4 1\n#> [2,] 6 3\n```\n\n\n:::\n\n```{.r .cell-code}\n# zero index and negative index\na[0, -2]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>      A C D\n```\n\n\n:::\n:::\n\n\n**Subset a matrix with a matrix**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb <- matrix(1:4, nrow = 2)\nb\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>      [,1] [,2]\n#> [1,]    1    3\n#> [2,]    2    4\n```\n\n\n:::\n\n```{.r .cell-code}\na[b]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1]  7 11\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvals <- outer(1:5, 1:5, FUN = \"paste\", sep = \",\")\nvals\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>      [,1]  [,2]  [,3]  [,4]  [,5] \n#> [1,] \"1,1\" \"1,2\" \"1,3\" \"1,4\" \"1,5\"\n#> [2,] \"2,1\" \"2,2\" \"2,3\" \"2,4\" \"2,5\"\n#> [3,] \"3,1\" \"3,2\" \"3,3\" \"3,4\" \"3,5\"\n#> [4,] \"4,1\" \"4,2\" \"4,3\" \"4,4\" \"4,5\"\n#> [5,] \"5,1\" \"5,2\" \"5,3\" \"5,4\" \"5,5\"\n```\n\n\n:::\n\n```{.r .cell-code}\nselect <- matrix(ncol = 2, byrow = TRUE, \n                 c(1, 1,\n                   3, 1,\n         
          2, 4))\nselect\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>      [,1] [,2]\n#> [1,]    1    1\n#> [2,]    3    1\n#> [3,]    2    4\n```\n\n\n:::\n\n```{.r .cell-code}\nvals[select]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"1,1\" \"3,1\" \"2,4\"\n```\n\n\n:::\n:::\n\n\nMatrices and arrays are just special vectors; can subset with a single vector\n(arrays in R stored column wise)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvals[c(3, 15, 16, 17)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"3,1\" \"5,3\" \"1,4\" \"2,4\"\n```\n\n\n:::\n:::\n\n\n### Data frames and tibbles\n\nData frames act like both lists and matrices\n\n- When subsetting with a single index, they behave like lists and index the columns, so `df[1:2]` selects the first two columns.\n- When subsetting with two indices, they behave like matrices, so `df[1:3, ]` selects the first three rows (and all the columns).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(palmerpenguins)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> \n#> Attaching package: 'palmerpenguins'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> The following objects are masked from 'package:datasets':\n#> \n#>     penguins, penguins_raw\n```\n\n\n:::\n\n```{.r .cell-code}\npenguins <- penguins\n\n# single index selects first two columns\ntwo_cols <- penguins[2:3] # or penguins[c(2,3)]\nhead(two_cols)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 6 × 2\n#>   island    bill_length_mm\n#>   <fct>              <dbl>\n#> 1 Torgersen           39.1\n#> 2 Torgersen           39.5\n#> 3 Torgersen           40.3\n#> 4 Torgersen           NA  \n#> 5 Torgersen           36.7\n#> 6 Torgersen           39.3\n```\n\n\n:::\n\n```{.r .cell-code}\n# equivalent to the above code\nsame_two_cols <- penguins[c(\"island\", \"bill_length_mm\")]\nhead(same_two_cols)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 6 × 2\n#>   
island    bill_length_mm\n#>   <fct>              <dbl>\n#> 1 Torgersen           39.1\n#> 2 Torgersen           39.5\n#> 3 Torgersen           40.3\n#> 4 Torgersen           NA  \n#> 5 Torgersen           36.7\n#> 6 Torgersen           39.3\n```\n\n\n:::\n\n```{.r .cell-code}\n# two indices separated by comma (first two rows of 3rd and 4th columns)\npenguins[1:2, 3:4]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 2\n#>   bill_length_mm bill_depth_mm\n#>            <dbl>         <dbl>\n#> 1           39.1          18.7\n#> 2           39.5          17.4\n```\n\n\n:::\n\n```{.r .cell-code}\n# Can't do this...\npenguins[[3:4]][c(1:4)]\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error:\n#> ! The `j` argument of `[[.tbl_df()` can't be a vector of length 2 as of\n#>   tibble 3.0.0.\n#> ℹ Recursive subsetting is deprecated for tibbles.\n```\n\n\n:::\n\n```{.r .cell-code}\n# ...but this works...\npenguins[[3]][c(1:4)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 39.1 39.5 40.3   NA\n```\n\n\n:::\n\n```{.r .cell-code}\n# ...or this equivalent...\npenguins$bill_length_mm[1:4]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 39.1 39.5 40.3   NA\n```\n\n\n:::\n:::\n\n\nSubsetting a tibble with `[` always returns a tibble\n\n### Preserving dimensionality\n\n- Data frames and tibbles behave differently\n- tibble will default to preserve dimensionality, data frames do not\n- this can lead to unexpected behavior and code breaking in the future\n- Use `drop = FALSE` to preserve dimensionality when subsetting a data frame or use tibbles\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntb <- tibble::tibble(a = 1:2, b = 1:2)\n\n# returns tibble\nstr(tb[, \"a\"])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tibble [2 × 1] (S3: tbl_df/tbl/data.frame)\n#>  $ a: int [1:2] 1 2\n```\n\n\n:::\n\n```{.r .cell-code}\ntb[, \"a\"] # equivalent to tb[, \"a\", drop = FALSE]\n```\n\n::: {.cell-output 
.cell-output-stdout}\n\n```\n#> # A tibble: 2 × 1\n#>       a\n#>   <int>\n#> 1     1\n#> 2     2\n```\n\n\n:::\n\n```{.r .cell-code}\n# returns integer vector\n# str(tb[, \"a\", drop = TRUE])\ntb[, \"a\", drop = TRUE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(a = 1:2, b = 1:2)\n\n# returns integer vector\n# str(df[, \"a\"])\ndf[, \"a\"]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2\n```\n\n\n:::\n\n```{.r .cell-code}\n# returns data frame with one column\n# str(df[, \"a\", drop = FALSE])\ndf[, \"a\", drop = FALSE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   a\n#> 1 1\n#> 2 2\n```\n\n\n:::\n:::\n\n**Factors**\n\nFactor subsetting drop argument controls whether or not levels (rather than dimensions) are preserved.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nz <- factor(c(\"a\", \"b\", \"c\"))\nz[1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] a\n#> Levels: a b c\n```\n\n\n:::\n\n```{.r .cell-code}\nz[1, drop = TRUE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] a\n#> Levels: a\n```\n\n\n:::\n:::\n\n\n## Selecting a single element\n\n`[[` and `$` are used to extract single elements (note: a vector can be a single element)\n\n### `[[]]`\n\nBecause `[[]]` can return only a single item, you must use it with either a single positive integer or a single string. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(1:3, \"a\", 4:6)\nx[[1]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n:::\n\n\nHadley Wickham recommends using `[[]]` with atomic vectors whenever you want to extract a single value to reinforce the expectation that you are getting and setting individual values. 
\n\n### `$`\n\n- `x$y` is equivalent to `x[[\"y\"]]`\n\nthe `$` operator doesn't work with stored vals\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvar <- \"cyl\"\n\n# Doesn't work - mtcars$var translated to mtcars[[\"var\"]]\nmtcars$var\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\n# Instead use [[\nmtcars[[var]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4\n```\n\n\n:::\n:::\n\n\n`$` allows partial matching, `[[]]` does not\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(abc = 1)\nx$a\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n\n```{.r .cell-code}\nx[[\"a\"]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\nHadley advises to change Global settings:\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(warnPartialMatchDollar = TRUE)\nx$a\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in x$a: partial match of 'a' to 'abc'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n:::\n\n\ntibbles don't have this behavior\n\n\n::: {.cell}\n\n```{.r .cell-code}\npenguins$s\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: Unknown or uninitialised column: `s`.\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n### missing and out of bound indices\n\n- Due to the inconsistency of how R handles such indices, `purrr::pluck()` and `purrr::chuck()` are recommended\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(\n  a = list(1, 2, 3),\n  b = list(3, 4, 5)\n)\npurrr::pluck(x, \"a\", 1)\n# [1] 1\npurrr::pluck(x, \"c\", 1)\n# NULL\npurrr::pluck(x, \"c\", 1, .default = NA)\n# [1] NA\n```\n:::\n\n\n### `@` and `slot()`\n- `@` is `$` for S4 objects (to be revisited in Chapter 15)\n\n- `slot()` is `[[ ]]` for S4 objects\n\n## Subsetting and Assignment\n\n- Subsetting can be 
combined with assignment to edit values\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(\"Tigers\", \"Royals\", \"White Sox\", \"Twins\", \"Indians\")\n\nx[5] <- \"Guardians\"\n\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Tigers\"    \"Royals\"    \"White Sox\" \"Twins\"     \"Guardians\"\n```\n\n\n:::\n:::\n\n\n- length of the subset and assignment vector should be the same to avoid recycling\n\nYou can use NULL to remove a component\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(a = 1, b = 2)\nx[[\"b\"]] <- NULL\nstr(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 1\n#>  $ a: num 1\n```\n\n\n:::\n:::\n\n\nSubsetting with nothing can preserve structure of original object\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# mtcars[] <- lapply(mtcars, as.integer)\n# is.data.frame(mtcars)\n# [1] TRUE\n# mtcars <- lapply(mtcars, as.integer)\n#> is.data.frame(mtcars)\n# [1] FALSE\n```\n:::\n\n\n## Applications\n\nApplications copied from cohort 2 slide\n\n### Lookup tables (character subsetting)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(\"m\", \"f\", \"u\", \"f\", \"f\", \"m\", \"m\")\nlookup <- c(m = \"Male\", f = \"Female\", u = NA)\nlookup[x]\n#        m        f        u        f        f        m        m \n#   \"Male\" \"Female\"       NA \"Female\" \"Female\"   \"Male\"   \"Male\"\n```\n:::\n\n\n### Matching and merging by hand (integer subsetting)\n\n- The `match()` function allows merging a vector with a table\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngrades <- c(\"D\", \"A\", \"C\", \"B\", \"F\")\ninfo <- data.frame(\n  grade = c(\"A\", \"B\", \"C\", \"D\", \"F\"),\n  desc = c(\"Excellent\", \"Very Good\", \"Average\", \"Fair\", \"Poor\"),\n  fail = c(F, F, F, F, T)\n)\nid <- match(grades, info$grade)\nid\n# [1] 3 2 2 1 3\ninfo[id, ]\n#   grade      desc  fail\n# 4     D      Fair FALSE\n# 1     A Excellent FALSE\n# 3     C   Average FALSE\n# 2     B Very Good FALSE\n# 5     F      Poor  TRUE\n```\n:::\n\n\n### Random 
samples and bootstrapping (integer subsetting)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# mtcars[sample(nrow(mtcars), 3), ] # use replace = TRUE to replace\n#                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb\n# Lotus Europa       30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2\n# Mazda RX4          21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4\n# Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4\n```\n:::\n\n\n### Ordering (integer subsetting)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# mtcars[order(mtcars$mpg), ]\n#                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb\n# Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4\n# Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4\n# Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4\n# Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4\n# Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4\n# Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8\n# ...\n```\n:::\n\n\n### Expanding aggregated counts (integer subsetting)\n\n- We can expand a count column by using `rep()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- tibble::tibble(x = c(\"Amy\", \"Julie\", \"Brian\"), n = c(2, 1, 3))\ndf[rep(1:nrow(df), df$n), ]\n# A tibble: 6 x 2\n#   x         n\n#   <chr> <dbl>\n# 1 Amy       2\n# 2 Amy       2\n# 3 Julie     1\n# 4 Brian     3\n# 5 Brian     3\n# 6 Brian     3\n```\n:::\n\n\n###  Removing columns from data frames (character)\n\n- We can remove a column by subsetting, which does not change the object\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf[, 1]\n# A tibble: 3 x 1\n#   x    \n#   <chr>\n# 1 Amy  \n# 2 Julie\n# 3 Brian\n```\n:::\n\n\n- We can also delete the column using `NULL`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf$n <- NULL\ndf\n# A tibble: 3 x 1\n#   x    \n#   <chr>\n# 1 Amy  \n# 2 Julie\n# 3 Brian\n```\n:::\n\n\n### 
Selecting rows based on a condition (logical subsetting)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# mtcars[mtcars$gear == 5, ]\n#                 mpg cyl  disp  hp drat    wt qsec vs am gear carb\n# Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2\n# Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2\n# Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4\n# Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6\n# Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8\n```\n:::\n\n\n### Boolean algebra versus sets (logical and integer)\n\n- `which()` gives the indices of a Boolean vector\n\n\n::: {.cell}\n\n```{.r .cell-code}\n(x1 <- 1:10 %% 2 == 0) # 1-10 divisible by 2\n#  [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE\n(x2 <- which(x1))\n# [1]  2  4  6  8 10\n(y1 <- 1:10 %% 5 == 0) # 1-10 divisible by 5\n#  [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE\n(y2 <- which(y1))\n# [1]  5 10\nx1 & y1\n# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE\n```\n:::\n\n",
-    "supporting": [
-      "04_files"
-    ],
+    "markdown": "---\nengine: knitr\ntitle: Subsetting\n---\n\n## Learning objectives:\n\n- Select multiple elements from a vector with `[`\n- Learn about the 3 subsetting operators: `[[`, `[`, and `$`\n- Learn how subsetting works with different vector types\n- Learn how subsetting can be combined with assignment\n\n# Selecting multiple elements\n\n## 1. Positive integers return elements at specified positions\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1.1, 2.2, 3.3, 4.4) # decimal = original position\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 3.3 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\nx[c(4, 1)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4.4 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\nx[c(1, 1, 1)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 1.1 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\nx[c(1.9999)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1\n```\n\n\n:::\n:::\n\n\nReals *truncate* to integers.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(1.0001, 1.9999)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 1.1\n```\n\n\n:::\n:::\n\n\n## 2. Negative integers remove specified elements\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[-c(1, 3)] # same as x[c(-1, -3)] or x[c(2, 4)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 4.4\n```\n\n\n:::\n:::\n\n\n## 2b. Mixing negative and positive integers throws an error\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(-1, 3)]\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in x[c(-1, 3)]: only 0's may be mixed with negative subscripts\n```\n\n\n:::\n:::\n\n\n## 2c. 
Zeros are ignored when mixed with other integers\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(-1, 0)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 3.3 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\nx[c(-1, 0, 0, 0, 0, 0 ,0 ,0)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 3.3 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\nx[c(1, 0, 2, 0, 3, 0)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 3.3\n```\n\n\n:::\n:::\n\n\n\n## 3. Logical vectors select specified elements\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(TRUE, TRUE, FALSE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\nx[x < 3]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2\n```\n\n\n:::\n\n```{.r .cell-code}\ncond <- x > 2.5\nx[cond]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3.3 4.4\n```\n\n\n:::\n:::\n\n\n## 3b. Shorter vectors are recycled to the longer length\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[FALSE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> numeric(0)\n```\n\n\n:::\n\n```{.r .cell-code}\nx[TRUE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 3.3 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\nx[c(FALSE, TRUE)] # equivalent to: x[c(FALSE, TRUE, FALSE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 4.4\n```\n\n\n:::\n:::\n\n\n- Easy to understand if x or y has length 1; best to avoid other lengths\n\n## 3c. NA index returns NA\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(NA, TRUE, NA, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1]  NA 2.2  NA 4.4\n```\n\n\n:::\n:::\n\n## 3d. 
Extra TRUE index returns NA\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 3.3 4.4  NA  NA\n```\n\n\n:::\n\n```{.r .cell-code}\nx[1:5]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 3.3 4.4  NA\n```\n\n\n:::\n:::\n\n\n## 4. Indexing with nothing returns original vector\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 3.3 4.4\n```\n\n\n:::\n:::\n\n\n## 5. Indexing with just 0 returns 0-length vector (with class)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[0]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> numeric(0)\n```\n\n\n:::\n\n```{.r .cell-code}\nletters[0]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> character(0)\n```\n\n\n:::\n:::\n\n\n## 6. Indexing with character vector returns element of named vector\n\n\n::: {.cell}\n\n```{.r .cell-code}\n(y <- setNames(x, letters[1:4]))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   a   b   c   d \n#> 1.1 2.2 3.3 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\ny[c(\"d\", \"b\", \"a\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   d   b   a \n#> 4.4 2.2 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\ny[c(\"a\", \"a\", \"a\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   a   a   a \n#> 1.1 1.1 1.1\n```\n\n\n:::\n:::\n\n\n## 6b. 
Names must be exact for `[`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nz <- c(abc = 1, def = 2)\nz\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> abc def \n#>   1   2\n```\n\n\n:::\n\n```{.r .cell-code}\nz[c(\"a\", \"d\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <NA> <NA> \n#>   NA   NA\n```\n\n\n:::\n:::\n\n\n## Subsetting a list with `[` returns a list\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_list <- list(a = c(T, F), b = letters[5:15], c = 100:108)\nmy_list\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1]  TRUE FALSE\n#> \n#> $b\n#>  [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n#> \n#> $c\n#> [1] 100 101 102 103 104 105 106 107 108\n```\n\n\n:::\n\n```{.r .cell-code}\nmy_list[c(\"a\", \"b\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1]  TRUE FALSE\n#> \n#> $b\n#>  [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n```\n\n\n:::\n:::\n\n\n## Lists use same rules for `[`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_list[2:3]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $b\n#>  [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n#> \n#> $c\n#> [1] 100 101 102 103 104 105 106 107 108\n```\n\n\n:::\n\n```{.r .cell-code}\nmy_list[c(TRUE, FALSE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1]  TRUE FALSE\n#> \n#> $c\n#> [1] 100 101 102 103 104 105 106 107 108\n```\n\n\n:::\n:::\n\n\n## Matrices & arrays take multidimensional indices\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- matrix(1:9, nrow = 3)\na\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>      [,1] [,2] [,3]\n#> [1,]    1    4    7\n#> [2,]    2    5    8\n#> [3,]    3    6    9\n```\n\n\n:::\n\n```{.r .cell-code}\na[1:2, 2:3] # rows, columns\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>      [,1] [,2]\n#> [1,]    4    7\n#> [2,]    5    8\n```\n\n\n:::\n:::\n\n\n## Matrices & arrays can accept character, logical, 
etc\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncolnames(a) <- c(\"A\", \"B\", \"C\")\na[c(TRUE, TRUE, FALSE), c(\"B\", \"A\")] # same as a[1:2, 2:1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>      B A\n#> [1,] 4 1\n#> [2,] 5 2\n```\n\n\n:::\n:::\n\n\n## Matrices & arrays are also vectors\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvals <- outer(1:5, 1:5, FUN = \"paste\", sep = \",\") # All chr combos of 1:5\nvals\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>      [,1]  [,2]  [,3]  [,4]  [,5] \n#> [1,] \"1,1\" \"1,2\" \"1,3\" \"1,4\" \"1,5\"\n#> [2,] \"2,1\" \"2,2\" \"2,3\" \"2,4\" \"2,5\"\n#> [3,] \"3,1\" \"3,2\" \"3,3\" \"3,4\" \"3,5\"\n#> [4,] \"4,1\" \"4,2\" \"4,3\" \"4,4\" \"4,5\"\n#> [5,] \"5,1\" \"5,2\" \"5,3\" \"5,4\" \"5,5\"\n```\n\n\n:::\n\n```{.r .cell-code}\nvals[c(4, 15)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"4,1\" \"5,3\"\n```\n\n\n:::\n\n```{.r .cell-code}\na[a > 5]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 6 7 8 9\n```\n\n\n:::\n:::\n\n\n## Data frames subset list-like with single index\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])\ndf[1:2]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   x y\n#> 1 1 3\n#> 2 2 2\n#> 3 3 1\n```\n\n\n:::\n\n```{.r .cell-code}\ndf[c(\"x\", \"z\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   x z\n#> 1 1 a\n#> 2 2 b\n#> 3 3 c\n```\n\n\n:::\n:::\n\n\n## Data frames subset matrix-like with multiple indices\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf[1:2, c(\"x\", \"z\")] # rows, columns\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   x z\n#> 1 1 a\n#> 2 2 b\n```\n\n\n:::\n\n```{.r .cell-code}\ndf[df$x == 2, ] # matching rows, all columns\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   x y z\n#> 2 2 2 b\n```\n\n\n:::\n\n```{.r .cell-code}\ndf[, c(\"x\", \"z\")] # same as df[c(\"x\", \"z\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   x z\n#> 1 1 a\n#> 2 2 b\n#> 3 3 
c\n```\n\n\n:::\n:::\n\n\n## Subsetting a tibble with `[` returns a tibble\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntbl <- tibble::as_tibble(df)\ndf[, 1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\ndf[, 1, drop = FALSE] # Prevent errors\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   x\n#> 1 1\n#> 2 2\n#> 3 3\n```\n\n\n:::\n\n```{.r .cell-code}\ntbl[, 1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 1\n#>       x\n#>   <int>\n#> 1     1\n#> 2     2\n#> 3     3\n```\n\n\n:::\n:::\n\n\n# Selecting a single element\n\n## `[[` selects a single element\n\n:::: {.columns}\n\n::: {.column}\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(1:3, \"a\", 4:6)\nx[1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(x[1])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"list\"\n```\n\n\n:::\n\n```{.r .cell-code}\nx[[1]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(x[[1]])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nx[[1]][[1]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n:::\n\n:::\n\n::: {.column}\n\n![](images/subsetting/hadley-tweet.png)\n:::\n\n::::\n\n## `$` is shorthand for `[[..., exact = FALSE]]`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(abc = 1)\nx$abc\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n\n```{.r .cell-code}\nx$a\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n\n```{.r .cell-code}\nx[[\"a\"]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\nx[[\"a\", exact = FALSE]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n\n```{.r 
.cell-code}\noptions(warnPartialMatchDollar = TRUE)\nx$a\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in x$a: partial match of 'a' to 'abc'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n:::\n\n\n## Behavior for missing-ish indices is inconsistent\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- c(a = 1L, b = 2L)\nlst <- list(a = 1:2)\n\n# Errors:\n# a[[NULL]]\n# lst[[NULL]]\n# a[[5]]\n# lst[[5]]\n# a[[\"c\"]]\n# a[[NA]]\n\nlst[[\"c\"]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\nlst[[NA]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n## `purrr::pluck()` and `purrr::chuck()` provide consistent wrappers\n\n- `purrr::pluck()` always returns `NULL` or `.default` for (non-`NULL`) missing\n- `purrr::chuck()` always throws error\n\n\n::: {.cell}\n\n```{.r .cell-code}\npurrr::pluck(a, 5)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\npurrr::pluck(a, \"c\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\npurrr::pluck(lst, 5)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\npurrr::pluck(lst, \"c\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n## S4 has two additional subsetting operators\n\n- `@` equivalent to `$` (but error if bad)\n- `slot()` equivalent to `[[`\n\nMore in Chapter 15\n\n# Subsetting and assignment\n\n## Can assign to position with `[`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 1:5\nx[1:2] <- c(101, 102)\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 101 102   3   4   5\n```\n\n\n:::\n\n```{.r .cell-code}\nx[1:3] <- 1:2\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 1 4 5\n```\n\n\n:::\n:::\n\n\n## Remove list component with `NULL`\n\n\n::: {.cell}\n\n```{.r 
.cell-code}\nx <- list(a = 1, b = 2)\nx[[\"b\"]] <- NULL\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] 1\n```\n\n\n:::\n:::\n\n\n## Use `list(NULL)` to add `NULL`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(a = 1, b = 2)\nx[[\"b\"]] <- list(NULL)\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] 1\n#> \n#> $b\n#> $b[[1]]\n#> NULL\n```\n\n\n:::\n:::\n\n\n## Subset with nothing to retain shape\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(a = 1:3, b = 1:3)\ndf[] <- \"a\"\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   a b\n#> 1 a a\n#> 2 a a\n#> 3 a a\n```\n\n\n:::\n\n```{.r .cell-code}\ndf <- \"a\"\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"a\"\n```\n\n\n:::\n:::\n\n\n# Applications\n\n## Use a lookup vector and recycling rules to translate values\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(\"b\", \"g\", \"x\", \"g\", \"g\", \"b\")\nlookup <- c(b = \"blue\", g = \"green\", x = NA)\nlookup[x]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>       b       g       x       g       g       b \n#>  \"blue\" \"green\"      NA \"green\" \"green\"  \"blue\"\n```\n\n\n:::\n\n```{.r .cell-code}\nunname(lookup[x])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"blue\"  \"green\" NA      \"green\" \"green\" \"blue\"\n```\n\n\n:::\n:::\n\n\n## Use a lookup table to generate rows of data\n\n\n::: {.cell}\n\n```{.r .cell-code}\ninfo <- data.frame(\n  code = c(\"b\", \"g\", \"x\"),\n  color = c(\"blue\", \"green\", NA),\n  other_thing = 3:1\n)\nmatch(x, info$code) # Indices of info$code in x\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3 2 2 1\n```\n\n\n:::\n\n```{.r .cell-code}\ninfo[match(x, info$code), ]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>     code color other_thing\n#> 1      b  blue           3\n#> 2      g green           2\n#> 3      x  <NA>           1\n#> 2.1    g green           2\n#> 2.2    g green 
          2\n#> 1.1    b  blue           3\n```\n\n\n:::\n:::\n\n\n## Sort with `order()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(\"b\", \"c\", \"a\")\norder(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3 1 2\n```\n\n\n:::\n\n```{.r .cell-code}\nx[order(x)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"a\" \"b\" \"c\"\n```\n\n\n:::\n\n```{.r .cell-code}\ndf <- data.frame(b = 3:1, a = 1:3)\ndf[order(df$b), ]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   b a\n#> 3 1 3\n#> 2 2 2\n#> 1 3 1\n```\n\n\n:::\n\n```{.r .cell-code}\ndf[, order(names(df))]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   a b\n#> 1 1 3\n#> 2 2 2\n#> 3 3 1\n```\n\n\n:::\n:::\n\n\n## Expand counts\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(x = c(2, 4, 1), y = c(9, 11, 6), n = c(3, 5, 1))\nrep(1:nrow(df), df$n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 1 1 2 2 2 2 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\ndf[rep(1:nrow(df), df$n), ]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>     x  y n\n#> 1   2  9 3\n#> 1.1 2  9 3\n#> 1.2 2  9 3\n#> 2   4 11 5\n#> 2.1 4 11 5\n#> 2.2 4 11 5\n#> 2.3 4 11 5\n#> 2.4 4 11 5\n#> 3   1  6 1\n```\n\n\n:::\n:::\n\n\n## Ran out of time to make slides for\n\nIdeally a future cohort should expand these:\n\n- Remove df columns with `setdiff()`\n- Logically subset rows `df[df$col > 5, ]`\n- The next slide about `which()`\n\n## Boolean algebra versus sets (logical and integer)\n\n- `which()` gives the indices of a Boolean vector\n\n\n::: {.cell}\n\n```{.r .cell-code}\n(x1 <- 1:10 %% 2 == 0) # 1-10 divisible by 2\n#  [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE\n(x2 <- which(x1))\n# [1]  2  4  6  8 10\n(y1 <- 1:10 %% 5 == 0) # 1-10 divisible by 5\n#  [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE\n(y2 <- which(y1))\n# [1]  5 10\nx1 & y1\n# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE\n```\n:::\n\n",
+    "supporting": [],
     "filters": [
       "rmarkdown/pagebreak.lua"
     ],
-    "includes": {},
+    "includes": {
+      "include-after-body": [
+        "\n<script>\n  // htmlwidgets need to know to resize themselves when slides are shown/hidden.\n  // Fire the \"slideenter\" event (handled by htmlwidgets.js) when the current\n  // slide changes (different for each slide format).\n  (function () {\n    // dispatch for htmlwidgets\n    function fireSlideEnter() {\n      const event = window.document.createEvent(\"Event\");\n      event.initEvent(\"slideenter\", true, true);\n      window.document.dispatchEvent(event);\n    }\n\n    function fireSlideChanged(previousSlide, currentSlide) {\n      fireSlideEnter();\n\n      // dispatch for shiny\n      if (window.jQuery) {\n        if (previousSlide) {\n          window.jQuery(previousSlide).trigger(\"hidden\");\n        }\n        if (currentSlide) {\n          window.jQuery(currentSlide).trigger(\"shown\");\n        }\n      }\n    }\n\n    // hookup for slidy\n    if (window.w3c_slidy) {\n      window.w3c_slidy.add_observer(function (slide_num) {\n        // slide_num starts at position 1\n        fireSlideChanged(null, w3c_slidy.slides[slide_num - 1]);\n      });\n    }\n\n  })();\n</script>\n\n"
+      ]
+    },
     "engineDependencies": {},
     "preserve": {},
     "postProcess": true
diff --git a/slides/04.Rmd b/slides/04.Rmd
@@ -1,523 +0,0 @@
----
-engine: knitr
-title: Subsetting
----
-
-## Learning objectives:
-
-- Learn about the 6 ways to subset atomic vectors
-- Learn about the 3 subsetting operators: `[[`, `[`, and `$`
-- Learn how subsetting works with different vector types
-- Learn how subsetting can be combined with assignment
-
-## Selecting multiple elements
-
-### Atomic Vectors
-
-- 6 ways to subset atomic vectors
-
-Let's take a look with an example vector.
-
-```{r atomic_vector}
-x <- c(1.1, 2.2, 3.3, 4.4)
-```
-
-**Positive integer indices**
-
-```{r positive_int}
-# return elements at specified positions which can be out of order
-x[c(4, 1)]
-
-# duplicate indices return duplicate values
-x[c(2, 2)]
-
-# real numbers truncate to integers
-# so this behaves as if it is x[c(3, 3)]
-x[c(3.2, 3.8)]
-```
-
-**Negative integer indices**
-
-```{r, error=TRUE}
-### excludes elements at specified positions
-x[-c(1, 3)] # same as x[c(-1, -3)] or x[c(2, 4)]
-
-### mixing positive and negative is a no-no
-x[c(-1, 3)]
-```
-
-**Logical Vectors**
-
-```{r logical_vec}
-x[c(TRUE, TRUE, FALSE, TRUE)]
-
-x[x < 3]
-
-cond <- x > 2.5
-x[cond]
-```
-
-- **Recycling rules** apply when the two vectors are of different lengths
-- the shorter of the two is recycled to the length of the longer
-- Easy to understand if x or y is 1, best to avoid other lengths
-
-```{r}
-x[c(F, T)] # equivalent to: x[c(FALSE, TRUE, FALSE, TRUE)]
-```
-
-**Missing values (NA)**
-
-```{r missing}
-# Missing values in index will also return NA in output
-x[c(NA, TRUE)]
-```
-
-**Nothing**
-
-```{r nothing}
-# returns the original vector
-x[]
-```
-
-**Zero**
-
-```{r zero}
-# returns a zero-length vector
-x[0]
-```
-
-**Character vectors**
-
-```{r character}
-# if name, you can use to return matched elements
-(y <- setNames(x, letters[1:4]))
-
-y[c("d", "b", "a")]
-
-# Like integer indices, you can repeat indices
-y[c("a", "a", "a")]
-
-# When subsetting with [, names are always matched exactly
-z <- c(abc = 1, def = 2)
-z
-z[c("a", "d")]
-```
-
-### Lists
-
-- Subsetting works the same way
-- `[` always returns a list
-- `[[` and `$` let you pull elements out of a list
-
-```{r}
-my_list <- list(a = c(T, F), b = letters[5:15], c = 100:108)
-my_list
-```
-
-**Return a (named) list**
-
-```{r}
-l1 <- my_list[2]
-l1
-```
-
-**Return a vector**
-
-```{r}
-l2 <- my_list[[2]]
-l2
-l2b <- my_list$b
-l2b
-```
-
-**Return a specific element**
-
-```{r}
-l3 <- my_list[[2]][3]
-l3
-l4 <- my_list[['b']][3]
-l4
-l4b <- my_list$b[3]
-l4b
-```
-
-**Visual Representation**
-
-![](images/subsetting/hadley-tweet.png) 
-
-See this stackoverflow article for more detailed information about the differences: https://stackoverflow.com/questions/1169456/the-difference-between-bracket-and-double-bracket-for-accessing-the-el
-
-### Matrices and arrays
-
-You can subset higher dimensional structures in three ways:
-
-- with multiple vectors
-- with a single vector
-- with a matrix
-
-```{r}
-a <- matrix(1:12, nrow = 3)
-colnames(a) <- c("A", "B", "C", "D")
-
-# single row
-a[1, ]
-
-# single column
-a[, 1]
-
-# single element
-a[1, 1]
-
-# two rows from two columns
-a[1:2, 3:4]
-
-a[c(TRUE, FALSE, TRUE), c("B", "A")]
-
-# zero index and negative index
-a[0, -2]
-```
-
-**Subset a matrix with a matrix**
-
-```{r}
-b <- matrix(1:4, nrow = 2)
-b
-a[b]
-```
-
-```{r}
-vals <- outer(1:5, 1:5, FUN = "paste", sep = ",")
-vals
-
-select <- matrix(ncol = 2, byrow = TRUE, 
-                 c(1, 1,
-                   3, 1,
-                   2, 4))
-select
-
-vals[select]
-```
-
-Matrices and arrays are just special vectors; can subset with a single vector
-(arrays in R stored column wise)
-
-```{r}
-vals[c(3, 15, 16, 17)]
-```
-
-### Data frames and tibbles
-
-Data frames act like both lists and matrices
-
-- When subsetting with a single index, they behave like lists and index the columns, so `df[1:2]` selects the first two columns.
-- When subsetting with two indices, they behave like matrices, so `df[1:3, ]` selects the first three rows (and all the columns).
-
-```{r penguins, error=TRUE}
-library(palmerpenguins)
-penguins <- penguins
-
-# single index selects first two columns
-two_cols <- penguins[2:3] # or penguins[c(2,3)]
-head(two_cols)
-
-# equivalent to the above code
-same_two_cols <- penguins[c("island", "bill_length_mm")]
-head(same_two_cols)
-
-# two indices separated by comma (first two rows of 3rd and 4th columns)
-penguins[1:2, 3:4]
-
-# Can't do this...
-penguins[[3:4]][c(1:4)]
-# ...but this works...
-penguins[[3]][c(1:4)]
-# ...or this equivalent...
-penguins$bill_length_mm[1:4]
-```
-
-Subsetting a tibble with `[` always returns a tibble
-
-### Preserving dimensionality
-
-- Data frames and tibbles behave differently
-- tibble will default to preserve dimensionality, data frames do not
-- this can lead to unexpected behavior and code breaking in the future
-- Use `drop = FALSE` to preserve dimensionality when subsetting a data frame or use tibbles
-
-
-```{r}
-tb <- tibble::tibble(a = 1:2, b = 1:2)
-
-# returns tibble
-str(tb[, "a"])
-tb[, "a"] # equivalent to tb[, "a", drop = FALSE]
-
-# returns integer vector
-# str(tb[, "a", drop = TRUE])
-tb[, "a", drop = TRUE]
-```
-
-```{r}
-df <- data.frame(a = 1:2, b = 1:2)
-
-# returns integer vector
-# str(df[, "a"])
-df[, "a"]
-
-# returns data frame with one column
-# str(df[, "a", drop = FALSE])
-df[, "a", drop = FALSE]
-```
-**Factors**
-
-Factor subsetting drop argument controls whether or not levels (rather than dimensions) are preserved.
-
-```{r}
-z <- factor(c("a", "b", "c"))
-z[1]
-z[1, drop = TRUE]
-```
-
-## Selecting a single element
-
-`[[` and `$` are used to extract single elements (note: a vector can be a single element)
-
-### `[[]]`
-
-Because `[[]]` can return only a single item, you must use it with either a single positive integer or a single string. 
-
-```{r train}
-x <- list(1:3, "a", 4:6)
-x[[1]]
-```
-
-Hadley Wickham recommends using `[[]]` with atomic vectors whenever you want to extract a single value to reinforce the expectation that you are getting and setting individual values. 
-
-### `$`
-
-- `x$y` is equivalent to `x[["y"]]`
-
-the `$` operator doesn't work with stored vals
-
-```{r}
-var <- "cyl"
-
-# Doesn't work - mtcars$var translated to mtcars[["var"]]
-mtcars$var
-
-# Instead use [[
-mtcars[[var]]
-```
-
-`$` allows partial matching, `[[]]` does not
-
-```{r}
-x <- list(abc = 1)
-x$a
-
-x[["a"]]
-
-```
-
-Hadley advises to change Global settings:
-
-```{r}
-options(warnPartialMatchDollar = TRUE)
-x$a
-```
-
-tibbles don't have this behavior
-
-```{r}
-penguins$s
-```
-
-### missing and out of bound indices
-
-- Due to the inconsistency of how R handles such indices, `purrr::pluck()` and `purrr::chuck()` are recommended
-
-```{r, eval=FALSE}
-x <- list(
-  a = list(1, 2, 3),
-  b = list(3, 4, 5)
-)
-purrr::pluck(x, "a", 1)
-# [1] 1
-purrr::pluck(x, "c", 1)
-# NULL
-purrr::pluck(x, "c", 1, .default = NA)
-# [1] NA
-```
-
-### `@` and `slot()`
-- `@` is `$` for S4 objects (to be revisited in Chapter 15)
-
-- `slot()` is `[[ ]]` for S4 objects
-
-## Subsetting and Assignment
-
-- Subsetting can be combined with assignment to edit values
-
-```{r}
-x <- c("Tigers", "Royals", "White Sox", "Twins", "Indians")
-
-x[5] <- "Guardians"
-
-x
-```
-
-- length of the subset and assignment vector should be the same to avoid recycling
-
-You can use NULL to remove a component
-
-```{r}
-x <- list(a = 1, b = 2)
-x[["b"]] <- NULL
-str(x)
-```
-
-Subsetting with nothing can preserve structure of original object
-
-```{r, eval=FALSE}
-# mtcars[] <- lapply(mtcars, as.integer)
-# is.data.frame(mtcars)
-# [1] TRUE
-# mtcars <- lapply(mtcars, as.integer)
-#> is.data.frame(mtcars)
-# [1] FALSE
-```
-
-## Applications
-
-Applications copied from cohort 2 slide
-
-### Lookup tables (character subsetting)
-
-```{r, eval=FALSE}
-x <- c("m", "f", "u", "f", "f", "m", "m")
-lookup <- c(m = "Male", f = "Female", u = NA)
-lookup[x]
-#        m        f        u        f        f        m        m 
-#   "Male" "Female"       NA "Female" "Female"   "Male"   "Male"
-```
-
-### Matching and merging by hand (integer subsetting)
-
-- The `match()` function allows merging a vector with a table
-
-```{r, eval=FALSE}
-grades <- c("D", "A", "C", "B", "F")
-info <- data.frame(
-  grade = c("A", "B", "C", "D", "F"),
-  desc = c("Excellent", "Very Good", "Average", "Fair", "Poor"),
-  fail = c(F, F, F, F, T)
-)
-id <- match(grades, info$grade)
-id
-# [1] 3 2 2 1 3
-info[id, ]
-#   grade      desc  fail
-# 4     D      Fair FALSE
-# 1     A Excellent FALSE
-# 3     C   Average FALSE
-# 2     B Very Good FALSE
-# 5     F      Poor  TRUE
-```
-
-### Random samples and bootstrapping (integer subsetting)
-
-```{r, eval=FALSE}
-# mtcars[sample(nrow(mtcars), 3), ] # use replace = TRUE to replace
-#                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
-# Lotus Europa       30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
-# Mazda RX4          21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
-# Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
-```
-
-### Ordering (integer subsetting)
-
-```{r, eval=FALSE}
-# mtcars[order(mtcars$mpg), ]
-#                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
-# Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
-# Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
-# Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
-# Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
-# Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
-# Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
-# ...
-```
-
-### Expanding aggregated counts (integer subsetting)
-
-- We can expand a count column by using `rep()`
-
-```{r, eval=FALSE}
-df <- tibble::tibble(x = c("Amy", "Julie", "Brian"), n = c(2, 1, 3))
-df[rep(1:nrow(df), df$n), ]
-# A tibble: 6 x 2
-#   x         n
-#   <chr> <dbl>
-# 1 Amy       2
-# 2 Amy       2
-# 3 Julie     1
-# 4 Brian     3
-# 5 Brian     3
-# 6 Brian     3
-```
-
-###  Removing columns from data frames (character)
-
-- We can remove a column by subsetting, which does not change the object
-
-```{r, eval=FALSE}
-df[, 1]
-# A tibble: 3 x 1
-#   x    
-#   <chr>
-# 1 Amy  
-# 2 Julie
-# 3 Brian
-```
-
-- We can also delete the column using `NULL`
-
-```{r, eval=FALSE}
-df$n <- NULL
-df
-# A tibble: 3 x 1
-#   x    
-#   <chr>
-# 1 Amy  
-# 2 Julie
-# 3 Brian
-```
-
-### Selecting rows based on a condition (logical subsetting)
-
-```{r, eval=FALSE}
-# mtcars[mtcars$gear == 5, ]
-#                 mpg cyl  disp  hp drat    wt qsec vs am gear carb
-# Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
-# Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
-# Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
-# Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
-# Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
-```
-
-### Boolean algebra versus sets (logical and integer)
-
-- `which()` gives the indices of a Boolean vector
-
-```{r, eval=FALSE}
-(x1 <- 1:10 %% 2 == 0) # 1-10 divisible by 2
-#  [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
-(x2 <- which(x1))
-# [1]  2  4  6  8 10
-(y1 <- 1:10 %% 5 == 0) # 1-10 divisible by 5
-#  [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
-(y2 <- which(y1))
-# [1]  5 10
-x1 & y1
-# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
-```
diff --git a/slides/04.qmd b/slides/04.qmd
@@ -0,0 +1,376 @@
+---
+engine: knitr
+title: Subsetting
+---
+
+## Learning objectives:
+
+- Select multiple elements from a vector with `[`
+- Learn about the 3 subsetting operators: `[[`, `[`, and `$`
+- Learn how subsetting works with different vector types
+- Learn how subsetting can be combined with assignment
+
+# Selecting multiple elements
+
+## 1. Positive integers return elements at specified positions
+
+```{r}
+#| label: positive_int
+x <- c(1.1, 2.2, 3.3, 4.4) # decimal = original position
+x
+x[c(4, 1)]
+x[c(1, 1, 1)]
+x[c(1.9999)]
+```
+
+Reals *truncate* to integers.
+
+```{r}
+#| label: positive_real
+x[c(1.0001, 1.9999)]
+```
+
+## 2. Negative integers remove specified elements
+
+```{r}
+#| label: negative_int
+x[-c(1, 3)] # same as x[c(-1, -3)] or x[c(2, 4)]
+```
+
+## 2b. Mixing negative and positive integers throws an error
+
+```{r}
+#| label: mixed_int
+#| error: true
+x[c(-1, 3)]
+```
+
+## 2c. Zeros ignored with other ints 
+
+```{r}
+#| label: negative_int_zero
+x[c(-1, 0)]
+x[c(-1, 0, 0, 0, 0, 0 ,0 ,0)]
+x[c(1, 0, 2, 0, 3, 0)]
+```
+
+
+## 3. Logical vectors select specified elements
+
+```{r}
+#| label: logical_vec
+x[c(TRUE, TRUE, FALSE, TRUE)]
+x[x < 3]
+
+cond <- x > 2.5
+x[cond]
+```
+
+## 3b. Shorter logical vectors are recycled to the longer length
+
+```{r}
+#| label: recycling
+x[FALSE]
+x[TRUE]
+x[c(FALSE, TRUE)] # equivalent to: x[c(FALSE, TRUE, FALSE, TRUE)]
+```
+
+- Easy to understand if the index has length 1; best to avoid other lengths
+
+## 3c. NA index returns NA
+
+```{r}
+#| label: missing_index
+x[c(NA, TRUE, NA, TRUE)]
+```
+## 3d. Extra TRUE index returns NA
+
+```{r}
+#| label: extra_index
+x[c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)]
+x[1:5]
+```
+
+## 4. Indexing with nothing returns original vector
+
+```{r nothing}
+x[]
+```
+
+## 5. Indexing with just 0 returns 0-length vector (with class)
+
+```{r zero}
+x[0]
+letters[0]
+```
+
+## 6. Indexing with character vector returns element of named vector
+
+```{r character}
+(y <- setNames(x, letters[1:4]))
+y[c("d", "b", "a")]
+y[c("a", "a", "a")]
+```
+
+## 6b. Names must be exact for `[`
+
+```{r}
+#| label: exact_names
+z <- c(abc = 1, def = 2)
+z
+z[c("a", "d")]
+```
+
+## Subsetting a list with `[` returns a list
+
+```{r}
+#| label: list_subset_basics
+my_list <- list(a = c(T, F), b = letters[5:15], c = 100:108)
+my_list
+my_list[c("a", "b")]
+```
+
+## Lists use same rules for `[`
+
+```{r} 
+#| label: list_subset_multiple
+my_list[2:3]
+my_list[c(TRUE, FALSE, TRUE)]
+```
+
+## Matrices & arrays take multidimensional indices
+
+```{r}
+#| label: array_subset
+a <- matrix(1:9, nrow = 3)
+a
+a[1:2, 2:3] # rows, columns
+```
+
+## Matrices & arrays can accept character, logical, etc
+
+```{r}
+#| label: array_named
+colnames(a) <- c("A", "B", "C")
+a[c(TRUE, TRUE, FALSE), c("B", "A")] # a[1:2, 2:1]
+```
+
+## Matrices & arrays are also vectors
+
+```{r}
+#| label: array_vector
+vals <- outer(1:5, 1:5, FUN = "paste", sep = ",") # All chr combos of 1:5
+vals
+vals[c(4, 15)]
+a[a > 5]
+```
+
+## Data frames subset list-like with single index
+
+```{r}
+#| label: df_subset1
+df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])
+df[1:2]
+df[c("x", "z")]
+```
+
+## Data frames subset matrix-like with multiple indices
+
+```{r}
+df[1:2, c("x", "z")] # rows, columns
+df[df$x == 2, ] # matching rows, all columns
+df[, c("x", "z")] # same columns as df[c("x", "z")]
+```
+
+## Subsetting a tibble with `[` returns a tibble
+
+```{r}
+tbl <- tibble::as_tibble(df)
+df[, 1]
+df[, 1, drop = FALSE] # Prevent errors
+tbl[, 1]
+```
+
+# Selecting a single element
+
+## `[[` selects a single element
+
+:::: {.columns}
+
+::: {.column}
+```{r}
+x <- list(1:3, "a", 4:6)
+x[1]
+class(x[1])
+x[[1]]
+class(x[[1]])
+x[[1]][[1]]
+```
+:::
+
+::: {.column}
+
+![](images/subsetting/hadley-tweet.png)
+:::
+
+::::
+
+## `$` is shorthand for `[[..., exact = FALSE]]`
+
+```{r}
+#| label: dollar_subset
+#| warning: true
+x <- list(abc = 1)
+x$abc
+x$a
+x[["a"]]
+x[["a", exact = FALSE]]
+
+options(warnPartialMatchDollar = TRUE)
+x$a
+```
+
+## Behavior for missing-ish indices is inconsistent
+
+```{r}
+#| label: missingish_indices
+#| error: true
+a <- c(a = 1L, b = 2L)
+lst <- list(a = 1:2)
+
+# Errors:
+# a[[NULL]]
+# lst[[NULL]]
+# a[[5]]
+# lst[[5]]
+# a[["c"]]
+# a[[NA]]
+
+lst[["c"]]
+lst[[NA]]
+```
+
+## `purrr::pluck()` and `purrr::chuck()` provide consistent wrappers
+
+- `purrr::pluck()` always returns `NULL` or `.default` for (non-`NULL`) missing elements
+- `purrr::chuck()` always throws an error
+
+```{r}
+purrr::pluck(a, 5)
+purrr::pluck(a, "c")
+purrr::pluck(lst, 5)
+purrr::pluck(lst, "c")
+```
+
+## S4 has two additional subsetting operators
+
+- `@` is equivalent to `$` (but throws an error if the slot does not exist)
+- `slot()` equivalent to `[[`
+
+More in Chapter 15
+
+# Subsetting and assignment
+
+## Can assign to position with `[`
+
+```{r}
+x <- 1:5
+x[1:2] <- c(101, 102)
+x
+x[1:3] <- 1:2
+x
+```
+
+## Remove list component with `NULL`
+
+```{r}
+x <- list(a = 1, b = 2)
+x[["b"]] <- NULL
+x
+```
+
+## Use `list(NULL)` to add `NULL`
+
+```{r}
+x <- list(a = 1, b = 2)
+x[["b"]] <- list(NULL)
+x
+```
+
+## Subset with nothing to retain shape
+
+```{r}
+df <- data.frame(a = 1:3, b = 1:3)
+df[] <- "a"
+df
+df <- "a"
+df
+```
+
+# Applications
+
+## Use a lookup vector and recycling rules to translate values
+
+```{r}
+x <- c("b", "g", "x", "g", "g", "b")
+lookup <- c(b = "blue", g = "green", x = NA)
+lookup[x]
+unname(lookup[x])
+```
+
+## Use a lookup table to generate rows of data
+
+```{r}
+info <- data.frame(
+  code = c("b", "g", "x"),
+  color = c("blue", "green", NA),
+  other_thing = 3:1
+)
+match(x, info$code) # Position in info$code of each element of x
+info[match(x, info$code), ]
+```
+
+## Sort with `order()`
+
+```{r}
+x <- c("b", "c", "a")
+order(x)
+x[order(x)]
+
+df <- data.frame(b = 3:1, a = 1:3)
+df[order(df$b), ]
+df[, order(names(df))]
+```
+
+## Expand counts
+
+```{r}
+df <- data.frame(x = c(2, 4, 1), y = c(9, 11, 6), n = c(3, 5, 1))
+rep(1:nrow(df), df$n)
+df[rep(1:nrow(df), df$n), ]
+```
+
+## Ran out of time to make slides for
+
+Ideally a future cohort should expand these:
+
+- Remove df columns with `setdiff()`
+- Logically subset rows `df[df$col > 5, ]`
+- The next slide about `which()`
+
+## Boolean algebra versus sets (logical and integer)
+
+- `which()` gives the indices of a Boolean vector
+
+```{r, eval=FALSE}
+(x1 <- 1:10 %% 2 == 0) # 1-10 divisible by 2
+#  [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
+(x2 <- which(x1))
+# [1]  2  4  6  8 10
+(y1 <- 1:10 %% 5 == 0) # 1-10 divisible by 5
+#  [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
+(y2 <- which(y1))
+# [1]  5 10
+x1 & y1
+# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
+```
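
The "Ran out of time to make slides for" slide lists three topics without code. The following is a minimal base R sketch of what those slides might cover; it is not part of the commit. The data frame `df_demo` and its columns are hypothetical and chosen only for illustration.

```r
# Hypothetical data frame, not taken from the slides
df_demo <- data.frame(id = 1:4, col = c(3, 8, NA, 6), note = letters[1:4])

# Remove columns by name: keep every name except the ones listed
kept <- df_demo[setdiff(names(df_demo), "note")]

# Logical row subsetting: an NA in the condition yields an all-NA row
rows_logical <- df_demo[df_demo$col > 5, ]

# which() drops both FALSE and NA, so only the real matches remain
rows_which <- df_demo[which(df_demo$col > 5), ]

# Boolean algebra on logical vectors versus set operations on positions
x1 <- 1:10 %% 2 == 0 # 1-10 divisible by 2
y1 <- 1:10 %% 5 == 0 # 1-10 divisible by 5
x2 <- which(x1)
y2 <- which(y1)
same_and <- identical(which(x1 & y1), intersect(x2, y2))
same_or <- identical(which(x1 | y1), sort(union(x2, y2)))
```

The difference between `rows_logical` and `rows_which` is the practical reason to reach for `which()`: a missing value in the condition is carried through by logical subsetting but silently dropped by `which()`.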

	bookclub-advr DSLC Advanced R Book Club
	git clone https://git.eamoncaddigan.net/bookclub-advr.git