commit 53cd24d75806f8dbda59b97aff056c7f1d9c44c2
parent 802c1533e2ed1c0736cb9aade05099ae82f83df7
Author: Jon Harmon <jonthegeek@gmail.com>
Date: Tue, 2 Sep 2025 11:10:03 -0500
Update chapter 4 Subsetting (#90)
Diffstat:
3 files changed, 384 insertions(+), 529 deletions(-)
diff --git a/_freeze/slides/04/execute-results/html.json b/_freeze/slides/04/execute-results/html.json
@@ -1,15 +1,17 @@
{
- "hash": "706ff898197dfba9ce51c9d83ac97658",
+ "hash": "b5ac0f8cfdf1cfbdbe9f5056fb42d7ee",
"result": {
"engine": "knitr",
- "markdown": "---\nengine: knitr\ntitle: Subsetting\n---\n\n## Learning objectives:\n\n- Learn about the 6 ways to subset atomic vectors\n- Learn about the 3 subsetting operators: `[[`, `[`, and `$`\n- Learn how subsetting works with different vector types\n- Learn how subsetting can be combined with assignment\n\n## Selecting multiple elements\n\n### Atomic Vectors\n\n- 6 ways to subset atomic vectors\n\nLet's take a look with an example vector.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1.1, 2.2, 3.3, 4.4)\n```\n:::\n\n\n**Positive integer indices**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# return elements at specified positions which can be out of order\nx[c(4, 1)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4.4 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\n# duplicate indices return duplicate values\nx[c(2, 2)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 2.2\n```\n\n\n:::\n\n```{.r .cell-code}\n# real numbers truncate to integers\n# so this behaves as if it is x[c(3, 3)]\nx[c(3.2, 3.8)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3.3 3.3\n```\n\n\n:::\n:::\n\n\n**Negative integer indices**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n### excludes elements at specified positions\nx[-c(1, 3)] # same as x[c(-1, -3)] or x[c(2, 4)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\n### mixing positive and negative is a no-no\nx[c(-1, 3)]\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in x[c(-1, 3)]: only 0's may be mixed with negative subscripts\n```\n\n\n:::\n:::\n\n\n**Logical Vectors**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(TRUE, TRUE, FALSE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\nx[x < 3]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2\n```\n\n\n:::\n\n```{.r .cell-code}\ncond <- x > 2.5\nx[cond]\n```\n\n::: {.cell-output 
.cell-output-stdout}\n\n```\n#> [1] 3.3 4.4\n```\n\n\n:::\n:::\n\n\n- **Recyling rules** applies when the two vectors are of different lengths\n- the shorter of the two is recycled to the length of the longer\n- Easy to understand if x or y is 1, best to avoid other lengths\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(F, T)] # equivalent to: x[c(FALSE, TRUE, FALSE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 4.4\n```\n\n\n:::\n:::\n\n\n**Missing values (NA)**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Missing values in index will also return NA in output\nx[c(NA, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] NA 2.2 NA 4.4\n```\n\n\n:::\n:::\n\n\n**Nothing**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# returns the original vector\nx[]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 3.3 4.4\n```\n\n\n:::\n:::\n\n\n**Zero**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# returns a zero-length vector\nx[0]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> numeric(0)\n```\n\n\n:::\n:::\n\n\n**Character vectors**\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# if name, you can use to return matched elements\n(y <- setNames(x, letters[1:4]))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c d \n#> 1.1 2.2 3.3 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\ny[c(\"d\", \"b\", \"a\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> d b a \n#> 4.4 2.2 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\n# Like integer indices, you can repeat indices\ny[c(\"a\", \"a\", \"a\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a a a \n#> 1.1 1.1 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\n# When subsetting with [, names are always matched exactly\nz <- c(abc = 1, def = 2)\nz\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> abc def \n#> 1 2\n```\n\n\n:::\n\n```{.r .cell-code}\nz[c(\"a\", \"d\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <NA> <NA> \n#> NA 
NA\n```\n\n\n:::\n:::\n\n\n### Lists\n\n- Subsetting works the same way\n- `[` always returns a list\n- `[[` and `$` let you pull elements out of a list\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_list <- list(a = c(T, F), b = letters[5:15], c = 100:108)\nmy_list\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] TRUE FALSE\n#> \n#> $b\n#> [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n#> \n#> $c\n#> [1] 100 101 102 103 104 105 106 107 108\n```\n\n\n:::\n:::\n\n\n**Return a (named) list**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl1 <- my_list[2]\nl1\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $b\n#> [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n```\n\n\n:::\n:::\n\n\n**Return a vector**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl2 <- my_list[[2]]\nl2\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n```\n\n\n:::\n\n```{.r .cell-code}\nl2b <- my_list$b\nl2b\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n```\n\n\n:::\n:::\n\n\n**Return a specific element**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl3 <- my_list[[2]][3]\nl3\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"g\"\n```\n\n\n:::\n\n```{.r .cell-code}\nl4 <- my_list[['b']][3]\nl4\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"g\"\n```\n\n\n:::\n\n```{.r .cell-code}\nl4b <- my_list$b[3]\nl4b\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"g\"\n```\n\n\n:::\n:::\n\n\n**Visual Representation**\n\n \n\nSee this stackoverflow article for more detailed information about the differences: https://stackoverflow.com/questions/1169456/the-difference-between-bracket-and-double-bracket-for-accessing-the-el\n\n### Matrices and arrays\n\nYou can subset higher dimensional structures in three ways:\n\n- with multiple vectors\n- with a single 
vector\n- with a matrix\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- matrix(1:12, nrow = 3)\ncolnames(a) <- c(\"A\", \"B\", \"C\", \"D\")\n\n# single row\na[1, ]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> A B C D \n#> 1 4 7 10\n```\n\n\n:::\n\n```{.r .cell-code}\n# single column\na[, 1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\n# single element\na[1, 1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> A \n#> 1\n```\n\n\n:::\n\n```{.r .cell-code}\n# two rows from two columns\na[1:2, 3:4]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> C D\n#> [1,] 7 10\n#> [2,] 8 11\n```\n\n\n:::\n\n```{.r .cell-code}\na[c(TRUE, FALSE, TRUE), c(\"B\", \"A\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> B A\n#> [1,] 4 1\n#> [2,] 6 3\n```\n\n\n:::\n\n```{.r .cell-code}\n# zero index and negative index\na[0, -2]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> A C D\n```\n\n\n:::\n:::\n\n\n**Subset a matrix with a matrix**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nb <- matrix(1:4, nrow = 2)\nb\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2]\n#> [1,] 1 3\n#> [2,] 2 4\n```\n\n\n:::\n\n```{.r .cell-code}\na[b]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 7 11\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvals <- outer(1:5, 1:5, FUN = \"paste\", sep = \",\")\nvals\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2] [,3] [,4] [,5] \n#> [1,] \"1,1\" \"1,2\" \"1,3\" \"1,4\" \"1,5\"\n#> [2,] \"2,1\" \"2,2\" \"2,3\" \"2,4\" \"2,5\"\n#> [3,] \"3,1\" \"3,2\" \"3,3\" \"3,4\" \"3,5\"\n#> [4,] \"4,1\" \"4,2\" \"4,3\" \"4,4\" \"4,5\"\n#> [5,] \"5,1\" \"5,2\" \"5,3\" \"5,4\" \"5,5\"\n```\n\n\n:::\n\n```{.r .cell-code}\nselect <- matrix(ncol = 2, byrow = TRUE, \n c(1, 1,\n 3, 1,\n 2, 4))\nselect\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2]\n#> [1,] 1 1\n#> [2,] 3 1\n#> [3,] 2 
4\n```\n\n\n:::\n\n```{.r .cell-code}\nvals[select]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"1,1\" \"3,1\" \"2,4\"\n```\n\n\n:::\n:::\n\n\nMatrices and arrays are just special vectors; can subset with a single vector\n(arrays in R stored column wise)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvals[c(3, 15, 16, 17)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"3,1\" \"5,3\" \"1,4\" \"2,4\"\n```\n\n\n:::\n:::\n\n\n### Data frames and tibbles\n\nData frames act like both lists and matrices\n\n- When subsetting with a single index, they behave like lists and index the columns, so `df[1:2]` selects the first two columns.\n- When subsetting with two indices, they behave like matrices, so `df[1:3, ]` selects the first three rows (and all the columns).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(palmerpenguins)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> \n#> Attaching package: 'palmerpenguins'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> The following objects are masked from 'package:datasets':\n#> \n#> penguins, penguins_raw\n```\n\n\n:::\n\n```{.r .cell-code}\npenguins <- penguins\n\n# single index selects first two columns\ntwo_cols <- penguins[2:3] # or penguins[c(2,3)]\nhead(two_cols)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 6 × 2\n#> island bill_length_mm\n#> <fct> <dbl>\n#> 1 Torgersen 39.1\n#> 2 Torgersen 39.5\n#> 3 Torgersen 40.3\n#> 4 Torgersen NA \n#> 5 Torgersen 36.7\n#> 6 Torgersen 39.3\n```\n\n\n:::\n\n```{.r .cell-code}\n# equivalent to the above code\nsame_two_cols <- penguins[c(\"island\", \"bill_length_mm\")]\nhead(same_two_cols)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 6 × 2\n#> island bill_length_mm\n#> <fct> <dbl>\n#> 1 Torgersen 39.1\n#> 2 Torgersen 39.5\n#> 3 Torgersen 40.3\n#> 4 Torgersen NA \n#> 5 Torgersen 36.7\n#> 6 Torgersen 39.3\n```\n\n\n:::\n\n```{.r .cell-code}\n# two indices separated by comma (first two 
rows of 3rd and 4th columns)\npenguins[1:2, 3:4]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 2\n#> bill_length_mm bill_depth_mm\n#> <dbl> <dbl>\n#> 1 39.1 18.7\n#> 2 39.5 17.4\n```\n\n\n:::\n\n```{.r .cell-code}\n# Can't do this...\npenguins[[3:4]][c(1:4)]\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error:\n#> ! The `j` argument of `[[.tbl_df()` can't be a vector of length 2 as of\n#> tibble 3.0.0.\n#> ℹ Recursive subsetting is deprecated for tibbles.\n```\n\n\n:::\n\n```{.r .cell-code}\n# ...but this works...\npenguins[[3]][c(1:4)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 39.1 39.5 40.3 NA\n```\n\n\n:::\n\n```{.r .cell-code}\n# ...or this equivalent...\npenguins$bill_length_mm[1:4]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 39.1 39.5 40.3 NA\n```\n\n\n:::\n:::\n\n\nSubsetting a tibble with `[` always returns a tibble\n\n### Preserving dimensionality\n\n- Data frames and tibbles behave differently\n- tibble will default to preserve dimensionality, data frames do not\n- this can lead to unexpected behavior and code breaking in the future\n- Use `drop = FALSE` to preserve dimensionality when subsetting a data frame or use tibbles\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntb <- tibble::tibble(a = 1:2, b = 1:2)\n\n# returns tibble\nstr(tb[, \"a\"])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tibble [2 × 1] (S3: tbl_df/tbl/data.frame)\n#> $ a: int [1:2] 1 2\n```\n\n\n:::\n\n```{.r .cell-code}\ntb[, \"a\"] # equivalent to tb[, \"a\", drop = FALSE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 1\n#> a\n#> <int>\n#> 1 1\n#> 2 2\n```\n\n\n:::\n\n```{.r .cell-code}\n# returns integer vector\n# str(tb[, \"a\", drop = TRUE])\ntb[, \"a\", drop = TRUE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(a = 1:2, b = 1:2)\n\n# returns integer vector\n# str(df[, 
\"a\"])\ndf[, \"a\"]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2\n```\n\n\n:::\n\n```{.r .cell-code}\n# returns data frame with one column\n# str(df[, \"a\", drop = FALSE])\ndf[, \"a\", drop = FALSE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a\n#> 1 1\n#> 2 2\n```\n\n\n:::\n:::\n\n**Factors**\n\nFactor subsetting drop argument controls whether or not levels (rather than dimensions) are preserved.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nz <- factor(c(\"a\", \"b\", \"c\"))\nz[1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] a\n#> Levels: a b c\n```\n\n\n:::\n\n```{.r .cell-code}\nz[1, drop = TRUE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] a\n#> Levels: a\n```\n\n\n:::\n:::\n\n\n## Selecting a single element\n\n`[[` and `$` are used to extract single elements (note: a vector can be a single element)\n\n### `[[]]`\n\nBecause `[[]]` can return only a single item, you must use it with either a single positive integer or a single string. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(1:3, \"a\", 4:6)\nx[[1]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n:::\n\n\nHadley Wickham recommends using `[[]]` with atomic vectors whenever you want to extract a single value to reinforce the expectation that you are getting and setting individual values. 
\n\n### `$`\n\n- `x$y` is equivalent to `x[[\"y\"]]`\n\nthe `$` operator doesn't work with stored vals\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvar <- \"cyl\"\n\n# Doesn't work - mtcars$var translated to mtcars[[\"var\"]]\nmtcars$var\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\n# Instead use [[\nmtcars[[var]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4\n```\n\n\n:::\n:::\n\n\n`$` allows partial matching, `[[]]` does not\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(abc = 1)\nx$a\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n\n```{.r .cell-code}\nx[[\"a\"]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\nHadley advises to change Global settings:\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(warnPartialMatchDollar = TRUE)\nx$a\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in x$a: partial match of 'a' to 'abc'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n:::\n\n\ntibbles don't have this behavior\n\n\n::: {.cell}\n\n```{.r .cell-code}\npenguins$s\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: Unknown or uninitialised column: `s`.\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n### missing and out of bound indices\n\n- Due to the inconsistency of how R handles such indices, `purrr::pluck()` and `purrr::chuck()` are recommended\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(\n a = list(1, 2, 3),\n b = list(3, 4, 5)\n)\npurrr::pluck(x, \"a\", 1)\n# [1] 1\npurrr::pluck(x, \"c\", 1)\n# NULL\npurrr::pluck(x, \"c\", 1, .default = NA)\n# [1] NA\n```\n:::\n\n\n### `@` and `slot()`\n- `@` is `$` for S4 objects (to be revisited in Chapter 15)\n\n- `slot()` is `[[ ]]` for S4 objects\n\n## Subsetting and Assignment\n\n- Subsetting can be 
combined with assignment to edit values\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(\"Tigers\", \"Royals\", \"White Sox\", \"Twins\", \"Indians\")\n\nx[5] <- \"Guardians\"\n\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Tigers\" \"Royals\" \"White Sox\" \"Twins\" \"Guardians\"\n```\n\n\n:::\n:::\n\n\n- length of the subset and assignment vector should be the same to avoid recycling\n\nYou can use NULL to remove a component\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(a = 1, b = 2)\nx[[\"b\"]] <- NULL\nstr(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 1\n#> $ a: num 1\n```\n\n\n:::\n:::\n\n\nSubsetting with nothing can preserve structure of original object\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# mtcars[] <- lapply(mtcars, as.integer)\n# is.data.frame(mtcars)\n# [1] TRUE\n# mtcars <- lapply(mtcars, as.integer)\n#> is.data.frame(mtcars)\n# [1] FALSE\n```\n:::\n\n\n## Applications\n\nApplications copied from cohort 2 slide\n\n### Lookup tables (character subsetting)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(\"m\", \"f\", \"u\", \"f\", \"f\", \"m\", \"m\")\nlookup <- c(m = \"Male\", f = \"Female\", u = NA)\nlookup[x]\n# m f u f f m m \n# \"Male\" \"Female\" NA \"Female\" \"Female\" \"Male\" \"Male\"\n```\n:::\n\n\n### Matching and merging by hand (integer subsetting)\n\n- The `match()` function allows merging a vector with a table\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngrades <- c(\"D\", \"A\", \"C\", \"B\", \"F\")\ninfo <- data.frame(\n grade = c(\"A\", \"B\", \"C\", \"D\", \"F\"),\n desc = c(\"Excellent\", \"Very Good\", \"Average\", \"Fair\", \"Poor\"),\n fail = c(F, F, F, F, T)\n)\nid <- match(grades, info$grade)\nid\n# [1] 3 2 2 1 3\ninfo[id, ]\n# grade desc fail\n# 4 D Fair FALSE\n# 1 A Excellent FALSE\n# 3 C Average FALSE\n# 2 B Very Good FALSE\n# 5 F Poor TRUE\n```\n:::\n\n\n### Random samples and bootstrapping (integer subsetting)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# mtcars[sample(nrow(mtcars), 
3), ] # use replace = TRUE to replace\n# mpg cyl disp hp drat wt qsec vs am gear carb\n# Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2\n# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4\n# Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4\n```\n:::\n\n\n### Ordering (integer subsetting)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# mtcars[order(mtcars$mpg), ]\n# mpg cyl disp hp drat wt qsec vs am gear carb\n# Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4\n# Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4\n# Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4\n# Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4\n# Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4\n# Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8\n# ...\n```\n:::\n\n\n### Expanding aggregated counts (integer subsetting)\n\n- We can expand a count column by using `rep()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- tibble::tibble(x = c(\"Amy\", \"Julie\", \"Brian\"), n = c(2, 1, 3))\ndf[rep(1:nrow(df), df$n), ]\n# A tibble: 6 x 2\n# x n\n# <chr> <dbl>\n# 1 Amy 2\n# 2 Amy 2\n# 3 Julie 1\n# 4 Brian 3\n# 5 Brian 3\n# 6 Brian 3\n```\n:::\n\n\n### Removing columns from data frames (character)\n\n- We can remove a column by subsetting, which does not change the object\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf[, 1]\n# A tibble: 3 x 1\n# x \n# <chr>\n# 1 Amy \n# 2 Julie\n# 3 Brian\n```\n:::\n\n\n- We can also delete the column using `NULL`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf$n <- NULL\ndf\n# A tibble: 3 x 1\n# x \n# <chr>\n# 1 Amy \n# 2 Julie\n# 3 Brian\n```\n:::\n\n\n### Selecting rows based on a condition (logical subsetting)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# mtcars[mtcars$gear == 5, ]\n# mpg cyl disp hp drat wt qsec vs am gear carb\n# Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2\n# Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2\n# Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4\n# Ferrari Dino 
19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6\n# Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8\n```\n:::\n\n\n### Boolean algebra versus sets (logical and integer)\n\n- `which()` gives the indices of a Boolean vector\n\n\n::: {.cell}\n\n```{.r .cell-code}\n(x1 <- 1:10 %% 2 == 0) # 1-10 divisible by 2\n# [1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE\n(x2 <- which(x1))\n# [1] 2 4 6 8 10\n(y1 <- 1:10 %% 5 == 0) # 1-10 divisible by 5\n# [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE\n(y2 <- which(y1))\n# [1] 5 10\nx1 & y1\n# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE\n```\n:::\n\n",
- "supporting": [
- "04_files"
- ],
+ "markdown": "---\nengine: knitr\ntitle: Subsetting\n---\n\n## Learning objectives:\n\n- Select multiple elements from a vector with `[`\n- Learn about the 3 subsetting operators: `[[`, `[`, and `$`\n- Learn how subsetting works with different vector types\n- Learn how subsetting can be combined with assignment\n\n# Selecting multiple elements\n\n## 1. Positive integers return elements at specified positions\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(1.1, 2.2, 3.3, 4.4) # decimal = original position\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 3.3 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\nx[c(4, 1)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4.4 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\nx[c(1, 1, 1)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 1.1 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\nx[c(1.9999)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1\n```\n\n\n:::\n:::\n\n\nReals *truncate* to integers.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(1.0001, 1.9999)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 1.1\n```\n\n\n:::\n:::\n\n\n## 2. Negative integers remove specified elements\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[-c(1, 3)] # same as x[c(-1, -3)] or x[c(2, 4)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 4.4\n```\n\n\n:::\n:::\n\n\n## 2b. Mixing negative and positive integers throws an error\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(-1, 3)]\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in x[c(-1, 3)]: only 0's may be mixed with negative subscripts\n```\n\n\n:::\n:::\n\n\n## 2c. 
Zeros ignored with other ints \n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(-1, 0)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 3.3 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\nx[c(-1, 0, 0, 0, 0, 0 ,0 ,0)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 3.3 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\nx[c(1, 0, 2, 0, 3, 0)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 3.3\n```\n\n\n:::\n:::\n\n\n\n## 3. Logical vectors select specified elements\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(TRUE, TRUE, FALSE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\nx[x < 3]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2\n```\n\n\n:::\n\n```{.r .cell-code}\ncond <- x > 2.5\nx[cond]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3.3 4.4\n```\n\n\n:::\n:::\n\n\n## 3b. Shorter element are recycled to higher length\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[FALSE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> numeric(0)\n```\n\n\n:::\n\n```{.r .cell-code}\nx[TRUE]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 3.3 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\nx[c(FALSE, TRUE)] # equivalent to: x[c(FALSE, TRUE, FALSE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 4.4\n```\n\n\n:::\n:::\n\n\n- Easy to understand if x or y is 1, best to avoid other lengths\n\n## 3c. NA index returns NA\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(NA, TRUE, NA, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] NA 2.2 NA 4.4\n```\n\n\n:::\n:::\n\n## 3d. 
Extra TRUE index returns NA\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.2 3.3 4.4 NA NA\n```\n\n\n:::\n\n```{.r .cell-code}\nx[1:5]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 3.3 4.4 NA\n```\n\n\n:::\n:::\n\n\n## 4. Indexing with nothing returns original vector\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1.1 2.2 3.3 4.4\n```\n\n\n:::\n:::\n\n\n## 5. Indexing with just 0 returns 0-length vector (with class)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx[0]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> numeric(0)\n```\n\n\n:::\n\n```{.r .cell-code}\nletters[0]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> character(0)\n```\n\n\n:::\n:::\n\n\n## 6. Indexing with character vector returns element of named vector\n\n\n::: {.cell}\n\n```{.r .cell-code}\n(y <- setNames(x, letters[1:4]))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c d \n#> 1.1 2.2 3.3 4.4\n```\n\n\n:::\n\n```{.r .cell-code}\ny[c(\"d\", \"b\", \"a\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> d b a \n#> 4.4 2.2 1.1\n```\n\n\n:::\n\n```{.r .cell-code}\ny[c(\"a\", \"a\", \"a\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a a a \n#> 1.1 1.1 1.1\n```\n\n\n:::\n:::\n\n\n## 6b. 
Names must be exact for `[`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nz <- c(abc = 1, def = 2)\nz\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> abc def \n#> 1 2\n```\n\n\n:::\n\n```{.r .cell-code}\nz[c(\"a\", \"d\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <NA> <NA> \n#> NA NA\n```\n\n\n:::\n:::\n\n\n## Subsetting a list with `[` returns a list\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_list <- list(a = c(T, F), b = letters[5:15], c = 100:108)\nmy_list\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] TRUE FALSE\n#> \n#> $b\n#> [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n#> \n#> $c\n#> [1] 100 101 102 103 104 105 106 107 108\n```\n\n\n:::\n\n```{.r .cell-code}\nmy_list[c(\"a\", \"b\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] TRUE FALSE\n#> \n#> $b\n#> [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n```\n\n\n:::\n:::\n\n\n## Lists use same rules for `[`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_list[2:3]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $b\n#> [1] \"e\" \"f\" \"g\" \"h\" \"i\" \"j\" \"k\" \"l\" \"m\" \"n\" \"o\"\n#> \n#> $c\n#> [1] 100 101 102 103 104 105 106 107 108\n```\n\n\n:::\n\n```{.r .cell-code}\nmy_list[c(TRUE, FALSE, TRUE)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] TRUE FALSE\n#> \n#> $c\n#> [1] 100 101 102 103 104 105 106 107 108\n```\n\n\n:::\n:::\n\n\n## Matrices & arrays take multidimensional indices\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- matrix(1:9, nrow = 3)\na\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2] [,3]\n#> [1,] 1 4 7\n#> [2,] 2 5 8\n#> [3,] 3 6 9\n```\n\n\n:::\n\n```{.r .cell-code}\na[1:2, 2:3] # rows, columns\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2]\n#> [1,] 4 7\n#> [2,] 5 8\n```\n\n\n:::\n:::\n\n\n## Matrices & arrays can accept character, logical, etc\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncolnames(a) <- 
c(\"A\", \"B\", \"C\")\na[c(TRUE, TRUE, FALSE), c(\"B\", \"A\")] # a[1:2, 2:1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> B A\n#> [1,] 4 1\n#> [2,] 5 2\n```\n\n\n:::\n:::\n\n\n## Matrices & arrays are also vectors\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvals <- outer(1:5, 1:5, FUN = \"paste\", sep = \",\") # All chr combos of 1:5\nvals\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [,1] [,2] [,3] [,4] [,5] \n#> [1,] \"1,1\" \"1,2\" \"1,3\" \"1,4\" \"1,5\"\n#> [2,] \"2,1\" \"2,2\" \"2,3\" \"2,4\" \"2,5\"\n#> [3,] \"3,1\" \"3,2\" \"3,3\" \"3,4\" \"3,5\"\n#> [4,] \"4,1\" \"4,2\" \"4,3\" \"4,4\" \"4,5\"\n#> [5,] \"5,1\" \"5,2\" \"5,3\" \"5,4\" \"5,5\"\n```\n\n\n:::\n\n```{.r .cell-code}\nvals[c(4, 15)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"4,1\" \"5,3\"\n```\n\n\n:::\n\n```{.r .cell-code}\na[a > 5]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 6 7 8 9\n```\n\n\n:::\n:::\n\n\n## Data frames subset list-like with single index\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])\ndf[1:2]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x y\n#> 1 1 3\n#> 2 2 2\n#> 3 3 1\n```\n\n\n:::\n\n```{.r .cell-code}\ndf[c(\"x\", \"z\")]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x z\n#> 1 1 a\n#> 2 2 b\n#> 3 3 c\n```\n\n\n:::\n:::\n\n\n## Data frames subset matrix-like with multiple indices\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf[1:2, c(\"x\", \"z\")] # rows, columns\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x z\n#> 1 1 a\n#> 2 2 b\n```\n\n\n:::\n\n```{.r .cell-code}\ndf[df$x == 2, ] # matching rows, all columns\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x y z\n#> 2 2 2 b\n```\n\n\n:::\n\n```{.r .cell-code}\ndf[, c(\"x\", \"z\")] # equivalent to no ,\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x z\n#> 1 1 a\n#> 2 2 b\n#> 3 3 c\n```\n\n\n:::\n:::\n\n\n## Subsetting a tibble with `[` returns a tibble\n\n\n::: 
{.cell}\n\n```{.r .cell-code}\ntbl <- tibble::as_tibble(df)\ndf[, 1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\ndf[, 1, drop = FALSE] # Prevent errors\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x\n#> 1 1\n#> 2 2\n#> 3 3\n```\n\n\n:::\n\n```{.r .cell-code}\ntbl[, 1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 1\n#> x\n#> <int>\n#> 1 1\n#> 2 2\n#> 3 3\n```\n\n\n:::\n:::\n\n\n# Selecting a single element\n\n## `[[` selects a single element\n\n:::: {.columns}\n\n::: {.column}\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(1:3, \"a\", 4:6)\nx[1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(x[1])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"list\"\n```\n\n\n:::\n\n```{.r .cell-code}\nx[[1]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(x[[1]])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nx[[1]][[1]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n:::\n\n:::\n\n::: {.column}\n\n\n:::\n\n::::\n\n## `$` is shorthand for `[[..., exact = FALSE]]`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(abc = 1)\nx$abc\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n\n```{.r .cell-code}\nx$a\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n\n```{.r .cell-code}\nx[[\"a\"]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\nx[[\"a\", exact = FALSE]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n\n```{.r .cell-code}\noptions(warnPartialMatchDollar = TRUE)\nx$a\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning in x$a: partial match of 'a' to 'abc'\n```\n\n\n:::\n\n::: 
{.cell-output .cell-output-stdout}\n\n```\n#> [1] 1\n```\n\n\n:::\n:::\n\n\n## Behavior for missing-ish indices is inconsistent\n\n\n::: {.cell}\n\n```{.r .cell-code}\na <- c(a = 1L, b = 2L)\nlst <- list(a = 1:2)\n\n# Errors:\n# a[[NULL]]\n# lst[[NULL]]\n# a[[5]]\n# lst[[5]]\n# a[[\"c\"]]\n# a[[NA]]\n\nlst[[\"c\"]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\nlst[[NA]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n## `purrr::pluck()` and `purrr::chuck()` provide consistent wrappers\n\n- `purrr::pluck()` always returns `NULL` or `.default` for (non-`NULL`) missing\n- `purrr::chuck()` always throws error\n\n\n::: {.cell}\n\n```{.r .cell-code}\npurrr::pluck(a, 5)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\npurrr::pluck(a, \"c\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\npurrr::pluck(lst, 5)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\npurrr::pluck(lst, \"c\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n## S4 has two additional subsetting operators\n\n- `@` equivalent to `$` (but error if bad)\n- `slot()` equivalent to `[[`\n\nMore in Chapter 15\n\n# Subsetting and assignment\n\n## Can assign to position with `[`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 1:5\nx[1:2] <- c(101, 102)\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 101 102 3 4 5\n```\n\n\n:::\n\n```{.r .cell-code}\nx[1:3] <- 1:2\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 1 4 5\n```\n\n\n:::\n:::\n\n\n## Remove list component with `NULL`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(a = 1, b = 2)\nx[[\"b\"]] <- NULL\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] 1\n```\n\n\n:::\n:::\n\n\n## Use `list(NULL)` to add `NULL`\n\n\n::: 
{.cell}\n\n```{.r .cell-code}\nx <- list(a = 1, b = 2)\nx[[\"b\"]] <- list(NULL)\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] 1\n#> \n#> $b\n#> $b[[1]]\n#> NULL\n```\n\n\n:::\n:::\n\n\n## Subset with nothing to retain shape\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(a = 1:3, b = 1:3)\ndf[] <- \"a\"\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b\n#> 1 a a\n#> 2 a a\n#> 3 a a\n```\n\n\n:::\n\n```{.r .cell-code}\ndf <- \"a\"\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"a\"\n```\n\n\n:::\n:::\n\n\n# Applications\n\n## Use a lookup vector and recycling rules to translate values\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(\"b\", \"g\", \"x\", \"g\", \"g\", \"b\")\nlookup <- c(b = \"blue\", g = \"green\", x = NA)\nlookup[x]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> b g x g g b \n#> \"blue\" \"green\" NA \"green\" \"green\" \"blue\"\n```\n\n\n:::\n\n```{.r .cell-code}\nunname(lookup[x])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"blue\" \"green\" NA \"green\" \"green\" \"blue\"\n```\n\n\n:::\n:::\n\n\n## Use a lookup table to generate rows of data\n\n\n::: {.cell}\n\n```{.r .cell-code}\ninfo <- data.frame(\n code = c(\"b\", \"g\", \"x\"),\n color = c(\"blue\", \"green\", NA),\n other_thing = 3:1\n)\nmatch(x, info$code) # Indices of info$code in x\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3 2 2 1\n```\n\n\n:::\n\n```{.r .cell-code}\ninfo[match(x, info$code), ]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> code color other_thing\n#> 1 b blue 3\n#> 2 g green 2\n#> 3 x <NA> 1\n#> 2.1 g green 2\n#> 2.2 g green 2\n#> 1.1 b blue 3\n```\n\n\n:::\n:::\n\n\n## Sort with `order()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(\"b\", \"c\", \"a\")\norder(x)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3 1 2\n```\n\n\n:::\n\n```{.r .cell-code}\nx[order(x)]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 
[1] \"a\" \"b\" \"c\"\n```\n\n\n:::\n\n```{.r .cell-code}\ndf <- data.frame(b = 3:1, a = 1:3)\ndf[order(df$b), ]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> b a\n#> 3 1 3\n#> 2 2 2\n#> 1 3 1\n```\n\n\n:::\n\n```{.r .cell-code}\ndf[, order(names(df))]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b\n#> 1 1 3\n#> 2 2 2\n#> 3 3 1\n```\n\n\n:::\n:::\n\n\n## Expand counts\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(x = c(2, 4, 1), y = c(9, 11, 6), n = c(3, 5, 1))\nrep(1:nrow(df), df$n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 1 1 2 2 2 2 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\ndf[rep(1:nrow(df), df$n), ]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> x y n\n#> 1 2 9 3\n#> 1.1 2 9 3\n#> 1.2 2 9 3\n#> 2 4 11 5\n#> 2.1 4 11 5\n#> 2.2 4 11 5\n#> 2.3 4 11 5\n#> 2.4 4 11 5\n#> 3 1 6 1\n```\n\n\n:::\n:::\n\n\n## Ran out of time to make slides for\n\nIdeally a future cohort should expand these:\n\n- Remove df columns with `setdiff()`\n- Logically subset rows `df[df$col > 5, ]`\n- The next slide about `which()`\n\n## Boolean algebra versus sets (logical and integer)\n\n- `which()` gives the indices of a Boolean vector\n\n\n::: {.cell}\n\n```{.r .cell-code}\n(x1 <- 1:10 %% 2 == 0) # 1-10 divisible by 2\n# [1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE\n(x2 <- which(x1))\n# [1] 2 4 6 8 10\n(y1 <- 1:10 %% 5 == 0) # 1-10 divisible by 5\n# [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE\n(y2 <- which(y1))\n# [1] 5 10\nx1 & y1\n# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE\n```\n:::\n\n",
+ "supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
- "includes": {},
+ "includes": {
+ "include-after-body": [
+ "\n<script>\n // htmlwidgets need to know to resize themselves when slides are shown/hidden.\n // Fire the \"slideenter\" event (handled by htmlwidgets.js) when the current\n // slide changes (different for each slide format).\n (function () {\n // dispatch for htmlwidgets\n function fireSlideEnter() {\n const event = window.document.createEvent(\"Event\");\n event.initEvent(\"slideenter\", true, true);\n window.document.dispatchEvent(event);\n }\n\n function fireSlideChanged(previousSlide, currentSlide) {\n fireSlideEnter();\n\n // dispatch for shiny\n if (window.jQuery) {\n if (previousSlide) {\n window.jQuery(previousSlide).trigger(\"hidden\");\n }\n if (currentSlide) {\n window.jQuery(currentSlide).trigger(\"shown\");\n }\n }\n }\n\n // hookup for slidy\n if (window.w3c_slidy) {\n window.w3c_slidy.add_observer(function (slide_num) {\n // slide_num starts at position 1\n fireSlideChanged(null, w3c_slidy.slides[slide_num - 1]);\n });\n }\n\n })();\n</script>\n\n"
+ ]
+ },
"engineDependencies": {},
"preserve": {},
"postProcess": true
diff --git a/slides/04.Rmd b/slides/04.Rmd
@@ -1,523 +0,0 @@
----
-engine: knitr
-title: Subsetting
----
-
-## Learning objectives:
-
-- Learn about the 6 ways to subset atomic vectors
-- Learn about the 3 subsetting operators: `[[`, `[`, and `$`
-- Learn how subsetting works with different vector types
-- Learn how subsetting can be combined with assignment
-
-## Selecting multiple elements
-
-### Atomic Vectors
-
-- 6 ways to subset atomic vectors
-
-Let's take a look with an example vector.
-
-```{r atomic_vector}
-x <- c(1.1, 2.2, 3.3, 4.4)
-```
-
-**Positive integer indices**
-
-```{r positive_int}
-# return elements at specified positions which can be out of order
-x[c(4, 1)]
-
-# duplicate indices return duplicate values
-x[c(2, 2)]
-
-# real numbers truncate to integers
-# so this behaves as if it is x[c(3, 3)]
-x[c(3.2, 3.8)]
-```
-
-**Negative integer indices**
-
-```{r, error=TRUE}
-### excludes elements at specified positions
-x[-c(1, 3)] # same as x[c(-1, -3)] or x[c(2, 4)]
-
-### mixing positive and negative is a no-no
-x[c(-1, 3)]
-```
-
-**Logical Vectors**
-
-```{r logical_vec}
-x[c(TRUE, TRUE, FALSE, TRUE)]
-
-x[x < 3]
-
-cond <- x > 2.5
-x[cond]
-```
-
-- **Recyling rules** applies when the two vectors are of different lengths
-- the shorter of the two is recycled to the length of the longer
-- Easy to understand if x or y is 1, best to avoid other lengths
-
-```{r}
-x[c(F, T)] # equivalent to: x[c(FALSE, TRUE, FALSE, TRUE)]
-```
-
-**Missing values (NA)**
-
-```{r missing}
-# Missing values in index will also return NA in output
-x[c(NA, TRUE)]
-```
-
-**Nothing**
-
-```{r nothing}
-# returns the original vector
-x[]
-```
-
-**Zero**
-
-```{r zero}
-# returns a zero-length vector
-x[0]
-```
-
-**Character vectors**
-
-```{r character}
-# if name, you can use to return matched elements
-(y <- setNames(x, letters[1:4]))
-
-y[c("d", "b", "a")]
-
-# Like integer indices, you can repeat indices
-y[c("a", "a", "a")]
-
-# When subsetting with [, names are always matched exactly
-z <- c(abc = 1, def = 2)
-z
-z[c("a", "d")]
-```
-
-### Lists
-
-- Subsetting works the same way
-- `[` always returns a list
-- `[[` and `$` let you pull elements out of a list
-
-```{r}
-my_list <- list(a = c(T, F), b = letters[5:15], c = 100:108)
-my_list
-```
-
-**Return a (named) list**
-
-```{r}
-l1 <- my_list[2]
-l1
-```
-
-**Return a vector**
-
-```{r}
-l2 <- my_list[[2]]
-l2
-l2b <- my_list$b
-l2b
-```
-
-**Return a specific element**
-
-```{r}
-l3 <- my_list[[2]][3]
-l3
-l4 <- my_list[['b']][3]
-l4
-l4b <- my_list$b[3]
-l4b
-```
-
-**Visual Representation**
-
-
-
-See this stackoverflow article for more detailed information about the differences: https://stackoverflow.com/questions/1169456/the-difference-between-bracket-and-double-bracket-for-accessing-the-el
-
-### Matrices and arrays
-
-You can subset higher dimensional structures in three ways:
-
-- with multiple vectors
-- with a single vector
-- with a matrix
-
-```{r}
-a <- matrix(1:12, nrow = 3)
-colnames(a) <- c("A", "B", "C", "D")
-
-# single row
-a[1, ]
-
-# single column
-a[, 1]
-
-# single element
-a[1, 1]
-
-# two rows from two columns
-a[1:2, 3:4]
-
-a[c(TRUE, FALSE, TRUE), c("B", "A")]
-
-# zero index and negative index
-a[0, -2]
-```
-
-**Subset a matrix with a matrix**
-
-```{r}
-b <- matrix(1:4, nrow = 2)
-b
-a[b]
-```
-
-```{r}
-vals <- outer(1:5, 1:5, FUN = "paste", sep = ",")
-vals
-
-select <- matrix(ncol = 2, byrow = TRUE,
- c(1, 1,
- 3, 1,
- 2, 4))
-select
-
-vals[select]
-```
-
-Matrices and arrays are just special vectors; can subset with a single vector
-(arrays in R stored column wise)
-
-```{r}
-vals[c(3, 15, 16, 17)]
-```
-
-### Data frames and tibbles
-
-Data frames act like both lists and matrices
-
-- When subsetting with a single index, they behave like lists and index the columns, so `df[1:2]` selects the first two columns.
-- When subsetting with two indices, they behave like matrices, so `df[1:3, ]` selects the first three rows (and all the columns).
-
-```{r penguins, error=TRUE}
-library(palmerpenguins)
-penguins <- penguins
-
-# single index selects first two columns
-two_cols <- penguins[2:3] # or penguins[c(2,3)]
-head(two_cols)
-
-# equivalent to the above code
-same_two_cols <- penguins[c("island", "bill_length_mm")]
-head(same_two_cols)
-
-# two indices separated by comma (first two rows of 3rd and 4th columns)
-penguins[1:2, 3:4]
-
-# Can't do this...
-penguins[[3:4]][c(1:4)]
-# ...but this works...
-penguins[[3]][c(1:4)]
-# ...or this equivalent...
-penguins$bill_length_mm[1:4]
-```
-
-Subsetting a tibble with `[` always returns a tibble
-
-### Preserving dimensionality
-
-- Data frames and tibbles behave differently
-- tibble will default to preserve dimensionality, data frames do not
-- this can lead to unexpected behavior and code breaking in the future
-- Use `drop = FALSE` to preserve dimensionality when subsetting a data frame or use tibbles
-
-
-```{r}
-tb <- tibble::tibble(a = 1:2, b = 1:2)
-
-# returns tibble
-str(tb[, "a"])
-tb[, "a"] # equivalent to tb[, "a", drop = FALSE]
-
-# returns integer vector
-# str(tb[, "a", drop = TRUE])
-tb[, "a", drop = TRUE]
-```
-
-```{r}
-df <- data.frame(a = 1:2, b = 1:2)
-
-# returns integer vector
-# str(df[, "a"])
-df[, "a"]
-
-# returns data frame with one column
-# str(df[, "a", drop = FALSE])
-df[, "a", drop = FALSE]
-```
-**Factors**
-
-Factor subsetting drop argument controls whether or not levels (rather than dimensions) are preserved.
-
-```{r}
-z <- factor(c("a", "b", "c"))
-z[1]
-z[1, drop = TRUE]
-```
-
-## Selecting a single element
-
-`[[` and `$` are used to extract single elements (note: a vector can be a single element)
-
-### `[[]]`
-
-Because `[[]]` can return only a single item, you must use it with either a single positive integer or a single string.
-
-```{r train}
-x <- list(1:3, "a", 4:6)
-x[[1]]
-```
-
-Hadley Wickham recommends using `[[]]` with atomic vectors whenever you want to extract a single value to reinforce the expectation that you are getting and setting individual values.
-
-### `$`
-
-- `x$y` is equivalent to `x[["y"]]`
-
-the `$` operator doesn't work with stored vals
-
-```{r}
-var <- "cyl"
-
-# Doesn't work - mtcars$var translated to mtcars[["var"]]
-mtcars$var
-
-# Instead use [[
-mtcars[[var]]
-```
-
-`$` allows partial matching, `[[]]` does not
-
-```{r}
-x <- list(abc = 1)
-x$a
-
-x[["a"]]
-
-```
-
-Hadley advises to change Global settings:
-
-```{r}
-options(warnPartialMatchDollar = TRUE)
-x$a
-```
-
-tibbles don't have this behavior
-
-```{r}
-penguins$s
-```
-
-### missing and out of bound indices
-
-- Due to the inconsistency of how R handles such indices, `purrr::pluck()` and `purrr::chuck()` are recommended
-
-```{r, eval=FALSE}
-x <- list(
- a = list(1, 2, 3),
- b = list(3, 4, 5)
-)
-purrr::pluck(x, "a", 1)
-# [1] 1
-purrr::pluck(x, "c", 1)
-# NULL
-purrr::pluck(x, "c", 1, .default = NA)
-# [1] NA
-```
-
-### `@` and `slot()`
-- `@` is `$` for S4 objects (to be revisited in Chapter 15)
-
-- `slot()` is `[[ ]]` for S4 objects
-
-## Subsetting and Assignment
-
-- Subsetting can be combined with assignment to edit values
-
-```{r}
-x <- c("Tigers", "Royals", "White Sox", "Twins", "Indians")
-
-x[5] <- "Guardians"
-
-x
-```
-
-- length of the subset and assignment vector should be the same to avoid recycling
-
-You can use NULL to remove a component
-
-```{r}
-x <- list(a = 1, b = 2)
-x[["b"]] <- NULL
-str(x)
-```
-
-Subsetting with nothing can preserve structure of original object
-
-```{r, eval=FALSE}
-# mtcars[] <- lapply(mtcars, as.integer)
-# is.data.frame(mtcars)
-# [1] TRUE
-# mtcars <- lapply(mtcars, as.integer)
-#> is.data.frame(mtcars)
-# [1] FALSE
-```
-
-## Applications
-
-Applications copied from cohort 2 slide
-
-### Lookup tables (character subsetting)
-
-```{r, eval=FALSE}
-x <- c("m", "f", "u", "f", "f", "m", "m")
-lookup <- c(m = "Male", f = "Female", u = NA)
-lookup[x]
-# m f u f f m m
-# "Male" "Female" NA "Female" "Female" "Male" "Male"
-```
-
-### Matching and merging by hand (integer subsetting)
-
-- The `match()` function allows merging a vector with a table
-
-```{r, eval=FALSE}
-grades <- c("D", "A", "C", "B", "F")
-info <- data.frame(
- grade = c("A", "B", "C", "D", "F"),
- desc = c("Excellent", "Very Good", "Average", "Fair", "Poor"),
- fail = c(F, F, F, F, T)
-)
-id <- match(grades, info$grade)
-id
-# [1] 3 2 2 1 3
-info[id, ]
-# grade desc fail
-# 4 D Fair FALSE
-# 1 A Excellent FALSE
-# 3 C Average FALSE
-# 2 B Very Good FALSE
-# 5 F Poor TRUE
-```
-
-### Random samples and bootstrapping (integer subsetting)
-
-```{r, eval=FALSE}
-# mtcars[sample(nrow(mtcars), 3), ] # use replace = TRUE to replace
-# mpg cyl disp hp drat wt qsec vs am gear carb
-# Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
-# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
-# Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
-```
-
-### Ordering (integer subsetting)
-
-```{r, eval=FALSE}
-# mtcars[order(mtcars$mpg), ]
-# mpg cyl disp hp drat wt qsec vs am gear carb
-# Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
-# Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
-# Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
-# Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
-# Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
-# Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
-# ...
-```
-
-### Expanding aggregated counts (integer subsetting)
-
-- We can expand a count column by using `rep()`
-
-```{r, eval=FALSE}
-df <- tibble::tibble(x = c("Amy", "Julie", "Brian"), n = c(2, 1, 3))
-df[rep(1:nrow(df), df$n), ]
-# A tibble: 6 x 2
-# x n
-# <chr> <dbl>
-# 1 Amy 2
-# 2 Amy 2
-# 3 Julie 1
-# 4 Brian 3
-# 5 Brian 3
-# 6 Brian 3
-```
-
-### Removing columns from data frames (character)
-
-- We can remove a column by subsetting, which does not change the object
-
-```{r, eval=FALSE}
-df[, 1]
-# A tibble: 3 x 1
-# x
-# <chr>
-# 1 Amy
-# 2 Julie
-# 3 Brian
-```
-
-- We can also delete the column using `NULL`
-
-```{r, eval=FALSE}
-df$n <- NULL
-df
-# A tibble: 3 x 1
-# x
-# <chr>
-# 1 Amy
-# 2 Julie
-# 3 Brian
-```
-
-### Selecting rows based on a condition (logical subsetting)
-
-```{r, eval=FALSE}
-# mtcars[mtcars$gear == 5, ]
-# mpg cyl disp hp drat wt qsec vs am gear carb
-# Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2
-# Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
-# Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4
-# Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6
-# Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8
-```
-
-### Boolean algebra versus sets (logical and integer)
-
-- `which()` gives the indices of a Boolean vector
-
-```{r, eval=FALSE}
-(x1 <- 1:10 %% 2 == 0) # 1-10 divisible by 2
-# [1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
-(x2 <- which(x1))
-# [1] 2 4 6 8 10
-(y1 <- 1:10 %% 5 == 0) # 1-10 divisible by 5
-# [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
-(y2 <- which(y1))
-# [1] 5 10
-x1 & y1
-# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
-```
diff --git a/slides/04.qmd b/slides/04.qmd
@@ -0,0 +1,376 @@
+---
+engine: knitr
+title: Subsetting
+---
+
+## Learning objectives:
+
+- Select multiple elements from a vector with `[`
+- Learn about the 3 subsetting operators: `[[`, `[`, and `$`
+- Learn how subsetting works with different vector types
+- Learn how subsetting can be combined with assignment
+
+# Selecting multiple elements
+
+## 1. Positive integers return elements at specified positions
+
+```{r}
+#| label: positive_int
+x <- c(1.1, 2.2, 3.3, 4.4) # digit after the decimal point = original position
+x
+x[c(4, 1)]
+x[c(1, 1, 1)]
+x[c(1.9999)]
+```
+
+Reals *truncate* to integers.
+
+```{r}
+#| label: positive_real
+x[c(1.0001, 1.9999)]
+```
+
+## 2. Negative integers remove specified elements
+
+```{r}
+#| label: negative_int
+x[-c(1, 3)] # same as x[c(-1, -3)] or x[c(2, 4)]
+```
+
+## 2b. Mixing negative and positive integers throws an error
+
+```{r}
+#| label: mixed_int
+#| error: true
+x[c(-1, 3)]
+```
+
+## 2c. Zeros are ignored when mixed with other integers
+
+```{r}
+#| label: negative_int_zero
+x[c(-1, 0)]
+x[c(-1, 0, 0, 0, 0, 0, 0, 0)]
+x[c(1, 0, 2, 0, 3, 0)]
+```
+
+
+## 3. Logical vectors select specified elements
+
+```{r}
+#| label: logical_vec
+x[c(TRUE, TRUE, FALSE, TRUE)]
+x[x < 3]
+
+cond <- x > 2.5
+x[cond]
+```
+
+## 3b. Shorter vectors are recycled to the longer length
+
+```{r}
+#| label: recycling
+x[FALSE]
+x[TRUE]
+x[c(FALSE, TRUE)] # equivalent to: x[c(FALSE, TRUE, FALSE, TRUE)]
+```
+
+- Easy to understand when the logical index has length 1; best to avoid other lengths
+
+## 3c. NA index returns NA
+
+```{r}
+#| label: missing_index
+x[c(NA, TRUE, NA, TRUE)]
+```
+
+## 3d. Extra TRUE index returns NA
+
+```{r}
+#| label: extra_index
+x[c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)]
+x[1:5]
+```
+
+## 4. Indexing with nothing returns original vector
+
+```{r}
+#| label: nothing
+x[]
+```
+
+## 5. Indexing with just 0 returns 0-length vector (with class)
+
+```{r}
+#| label: zero
+x[0]
+letters[0]
+```
+
+## 6. Indexing with a character vector returns elements of a named vector
+
+```{r}
+#| label: character
+(y <- setNames(x, letters[1:4]))
+y[c("d", "b", "a")]
+y[c("a", "a", "a")]
+```
+
+## 6b. Names must be exact for `[`
+
+```{r}
+#| label: exact_names
+z <- c(abc = 1, def = 2)
+z
+z[c("a", "d")]
+```
+
+## Subsetting a list with `[` returns a list
+
+```{r}
+#| label: list_subset_basics
+my_list <- list(a = c(T, F), b = letters[5:15], c = 100:108)
+my_list
+my_list[c("a", "b")]
+```
+
+## Lists use same rules for `[`
+
+```{r}
+#| label: list_subset_multiple
+my_list[2:3]
+my_list[c(TRUE, FALSE, TRUE)]
+```
+
+## Matrices & arrays take multidimensional indices
+
+```{r}
+#| label: array_subset
+a <- matrix(1:9, nrow = 3)
+a
+a[1:2, 2:3] # rows, columns
+```
+
+## Matrices & arrays can accept character, logical, etc.
+
+```{r}
+#| label: array_named
+colnames(a) <- c("A", "B", "C")
+a[c(TRUE, TRUE, FALSE), c("B", "A")] # a[1:2, 2:1]
+```
+
+## Matrices & arrays are also vectors
+
+```{r}
+#| label: array_vector
+vals <- outer(1:5, 1:5, FUN = "paste", sep = ",") # All chr combos of 1:5
+vals
+vals[c(4, 15)]
+a[a > 5]
+```
+
+## Data frames subset list-like with single index
+
+```{r}
+#| label: df_subset1
+df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])
+df[1:2]
+df[c("x", "z")]
+```
+
+## Data frames subset matrix-like with multiple indices
+
+```{r}
+df[1:2, c("x", "z")] # rows, columns
+df[df$x == 2, ] # matching rows, all columns
+df[, c("x", "z")] # all rows; same result as df[c("x", "z")]
+```
+
+## Subsetting a tibble with `[` returns a tibble
+
+```{r}
+tbl <- tibble::as_tibble(df)
+df[, 1]
+df[, 1, drop = FALSE] # keep a one-column data frame instead of a vector
+tbl[, 1]
+```
+
+# Selecting a single element
+
+## `[[` selects a single element
+
+:::: {.columns}
+
+::: {.column}
+```{r}
+x <- list(1:3, "a", 4:6)
+x[1]
+class(x[1])
+x[[1]]
+class(x[[1]])
+x[[1]][[1]]
+```
+:::
+
+::: {.column}
+
+
+:::
+
+::::
+
+## `$` is shorthand for `[[..., exact = FALSE]]`
+
+```{r}
+#| label: dollar_subset
+#| warning: true
+x <- list(abc = 1)
+x$abc
+x$a
+x[["a"]]
+x[["a", exact = FALSE]]
+
+options(warnPartialMatchDollar = TRUE)
+x$a
+```
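+
+`$` takes its right-hand side literally, so it cannot look up a name stored in a variable. A minimal sketch, using a hypothetical list `cars_demo` and variable `var` (neither comes from the slides above):
+
+```{r}
+#| label: dollar_stored_name
+cars_demo <- list(cyl = c(6, 4, 8))
+var <- "cyl"
+cars_demo$var    # looks for an element literally named "var"
+cars_demo[[var]] # uses the value stored in var, i.e. "cyl"
+```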
+
+## Behavior for missing-ish indices is inconsistent
+
+```{r}
+#| label: missingish_indices
+#| error: true
+a <- c(a = 1L, b = 2L)
+lst <- list(a = 1:2)
+
+# Errors:
+# a[[NULL]]
+# lst[[NULL]]
+# a[[5]]
+# lst[[5]]
+# a[["c"]]
+# a[[NA]]
+
+lst[["c"]]
+lst[[NA]]
+```
+
+## `purrr::pluck()` and `purrr::chuck()` provide consistent wrappers
+
+- `purrr::pluck()` always returns `NULL` (or `.default`) for missing elements (non-`NULL` indices)
+- `purrr::chuck()` always throws an error
+
+```{r}
+purrr::pluck(a, 5)
+purrr::pluck(a, "c")
+purrr::pluck(lst, 5)
+purrr::pluck(lst, "c")
+```
+
+## S4 has two additional subsetting operators
+
+- `@` is equivalent to `$` (but throws an error if the slot does not exist)
+- `slot()` is equivalent to `[[`
+
+More in Chapter 15
+
+# Subsetting and assignment
+
+## Can assign to position with `[`
+
+```{r}
+x <- 1:5
+x[1:2] <- c(101, 102)
+x
+x[1:3] <- 1:2
+x
+```
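+
+The second assignment above recycles a length-2 value into three positions. R warns when the replacement length does not divide the number of positions evenly, although rendered slides may hide warnings. A minimal sketch that captures the warning text; `res` is an illustrative name:
+
+```{r}
+#| label: assign_recycling_warning
+x <- 1:5
+res <- tryCatch(
+  {
+    x[1:3] <- 1:2 # 3 positions, 2 replacement values
+    "no warning"
+  },
+  warning = function(w) conditionMessage(w)
+)
+res
+```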
+
+## Remove list component with `NULL`
+
+```{r}
+x <- list(a = 1, b = 2)
+x[["b"]] <- NULL
+x
+```
+
+## Use `list(NULL)` to add `NULL`
+
+```{r}
+x <- list(a = 1, b = 2)
+x[["b"]] <- list(NULL)
+x
+```
+
+## Subset with nothing to retain shape
+
+```{r}
+df <- data.frame(a = 1:3, b = 1:3)
+df[] <- "a"
+df
+df <- "a"
+df
+```
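+
+The same idea matters when transforming every column. A minimal sketch, with hypothetical objects `df_demo` and `lst_demo`: assigning into `df_demo[]` keeps the data frame, while assigning the result of `lapply()` to a plain name gives a list.
+
+```{r}
+#| label: retain_shape_lapply
+df_demo <- data.frame(a = 1:3, b = 4:6)
+df_demo[] <- lapply(df_demo, as.character) # columns change, shape is kept
+is.data.frame(df_demo)
+lst_demo <- lapply(df_demo, as.character)  # lapply() alone returns a list
+is.data.frame(lst_demo)
+```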
+
+# Applications
+
+## Use a lookup vector and recycling rules to translate values
+
+```{r}
+x <- c("b", "g", "x", "g", "g", "b")
+lookup <- c(b = "blue", g = "green", x = NA)
+lookup[x]
+unname(lookup[x])
+```
+
+## Use a lookup table to generate rows of data
+
+```{r}
+info <- data.frame(
+ code = c("b", "g", "x"),
+ color = c("blue", "green", NA),
+ other_thing = 3:1
+)
+match(x, info$code) # Position in info$code of each element of x
+info[match(x, info$code), ]
+```
+
+## Sort with `order()`
+
+```{r}
+x <- c("b", "c", "a")
+order(x)
+x[order(x)]
+
+df <- data.frame(b = 3:1, a = 1:3)
+df[order(df$b), ]
+df[, order(names(df))]
+```
+
+## Expand counts
+
+```{r}
+df <- data.frame(x = c(2, 4, 1), y = c(9, 11, 6), n = c(3, 5, 1))
+rep(1:nrow(df), df$n)
+df[rep(1:nrow(df), df$n), ]
+```
+
+## Ran out of time to make slides for
+
+Ideally a future cohort should expand these:
+
+- Remove df columns with `setdiff()`
+- Logically subset rows `df[df$col > 5, ]`
+- The next slide about `which()`
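+
+A minimal sketch of the first two items, using a hypothetical data frame `df_demo` (not from the slides above):
+
+```{r}
+#| label: setdiff_and_logical_rows
+df_demo <- data.frame(id = 1:4, col = c(3, 7, 9, 5), junk = letters[1:4])
+# Keep every column except "junk"
+kept <- df_demo[setdiff(names(df_demo), "junk")]
+names(kept)
+# Keep only the rows where col is greater than 5
+big <- df_demo[df_demo$col > 5, ]
+big$id
+```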
+
+## Boolean algebra versus sets (logical and integer)
+
+- `which()` gives the indices of a Boolean vector
+
+```{r, eval=FALSE}
+(x1 <- 1:10 %% 2 == 0) # 1-10 divisible by 2
+# [1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
+(x2 <- which(x1))
+# [1] 2 4 6 8 10
+(y1 <- 1:10 %% 5 == 0) # 1-10 divisible by 5
+# [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
+(y2 <- which(y1))
+# [1] 5 10
+x1 & y1
+# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
+```
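+
+To connect the two views, a minimal sketch that recomputes the vectors above (that chunk is not evaluated): Boolean operators on logical vectors correspond to set operations on the integer positions returned by `which()`.
+
+```{r}
+#| label: boolean_vs_sets
+x1 <- 1:10 %% 2 == 0
+y1 <- 1:10 %% 5 == 0
+x2 <- which(x1)
+y2 <- which(y1)
+intersect(x2, y2) # same positions as which(x1 & y1)
+union(x2, y2)     # same positions as which(x1 | y1), though not sorted
+setdiff(x2, y2)   # same positions as which(x1 & !y1)
+```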