Cohort 8 chapter 4 (#58) - bookclub-advr - DSLC Advanced R Book Club

commit 81a346feb22d44d3471856df5fe14f024447c6e9
parent 543d8954550390f0184229d51c4e6383dc2262bb
Author: Betsy Rosalen <betsy@mylittleuniverse.com>
Date:   Mon,  4 Mar 2024 10:24:20 -0500

Cohort 8 chapter 4 (#58)

* made adding dimensions in chapter 3 a little clearer

* Updated notes for Chapter 4 Subsetting

* Update caching.

I'm not 100% certain that we WANT this, but I THINK it's a good idea.

---------

Co-authored-by: Jon Harmon <jonthegeek@gmail.com>
Diffstat:
M 03_Vectors.Rmd  | 8 ++++++--
M 04_Subsetting.Rmd  | 262 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------
M bookclub-advr.Rproj  | 36 ++++++++++++++++++------------------
A bookclub-advr_cache/html/__packages  | 24 ++++++++++++++++++++++++
A bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.RData  | 0 
A bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdb  | 0 
A bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdx  | 0 
A bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.RData  | 0 
A bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdb  | 0 
A bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdx  | 0

10 files changed, 237 insertions(+), 93 deletions(-)
diff --git a/03_Vectors.Rmd b/03_Vectors.Rmd
@@ -363,13 +363,17 @@ x <- matrix(1:6, nrow = 2, ncol = 3)
 x
 
 # One vector argument to describe all dimensions
-y <- array(1:12, c(2, 3, 2))
+y <- array(1:24, c(2, 3, 4)) # rows, columns, number of matrices
 y
 
 # You can also modify an object in place by setting dim()
 z <- 1:6
-dim(z) <- c(3, 2)
+dim(z) <- c(2, 3) # rows, columns
 z
+
+a <- 1:24
+dim(a) <- c(2, 3, 4) # rows, columns, number of matrices
+a
 ```
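A minimal sketch, assuming a fresh R session, that checks the two ways of building the 2 x 3 x 4 array above give the same object. It also shows that the third index picks one of the stacked matrices:

```r
# Build the same 2 x 3 x 4 array both ways shown above
y <- array(1:24, c(2, 3, 4))
a <- 1:24
dim(a) <- c(2, 3, 4)

identical(y, a)  # TRUE: both are 1:24 with dim c(2, 3, 4)
dim(y)           # 2 3 4
y[2, 3, 4]       # 24, the last element (row 2, column 3, 4th matrix)
y[, , 1]         # the first 2 x 3 matrix, values 1 to 6 filled column by column
```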
 
 ##### Functions for working with vectors, matrices and arrays:
diff --git a/04_Subsetting.Rmd b/04_Subsetting.Rmd
@@ -5,6 +5,7 @@
 - Learn about the 6 ways to subset atomic vectors
 - Learn about the 3 subsetting operators: `[[`, `[`, and `$`
 - Learn how subsetting works with different vector types
+- Learn how subsetting can be combined with assignment
 
 ## Selecting multiple elements
 
@@ -15,30 +16,31 @@
 Let's take a look with an example vector.
 
 ```{r atomic_vector}
-x <- c(3.1, 2.2, 1.3, 4.4)
+x <- c(1.1, 2.2, 3.3, 4.4)
 ```
 
-**Positive integers**
+**Positive integer indices**
 
 ```{r positive_int}
-# return elements at specified positions
+# return elements at the specified positions, in any order
 x[c(4, 1)]
 
 # duplicate indices return duplicate values
 x[c(2, 2)]
 
 # real numbers truncate to integers
+# so this behaves as if it is x[c(3, 3)]
 x[c(3.2, 3.8)]
 ```
 
-**Negative integers**
+**Negative integer indices**
 
-```{r, eval=FALSE}
+```{r, error=TRUE}
 ### excludes elements at specified positions
-# x[-c(1, 3)] # same as x[c(-1, -3)]
+x[-c(1, 3)] # same as x[c(-1, -3)] or x[c(2, 4)]
 
 ### mixing positive and negative is a no-no
-# x[c(-1, 3)]
+x[c(-1, 3)]
 ```
 
 **Logical Vectors**
@@ -47,17 +49,26 @@ x[c(3.2, 3.8)]
 x[c(TRUE, TRUE, FALSE, TRUE)]
 
 x[x < 3]
+
+cond <- x > 2.5
+x[cond]
 ```
 
-- **Recyling rules** apply when subsetting this way: x[y]
+- **Recycling rules** apply in `x[y]` when `x` and `y` have different lengths
+- The shorter of the two is recycled to the length of the longer
 - Easy to understand if x or y is 1, best to avoid other lengths
 
+```{r}
+x[c(F, T)] # equivalent to: x[c(FALSE, TRUE, FALSE, TRUE)]
+```
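A short sketch of why other lengths are best avoided, using the same `x` as the notes. A length-3 logical index is silently recycled, and a logical index longer than `x` runs past the end:

```r
x <- c(1.1, 2.2, 3.3, 4.4)

# A shorter logical index is recycled to length(x)
x[c(FALSE, TRUE)]  # 2.2 4.4

# A length-3 index is recycled to TRUE, FALSE, TRUE, TRUE,
# which is easy to misread
x[c(TRUE, FALSE, TRUE)]  # 1.1 3.3 4.4

# A logical index longer than x selects past the end, giving NA
x[c(TRUE, TRUE, TRUE, TRUE, TRUE)]  # 1.1 2.2 3.3 4.4 NA
```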
+
+**Missing values (NA)**
+
 ```{r missing}
-# missing value in index will also return NA in output
+# Missing values in index will also return NA in output
 x[c(NA, TRUE)]
 ```
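The type of the `NA` matters. A bare `NA` is logical, so it is recycled like any logical index. An integer `NA` selects a single missing position. A small sketch:

```r
x <- c(1.1, 2.2, 3.3, 4.4)

# Logical NA is recycled to length(x): four missing values
x[NA]           # NA NA NA NA

# Integer NA selects one missing position
x[NA_integer_]  # NA

# Mixed logical index: NA where the index is NA, the element where TRUE
x[c(NA, TRUE)]  # NA 2.2 NA 4.4
```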
 
-
 **Nothing**
 
 ```{r nothing}
@@ -79,71 +90,146 @@ x[0]
 (y <- setNames(x, letters[1:4]))
 
 y[c("d", "b", "a")]
+
+# Like integer indices, you can repeat indices
+y[c("a", "a", "a")]
+
+# When subsetting with [, names are always matched exactly
+z <- c(abc = 1, def = 2)
+z
+z[c("a", "d")]
 ```
 
 ### Lists
 
 - Subsetting works the same way
-- `[` always returns a list, `[[` and `$` let you pull elements out of a list
+- `[` always returns a list
+- `[[` and `$` let you pull elements out of a list
+
+```{r}
+my_list <- list(a = c(T, F), b = letters[5:15], c = 100:108)
+my_list
+```
+
+**Return a (named) list**
+
+```{r}
+l1 <- my_list[2]
+l1
+```
+
+**Return a vector**
+
+```{r}
+l2 <- my_list[[2]]
+l2
+l2b <- my_list$b
+l2b
+```
+
+**Return a specific element**
+
+```{r}
+l3 <- my_list[[2]][3]
+l3
+l4 <- my_list[['b']][3]
+l4
+l4b <- my_list$b[3]
+l4b
+```
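To make the list results above concrete, this sketch checks the class of each result instead of relying on the printed output. It uses the same `my_list`, with `TRUE`/`FALSE` spelled out:

```r
my_list <- list(a = c(TRUE, FALSE), b = letters[5:15], c = 100:108)

class(my_list[2])    # "list": [ keeps the container (a list holding b)
class(my_list[[2]])  # "character": [[ pulls the element out

identical(my_list[[2]], my_list$b)  # TRUE
my_list$b[3]                        # "g"
```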
+
+**Visual Representation**
+
+![](images/subsetting/hadley-tweet.png) 
+
+See this Stack Overflow question for more detail on the differences between `[` and `[[`: https://stackoverflow.com/questions/1169456/the-difference-between-bracket-and-double-bracket-for-accessing-the-el
 
 ### Matrices and arrays
 
 You can subset higher dimensional structures in three ways:
+
 - with multiple vectors
 - with a single vector
 - with a matrix
 
-```{r, eval=FALSE}
-a <- matrix(1:9, nrow = 3)
-colnames(a) <- c("A", "B", "C")
-a[1:2, ]
-#>      A B C
-#> [1,] 1 4 7
-#> [2,] 2 5 8
-a[c(TRUE, FALSE, TRUE), c("B", "A")]
-#>      B A
-#> [1,] 4 1
-#> [2,] 6 3
-a[0, -2]
-#>      A C
-      
+```{r}
+a <- matrix(1:12, nrow = 3)
+colnames(a) <- c("A", "B", "C", "D")
+
+# single row
 a[1, ]
-#> A B C 
-#> 1 4 7
 
+# single column
+a[, 1]
+
+# single element
 a[1, 1]
-#> A 
-#> 1     
+
+# two rows from two columns
+a[1:2, 3:4]
+
+a[c(TRUE, FALSE, TRUE), c("B", "A")]
+
+# zero index and negative index
+a[0, -2]
 ```
 
-Credit: [Advanced R](https://adv-r.hadley.nz/index.html) by Hadley Wickham
+**Subset a matrix with a matrix**
 
-Matrices and arrays are just special vectors; can subset with a single vector
-(arrays in R stored column wise)
+```{r}
+b <- matrix(1:4, nrow = 2)
+b
+a[b]
+```
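Here is how `a[b]` is read. A two-column matrix index is treated as one (row, column) pair per row. A minimal sketch, rebuilding `a` without the column names:

```r
a <- matrix(1:12, nrow = 3)
b <- matrix(1:4, nrow = 2)

# Each row of b is a (row, column) pair: (1, 3) and (2, 4)
b

# So a[b] picks a[1, 3] and a[2, 4]
a[b]                                   # 7 11
identical(a[b], c(a[1, 3], a[2, 4]))   # TRUE
```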
 
 ```{r}
 vals <- outer(1:5, 1:5, FUN = "paste", sep = ",")
 vals
 
-vals[c(3, 15)]
+select <- matrix(ncol = 2, byrow = TRUE, 
+                 c(1, 1,
+                   3, 1,
+                   2, 4))
+select
+
+vals[select]
+```
+
+Matrices and arrays are just vectors with a `dim` attribute, so you can also subset them with a single vector
+(R stores arrays in column-major order, so a single index counts down the columns)
+
+```{r}
+vals[c(3, 15, 16, 17)]
 ```
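The column-major rule can be written as linear index = (column - 1) * nrow + row. A sketch that checks this on `vals` and uses base R's `arrayInd()` to go back from linear indices to (row, column) pairs:

```r
vals <- outer(1:5, 1:5, FUN = "paste", sep = ",")

# Row 5, column 3 -> (3 - 1) * 5 + 5 = 15
idx <- (3 - 1) * nrow(vals) + 5
idx                               # 15
identical(vals[idx], vals[5, 3])  # TRUE, both "5,3"

# arrayInd() converts linear indices back to (row, column) pairs:
# rows (3, 1), (5, 3), (1, 4), (2, 4)
arrayInd(c(3, 15, 16, 17), dim(vals))
```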
 
 ### Data frames and tibbles
 
-Data frames act like lists and matrices
-- single index -> list
-- two indices -> matrix
+Data frames act like both lists and matrices
 
-```{r penguins}
+- When subsetting with a single index, they behave like lists and index the columns, so `df[1:2]` selects the first two columns.
+- When subsetting with two indices, they behave like matrices, so `df[1:3, ]` selects the first three rows (and all the columns).
+
+```{r penguins, error=TRUE}
 library(palmerpenguins)
+penguins <- penguins # copy the lazy-loaded dataset into the global environment
+
+# a single index selects columns (here the 2nd and 3rd)
+two_cols <- penguins[2:3] # or penguins[c(2,3)]
+head(two_cols)
 
-# single index
-penguins[1:2]
+# equivalent to the above code
+same_two_cols <- penguins[c("island", "bill_length_mm")]
+head(same_two_cols)
 
-penguins[c("species","island")]
+# two indices separated by comma (first two rows of 3rd and 4th columns)
+penguins[1:2, 3:4]
 
-# two indices
-penguins[1:2, ]
+# [[ extracts a single column, so a length-2 index errors on a tibble...
+penguins[[3:4]][1:4]
+# ...but a single column index works...
+penguins[[3]][1:4]
+# ...as does the equivalent $ form
+penguins$bill_length_mm[1:4]
 ```
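The penguins chunk needs palmerpenguins installed. Below is a base-R sketch of the same list-versus-matrix behaviour. It uses a small made-up data frame, and its columns `x`, `y`, `z` are hypothetical:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), z = c(TRUE, FALSE, TRUE))

# One index: list-style, selects columns and keeps all rows
names(df[2:3])  # "y" "z"
nrow(df[2:3])   # 3

# Two indices: matrix-style, rows first, then columns
df[1:2, c("x", "z")]

# [[ takes exactly one column, by position or by name
identical(df[[1]], df$x)  # TRUE
```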
 
 Subsetting a tibble with `[` always returns a tibble
@@ -153,72 +239,100 @@ Subsetting a tibble with `[` always returns a tibble
 - Data frames and tibbles behave differently
 - tibble will default to preserve dimensionality, data frames do not
 - this can lead to unexpected behavior and code breaking in the future
+- To preserve dimensionality, use `drop = FALSE` when subsetting a data frame, or use a tibble
 
-Can use `drop = FALSE` when using a data frame or can use tibbles
 
-## Selecting a single element
+```{r}
+tb <- tibble::tibble(a = 1:2, b = 1:2)
 
-`[[` and `$` are used to extract single elements
+# returns tibble
+str(tb[, "a"])
+tb[, "a"] # equivalent to tb[, "a", drop = FALSE]
 
-### `[[]]`
+# returns integer vector
+# str(tb[, "a", drop = TRUE])
+tb[, "a", drop = TRUE]
+```
 
-```{r train}
-x <- list(1:3, "a", 4:6)
+```{r}
+df <- data.frame(a = 1:2, b = 1:2)
+
+# returns integer vector
+# str(df[, "a"])
+df[, "a"]
+
+# returns data frame with one column
+# str(df[, "a", drop = FALSE])
+df[, "a", drop = FALSE]
+```
+
+**Factors**
+
+For factors, `[` also has a `drop` argument, but it controls whether unused levels (rather than dimensions) are preserved.
+
+```{r}
+z <- factor(c("a", "b", "c"))
+z[1]
+z[1, drop = TRUE]
 ```
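Printing a factor shows its levels, but checking `levels()` directly makes the difference explicit. This sketch also uses base R's `droplevels()`, which is a separate way to drop unused levels after subsetting:

```r
z <- factor(c("a", "b", "c"))

levels(z[1])               # "a" "b" "c": unused levels are kept
levels(z[1, drop = TRUE])  # "a": unused levels are dropped

# droplevels() has the same effect after an ordinary subset
levels(droplevels(z[1]))   # "a"
```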
 
-![](images/subsetting/train-1.png)
+## Selecting a single element
+
+`[[` and `$` are used to extract single elements (note: that single element can itself be a vector, such as one element of a list)
 
-![](images/subsetting/train-2.png)
+### `[[]]`
 
-![](images/subsetting/train-3.png)
+Because `[[]]` can return only a single item, you must use it with either a single positive integer or a single string. 
 
-Credit: [Advanced R](https://adv-r.hadley.nz/index.html) by Hadley Wickham
+```{r train}
+x <- list(1:3, "a", 4:6)
+x[[1]]
+```
 
-![](images/subsetting/hadley-tweet.png)
+Hadley Wickham recommends using `[[]]` with atomic vectors whenever you want to extract a single value, because it reinforces the expectation that you are getting or setting an individual value.
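A small sketch of that advice on a named atomic vector (the names `a`, `b`, `c` are just for illustration). `[` returns a length-1 vector that keeps its name, while `[[` returns the bare value and refuses to return more than one:

```r
x <- c(a = 1, b = 2, c = 3)

x["a"]    # a named numeric vector of length 1
x[["a"]]  # the bare value 1, name dropped

# [[ on an atomic vector cannot select more than one element
try(x[[c("a", "b")]])  # error
```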
 
 ### `$`
 
 - `x$y` is equivalent to `x[["y"]]`
 
-the `$` operator doens't work with stored vals
+The `$` operator doesn't work with a column name stored in a variable
 
-```{r, eval=FALSE}
+```{r}
 var <- "cyl"
+
 # Doesn't work - mtcars$var translated to mtcars[["var"]]
 mtcars$var
-#> NULL
 
 # Instead use [[
 mtcars[[var]]
-#>  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
 ```
 
 `$` allows partial matching, `[[]]` does not
 
-```{r, eval=FALSE}
+```{r}
 x <- list(abc = 1)
 x$a
-#> [1] 1
+
 x[["a"]]
-#> NULL
+
 ```
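`[[` can also be told to match partially through its `exact` argument (part of base R's `Extract` documentation). A sketch using the same list:

```r
x <- list(abc = 1)

x[["a"]]                 # NULL: [[ matches names exactly by default
x[["a", exact = FALSE]]  # 1: partial matching, like $
x[["a", exact = NA]]     # 1, with a warning about the partial match
```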
 
 Hadley advises setting a global option so that partial matching with `$` gives a warning:
 
-```{r, eval=FALSE}
+```{r}
 options(warnPartialMatchDollar = TRUE)
 x$a
-#> Warning in x$a: partial match of 'a' to 'abc'
-#> [1] 1
 ```
 
 tibbles don't have this behavior
+
 ```{r}
 penguins$s
 ```
 
 ### missing and out of bound indices
+
 - Due to the inconsistency of how R handles such indices, `purrr::pluck()` and `purrr::chuck()` are recommended
+
 ```{r, eval=FALSE}
 x <- list(
   a = list(1, 2, 3),
@@ -275,6 +389,7 @@ Subsetting with nothing can preserve structure of original object
 Applications copied from cohort 2 slide
 
 ### Lookup tables (character subsetting)
+
 ```{r, eval=FALSE}
 x <- c("m", "f", "u", "f", "f", "m", "m")
 lookup <- c(m = "Male", f = "Female", u = NA)
@@ -284,7 +399,9 @@ lookup[x]
 ```
 
 ### Matching and merging by hand (integer subsetting)
+
 - The `match()` function allows merging a vector with a table
+
 ```{r, eval=FALSE}
 grades <- c("D", "A", "C", "B", "F")
 info <- data.frame(
@@ -304,8 +421,8 @@ info[id, ]
 # 5     F      Poor  TRUE
 ```
 
-
 ### Random samples and bootstrapping (integer subsetting)
+
 ```{r, eval=FALSE}
 # mtcars[sample(nrow(mtcars), 3), ] # use replace = TRUE to replace
 #                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
@@ -314,8 +431,8 @@ info[id, ]
 # Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
 ```
 
-
 ### Ordering (integer subsetting)
+
 ```{r, eval=FALSE}
 # mtcars[order(mtcars$mpg), ]
 #                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
@@ -328,9 +445,10 @@ info[id, ]
 # ...
 ```
 
-
 ### Expanding aggregated counts (integer subsetting)
+
 - We can expand a count column by using `rep()`
+
 ```{r, eval=FALSE}
 df <- tibble::tibble(x = c("Amy", "Julie", "Brian"), n = c(2, 1, 3))
 df[rep(1:nrow(df), df$n), ]
@@ -345,10 +463,10 @@ df[rep(1:nrow(df), df$n), ]
 # 6 Brian     3
 ```
 
-
-
 ###  Removing columns from data frames (character)
+
 - We can remove a column by subsetting, which does not change the object
+
 ```{r, eval=FALSE}
 df[, 1]
 # A tibble: 3 x 1
@@ -358,7 +476,9 @@ df[, 1]
 # 2 Julie
 # 3 Brian
 ```
+
 - We can also delete the column using `NULL`
+
 ```{r, eval=FALSE}
 df$n <- NULL
 df
@@ -370,8 +490,6 @@ df
 # 3 Brian
 ```
 
-
-
 ### Selecting rows based on a condition (logical subsetting)
 
 ```{r, eval=FALSE}
@@ -384,9 +502,8 @@ df
 # Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
 ```
 
-
-
 ### Boolean algebra versus sets (logical and integer)
+
 - `which()` gives the indices of a Boolean vector
 
 ```{r, eval=FALSE}
@@ -402,7 +519,6 @@ x1 & y1
 # [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
 ```
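This sketch makes the "Boolean algebra versus sets" correspondence explicit, assuming `x1` and `y1` are the even and multiple-of-5 logical vectors that produce the output above. Each logical operator on the vectors matches a set operation on the positions returned by `which()`:

```r
x1 <- 1:10 %% 2 == 0  # TRUE for even numbers
y1 <- 1:10 %% 5 == 0  # TRUE for multiples of 5

# AND <-> intersect
which(x1 & y1)                   # 10
intersect(which(x1), which(y1))  # 10

# OR <-> union (sorted, since union keeps first-seen order)
which(x1 | y1)                         # 2 4 5 6 8 10
sort(union(which(x1), which(y1)))      # 2 4 5 6 8 10

# AND NOT <-> setdiff
which(x1 & !y1)                  # 2 4 6 8
setdiff(which(x1), which(y1))    # 2 4 6 8
```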
 
-
 ## Meeting Videos
 
 ### Cohort 1
diff --git a/bookclub-advr.Rproj b/bookclub-advr.Rproj
@@ -1,18 +1,18 @@
-Version: 1.0
-
-RestoreWorkspace: Default
-SaveWorkspace: Default
-AlwaysSaveHistory: Default
-
-EnableCodeIndexing: Yes
-UseSpacesForTab: Yes
-NumSpacesForTab: 2
-Encoding: UTF-8
-
-RnwWeave: Sweave
-LaTeX: pdfLaTeX
-
-AutoAppendNewline: Yes
-StripTrailingWhitespace: Yes
-
-BuildType: Website
+Version: 1.0
+
+RestoreWorkspace: Default
+SaveWorkspace: Default
+AlwaysSaveHistory: Default
+
+EnableCodeIndexing: Yes
+UseSpacesForTab: Yes
+NumSpacesForTab: 2
+Encoding: UTF-8
+
+RnwWeave: Sweave
+LaTeX: pdfLaTeX
+
+AutoAppendNewline: Yes
+StripTrailingWhitespace: Yes
+
+BuildType: Website
diff --git a/bookclub-advr_cache/html/__packages b/bookclub-advr_cache/html/__packages
@@ -0,0 +1,24 @@
+DiagrammeR
+lobstr
+palmerpenguins
+ggplot2
+tidyverse
+tibble
+tidyr
+readr
+purrr
+dplyr
+stringr
+forcats
+lubridate
+rlang
+scales
+memoise
+ids
+R6
+deSolve
+reshape2
+patchwork
+profvis
+bench
+ggbeeswarm
diff --git a/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.RData b/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.RData
Binary files differ.
diff --git a/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdb b/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdb
Binary files differ.
diff --git a/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdx b/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdx
Binary files differ.
diff --git a/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.RData b/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.RData
Binary files differ.
diff --git a/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdb b/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdb
Binary files differ.
diff --git a/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdx b/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdx
Binary files differ.

	git clone https://git.eamoncaddigan.net/bookclub-advr.git