bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git

commit 81a346feb22d44d3471856df5fe14f024447c6e9
parent 543d8954550390f0184229d51c4e6383dc2262bb
Author: Betsy Rosalen <betsy@mylittleuniverse.com>
Date:   Mon,  4 Mar 2024 10:24:20 -0500

Cohort 8 chapter 4 (#58)

* made adding dimensions in chapter 3 a little clearer

* Updated notes for Chapter 4 Subsetting

* Update caching.

I'm not 100% certain that we WANT this, but I THINK it's a good idea.

---------

Co-authored-by: Jon Harmon <jonthegeek@gmail.com>
Diffstat:
M 03_Vectors.Rmd | 8 ++++++--
M 04_Subsetting.Rmd | 262 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------
M bookclub-advr.Rproj | 36 ++++++++++++++++++------------------
A bookclub-advr_cache/html/__packages | 24 ++++++++++++++++++++++++
A bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.RData | 0
A bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdb | 0
A bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdx | 0
A bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.RData | 0
A bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdb | 0
A bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdx | 0
10 files changed, 237 insertions(+), 93 deletions(-)

diff --git a/03_Vectors.Rmd b/03_Vectors.Rmd @@ -363,13 +363,17 @@ x <- matrix(1:6, nrow = 2, ncol = 3) x # One vector argument to describe all dimensions -y <- array(1:12, c(2, 3, 2)) +y <- array(1:24, c(2, 3, 4)) # rows, columns, no of arrays y # You can also modify an object in place by setting dim() z <- 1:6 -dim(z) <- c(3, 2) +dim(z) <- c(2, 3) # rows, columns z + +a <- 1:24 +dim(a) <- c(2, 3, 4) # rows, columns, no of arrays +a ``` ##### Functions for working with vectors, matrices and arrays: diff --git a/04_Subsetting.Rmd b/04_Subsetting.Rmd @@ -5,6 +5,7 @@ - Learn about the 6 ways to subset atomic vectors - Learn about the 3 subsetting operators: `[[`, `[`, and `$` - Learn how subsetting works with different vector types +- Learn how subsetting can be combined with assignment ## Selecting multiple elements @@ -15,30 +16,31 @@ Let's take a look with an example vector. ```{r atomic_vector} -x <- c(3.1, 2.2, 1.3, 4.4) +x <- c(1.1, 2.2, 3.3, 4.4) ``` -**Positive integers** +**Positive integer indices** ```{r positive_int} -# return elements at specified positions +# return elements at specified positions which can be out of order x[c(4, 1)] # duplicate indices return duplicate values x[c(2, 2)] # real numbers truncate to integers +# so this behaves as if it is x[c(3, 3)] x[c(3.2, 3.8)] ``` -**Negative integers** +**Negative integer indices** -```{r, eval=FALSE} +```{r, error=TRUE} ### excludes elements at specified positions -# x[-c(1, 3)] # same as x[c(-1, -3)] +x[-c(1, 3)] # same as x[c(-1, -3)] or x[c(2, 4)] ### mixing positive and negative is a no-no -# x[c(-1, 3)] +x[c(-1, 3)] ``` **Logical Vectors** @@ -47,17 +49,26 @@ x[c(3.2, 3.8)] x[c(TRUE, TRUE, FALSE, TRUE)] x[x < 3] + +cond <- x > 2.5 +x[cond] ``` -- **Recyling rules** apply when subsetting this way: x[y] +- **Recyling rules** applies when the two vectors are of different lengths +- the shorter of the two is recycled to the length of the longer - Easy to understand if x or y is 1, best to avoid 
other lengths +```{r} +x[c(F, T)] # equivalent to: x[c(FALSE, TRUE, FALSE, TRUE)] +``` + +**Missing values (NA)** + ```{r missing} -# missing value in index will also return NA in output +# Missing values in index will also return NA in output x[c(NA, TRUE)] ``` - **Nothing** ```{r nothing} @@ -79,71 +90,146 @@ x[0] (y <- setNames(x, letters[1:4])) y[c("d", "b", "a")] + +# Like integer indices, you can repeat indices +y[c("a", "a", "a")] + +# When subsetting with [, names are always matched exactly +z <- c(abc = 1, def = 2) +z +z[c("a", "d")] ``` ### Lists - Subsetting works the same way -- `[` always returns a list, `[[` and `$` let you pull elements out of a list +- `[` always returns a list +- `[[` and `$` let you pull elements out of a list + +```{r} +my_list <- list(a = c(T, F), b = letters[5:15], c = 100:108) +my_list +``` + +**Return a (named) list** + +```{r} +l1 <- my_list[2] +l1 +``` + +**Return a vector** + +```{r} +l2 <- my_list[[2]] +l2 +l2b <- my_list$b +l2b +``` + +**Return a specific element** + +```{r} +l3 <- my_list[[2]][3] +l3 +l4 <- my_list[['b']][3] +l4 +l4b <- my_list$b[3] +l4b +``` + +**Visual Representation** + +![](images/subsetting/hadley-tweet.png) + +See this stackoverflow article for more detailed information about the differences: https://stackoverflow.com/questions/1169456/the-difference-between-bracket-and-double-bracket-for-accessing-the-el ### Matrices and arrays You can subset higher dimensional structures in three ways: + - with multiple vectors - with a single vector - with a matrix -```{r, eval=FALSE} -a <- matrix(1:9, nrow = 3) -colnames(a) <- c("A", "B", "C") -a[1:2, ] -#> A B C -#> [1,] 1 4 7 -#> [2,] 2 5 8 -a[c(TRUE, FALSE, TRUE), c("B", "A")] -#> B A -#> [1,] 4 1 -#> [2,] 6 3 -a[0, -2] -#> A C - +```{r} +a <- matrix(1:12, nrow = 3) +colnames(a) <- c("A", "B", "C", "D") + +# single row a[1, ] -#> A B C -#> 1 4 7 +# single column +a[, 1] + +# single element a[1, 1] -#> A -#> 1 + +# two rows from two columns +a[1:2, 3:4] + 
+a[c(TRUE, FALSE, TRUE), c("B", "A")] + +# zero index and negative index +a[0, -2] ``` -Credit: [Advanced R](https://adv-r.hadley.nz/index.html) by Hadley Wickham +**Subset a matrix with a matrix** -Matrices and arrays are just special vectors; can subset with a single vector -(arrays in R stored column wise) +```{r} +b <- matrix(1:4, nrow = 2) +b +a[b] +``` ```{r} vals <- outer(1:5, 1:5, FUN = "paste", sep = ",") vals -vals[c(3, 15)] +select <- matrix(ncol = 2, byrow = TRUE, + c(1, 1, + 3, 1, + 2, 4)) +select + +vals[select] +``` + +Matrices and arrays are just special vectors; can subset with a single vector +(arrays in R stored column wise) + +```{r} +vals[c(3, 15, 16, 17)] ``` ### Data frames and tibbles -Data frames act like lists and matrices -- single index -> list -- two indices -> matrix +Data frames act like both lists and matrices -```{r penguins} +- When subsetting with a single index, they behave like lists and index the columns, so `df[1:2]` selects the first two columns. +- When subsetting with two indices, they behave like matrices, so `df[1:3, ]` selects the first three rows (and all the columns). + +```{r penguins, error=TRUE} library(palmerpenguins) +penguins <- penguins + +# single index selects first two columns +two_cols <- penguins[2:3] # or penguins[c(2,3)] +head(two_cols) -# single index -penguins[1:2] +# equivalent to the above code +same_two_cols <- penguins[c("island", "bill_length_mm")] +head(same_two_cols) -penguins[c("species","island")] +# two indices separated by comma (first two rows of 3rd and 4th columns) +penguins[1:2, 3:4] -# two indices -penguins[1:2, ] +# Can't do this... +penguins[[3:4]][c(1:4)] +# ...but this works... +penguins[[3]][c(1:4)] +# ...or this equivalent... 
+penguins$bill_length_mm[1:4] ``` Subsetting a tibble with `[` always returns a tibble @@ -153,72 +239,100 @@ Subsetting a tibble with `[` always returns a tibble - Data frames and tibbles behave differently - tibble will default to preserve dimensionality, data frames do not - this can lead to unexpected behavior and code breaking in the future +- Use `drop = FALSE` to preserve dimensionality when subsetting a data frame or use tibbles -Can use `drop = FALSE` when using a data frame or can use tibbles -## Selecting a single element +```{r} +tb <- tibble::tibble(a = 1:2, b = 1:2) -`[[` and `$` are used to extract single elements +# returns tibble +str(tb[, "a"]) +tb[, "a"] # equivalent to tb[, "a", drop = FALSE] -### `[[]]` +# returns integer vector +# str(tb[, "a", drop = TRUE]) +tb[, "a", drop = TRUE] +``` -```{r train} -x <- list(1:3, "a", 4:6) +```{r} +df <- data.frame(a = 1:2, b = 1:2) + +# returns integer vector +# str(df[, "a"]) +df[, "a"] + +# returns data frame with one column +# str(df[, "a", drop = FALSE]) +df[, "a", drop = FALSE] +``` +**Factors** + +Factor subsetting drop argument controls whether or not levels (rather than dimensions) are preserved. + +```{r} +z <- factor(c("a", "b", "c")) +z[1] +z[1, drop = TRUE] ``` -![](images/subsetting/train-1.png) +## Selecting a single element + +`[[` and `$` are used to extract single elements (note: a vector can be a single element) -![](images/subsetting/train-2.png) +### `[[]]` -![](images/subsetting/train-3.png) +Because `[[]]` can return only a single item, you must use it with either a single positive integer or a single string. -Credit: [Advanced R](https://adv-r.hadley.nz/index.html) by Hadley Wickham +```{r train} +x <- list(1:3, "a", 4:6) +x[[1]] +``` -![](images/subsetting/hadley-tweet.png) +Hadley Wickham recommends using `[[]]` with atomic vectors whenever you want to extract a single value to reinforce the expectation that you are getting and setting individual values. 
### `$` - `x$y` is equivalent to `x[["y"]]` -the `$` operator doens't work with stored vals +the `$` operator doesn't work with stored vals -```{r, eval=FALSE} +```{r} var <- "cyl" + # Doesn't work - mtcars$var translated to mtcars[["var"]] mtcars$var -#> NULL # Instead use [[ mtcars[[var]] -#> [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4 ``` `$` allows partial matching, `[[]]` does not -```{r, eval=FALSE} +```{r} x <- list(abc = 1) x$a -#> [1] 1 + x[["a"]] -#> NULL + ``` Hadley advises to change Global settings: -```{r, eval=FALSE} +```{r} options(warnPartialMatchDollar = TRUE) x$a -#> Warning in x$a: partial match of 'a' to 'abc' -#> [1] 1 ``` tibbles don't have this behavior + ```{r} penguins$s ``` ### missing and out of bound indices + - Due to the inconsistency of how R handles such indices, `purrr::pluck()` and `purrr::chuck()` are recommended + ```{r, eval=FALSE} x <- list( a = list(1, 2, 3), @@ -275,6 +389,7 @@ Subsetting with nothing can preserve structure of original object Applications copied from cohort 2 slide ### Lookup tables (character subsetting) + ```{r, eval=FALSE} x <- c("m", "f", "u", "f", "f", "m", "m") lookup <- c(m = "Male", f = "Female", u = NA) @@ -284,7 +399,9 @@ lookup[x] ``` ### Matching and merging by hand (integer subsetting) + - The `match()` function allows merging a vector with a table + ```{r, eval=FALSE} grades <- c("D", "A", "C", "B", "F") info <- data.frame( @@ -304,8 +421,8 @@ info[id, ] # 5 F Poor TRUE ``` - ### Random samples and bootstrapping (integer subsetting) + ```{r, eval=FALSE} # mtcars[sample(nrow(mtcars), 3), ] # use replace = TRUE to replace # mpg cyl disp hp drat wt qsec vs am gear carb @@ -314,8 +431,8 @@ info[id, ] # Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 ``` - ### Ordering (integer subsetting) + ```{r, eval=FALSE} # mtcars[order(mtcars$mpg), ] # mpg cyl disp hp drat wt qsec vs am gear carb @@ -328,9 +445,10 @@ info[id, ] # ... 
``` - ### Expanding aggregated counts (integer subsetting) + - We can expand a count column by using `rep()` + ```{r, eval=FALSE} df <- tibble::tibble(x = c("Amy", "Julie", "Brian"), n = c(2, 1, 3)) df[rep(1:nrow(df), df$n), ] @@ -345,10 +463,10 @@ df[rep(1:nrow(df), df$n), ] # 6 Brian 3 ``` - - ### Removing columns from data frames (character) + - We can remove a column by subsetting, which does not change the object + ```{r, eval=FALSE} df[, 1] # A tibble: 3 x 1 @@ -358,7 +476,9 @@ df[, 1] # 2 Julie # 3 Brian ``` + - We can also delete the column using `NULL` + ```{r, eval=FALSE} df$n <- NULL df @@ -370,8 +490,6 @@ df # 3 Brian ``` - - ### Selecting rows based on a condition (logical subsetting) ```{r, eval=FALSE} @@ -384,9 +502,8 @@ df # Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8 ``` - - ### Boolean algebra versus sets (logical and integer) + - `which()` gives the indices of a Boolean vector ```{r, eval=FALSE} @@ -402,7 +519,6 @@ x1 & y1 # [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE ``` - ## Meeting Videos ### Cohort 1 diff --git a/bookclub-advr.Rproj b/bookclub-advr.Rproj @@ -1,18 +1,18 @@ -Version: 1.0 - -RestoreWorkspace: Default -SaveWorkspace: Default -AlwaysSaveHistory: Default - -EnableCodeIndexing: Yes -UseSpacesForTab: Yes -NumSpacesForTab: 2 -Encoding: UTF-8 - -RnwWeave: Sweave -LaTeX: pdfLaTeX - -AutoAppendNewline: Yes -StripTrailingWhitespace: Yes - -BuildType: Website +Version: 1.0 + +RestoreWorkspace: Default +SaveWorkspace: Default +AlwaysSaveHistory: Default + +EnableCodeIndexing: Yes +UseSpacesForTab: Yes +NumSpacesForTab: 2 +Encoding: UTF-8 + +RnwWeave: Sweave +LaTeX: pdfLaTeX + +AutoAppendNewline: Yes +StripTrailingWhitespace: Yes + +BuildType: Website diff --git a/bookclub-advr_cache/html/__packages b/bookclub-advr_cache/html/__packages @@ -0,0 +1,24 @@ +DiagrammeR +lobstr +palmerpenguins +ggplot2 +tidyverse +tibble +tidyr +readr +purrr +dplyr +stringr +forcats +lubridate +rlang +scales +memoise +ids +R6 
+deSolve
+reshape2
+patchwork
+profvis
+bench
+ggbeeswarm
diff --git a/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.RData b/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.RData
Binary files differ.
diff --git a/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdb b/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdb
Binary files differ.
diff --git a/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdx b/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdx
Binary files differ.
diff --git a/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.RData b/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.RData
Binary files differ.
diff --git a/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdb b/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdb
Binary files differ.
diff --git a/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdx b/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdx
Binary files differ.