bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git

commit e2dff30303ebdd2389ee59d3f57a7934f4745f35
parent cc279c49c1522df914e326a6e22ad282ed84ffac
Author: Arthur Shaw <47256431+arthur-shaw@users.noreply.github.com>
Date:   Wed,  8 Jun 2022 11:01:10 -0400

Draft notes on chapter 3. (#8)

* Draft notes on chapter 3.

* Update README and GHA to latest standards.

* Move knitr_opts to index.Rmd.

* Ignore transient html files during build.

This prevents weird things from getting checked in if a build fails.

Co-authored-by: Jon Harmon <jonthegeek@gmail.com>
Diffstat:
M .github/workflows/deploy_bookdown.yml | 2 ++
M .github/workflows/pr_check.yml | 5 +++--
A .github/workflows/pr_check_readme.yml | 14 ++++++++++++++
M .gitignore | 1 +
M 03_Vectors.Rmd | 736 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
M README.md | 31 +++++++++++++++++++++----------
A images/vectors/summary-tree-atomic.png | 0
A images/vectors/summary-tree-s3-1.png | 0
A images/vectors/summary-tree-s3-2.png | 0
A images/vectors/summary-tree.png | 0
M index.Rmd | 8 ++++++++
11 files changed, 781 insertions(+), 16 deletions(-)

diff --git a/.github/workflows/deploy_bookdown.yml b/.github/workflows/deploy_bookdown.yml
@@ -1,6 +1,8 @@
 on:
   push:
     branches: main
+    paths-ignore:
+      - 'README.md'
   workflow_dispatch:

 name: renderbook
diff --git a/.github/workflows/pr_check.yml b/.github/workflows/pr_check.yml
@@ -1,10 +1,11 @@
+name: pr_check
 on:
   pull_request:
     branches: main
+    paths-ignore:
+      - 'README.md'
   workflow_dispatch:

-name: pr_check
-
 jobs:
   bookdown:
     name: pr_check_book
diff --git a/.github/workflows/pr_check_readme.yml b/.github/workflows/pr_check_readme.yml
@@ -0,0 +1,14 @@
+name: pr_check
+on:
+  pull_request:
+    branches: main
+    paths:
+      - 'README.md'
+  workflow_dispatch:
+
+jobs:
+  bookdown:
+    name: pr_check_book
+    runs-on: ubuntu-latest
+    steps:
+      - run: 'echo "No build required" '
diff --git a/.gitignore b/.gitignore
@@ -10,3 +10,4 @@ bookclub-advr.html
 bookclub-advr.knit.md
 bookclub-advr_files
 libs
+*.html
diff --git a/03_Vectors.Rmd b/03_Vectors.Rmd
@@ -2,12 +2,740 @@

 **Learning objectives:**

-- THESE ARE NICE TO HAVE BUT NOT ABSOLUTELY NECESSARY
+- Learn about different types of vectors
+- Learn how these types relate to one another

-## SLIDE 1
+## Types of vectors
+
+The family tree of vectors:
+
+![](images/vectors/summary-tree.png)
+Credit: [Advanced R](https://adv-r.hadley.nz/index.html) by Hadley Wickham
+
+- **Atomic.** All elements have the same type.
+- **List.** Elements can have different types.
+- **NULL.** Closely related to vectors; always length zero.
+
+## Atomic vectors
+
+### Types
+
+- The vector family tree revisited.
+- Meet the children of atomic vectors
+
+![](images/vectors/summary-tree-atomic.png)
+Credit: [Advanced R](https://adv-r.hadley.nz/index.html) by Hadley Wickham
+
+### Length one
+
+"Scalars" that consist of a single value.
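R has no separate scalar type: what these notes call a "scalar" is simply an atomic vector of length one. A minimal base-R sketch (the variable name `x` is arbitrary) showing that a single value and a one-element `c()` are the same object, and that assigning past the end simply grows the vector:

```r
# A "scalar" is an atomic vector of length one
x <- 5
length(x)           # 1
typeof(x)           # "double"
identical(x, c(5))  # TRUE: wrapping one value in c() changes nothing

# Assigning past the end turns it into a longer vector
x[2] <- 6
length(x)           # 2
```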
+
+```{r vec_lgl}
+# Logicals
+lgl1 <- TRUE
+lgl2 <- T
+```
+
+```{r vec_dbl}
+# Doubles
+# integer, decimal, scientific, or hexadecimal format
+dbl1 <- 1
+dbl2 <- 1.234
+dbl3 <- 1.234e0
+dbl4 <- 0xcafe
+```
+
+```{r vec_int}
+# Integers
+# Note: L denotes an integer; it only works for whole numbers (1.234L warns and gives a double)
+int1 <- 1L
+int2 <- 1234L
+int3 <- 1e3L
+int4 <- 0xcafeL
+```
+
+```{r vec_str}
+# Strings
+str1 <- "hello" # double quotes
+str2 <- 'hello' # single quotes
+str3 <- "مرحبًا" # Unicode
+str4 <- "\U0001f605" # sweaty_smile
+```
+
+### Longer
+
+Several ways to make longer vectors:
+
+**1. With single values**
+
+```{r long_single}
+lgl_vec <- c(TRUE, FALSE)
+
+```
+
+
+**2. With other vectors**
+
+```{r long_vec}
+c(c(1, 2), c(3, 4))
+```
+
+**See also**
+
+`{rlang}` has [vector constructor functions too](https://rlang.r-lib.org/reference/vector-construction.html):
+
+- `rlang::lgl(...)`
+- `rlang::int(...)`
+- `rlang::dbl(...)`
+- `rlang::chr(...)`
+
+They appear to do both more and less than `c()`:
+
+- More:
+  - Enforce type
+  - Splice lists
+  - More types: `rlang::bytes()`, `rlang::cpl(...)`
+- Less:
+  - Stricter rules on names
+
+Note: these constructors currently have a `questioning` lifecycle badge, since they may be moved to `{vctrs}`.
+
+### Missing values
+
+**Contagion**
+
+Most computations that involve a missing value return a missing value, unless you handle it explicitly (for example, with `na.rm = TRUE`).
+
+```{r na_contagion}
+# contagion
+5*NA
+sum(c(1, 2, NA, 3))
+
+# inoculate
+sum(c(1, 2, NA, 3), na.rm = TRUE)
+
+```
+**Types**
+
+Each type has its own missing-value constant:
+
+- Logical: `NA`
+- Integer: `NA_integer_`
+- Double: `NA_real_`
+- Character: `NA_character_`
+
+This rarely matters, because `NA` is coerced to the right type when needed.
+
+But it does matter for functions that are strict about types, like `dplyr::if_else()`.
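The typed constants can be inspected directly, and their effect shows up as soon as a function refuses to coerce. The sketch below sticks to base R: `vapply()` stands in for a type-strict function such as `dplyr::if_else()`, because it rejects a logical `NA` where a character value is required. The helper `first_or_na()` is hypothetical, written only for this illustration.

```r
# Each missing-value constant has its own type
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"

# c() is lenient: the logical NA is coerced to character
typeof(c("a", NA))     # "character"

# Hypothetical helper: first element, or the supplied NA when empty
first_or_na <- function(x, na) if (length(x) > 0) x[[1]] else na

inputs <- list(c("a", "b"), character(0))

# vapply() is strict: a logical NA where a string is expected is an error
res_bad <- tryCatch(
  vapply(inputs, first_or_na, FUN.VALUE = character(1), na = NA),
  error = function(e) "error: wrong type"
)
res_bad

# Supplying the typed NA_character_ satisfies the type check
res_ok <- vapply(inputs, first_or_na, FUN.VALUE = character(1), na = NA_character_)
res_ok                 # "a" NA
```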
+
+### Testing
+
+**What type of vector is it? Ask `is.*()`**
+
+Test data type:
+
+- Logical: `is.logical()`
+- Integer: `is.integer()`
+- Double: `is.double()`
+- Character: `is.character()`
+
+**What type of object is it?**
+
+Avoid these base functions for testing object type; they don't do what their names suggest:
+
+- `is.vector()`
+- `is.atomic()`
+- `is.numeric()`
+
+Instead, consider the `{rlang}` predicates:
+
+- `rlang::is_vector()`
+- `rlang::is_atomic()`
+
+```{r test_rlang}
+# vector
+rlang::is_vector(c(1, 2))
+rlang::is_vector(list(1, 2))
+
+# atomic
+rlang::is_atomic(c(1, 2))
+rlang::is_atomic(list(1, "a"))
+
+```
+
+
+See more in the [rlang type predicates reference](https://rlang.r-lib.org/reference/type-predicates.html).
+
+### Coercion
+
+When types are combined, R coerces them in a fixed order, from most to least flexible: character → double → integer → logical
+
+Coercion can happen automatically or explicitly.
+
+**Automatic**
+
+Two contexts for automatic coercion:
+
+1. Combination
+1. Mathematical
+
+Combination:
+
+```{r coerce_c}
+str(c(TRUE, "TRUE"))
+```
+
+Mathematical operations:
+
+```{r coerce_math}
+# imagine a logical vector about whether an attribute is present
+has_attribute <- c(TRUE, FALSE, TRUE, TRUE)
+
+# number with attribute
+sum(has_attribute)
+```
+
+**Explicit**
+
+Use `as.*()`
+
+- Logical: `as.logical()`
+- Integer: `as.integer()`
+- Double: `as.double()`
+- Character: `as.character()`
+
+Note that a failed coercion shows up in one or both of two ways:
+
+- A warning (or an error)
+- NAs
+
+```{r coerce_error}
+as.integer(c(1, 2, "three"))
+```
+
+## Attributes
+
+- What
+- How
+- Why
+
+### What
+
+Two perspectives:
+
+- Name-value pairs
+- Metadata
+
+**Name-value pairs**
+
+Formally, attributes have a name and a value.
+
+**Metadata**
+
+- Not data itself
+- But data about the data
+
+### How
+
+Two operations:
+
+1. Get
+1. Set
+
+Two cases:
+
+1. Single attribute
+2. Multiple attributes
+
+**Single attribute**
+
+Use `attr()`
+
+```{r attr_single}
+# some object
+a <- c(1, 2, 3)
+
+# set attribute
+attr(x = a, which = "some_attribute_name") <- "some attribute"
+
+# get attribute
+attr(x = a, which = "some_attribute_name")
+```
+**Multiple attributes**
+
+To set multiple attributes, use `structure()`
+To get multiple attributes, use `attributes()`
+
+```{r attr_multiple}
+b <- c(4, 5, 6)
+
+# set
+b <- structure(
+  .Data = b,
+  attrib1 = "one",
+  attrib2 = "two"
+)
+
+# get
+str(attributes(b))
+```
+
+### Why
+
+Two common use cases:
+
+- Names
+- Dimensions
+
+**Names**
+
+~~Three~~ Four ways to name:
+
+```{r}
+# 1. At creation
+one <- c(one = 1, two = 2, three = 3)
+
+# 2. By assigning a character vector of names
+two <- c(1, 2, 3)
+names(two) <- c("one", "two", "three")
+
+# 3. By setting names--with base R
+three <- c(1, 2, 3)
+stats::setNames(
+  object = three,
+  nm = c("One", "Two", "Three")
+)
+
+# 4. By setting names--with {rlang}
+rlang::set_names(
+  x = three,
+  nm = c("One", "Two", "Three")
+)
+```
+
+Thematically but not directly related: labelled class vectors with `haven::labelled()`
+
+**Dimensions**
+
+Important for arrays and matrices.
+
+```{r}
+# length 6 vector spread across 2 rows of 3 columns
+matrix(1:6, nrow = 2, ncol = 3)
+```
+
+## S3 atomic vectors
+
+- The vector family tree revisited.
+- Meet the children of typed atomic vectors
+
+![](images/vectors/summary-tree-s3-1.png)
+Credit: [Advanced R](https://adv-r.hadley.nz/index.html) by Hadley Wickham
+
+This list could be extended (more easily) with new vector types using [`{vctrs}`](https://vctrs.r-lib.org/). See the [rstudio::conf(2019) talk on the package around 18:27](https://www.rstudio.com/resources/rstudioconf-2019/vctrs-tools-for-making-size-and-type-consistent-functions/). See also the [rstudio::conf(2020) talk on new vector types for dealing with non-decimal currencies](https://www.rstudio.com/resources/rstudioconf-2020/vctrs-creating-custom-vector-classes-with-the-vctrs-package/).
+
+What makes S3 atomic vectors different from their parents?
+
+Two things:
+
+1. Class
+2. Attributes (typically)
+
+### Factors
+
+Factors are integer vectors with:
+
+- Class: "factor"
+- Attributes: "levels", or the set of allowed values
+
+```{r factor}
+# Build a factor
+a_factor <- factor(
+  # values
+  x = c(1, 2, 3),
+  # exhaustive list of values
+  levels = c(1, 2, 3, 4)
+)
+
+# Inspect
+a_factor
+
+# Dissect
+# - type
+typeof(a_factor)
+
+# - attributes
+attributes(a_factor)
+```
+
+Factors can be ordered. This can be useful for models or visualizations where order matters.
+
+```{r factor_ordered}
+# Build
+ordered_factor <- ordered(
+  # values
+  x = c(1, 2, 3),
+  # levels in ascending order
+  levels = c(4, 3, 2, 1)
+)
+
+# Inspect
+ordered_factor
+```
+
+### Dates
+
+Dates are:
+
+- Double vectors
+- With class "Date"
+
+The double component represents the number of days since `1970-01-01`.
+
+```{r dates}
+notes_date <- Sys.Date()
+
+# type
+typeof(notes_date)
+
+# attributes (the class)
+attributes(notes_date)
+```
+
+### Date-times
+
+There are two date-time representations in base R:
+
+- POSIXct, where "ct" denotes calendar time
+- POSIXlt, where "lt" designates local time.
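The difference between the two representations shows up as soon as you ask what each is built on. A minimal base-R sketch; the time string and the `"UTC"` zone are arbitrary choices for illustration:

```r
ct <- as.POSIXct("2022-06-08 11:01:10", tz = "UTC")
lt <- as.POSIXlt(ct)

# POSIXct: a double vector (seconds since 1970-01-01 UTC) plus a class
typeof(ct)        # "double"
class(ct)         # "POSIXct" "POSIXt"

# POSIXlt: a list of date-time components
typeof(lt)        # "list"
lt$hour           # 11
lt$year + 1900    # 2022 (years are counted from 1900)
```

Because POSIXct is a plain double underneath, it fits naturally in a data frame column, which is one reason the notes focus on it.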
+
+Let's focus on POSIXct because it is:
+
+- The simplest
+- Built on an atomic vector
+- The most likely to appear in a data frame
+
+Let's now build and deconstruct a date-time:
+
+```{r date_time}
+# Build
+note_date_time <- as.POSIXct(
+  # time
+  x = Sys.time(),
+  # time zone (an Olson name), used only for formatting
+  tz = "America/New_York"
+)
+
+# Inspect
+note_date_time
+
+# Dissect
+# - type
+typeof(note_date_time)
+# - attributes
+attributes(note_date_time)
+```
+
+
+### Durations
+
+Durations are:
+
+- Double vectors
+- Class: "difftime"
+- Attributes: "units", or the unit of duration (e.g., weeks, hours, minutes, seconds, etc.)
+
+```{r durations}
+# Construct
+one_minute <- as.difftime(1, units = "mins")
+
+# Inspect
+one_minute
+
+# Dissect
+# - type
+typeof(one_minute)
+# - attributes
+attributes(one_minute)
+```
+
+See also:
+
+- [`lubridate::make_difftime()`](https://lubridate.tidyverse.org/reference/make_difftime.html)
+- [`clock::date_time_build()`](https://clock.r-lib.org/reference/date_time_build.html)
+
+## Lists
+
+Sometimes called a generic vector, a list can be composed of elements of different types.
+
+### Constructing
+
+Simple lists:
+
+```{r list_simple}
+# Construct
+simple_list <- list(
+  # logicals
+  c(TRUE, FALSE),
+  # integers
+  1:20,
+  # doubles
+  c(1.2, 2.3, 3.4),
+  # characters
+  c("primo", "secundo", "tercio")
+)
+
+# Inspect
+# - type
+typeof(simple_list)
+# - structure
+str(simple_list)
+
+```
+Nested lists:
+
+```{r list_nested}
+nested_list <- list(
+  # first level
+  list(
+    # second level
+    list(
+      # third level
+      list(1)
+    )
+  )
+)
+
+str(nested_list)
+```
+
+This nesting resembles the structure of JSON.
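Because every level of a nested list is itself a list, you can walk down it one `[[` at a time, or pass `[[` a vector of positions to follow the same path in one call (recursive indexing), much like a path into a JSON document. A base-R sketch; `nested_list` is rebuilt here so the snippet runs on its own:

```r
nested_list <- list(list(list(list(1))))

# Step down one level at a time
nested_list[[1]][[1]][[1]][[1]]   # 1

# Same path in one call: [[ with a vector index recurses into lists
nested_list[[c(1, 1, 1, 1)]]      # 1

# Every level above the innermost value is still a list
typeof(nested_list[[1]])          # "list"
```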
+
+Combined lists:
+
+```{r list_combined}
+# with list()
+list_comb1 <- list(list(1, 2), list(3, 4))
+# with c()
+list_comb2 <- c(list(1, 2), list(3, 4))
+
+# compare structure
+str(list_comb1)
+str(list_comb2)
+```
+
+### Testing
+
+Check whether an object is a list:
+
+- `is.list()`
+- `rlang::is_list()`
+
+Both perform the same check, but `rlang::is_list()` can also check the number of elements (via `n`).
+
+```{r list_test}
+# is list
+base::is.list(list_comb2)
+rlang::is_list(list_comb2)
+
+# is list of 4 elements
+rlang::is_list(x = list_comb2, n = 4)
+
+# is a vector (of a special type)
+# remember the family tree?
+rlang::is_vector(list_comb2)
+```
+
+
+### Coercion
+
+## Data frames and tibbles
+
+- The vector family tree revisited.
+- Meet the children of lists
+
+![](images/vectors/summary-tree-s3-2.png)
+Credit: [Advanced R](https://adv-r.hadley.nz/index.html) by Hadley Wickham
+
+### Data frame
+
+A data frame is a:
+
+- Named list of vectors (the names are the column names)
+- Class: "data.frame"
+- Attributes:
+  - (column) `names`
+  - `row.names`
+
+```{r data_frame}
+# Construct
+df <- data.frame(
+  # named atomic vector
+  col1 = c(1, 2, 3),
+  # another named atomic vector
+  col2 = c("un", "deux", "trois"),
+  # not necessary since R 4.0.0, where FALSE became the default
+  stringsAsFactors = FALSE
+)
+
+# Inspect
+df
+
+# Deconstruct
+# - type
+typeof(df)
+# - attributes
+attributes(df)
+```
+
+
+Unlike other lists, every vector must have the same length (i.e., as many elements as there are rows in the data frame).
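Since a data frame is a list of equal-length columns, list tools and data frame tools agree on its shape, and `data.frame()` refuses columns whose lengths cannot be reconciled. A base-R sketch (the column names and values are arbitrary):

```r
df <- data.frame(col1 = c(1, 2, 3), col2 = c("un", "deux", "trois"))

# Viewed as a list: one element per column, all the same length
length(df)    # 2, the number of columns
lengths(df)   # col1 = 3, col2 = 3
nrow(df)      # 3

# Lengths 3 and 2 cannot be reconciled, so construction fails
res <- tryCatch(
  data.frame(col1 = c(1, 2, 3), col2 = c(1, 2)),
  error = function(e) conditionMessage(e)
)
res           # an error message about differing numbers of rows
```

Note that `data.frame()` does recycle a shorter column when its length divides the longest one evenly (for example, 2 into 4), which the tibble section below contrasts with tibbles.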
+
+### Tibble
+
+Compared to data frames, tibbles are data frames that are:
+
+- Lazy
+- Surly
+
+#### Lazy
+
+Tibbles do not:
+
+- Coerce strings
+- Transform non-syntactic names
+- Recycle vectors of length greater than 1
+
+**Coerce strings**
+
+```{r tbl_no_coerce}
+chr_col <- c("don't", "factor", "me", "bro")
+
+# data frame
+df <- data.frame(
+  a = chr_col,
+  # before R 4.0.0, this was the default
+  stringsAsFactors = TRUE
+)
+
+# tibble
+tbl <- tibble::tibble(
+  a = chr_col
+)
+
+# contrast the structure
+str(df$a)
+str(tbl$a)
+
+```
+
+**Transform non-syntactic names**
+
+```{r tbl_col_name}
+# data frame
+df <- data.frame(
+  `1` = c(1, 2, 3)
+)
+
+# tibble
+tbl <- tibble::tibble(
+  `1` = c(1, 2, 3)
+)
+
+# contrast the names
+names(df)
+names(tbl)
+```
+
+**Recycle vectors of length greater than 1**
+
+```{r tbl_recycle, error=TRUE}
+# data frame
+df <- data.frame(
+  col1 = c(1, 2, 3, 4),
+  col2 = c(1, 2)
+)
+
+# tibble
+tbl <- tibble::tibble(
+  col1 = c(1, 2, 3, 4),
+  col2 = c(1, 2)
+)
+```
+
+
+#### Surly
+
+Tibbles do only what they're asked and complain if what they're asked doesn't make sense:
+
+- Subsetting always yields a tibble
+- Complains if it cannot find a column
+
+**Subsetting always yields a tibble**
+
+```{r tbl_subset}
+# data frame
+df <- data.frame(
+  col1 = c(1, 2, 3, 4)
+)
+
+# tibble
+tbl <- tibble::tibble(
+  col1 = c(1, 2, 3, 4)
+)
+
+# contrast
+df_col <- df[, "col1"]
+str(df_col)
+tbl_col <- tbl[, "col1"]
+str(tbl_col)
+
+# to select a vector, do one of these instead
+tbl_col_1 <- tbl[["col1"]]
+str(tbl_col_1)
+tbl_col_2 <- dplyr::pull(tbl, col1)
+str(tbl_col_2)
+```
+
+**Complains if it cannot find a column**
+
+```{r tbl_col_match, warning=TRUE}
+names(df)
+df$col
+
+names(tbl)
+tbl$col
+```
+
+### Testing
+
+Whether data frame: `is.data.frame()`. Note: both data frames and tibbles are data frames.
+
+Whether tibble: `tibble::is_tibble()`. Note: only tibbles are tibbles. Vanilla data frames are not.
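The asymmetry between the two tests comes from the class vector: a tibble keeps `"data.frame"` among its classes, so base tests accept it, while a plain data frame lacks the tibble classes. A sketch, assuming `{tibble}` is installed (the notes above already use it):

```r
df  <- data.frame(x = 1:3)
tbl <- tibble::tibble(x = 1:3)

class(df)    # "data.frame"
class(tbl)   # "tbl_df" "tbl" "data.frame"

is.data.frame(df)       # TRUE
is.data.frame(tbl)      # TRUE
tibble::is_tibble(df)   # FALSE
tibble::is_tibble(tbl)  # TRUE
```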
+
+### Coercion
+
+- To data frame: `as.data.frame()`
+- To tibble: `tibble::as_tibble()`
+
+## `NULL`
+
+A special type of object that:
+
+- Always has length 0
+- Cannot have attributes
+
+```{r null, error=TRUE}
+typeof(NULL)
+#> [1] "NULL"
+
+length(NULL)
+#> [1] 0
+
+x <- NULL
+attr(x, "y") <- 1
+```

-- ADD SLIDES AS SECTIONS (`##`).
-- TRY TO KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF.

 ## Meeting Videos

diff --git a/README.md b/README.md
@@ -27,16 +27,27 @@ The slides from the old clubs are in a [separate repository](https://github.com/

 This repository is structured as a [{bookdown}](https://CRAN.R-project.org/package=bookdown) site. To present, follow these instructions:

+Do these steps once:
+
 1. [Setup Github Locally](https://www.youtube.com/watch?v=hNUNPkoledI) (also see [_Happy Git and GitHub for the useR_](https://happygitwithr.com/github-acct.html))
-2. Install {usethis} `install.packages("usethis")`
-3. `usethis::create_from_github("r4ds/bookclub-advr")` (cleanly creates your own copy of this repository).
-4. `usethis::pr_init("my-chapter")` (creates a branch for your work, to avoid confusion).
-5. Edit the appropriate chapter file, if necessary. Use `##` to indicate new slides (new sections).
-7. If you use any packages that are not already in the `DESCRIPTION`, add them. You can use `usethis::use_package("myCoolPackage")` to add them quickly!
-8. Build the book! ctrl-shift-b (or command-shift-b) will render the full book, or ctrl-shift-k (command-shift-k) to render just your slide. Please do this to make sure it works before you push your changes up to the main repo!
-9. Commit your changes (either through the command line or using Rstudio's Git tab).
-10. `usethis::pr_push()` (pushes the changes up to github, and opens a "pull request" (PR) to let us know your work is ready).
-11. (If we request changes, make them)
-12. When your PR has been accepted ("merged"), `usethis::pr_finish()` to close out your branch and prepare your local repository for future work.
+2. Install {usethis} and {devtools} `install.packages(c("usethis", "devtools"))`
+3. Set up a default {usethis} directory:
+    - `usethis::edit_r_profile()` to open your profile for editing.
+    - Add this line: `options(usethis.destdir = "YOURDIR")` (replace `YOURDIR` with the root directory under which you want your R projects to appear; or you can skip these steps, and the project will be saved to your Desktop).
+    - Restart your R session (Session/Restart R in RStudio).
+4. `usethis::create_from_github("r4ds/bookclub-advr")` (cleanly creates your own copy of this repository).
+
+Do these steps each time you present another chapter:
+
+1. Open your project for this book.
+2. `usethis::pr_init("my-chapter")` (creates a branch for your work, to avoid confusion, and makes sure that you have the latest changes from other contributors; ideally, replace `my-chapter` with a descriptive name).
+3. `devtools::install_dev_deps()` (installs any packages used by the book that you don't already have installed).
+4. Edit the appropriate chapter file, if necessary. Use `##` to indicate new slides (new sections).
+5. If you use any packages that are not already in the `DESCRIPTION`, add them. You can use `usethis::use_package("myCoolPackage")` to add them quickly!
+6. Build the book! ctrl-shift-b (or command-shift-b) will render the full book, or ctrl-shift-k (command-shift-k) to render just your slide. Please do this to make sure it works before you push your changes up to the main repo!
+7. Commit your changes (either through the command line or using RStudio's Git tab).
+8. `usethis::pr_push()` (pushes the changes up to GitHub, and opens a "pull request" (PR) to let us know your work is ready).
+9. (If we request changes, make them)
+10. When your PR has been accepted ("merged"), `usethis::pr_finish()` to close out your branch and prepare your local repository for future work.

 When your PR is checked into the main branch, the bookdown site will rebuild, adding your slides to [this site](https://r4ds.io/advr).
diff --git a/images/vectors/summary-tree-atomic.png b/images/vectors/summary-tree-atomic.png
Binary files differ.
diff --git a/images/vectors/summary-tree-s3-1.png b/images/vectors/summary-tree-s3-1.png
Binary files differ.
diff --git a/images/vectors/summary-tree-s3-2.png b/images/vectors/summary-tree-s3-2.png
Binary files differ.
diff --git a/images/vectors/summary-tree.png b/images/vectors/summary-tree.png
Binary files differ.
diff --git a/index.Rmd b/index.Rmd
@@ -13,6 +13,14 @@ description: "This is the product of the R4DS Online Learning Community's Advanc

 # Welcome {-}

+```{r knitr_opts, echo=FALSE, message=FALSE, warning=FALSE}
+knitr::opts_chunk$set(
+  echo = TRUE,
+  comment = "#>",
+  collapse = TRUE
+)
+```
+
 Welcome to the bookclub!

 This is a companion for the book [_Advanced R_](https://adv-r.hadley.nz/) by Hadley Wickham (Chapman & Hall, copyright 2019, [9780815384571](https://www.routledge.com/Advanced-R-Second-Edition/Wickham/p/book/9780815384571)).