commit 05745ca52c11ad9905777fba1e4c95a9e461be83
parent da54ce7f8fbe516c9ff405e981b28b1dabe3f1bc
Author: Jon Harmon <jonthegeek@gmail.com>
Date:   Mon,  4 Aug 2025 05:17:50 -0500
Restructure into listings (#80)
Diffstat:
17 files changed, 764 insertions(+), 721 deletions(-)
diff --git a/01.qmd b/01.qmd
@@ -0,0 +1,16 @@
+---
+title: "1. Introduction"
+listing: 
+  - contents: "slides/01.qmd"
+    id: slides
+    type: grid
+  - contents: "videos/01"
+    id: videos
+    type: grid
+    grid-columns: 2
+    sort: "filename desc"
+header-includes:
+  - '<script src="listing_new_window.js" defer></script>'
+---
+
+{{< include _chapter_body.qmd >}}
diff --git a/02.qmd b/02.qmd
@@ -0,0 +1,16 @@
+---
+title: "2. Names and values"
+listing: 
+  - contents: "slides/02.qmd"
+    id: slides
+    type: grid
+  - contents: "videos/02"
+    id: videos
+    type: grid
+    grid-columns: 2
+    sort: "filename desc"
+header-includes:
+  - '<script src="listing_new_window.js" defer></script>'
+---
+
+{{< include _chapter_body.qmd >}}
diff --git a/_chapter_body.qmd b/_chapter_body.qmd
@@ -0,0 +1,7 @@
+:::{#slides}
+:::
+
+## Meeting videos
+
+:::{#videos}
+:::
diff --git a/_quarto.yml b/_quarto.yml
@@ -21,14 +21,10 @@ website:
         target: advr_club-slides
       - section: "Getting started"
         contents:
-        - section: slides/01-introduction.qmd
-          target: advr_club-slides
-          contents:
-          - file: videos/01.qmd
+        - 01.qmd
       - section: "Foundations"
         contents:
-          - file: slides/02_Names_and_values.Rmd
-            target: advr_club-slides
+          - 02.qmd
           - file: slides/03_Vectors.Rmd
             target: advr_club-slides
           - file: slides/04_Subsetting.Rmd
diff --git a/listing_new_window.js b/listing_new_window.js
@@ -0,0 +1,9 @@
+// listing_new_window.js
+document.addEventListener("DOMContentLoaded", function() {
+  const links = document.querySelectorAll('#listing-slides a, #listing-videos a');
+  
+  links.forEach(link => {
+    link.target = '_blank';
+    link.rel = 'noopener noreferrer';
+  });
+});
diff --git a/slides/01-introduction.qmd b/slides/01-introduction.qmd
@@ -1,105 +0,0 @@
----
-engine: knitr
-title: Introduction
----
-
-# ️✅ Learning objectives
-
-## LOs for the entire book
-
-- Improve programming skills.
-- Develop a deep understanding of R language fundamentals.
-- Understand what functional programming means.
-- Understand object-oriented programming as applied in R.
-- Understand metaprogramming while developing in R.
-- Be able to identify what to optimize and how to optimize it.
-
-## LOs for this chapter
-
-- Recognize the differences between the 1st and 2nd edition of this book.
-- Describe the overall structure of the book.
-- Decide whether this book is right for you.
-
-Books suggestions:
-
-- [The Structure and Interpretation of Computer Programs (SICP)](https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book.html)
-- [Concepts, Techniques and Models of Computer Programming](https://mitpress.mit.edu/books/concepts-techniques-and-models-computer-programming)
-- [The Pragmatic Programmer](https://pragprog.com/titles/tpp20/the-pragmatic-programmer-20th-anniversary-edition/)
-
-# What's new?
-
-## Hadley's goals
-
-- Improve coverage of concepts Hadley understood better after 1e
-- Reduce coverage of unimportant topics
-- Easier to understand (including many more diagrams)
-
-## Base vs rlang
-
-- [1e](http://adv-r.had.co.nz) used base R almost exclusively
-- 2e uses {[rlang](https://rlang.r-lib.org/)}, {[purrr](https://purrr.tidyverse.org/)}, etc
-
-# What we'll learn
-
-## The 5 sections
-
-- **Foundations:** (7 chapters) Building blocks of R
-- **Functional programming:** (3 chapters) Treating functions as objects (that can be args in functions)
-- **Object-oriented programming:** (5 chapters + 1) The many object systems of R (we'll add S7)
-- **Metaprogramming:** (5 chapters) Generating code with code
-- **Techniques:** (4 chapters) Debugging, measuring performance, improving performance
-
-::: notes
-- Might be useful to open TOC here.
-:::
-
-## Why R?
-
-- Diverse & welcoming community
-- Many packages for stats & modeling, ML, dataviz, data wrangling
-- Rmarkdown / Quarto
-- RStudio / Positron
-- Often used in science
-- Functional programming powerful for data
-- Metaprogramming
-- Ease of connection to C, C++, etc
-
-## R imperfections
-
-- Much code by non-coders (messy)
-- Community more about results than programming best practices
-- Metaprogramming can lead to weird failures
-- Inconsistency from > 30 years of evolution
-- Poorly written R code runs very poorly
-
-## Who should read Advanced R?
-
-- Intermediate (and up) R programmers who want to really understand R
-- Programmers from other langs who want to know why R is weird
-- Prereqs:
-  - You've written lots of code
-  - You understand basics of data analysis
-  - You can install CRAN packages
-
-## What this book is not
-
-- [R for Data Science](https://r4ds.hadley.nz/)
-- [R Packages](https://r-pkgs.org/)
-
-## Meta-techniques
-
-- Read source code
-  - F2 to see code in RStudio/Positron (with RStudio bindings)
-- Adopt a scientific mindset
-  - Don't understand something? Hypothesize & experiment
-
-## Other books
-
-- The Structure and Interpretation of Computer Programs (Abelson, Sussman, and Sussman, 1996) [PDF](https://web.mit.edu/6.001/6.037/sicp.pdf)
-- Concepts, Techniques and Models of Computer Programming (Van Roy & Haridi, 2003) [PDF](https://webperso.info.ucl.ac.be/~pvr/VanRoyHaridi2003-book.pdf)
-- The Pragmatic Programmer (Hunt & Thomas, 1990) [buy eBook](https://pragprog.com/titles/tpp20/the-pragmatic-programmer-20th-anniversary-edition/)
-
-::: notes
-- As far as I can tell, first 2 PDFs are legal.
-- I don't think a legal, free version of The Pragmatic Programmer is available.
-:::
diff --git a/slides/01.qmd b/slides/01.qmd
@@ -0,0 +1,99 @@
+---
+engine: knitr
+title: Introduction
+---
+
+# ✅ Learning objectives
+
+## LOs for the entire book
+
+- Improve programming skills.
+- Develop a deep understanding of R language fundamentals.
+- Understand what functional programming means.
+- Understand object-oriented programming as applied in R.
+- Understand metaprogramming while developing in R.
+- Be able to identify what to optimize and how to optimize it.
+
+## LOs for this chapter
+
+- Recognize the differences between the 1st and 2nd edition of this book.
+- Describe the overall structure of the book.
+- Decide whether this book is right for you.
+
+# What's new?
+
+## Hadley's goals
+
+- Improve coverage of concepts Hadley understood better after 1e
+- Reduce coverage of unimportant topics
+- Easier to understand (including many more diagrams)
+
+## Base vs rlang
+
+- [1e](http://adv-r.had.co.nz) used base R almost exclusively
+- 2e uses {[rlang](https://rlang.r-lib.org/)}, {[purrr](https://purrr.tidyverse.org/)}, etc
+
+# What we'll learn
+
+## The 5 sections
+
+- **Foundations:** (7 chapters) Building blocks of R
+- **Functional programming:** (3 chapters) Treating functions as objects (that can be args in functions)
+- **Object-oriented programming:** (5 chapters + 1) The many object systems of R (we'll add S7)
+- **Metaprogramming:** (5 chapters) Generating code with code
+- **Techniques:** (4 chapters) Debugging, measuring performance, improving performance
+
+::: notes
+- Might be useful to open TOC here.
+:::
+
+## Why R?
+
+- Diverse & welcoming community
+- Many packages for stats & modeling, ML, dataviz, data wrangling
+- Rmarkdown / Quarto
+- RStudio / Positron
+- Often used in science
+- Functional programming powerful for data
+- Metaprogramming
+- Ease of connection to C, C++, etc
+
+## R imperfections
+
+- Much code by non-coders (messy)
+- Community more about results than programming best practices
+- Metaprogramming can lead to weird failures
+- Inconsistency from > 30 years of evolution
+- Poorly written R code runs very poorly
+
+## Who should read Advanced R?
+
+- Intermediate (and up) R programmers who want to really understand R
+- Programmers from other langs who want to know why R is weird
+- Prereqs:
+  - You've written lots of code
+  - You understand basics of data analysis
+  - You can install CRAN packages
+
+## What this book is not
+
+- [R for Data Science](https://r4ds.hadley.nz/)
+- [R Packages](https://r-pkgs.org/)
+
+## Meta-techniques
+
+- Read source code
+  - F2 to see code in RStudio/Positron (with RStudio bindings)
+- Adopt a scientific mindset
+  - Don't understand something? Hypothesize & experiment
+
+## Other books
+
+- The Structure and Interpretation of Computer Programs (Abelson, Sussman, and Sussman, 1996) [PDF](https://web.mit.edu/6.001/6.037/sicp.pdf)
+- Concepts, Techniques and Models of Computer Programming (Van Roy & Haridi, 2003) [PDF](https://webperso.info.ucl.ac.be/~pvr/VanRoyHaridi2003-book.pdf)
+- The Pragmatic Programmer (Hunt & Thomas, 1999) [buy eBook](https://pragprog.com/titles/tpp20/the-pragmatic-programmer-20th-anniversary-edition/)
+
+::: notes
+- As far as I can tell, first 2 PDFs are legal.
+- I don't think a legal, free version of The Pragmatic Programmer is available.
+:::
diff --git a/slides/02.qmd b/slides/02.qmd
@@ -0,0 +1,546 @@
+---
+engine: knitr
+title: Names and values
+---
+
+## Learning objectives
+
+- To understand the distinction between an *object* and its *name*
+- With this knowledge, to be able to write faster code using less memory
+- To better understand R's functional programming tools
+
+Using the lobstr package here.
+```{r}
+library(lobstr)
+```
+
+
+## Quiz
+
+### 1. How do I create a new column called `3` that contains the sum of `1` and `2`?
+
+```{r}
+df <- data.frame(runif(3), runif(3))
+names(df) <- c(1, 2)
+df
+```
+
+```{r}
+df$`3` <- df$`1` + df$`2`
+df
+```
+
+**What makes these names challenging?**
+
+> You need to use backticks (`) when a name is non-syntactic, i.e., it doesn't start with a 
+> letter or '.', or it starts with '.' followed by a number.
+
+### 2. How much memory does `y` occupy?
+
+```{r}
+x <- runif(1e6)
+y <- list(x, x, x)
+```
+
+Need to use the lobstr package:
+```{r}
+lobstr::obj_size(y)
+```
+
+> Note that if you look in the RStudio Environment pane or use base R's `object.size()`,
+> you actually get a value of 24 MB
+
+```{r}
+object.size(y)
+```
+
+### 3. On which line does `a` get copied in the following example?
+```{r}
+a <- c(1, 5, 3, 2)
+b <- a
+b[[1]] <- 10
+```
+
+> Not until `b` is modified, the third line
+
+## Binding basics
+
+- Create values and *bind* a name to them
+- Names have values (rather than values having names)
+- Multiple names can refer to the same values
+- We can look at an object's address to keep track of the values independent of their names
+
+```{r}
+x <- c(1, 2, 3)
+y <- x
+obj_addr(x)
+obj_addr(y)
+```
+
+
+### Exercises
+
+##### 1. Explain the relationships
+```{r}
+a <- 1:10
+b <- a
+c <- b
+d <- 1:10
+```
+
+> `a`, `b`, and `c` are all names that refer to the first value `1:10`
+> 
+> `d` is a name that refers to the *second* value of `1:10`.
+
+
+##### 2. Do the following all point to the same underlying function object? hint: `lobstr::obj_addr()`
+```{r}
+obj_addr(mean)
+obj_addr(base::mean)
+obj_addr(get("mean"))
+obj_addr(evalq(mean))
+obj_addr(match.fun("mean"))
+```
+
+> Yes!
+
+## Copy-on-modify
+
+- If you modify a value bound to multiple names, it is 'copy-on-modify'
+- If you modify a value bound to a single name, it is 'modify-in-place'
+- Use `tracemem()` to see when an object gets copied
+
+```{r}
+x <- c(1, 2, 3)
+cat(tracemem(x), "\n")
+```
+
+```{r}
+y <- x
+y[[3]] <- 4L  # Copied: address changes (copy-on-modify)
+y[[3]] <- 5L  # Not copied: same address (modify-in-place)
+```
+
+Turn off `tracemem()` with `untracemem()`
+
+> Can also use `ref(x)` to get the address of the value bound to a given name
+
+
+## Functions
+
+- Copy-on-modify also applies within functions
+- If you pass `x` to `f()` but don't modify it there, no copy is made
+
+```{r}
+f <- function(a) {
+  a
+}
+
+x <- c(1, 2, 3)
+z <- f(x) # No copy: z and x point to the same object
+
+ref(x)
+ref(z)
+```
+
+<!--  -->
+
+## Lists
+
+- A list as a whole has its own reference (id)
+- List *elements* also each point to other values
+- A list doesn't store the values themselves; it *stores references to them*
+- As of R 3.1.0, modifying lists creates a *shallow copy*
+    - References (bindings) are copied, but *values are not*
+
+```{r}
+l1 <- list(1, 2, 3)
+l2 <- l1
+l2[[3]] <- 4
+```
+
+- We can use `ref()` to see how they compare
+  - See how the list reference is different
+  - But first two items in each list are the same
+
+```{r}
+ref(l1, l2)
+```
+
+{width=50%}
+
+## Data Frames
+
+- Data frames are lists of vectors
+- So copying and modifying a column *only affects that column*
+- **BUT** if you modify a *row*, every column must be copied
+
+```{r}
+d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))
+d2 <- d1
+d3 <- d1
+```
+
+Only the modified column changes
+```{r}
+d2[, 2] <- d2[, 2] * 2
+ref(d1, d2)
+```
+
+All columns change
+```{r}
+d3[1, ] <- d3[1, ] * 3
+ref(d1, d3)
+```
+
+## Character vectors
+
+- R has a **global string pool**
+- Elements of character vectors point to unique strings in the pool
+
+```{r}
+x <- c("a", "a", "abc", "d")
+```
+
+
+
+## Exercises
+
+##### 1. Why is `tracemem(1:10)` not useful?
+
+> Because it tries to trace a value that is not bound to a name
+
+##### 2. Why are there two copies?
+```{r}
+x <- c(1L, 2L, 3L)
+tracemem(x)
+x[[3]] <- 4
+```
+
+> Because assigning the double `4` converts the *integer* vector (created with 1L, etc.) to a *double* vector
+
+##### 3. What are the relationships among these objects?
+
+```{r}
+a <- 1:10      
+b <- list(a, a)
+c <- list(b, a, 1:10)
+```
+
+a <- obj 1    
+b <- obj 1, obj 1    
+c <- b(obj 1, obj 1), obj 1, 1:10    
+
+```{r}
+ref(c)
+```
+
+
+##### 4. What happens here?
+```{r}
+x <- list(1:10)
+x[[2]] <- x
+```
+
+- `x` is a list
+- `x[[2]] <- x` creates a new list, which in turn contains a reference to the 
+  original list
+- `x` is no longer bound to `list(1:10)`
+
+```{r}
+ref(x)
+```
+
+{width=50%}
+
+## Object Size
+
+- Use `lobstr::obj_size()` 
+- Lists may be smaller than expected because elements can reference the same value
+- Strings may be smaller than expected because of the global string pool
+- Difficult to predict how big something will be
+  - Can only add sizes together if they share no references in common
+
+### Alternative Representation
+- As of R 3.5.0 - ALTREP
+- Represent some vectors compactly
+    - e.g., `1:1000` - not 1,000 values, just 1 and 1,000
+
+### Exercises
+
+##### 1. Why are the sizes so different?
+
+```{r}
+y <- rep(list(runif(1e4)), 100)
+
+object.size(y) # ~8000 kB
+obj_size(y)    # ~80   kB
+```
+
+> From `?object.size`:
+> 
+> "This function merely provides a rough indication: it should be reasonably accurate for atomic vectors, but **does not detect if elements of a list are shared**, for example."
+
+##### 2. Why is the size misleading?
+
+```{r}
+funs <- list(mean, sd, var)
+obj_size(funs)
+```
+
+> Because they reference functions from base and stats, which are always available.
+> Why bother looking at the size? What use is that?
+
+##### 3. Predict the sizes
+
+```{r}
+a <- runif(1e6) # 8 MB
+obj_size(a)
+```
+
+
+```{r}
+b <- list(a, a)
+```
+
+- There is one value ~8MB
+- `a` and `b[[1]]` and `b[[2]]` all point to the same value.
+
+```{r}
+obj_size(b)
+obj_size(a, b)
+```
+
+
+```{r}
+b[[1]][[1]] <- 10
+```
+- Now there are two values ~8MB each (16MB total)
+- `a` and `b[[2]]` point to the same value (8MB)
+- `b[[1]]` is new (8MB) because the first element (`b[[1]][[1]]`) has been changed
+
+```{r}
+obj_size(b)     # 16 MB (two values, two element references)
+obj_size(a, b)  # 16 MB (a & b[[2]] point to the same value)
+```
+
+
+```{r}
+b[[2]][[1]] <- 10
+```
+- Finally, now there are three values ~8MB each (24MB total)
+- Although `b[[1]]` and `b[[2]]` have the same contents, 
+  they are not references to the same object.
+
+```{r}
+obj_size(b)
+obj_size(a, b)
+```
+
+
+## Modify-in-place
+
+- Modifying usually creates a copy except for
+    - Objects with a single binding (performance optimization)
+    - Environments (special)
+
+### Objects with a single binding
+
+- Hard to know if copy will occur
+- If an object has 2+ bindings and you remove some, R can't track how many are left (so it always assumes more than one)
+- May make a copy even if there's only one binding left
+- Passing an object to a function makes a reference to it **unless the function is a primitive written in C**
+- Best to use `tracemem()` to check rather than guess.
+
+
+#### Example - lists vs. data frames in for loop
+
+**Setup**  
+
+Create the data to modify
+```{r}
+x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
+medians <- vapply(x, median, numeric(1))
+```
+
+
+**Data frame - Copied every time!**
+```{r}
+cat(tracemem(x), "\n")
+for (i in seq_along(medians)) {
+  x[[i]] <- x[[i]] - medians[[i]]
+}
+untracemem(x)
+```
+
+**List (uses internal C code) - Copied once!**
+```{r}
+y <- as.list(x)
+
+cat(tracemem(y), "\n")
+for (i in seq_along(medians)) {
+  y[[i]] <- y[[i]] - medians[[i]]
+}
+untracemem(y)
+```
+
+#### Benchmark this (Exercise #2)
+
+**First wrap in a function**
+```{r}
+med <- function(d, medians) {
+  for (i in seq_along(medians)) {
+    d[[i]] <- d[[i]] - medians[[i]]
+  }
+}
+```
+
+**Try with 5 columns**
+```{r}
+x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
+medians <- vapply(x, median, numeric(1))
+y <- as.list(x)
+
+bench::mark(
+  "data.frame" = med(x, medians),
+  "list" = med(y, medians)
+)
+```
+
+**Try with 20 columns**
+```{r}
+x <- data.frame(matrix(runif(20 * 1e4), ncol = 20))
+medians <- vapply(x, median, numeric(1))
+y <- as.list(x)
+
+bench::mark(
+  "data.frame" = med(x, medians),
+  "list" = med(y, medians)
+)
+```
+
+**WOW!**
+
+
+### Environments
+- Always modified in place (**reference semantics**)
+- Interesting because if you modify the environment, all existing bindings to it still have the same reference
+- If two names point to the same environment, and you update one, you update both!
+
+```{r}
+e1 <- rlang::env(a = 1, b = 2, c = 3)
+e2 <- e1
+e1$c <- 4
+e2$c
+```
+
+- This means that environments can contain themselves (!)
+
+### Exercises
+
+##### 1. Why isn't this circular?
+```{r}
+x <- list()
+x[[1]] <- x
+```
+
+> Because of copy-on-modify: `x[[1]] <- x` creates a new list bound to `x` whose first element references the original empty `list()`, so nothing contains itself.
+
+##### 2. (see "Objects with a single binding")
+
+##### 3. What happens if you attempt to use tracemem() on an environment?
+
+```{r}
+#| error: true
+e1 <- rlang::env(a = 1, b = 2, c = 3)
+tracemem(e1)
+```
+
+> It throws an error. Because environments are always modified in place, there's no point in tracing them
+
+
+## Unbinding and the garbage collector
+
+- If you delete the 'name' bound to an object, the object still exists
+- R runs a "garbage collector" (GC) to remove these objects when it needs more memory
+- "Looking from the outside, it’s basically impossible to predict when the GC will run. In fact, you shouldn’t even try."
+- If you want to know when it runs, use `gcinfo(TRUE)` to get a message printed
+- You can force GC with `gc()`, but you never need to do so to make more memory available *within* R
+- The only reasons to do so are to free memory for other system software, or to get the
+  message printed about how much memory is being used
+
+```{r}
+gc()
+mem_used()
+```
+
+- These numbers will **not** match what your OS tells you because:
+  1. They include objects created by R, but not the R interpreter itself
+  2. R and the OS are lazy and don't reclaim/release memory until it's needed
+  3. R counts memory from objects, but there are gaps left by deleted objects -> 
+  *memory fragmentation* [less memory is actually available than you might think]
+
+
+## Meeting Videos
+
+### Cohort 1
+
+(no video recorded)
+
+### Cohort 2
+
+`r knitr::include_url("https://www.youtube.com/embed/pCiNj2JRK50")`
+
+### Cohort 3
+
+`r knitr::include_url("https://www.youtube.com/embed/-bEXdOoxO_E")`
+
+### Cohort 4
+
+`r knitr::include_url("https://www.youtube.com/embed/gcVU_F-L6zY")`
+
+### Cohort 5
+
+`r knitr::include_url("https://www.youtube.com/embed/aqcvKox9V0Q")`
+
+### Cohort 6
+
+`r knitr::include_url("https://www.youtube.com/embed/O4Oo_qO7SIY")`
+
+<details>
+<summary> Meeting chat log </summary>
+
+```
+00:16:57	Federica Gazzelloni:	cohort 2 video: https://www.youtube.com/watch?v=pCiNj2JRK50
+00:18:39	Federica Gazzelloni:	cohort 2 presentation: https://r4ds.github.io/bookclub-Advanced_R/Presentations/Week02/Cohort2_America/Chapter2Slides.html#1
+00:40:24	Arthur Shaw:	Just the opposite, Ryan. Very clear presentation!
+00:51:54	Trevin:	parquet?
+00:53:00	Arthur Shaw:	We may all be right. {arrow} looks to deal with feather and parquet files: https://arrow.apache.org/docs/r/
+01:00:04	Arthur Shaw:	Some questions for future meetings. (1) I find Ryan's use of slides hugely effective in conveying information. Would it be OK if future sessions (optionally) used slides? If so, should/could we commit slides to some folder on the repo? (2) I think reusing the images from Hadley's books really helps understanding and discussion. Is that OK to do? Here I'm thinking about copyright concerns. (If possible, I would rather not redraw variants of Hadley's images.)
+01:01:35	Federica Gazzelloni:	It's all ok, you can use past presentation, you don't need to push them to the repo, you can use the images from the book
+01:07:19	Federica Gazzelloni:	Can I use: gc(reset = TRUE) safely?
+```
+</details>
+
+### Cohort 7
+
+`r knitr::include_url("https://www.youtube.com/embed/kpAUoGO6elE")`
+
+<details>
+
+<summary>Meeting chat log</summary>
+```
+00:09:40	Ryan Honomichl:	https://drdoane.com/three-deep-truths-about-r/
+00:12:51	Robert Hilly:	Be right back
+00:36:12	Ryan Honomichl:	brb
+00:41:18	Ron:	I tried mapply and also got different answers
+00:41:44	collinberke:	Interesting, would like to know more what is going on.
+00:49:57	Robert Hilly:	simple_map <- function(x, f, ...) {
+  out <- vector("list", length(x))
+  for (i in seq_along(x)) {
+    out[[i]] <- f(x[[i]], ...)
+  }
+  out
+}
+```
+</details>
diff --git a/slides/02_Names_and_values.Rmd b/slides/02_Names_and_values.Rmd
@@ -1,543 +0,0 @@
-# Names and values
-
-**Learning objectives:**
-
-- To be able to understand distinction between an *object* and its *name*
-- With this knowledge, to be able write faster code using less memory
-- To better understand R's functional programming tools
-
-Using lobstr package here.
-```{r}
-library(lobstr)
-```
-
-
-### Quiz {-}
-
-##### 1. How do I create a new column called `3` that contains the sum of `1` and `2`? {-}
-
-```{r}
-df <- data.frame(runif(3), runif(3))
-names(df) <- c(1, 2)
-df
-```
-
-```{r}
-df$`3` <- df$`1` + df$`2`
-df
-```
-
-**What makes these names challenging?**
-
-> You need to use backticks (`) when the name of an object doesn't start with a 
-> a character or '.' [or . followed by a number] (non-syntactic names).
-
-##### 2. How much memory does `y` occupy? {-}
-
-```{r}
-x <- runif(1e6)
-y <- list(x, x, x)
-```
-
-Need to use the lobstr package:
-```{r}
-lobstr::obj_size(y)
-```
-
-> Note that if you look in the RStudio Environment or use R base `object.size()`
-> you actually get a value of 24 MB
-
-```{r}
-object.size(y)
-```
-
-##### 3. On which line does `a` get copied in the following example? {-}
-```{r}
-a <- c(1, 5, 3, 2)
-b <- a
-b[[1]] <- 10
-```
-
-> Not until `b` is modified, the third line
-
-## Binding basics {-}
-
-- Create values and *bind* a name to them
-- Names have values (rather than values have names)
-- Multiple names can refer to the same values
-- We can look at an object's address to keep track of the values independent of their names
-
-```{r}
-x <- c(1, 2, 3)
-y <- x
-obj_addr(x)
-obj_addr(y)
-```
-
-
-### Exercises {-}
-
-##### 1. Explain the relationships {-}
-```{r}
-a <- 1:10
-b <- a
-c <- b
-d <- 1:10
-```
-
-> `a` `b` and `c` are all names that refer to the first value `1:10`
-> 
-> `d` is a name that refers to the *second* value of `1:10`.
-
-
-##### 2. Do the following all point to the same underlying function object? hint: `lobstr::obj_addr()` {-}
-```{r}
-obj_addr(mean)
-obj_addr(base::mean)
-obj_addr(get("mean"))
-obj_addr(evalq(mean))
-obj_addr(match.fun("mean"))
-```
-
-> Yes!
-
-## Copy-on-modify {-}
-
-- If you modify a value bound to multiple names, it is 'copy-on-modify'
-- If you modify a value bound to a single name, it is 'modify-in-place'
-- Use `tracemem()` to see when a name's value changes
-
-```{r}
-x <- c(1, 2, 3)
-cat(tracemem(x), "\n")
-```
-
-```{r}
-y <- x
-y[[3]] <- 4L  # Changes (copy-on-modify)
-y[[3]] <- 5L  # Doesn't change (modify-in-place)
-```
-
-Turn off `tracemem()` with `untracemem()`
-
-> Can also use `ref(x)` to get the address of the value bound to a given name
-
-
-## Functions {-}
-
-- Copying also applies within functions
-- If you copy (but don't modify) `x` within `f()`, no copy is made
-
-```{r}
-f <- function(a) {
-  a
-}
-
-x <- c(1, 2, 3)
-z <- f(x) # No change in value
-
-ref(x)
-ref(z)
-```
-
-<!--  -->
-
-## Lists {-}
-
-- A list overall, has it's own reference (id)
-- List *elements* also each point to other values
-- List doesn't store the value, it *stores a reference to the value*
-- As of R 3.1.0, modifying lists creates a *shallow copy*
-    - References (bindings) are copied, but *values are not*
-
-```{r}
-l1 <- list(1, 2, 3)
-l2 <- l1
-l2[[3]] <- 4
-```
-
-- We can use `ref()` to see how they compare
-  - See how the list reference is different
-  - But first two items in each list are the same
-
-```{r}
-ref(l1, l2)
-```
-
-{width=50%}
-
-## Data Frames {-}
-
-- Data frames are lists of vectors
-- So copying and modifying a column *only affects that column*
-- **BUT** if you modify a *row*, every column must be copied
-
-```{r}
-d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))
-d2 <- d1
-d3 <- d1
-```
-
-Only the modified column changes
-```{r}
-d2[, 2] <- d2[, 2] * 2
-ref(d1, d2)
-```
-
-All columns change
-```{r}
-d3[1, ] <- d3[1, ] * 3
-ref(d1, d3)
-```
-
-## Character vectors {-}
-
-- R has a **global string pool**
-- Elements of character vectors point to unique strings in the pool
-
-```{r}
-x <- c("a", "a", "abc", "d")
-```
-
-
-
-## Exercises {-}
-
-##### 1. Why is `tracemem(1:10)` not useful? {-}
-
-> Because it tries to trace a value that is not bound to a name
-
-##### 2. Why are there two copies? {-}
-```{r}
-x <- c(1L, 2L, 3L)
-tracemem(x)
-x[[3]] <- 4
-```
-
-> Because we convert an *integer* vector (using 1L, etc.) to a *double* vector (using just 4)- 
-
-##### 3. What is the relationships among these objects? {-}
-
-```{r}
-a <- 1:10      
-b <- list(a, a)
-c <- list(b, a, 1:10) # 
-```
-
-a <- obj 1    
-b <- obj 1, obj 1    
-c <- b(obj 1, obj 1), obj 1, 1:10    
-
-```{r}
-ref(c)
-```
-
-
-##### 4. What happens here? {-}
-```{r}
-x <- list(1:10)
-x[[2]] <- x
-```
-
-- `x` is a list
-- `x[[2]] <- x` creates a new list, which in turn contains a reference to the 
-  original list
-- `x` is no longer bound to `list(1:10)`
-
-```{r}
-ref(x)
-```
-
-{width=50%}
-
-## Object Size {-}
-
-- Use `lobstr::obj_size()` 
-- Lists may be smaller than expected because of referencing the same value
-- Strings may be smaller than expected because using global string pool
-- Difficult to predict how big something will be
-  - Can only add sizes together if they share no references in common
-
-### Alternative Representation {-}
-- As of R 3.5.0 - ALTREP
-- Represent some vectors compactly
-    - e.g., 1:1000 - not 10,000 values, just 1 and 1,000
-
-### Exercises {-}
-
-##### 1. Why are the sizes so different? {-}
-
-```{r}
-y <- rep(list(runif(1e4)), 100)
-
-object.size(y) # ~8000 kB
-obj_size(y)    # ~80   kB
-```
-
-> From `?object.size()`: 
-> 
-> "This function merely provides a rough indication: it should be reasonably accurate for atomic vectors, but **does not detect if elements of a list are shared**, for example.
-
-##### 2. Why is the size misleading? {-}
-
-```{r}
-funs <- list(mean, sd, var)
-obj_size(funs)
-```
-
-> Because they reference functions from base and stats, which are always available.
-> Why bother looking at the size? What use is that?
-
-##### 3. Predict the sizes {-}
-
-```{r}
-a <- runif(1e6) # 8 MB
-obj_size(a)
-```
-
-
-```{r}
-b <- list(a, a)
-```
-
-- There is one value ~8MB
-- `a` and `b[[1]]` and `b[[2]]` all point to the same value.
-
-```{r}
-obj_size(b)
-obj_size(a, b)
-```
-
-
-```{r}
-b[[1]][[1]] <- 10
-```
-- Now there are two values ~8MB each (16MB total)
-- `a` and `b[[2]]` point to the same value (8MB)
-- `b[[1]]` is new (8MB) because the first element (`b[[1]][[1]]`) has been changed
-
-```{r}
-obj_size(b)     # 16 MB (two values, two element references)
-obj_size(a, b)  # 16 MB (a & b[[2]] point to the same value)
-```
-
-
-```{r}
-b[[2]][[1]] <- 10
-```
-- Finally, now there are three values ~8MB each (24MB total)
-- Although `b[[1]]` and `b[[2]]` have the same contents, 
-  they are not references to the same object.
-
-```{r}
-obj_size(b)
-obj_size(a, b)
-```
-
-
-## Modify-in-place {-}
-
-- Modifying usually creates a copy except for
-    - Objects with a single binding (performance optimization)
-    - Environments (special)
-
-### Objects with a single binding {-}
-
-- Hard to know if copy will occur
-- If you have 2+ bindings and remove them, R can't follow how many are removed (so will always think there are more than one)
-- May make a copy even if there's only one binding left
-- Using a function makes a reference to it **unless it's a function based on C**
-- Best to use `tracemem()` to check rather than guess.
-
-
-#### Example - lists vs. data frames in for loop {-}
-
-**Setup**  
-
-Create the data to modify
-```{r}
-x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
-medians <- vapply(x, median, numeric(1))
-```
-
-
-**Data frame - Copied every time!**
-```{r}
-cat(tracemem(x), "\n")
-for (i in seq_along(medians)) {
-  x[[i]] <- x[[i]] - medians[[i]]
-}
-untracemem(x)
-```
-
-**List (uses internal C code) - Copied once!**
-```{r}
-y <- as.list(x)
-
-cat(tracemem(y), "\n")
-for (i in seq_along(medians)) {
-  y[[i]] <- y[[i]] - medians[[i]]
-}
-untracemem(y)
-```
-
-#### Benchmark this (Exercise #2) {-}
-
-**First wrap in a function**
-```{r}
-med <- function(d, medians) {
-  for (i in seq_along(medians)) {
-    d[[i]] <- d[[i]] - medians[[i]]
-  }
-}
-```
-
-**Try with 5 columns**
-```{r}
-x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
-medians <- vapply(x, median, numeric(1))
-y <- as.list(x)
-
-bench::mark(
-  "data.frame" = med(x, medians),
-  "list" = med(y, medians)
-)
-```
-
-**Try with 20 columns**
-```{r}
-x <- data.frame(matrix(runif(5 * 1e4), ncol = 20))
-medians <- vapply(x, median, numeric(1))
-y <- as.list(x)
-
-bench::mark(
-  "data.frame" = med(x, medians),
-  "list" = med(y, medians)
-)
-```
-
-**WOW!**
-
-
-### Environmments {-}
-- Always modified in place (**reference semantics**)
-- Interesting because if you modify the environment, all existing bindings have the same reference
-- If two names point to the same environment, and you update one, you update both!
-
-```{r}
-e1 <- rlang::env(a = 1, b = 2, c = 3)
-e2 <- e1
-e1$c <- 4
-e2$c
-```
-
-- This means that environments can contain themselves (!)
-
-### Exercises {-}
-
-##### 1. Why isn't this circular? {-}
-```{r}
-x <- list()
-x[[1]] <- x
-```
-
-> Because the binding to the list() object moves from `x` in the first line to `x[[1]]` in the second.
-
-##### 2. (see "Objects with a single binding") {-}
-
-##### 3. What happens if you attempt to use tracemem() on an environment? {-}
-
-```{r}
-#| error: true
-e1 <- rlang::env(a = 1, b = 2, c = 3)
-tracemem(e1)
-```
-
-> Because environments always modified in place, there's no point in tracing them
-
-
-## Unbinding and the garbage collector {-}
-
-- If you delete the 'name' bound to an object, the object still exists
-- R runs a "garbage collector" (GC) to remove these objects when it needs more memory
-- "Looking from the outside, it’s basically impossible to predict when the GC will run. In fact, you shouldn’t even try."
-- If you want to know when it runs, use `gcinfo(TRUE)` to get a message printed
-- You can force GC with `gc()`, but you never need to call it to free up memory for use *within* R
-- The only reasons to do so are to free memory for other system software, or to see the
-message printed about how much memory is being used
-
-```{r}
-gc()
-mem_used()
-```
-
-- These numbers will **not** match what your OS reports because:
-  1. They include objects created by R, but not the R interpreter itself
-  2. R and the OS are lazy and don't reclaim/release memory until it's needed
-  3. R counts the memory occupied by objects, but deleted objects leave gaps ->
-  *memory fragmentation* [less memory is actually available than you might think]
-
-
-## Meeting Videos {-}
-
-### Cohort 1 {-}
-
-(no video recorded)
-
-### Cohort 2 {-}
-
-`r knitr::include_url("https://www.youtube.com/embed/pCiNj2JRK50")`
-
-### Cohort 3 {-}
-
-`r knitr::include_url("https://www.youtube.com/embed/-bEXdOoxO_E")`
-
-### Cohort 4 {-}
-
-`r knitr::include_url("https://www.youtube.com/embed/gcVU_F-L6zY")`
-
-### Cohort 5 {-}
-
-`r knitr::include_url("https://www.youtube.com/embed/aqcvKox9V0Q")`
-
-### Cohort 6 {-}
-
-`r knitr::include_url("https://www.youtube.com/embed/O4Oo_qO7SIY")`
-
-<details>
-<summary> Meeting chat log </summary>
-
-```
-00:16:57	Federica Gazzelloni:	cohort 2 video: https://www.youtube.com/watch?v=pCiNj2JRK50
-00:18:39	Federica Gazzelloni:	cohort 2 presentation: https://r4ds.github.io/bookclub-Advanced_R/Presentations/Week02/Cohort2_America/Chapter2Slides.html#1
-00:40:24	Arthur Shaw:	Just the opposite, Ryan. Very clear presentation!
-00:51:54	Trevin:	parquet?
-00:53:00	Arthur Shaw:	We may all be right. {arrow} looks to deal with feather and parquet files: https://arrow.apache.org/docs/r/
-01:00:04	Arthur Shaw:	Some questions for future meetings. (1) I find Ryan's use of slides hugely effective in conveying information. Would it be OK if future sessions (optionally) used slides? If so, should/could we commit slides to some folder on the repo? (2) I think reusing the images from Hadley's books really helps understanding and discussion. Is that OK to do? Here I'm thinking about copyright concerns. (If possible, I would rather not redraw variants of Hadley's images.)
-01:01:35	Federica Gazzelloni:	It's all ok, you can use past presentation, you don't need to push them to the repo, you can use the images from the book
-01:07:19	Federica Gazzelloni:	Can I use: gc(reset = TRUE) safely?
-```
-</details>
-
-### Cohort 7 {-}
-
-`r knitr::include_url("https://www.youtube.com/embed/kpAUoGO6elE")`
-
-<details>
-
-<summary>Meeting chat log</summary>
-```
-00:09:40	Ryan Honomichl:	https://drdoane.com/three-deep-truths-about-r/
-00:12:51	Robert Hilly:	Be right back
-00:36:12	Ryan Honomichl:	brb
-00:41:18	Ron:	I tried mapply and also got different answers
-00:41:44	collinberke:	Interesting, would like to know more what is going on.
-00:49:57	Robert Hilly:	simple_map <- function(x, f, ...) {
-  out <- vector("list", length(x))
-  for (i in seq_along(x)) {
-    out[[i]] <- f(x[[i]], ...)
-  }
-  out
-}
-```
-</details>
diff --git a/styles.css b/styles.css
@@ -0,0 +1,3 @@
+#listing-videos .listing-item-img-placeholder {
+  display: none;
+}
diff --git a/videos/01.qmd b/videos/01.qmd
@@ -1,67 +0,0 @@
----
-title: Meeting Videos
----
-
-## Cohort 1
-
-(no video recorded)
-
-## Cohort 2
-
-{{< video https://www.youtube.com/embed/PCG52lU_YlA >}}
-
-## Cohort 3
-
-{{< video https://www.youtube.com/embed/f6PuOnuZWBc >}}
-
-## Cohort 4
-
-{{< video https://www.youtube.com/embed/qDaJvX-Mpls >}}
-
-## Cohort 5
-
-{{< video https://www.youtube.com/embed/BvmiQlWOP5o >}}
-
-## Cohort 6
-
-{{< video https://www.youtube.com/embed/dH72riiXrVI >}}
-
-<details>
-<summary> Meeting chat log </summary>
-
-```
-00:14:40	SriRam:	From Toronto, Civil Engineer. I use R for infrastructure planning/ GIS. Here coz of the ping 😄 , was not ready with a good computer with mic/audio !
-00:15:20	SriRam:	I was with Ryan, Federica on other courses
-00:23:21	SriRam:	I think the only caution is about Copyright issues
-00:31:32	Ryan Metcalf:	Citation, giving credit back to source. Great comment SriRam.
-00:34:33	SriRam:	one = one, in my opinion
-00:41:53	Ryan Metcalf:	https://docs.google.com/spreadsheets/d/1_WFY82UxAdvP4GUdZ2luh15quwdO1n0Km3Q0tfYuqvc/edit#gid=0
-00:48:35	Arthur Shaw:	The README has a nice step-by-step process at the bottom: https://github.com/r4ds/bookclub-advr#how-to-present. I've not done this myself yet, but it looks fairly straightforward.
-00:54:13	lucus w:	Thanks Ryan. Probably {usethis} will be easier. It looks straight forward
-01:00:02	Moria W.:	Thank you for sharing that. This has been good!
-01:00:08	Vaibhav Janve:	Thank you
-01:00:44	Federica Gazzelloni:	hi SriRam we are going..
-```
-</details>
-
-## Cohort 7
-
-{{< video https://www.youtube.com/embed/vfTg6upHvO4 >}}
-{{< video https://www.youtube.com/embed/3wRyE6-3OKQ >}}
-
-<details>
-
-<summary>Meeting chat log</summary>
-```
-00:20:42	collinberke:	https://rich-iannone.github.io/pointblank/
-00:27:36	Ryan Honomichl:	brb
-00:37:05	collinberke:	https://rstudio.github.io/renv/articles/renv.html
-00:51:52	Ryan Honomichl:	gotta sign off I'll be ready to lead chapter 2 next week!
-00:52:43	collinberke:	https://r4ds.had.co.nz/iteration.html
-00:59:44	collinberke:	https://mastering-shiny.org/action-tidy.html
-01:00:12	collinberke:	https://dplyr.tidyverse.org/articles/programming.html
-01:05:02	collinberke:	https://usethis.r-lib.org/reference/create_from_github.html
-01:05:53	collinberke:	https://github.com/r4ds/bookclub-advr
-01:06:28	Ron:	I gotta run ,  fun conversation, and nice to meet you Matthew !
-```
-</details>
diff --git a/videos/01/02.qmd b/videos/01/02.qmd
@@ -0,0 +1,5 @@
+---
+title: Cohort 2
+---
+
+{{< video https://www.youtube.com/embed/PCG52lU_YlA >}}
diff --git a/videos/01/03.qmd b/videos/01/03.qmd
@@ -0,0 +1,5 @@
+---
+title: Cohort 3
+---
+
+{{< video https://www.youtube.com/embed/f6PuOnuZWBc >}}
diff --git a/videos/01/04.qmd b/videos/01/04.qmd
@@ -0,0 +1,5 @@
+---
+title: Cohort 4
+---
+
+{{< video https://www.youtube.com/embed/qDaJvX-Mpls >}}
diff --git a/videos/01/05.qmd b/videos/01/05.qmd
@@ -0,0 +1,5 @@
+---
+title: Cohort 5
+---
+
+{{< video https://www.youtube.com/embed/BvmiQlWOP5o >}}
diff --git a/videos/01/06.qmd b/videos/01/06.qmd
@@ -0,0 +1,23 @@
+---
+title: Cohort 6
+---
+
+{{< video https://www.youtube.com/embed/dH72riiXrVI >}}
+
+<details>
+<summary> Meeting chat log </summary>
+
+```
+00:14:40	SriRam:	From Toronto, Civil Engineer. I use R for infrastructure planning/ GIS. Here coz of the ping 😄 , was not ready with a good computer with mic/audio !
+00:15:20	SriRam:	I was with Ryan, Federica on other courses
+00:23:21	SriRam:	I think the only caution is about Copyright issues
+00:31:32	Ryan Metcalf:	Citation, giving credit back to source. Great comment SriRam.
+00:34:33	SriRam:	one = one, in my opinion
+00:41:53	Ryan Metcalf:	https://docs.google.com/spreadsheets/d/1_WFY82UxAdvP4GUdZ2luh15quwdO1n0Km3Q0tfYuqvc/edit#gid=0
+00:48:35	Arthur Shaw:	The README has a nice step-by-step process at the bottom: https://github.com/r4ds/bookclub-advr#how-to-present. I've not done this myself yet, but it looks fairly straightforward.
+00:54:13	lucus w:	Thanks Ryan. Probably {usethis} will be easier. It looks straight forward
+01:00:02	Moria W.:	Thank you for sharing that. This has been good!
+01:00:08	Vaibhav Janve:	Thank you
+01:00:44	Federica Gazzelloni:	hi SriRam we are going..
+```
+</details>
diff --git a/videos/01/07.qmd b/videos/01/07.qmd
@@ -0,0 +1,23 @@
+---
+title: Cohort 7
+---
+
+{{< video https://www.youtube.com/embed/vfTg6upHvO4 >}}
+{{< video https://www.youtube.com/embed/3wRyE6-3OKQ >}}
+
+<details>
+
+<summary>Meeting chat log</summary>
+```
+00:20:42	collinberke:	https://rich-iannone.github.io/pointblank/
+00:27:36	Ryan Honomichl:	brb
+00:37:05	collinberke:	https://rstudio.github.io/renv/articles/renv.html
+00:51:52	Ryan Honomichl:	gotta sign off I'll be ready to lead chapter 2 next week!
+00:52:43	collinberke:	https://r4ds.had.co.nz/iteration.html
+00:59:44	collinberke:	https://mastering-shiny.org/action-tidy.html
+01:00:12	collinberke:	https://dplyr.tidyverse.org/articles/programming.html
+01:05:02	collinberke:	https://usethis.r-lib.org/reference/create_from_github.html
+01:05:53	collinberke:	https://github.com/r4ds/bookclub-advr
+01:06:28	Ron:	I gotta run ,  fun conversation, and nice to meet you Matthew !
+```
+</details>