bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git
Log | Files | Refs | README | LICENSE

commit 337e4db47cd95fe263091f095e4b96368ad16d40
parent 7d2c3f9c8a241036a80ae4b406d37487a7ae0a73
Author: Jon Harmon <jonthegeek@gmail.com>
Date:   Sun,  3 Aug 2025 10:32:14 -0500

Convert to quarto (partially) (#78)


Diffstat:
M.Rbuildignore | 1+
A.github/workflows/build_site.yml | 36++++++++++++++++++++++++++++++++++++
D.github/workflows/deploy_bookdown.yml | 10----------
M.github/workflows/pr_check.yml | 24+++++++++++++++++++++---
M.github/workflows/pr_check_readme.yml | 6++++--
M.gitignore | 13++++++-------
D01_Introduction.Rmd | 239-------------------------------------------------------------------------------
D05_Control_flow.Rmd | 487-------------------------------------------------------------------------------
D08_Conditions.Rmd | 549-------------------------------------------------------------------------------
D10_Function_factories.Rmd | 672-------------------------------------------------------------------------------
D13_S3.Rmd | 417-------------------------------------------------------------------------------
D15_S4.Rmd | 360-------------------------------------------------------------------------------
D24_Improving_performance.Rmd | 290-------------------------------------------------------------------------------
MDESCRIPTION | 7++++---
MREADME.md | 18++++++++++--------
D_bookdown.yml | 8--------
D_output.yml | 17-----------------
A_quarto.yml | 96+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Dbook.bib | 10----------
Mbookclub-advr.Rproj | 2--
Dbookclub-advr_cache/html/__packages | 25-------------------------
Dbookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.RData | 0
Dbookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdb | 0
Dbookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdx | 0
Dbookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.RData | 0
Dbookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdb | 0
Dbookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdx | 0
Dbookclub-advr_cache/html/unnamed-chunk-485_24d0d80a65ab19bc80ef07efabb9b0d2.RData | 0
Dbookclub-advr_cache/html/unnamed-chunk-485_24d0d80a65ab19bc80ef07efabb9b0d2.rdb | 0
Dbookclub-advr_cache/html/unnamed-chunk-485_24d0d80a65ab19bc80ef07efabb9b0d2.rdx | 0
Dbookclub-advr_cache/html/unnamed-chunk-535_b3e7c9558026489184160fc00b8ca8a9.RData | 0
Dbookclub-advr_cache/html/unnamed-chunk-535_b3e7c9558026489184160fc00b8ca8a9.rdb | 0
Dbookclub-advr_cache/html/unnamed-chunk-535_b3e7c9558026489184160fc00b8ca8a9.rdx | 0
Dindex.Rmd | 87-------------------------------------------------------------------------------
Aindex.qmd | 16++++++++++++++++
Dpreamble.tex | 1-
Aslides/00-club_intro.qmd | 53+++++++++++++++++++++++++++++++++++++++++++++++++++++
Aslides/01-introduction.qmd | 105+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
R02_Names_and_values.Rmd -> slides/02_Names_and_values.Rmd | 0
R03_Vectors.Rmd -> slides/03_Vectors.Rmd | 0
R04_Subsetting.Rmd -> slides/04_Subsetting.Rmd | 0
Aslides/05_Control_flow.Rmd | 487+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
R06_Functions.Rmd -> slides/06_Functions.Rmd | 0
R07_Environments.Rmd -> slides/07_Environments.Rmd | 0
Aslides/08_Conditions.Rmd | 549+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
R09_Functionals.Rmd -> slides/09_Functionals.Rmd | 0
Aslides/10_Function_factories.Rmd | 672+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
R11_Function_operators.Rmd -> slides/11_Function_operators.Rmd | 0
R12_Base_types.Rmd -> slides/12_Base_types.Rmd | 0
Aslides/13_S3.Rmd | 417+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
R14_R6.Rmd -> slides/14_R6.Rmd | 0
Aslides/15_S4.Rmd | 360+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
R16_Trade-offs.Rmd -> slides/16_Trade-offs.Rmd | 0
R17_Big_picture.Rmd -> slides/17_Big_picture.Rmd | 0
R18_Expressions.Rmd -> slides/18_Expressions.Rmd | 0
R19_Quasiquotation.Rmd -> slides/19_Quasiquotation.Rmd | 0
R20_Evaluation.Rmd -> slides/20_Evaluation.Rmd | 0
R21_Translating_R_code.Rmd -> slides/21_Translating_R_code.Rmd | 0
R22_Debugging.Rmd -> slides/22_Debugging.Rmd | 0
R23_Measuring_performance.Rmd -> slides/23_Measuring_performance.Rmd | 0
Aslides/24_Improving_performance.Rmd | 290+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
R25_Rewriting_R_code_in_C++.Rmd -> slides/25_Rewriting_R_code_in_C++.Rmd | 0
Aslides/_metadata.yml | 16++++++++++++++++
Aslides/custom.scss | 31+++++++++++++++++++++++++++++++
Rimages/01-hadley-image1.jpeg -> slides/images/01-hadley-image1.jpeg | 0
Rimages/01-hadley-image2.jpeg -> slides/images/01-hadley-image2.jpeg | 0
Rimages/02-character-2.png -> slides/images/02-character-2.png | 0
Rimages/02-copy_on_modify_fig2.png -> slides/images/02-copy_on_modify_fig2.png | 0
Rimages/02-l-modify-2.png -> slides/images/02-l-modify-2.png | 0
Rimages/02-trace.png -> slides/images/02-trace.png | 0
Rimages/06_forms.png -> slides/images/06_forms.png | 0
Rimages/06_functions.png -> slides/images/06_functions.png | 0
Rimages/10-1-factories.png -> slides/images/10-1-factories.png | 0
Rimages/10-2-how.jpg -> slides/images/10-2-how.jpg | 0
Rimages/10-3-procedure.png -> slides/images/10-3-procedure.png | 0
Rimages/11-function_operators.png -> slides/images/11-function_operators.png | 0
Rimages/11-maths_example.png -> slides/images/11-maths_example.png | 0
Rimages/14-four-pillars.png -> slides/images/14-four-pillars.png | 0
Rimages/14-r6-logo.png -> slides/images/14-r6-logo.png | 0
Rimages/14-r6_active_field.png -> slides/images/14-r6_active_field.png | 0
Rimages/14-r6_environment.png -> slides/images/14-r6_environment.png | 0
Rimages/16-objects.png -> slides/images/16-objects.png | 0
Rimages/16-oop.png -> slides/images/16-oop.png | 0
Rimages/16-trade-offs.png -> slides/images/16-trade-offs.png | 0
Rimages/23_code_faster.jpeg -> slides/images/23_code_faster.jpeg | 0
Rimages/23_microbenchmarking.jpeg -> slides/images/23_microbenchmarking.jpeg | 0
Rimages/9_2_3_map-arg.png -> slides/images/9_2_3_map-arg.png | 0
Rimages/9_5_1-reduce.png -> slides/images/9_5_1-reduce.png | 0
Rimages/ambig-order.png -> slides/images/ambig-order.png | 0
Rimages/base_types/base_types_Sankey_graph.png -> slides/images/base_types/base_types_Sankey_graph.png | 0
Rimages/base_types/john_chambers_about_objects.png -> slides/images/base_types/john_chambers_about_objects.png | 0
Rimages/base_types/sloop_john_b.png -> slides/images/base_types/sloop_john_b.png | 0
Rimages/base_types/standards.png -> slides/images/base_types/standards.png | 0
Rimages/browse.png -> slides/images/browse.png | 0
Rimages/browser.png -> slides/images/browser.png | 0
Rimages/browser2.png -> slides/images/browser2.png | 0
Rimages/call-call.png -> slides/images/call-call.png | 0
Rimages/case_study.jpg -> slides/images/case_study.jpg | 0
Rimages/complicated.png -> slides/images/complicated.png | 0
Rimages/debug-toolbar.png -> slides/images/debug-toolbar.png | 0
Rimages/fa.png -> slides/images/fa.png | 0
Rimages/forloop.png -> slides/images/forloop.png | 0
Rimages/lazy-evaluation.png -> slides/images/lazy-evaluation.png | 0
Rimages/locating-errors.png -> slides/images/locating-errors.png | 0
Rimages/map_variants.png -> slides/images/map_variants.png | 0
Rimages/non-interractive-debugging.png -> slides/images/non-interractive-debugging.png | 0
Rimages/options.png -> slides/images/options.png | 0
Rimages/pmap.png -> slides/images/pmap.png | 0
Rimages/prefix.png -> slides/images/prefix.png | 0
Rimages/print-debug.png -> slides/images/print-debug.png | 0
Rimages/print-debugging.png -> slides/images/print-debugging.png | 0
Rimages/print-recover.png -> slides/images/print-recover.png | 0
Rimages/recover.png -> slides/images/recover.png | 0
Rimages/reduce-init.png -> slides/images/reduce-init.png | 0
Rimages/reduce2-init.png -> slides/images/reduce2-init.png | 0
Rimages/show-traceback.png -> slides/images/show-traceback.png | 0
Rimages/simple.png -> slides/images/simple.png | 0
Rimages/subsetting/hadley-tweet.png -> slides/images/subsetting/hadley-tweet.png | 0
Rimages/subsetting/train-1.png -> slides/images/subsetting/train-1.png | 0
Rimages/subsetting/train-2.png -> slides/images/subsetting/train-2.png | 0
Rimages/subsetting/train-3.png -> slides/images/subsetting/train-3.png | 0
Rimages/trace-env.png -> slides/images/trace-env.png | 0
Rimages/traceback.png -> slides/images/traceback.png | 0
Rimages/translating/calculus_cat.png -> slides/images/translating/calculus_cat.png | 0
Rimages/translating/greek_letters.txt -> slides/images/translating/greek_letters.txt | 0
Rimages/translating/tags.txt -> slides/images/translating/tags.txt | 0
Rimages/translating/tags_r_venn.png -> slides/images/translating/tags_r_venn.png | 0
Rimages/vectors/atomic.png -> slides/images/vectors/atomic.png | 0
Rimages/vectors/attr-names-1.png -> slides/images/vectors/attr-names-1.png | 0
Rimages/vectors/attr-names-2.png -> slides/images/vectors/attr-names-2.png | 0
Rimages/vectors/attr.png -> slides/images/vectors/attr.png | 0
Rimages/vectors/bayes_rules_textbook.png -> slides/images/vectors/bayes_rules_textbook.png | 0
Rimages/vectors/culmen_depth.png -> slides/images/vectors/culmen_depth.png | 0
Rimages/vectors/lter_penguins.png -> slides/images/vectors/lter_penguins.png | 0
Rimages/vectors/lter_penguins_no_gentoo.png -> slides/images/vectors/lter_penguins_no_gentoo.png | 0
Rimages/vectors/summary-tree-atomic.png -> slides/images/vectors/summary-tree-atomic.png | 0
Rimages/vectors/summary-tree-s3-1.png -> slides/images/vectors/summary-tree-s3-1.png | 0
Rimages/vectors/summary-tree-s3-2.png -> slides/images/vectors/summary-tree-s3-2.png | 0
Rimages/vectors/summary-tree.png -> slides/images/vectors/summary-tree.png | 0
Rimages/vectors/surly_tibbles.png -> slides/images/vectors/surly_tibbles.png | 0
Rimages/view_expr.png -> slides/images/view_expr.png | 0
Rimages/view_expression.png -> slides/images/view_expression.png | 0
Rimages/walk.png -> slides/images/walk.png | 0
Rimages/walk2.png -> slides/images/walk2.png | 0
Rimages/we-did-it-celebration-meme.jpg -> slides/images/we-did-it-celebration-meme.jpg | 0
Rimages/whatif.png -> slides/images/whatif.png | 0
Rimages/whatif2.png -> slides/images/whatif2.png | 0
Rimages/with_abort.png -> slides/images/with_abort.png | 0
Rscripts/16-example_accumulator_programming.R -> slides/scripts/16-example_accumulator_programming.R | 0
Rscripts/profiling-example.R -> slides/scripts/profiling-example.R | 0
Astyles.css | 0
Avideos/01.qmd | 67+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
152 files changed, 3241 insertions(+), 3197 deletions(-)

diff --git a/.Rbuildignore b/.Rbuildignore @@ -1,2 +1,3 @@ ^.*\.Rproj$ ^\.Rproj\.user$ +^\.github$ diff --git a/.github/workflows/build_site.yml b/.github/workflows/build_site.yml @@ -0,0 +1,36 @@ +on: + push: + branches: main + paths-ignore: + - 'README.md' + workflow_dispatch: + +name: Render and deploy site + +jobs: + render: + if: startsWith(github.repository, 'r4ds/') + runs-on: ubuntu-latest + permissions: + contents: write + steps: + - name: Check out repository + uses: actions/checkout@v4 + + - name: Set up Quarto + uses: quarto-dev/quarto-actions/setup@v2 + + - name: Set up R + uses: r-lib/actions/setup-r@v2 + with: + use-public-rspm: true + + - name: Set up DESCRIPTION dependencies + uses: r-lib/actions/setup-r-dependencies@v2 + + - name: Render & Publish + uses: quarto-dev/quarto-actions/publish@v2 + with: + target: gh-pages + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} diff --git a/.github/workflows/deploy_bookdown.yml b/.github/workflows/deploy_bookdown.yml @@ -1,10 +0,0 @@ -on: - push: - branches: main - paths-ignore: - - 'README.md' - workflow_dispatch: - -jobs: - bookdown: - uses: r4ds/r4dsactions/.github/workflows/render_pages.yml@main diff --git a/.github/workflows/pr_check.yml b/.github/workflows/pr_check.yml @@ -3,8 +3,26 @@ on: branches: main paths-ignore: - 'README.md' - workflow_dispatch: jobs: - pr_check: - uses: r4ds/r4dsactions/.github/workflows/render_check.yml@main + render_site: + runs-on: ubuntu-latest + steps: + - name: Check out repository + uses: actions/checkout@v4 + + - name: Set up Quarto + uses: quarto-dev/quarto-actions/setup@v2 + + - name: Set up R + uses: r-lib/actions/setup-r@v2 + with: + use-public-rspm: true + + - name: Set up DESCRIPTION dependencies + uses: r-lib/actions/setup-r-dependencies@v2 + + - name: Render + uses: quarto-dev/quarto-actions/render@v2 + with: + to: html diff --git a/.github/workflows/pr_check_readme.yml b/.github/workflows/pr_check_readme.yml @@ -6,5 +6,7 @@ on: workflow_dispatch: jobs: - pr_check: - uses: r4ds/r4dsactions/.github/workflows/render_check_readme.yml@main + render_site: + runs-on: ubuntu-latest + steps: + - run: 'echo "No build required" ' diff --git a/.gitignore b/.gitignore @@ -3,11 +3,10 @@ .Rdata .httr-oauth .DS_Store -_book -_bookdown_files -bookclub-advr.Rmd -bookclub-advr.html -bookclub-advr.knit.md -bookclub-advr_files -libs + +# Ignore things during render. *.html +*.rmarkdown +/.quarto/ +/_site/ +/site_libs/ diff --git a/01_Introduction.Rmd b/01_Introduction.Rmd @@ -1,239 +0,0 @@ -# Introduction - -**Learning objectives:** - -**For the entire book:** - -- Improve programming skills. -- Develop a deep understanding of the R language fundamentals. -- Understand what functional programming means. -- Understand object-oriented programming as applied in R. -- Understand metaprogramming while developing in R. - -**For this chapter (includes the Welcome and Preface):** - -- Recognize the differences between the 1st and 2nd edition of this book. -- Describe the overall structure of the book. -- Decide whether this book is right for you. - -Books suggestions: - -- [The Structure and Interpretation of Computer Programs (SICP)](https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book.html) -- [Concepts, Techniques and Models of Computer Programming](https://mitpress.mit.edu/books/concepts-techniques-and-models-computer-programming) -- [The Pragmatic Programmer](https://pragprog.com/titles/tpp20/the-pragmatic-programmer-20th-anniversary-edition/) - -## What's new? - -> "The [first edition](http://adv-r.had.co.nz) used base R functions almost exclusively, this version of the book expands into more advanced functions provided by other pakages." - -> "Use of new packages, particularly rlang, which provides a clean interface to low-level data structures and operations." - -```{r, echo=FALSE,out.width="49%",out.height="49%",fig.show='hold',fig.align='center', fig.cap="Twitter: `@hadleywickham` - 6 June 2019"} -knitr::include_graphics(c("images/01-hadley-image1.jpeg","images/01-hadley-image2.jpeg")) -``` - -## Overview of the book structure - -* The book is composed of five sections. A step by step path towards mastering R techniques. -* The **Foundations** is the part in which the R components will be examined. It will help understanding how to use all the basics tools to deal with functions and structures. -* The **Functional programming** goes a little more in dept into programming with R, making functions of functions. Describing function factories and operators. -* The **Object-oriented programming** - OOP is a five chapter section, all about object oriented systems among S3, R6 and S4. -* The **Metaprogramming** section introduces you through the programming layers. -* Finally, the **Techniques** section is dedicated to finding and fixing bugs and improving performances. - -### What this book is not - -* This book is about the R programming language, not R as the data analysis tool. - * Other books cover this topic. - * [R for Data Science](https://r4ds.had.co.nz/). -* We will not be discussing package development at length. - * [R Packages](https://r-pkgs.org/). -* Some experience using R will be helpful. - * I encourage all to participate, though. - - -### Organization of the book - -```{r dia-lib,include=FALSE} -library(DiagrammeR) -``` -<center> -```{r c00, echo=FALSE, fig.align='center', fig.dim="100%"} -DiagrammeR(" - graph TD - A{Foundations}-->B(Functional programming) - B-->C(Object-oriented programming) - C-->D(Metaprogramming) - D-->E(Techniques) - ") -``` -</center> - -### Foundations - -Six chapters to learn the foundational components of R. -<center> -```{r c01, echo=FALSE, fig.align='center', fig.dim="100%"} -DiagrammeR(" - graph TD - A{Foundations}-->B(Names and values) - A-->C(Control flow) - C-->E(Functions) - B-->D(Vectors) - D-->F(Subsetting) - E-->G(Environment) - G-->H(Conditions) -F-->H - ") -``` -</center> -The last chapter "conditions" describe errors, warnings, and messages. - - -### Functional programming - -This part of the book is dedicated to functions: function factories and operators. - -<center> -```{r c02,echo=FALSE,fig.align='center',fig.dim="100%"} -DiagrammeR(" - graph TD - A{Functional programming}-->B(Functionals) - B-->C(Function factories) - B-->E(Function operators) - ") -``` -</center> - -### Object-oriented programming - - -OOP is the most dense part of the book, as it mentions about systems which interact with R. - -<center> -```{r c03,echo=FALSE,fig.align='center',fig.dim="100%"} -DiagrammeR(" - graph TD - A{Object-oriented programming}-->B(Base types) - B-->C(S3) - B-->E(R6) - B-->D(S4) - D-->F(Trade-offs) - E-->F - C-->F -") -``` -</center> - -### Metaprogramming - -This is the part of the book where things are blended to the **Big Picture**. R is a versatile functional language that can be managed and assembled. - -<center> -```{r c04,echo=FALSE,fig.align='center',fig.dim="100%"} -DiagrammeR(" - graph TD - A{Metaprogramming}-->B(Big Picture) - B-->C(Expressions) - B-->E(Quasiquotation) - B-->D(Evaluation) - - D-->F(Translating R code) -E-->F -C-->F - - ") -``` -</center> - - -### Techniques - -Finally, this is the last section of the book, where debugging is used to measure and improve performance. And how to improve performance by rewriting key functions in C++. - -<center> -```{r c05,echo=FALSE,fig.align='center',fig.dim="100%"} -DiagrammeR(" - graph TD - A{Techniques}-->B(Debugging) - B-->C(Measuring performance) - B-->E(Improving performance) - C-->D(Rewriting R code in C++) -E-->D - - - ") -``` -</center> - -## Resources - -- [first edition](http://adv-r.had.co.nz) -- [advanced-r-solutions](https://advanced-r-solutions.rbind.io/) - - -## Meeting Videos - -### Cohort 1 - -(no video recorded) - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/PCG52lU_YlA")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/f6PuOnuZWBc")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/qDaJvX-Mpls")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/BvmiQlWOP5o")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/dH72riiXrVI")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:14:40 SriRam: From Toronto, Civil Engineer. I use R for infrastructure planning/ GIS. Here coz of the ping 😄 , was not ready with a good computer with mic/audio ! -00:15:20 SriRam: I was with Ryan, Federica on other courses -00:23:21 SriRam: I think the only caution is about Copyright issues -00:31:32 Ryan Metcalf: Citation, giving credit back to source. Great comment SriRam. -00:34:33 SriRam: one = one, in my opinion -00:41:53 Ryan Metcalf: https://docs.google.com/spreadsheets/d/1_WFY82UxAdvP4GUdZ2luh15quwdO1n0Km3Q0tfYuqvc/edit#gid=0 -00:48:35 Arthur Shaw: The README has a nice step-by-step process at the bottom: https://github.com/r4ds/bookclub-advr#how-to-present. I've not done this myself yet, but it looks fairly straightforward. -00:54:13 lucus w: Thanks Ryan. Probably {usethis} will be easier. It looks straight forward -01:00:02 Moria W.: Thank you for sharing that. This has been good! -01:00:08 Vaibhav Janve: Thank you -01:00:44 Federica Gazzelloni: hi SriRam we are going.. -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/vfTg6upHvO4")` -`r knitr::include_url("https://www.youtube.com/embed/3wRyE6-3OKQ")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:20:42 collinberke: https://rich-iannone.github.io/pointblank/ -00:27:36 Ryan Honomichl: brb -00:37:05 collinberke: https://rstudio.github.io/renv/articles/renv.html -00:51:52 Ryan Honomichl: gotta sign off I'll be ready to lead chapter 2 next week! -00:52:43 collinberke: https://r4ds.had.co.nz/iteration.html -00:59:44 collinberke: https://mastering-shiny.org/action-tidy.html -01:00:12 collinberke: https://dplyr.tidyverse.org/articles/programming.html -01:05:02 collinberke: https://usethis.r-lib.org/reference/create_from_github.html -01:05:53 collinberke: https://github.com/r4ds/bookclub-advr -01:06:28 Ron: I gotta run , fun conversation, and nice to meet you Matthew ! -``` -</details> diff --git a/05_Control_flow.Rmd b/05_Control_flow.Rmd @@ -1,487 +0,0 @@ -# Control flow - -**Learning objectives:** - -- Learn the **tools** for controlling flow of execution. - -- Learn some technical pitfalls and (perhaps lesser known) useful features. - -```{r echo = FALSE, fig.align = 'left', fig.dim = '100%'} -knitr::include_graphics("images/whatif2.png") -``` -```{r echo = FALSE, fig.align = 'right', fig.dim = '100%'} -knitr::include_graphics("images/forloop.png") -``` - ---- - -## Introduction - -There are two main groups of flow control tools: **choices** and **loops**: - -- Choices (`if`, `switch`, `ifelse`, `dplyr::if_else`, `dplyr::case_when`) allow you to run different code depending on the input. - -- Loops (`for`, `while`, `repeat`) allow you to repeatedly run code - - ---- - - -## Choices - - - -`if()` and `else` - -Use `if` to specify a block of code to be executed, if a specified condition is true. Use `else` to specify a block of code to be executed, if the same condition is false. - -```{r, eval=FALSE} -if (condition) true_action -if (condition) true_action else false_action -``` - -(Note braces are only *needed* for compound expressions) - -```{r eval=FALSE, include=T} -if (test_expression) { - true_action -} else { - false_action -} -``` - - -Can be expanded to more alternatives: - -```{r, eval=FALSE} -if (test_expression) { - true_action -} else if (other_test_expression) { - other_action -} else { - false_action -} -``` - - -## Exercise {-} -Why does this work? -``` -x <- 1:10 -if (length(x)) "not empty" else "empty" -#> [1] "not empty" - -x <- numeric() -if (length(x)) "not empty" else "empty" -#> [1] "empty" -``` - -`if` returns a value which can be assigned - -```{r} -x1 <- if (TRUE) 1 else 2 -x2 <- if (FALSE) 1 else 2 - -c(x1, x2) -``` - -The book recommends assigning the results of an if statement only when the entire expression fits on one line; otherwise it tends to be hard to read. - - -## Single if without else {-} - -When you use the single argument form without an else statement, if invisibly (Section 6.7.2) returns NULL if the condition is FALSE. Since functions like c() and paste() drop NULL inputs, this allows for a compact expression of certain idioms: - -```{r, eval=FALSE} -greet <- function(name, birthday = FALSE) { - paste0( - "Hi ", name, - if (birthday) " and HAPPY BIRTHDAY" - ) -} -greet("Maria", FALSE) -#> [1] "Hi Maria" -greet("Jaime", TRUE) -#> [1] "Hi Jaime and HAPPY BIRTHDAY" -``` - - - -```{r, eval=FALSE} -format_lane_text <- function(number){ - - paste0( - number, - " lane", - if (number > 1) "s", - " of sequencing" - ) -} - -format_lane_text(1) -#> [1] "1 lane of sequencing" -format_lane_text(4) -#> [1] "4 lanes of sequencing" -``` - - - - -## Invalid inputs {-} - -- *Condition* must evaluate to a *single* `TRUE` or `FALSE` - -A single number gets coerced to a logical type. - -```{r, eval=FALSE} -if (56) 1 -#> [1] 1 -if (0.3) 1 -#> [1] 1 -if (0) 1 -``` - -If the condition cannot evaluate to a *single* `TRUE` or `FALSE`, an error is (usually) produced. - -```{r, eval=FALSE} -if ("text") 1 -#> Error in if ("text") 1: argument is not interpretable as logical -if ("true") 1 -#> 1 -if (numeric()) 1 -#> Error in if (numeric()) 1: argument is of length zero -if (NULL) 1 -#> Error in if (NULL) 1 : argument is of length zero -if (NA) 1 -#> Error in if (NA) 1: missing value where TRUE/FALSE needed -``` - - -Exception is a logical vector of length greater than 1, which only generates a warning, unless you have `_R_CHECK_LENGTH_1_CONDITION_` set to `TRUE`. -This seems to have been the default since R-4.2.0 - -```{r, eval=FALSE} -if (c(TRUE, FALSE)) 1 -#>Error in if (c(TRUE, FALSE)) 1 : the condition has length > 1 -``` - -## Vectorized choices {-} - -- `ifelse()` is a vectorized version of `if`: - -```{r, eval=FALSE} -x <- 1:10 -ifelse(x %% 5 == 0, "XXX", as.character(x)) -#> [1] "1" "2" "3" "4" "XXX" "6" "7" "8" "9" "XXX" - -ifelse(x %% 2 == 0, "even", "odd") -#> [1] "odd" "even" "odd" "even" "odd" "even" "odd" "even" "odd" "even" -``` - -- `dplyr::if_else()` - -- Book recommends only using `ifelse()` "only when the yes and no vectors are the same type as it is otherwise hard to predict the output type." - -- `dplyr::if_else()` enforces this recommendation. - -**For example:** - -```{r eval=FALSE, include=T} -ifelse(c(TRUE,TRUE,FALSE),"a",3) -#> [1] "a" "a" "3" -dplyr::if_else(c(TRUE,TRUE,FALSE),"a",3) -#> Error in `dplyr::if_else()`: -#> ! `false` must be a character vector, not a double vector. -``` - -## Switch {-} - -Rather then string together multiple if - else if chains, you can often use `switch`. - - -```{r message=FALSE, warning=FALSE} -centre <- function(x, type) { - switch(type, - mean = mean(x), - median = median(x), - trimmed = mean(x, trim = .1), - stop("Invalid `type` value") - ) -} -``` - -Last component should always throw an error, as unmatched inputs would otherwise invisibly return NULL. -Book recommends to only use character inputs for `switch()`. - -```{r, eval=FALSE} -vec <- c(1:20,50:55) -centre(vec, "mean") -#> [1] 20.19231 -centre(vec, "median") -#> [1] 13.5 -centre(vec, "trimmed") -#> [1] 18.77273 -``` - -```{r, message=FALSE} -set.seed(123) -x <- rlnorm(100) - -centers <- data.frame(type = c('mean', 'median', 'trimmed')) -centers$value = sapply(centers$type, \(t){centre(x,t)}) - -require(ggplot2) -ggplot(data = data.frame(x), aes(x))+ - geom_density()+ - geom_vline(data = centers, - mapping = aes(color = type, xintercept = value), - linewidth=0.5,linetype="dashed") + - xlim(-1,10)+ - theme_bw() -``` - - -Example from book of "falling through" to next value - -```{r} -legs <- function(x) { - switch(x, - cow = , - horse = , - dog = 4, - human = , - chicken = 2, - plant = 0, - stop("Unknown input") - ) -} -legs("cow") -#> [1] 4 -legs("dog") -#> [1] 4 -``` - - - - -## Using `dplyr::case_when` {-} - -- `case_when` is a more general `if_else` and can be used often in place of multiple chained `if_else` or sapply'ing `switch`. - -- It uses a special syntax to allow any number of condition-vector pairs: - -```{r message=FALSE, warning=FALSE} -set.seed(123) -x <- rlnorm(100) - -centers <- data.frame(type = c('mean', 'median', 'trimmed')) - -centers$value = dplyr::case_when( - centers$type == 'mean' ~ mean(x), - centers$type == 'median' ~ median(x), - centers$type == 'trimmed' ~ mean(x, trim = 0.1), - .default = 1000 - ) - -centers -``` - - - -## Loops - -- Iteration over a elements of a vector - -`for (item in vector) perform_action` - -**First example** -```{r} -for(i in 1:5) { - print(1:i) -} - -x <- numeric(length=5L) -df <- data.frame(x=1:5) - -for(i in 1:5) { - df$y[[i]] <- i+1 -} -``` - - -**Second example**: terminate a *for loop* earlier - -- `next` skips rest of current iteration -- `break` exits the loop entirely - -```{r} -for (i in 1:10) { - if (i < 3) - next - - print(i) - - if (i >= 5) - break -} -``` - -## Exercise {-} - -When the following code is evaluated, what can you say about the vector being iterated? -``` -xs <- c(1, 2, 3) -for (x in xs) { - xs <- c(xs, x * 2) -} -xs -#> [1] 1 2 3 2 4 6 -``` - -## Pitfalls {-} - -- Preallocate output containers to avoid *slow* code. - -- Beware that `1:length(v)` when `v` has length 0 results in a iterating backwards over `1:0`, probably not what is intended. Use `seq_along(v)` instead. - -- When iterating over S3 vectors, use `[[]]` yourself to avoid stripping attributes. - -``` -xs <- as.Date(c("2020-01-01", "2010-01-01")) -for (x in xs) { - print(x) -} -#> [1] 18262 -#> [1] 14610 -``` -vs. -``` -for (i in seq_along(xs)) { - print(xs[[i]]) -} -#> [1] "2020-01-01" -#> [1] "2010-01-01" -``` - -## Related tools {-} - -- `while(condition) action`: performs action while condition is TRUE. -- `repeat(action)`: repeats action forever (i.e. until it encounters break). - -- Note that `for` can be rewritten as `while` and while can be rewritten as `repeat` (this goes in one direction only!); *however*: - ->Good practice is to use the least-flexible solution to a problem, so you should use `for` wherever possible. -BUT you shouldn't even use for loops for data analysis tasks as `map()` and `apply()` already provide *less flexible* solutions to most problems. (More in Chapter 9.) - -```{r} -for (i in 1:5) { - print(i) -} - - -``` - -```{r} - -x_option <- function(x) { - switch(x, - a = "option 1", - b = "option 2", - c = "option 3"#, - #stop("Invalid `x` value") - ) -} - -``` - - - -```{r} -i <- 1 - -while(i <=5 ) { - print(i) - i <- i+1 -} -``` - -```{r} -i <- 1 - -repeat { - print(i) - i <- i+1 - if (i > 5) break -} - -``` - - ---- - - - - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/96eY6YS_3hU")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/x5I_uHnMxIk")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/u6UMGWDuxDE")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/G4YOvwsSw2Q")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/AZwJjsl8xiI")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/wg2QZ3rMIqM")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:16:34 Federica Gazzelloni: https://github.com/r4ds/bookclub-Advanced_R -00:22:28 Federica Gazzelloni: https://stackoverflow.com/questions/50646133/dplyr-if-else-vs-base-r-ifelse -00:26:20 Trevin: case_when() is great, makes it easy to read -00:54:01 Trevin: out[I, ] -00:54:14 Trevin: out[i, ] -00:55:03 Trevin: I think you have to specify number of rows and columns before.. -00:55:30 Trevin: iterations = 10 - variables = 2 - - output <- matrix(ncol=variables, nrow=iterations) -00:55:43 Trevin: https://stackoverflow.com/questions/13442461/populating-a-data-frame-in-r-in-a-loop -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/W9CoQ15NlOc")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:40:18 Ryan Honomichl: What type of vector does each of the following calls to ifelse() return? - -* "ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE." -00:42:11 Ryan Honomichl: "I recommend assigning the results of an if statement only when the entire expression fits on one line; otherwise it tends to be hard to read" -00:42:46 Ryan Honomichl: * When you use the single argument form without an `else` statement, `if` invisibly returns NULL if the condition is FALSE. - -- Since functions like c() and paste() drop NULL inputs, this allows for a compact expression of certain idioms -00:54:15 collinberke: https://docs.google.com/spreadsheets/d/1ScrbEw_-vB9DruaJhjtVY8HLQmuNPqyWeOOjmG6OY1M/edit?usp=sharing -00:58:46 collinberke: https://www.youtube.com/@safe4democracy/videos -``` -</details> diff --git a/08_Conditions.Rmd b/08_Conditions.Rmd @@ -1,549 +0,0 @@ -# Conditions - -**Learning objectives:** - -- What conditions are -- How to use them - -## Introduction - -What are conditions? Problems that happen in functions: - -- Error -- Warning -- Message - -As a function author, one can signal them--that is, say there's a problem. - -As a function consumer, one can handle them--for example, react or ignore. - -## Signalling conditions - -### Types of conditions - -Three types of conditions: - -- `r emoji::emoji("x")` **Errors.** Problem arose, and the function cannot continue. -- `r emoji::emoji("warning")` **Warnings.** Problem arose, but the function can continue, if only partially. -- `r emoji::emoji("speech_balloon")` **Messages.** Something happened, and the user should know. - -### `r emoji::emoji("x")` Errors - -How to throw errors - -```{r throwing_errors} -# with base R -stop("... in the name of love...") - -# with rlang -rlang::abort("...before you break my heart...") - -# with base R; without call -stop("... think it o-o-over...", call. = FALSE) -``` -Composing error messages - -- Mechanics. - - `stop()` pastes together arguments -```{r} -some_val <- 1 -stop("Your value is: ", some_val, call. = FALSE) -``` - - `abort()` requires `{glue}` -```{r} -some_val <- 1 -rlang::abort(glue::glue("Your value is: {some_val}")) -``` -- Style. See [here](http://style.tidyverse.org/error-messages.html). - -### `r emoji::emoji("warning")` Warnings - -May have multiple warnings per call - -```{r} -warn <- function() { - warning("This is your first warning") - warning("This is your second warning") - warning("This is your LAST warning") -} -``` - -Print all warnings once call is complete. - -```{r} -warn() -``` - -Like errors, `warning()` has - -- a call argument -- an `{rlang}` analog - -```{r} -# base R -# ... with call (implicitly .call = TRUE) -warning("Warning") -# ... with call suppressed -warning("Warning", call. = FALSE) - -# rlang -# note: call suppressed by default -rlang::warn("Warning") -``` - -(Hadley's) advice on usage: - -- Err on the side of errors. In other words, error rather than warn. -- But warnings make sense in a few cases: - - Function is being deprecated. Warn that it is reaching end of life. - - Function is reasonably sure to recover from issue. - -### `r emoji::emoji("speech_balloon")` Messages - -Mechanics: - -- Issued immediately -- Do not have a call argument - -Style: - -Messages are best when they inform about: - -- Default arguments -- Status updates of for functions used primarily for side-effects (e.g., interaction with web API, file downloaded, etc.) -- Progress of long-running process (in the absence of a status bar). -- Package loading message (e.g., attaching package, objects masked) - -## Ignoring conditions - -A few ways: - -- `try()` -- `suppressWarnings()` -- `suppressMessages()` - -### `try()` - -What it does: - -- Displays error -- But continues execution after error - -```{r} -bad_log <- function(x) { - try(log(x)) - 10 -} - -bad_log("bad") -``` - -Better ways to react to/recover from errors: - -1. Use `tryCatch()` to "catch" the error and perform a different action in the event of an error. -1. Set a default value inside the call. See below. - -```{r} -default <- NULL -try(default <- read.csv("possibly-bad-input.csv"), silent = TRUE) -``` - - -### `suppressWarnings()`, `suppressMessages()` - -What it does: - -- Supresses all warnings (messages) - -```{r} -# suppress warnings (from our `warn()` function above) -suppressWarnings(warn()) - -# suppress messages -many_messages <- function() { - message("Message 1") - message("Message 2") - message("Message 3") -} - -suppressMessages(many_messages()) -``` - -## Handling conditions - -Every condition has a default behavior: - -- `r emoji::emoji("x")` Errors halt execution -- `r emoji::emoji("warning")` Warnings are collected during execution and displayed in bulk after execution -- `r emoji::emoji("speech_balloon")` Messages are displayed immediately - -Condition handlers allow one to change that behavior (within the scope of a function). - -Two handler functions: - -- `tryCatch()` -- `withCallingHandlers()` - -```{r, eval=FALSE} -# try to run `code_to_try_to_run` -# if (error) condition is signalled, fun some other code -tryCatch( - error = function(cnd) { - # code to run when error is thrown - }, - code_to_try_to_run -) - -# try to `code_to_try_to_run` -# if condition is signalled, run code corresponding to condition type -withCallingHandlers( - warning = function(cnd) { - # code to run when warning is signalled - }, - message = function(cnd) { - # code to run when message is signalled - }, - code_to_try_to_run -) -``` - - -### Condition objects - -```{r} -# catch a condition -cnd <- rlang::catch_cnd(stop("An error")) -# inspect it -str(cnd) -``` - -The standard components - -- `message`. The error message. To extract it, use `conditionMessage(cnd)`. -- `call`. The function call that triggered the condition. To extract it, use `conditionCall(cnd)`. - -But custom conditions may contain other components. - -### Exiting handlers - -If a condition is signalled, this type of handler controls what code to run before exiting the function call. - -```{r} -f3 <- function(x) { - tryCatch( - # if error signalled, return NA - error = function(cnd) NA, - # try to run log - log(x) - ) -} - -f3("x") -``` - -When a condition is signalled, control moves to the handler and never returns to the original code. - -```{r} -tryCatch( - message = function(cnd) "There", - { - message("Here") - stop("This code is never run!") - } -) -``` - -The `tryCatch()` exit handler has one final argument: `finally`. This is run regardless of the condition of the original code. This is often used for clean-up. - -```{r} -# try to write text to disk -# if an error is signalled--for example, `path` does not exist -# or if no condition is signalled -# that is in both cases, the code block in `finally` is executed -path <- tempfile() -tryCatch( - { - writeLines("Hi!", path) - # ... - }, - finally = { - # always run - unlink(path) - } -) -``` - -### Calling handlers - -Definition by verbal comparison: - -- With exit handlers, code exits the normal flow once a condition is signalled -- With calling handlers, code continues in the normal flow once control is returned by the handler. - -Definition by code comparison: - -```{r} -# with an exit handler, control moves to the handler once condition signalled and does not move back -tryCatch( - message = function(cnd) cat("Caught a message!\n"), - { - message("Someone there?") - message("Why, yes!") - } -) - -# with a calling handler, control moves first to the handler and the moves back to the main code -withCallingHandlers( - message = function(cnd) cat("Caught a message!\n"), - { - message("Someone there?") - message("Why, yes!") - } -) -``` - -### By default, conditions propagate - -Let's suppose that there are nested handlers. If a condition is signalled in the child, it propagates to its parent handler(s). - -```{r} -# Bubbles all the way up to default handler which generates the message -withCallingHandlers( - message = function(cnd) cat("Level 2\n"), - withCallingHandlers( - message = function(cnd) cat("Level 1\n"), - message("Hello") - ) -) - -# Bubbles up to tryCatch -tryCatch( - message = function(cnd) cat("Level 2\n"), - withCallingHandlers( - message = function(cnd) cat("Level 1\n"), - message("Hello") - ) -) -``` - -### But conditions can be muffled - -If one wants to "muffle" the siginal, one needs to use `rlang::cnd_muffle()` - -```{r} -# Muffles the default handler which prints the messages -withCallingHandlers( - message = function(cnd) { - cat("Level 2\n") - rlang::cnd_muffle(cnd) - }, - withCallingHandlers( - message = function(cnd) cat("Level 1\n"), - message("Hello") - ) -) - -# Muffles level 2 handler and the default handler -withCallingHandlers( - message = function(cnd) cat("Level 2\n"), - withCallingHandlers( - message = function(cnd) { - cat("Level 1\n") - rlang::cnd_muffle(cnd) - }, - message("Hello") - ) -) -``` - -### Call stacks - -Call stacks of exiting and calling handlers differ. - -Why? - -> Calling handlers are called in the context of the call that signalled the condition -> exiting handlers are called in the context of the call to tryCatch() - -To see this, consider how the call stacks differ for a toy example. - -```{r} -# create a function -f <- function() g() -g <- function() h() -h <- function() message - -# call stack of calling handlers -withCallingHandlers(f(), message = function(cnd) { - lobstr::cst() - rlang::cnd_muffle(cnd) -}) - -# call stack of exit handlers -tryCatch(f(), message = function(cnd) lobstr::cst()) -tryCatch(f(), message = function(cnd) lobstr::cst()) -``` - -## Custom conditions - -### Motivation - -The `base::log()` function provides a minimal error message. - -```{r} -log(letters) -log(1:10, base = letters) -``` - -One could make a more informative error message about which argument is problematic. - -```{r} -my_log <- function(x, base = exp(1)) { - if (!is.numeric(x)) { - rlang::abort(paste0( - "`x` must be a numeric vector; not ", typeof(x), "." - )) - } - if (!is.numeric(base)) { - rlang::abort(paste0( - "`base` must be a numeric vector; not ", typeof(base), "." - )) - } - - base::log(x, base = base) -} -``` - -Consider the difference: - -```{r} -my_log(letters) -my_log(1:10, base = letters) -``` - - -### Signalling - -Create a helper function to describe errors: - -```{r} -abort_bad_argument <- function(arg, must, not = NULL) { - msg <- glue::glue("`{arg}` must {must}") - if (!is.null(not)) { - not <- typeof(not) - msg <- glue::glue("{msg}; not {not}.") - } - - rlang::abort( - "error_bad_argument", # <- this is the (error) class, I believe - message = msg, - arg = arg, - must = must, - not = not - ) -} -``` - -Rewrite the log function to use this helper function: - -```{r} -my_log <- function(x, base = exp(1)) { - if (!is.numeric(x)) { - abort_bad_argument("x", must = "be numeric", not = x) - } - if (!is.numeric(base)) { - abort_bad_argument("base", must = "be numeric", not = base) - } - - base::log(x, base = base) -} -``` - -See the result for the end user: - -```{r} -my_log(letters) -my_log(1:10, base = letters) -``` - -### Handling - -Use class of condition object to allow for different handling of different types of errors - -```{r} -tryCatch( - error_bad_argument = function(cnd) "bad_argument", - error = function(cnd) "other error", - my_log("a") -) -``` - -But note that the first handler that matches any of the signal's class, potentially in a vector of signal classes, will get control. So put the most specific handlers first. - -## Applications - -See [the sub-section in the book](https://adv-r.hadley.nz/conditions.html#condition-applications) for excellent examples. - -## Resources - -- Conditions articles in rlang vignettes: - - [Including function calls in error messages](https://rlang.r-lib.org/reference/topic-error-call.html) - - [Including contextual information with error chains](https://rlang.r-lib.org/reference/topic-error-chaining.html) - - [Formatting messages with cli](https://rlang.r-lib.org/reference/topic-condition-formatting.html) -- [Other resources](https://github.com/rstudio-conf-2022/pkg-dev-masterclass/blob/main/materials/5-error-resources.md) from error message segment of rstudio::conf(2022) workshop "Package Development Masterclass" - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/mwiNe083DLU")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/ZFUr7YRSu2o")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/UZhrsVz6wi0")` - -`r knitr::include_url("https://www.youtube.com/embed/Wt7p71_BuYY")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/WinIo5mrUZo")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/VFs-2sl5C70")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/VwmrbPUQY1k")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:19:16 Trevin: https://style.tidyverse.org/error-messages.html -00:20:14 Trevin: More on errors in the design guide: https://design.tidyverse.org/ -01:14:27 Federica Gazzelloni: more info here: https://colinfay.me/learn-shiny-production/ -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/t1N6XdidvNo")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:34:09 Ron: Someone did: https://cran.r-project.org/web/packages/comprehenr/vignettes/Introduction.html -00:47:58 collinberke: https://purrr.tidyverse.org/reference/safely.html -00:48:24 Ron: it's a function operator ! -00:49:37 Ron: \(x) length(unique(x) is not too verbose though -00:49:39 Ron: ;) -01:06:50 collinberke: https://colinfay.me/purrr-mappers/ -01:07:45 collinberke: https://colinfay.me/purrr-web-mining/ -``` -</details> diff --git a/10_Function_factories.Rmd b/10_Function_factories.Rmd @@ -1,672 +0,0 @@ -# Function factories - -**Learning objectives:** - -- Understand what a function factory is -- Recognise how function factories work -- Learn about non-obvious combination of function features -- Generate a family of functions from data - -```{r, message = FALSE} -library(rlang) -library(ggplot2) -library(scales) -``` - - -## What is a function factory? - - -A **function factory** is a function that makes (returns) functions - -Factory made function are **manufactured functions**. - -```{r 10-1, echo=FALSE, fig.align='center', fig.dim="50%",fig.alt="https://epsis.com/no/operations-centers-focus-on-ways-of-working/",fig.cap="Function factory | Credits: epsis.com"} -knitr::include_graphics("images/10-1-factories.png") -``` - - - -## How does a function factory work? -```{r 10-2, echo=FALSE, fig.align='center', fig.dim="100%",fig.cap="How does it work? | Credits: kakaakigas.com/how-it-works/"} -knitr::include_graphics("images/10-2-how.jpg") -``` - -```{r 10-ex1} -power1 <- function(exp) { - function(x) { - x ^ exp - } -} - -square <- power1(2) -cube <- power1(3) -``` -`power1()` is the function factory and `square()` and `cube()` are manufactured functions. - -## Important to remember - -1. R has First-class functions (can be created with `function()` and `<-`) - -> R functions are objects in their own right, a language property often called “first-class functions” -> -- [Section 6.2.3](https://adv-r.hadley.nz/functions.html?q=first%20class#first-class-functions) - -2. Functions capture (enclose) environment in which they are created - -```{r 10-ex3} -f <- function(x) function(y) x + y -fn_env(f) # The function f() -fn_env(f()) # The function created by f() -``` - -3. Functions create a new environment on each run -```{r 10-ex4} -f <- function(x) { - function() x + 1 -} -ff <- f(1) -ff() -ff() -``` - - -## Fundamentals - Environment - -- Environment when function is created defines arguments in the function -- Use `env_print(fun)` and `fn_env()` to explore - -```{r} -env_print(square) -fn_env(square)$exp -``` - -![Blue indicates environment, arrows bindings](images/10-3-procedure.png){width=50% fig-align=center} - -## Fundamentals - Forcing - -- Lazy evaluation means arguments only evaluated when used -- "[can] lead to a real head-scratcher of a bug" - -```{r} -x <- 2 -square <- power1(x) -x <- 3 -square(4) -``` - -- *Only applies if passing object as argument* -- Here argument `2` evaluated when function called - -```{r} -square <- power1(2) -x <- 3 -square(4) -``` - -So use `force()`! (Unless you want it to change with the `x` in the parent environment) - -## Forcing - Reiterated - -Only required if the argument is **not** evaluated before the new function is created: -```{r} -power1 <- function(exp) { - stopifnot(is.numeric(exp)) - function(x) x ^ exp -} - -x <- 2 -square <- power1(x) -x <- 3 -square(4) -``` - -## Fundamentals - Stateful functions - -Because - -- The enclosing environment is unique and constant, and -- We have `<<-` (super assignment) - -We can *change* that enclosing environment and keep track of that state -across iterations (!) - -- `<-` Assignment in *current* environment -- `<<-` Assignment in *parent* environment - -```{r 10-15} -new_counter <- function() { - i <- 0 - function() { - i <<- i + 1 # second assignment (super assignment) - i - } -} - -counter_one <- new_counter() -counter_two <- new_counter() -c(counter_one(), counter_one(), counter_one()) -c(counter_two(), counter_two(), counter_two()) -``` - - -> "As soon as your function starts managing the state of multiple variables, it’s better to switch to R6" - -## Fundamentals - Garbage collection - -- Because environment is attached to (enclosed by) function, temporary objects -don't go away. - -**Cleaning up** using `rm()` inside a function: -```{r 10-16} -f_dirty <- function(n) { - x <- runif(n) - m <- mean(x) - function() m -} - -f_clean <- function(n) { - x <- runif(n) - m <- mean(x) - rm(x) # <---- Important part! - function() m -} - -lobstr::obj_size(f_dirty(1e6)) -lobstr::obj_size(f_clean(1e6)) - -``` - - -## Useful Examples - Histograms and binwidth - -**Useful when...** - -- You need to pass a function -- You don't want to have to re-write the function every time - (the *default* behaviour of the function should be flexible) - - -For example, these bins are not appropriate -```{r} -#| fig-asp: 0.3 -sd <- c(1, 5, 15) -n <- 100 -df <- data.frame(x = rnorm(3 * n, sd = sd), sd = rep(sd, n)) - -ggplot(df, aes(x)) + - geom_histogram(binwidth = 2) + - facet_wrap(~ sd, scales = "free_x") + - labs(x = NULL) -``` - -We could just make a function... -```{r} -#| fig-asp: 0.3 -binwidth_bins <- function(x) (max(x) - min(x)) / 20 - -ggplot(df, aes(x = x)) + - geom_histogram(binwidth = binwidth_bins) + - facet_wrap(~ sd, scales = "free_x") + - labs(x = NULL) -``` - -But if we want to change the number of bins (20) we'd have to re-write the function -each time. - -If we use a factory, we don't have to do that. -```{r} -#| fig-asp: 0.3 -binwidth_bins <- function(n) { - force(n) - function(x) (max(x) - min(x)) / n -} - -ggplot(df, aes(x = x)) + - geom_histogram(binwidth = binwidth_bins(20)) + - facet_wrap(~ sd, scales = "free_x") + - labs(x = NULL, title = "20 bins") - -ggplot(df, aes(x = x)) + - geom_histogram(binwidth = binwidth_bins(5)) + - facet_wrap(~ sd, scales = "free_x") + - labs(x = NULL, title = "5 bins") -``` - -> Similar benefit in Box-cox example - -## Useful Examples - Wrapper - -**Useful when...** - -- You want to create a function that wraps a bunch of other functions - -For example, `ggsave()` wraps a bunch of different graphics device functions: - -```{r} -# (Even more simplified) -plot_dev <- function(ext, dpi = 96) { - force(dpi) - - switch( - ext, - svg = function(filename, ...) svglite::svglite(file = filename, ...), - png = function(...) grDevices::png(..., res = dpi, units = "in"), - jpg = , - jpeg = function(...) grDevices::jpeg(..., res = dpi, units = "in"), - stop("Unknown graphics extension: ", ext, call. = FALSE) - ) -} -``` - -Then `ggsave()` uses - -``` -ggsave <- function(...) { - dev <- plot_dev(device, filename, dpi = dpi) - ... - dev(filename = filename, width = dim[1], height = dim[2], bg = bg, ...) - ... -} -``` - -Otherwise, would have to do something like like a bunch of if/else statements. - -## Useful Examples - Optimizing - -**Useful when...** - -- Want to pass function on to `optimise()`/`optimize()` -- Want to perform pre-computations to speed things up -- Want to re-use this for other datasets - -(*Skipping to final results from section*) - -Here, using MLE want to to find the most likely value of lambda for a Poisson distribution -and this data. -```{r} -x1 <- c(41, 30, 31, 38, 29, 24, 30, 29, 31, 38) -``` - -We'll create a function that creates a lambda assessment function for a given -data set. - -```{r} -ll_poisson <- function(x) { - n <- length(x) - sum_x <- sum(x) - c <- sum(lfactorial(x)) - - function(lambda) { - log(lambda) * sum_x - n * lambda - c - } -} -``` - -We can use this on different data sets, but here use ours `x1` -```{r} -ll <- ll_poisson(x1) -ll(10) # Log-probility of a lambda = 10 -``` - -Use `optimise()` rather than trial and error -```{r} -optimise(ll, c(0, 100), maximum = TRUE) -``` - -Result: Highest log-probability is -30.3, best lambda is 32.1 - - -## Function factories + functionals - -Combine functionals and function factories to turn data into many functions. - -```{r} -names <- list( - square = 2, - cube = 3, - root = 1/2, - cuberoot = 1/3, - reciprocal = -1 -) -funs <- purrr::map(names, power1) -names(funs) -funs$root(64) -funs$square(3) -``` - -Avoid the prefix with - -- `with()` - `with(funs, root(100))` - - Temporary, clear, short-term -- `attach()` - `attach(funs)` / `detach(funs)` - - Added to search path (like package function), cannot be overwritten, but can be attached multiple times! -- `rlang::env_bind` - `env_bind(globalenv(), !!!funs)` / `env_unbind(gloablenv(), names(funs))` - - Added to global env (like created function), can be overwritten - -<!-- -## EXTRA - Previous set of slides - -Graphical factories **useful function factories**, such as: - -1. Labelling with: - - * formatter functions - -```{r 10-19} -y <- c(12345, 123456, 1234567) -comma_format()(y) -``` -```{r 10-20} -number_format(scale = 1e-3, suffix = " K")(y) -``` -They are more commonly used inside a ggplot: -```{r 10-21, include=FALSE} -df <- data.frame(x = 1, y = y) -a_ggplot_object <- ggplot(df, aes(x, y)) + - geom_point() + - scale_x_continuous(breaks = 1, labels = NULL) + - labs(x = NULL, y = NULL) -``` - -```{r 10-22,eval=T} -a_ggplot_object + - scale_y_continuous( - labels = comma_format() -) -``` - -2. Using binwidth in facet histograms - - * binwidth_bins - -```{r} -binwidth_bins <- function(n) { - force(n) - - function(x) { - (max(x) - min(x)) / n - } -} -``` - -Or use a concatenation of this typr of detecting number of bins functions: - - - nclass.Sturges() - - nclass.scott() - - nclass.FD() - -```{r} -base_bins <- function(type) { - fun <- switch(type, - Sturges = nclass.Sturges, - scott = nclass.scott, - FD = nclass.FD, - stop("Unknown type", call. = FALSE) - ) - - function(x) { - (max(x) - min(x)) / fun(x) - } -} -``` - - -3. Internals: - - * ggplot2:::plot_dev() - - -## Non-obvious combinations - - -- The **Box-Cox** transformation. -- **Bootstrap** resampling. -- **Maximum likelihood** estimation. - - -### Statistical factories - -The **Box-Cox** transformation towards normality: -```{r} -boxcox1 <- function(x, lambda) { - stopifnot(length(lambda) == 1) - - if (lambda == 0) { - log(x) - } else { - (x ^ lambda - 1) / lambda - } -} -``` - - -```{r} -boxcox2 <- function(lambda) { - if (lambda == 0) { - function(x) log(x) - } else { - function(x) (x ^ lambda - 1) / lambda - } -} - -stat_boxcox <- function(lambda) { - stat_function(aes(colour = lambda), fun = boxcox2(lambda), size = 1) -} - -plot1 <- ggplot(data.frame(x = c(0, 5)), aes(x)) + - lapply(c(0.5, 1, 1.5), stat_boxcox) + - scale_colour_viridis_c(limits = c(0, 1.5)) - -# visually, log() does seem to make sense as the transformation -# for lambda = 0; as values get smaller and smaller, the function -# gets close and closer to a log transformation -plot2 <- ggplot(data.frame(x = c(0.01, 1)), aes(x)) + - lapply(c(0.5, 0.25, 0.1, 0), stat_boxcox) + - scale_colour_viridis_c(limits = c(0, 1.5)) -library(patchwork) -plot1+plot2 -``` - -**Bootstrap generators** - - -```{r} -boot_permute <- function(df, var) { - n <- nrow(df) - force(var) - - function() { - col <- df[[var]] - col[sample(n, replace = TRUE)] - } -} - -boot_mtcars1 <- boot_permute(mtcars, "mpg") -head(boot_mtcars1()) -#> [1] 16.4 22.8 22.8 22.8 16.4 19.2 -head(boot_mtcars1()) -#> [1] 17.8 18.7 30.4 30.4 16.4 21.0 -``` -```{r} -boot_model <- function(df, formula) { - mod <- lm(formula, data = df) - fitted <- unname(fitted(mod)) - resid <- unname(resid(mod)) - rm(mod) - - function() { - fitted + sample(resid) - } -} - -boot_mtcars2 <- boot_model(mtcars, mpg ~ wt) -head(boot_mtcars2()) -#> [1] 25.0 24.0 21.7 19.2 24.9 16.0 -head(boot_mtcars2()) -#> [1] 27.4 21.0 20.3 19.4 16.3 21.3 -``` - -**Maximum likelihood estimation** - -$$P(\lambda,x)=\prod_{i=1}^{n}\frac{\lambda^{x_i}e^{-\lambda}}{x_i!}$$ -```{r} -lprob_poisson <- function(lambda, x) { - n <- length(x) - (log(lambda) * sum(x)) - (n * lambda) - sum(lfactorial(x)) -} -``` - -```{r} -x1 <- c(41, 30, 31, 38, 29, 24, 30, 29, 31, 38) -``` - -```{r} -lprob_poisson(10, x1) -#> [1] -184 -lprob_poisson(20, x1) -#> [1] -61.1 -lprob_poisson(30, x1) -#> [1] -31 -``` - -```{r} -ll_poisson1 <- function(x) { - n <- length(x) - - function(lambda) { - log(lambda) * sum(x) - n * lambda - sum(lfactorial(x)) - } -} -``` -```{r} -ll_poisson2 <- function(x) { - n <- length(x) - sum_x <- sum(x) - c <- sum(lfactorial(x)) - - function(lambda) { - log(lambda) * sum_x - n * lambda - c - } -} -``` - -```{r} -ll1 <- ll_poisson2(x1) - -ll1(10) -#> [1] -184 -ll1(20) -#> [1] -61.1 -ll1(30) -#> [1] -31 -``` -```{r} -optimise(ll1, c(0, 100), maximum = TRUE) -#> $maximum -#> [1] 32.1 -#> -#> $objective -#> [1] -30.3 -``` -```{r} -optimise(lprob_poisson, c(0, 100), x = x1, maximum = TRUE) -#> $maximum -#> [1] 32.1 -#> -#> $objective -#> [1] -30.3 -``` - -## Function factory applications - - -Combine functionals and function factories to turn data into many functions. - -### Function factories + functionals -```{r} -names <- list( - square = 2, - cube = 3, - root = 1/2, - cuberoot = 1/3, - reciprocal = -1 -) -funs <- purrr::map(names, power1) - -funs$root(64) -#> [1] 8 -funs$root -#> function(x) { -#> x ^ exp -#> } -#> <bytecode: 0x7fe85512a410> -#> <environment: 0x7fe85b21f190> -``` -```{r} -with(funs, root(100)) -#> [1] 10 -``` - -```{r} -attach(funs) -#> The following objects are masked _by_ .GlobalEnv: -#> -#> cube, square -root(100) -#> [1] 10 -detach(funs) -``` - - -```{r} -rlang::env_bind(globalenv(), !!!funs) -root(100) -#> [1] 10 -``` - -```{r} -rlang::env_unbind(globalenv(), names(funs)) -``` - - ---> - - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/enI5Ynq6olI")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/U-CoF7MCik0")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/qgn7WTITnNs")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/GHp2W4JxVaY")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/8TGXjzi0n0o")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/FUoYwYFqT7Q")` - -<details> -<summary> Meeting chat log </summary> - -``` -01:02:25 Trevin: I'm good with combining 👍 -01:02:57 Oluwafemi Oyedele: I agree with combining the chapter!!! -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/7GLyO3IntgE")` diff --git a/13_S3.Rmd b/13_S3.Rmd @@ -1,417 +0,0 @@ -# S3 - -## Introcudion - -## Basics - -- Has class -- Uses a generic function to decide on method - - method = implementation for a specific class - - dispatch = process of searching for right method - -## Classes - -**Theory:** - -What is class? - - - No formal definition in S3 - - Simply set class attribute - -How to set class? - - - At time of object creation - - After object creation - -```{r} -# at time of object creation -x <- structure(list(), class = "my_class") - -# after object creation -x <- list() -class(x) <- "my_class" -``` - -Some advice on style: - - - Rules: Can be any string - - Advice: Consider using/including package name to avoid collision with name of another class (e.g., `blob`, which defines a single class; haven has `labelled` and `haven_labelled`) - - Convention: letters and `_`; avoid `.` since it might be confused as separator between generic and class name - -**Practice:** - -How to compose a class in practice? - -- **Constructor**, which helps the developer create new object of target class. Provide always. -- **Validator**, which checks that values in constructor are valid. May not be necessary for simple classes. -- **Helper**, which helps users create new objects of target class. May be relevant only for user-facing classes. - -### Constructors - -Help developers construct an object of the target class: - -```{r} -new_difftime <- function(x = double(), units = "secs") { - # check inputs - # issue generic system error if unexpected type or value - stopifnot(is.double(x)) - units <- match.arg(units, c("secs", "mins", "hours", "days", "weeks")) - - # construct instance of target class - structure(x, - class = "difftime", - units = units - ) -} -``` - -### Validators - -Contrast a constructor, aimed at quickly creating instances of a class, which only checks type of inputs ... - -```{r} -new_factor <- function(x = integer(), levels = character()) { - stopifnot(is.integer(x)) - stopifnot(is.character(levels)) - - structure( - x, - levels = levels, - class = "factor" - ) -} - -# error messages are for system default and developer-facing -new_factor(1:5, "a") -``` - - -... with a validator, aimed at emitting errors if inputs pose problems, which makes more expensive checks - -```{r} -validate_factor <- function(x) { - values <- unclass(x) - levels <- attr(x, "levels") - - if (!all(!is.na(values) & values > 0)) { - stop( - "All `x` values must be non-missing and greater than zero", - call. = FALSE - ) - } - - if (length(levels) < max(values)) { - stop( - "There must be at least as many `levels` as possible values in `x`", - call. = FALSE - ) - } - - x -} - -# error messages are informative and user-facing -validate_factor(new_factor(1:5, "a")) -``` - -Maybe there is a typo in the `validate_factor()` function? Do the integers need to start at 1 and be consecutive? - -* If not, then `length(levels) < max(values)` should be `length(levels) < length(values)`, right? -* If so, why do the integers need to start at 1 and be consecutive? And if they need to be as such, we should tell the user, right? - -```{r} -validate_factor(new_factor(1:3, levels = c("a", "b", "c"))) -validate_factor(new_factor(10:12, levels = c("a", "b", "c"))) -``` - - -### Helpers - -Some desired virtues: - -- Have the same name as the class -- Call the constructor and validator, if the latter exists. -- Issue error informative, user-facing error messages -- Adopt thoughtful/useful defaults or type conversion - - -Exercise 5 in 13.3.4 - -Q: Read the documentation for `utils::as.roman()`. How would you write a constructor for this class? Does it need a validator? What might a helper do? - -A: This function transforms numeric input into Roman numbers. It is built on the integer type, which results in the following constructor. - - -```{r} -new_roman <- function(x = integer()) { - stopifnot(is.integer(x)) - structure(x, class = "roman") -} -``` - -The documentation tells us, that only values between 1 and 3899 are uniquely represented, which we then include in our validation function. - -```{r} -validate_roman <- function(x) { - values <- unclass(x) - - if (any(values < 1 | values > 3899)) { - stop( - "Roman numbers must fall between 1 and 3899.", - call. = FALSE - ) - } - x -} -``` - -For convenience, we allow the user to also pass real values to a helper function. - -```{r} -roman <- function(x = integer()) { - x <- as.integer(x) - - validate_roman(new_roman(x)) -} - -# Test -roman(c(1, 753, 2024)) - -roman(0) -``` - - - -## Generics and methods - -**Generic functions:** - -- Consist of a call to `UseMethod()` -- Pass arguments from the generic to the dispatched method "auto-magically" - -```{r} -my_new_generic <- function(x) { - UseMethod("my_new_generic") -} -``` - -### Method dispatch - -- `UseMethod()` creates a vector of method names -- Dispatch - - Examines all methods in the vector - - Selects a method - -```{r} -x <- Sys.Date() -sloop::s3_dispatch(print(x)) -``` - -### Finding methods - -While `sloop::s3_dispatch()` gives the specific method selected for a specific call, on can see the methods defined: - -- For a generic -```{r} -sloop::s3_methods_generic("mean") -``` -- For a class -```{r} -sloop::s3_methods_class("ordered") -``` - -### Creating methods - -Two rules: - -- Only write a method if you own the generic. Otherwise, bad manners. -- Method must have same arguments as its generic--with one important exception: `...` - -**Example from text:** - -I thought it would be good for us to work through this problem. - -> Carefully read the documentation for `UseMethod()` and explain why the following code returns the results that it does. What two usual rules of function evaluation does `UseMethod()` violate? - -```{r} -g <- function(x) { - x <- 10 - y <- 10 - UseMethod("g") -} -g.default <- function(x) c(x = x, y = y) - -x <- 1 -y <- 1 -g(x) -g.default(x) -``` - - - -**Examples caught in the wild:** - -- [`haven::zap_label`](https://github.com/tidyverse/haven/blob/main/R/zap_label.R), which removes column labels -- [`dplyr::mutate`](https://github.com/tidyverse/dplyr/blob/main/R/mutate.R) -- [`tidyr::pivot_longer`](https://github.com/tidyverse/tidyr/blob/main/R/pivot-long.R) - -## Object styles - -## Inheritance - -Three ideas: - -1. Class is a vector of classes -```{r} -class(ordered("x")) -class(Sys.time()) -``` -2. Dispatch moves through class vector until it finds a defined method -```{r} -sloop::s3_dispatch(print(ordered("x"))) -``` -3. Method can delegate to another method via `NextMethod()`, which is indicated by `->` as below: -```{r} -sloop::s3_dispatch(ordered("x")[1]) -``` - -### `NextMethod()` - -Consider `secret` class that masks each character of the input with `x` in output - -```{r} -new_secret <- function(x = double()) { - stopifnot(is.double(x)) - structure(x, class = "secret") -} - -print.secret <- function(x, ...) { - print(strrep("x", nchar(x))) - invisible(x) -} - -y <- new_secret(c(15, 1, 456)) -y -``` - -Notice that the `[` method is problematic in that it does not preserve the `secret` class. Additionally, it returns `15` as the first element instead of `xx`. - -```{r} -sloop::s3_dispatch(y[1]) -y[1] -``` - -Fix this with a `[.secret` method: - -The first fix (not run) is inefficient because it creates a copy of `y`. - -```{r eval = FALSE} -# not run -`[.secret` <- function(x, i) { - x <- unclass(x) - new_secret(x[i]) -} -``` - -`NextMethod()` is more efficient. - -```{r} -`[.secret` <- function(x, i) { - # first, dispatch to `[` - # then, coerce subset value to `secret` class - new_secret(NextMethod()) -} -``` - -Notice that `[.secret` is selected for dispatch, but that the method delegates to the internal `[`. - -```{r} -sloop::s3_dispatch(y[1]) -y[1] -``` - - -### Allowing subclassing - -Continue the example above to have a `supersecret` subclass that hides even the number of characters in the input (e.g., `123` -> `xxxxx`, 12345678 -> `xxxxx`, 1 -> `xxxxx`). - -To allow for this subclass, the constructor function needs to include two additional arguments: - -- `...` for passing an arbitrary set of arguments to different subclasses -- `class` for defining the subclass - -```{r} -new_secret <- function(x, ..., class = character()) { - stopifnot(is.double(x)) - - structure( - x, - ..., - class = c(class, "secret") - ) -} -``` - -To create the subclass, simply invoke the parent class constructor inside of the subclass constructor: - -```{r} -new_supersecret <- function(x) { - new_secret(x, class = "supersecret") -} - -print.supersecret <- function(x, ...) { - print(rep("xxxxx", length(x))) - invisible(x) -} -``` - -But this means the subclass inherits all parent methods and needs to overwrite all parent methods with subclass methods that return the sublclass rather than the parent class. - -There's no easy solution to this problem in base R. - -There is a solution in the vectors package: `vctrs::vec_restore()` - -<!-- TODO: read docs/vignettes to be able to summarize how this works --> - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/Fy3JF5Em6qY")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/9GkgNC15EAw")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/q7lFXSLdC1g")` - -`r knitr::include_url("https://www.youtube.com/embed/2rHS_urTGFg")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/4la5adcWwKE")` - -`r knitr::include_url("https://www.youtube.com/embed/eTCT2O58GYM")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/NeHtEGab1Og")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/vzbl2o-MEeQ")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:05:30 Oluwafemi Oyedele: Hi everyone, Good Evening !!! -00:09:44 Trevin: I agree Arthur, need to look at that package some more -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/zNLx4q8TCKQ")` diff --git a/15_S4.Rmd b/15_S4.Rmd @@ -1,360 +0,0 @@ -# S4 - -## Introduction - -Object consists of: - -- Slots. Like fields in R6. -- Methods. Accessed through generics. Dispatched to particular methods. - -Uses functions to define classes and their methods: - -- `setClass()`. Define class and its components. -- `setGenerics()`. Define generic functions. Used to dispatch. -- `setMethods()`. Define methods - -## Basics overview - -### Set class - -Define the class: - -```{r} -setClass("Person", - slots = c( - name = "character", - age = "numeric" - ) -) -``` - -Create an instance of the class - -```{r} -john <- new("Person", name = "John Smith", age = NA_real_) -``` - - -### Set generics - -Define generic functions for setting and getting the age slot - -```{r} -# get the value -setGeneric("age", function(x) standardGeneric("age")) -# set the value -setGeneric("age<-", function(x, value) standardGeneric("age<-")) -``` - -### Set methods - -Define methods for the generics: - -```{r} -# get the value -setMethod("age", "Person", function(x) x@age) -# set the value -setMethod("age<-", "Person", function(x, value) { - x@age <- value - x -}) - -# set the value -age(john) <- 50 -# get the value -age(john) -``` - -To give a flavor, there is only one method per slot. In more realistic cases, there might be several methods. - -## Details on defining the class - -### Inheritance - -```{r, eval=FALSE} -setClass("Employee", - contains = "Person", - slots = c( - boss = "Person" - ), - prototype = list( - boss = new("Person") - ) -) -``` - -### Instantiation - -Create an instance of the class at two levels: - -- For developer (you): `methods::new()` -- For user: constructor function - -```{r} -# how user constructs an instance -Person <- function(name, age = NA) { - age <- as.double(age) - - # how the developer constructs an instance - new("Person", name = name, age = age) -} - -Person("Someone") -``` - -### Validation - -S4 objects - -- Check class of slot at creation -```{r} -Person(mtcars) -``` - -- Do **not** check other things -```{r} -Person("Hadley", age = c(30, 37)) -``` - -That's where validation comes in--at two stages: - -1. At creation -2. At modification - -#### At creation - -```{r} -setValidity("Person", function(object) { - if (length(object@name) != length(object@age)) { - "@name and @age must be same length" - } else { - TRUE - } -}) - -Person("Hadley", age = c(30, 37)) -``` - -#### At modification - -```{r} -# get value -setGeneric("name", function(x) standardGeneric("name")) -setMethod("name", "Person", function(x) x@name) - -# set value--and assess whether resulting object is valid -setGeneric("name<-", function(x, value) standardGeneric("name<-")) -setMethod("name<-", "Person", function(x, value) { - x@name <- value - validObject(x) - x -}) - -# normal name; no problem -name(john) <- "Jon Smythe" -name(john) - -# invalid name; error thrown -name(john) <- letters -``` - - -## Details on generics and methods - -### Dictate dispatch via signature - -Specify function arguments to be used in determining method. - -```{r} -setGeneric("myGeneric", - function(x, ..., verbose = TRUE) standardGeneric("myGeneric"), - signature = "x" -) -``` - -### Define generics - -General form: - -```{r, eval=FALSE} -setMethod("myGeneric", "Person", function(x) { - # method implementation -}) -``` - -Example to print object: - -```{r} -setMethod("show", "Person", function(object) { - cat(is(object)[[1]], "\n", - " Name: ", object@name, "\n", - " Age: ", object@age, "\n", - sep = "" - ) -}) -john -``` - -Example to access slot: - -```{r} -setGeneric("name", function(x) standardGeneric("name")) -setMethod("name", "Person", function(x) x@name) - -name(john) -``` - - -This is how end users should access slots. - -## Example: `lubridate::period()` - -### Define the class - -```{r, eval=FALSE} -setClass("Period", - # inherits from these classes - contains = c("Timespan", "numeric"), - # has slots for time components - slots = c( - year = "numeric", - month = "numeric", - day = "numeric", - hour = "numeric", - minute = "numeric" - ), - # defines prototype as period of zero duration for all slots - prototype = prototype(year = 0, month = 0, day = 0, hour = 0, minute = 0), - # check validity with `check_period` function; see section below - validity = check_period -) -``` - -See source code [here](https://github.com/tidyverse/lubridate/blob/0bb49b21c88736240219dc67e7ed0eb3df15d9b1/R/periods.r#L90) - -### Validate object - -Check whether object is valid--notably if all arugments have the same length and are integers. - -```{r, eval=FALSE} -check_period <- function(object) { - # start with an empty vector of error messages - errors <- character() - - # check length of object's data - length(object@.Data) -> n - # check length of each slot - lengths <- c( - length(object@year), - length(object@month), - length(object@day), - length(object@hour), - length(object@minute) - ) - - # if length of any slot is different than overall length, compose error message - if (any(lengths != n)) { - msg <- paste("Inconsistent lengths: year = ", lengths[1], - ", month = ", lengths[2], - ", day = ", lengths[3], - ", hour = ", lengths[4], - ", minute = ", lengths[5], - ", second = ", n, - sep = "" - ) - # add just-composed error to vector of error messages - errors <- c(errors, msg) - } - - values <- c(object@year, object@month, object@day, object@hour, object@minute) - values <- na.omit(values) - if (sum(values - trunc(values))) { - msg <- "periods must have integer values" - errors <- c(errors, msg) - } - - if (length(errors) == 0) { - TRUE - } else { - errors - } -} -``` - - -See source code [here](https://github.com/tidyverse/lubridate/blob/0bb49b21c88736240219dc67e7ed0eb3df15d9b1/R/periods.r#L6). - -### Set methods - -Show period: - -```{r, eval=FALSE} -#' @export -setMethod("show", signature(object = "Period"), function(object) { - if (length(object@.Data) == 0) { - cat("<Period[0]>\n") - } else { - print(format(object)) - } -}) - -#' @export -format.Period <- function(x, ...) { - if (length(x) == 0) { - return(character()) - } - - show <- paste( - x@year, "y ", x@month, "m ", x@day, "d ", - x@hour, "H ", x@minute, "M ", x@.Data, "S", - sep = "" - ) - start <- regexpr("[-1-9]|(0\\.)", show) - show <- ifelse(start > 0, substr(show, start, nchar(show)), "0S") - - show[is.na(x)] <- NA - show -} -``` - -See source code [here](https://github.com/tidyverse/lubridate/blob/0bb49b21c88736240219dc67e7ed0eb3df15d9b1/R/periods.r#L195) - -## Meeting Videos - -### Cohort 1 - -`r knitr::include_url("https://www.youtube.com/embed/a1jzpWiksyA")` - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/bzo37PHCM1I")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/WWnJ5Cl-aTE")` - -`r knitr::include_url("https://www.youtube.com/embed/_byYFTQHp1Y")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/M8Poajmj-HU")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/unNfE1fDFEY")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/q1-QUFJsbLA")` - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/puvaJtv9gQw")` - -<details> - -<summary>Meeting chat log</summary> -``` -01:09:37 Ron Legere: https://en.wikipedia.org/wiki/Composition_over_inheritance -``` -</details> diff --git a/24_Improving_performance.Rmd b/24_Improving_performance.Rmd @@ -1,290 +0,0 @@ -# Improving performance - - -## Overview - -1. Code organization -2. Check for existing solutions -3. Do as little as possible -4. Vectorise -5. Avoid Copies - -## Organizing code - -- Write a function for each approach -```{r} -mean1 <- function(x) mean(x) -mean2 <- function(x) sum(x) / length(x) -``` -- Keep old functions that you've tried, even the failures -- Generate a representative test case -```{r} -x <- runif(1e5) -``` -- Use `bench::mark` to compare the different versions (and include unit tests) -```{r} -bench::mark( - mean1(x), - mean2(x) -)[c("expression", "min", "median", "itr/sec", "n_gc")] -``` - -## Check for Existing Solution -- CRAN task views (http://cran.rstudio.com/web/views/) -- Reverse dependencies of Rcpp (https://cran.r-project.org/web/packages/Rcpp/) -- Talk to others! - - Google (rseek) - - Stackoverflow ([R]) - - https://community.rstudio.com/ - - DSLC community - -## Do as little as possible -- use a function tailored to a more specific type of input or output, or to a more specific problem - - `rowSums()`, `colSums()`, `rowMeans()`, and `colMeans()` are faster than equivalent invocations that use `apply()` because they are vectorised - - `vapply()` is faster than `sapply()` because it pre-specifies the output type - - `any(x == 10)` is much faster than `10 %in% x` because testing equality is simpler than testing set inclusion -- Some functions coerce their inputs into a specific type. If your input is not the right type, the function has to do extra work - - e.g. `apply()` will always turn a dataframe into a matrix -- Other examples - - `read.csv()`: specify known column types with `colClasses`. (Also consider - switching to `readr::read_csv()` or `data.table::fread()` which are - considerably faster than `read.csv()`.) - - - `factor()`: specify known levels with `levels`. - - - `cut()`: don't generate labels with `labels = FALSE` if you don't need them, - or, even better, use `findInterval()` as mentioned in the "see also" section - of the documentation. - - - `unlist(x, use.names = FALSE)` is much faster than `unlist(x)`. - - - `interaction()`: if you only need combinations that exist in the data, use - `drop = TRUE`. - -## Avoiding Method Dispatch -```{r} -x <- runif(1e2) -bench::mark( - mean(x), - mean.default(x) -)[c("expression", "min", "median", "itr/sec", "n_gc")] -``` - -```{r} -x <- runif(1e2) -bench::mark( - mean(x), - mean.default(x), - .Internal(mean(x)) -)[c("expression", "min", "median", "itr/sec", "n_gc")] -``` - -```{r} -x <- runif(1e4) -bench::mark( - mean(x), - mean.default(x), - .Internal(mean(x)) -)[c("expression", "min", "median", "itr/sec", "n_gc")] -``` - -## Avoiding Input Coercion -- `as.data.frame()` is quite slow because it coerces each element into a data frame and then `rbind()`s them together -- instead, if you have a named list with vectors of equal length, you can directly transform it into a data frame - -```{r} -quickdf <- function(l) { - class(l) <- "data.frame" - attr(l, "row.names") <- .set_row_names(length(l[[1]])) - l -} -l <- lapply(1:26, function(i) runif(1e3)) -names(l) <- letters -bench::mark( - as.data.frame = as.data.frame(l), - quick_df = quickdf(l) -)[c("expression", "min", "median", "itr/sec", "n_gc")] -``` - -*Caveat!* This method is fast because it's dangerous! - -## Vectorise -- vectorisation means finding the existing R function that is implemented in C and most closely applies to your problem -- Vectorised functions that apply to many scenarios - - `rowSums()`, `colSums()`, `rowMeans()`, and `colMeans()` - - Vectorised subsetting can lead to big improvements in speed - - `cut()` and `findInterval()` for converting continuous variables to categorical - - Be aware of vectorised functions like `cumsum()` and `diff()` - - Matrix algebra is a general example of vectorisation - -## Avoiding copies - -- Whenever you use c(), append(), cbind(), rbind(), or paste() to create a bigger object, R must first allocate space for the new object and then copy the old object to its new home. - -```{r} -random_string <- function() { - paste(sample(letters, 50, replace = TRUE), collapse = "") -} -strings10 <- replicate(10, random_string()) -strings100 <- replicate(100, random_string()) -collapse <- function(xs) { - out <- "" - for (x in xs) { - out <- paste0(out, x) - } - out -} -bench::mark( - loop10 = collapse(strings10), - loop100 = collapse(strings100), - vec10 = paste(strings10, collapse = ""), - vec100 = paste(strings100, collapse = ""), - check = FALSE -)[c("expression", "min", "median", "itr/sec", "n_gc")] -``` - -## Case study: t-test - -```{r} -m <- 1000 -n <- 50 -X <- matrix(rnorm(m * n, mean = 10, sd = 3), nrow = m) -grp <- rep(1:2, each = n / 2) -``` - -```{r, cache = TRUE} -# formula interface -system.time( - for (i in 1:m) { - t.test(X[i, ] ~ grp)$statistic - } -) -# provide two vectors -system.time( - for (i in 1:m) { - t.test(X[i, grp == 1], X[i, grp == 2])$statistic - } -) -``` - -Add functionality to save values - -```{r} -compT <- function(i){ - t.test(X[i, grp == 1], X[i, grp == 2])$statistic -} -system.time(t1 <- purrr::map_dbl(1:m, compT)) -``` - -If you look at the source code of stats:::t.test.default(), you’ll see that it does a lot more than just compute the t-statistic. - -```{r} -# Do less work -my_t <- function(x, grp) { - t_stat <- function(x) { - m <- mean(x) - n <- length(x) - var <- sum((x - m) ^ 2) / (n - 1) - list(m = m, n = n, var = var) - } - g1 <- t_stat(x[grp == 1]) - g2 <- t_stat(x[grp == 2]) - se_total <- sqrt(g1$var / g1$n + g2$var / g2$n) - (g1$m - g2$m) / se_total -} -system.time(t2 <- purrr::map_dbl(1:m, ~ my_t(X[.,], grp))) -stopifnot(all.equal(t1, t2)) -``` - -This gives us a six-fold speed improvement! - -```{r} -# Vectorise it -rowtstat <- function(X, grp){ - t_stat <- function(X) { - m <- rowMeans(X) - n <- ncol(X) - var <- rowSums((X - m) ^ 2) / (n - 1) - list(m = m, n = n, var = var) - } - g1 <- t_stat(X[, grp == 1]) - g2 <- t_stat(X[, grp == 2]) - se_total <- sqrt(g1$var / g1$n + g2$var / g2$n) - (g1$m - g2$m) / se_total -} -system.time(t3 <- rowtstat(X, grp)) -stopifnot(all.equal(t1, t3)) -``` - -1000 times faster than when we started! - -## Other techniques -* [Read R blogs](http://www.r-bloggers.com/) to see what performance - problems other people have struggled with, and how they have made their - code faster. - -* Read other R programming books, like The Art of R Programming or Patrick Burns' - [_R Inferno_](http://www.burns-stat.com/documents/books/the-r-inferno/) to - learn about common traps. - -* Take an algorithms and data structure course to learn some - well known ways of tackling certain classes of problems. I have heard - good things about Princeton's - [Algorithms course](https://www.coursera.org/course/algs4partI) offered on - Coursera. - -* Learn how to parallelise your code. Two places to start are - Parallel R and Parallel Computing for Data Science - -* Read general books about optimisation like Mature optimisation - or the Pragmatic Programmer - -* Read more R code. StackOverflow, R Mailing List, DSLC, GitHub, etc. - -## Meeting Videos - -### Cohort 1 - -(no video) - -### Cohort 2 - -`r knitr::include_url("https://www.youtube.com/embed/fSdAqlkeq6I")` - -### Cohort 3 - -`r knitr::include_url("https://www.youtube.com/embed/yCkvUcT7wW8")` - -### Cohort 4 - -`r knitr::include_url("https://www.youtube.com/embed/LCaqvuv3JNg")` - -### Cohort 5 - -`r knitr::include_url("https://www.youtube.com/embed/pOaiDK7J7EE")` - -### Cohort 6 - -`r knitr::include_url("https://www.youtube.com/embed/UaXimKd3vg8")` - -<details> -<summary> Meeting chat log </summary> - -``` -00:24:42 Arthur Shaw: I wonder if there's a task view for R Universe: https://r-universe.dev/search/ -01:01:13 Arthur Shaw: https://www.alexejgossmann.com/benchmarking_r/ -01:04:34 Trevin: I agree that the chapter is a good jumping off point. Gonna have to dig into some of the listed resources 😄 -``` -</details> - -### Cohort 7 - -`r knitr::include_url("https://www.youtube.com/embed/rOkrHvN8Uqg")` - -<details> - -<summary>Meeting chat log</summary> -``` -00:23:48 Ron Legere: https://www.mathworks.com/help/matlab/matlab_prog/vectorization.html -``` -</details> diff --git a/DESCRIPTION b/DESCRIPTION @@ -1,6 +1,7 @@ +Type: Book Package: bookclub-advr Title: Advanced R Book Club -Version: 0.0.1 +Version: 0.0.10 Authors@R: c( person( "Data Science Learning Community", @@ -9,9 +10,9 @@ Authors@R: c( ) ) License: CC BY NC SA 4.0 -URL: https://r4ds.github.io/bookclub-advr, https://github.com/r4ds/bookclub-advr +URL: https://DSLC.io/advr, https://github.com/r4ds/bookclub-advr Depends: - R (>= 3.1.0) + R (>= 4.1.0) Imports: bench, bookdown, diff --git a/README.md b/README.md @@ -3,12 +3,13 @@ Welcome to the DSLC Advanced R Book Club! We are working together to read [_Advanced R_](https://adv-r.hadley.nz/) by Hadley Wickham (Chapman & Hall, copyright 2019, [9780815384571](https://www.routledge.com/Advanced-R-Second-Edition/Wickham/p/book/9780815384571)). + Join the [#book_club-advr](https://rfordatascience.slack.com/archives/C010GJ3VAE5) channel on the [DSLC Slack](https://dslc.io/join) to participate. -As we read, we are producing [notes about the book](https://dslc.io/advr). +As we read, we are producing [slides about the book](https://dslc.io/advr). ## Meeting Schedule -If you would like to present, please see the sign-up sheet for your cohort (linked below, and pinned in the [#book_club-BOOKABBR](https://rfordatascience.slack.com/archives/BOOKCHANNELID) channel on Slack)! +If you would like to present, please see the sign-up sheet for your cohort (linked below, and pinned in the [#book_club-advr](https://rfordatascience.slack.com/archives/C010GJ3VAE5) channel on Slack)! - Cohort 1 (started 2020-04-02, ended 2020-09-22): [meeting videos](https://www.youtube.com/playlist?list=PL3x6DOfs2NGi9lH7q-phZlPrl6HKXYDbn) - Cohort 2 (started 2020-07-30, ended 2021-03-04): [meeting videos](https://www.youtube.com/playlist?list=PL3x6DOfs2NGhPmtka2Wg_NdLk71LJFbVl) @@ -19,6 +20,7 @@ If you would like to present, please see the sign-up sheet for your cohort (link - Cohort 7 (started 2022-10-24, ended 2023-06-13): [meeting videos](https://youtube.com/playlist?list=PL3x6DOfs2NGi4I1DhjPufFNbqCry_xQLq) - Cohort 8 (started 2024-02-01, ended 2024-06-13): [meeting videos](https://www.youtube.com/playlist?list=PL3x6DOfs2NGgr9ZNvaqf4Lb6GN9l6g9dK) - Cohort 9 (started 2024-05-24, ended 2024-11-22): [meeting videos](https://www.youtube.com/playlist?list=PL3x6DOfs2NGgR7BeG9Jri8wrSgW_X-s4_) +- Cohort 10 (facilitator: @jonthegeek): meeting time TBD | [meeting videos](https://www.youtube.com/playlist?list=PL3x6DOfs2NGjlV0O0uf685xmUGfBzstHP) The slides from the old clubs are in a [separate repository](https://github.com/r4ds/bookclub-Advanced_R). @@ -27,7 +29,7 @@ The slides from the old clubs are in a [separate repository](https://github.com/ ## How to Present -This repository is structured as a [{bookdown}](https://CRAN.R-project.org/package=bookdown) site. +This repository is structured as a Quarto website. To present, follow these instructions: Do these steps once: @@ -39,14 +41,14 @@ Do these steps each time you present another chapter: 1. Open your project for this book. 2. `usethis::pr_init("my-chapter")` (creates a branch for your work, to avoid confusion, making sure that you have the latest changes from other contributors; replace `my-chapter` with a descriptive name, ideally). -3. `devtools::install_dev_deps()` (installs any packages used by the book that you don't already have installed). -4. Edit the appropriate chapter file, if necessary. Use `##` to indicate new slides (new sections). +3. `pak::pak()` (installs any packages used by the book that you don't already have installed). +4. Edit the appropriate chapter file (in `slides/`). Use `#` and `##` to indicate new slides. If it's currently `.Rmd`, rename to `.qmd`, and update `_quarto.yml` to point to the `.qmd`. `.Rmd` decks might look weird or otherwise not render properly! 5. If you use any packages that are not already in the `DESCRIPTION`, add them. You can use `usethis::use_package("myCoolPackage")` to add them quickly! -6. Build the book! ctrl-shift-b (or command-shift-b) will render the full book, or ctrl-shift-k (command-shift-k) to render just your slide. Please do this to make sure it works before you push your changes up to the main repo! -7. Commit your changes (either through the command line or using Rstudio's Git tab). +6. Build the book! ctrl-shift-b (or command-shift-b) will render the full book in RStudio, or ctrl-shift-k (command-shift-k) to render just your slides (renders ALL slides in Positron!). Please do this to make sure it works before you push your changes up to the main repo! +7. Commit your changes (through the command line, using Rstudio's Git tab, or using Positron's Source Control section). 8. `usethis::pr_push()` (pushes the changes up to github, and opens a "pull request" (PR) to let us know your work is ready). 9. (If we request changes, make them) 10. When your PR has been accepted ("merged"), `usethis::pr_finish()` to close out your branch and prepare your local repository for future work. 11. Now that your local copy is up-to-date with the main repo, you need to update your remote fork. Run `gert::git_push("origin")` or click the `Push` button on the `Git` tab of Rstudio. -When your PR is checked into the main branch, the bookdown site will rebuild, adding your slides to [this site](https://dslc.io/advr). +When your PR is checked into the main branch, the quarto site will rebuild, adding your slides to [this site](https://dslc.io/advr). diff --git a/_bookdown.yml b/_bookdown.yml @@ -1,8 +0,0 @@ -book_filename: "bookclub-advr" -repo: https://github.com/r4ds/bookclub-advr -edit: "https://github.com/r4ds/bookclub-advr/edit/main/%s" -output_dir: "_book" -delete_merged_file: true -language: - ui: - chapter_name: "Chapter " diff --git a/_output.yml b/_output.yml @@ -1,17 +0,0 @@ -bookdown::gitbook: - css: style.css - split_by: section - config: - toc: - collapse: section - before: | - <li><a href="./">Advanced R Book Club</a></li> - after: | - <li><a href="https://github.com/rstudio/bookdown" target="blank">Published with bookdown</a></li> - edit: - link: https://github.com/r4ds/bookclub-advr/edit/main/%s - text: "Edit" - sharing: - github: yes - facebook: no - twitter: no diff --git a/_quarto.yml b/_quarto.yml @@ -0,0 +1,96 @@ +project: + type: website + +website: + title: "Advanced R Book Club" + sidebar: + style: "docked" + search: true + tools: + - icon: github + href: https://github.com/r4ds/bookclub-advr + - icon: youtube + href: https://dslc.video/advr + - icon: mastodon + href: https://fosstodon.org/@DSLC + - icon: linkedin + href: https://www.linkedin.com/company/dslc-io + contents: + - index.qmd + - file: slides/00-club_intro.qmd + target: advr_club-club_intro + - section: "Getting started" + contents: + - file: slides/01-introduction.qmd + target: advr_club-slides + contents: + - text: Meeting videos + file: videos/01.qmd + - section: "Foundations" + contents: + - file: slides/02_Names_and_values.Rmd + target: advr_club-slides + - file: slides/03_Vectors.Rmd + target: advr_club-slides + - file: slides/04_Subsetting.Rmd + target: advr_club-slides + - file: slides/05_Control_flow.Rmd + target: advr_club-slides + - file: slides/06_Functions.Rmd + target: advr_club-slides + - file: slides/07_Environments.Rmd + target: advr_club-slides + - file: slides/08_Conditions.Rmd + target: advr_club-slides + - section: "Functional programming" + contents: + - file: slides/09_Functionals.Rmd + target: advr_club-slides + - file: slides/10_Function_factories.Rmd + target: advr_club-slides + - file: slides/11_Function_operators.Rmd + target: advr_club-slides + - section: "Object-oriented programming" + contents: + - file: slides/12_Base_types.Rmd + target: advr_club-slides + - file: slides/13_S3.Rmd + target: advr_club-slides + - file: slides/14_R6.Rmd + target: advr_club-slides + - file: slides/15_S4.Rmd + target: advr_club-slides + - file: slides/16_Trade-offs.Rmd + target: advr_club-slides + - section: "Metaprogramming" + contents: + - file: slides/17_Big_picture.Rmd + target: advr_club-slides + - file: slides/18_Expressions.Rmd + target: advr_club-slides + - file: slides/19_Quasiquotation.Rmd + target: advr_club-slides + - file: slides/20_Evaluation.Rmd + target: advr_club-slides + - file: slides/21_Translating_R_code.Rmd + target: advr_club-slides + - section: "Techniques" + contents: + - file: slides/22_Debugging.Rmd + target: advr_club-slides + - file: slides/23_Measuring_performance.Rmd + target: advr_club-slides + - file: slides/24_Improving_performance.Rmd + target: advr_club-slides + - file: slides/25_Rewriting_R_code_in_C++.Rmd + target: advr_club-slides + + +format: + html: + theme: + dark: darkly + light: flatly + css: styles.css + toc: false + link-external-newwindow: true diff --git a/book.bib b/book.bib @@ -1,10 +0,0 @@ -@Book{xie2015, - title = {Dynamic Documents with {R} and knitr}, - author = {Yihui Xie}, - publisher = {Chapman and Hall/CRC}, - address = {Boca Raton, Florida}, - year = {2015}, - edition = {2nd}, - note = {ISBN 978-1498716963}, - url = {http://yihui.org/knitr/}, -} diff --git a/bookclub-advr.Rproj b/bookclub-advr.Rproj @@ -14,5 +14,3 @@ LaTeX: pdfLaTeX AutoAppendNewline: Yes StripTrailingWhitespace: Yes - -BuildType: Website diff --git a/bookclub-advr_cache/html/__packages b/bookclub-advr_cache/html/__packages @@ -1,25 +0,0 @@ -DiagrammeR -lobstr -palmerpenguins -ggplot2 -tidyverse -tibble -tidyr -readr -purrr -dplyr -stringr -forcats -lubridate -rlang -scales -memoise -ids -R6 -deSolve -reshape2 -patchwork -profvis -bench -ggbeeswarm -gt diff --git a/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.RData b/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.RData Binary files differ. diff --git a/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdb b/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdb Binary files differ. diff --git a/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdx b/bookclub-advr_cache/html/unnamed-chunk-382_772128e708c11feeae6c581e167c32a4.rdx Binary files differ. diff --git a/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.RData b/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.RData Binary files differ. diff --git a/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdb b/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdb Binary files differ. diff --git a/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdx b/bookclub-advr_cache/html/unnamed-chunk-392_0d65042b46d439c7ecde1d7c4bf84de4.rdx Binary files differ. diff --git a/bookclub-advr_cache/html/unnamed-chunk-485_24d0d80a65ab19bc80ef07efabb9b0d2.RData b/bookclub-advr_cache/html/unnamed-chunk-485_24d0d80a65ab19bc80ef07efabb9b0d2.RData Binary files differ. diff --git a/bookclub-advr_cache/html/unnamed-chunk-485_24d0d80a65ab19bc80ef07efabb9b0d2.rdb b/bookclub-advr_cache/html/unnamed-chunk-485_24d0d80a65ab19bc80ef07efabb9b0d2.rdb Binary files differ. diff --git a/bookclub-advr_cache/html/unnamed-chunk-485_24d0d80a65ab19bc80ef07efabb9b0d2.rdx b/bookclub-advr_cache/html/unnamed-chunk-485_24d0d80a65ab19bc80ef07efabb9b0d2.rdx Binary files differ. diff --git a/bookclub-advr_cache/html/unnamed-chunk-535_b3e7c9558026489184160fc00b8ca8a9.RData b/bookclub-advr_cache/html/unnamed-chunk-535_b3e7c9558026489184160fc00b8ca8a9.RData Binary files differ. diff --git a/bookclub-advr_cache/html/unnamed-chunk-535_b3e7c9558026489184160fc00b8ca8a9.rdb b/bookclub-advr_cache/html/unnamed-chunk-535_b3e7c9558026489184160fc00b8ca8a9.rdb Binary files differ. diff --git a/bookclub-advr_cache/html/unnamed-chunk-535_b3e7c9558026489184160fc00b8ca8a9.rdx b/bookclub-advr_cache/html/unnamed-chunk-535_b3e7c9558026489184160fc00b8ca8a9.rdx Binary files differ. diff --git a/index.Rmd b/index.Rmd @@ -1,87 +0,0 @@ ---- -title: "Advanced R Book Club" -author: "The Data Science Learning Community" -date: "`r Sys.Date()`" -site: bookdown::bookdown_site -documentclass: book -bibliography: book.bib -biblio-style: apalike -link-citations: yes -github-repo: r4ds/bookclub-advr -description: "This is the product of the Data Science Learning Community's Advanced R Book Club." ---- - -# Welcome {-} - -```{r knitr_opts, echo=FALSE, message=FALSE, warning=FALSE} -knitr::opts_chunk$set( - echo = TRUE, - comment = "#>", - collapse = TRUE, - error = TRUE -) -``` - -Welcome to the bookclub! - -This is a companion for the book [_Advanced R_](https://adv-r.hadley.nz/) by Hadley Wickham (Chapman & Hall, copyright 2019, [9780815384571](https://www.routledge.com/Advanced-R-Second-Edition/Wickham/p/book/9780815384571)). -This companion is available at [dslc.io/advr](https://dslc.io/advr). - -This website is being developed by the [Data Science Learning Community](https://dslc.io). Follow along, and [join the community](https://dslc.io/join) to participate. - -This companion follows the [Data Science Learning Community Code of Conduct](https://dslc.io/conduct). - -## Book club meetings {-} - -- Each week, a volunteer will present a chapter from the book (or part of a chapter). - - **This is the best way to learn the material.** -- Presentations will usually consist of a review of the material, a discussion, and/or a demonstration of the principles presented in that chapter. -- More information about how to present is available in the [github repo](https://github.com/r4ds/bookclub-advr). -- Presentations will be recorded, and will be available on the [Data Science Learning Community YouTube Channel](https://dslc.io/youtube). - - Camera is optional, but encouraged. -- If we need to slow down and discuss, let me or the speaker know. - - A lot can be learned from a discussion. - - Most likely someone has the same question. - - We are all here to learn. - -## Pace {-} - -- We'll _try_ to cover 1 chapter/week, but... -- ...It's ok to split chapters when they feel like too much. -- We will try to meet every week, but will likely take some breaks for holidays, etc. -- The session will be exactly one hour. -- I encourage the group to adopt a 'go no matter what' mentality. - -## Group introductions {-} - -- If you feel comfortable sharing: - - Who are you? - - Where you calling in from? (If you're not comfortable sharing, skip) - - How long have you been using R? - - What was your introduction to R? - - What are you most looking forward to during the group? - -## `git` and `GitHub` {-} - -- If you are unfamilar with these tools, it's best to use the workflows made easy by the `usethis` package. - - [Managing Git(Hub) credentials](https://usethis.r-lib.org/articles/git-credentials.html). - - [Pull request helpers](https://usethis.r-lib.org/articles/pr-functions.html). - - [Check out the notes `README.md` file for exact steps](https://github.com/r4ds/bookclub-advr#readme). - - The community (especially `@jonthegeek`) and myself (`@Collin Berke`) are there to help. - - This is a great opportunity to become more comfortable with these tools. - -- If you're comforable with `git` and `GitHub`: - - Fork the repo. - - Create a branch with an informative name. - - Submit a pull request via the `DSLC` repo of the book. - - PR's are automatically tested and reviewed before being merged. - -## Resources to learn more about `git` and GitHub {-} - -- There are so many tools to use `git` and interface with `GitHub`. - - If you just want to stick with what is familiar, just use the `usethis` package. -- [Happy Git and GitHub for the useR](https://happygitwithr.com/). -- [`usethis`'s pull request helpers](https://usethis.r-lib.org/articles/pr-functions.html). -- [`git`'s documentation](https://git-scm.com/doc). -- [MShiny Cohort 2 Introduction](https://www.youtube.com/watch?v=beOYuHG9Xng&list=PL3x6DOfs2NGjhwrYvdmrKRNcvXX7X6ldt). -- [Git for Book Clubs YouTube playlist](https://www.youtube.com/playlist?list=PL3x6DOfs2NGhS_PhklqT6PwK1Fh7blgP2). diff --git a/index.qmd b/index.qmd @@ -0,0 +1,16 @@ +--- +title: Welcome +date: now +date-format: "YYYY-MM-DD HH:mm" +--- + +This is a companion for [_Advanced R_](https://adv-r.hadley.nz/) by Hadley Wickham (Chapman & Hall, copyright 2019, [9780815384571](https://www.routledge.com/Advanced-R-Second-Edition/Wickham/p/book/9780815384571)). + +Each chapter title to the left is a link to a slide deck. + +- These slides are being developed *by this club.* +- Each deck will open in its own tab. +- You may want to type "s" at the start of each deck to open the *s*peaker notes. +- [Join the Data Science Learning Community](https://dslc.io/join) to participate in the discussion! + +We follow the [Data Science Learning Community Code of Conduct](https://dslc.io/conduct). diff --git a/preamble.tex b/preamble.tex @@ -1 +0,0 @@ -\usepackage{booktabs} diff --git a/slides/00-club_intro.qmd b/slides/00-club_intro.qmd @@ -0,0 +1,53 @@ +--- +engine: knitr +title: Club Meetings +--- + +# Welcome + +- Welcome to the Advanced R book club! +- book: [adv-r.hadley.nz](https://adv-r.hadley.nz/) +- These slides are available at [dslc.io/advr](https://dslc.io/advr). +- We follow the [Data Science Learning Community Code of Conduct](https://dslc.io/code_of_conduct.html). + +::: notes +- You can add notes on each slide with blocks like this! +- Load a deck in the browser and type "s" to see these notes. +::: + +## Book club meetings + +- Volunteer leads discussion of a chapter + - **This is the best way to learn the material.** +- Presentations: + - Review of material + - Questions you have + - Maybe live demo +- More info about editing: [this github repo](https://github.com/r4ds/bookclub-advr) +- Ideally convert existing Rmd to qmd as we go +- Recorded, available on the [Data Science Learning Community YouTube Channel (DSLC.video)](http://dslc.video) + +## Pace + +- **Goal:** 1 chapter/week +- Ok to split overwhelming chapters +- Ok to combine short chapters +- Meet ***every*** week except holidays, etc + - We will meet even if scheduled presenter unavailable + - Push to finish before Daylight Savings Madness March 8 + +## Learning objectives (LOs) + +- Students who study with LOs in mind ***retain more*** +- **Tips:** + - Think "After today's session, you will be able to {LO}" + - *Very* roughly **1 per heading** + +## Group introductions + +- If you feel comfortable sharing: + - Who are you? + - Where you calling in from? (If you're not comfortable sharing, skip) + - How long have you been using R? + - What was your introduction to R? + - What are you most looking forward to during the club? diff --git a/slides/01-introduction.qmd b/slides/01-introduction.qmd @@ -0,0 +1,105 @@ +--- +engine: knitr +title: Introduction +--- + +# ️✅ Learning objectives + +## LOs for the entire book + +- Improve programming skills. +- Develop a deep understanding of R language fundamentals. +- Understand what functional programming means. +- Understand object-oriented programming as applied in R. +- Understand metaprogramming while developing in R. +- Be able to identify what to optimize and how to optimize it. + +## LOs for this chapter + +- Recognize the differences between the 1st and 2nd edition of this book. +- Describe the overall structure of the book. +- Decide whether this book is right for you. + +Books suggestions: + +- [The Structure and Interpretation of Computer Programs (SICP)](https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book.html) +- [Concepts, Techniques and Models of Computer Programming](https://mitpress.mit.edu/books/concepts-techniques-and-models-computer-programming) +- [The Pragmatic Programmer](https://pragprog.com/titles/tpp20/the-pragmatic-programmer-20th-anniversary-edition/) + +# What's new? + +## Hadley's goals + +- Improve coverage of concepts Hadley understood better after 1e +- Reduce coverage of unimportant topics +- Easier to understand (including many more diagrams) + +## Base vs rlang + +- [1e](http://adv-r.had.co.nz) used base R almost exclusively +- 2e uses {[rlang](https://rlang.r-lib.org/)}, {[purrr](https://purrr.tidyverse.org/)}, etc + +# What we'll learn + +## The 5 sections + +- **Foundations:** (7 chapters) Building blocks of R +- **Functional programming:** (3 chapters) Treating functions as objects (that can be args in functions) +- **Object-oriented programming:** (5 chapters + 1) The many object systems of R (we'll add S7) +- **Metaprogramming:** (5 chapters) Generating code with code +- **Techniques:** (4 chapters) Debugging, measuring performance, improving performance + +::: notes +- Might be useful to open TOC here. +::: + +## Why R? + +- Diverse & welcoming community +- Many packages for stats & modeling, ML, dataviz, data wrangling +- Rmarkdown / Quarto +- RStudio / Positron +- Often used in science +- Functional programming powerful for data +- Metaprogramming +- Ease of connection to C, C++, etc + +## R imperfections + +- Much code by non-coders (messy) +- Community more about results than programming best practices +- Metaprogramming can lead to weird failures +- Inconsistency from > 30 years of evolution +- Poorly written R code runs very poorly + +## Who should read Advanced R? + +- Intermediate (and up) R programmers who want to really understand R +- Programmers from other langs who want to know why R is weird +- Prereqs: + - You've written lots of code + - You understand basics of data analysis + - You can install CRAN packages + +## What this book is not + +- [R for Data Science](https://r4ds.hadley.nz/) +- [R Packages](https://r-pkgs.org/) + +## Meta-techniques + +- Read source code + - F2 to see code in RStudio/Positron (with RStudio bindings) +- Adopt a scientific mindset + - Don't understand something? Hypothesize & experiment + +## Other books + +- The Structure and Interpretation of Computer Programs (Abelson, Sussman, and Sussman, 1996) [PDF](https://web.mit.edu/6.001/6.037/sicp.pdf) +- Concepts, Techniques and Models of Computer Programming (Van Roy & Haridi, 2003) [PDF](https://webperso.info.ucl.ac.be/~pvr/VanRoyHaridi2003-book.pdf) +- The Pragmatic Programmer (Hunt & Thomas, 1990) [buy eBook](https://pragprog.com/titles/tpp20/the-pragmatic-programmer-20th-anniversary-edition/) + +::: notes +- As far as I can tell, first 2 PDFs are legal. +- I don't think a legal, free version of The Pragmatic Programmer is available. +::: diff --git a/02_Names_and_values.Rmd b/slides/02_Names_and_values.Rmd diff --git a/03_Vectors.Rmd b/slides/03_Vectors.Rmd diff --git a/04_Subsetting.Rmd b/slides/04_Subsetting.Rmd diff --git a/slides/05_Control_flow.Rmd b/slides/05_Control_flow.Rmd @@ -0,0 +1,487 @@ +# Control flow + +**Learning objectives:** + +- Learn the **tools** for controlling flow of execution. + +- Learn some technical pitfalls and (perhaps lesser known) useful features. + +```{r echo = FALSE, fig.align = 'left', fig.dim = '100%'} +knitr::include_graphics("images/whatif2.png") +``` +```{r echo = FALSE, fig.align = 'right', fig.dim = '100%'} +knitr::include_graphics("images/forloop.png") +``` + +--- + +## Introduction + +There are two main groups of flow control tools: **choices** and **loops**: + +- Choices (`if`, `switch`, `ifelse`, `dplyr::if_else`, `dplyr::case_when`) allow you to run different code depending on the input. + +- Loops (`for`, `while`, `repeat`) allow you to repeatedly run code + + +--- + + +## Choices + + + +`if()` and `else` + +Use `if` to specify a block of code to be executed, if a specified condition is true. Use `else` to specify a block of code to be executed, if the same condition is false. + +```{r, eval=FALSE} +if (condition) true_action +if (condition) true_action else false_action +``` + +(Note braces are only *needed* for compound expressions) + +```{r eval=FALSE, include=T} +if (test_expression) { + true_action +} else { + false_action +} +``` + + +Can be expanded to more alternatives: + +```{r, eval=FALSE} +if (test_expression) { + true_action +} else if (other_test_expression) { + other_action +} else { + false_action +} +``` + + +## Exercise {-} +Why does this work? +``` +x <- 1:10 +if (length(x)) "not empty" else "empty" +#> [1] "not empty" + +x <- numeric() +if (length(x)) "not empty" else "empty" +#> [1] "empty" +``` + +`if` returns a value which can be assigned + +```{r} +x1 <- if (TRUE) 1 else 2 +x2 <- if (FALSE) 1 else 2 + +c(x1, x2) +``` + +The book recommends assigning the results of an if statement only when the entire expression fits on one line; otherwise it tends to be hard to read. + + +## Single if without else {-} + +When you use the single argument form without an else statement, if invisibly (Section 6.7.2) returns NULL if the condition is FALSE. Since functions like c() and paste() drop NULL inputs, this allows for a compact expression of certain idioms: + +```{r, eval=FALSE} +greet <- function(name, birthday = FALSE) { + paste0( + "Hi ", name, + if (birthday) " and HAPPY BIRTHDAY" + ) +} +greet("Maria", FALSE) +#> [1] "Hi Maria" +greet("Jaime", TRUE) +#> [1] "Hi Jaime and HAPPY BIRTHDAY" +``` + + + +```{r, eval=FALSE} +format_lane_text <- function(number){ + + paste0( + number, + " lane", + if (number > 1) "s", + " of sequencing" + ) +} + +format_lane_text(1) +#> [1] "1 lane of sequencing" +format_lane_text(4) +#> [1] "4 lanes of sequencing" +``` + + + + +## Invalid inputs {-} + +- *Condition* must evaluate to a *single* `TRUE` or `FALSE` + +A single number gets coerced to a logical type. + +```{r, eval=FALSE} +if (56) 1 +#> [1] 1 +if (0.3) 1 +#> [1] 1 +if (0) 1 +``` + +If the condition cannot evaluate to a *single* `TRUE` or `FALSE`, an error is (usually) produced. + +```{r, eval=FALSE} +if ("text") 1 +#> Error in if ("text") 1: argument is not interpretable as logical +if ("true") 1 +#> 1 +if (numeric()) 1 +#> Error in if (numeric()) 1: argument is of length zero +if (NULL) 1 +#> Error in if (NULL) 1 : argument is of length zero +if (NA) 1 +#> Error in if (NA) 1: missing value where TRUE/FALSE needed +``` + + +Exception is a logical vector of length greater than 1, which only generates a warning, unless you have `_R_CHECK_LENGTH_1_CONDITION_` set to `TRUE`. +This seems to have been the default since R-4.2.0 + +```{r, eval=FALSE} +if (c(TRUE, FALSE)) 1 +#>Error in if (c(TRUE, FALSE)) 1 : the condition has length > 1 +``` + +## Vectorized choices {-} + +- `ifelse()` is a vectorized version of `if`: + +```{r, eval=FALSE} +x <- 1:10 +ifelse(x %% 5 == 0, "XXX", as.character(x)) +#> [1] "1" "2" "3" "4" "XXX" "6" "7" "8" "9" "XXX" + +ifelse(x %% 2 == 0, "even", "odd") +#> [1] "odd" "even" "odd" "even" "odd" "even" "odd" "even" "odd" "even" +``` + +- `dplyr::if_else()` + +- Book recommends only using `ifelse()` "only when the yes and no vectors are the same type as it is otherwise hard to predict the output type." + +- `dplyr::if_else()` enforces this recommendation. + +**For example:** + +```{r eval=FALSE, include=T} +ifelse(c(TRUE,TRUE,FALSE),"a",3) +#> [1] "a" "a" "3" +dplyr::if_else(c(TRUE,TRUE,FALSE),"a",3) +#> Error in `dplyr::if_else()`: +#> ! `false` must be a character vector, not a double vector. +``` + +## Switch {-} + +Rather then string together multiple if - else if chains, you can often use `switch`. + + +```{r message=FALSE, warning=FALSE} +centre <- function(x, type) { + switch(type, + mean = mean(x), + median = median(x), + trimmed = mean(x, trim = .1), + stop("Invalid `type` value") + ) +} +``` + +Last component should always throw an error, as unmatched inputs would otherwise invisibly return NULL. +Book recommends to only use character inputs for `switch()`. + +```{r, eval=FALSE} +vec <- c(1:20,50:55) +centre(vec, "mean") +#> [1] 20.19231 +centre(vec, "median") +#> [1] 13.5 +centre(vec, "trimmed") +#> [1] 18.77273 +``` + +```{r, message=FALSE} +set.seed(123) +x <- rlnorm(100) + +centers <- data.frame(type = c('mean', 'median', 'trimmed')) +centers$value = sapply(centers$type, \(t){centre(x,t)}) + +require(ggplot2) +ggplot(data = data.frame(x), aes(x))+ + geom_density()+ + geom_vline(data = centers, + mapping = aes(color = type, xintercept = value), + linewidth=0.5,linetype="dashed") + + xlim(-1,10)+ + theme_bw() +``` + + +Example from book of "falling through" to next value + +```{r} +legs <- function(x) { + switch(x, + cow = , + horse = , + dog = 4, + human = , + chicken = 2, + plant = 0, + stop("Unknown input") + ) +} +legs("cow") +#> [1] 4 +legs("dog") +#> [1] 4 +``` + + + + +## Using `dplyr::case_when` {-} + +- `case_when` is a more general `if_else` and can be used often in place of multiple chained `if_else` or sapply'ing `switch`. + +- It uses a special syntax to allow any number of condition-vector pairs: + +```{r message=FALSE, warning=FALSE} +set.seed(123) +x <- rlnorm(100) + +centers <- data.frame(type = c('mean', 'median', 'trimmed')) + +centers$value = dplyr::case_when( + centers$type == 'mean' ~ mean(x), + centers$type == 'median' ~ median(x), + centers$type == 'trimmed' ~ mean(x, trim = 0.1), + .default = 1000 + ) + +centers +``` + + + +## Loops + +- Iteration over a elements of a vector + +`for (item in vector) perform_action` + +**First example** +```{r} +for(i in 1:5) { + print(1:i) +} + +x <- numeric(length=5L) +df <- data.frame(x=1:5) + +for(i in 1:5) { + df$y[[i]] <- i+1 +} +``` + + +**Second example**: terminate a *for loop* earlier + +- `next` skips rest of current iteration +- `break` exits the loop entirely + +```{r} +for (i in 1:10) { + if (i < 3) + next + + print(i) + + if (i >= 5) + break +} +``` + +## Exercise {-} + +When the following code is evaluated, what can you say about the vector being iterated? +``` +xs <- c(1, 2, 3) +for (x in xs) { + xs <- c(xs, x * 2) +} +xs +#> [1] 1 2 3 2 4 6 +``` + +## Pitfalls {-} + +- Preallocate output containers to avoid *slow* code. + +- Beware that `1:length(v)` when `v` has length 0 results in a iterating backwards over `1:0`, probably not what is intended. Use `seq_along(v)` instead. + +- When iterating over S3 vectors, use `[[]]` yourself to avoid stripping attributes. + +``` +xs <- as.Date(c("2020-01-01", "2010-01-01")) +for (x in xs) { + print(x) +} +#> [1] 18262 +#> [1] 14610 +``` +vs. +``` +for (i in seq_along(xs)) { + print(xs[[i]]) +} +#> [1] "2020-01-01" +#> [1] "2010-01-01" +``` + +## Related tools {-} + +- `while(condition) action`: performs action while condition is TRUE. +- `repeat(action)`: repeats action forever (i.e. until it encounters break). + +- Note that `for` can be rewritten as `while` and while can be rewritten as `repeat` (this goes in one direction only!); *however*: + +>Good practice is to use the least-flexible solution to a problem, so you should use `for` wherever possible. +BUT you shouldn't even use for loops for data analysis tasks as `map()` and `apply()` already provide *less flexible* solutions to most problems. (More in Chapter 9.) + +```{r} +for (i in 1:5) { + print(i) +} + + +``` + +```{r} + +x_option <- function(x) { + switch(x, + a = "option 1", + b = "option 2", + c = "option 3"#, + #stop("Invalid `x` value") + ) +} + +``` + + + +```{r} +i <- 1 + +while(i <=5 ) { + print(i) + i <- i+1 +} +``` + +```{r} +i <- 1 + +repeat { + print(i) + i <- i+1 + if (i > 5) break +} + +``` + + +--- + + + + +## Meeting Videos + +### Cohort 1 + +`r knitr::include_url("https://www.youtube.com/embed/96eY6YS_3hU")` + +### Cohort 2 + +`r knitr::include_url("https://www.youtube.com/embed/x5I_uHnMxIk")` + +### Cohort 3 + +`r knitr::include_url("https://www.youtube.com/embed/u6UMGWDuxDE")` + +### Cohort 4 + +`r knitr::include_url("https://www.youtube.com/embed/G4YOvwsSw2Q")` + +### Cohort 5 + +`r knitr::include_url("https://www.youtube.com/embed/AZwJjsl8xiI")` + +### Cohort 6 + +`r knitr::include_url("https://www.youtube.com/embed/wg2QZ3rMIqM")` + +<details> +<summary> Meeting chat log </summary> + +``` +00:16:34 Federica Gazzelloni: https://github.com/r4ds/bookclub-Advanced_R +00:22:28 Federica Gazzelloni: https://stackoverflow.com/questions/50646133/dplyr-if-else-vs-base-r-ifelse +00:26:20 Trevin: case_when() is great, makes it easy to read +00:54:01 Trevin: out[I, ] +00:54:14 Trevin: out[i, ] +00:55:03 Trevin: I think you have to specify number of rows and columns before.. +00:55:30 Trevin: iterations = 10 + variables = 2 + + output <- matrix(ncol=variables, nrow=iterations) +00:55:43 Trevin: https://stackoverflow.com/questions/13442461/populating-a-data-frame-in-r-in-a-loop +``` +</details> + +### Cohort 7 + +`r knitr::include_url("https://www.youtube.com/embed/W9CoQ15NlOc")` + +<details> + +<summary>Meeting chat log</summary> +``` +00:40:18 Ryan Honomichl: What type of vector does each of the following calls to ifelse() return? + +* "ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE." +00:42:11 Ryan Honomichl: "I recommend assigning the results of an if statement only when the entire expression fits on one line; otherwise it tends to be hard to read" +00:42:46 Ryan Honomichl: * When you use the single argument form without an `else` statement, `if` invisibly returns NULL if the condition is FALSE. + +- Since functions like c() and paste() drop NULL inputs, this allows for a compact expression of certain idioms +00:54:15 collinberke: https://docs.google.com/spreadsheets/d/1ScrbEw_-vB9DruaJhjtVY8HLQmuNPqyWeOOjmG6OY1M/edit?usp=sharing +00:58:46 collinberke: https://www.youtube.com/@safe4democracy/videos +``` +</details> diff --git a/06_Functions.Rmd b/slides/06_Functions.Rmd diff --git a/07_Environments.Rmd b/slides/07_Environments.Rmd diff --git a/slides/08_Conditions.Rmd b/slides/08_Conditions.Rmd @@ -0,0 +1,549 @@ +# Conditions + +**Learning objectives:** + +- What conditions are +- How to use them + +## Introduction + +What are conditions? Problems that happen in functions: + +- Error +- Warning +- Message + +As a function author, one can signal them--that is, say there's a problem. + +As a function consumer, one can handle them--for example, react or ignore. + +## Signalling conditions + +### Types of conditions + +Three types of conditions: + +- `r emoji::emoji("x")` **Errors.** Problem arose, and the function cannot continue. +- `r emoji::emoji("warning")` **Warnings.** Problem arose, but the function can continue, if only partially. +- `r emoji::emoji("speech_balloon")` **Messages.** Something happened, and the user should know. + +### `r emoji::emoji("x")` Errors + +How to throw errors + +```{r throwing_errors, error = TRUE} +# with base R +stop("... in the name of love...") + +# with rlang +rlang::abort("...before you break my heart...") + +# with base R; without call +stop("... think it o-o-over...", call. = FALSE) +``` +Composing error messages + +- Mechanics. + - `stop()` pastes together arguments +```{r, error = TRUE} +some_val <- 1 +stop("Your value is: ", some_val, call. = FALSE) +``` + - `abort()` requires `{glue}` +```{r, error = TRUE} +some_val <- 1 +rlang::abort(glue::glue("Your value is: {some_val}")) +``` +- Style. See [here](http://style.tidyverse.org/error-messages.html). + +### `r emoji::emoji("warning")` Warnings + +May have multiple warnings per call + +```{r} +warn <- function() { + warning("This is your first warning") + warning("This is your second warning") + warning("This is your LAST warning") +} +``` + +Print all warnings once call is complete. + +```{r} +warn() +``` + +Like errors, `warning()` has + +- a call argument +- an `{rlang}` analog + +```{r} +# base R +# ... with call (implicitly .call = TRUE) +warning("Warning") +# ... with call suppressed +warning("Warning", call. = FALSE) + +# rlang +# note: call suppressed by default +rlang::warn("Warning") +``` + +(Hadley's) advice on usage: + +- Err on the side of errors. In other words, error rather than warn. +- But warnings make sense in a few cases: + - Function is being deprecated. Warn that it is reaching end of life. + - Function is reasonably sure to recover from issue. + +### `r emoji::emoji("speech_balloon")` Messages + +Mechanics: + +- Issued immediately +- Do not have a call argument + +Style: + +Messages are best when they inform about: + +- Default arguments +- Status updates of for functions used primarily for side-effects (e.g., interaction with web API, file downloaded, etc.) +- Progress of long-running process (in the absence of a status bar). +- Package loading message (e.g., attaching package, objects masked) + +## Ignoring conditions + +A few ways: + +- `try()` +- `suppressWarnings()` +- `suppressMessages()` + +### `try()` + +What it does: + +- Displays error +- But continues execution after error + +```{r} +bad_log <- function(x) { + try(log(x)) + 10 +} + +bad_log("bad") +``` + +Better ways to react to/recover from errors: + +1. Use `tryCatch()` to "catch" the error and perform a different action in the event of an error. +1. Set a default value inside the call. See below. + +```{r} +default <- NULL +try(default <- read.csv("possibly-bad-input.csv"), silent = TRUE) +``` + + +### `suppressWarnings()`, `suppressMessages()` + +What it does: + +- Supresses all warnings (messages) + +```{r} +# suppress warnings (from our `warn()` function above) +suppressWarnings(warn()) + +# suppress messages +many_messages <- function() { + message("Message 1") + message("Message 2") + message("Message 3") +} + +suppressMessages(many_messages()) +``` + +## Handling conditions + +Every condition has a default behavior: + +- `r emoji::emoji("x")` Errors halt execution +- `r emoji::emoji("warning")` Warnings are collected during execution and displayed in bulk after execution +- `r emoji::emoji("speech_balloon")` Messages are displayed immediately + +Condition handlers allow one to change that behavior (within the scope of a function). + +Two handler functions: + +- `tryCatch()` +- `withCallingHandlers()` + +```{r, eval=FALSE} +# try to run `code_to_try_to_run` +# if (error) condition is signalled, fun some other code +tryCatch( + error = function(cnd) { + # code to run when error is thrown + }, + code_to_try_to_run +) + +# try to `code_to_try_to_run` +# if condition is signalled, run code corresponding to condition type +withCallingHandlers( + warning = function(cnd) { + # code to run when warning is signalled + }, + message = function(cnd) { + # code to run when message is signalled + }, + code_to_try_to_run +) +``` + + +### Condition objects + +```{r} +# catch a condition +cnd <- rlang::catch_cnd(stop("An error")) +# inspect it +str(cnd) +``` + +The standard components + +- `message`. The error message. To extract it, use `conditionMessage(cnd)`. +- `call`. The function call that triggered the condition. To extract it, use `conditionCall(cnd)`. + +But custom conditions may contain other components. + +### Exiting handlers + +If a condition is signalled, this type of handler controls what code to run before exiting the function call. + +```{r} +f3 <- function(x) { + tryCatch( + # if error signalled, return NA + error = function(cnd) NA, + # try to run log + log(x) + ) +} + +f3("x") +``` + +When a condition is signalled, control moves to the handler and never returns to the original code. + +```{r} +tryCatch( + message = function(cnd) "There", + { + message("Here") + stop("This code is never run!") + } +) +``` + +The `tryCatch()` exit handler has one final argument: `finally`. This is run regardless of the condition of the original code. This is often used for clean-up. + +```{r} +# try to write text to disk +# if an error is signalled--for example, `path` does not exist +# or if no condition is signalled +# that is in both cases, the code block in `finally` is executed +path <- tempfile() +tryCatch( + { + writeLines("Hi!", path) + # ... + }, + finally = { + # always run + unlink(path) + } +) +``` + +### Calling handlers + +Definition by verbal comparison: + +- With exit handlers, code exits the normal flow once a condition is signalled +- With calling handlers, code continues in the normal flow once control is returned by the handler. + +Definition by code comparison: + +```{r} +# with an exit handler, control moves to the handler once condition signalled and does not move back +tryCatch( + message = function(cnd) cat("Caught a message!\n"), + { + message("Someone there?") + message("Why, yes!") + } +) + +# with a calling handler, control moves first to the handler and the moves back to the main code +withCallingHandlers( + message = function(cnd) cat("Caught a message!\n"), + { + message("Someone there?") + message("Why, yes!") + } +) +``` + +### By default, conditions propagate + +Let's suppose that there are nested handlers. If a condition is signalled in the child, it propagates to its parent handler(s). + +```{r} +# Bubbles all the way up to default handler which generates the message +withCallingHandlers( + message = function(cnd) cat("Level 2\n"), + withCallingHandlers( + message = function(cnd) cat("Level 1\n"), + message("Hello") + ) +) + +# Bubbles up to tryCatch +tryCatch( + message = function(cnd) cat("Level 2\n"), + withCallingHandlers( + message = function(cnd) cat("Level 1\n"), + message("Hello") + ) +) +``` + +### But conditions can be muffled + +If one wants to "muffle" the siginal, one needs to use `rlang::cnd_muffle()` + +```{r} +# Muffles the default handler which prints the messages +withCallingHandlers( + message = function(cnd) { + cat("Level 2\n") + rlang::cnd_muffle(cnd) + }, + withCallingHandlers( + message = function(cnd) cat("Level 1\n"), + message("Hello") + ) +) + +# Muffles level 2 handler and the default handler +withCallingHandlers( + message = function(cnd) cat("Level 2\n"), + withCallingHandlers( + message = function(cnd) { + cat("Level 1\n") + rlang::cnd_muffle(cnd) + }, + message("Hello") + ) +) +``` + +### Call stacks + +Call stacks of exiting and calling handlers differ. + +Why? + +> Calling handlers are called in the context of the call that signalled the condition +> exiting handlers are called in the context of the call to tryCatch() + +To see this, consider how the call stacks differ for a toy example. + +```{r} +# create a function +f <- function() g() +g <- function() h() +h <- function() message + +# call stack of calling handlers +withCallingHandlers(f(), message = function(cnd) { + lobstr::cst() + rlang::cnd_muffle(cnd) +}) + +# call stack of exit handlers +tryCatch(f(), message = function(cnd) lobstr::cst()) +tryCatch(f(), message = function(cnd) lobstr::cst()) +``` + +## Custom conditions + +### Motivation + +The `base::log()` function provides a minimal error message. + +```{r, error = TRUE} +log(letters) +log(1:10, base = letters) +``` + +One could make a more informative error message about which argument is problematic. + +```{r} +my_log <- function(x, base = exp(1)) { + if (!is.numeric(x)) { + rlang::abort(paste0( + "`x` must be a numeric vector; not ", typeof(x), "." + )) + } + if (!is.numeric(base)) { + rlang::abort(paste0( + "`base` must be a numeric vector; not ", typeof(base), "." + )) + } + + base::log(x, base = base) +} +``` + +Consider the difference: + +```{r, error = TRUE} +my_log(letters) +my_log(1:10, base = letters) +``` + + +### Signalling + +Create a helper function to describe errors: + +```{r} +abort_bad_argument <- function(arg, must, not = NULL) { + msg <- glue::glue("`{arg}` must {must}") + if (!is.null(not)) { + not <- typeof(not) + msg <- glue::glue("{msg}; not {not}.") + } + + rlang::abort( + "error_bad_argument", # <- this is the (error) class, I believe + message = msg, + arg = arg, + must = must, + not = not + ) +} +``` + +Rewrite the log function to use this helper function: + +```{r} +my_log <- function(x, base = exp(1)) { + if (!is.numeric(x)) { + abort_bad_argument("x", must = "be numeric", not = x) + } + if (!is.numeric(base)) { + abort_bad_argument("base", must = "be numeric", not = base) + } + + base::log(x, base = base) +} +``` + +See the result for the end user: + +```{r, error = TRUE} +my_log(letters) +my_log(1:10, base = letters) +``` + +### Handling + +Use class of condition object to allow for different handling of different types of errors + +```{r, error = TRUE} +tryCatch( + error_bad_argument = function(cnd) "bad_argument", + error = function(cnd) "other error", + my_log("a") +) +``` + +But note that the first handler that matches any of the signal's class, potentially in a vector of signal classes, will get control. So put the most specific handlers first. + +## Applications + +See [the sub-section in the book](https://adv-r.hadley.nz/conditions.html#condition-applications) for excellent examples. + +## Resources + +- Conditions articles in rlang vignettes: + - [Including function calls in error messages](https://rlang.r-lib.org/reference/topic-error-call.html) + - [Including contextual information with error chains](https://rlang.r-lib.org/reference/topic-error-chaining.html) + - [Formatting messages with cli](https://rlang.r-lib.org/reference/topic-condition-formatting.html) +- [Other resources](https://github.com/rstudio-conf-2022/pkg-dev-masterclass/blob/main/materials/5-error-resources.md) from error message segment of rstudio::conf(2022) workshop "Package Development Masterclass" + +## Meeting Videos + +### Cohort 1 + +`r knitr::include_url("https://www.youtube.com/embed/mwiNe083DLU")` + +### Cohort 2 + +`r knitr::include_url("https://www.youtube.com/embed/ZFUr7YRSu2o")` + +### Cohort 3 + +`r knitr::include_url("https://www.youtube.com/embed/UZhrsVz6wi0")` + +`r knitr::include_url("https://www.youtube.com/embed/Wt7p71_BuYY")` + +### Cohort 4 + +`r knitr::include_url("https://www.youtube.com/embed/WinIo5mrUZo")` + +### Cohort 5 + +`r knitr::include_url("https://www.youtube.com/embed/VFs-2sl5C70")` + +### Cohort 6 + +`r knitr::include_url("https://www.youtube.com/embed/VwmrbPUQY1k")` + +<details> +<summary> Meeting chat log </summary> + +``` +00:19:16 Trevin: https://style.tidyverse.org/error-messages.html +00:20:14 Trevin: More on errors in the design guide: https://design.tidyverse.org/ +01:14:27 Federica Gazzelloni: more info here: https://colinfay.me/learn-shiny-production/ +``` +</details> + +### Cohort 7 + +`r knitr::include_url("https://www.youtube.com/embed/t1N6XdidvNo")` + +<details> + +<summary>Meeting chat log</summary> +``` +00:34:09 Ron: Someone did: https://cran.r-project.org/web/packages/comprehenr/vignettes/Introduction.html +00:47:58 collinberke: https://purrr.tidyverse.org/reference/safely.html +00:48:24 Ron: it's a function operator ! +00:49:37 Ron: \(x) length(unique(x) is not too verbose though +00:49:39 Ron: ;) +01:06:50 collinberke: https://colinfay.me/purrr-mappers/ +01:07:45 collinberke: https://colinfay.me/purrr-web-mining/ +``` +</details> diff --git a/09_Functionals.Rmd b/slides/09_Functionals.Rmd diff --git a/slides/10_Function_factories.Rmd b/slides/10_Function_factories.Rmd @@ -0,0 +1,672 @@ +# Function factories + +**Learning objectives:** + +- Understand what a function factory is +- Recognise how function factories work +- Learn about non-obvious combination of function features +- Generate a family of functions from data + +```{r, message = FALSE} +library(rlang) +library(ggplot2) +library(scales) +``` + + +## What is a function factory? + + +A **function factory** is a function that makes (returns) functions + +Factory made function are **manufactured functions**. + +```{r 10-1, echo=FALSE, fig.align='center', fig.dim="50%",fig.alt="https://epsis.com/no/operations-centers-focus-on-ways-of-working/",fig.cap="Function factory | Credits: epsis.com"} +knitr::include_graphics("images/10-1-factories.png") +``` + + + +## How does a function factory work? +```{r 10-2, echo=FALSE, fig.align='center', fig.dim="100%",fig.cap="How does it work? | Credits: kakaakigas.com/how-it-works/"} +knitr::include_graphics("images/10-2-how.jpg") +``` + +```{r 10-ex1} +power1 <- function(exp) { + function(x) { + x ^ exp + } +} + +square <- power1(2) +cube <- power1(3) +``` +`power1()` is the function factory and `square()` and `cube()` are manufactured functions. + +## Important to remember + +1. R has First-class functions (can be created with `function()` and `<-`) + +> R functions are objects in their own right, a language property often called “first-class functions” +> -- [Section 6.2.3](https://adv-r.hadley.nz/functions.html?q=first%20class#first-class-functions) + +2. Functions capture (enclose) environment in which they are created + +```{r 10-ex3} +f <- function(x) function(y) x + y +fn_env(f) # The function f() +fn_env(f()) # The function created by f() +``` + +3. Functions create a new environment on each run +```{r 10-ex4} +f <- function(x) { + function() x + 1 +} +ff <- f(1) +ff() +ff() +``` + + +## Fundamentals - Environment + +- Environment when function is created defines arguments in the function +- Use `env_print(fun)` and `fn_env()` to explore + +```{r} +env_print(square) +fn_env(square)$exp +``` + +![Blue indicates environment, arrows bindings](images/10-3-procedure.png){width=50% fig-align=center} + +## Fundamentals - Forcing + +- Lazy evaluation means arguments only evaluated when used +- "[can] lead to a real head-scratcher of a bug" + +```{r} +x <- 2 +square <- power1(x) +x <- 3 +square(4) +``` + +- *Only applies if passing object as argument* +- Here argument `2` evaluated when function called + +```{r} +square <- power1(2) +x <- 3 +square(4) +``` + +So use `force()`! (Unless you want it to change with the `x` in the parent environment) + +## Forcing - Reiterated + +Only required if the argument is **not** evaluated before the new function is created: +```{r} +power1 <- function(exp) { + stopifnot(is.numeric(exp)) + function(x) x ^ exp +} + +x <- 2 +square <- power1(x) +x <- 3 +square(4) +``` + +## Fundamentals - Stateful functions + +Because + +- The enclosing environment is unique and constant, and +- We have `<<-` (super assignment) + +We can *change* that enclosing environment and keep track of that state +across iterations (!) + +- `<-` Assignment in *current* environment +- `<<-` Assignment in *parent* environment + +```{r 10-15} +new_counter <- function() { + i <- 0 + function() { + i <<- i + 1 # second assignment (super assignment) + i + } +} + +counter_one <- new_counter() +counter_two <- new_counter() +c(counter_one(), counter_one(), counter_one()) +c(counter_two(), counter_two(), counter_two()) +``` + + +> "As soon as your function starts managing the state of multiple variables, it’s better to switch to R6" + +## Fundamentals - Garbage collection + +- Because environment is attached to (enclosed by) function, temporary objects +don't go away. + +**Cleaning up** using `rm()` inside a function: +```{r 10-16} +f_dirty <- function(n) { + x <- runif(n) + m <- mean(x) + function() m +} + +f_clean <- function(n) { + x <- runif(n) + m <- mean(x) + rm(x) # <---- Important part! + function() m +} + +lobstr::obj_size(f_dirty(1e6)) +lobstr::obj_size(f_clean(1e6)) + +``` + + +## Useful Examples - Histograms and binwidth + +**Useful when...** + +- You need to pass a function +- You don't want to have to re-write the function every time + (the *default* behaviour of the function should be flexible) + + +For example, these bins are not appropriate +```{r} +#| fig-asp: 0.3 +sd <- c(1, 5, 15) +n <- 100 +df <- data.frame(x = rnorm(3 * n, sd = sd), sd = rep(sd, n)) + +ggplot(df, aes(x)) + + geom_histogram(binwidth = 2) + + facet_wrap(~ sd, scales = "free_x") + + labs(x = NULL) +``` + +We could just make a function... +```{r} +#| fig-asp: 0.3 +binwidth_bins <- function(x) (max(x) - min(x)) / 20 + +ggplot(df, aes(x = x)) + + geom_histogram(binwidth = binwidth_bins) + + facet_wrap(~ sd, scales = "free_x") + + labs(x = NULL) +``` + +But if we want to change the number of bins (20) we'd have to re-write the function +each time. + +If we use a factory, we don't have to do that. +```{r} +#| fig-asp: 0.3 +binwidth_bins <- function(n) { + force(n) + function(x) (max(x) - min(x)) / n +} + +ggplot(df, aes(x = x)) + + geom_histogram(binwidth = binwidth_bins(20)) + + facet_wrap(~ sd, scales = "free_x") + + labs(x = NULL, title = "20 bins") + +ggplot(df, aes(x = x)) + + geom_histogram(binwidth = binwidth_bins(5)) + + facet_wrap(~ sd, scales = "free_x") + + labs(x = NULL, title = "5 bins") +``` + +> Similar benefit in Box-cox example + +## Useful Examples - Wrapper + +**Useful when...** + +- You want to create a function that wraps a bunch of other functions + +For example, `ggsave()` wraps a bunch of different graphics device functions: + +```{r} +# (Even more simplified) +plot_dev <- function(ext, dpi = 96) { + force(dpi) + + switch( + ext, + svg = function(filename, ...) svglite::svglite(file = filename, ...), + png = function(...) grDevices::png(..., res = dpi, units = "in"), + jpg = , + jpeg = function(...) grDevices::jpeg(..., res = dpi, units = "in"), + stop("Unknown graphics extension: ", ext, call. = FALSE) + ) +} +``` + +Then `ggsave()` uses + +``` +ggsave <- function(...) { + dev <- plot_dev(device, filename, dpi = dpi) + ... + dev(filename = filename, width = dim[1], height = dim[2], bg = bg, ...) + ... +} +``` + +Otherwise, would have to do something like like a bunch of if/else statements. + +## Useful Examples - Optimizing + +**Useful when...** + +- Want to pass function on to `optimise()`/`optimize()` +- Want to perform pre-computations to speed things up +- Want to re-use this for other datasets + +(*Skipping to final results from section*) + +Here, using MLE want to to find the most likely value of lambda for a Poisson distribution +and this data. +```{r} +x1 <- c(41, 30, 31, 38, 29, 24, 30, 29, 31, 38) +``` + +We'll create a function that creates a lambda assessment function for a given +data set. + +```{r} +ll_poisson <- function(x) { + n <- length(x) + sum_x <- sum(x) + c <- sum(lfactorial(x)) + + function(lambda) { + log(lambda) * sum_x - n * lambda - c + } +} +``` + +We can use this on different data sets, but here use ours `x1` +```{r} +ll <- ll_poisson(x1) +ll(10) # Log-probility of a lambda = 10 +``` + +Use `optimise()` rather than trial and error +```{r} +optimise(ll, c(0, 100), maximum = TRUE) +``` + +Result: Highest log-probability is -30.3, best lambda is 32.1 + + +## Function factories + functionals + +Combine functionals and function factories to turn data into many functions. + +```{r} +names <- list( + square = 2, + cube = 3, + root = 1/2, + cuberoot = 1/3, + reciprocal = -1 +) +funs <- purrr::map(names, power1) +names(funs) +funs$root(64) +funs$square(3) +``` + +Avoid the prefix with + +- `with()` - `with(funs, root(100))` + - Temporary, clear, short-term +- `attach()` - `attach(funs)` / `detach(funs)` + - Added to search path (like package function), cannot be overwritten, but can be attached multiple times! +- `rlang::env_bind` - `env_bind(globalenv(), !!!funs)` / `env_unbind(gloablenv(), names(funs))` + - Added to global env (like created function), can be overwritten + +<!-- +## EXTRA - Previous set of slides + +Graphical factories **useful function factories**, such as: + +1. Labelling with: + + * formatter functions + +```{r 10-19} +y <- c(12345, 123456, 1234567) +comma_format()(y) +``` +```{r 10-20} +number_format(scale = 1e-3, suffix = " K")(y) +``` +They are more commonly used inside a ggplot: +```{r 10-21, include=FALSE} +df <- data.frame(x = 1, y = y) +a_ggplot_object <- ggplot(df, aes(x, y)) + + geom_point() + + scale_x_continuous(breaks = 1, labels = NULL) + + labs(x = NULL, y = NULL) +``` + +```{r 10-22, eval=T} +a_ggplot_object + + scale_y_continuous( + labels = comma_format() +) +``` + +2. Using binwidth in facet histograms + + * binwidth_bins + +```{r} +binwidth_bins <- function(n) { + force(n) + + function(x) { + (max(x) - min(x)) / n + } +} +``` + +Or use a concatenation of this typr of detecting number of bins functions: + + - nclass.Sturges() + - nclass.scott() + - nclass.FD() + +```{r} +base_bins <- function(type) { + fun <- switch(type, + Sturges = nclass.Sturges, + scott = nclass.scott, + FD = nclass.FD, + stop("Unknown type", call. = FALSE) + ) + + function(x) { + (max(x) - min(x)) / fun(x) + } +} +``` + + +3. Internals: + + * ggplot2:::plot_dev() + + +## Non-obvious combinations + + +- The **Box-Cox** transformation. +- **Bootstrap** resampling. +- **Maximum likelihood** estimation. + + +### Statistical factories + +The **Box-Cox** transformation towards normality: +```{r} +boxcox1 <- function(x, lambda) { + stopifnot(length(lambda) == 1) + + if (lambda == 0) { + log(x) + } else { + (x ^ lambda - 1) / lambda + } +} +``` + + +```{r} +boxcox2 <- function(lambda) { + if (lambda == 0) { + function(x) log(x) + } else { + function(x) (x ^ lambda - 1) / lambda + } +} + +stat_boxcox <- function(lambda) { + stat_function(aes(colour = lambda), fun = boxcox2(lambda), size = 1) +} + +plot1 <- ggplot(data.frame(x = c(0, 5)), aes(x)) + + lapply(c(0.5, 1, 1.5), stat_boxcox) + + scale_colour_viridis_c(limits = c(0, 1.5)) + +# visually, log() does seem to make sense as the transformation +# for lambda = 0; as values get smaller and smaller, the function +# gets close and closer to a log transformation +plot2 <- ggplot(data.frame(x = c(0.01, 1)), aes(x)) + + lapply(c(0.5, 0.25, 0.1, 0), stat_boxcox) + + scale_colour_viridis_c(limits = c(0, 1.5)) +library(patchwork) +plot1+plot2 +``` + +**Bootstrap generators** + + +```{r} +boot_permute <- function(df, var) { + n <- nrow(df) + force(var) + + function() { + col <- df[[var]] + col[sample(n, replace = TRUE)] + } +} + +boot_mtcars1 <- boot_permute(mtcars, "mpg") +head(boot_mtcars1()) +#> [1] 16.4 22.8 22.8 22.8 16.4 19.2 +head(boot_mtcars1()) +#> [1] 17.8 18.7 30.4 30.4 16.4 21.0 +``` +```{r} +boot_model <- function(df, formula) { + mod <- lm(formula, data = df) + fitted <- unname(fitted(mod)) + resid <- unname(resid(mod)) + rm(mod) + + function() { + fitted + sample(resid) + } +} + +boot_mtcars2 <- boot_model(mtcars, mpg ~ wt) +head(boot_mtcars2()) +#> [1] 25.0 24.0 21.7 19.2 24.9 16.0 +head(boot_mtcars2()) +#> [1] 27.4 21.0 20.3 19.4 16.3 21.3 +``` + +**Maximum likelihood estimation** + +$$P(\lambda,x)=\prod_{i=1}^{n}\frac{\lambda^{x_i}e^{-\lambda}}{x_i!}$$ +```{r} +lprob_poisson <- function(lambda, x) { + n <- length(x) + (log(lambda) * sum(x)) - (n * lambda) - sum(lfactorial(x)) +} +``` + +```{r} +x1 <- c(41, 30, 31, 38, 29, 24, 30, 29, 31, 38) +``` + +```{r} +lprob_poisson(10, x1) +#> [1] -184 +lprob_poisson(20, x1) +#> [1] -61.1 +lprob_poisson(30, x1) +#> [1] -31 +``` + +```{r} +ll_poisson1 <- function(x) { + n <- length(x) + + function(lambda) { + log(lambda) * sum(x) - n * lambda - sum(lfactorial(x)) + } +} +``` +```{r} +ll_poisson2 <- function(x) { + n <- length(x) + sum_x <- sum(x) + c <- sum(lfactorial(x)) + + function(lambda) { + log(lambda) * sum_x - n * lambda - c + } +} +``` + +```{r} +ll1 <- ll_poisson2(x1) + +ll1(10) +#> [1] -184 +ll1(20) +#> [1] -61.1 +ll1(30) +#> [1] -31 +``` +```{r} +optimise(ll1, c(0, 100), maximum = TRUE) +#> $maximum +#> [1] 32.1 +#> +#> $objective +#> [1] -30.3 +``` +```{r} +optimise(lprob_poisson, c(0, 100), x = x1, maximum = TRUE) +#> $maximum +#> [1] 32.1 +#> +#> $objective +#> [1] -30.3 +``` + +## Function factory applications + + +Combine functionals and function factories to turn data into many functions. + +### Function factories + functionals +```{r} +names <- list( + square = 2, + cube = 3, + root = 1/2, + cuberoot = 1/3, + reciprocal = -1 +) +funs <- purrr::map(names, power1) + +funs$root(64) +#> [1] 8 +funs$root +#> function(x) { +#> x ^ exp +#> } +#> <bytecode: 0x7fe85512a410> +#> <environment: 0x7fe85b21f190> +``` +```{r} +with(funs, root(100)) +#> [1] 10 +``` + +```{r} +attach(funs) +#> The following objects are masked _by_ .GlobalEnv: +#> +#> cube, square +root(100) +#> [1] 10 +detach(funs) +``` + + +```{r} +rlang::env_bind(globalenv(), !!!funs) +root(100) +#> [1] 10 +``` + +```{r} +rlang::env_unbind(globalenv(), names(funs)) +``` + + +--> + + +## Meeting Videos + +### Cohort 1 + +`r knitr::include_url("https://www.youtube.com/embed/enI5Ynq6olI")` + +### Cohort 2 + +`r knitr::include_url("https://www.youtube.com/embed/U-CoF7MCik0")` + +### Cohort 3 + +`r knitr::include_url("https://www.youtube.com/embed/qgn7WTITnNs")` + +### Cohort 4 + +`r knitr::include_url("https://www.youtube.com/embed/GHp2W4JxVaY")` + +### Cohort 5 + +`r knitr::include_url("https://www.youtube.com/embed/8TGXjzi0n0o")` + +### Cohort 6 + +`r knitr::include_url("https://www.youtube.com/embed/FUoYwYFqT7Q")` + +<details> +<summary> Meeting chat log </summary> + +``` +01:02:25 Trevin: I'm good with combining 👍 +01:02:57 Oluwafemi Oyedele: I agree with combining the chapter!!! +``` +</details> + +### Cohort 7 + +`r knitr::include_url("https://www.youtube.com/embed/7GLyO3IntgE")` diff --git a/11_Function_operators.Rmd b/slides/11_Function_operators.Rmd diff --git a/12_Base_types.Rmd b/slides/12_Base_types.Rmd diff --git a/slides/13_S3.Rmd b/slides/13_S3.Rmd @@ -0,0 +1,417 @@ +# S3 + +## Introcudion + +## Basics + +- Has class +- Uses a generic function to decide on method + - method = implementation for a specific class + - dispatch = process of searching for right method + +## Classes + +**Theory:** + +What is class? + + - No formal definition in S3 + - Simply set class attribute + +How to set class? + + - At time of object creation + - After object creation + +```{r} +# at time of object creation +x <- structure(list(), class = "my_class") + +# after object creation +x <- list() +class(x) <- "my_class" +``` + +Some advice on style: + + - Rules: Can be any string + - Advice: Consider using/including package name to avoid collision with name of another class (e.g., `blob`, which defines a single class; haven has `labelled` and `haven_labelled`) + - Convention: letters and `_`; avoid `.` since it might be confused as separator between generic and class name + +**Practice:** + +How to compose a class in practice? + +- **Constructor**, which helps the developer create new object of target class. Provide always. +- **Validator**, which checks that values in constructor are valid. May not be necessary for simple classes. +- **Helper**, which helps users create new objects of target class. May be relevant only for user-facing classes. + +### Constructors + +Help developers construct an object of the target class: + +```{r} +new_difftime <- function(x = double(), units = "secs") { + # check inputs + # issue generic system error if unexpected type or value + stopifnot(is.double(x)) + units <- match.arg(units, c("secs", "mins", "hours", "days", "weeks")) + + # construct instance of target class + structure(x, + class = "difftime", + units = units + ) +} +``` + +### Validators + +Contrast a constructor, aimed at quickly creating instances of a class, which only checks type of inputs ... + +```{r, error = TRUE} +new_factor <- function(x = integer(), levels = character()) { + stopifnot(is.integer(x)) + stopifnot(is.character(levels)) + + structure( + x, + levels = levels, + class = "factor" + ) +} + +# error messages are for system default and developer-facing +new_factor(1:5, "a") +``` + + +... with a validator, aimed at emitting errors if inputs pose problems, which makes more expensive checks + +```{r, error = TRUE} +validate_factor <- function(x) { + values <- unclass(x) + levels <- attr(x, "levels") + + if (!all(!is.na(values) & values > 0)) { + stop( + "All `x` values must be non-missing and greater than zero", + call. = FALSE + ) + } + + if (length(levels) < max(values)) { + stop( + "There must be at least as many `levels` as possible values in `x`", + call. = FALSE + ) + } + + x +} + +# error messages are informative and user-facing +validate_factor(new_factor(1:5, "a")) +``` + +Maybe there is a typo in the `validate_factor()` function? Do the integers need to start at 1 and be consecutive? + +* If not, then `length(levels) < max(values)` should be `length(levels) < length(values)`, right? +* If so, why do the integers need to start at 1 and be consecutive? And if they need to be as such, we should tell the user, right? + +```{r, error = TRUE} +validate_factor(new_factor(1:3, levels = c("a", "b", "c"))) +validate_factor(new_factor(10:12, levels = c("a", "b", "c"))) +``` + + +### Helpers + +Some desired virtues: + +- Have the same name as the class +- Call the constructor and validator, if the latter exists. +- Issue error informative, user-facing error messages +- Adopt thoughtful/useful defaults or type conversion + + +Exercise 5 in 13.3.4 + +Q: Read the documentation for `utils::as.roman()`. How would you write a constructor for this class? Does it need a validator? What might a helper do? + +A: This function transforms numeric input into Roman numbers. It is built on the integer type, which results in the following constructor. + + +```{r} +new_roman <- function(x = integer()) { + stopifnot(is.integer(x)) + structure(x, class = "roman") +} +``` + +The documentation tells us, that only values between 1 and 3899 are uniquely represented, which we then include in our validation function. + +```{r} +validate_roman <- function(x) { + values <- unclass(x) + + if (any(values < 1 | values > 3899)) { + stop( + "Roman numbers must fall between 1 and 3899.", + call. = FALSE + ) + } + x +} +``` + +For convenience, we allow the user to also pass real values to a helper function. + +```{r, error = TRUE} +roman <- function(x = integer()) { + x <- as.integer(x) + + validate_roman(new_roman(x)) +} + +# Test +roman(c(1, 753, 2024)) + +roman(0) +``` + + + +## Generics and methods + +**Generic functions:** + +- Consist of a call to `UseMethod()` +- Pass arguments from the generic to the dispatched method "auto-magically" + +```{r} +my_new_generic <- function(x) { + UseMethod("my_new_generic") +} +``` + +### Method dispatch + +- `UseMethod()` creates a vector of method names +- Dispatch + - Examines all methods in the vector + - Selects a method + +```{r} +x <- Sys.Date() +sloop::s3_dispatch(print(x)) +``` + +### Finding methods + +While `sloop::s3_dispatch()` gives the specific method selected for a specific call, on can see the methods defined: + +- For a generic +```{r} +sloop::s3_methods_generic("mean") +``` +- For a class +```{r} +sloop::s3_methods_class("ordered") +``` + +### Creating methods + +Two rules: + +- Only write a method if you own the generic. Otherwise, bad manners. +- Method must have same arguments as its generic--with one important exception: `...` + +**Example from text:** + +I thought it would be good for us to work through this problem. + +> Carefully read the documentation for `UseMethod()` and explain why the following code returns the results that it does. What two usual rules of function evaluation does `UseMethod()` violate? + +```{r} +g <- function(x) { + x <- 10 + y <- 10 + UseMethod("g") +} +g.default <- function(x) c(x = x, y = y) + +x <- 1 +y <- 1 +g(x) +g.default(x) +``` + + + +**Examples caught in the wild:** + +- [`haven::zap_label`](https://github.com/tidyverse/haven/blob/main/R/zap_label.R), which removes column labels +- [`dplyr::mutate`](https://github.com/tidyverse/dplyr/blob/main/R/mutate.R) +- [`tidyr::pivot_longer`](https://github.com/tidyverse/tidyr/blob/main/R/pivot-long.R) + +## Object styles + +## Inheritance + +Three ideas: + +1. Class is a vector of classes +```{r} +class(ordered("x")) +class(Sys.time()) +``` +2. Dispatch moves through class vector until it finds a defined method +```{r} +sloop::s3_dispatch(print(ordered("x"))) +``` +3. Method can delegate to another method via `NextMethod()`, which is indicated by `->` as below: +```{r} +sloop::s3_dispatch(ordered("x")[1]) +``` + +### `NextMethod()` + +Consider `secret` class that masks each character of the input with `x` in output + +```{r} +new_secret <- function(x = double()) { + stopifnot(is.double(x)) + structure(x, class = "secret") +} + +print.secret <- function(x, ...) { + print(strrep("x", nchar(x))) + invisible(x) +} + +y <- new_secret(c(15, 1, 456)) +y +``` + +Notice that the `[` method is problematic in that it does not preserve the `secret` class. Additionally, it returns `15` as the first element instead of `xx`. + +```{r} +sloop::s3_dispatch(y[1]) +y[1] +``` + +Fix this with a `[.secret` method: + +The first fix (not run) is inefficient because it creates a copy of `y`. + +```{r eval = FALSE} +# not run +`[.secret` <- function(x, i) { + x <- unclass(x) + new_secret(x[i]) +} +``` + +`NextMethod()` is more efficient. + +```{r} +`[.secret` <- function(x, i) { + # first, dispatch to `[` + # then, coerce subset value to `secret` class + new_secret(NextMethod()) +} +``` + +Notice that `[.secret` is selected for dispatch, but that the method delegates to the internal `[`. + +```{r} +sloop::s3_dispatch(y[1]) +y[1] +``` + + +### Allowing subclassing + +Continue the example above to have a `supersecret` subclass that hides even the number of characters in the input (e.g., `123` -> `xxxxx`, 12345678 -> `xxxxx`, 1 -> `xxxxx`). + +To allow for this subclass, the constructor function needs to include two additional arguments: + +- `...` for passing an arbitrary set of arguments to different subclasses +- `class` for defining the subclass + +```{r} +new_secret <- function(x, ..., class = character()) { + stopifnot(is.double(x)) + + structure( + x, + ..., + class = c(class, "secret") + ) +} +``` + +To create the subclass, simply invoke the parent class constructor inside of the subclass constructor: + +```{r} +new_supersecret <- function(x) { + new_secret(x, class = "supersecret") +} + +print.supersecret <- function(x, ...) { + print(rep("xxxxx", length(x))) + invisible(x) +} +``` + +But this means the subclass inherits all parent methods and needs to overwrite all parent methods with subclass methods that return the sublclass rather than the parent class. + +There's no easy solution to this problem in base R. + +There is a solution in the vectors package: `vctrs::vec_restore()` + +<!-- TODO: read docs/vignettes to be able to summarize how this works --> + +## Meeting Videos + +### Cohort 1 + +`r knitr::include_url("https://www.youtube.com/embed/Fy3JF5Em6qY")` + +### Cohort 2 + +`r knitr::include_url("https://www.youtube.com/embed/9GkgNC15EAw")` + +### Cohort 3 + +`r knitr::include_url("https://www.youtube.com/embed/q7lFXSLdC1g")` + +`r knitr::include_url("https://www.youtube.com/embed/2rHS_urTGFg")` + +### Cohort 4 + +`r knitr::include_url("https://www.youtube.com/embed/4la5adcWwKE")` + +`r knitr::include_url("https://www.youtube.com/embed/eTCT2O58GYM")` + +### Cohort 5 + +`r knitr::include_url("https://www.youtube.com/embed/NeHtEGab1Og")` + +### Cohort 6 + +`r knitr::include_url("https://www.youtube.com/embed/vzbl2o-MEeQ")` + +<details> +<summary> Meeting chat log </summary> + +``` +00:05:30 Oluwafemi Oyedele: Hi everyone, Good Evening !!! +00:09:44 Trevin: I agree Arthur, need to look at that package some more +``` +</details> + +### Cohort 7 + +`r knitr::include_url("https://www.youtube.com/embed/zNLx4q8TCKQ")` diff --git a/14_R6.Rmd b/slides/14_R6.Rmd diff --git a/slides/15_S4.Rmd b/slides/15_S4.Rmd @@ -0,0 +1,360 @@ +# S4 + +## Introduction + +Object consists of: + +- Slots. Like fields in R6. +- Methods. Accessed through generics. Dispatched to particular methods. + +Uses functions to define classes and their methods: + +- `setClass()`. Define class and its components. +- `setGenerics()`. Define generic functions. Used to dispatch. +- `setMethods()`. Define methods + +## Basics overview + +### Set class + +Define the class: + +```{r} +setClass("Person", + slots = c( + name = "character", + age = "numeric" + ) +) +``` + +Create an instance of the class + +```{r} +john <- new("Person", name = "John Smith", age = NA_real_) +``` + + +### Set generics + +Define generic functions for setting and getting the age slot + +```{r} +# get the value +setGeneric("age", function(x) standardGeneric("age")) +# set the value +setGeneric("age<-", function(x, value) standardGeneric("age<-")) +``` + +### Set methods + +Define methods for the generics: + +```{r} +# get the value +setMethod("age", "Person", function(x) x@age) +# set the value +setMethod("age<-", "Person", function(x, value) { + x@age <- value + x +}) + +# set the value +age(john) <- 50 +# get the value +age(john) +``` + +To give a flavor, there is only one method per slot. In more realistic cases, there might be several methods. + +## Details on defining the class + +### Inheritance + +```{r, eval=FALSE} +setClass("Employee", + contains = "Person", + slots = c( + boss = "Person" + ), + prototype = list( + boss = new("Person") + ) +) +``` + +### Instantiation + +Create an instance of the class at two levels: + +- For developer (you): `methods::new()` +- For user: constructor function + +```{r} +# how user constructs an instance +Person <- function(name, age = NA) { + age <- as.double(age) + + # how the developer constructs an instance + new("Person", name = name, age = age) +} + +Person("Someone") +``` + +### Validation + +S4 objects + +- Check class of slot at creation +```{r, error = TRUE} +Person(mtcars) +``` + +- Do **not** check other things +```{r} +Person("Hadley", age = c(30, 37)) +``` + +That's where validation comes in--at two stages: + +1. At creation +2. At modification + +#### At creation + +```{r, error = TRUE} +setValidity("Person", function(object) { + if (length(object@name) != length(object@age)) { + "@name and @age must be same length" + } else { + TRUE + } +}) + +Person("Hadley", age = c(30, 37)) +``` + +#### At modification + +```{r, error = TRUE} +# get value +setGeneric("name", function(x) standardGeneric("name")) +setMethod("name", "Person", function(x) x@name) + +# set value--and assess whether resulting object is valid +setGeneric("name<-", function(x, value) standardGeneric("name<-")) +setMethod("name<-", "Person", function(x, value) { + x@name <- value + validObject(x) + x +}) + +# normal name; no problem +name(john) <- "Jon Smythe" +name(john) + +# invalid name; error thrown +name(john) <- letters +``` + + +## Details on generics and methods + +### Dictate dispatch via signature + +Specify function arguments to be used in determining method. + +```{r} +setGeneric("myGeneric", + function(x, ..., verbose = TRUE) standardGeneric("myGeneric"), + signature = "x" +) +``` + +### Define generics + +General form: + +```{r, eval=FALSE} +setMethod("myGeneric", "Person", function(x) { + # method implementation +}) +``` + +Example to print object: + +```{r} +setMethod("show", "Person", function(object) { + cat(is(object)[[1]], "\n", + " Name: ", object@name, "\n", + " Age: ", object@age, "\n", + sep = "" + ) +}) +john +``` + +Example to access slot: + +```{r} +setGeneric("name", function(x) standardGeneric("name")) +setMethod("name", "Person", function(x) x@name) + +name(john) +``` + + +This is how end users should access slots. + +## Example: `lubridate::period()` + +### Define the class + +```{r, eval=FALSE} +setClass("Period", + # inherits from these classes + contains = c("Timespan", "numeric"), + # has slots for time components + slots = c( + year = "numeric", + month = "numeric", + day = "numeric", + hour = "numeric", + minute = "numeric" + ), + # defines prototype as period of zero duration for all slots + prototype = prototype(year = 0, month = 0, day = 0, hour = 0, minute = 0), + # check validity with `check_period` function; see section below + validity = check_period +) +``` + +See source code [here](https://github.com/tidyverse/lubridate/blob/0bb49b21c88736240219dc67e7ed0eb3df15d9b1/R/periods.r#L90) + +### Validate object + +Check whether object is valid--notably if all arugments have the same length and are integers. + +```{r, eval=FALSE} +check_period <- function(object) { + # start with an empty vector of error messages + errors <- character() + + # check length of object's data + length(object@.Data) -> n + # check length of each slot + lengths <- c( + length(object@year), + length(object@month), + length(object@day), + length(object@hour), + length(object@minute) + ) + + # if length of any slot is different than overall length, compose error message + if (any(lengths != n)) { + msg <- paste("Inconsistent lengths: year = ", lengths[1], + ", month = ", lengths[2], + ", day = ", lengths[3], + ", hour = ", lengths[4], + ", minute = ", lengths[5], + ", second = ", n, + sep = "" + ) + # add just-composed error to vector of error messages + errors <- c(errors, msg) + } + + values <- c(object@year, object@month, object@day, object@hour, object@minute) + values <- na.omit(values) + if (sum(values - trunc(values))) { + msg <- "periods must have integer values" + errors <- c(errors, msg) + } + + if (length(errors) == 0) { + TRUE + } else { + errors + } +} +``` + + +See source code [here](https://github.com/tidyverse/lubridate/blob/0bb49b21c88736240219dc67e7ed0eb3df15d9b1/R/periods.r#L6). + +### Set methods + +Show period: + +```{r, eval=FALSE} +#' @export +setMethod("show", signature(object = "Period"), function(object) { + if (length(object@.Data) == 0) { + cat("<Period[0]>\n") + } else { + print(format(object)) + } +}) + +#' @export +format.Period <- function(x, ...) { + if (length(x) == 0) { + return(character()) + } + + show <- paste( + x@year, "y ", x@month, "m ", x@day, "d ", + x@hour, "H ", x@minute, "M ", x@.Data, "S", + sep = "" + ) + start <- regexpr("[-1-9]|(0\\.)", show) + show <- ifelse(start > 0, substr(show, start, nchar(show)), "0S") + + show[is.na(x)] <- NA + show +} +``` + +See source code [here](https://github.com/tidyverse/lubridate/blob/0bb49b21c88736240219dc67e7ed0eb3df15d9b1/R/periods.r#L195) + +## Meeting Videos + +### Cohort 1 + +`r knitr::include_url("https://www.youtube.com/embed/a1jzpWiksyA")` + +### Cohort 2 + +`r knitr::include_url("https://www.youtube.com/embed/bzo37PHCM1I")` + +### Cohort 3 + +`r knitr::include_url("https://www.youtube.com/embed/WWnJ5Cl-aTE")` + +`r knitr::include_url("https://www.youtube.com/embed/_byYFTQHp1Y")` + +### Cohort 4 + +`r knitr::include_url("https://www.youtube.com/embed/M8Poajmj-HU")` + +### Cohort 5 + +`r knitr::include_url("https://www.youtube.com/embed/unNfE1fDFEY")` + +### Cohort 6 + +`r knitr::include_url("https://www.youtube.com/embed/q1-QUFJsbLA")` + +### Cohort 7 + +`r knitr::include_url("https://www.youtube.com/embed/puvaJtv9gQw")` + +<details> + +<summary>Meeting chat log</summary> +``` +01:09:37 Ron Legere: https://en.wikipedia.org/wiki/Composition_over_inheritance +``` +</details> diff --git a/16_Trade-offs.Rmd b/slides/16_Trade-offs.Rmd diff --git a/17_Big_picture.Rmd b/slides/17_Big_picture.Rmd diff --git a/18_Expressions.Rmd b/slides/18_Expressions.Rmd diff --git a/19_Quasiquotation.Rmd b/slides/19_Quasiquotation.Rmd diff --git a/20_Evaluation.Rmd b/slides/20_Evaluation.Rmd diff --git a/21_Translating_R_code.Rmd b/slides/21_Translating_R_code.Rmd diff --git a/22_Debugging.Rmd b/slides/22_Debugging.Rmd diff --git a/23_Measuring_performance.Rmd b/slides/23_Measuring_performance.Rmd diff --git a/slides/24_Improving_performance.Rmd b/slides/24_Improving_performance.Rmd @@ -0,0 +1,290 @@ +# Improving performance + + +## Overview + +1. Code organization +2. Check for existing solutions +3. Do as little as possible +4. Vectorise +5. Avoid Copies + +## Organizing code + +- Write a function for each approach +```{r} +mean1 <- function(x) mean(x) +mean2 <- function(x) sum(x) / length(x) +``` +- Keep old functions that you've tried, even the failures +- Generate a representative test case +```{r} +x <- runif(1e5) +``` +- Use `bench::mark` to compare the different versions (and include unit tests) +```{r} +bench::mark( + mean1(x), + mean2(x) +)[c("expression", "min", "median", "itr/sec", "n_gc")] +``` + +## Check for Existing Solution +- CRAN task views (http://cran.rstudio.com/web/views/) +- Reverse dependencies of Rcpp (https://cran.r-project.org/web/packages/Rcpp/) +- Talk to others! + - Google (rseek) + - Stackoverflow ([R]) + - https://community.rstudio.com/ + - DSLC community + +## Do as little as possible +- use a function tailored to a more specific type of input or output, or to a more specific problem + - `rowSums()`, `colSums()`, `rowMeans()`, and `colMeans()` are faster than equivalent invocations that use `apply()` because they are vectorised + - `vapply()` is faster than `sapply()` because it pre-specifies the output type + - `any(x == 10)` is much faster than `10 %in% x` because testing equality is simpler than testing set inclusion +- Some functions coerce their inputs into a specific type. If your input is not the right type, the function has to do extra work + - e.g. `apply()` will always turn a dataframe into a matrix +- Other examples + - `read.csv()`: specify known column types with `colClasses`. (Also consider + switching to `readr::read_csv()` or `data.table::fread()` which are + considerably faster than `read.csv()`.) + + - `factor()`: specify known levels with `levels`. + + - `cut()`: don't generate labels with `labels = FALSE` if you don't need them, + or, even better, use `findInterval()` as mentioned in the "see also" section + of the documentation. + + - `unlist(x, use.names = FALSE)` is much faster than `unlist(x)`. + + - `interaction()`: if you only need combinations that exist in the data, use + `drop = TRUE`. + +## Avoiding Method Dispatch +```{r} +x <- runif(1e2) +bench::mark( + mean(x), + mean.default(x) +)[c("expression", "min", "median", "itr/sec", "n_gc")] +``` + +```{r} +x <- runif(1e2) +bench::mark( + mean(x), + mean.default(x), + .Internal(mean(x)) +)[c("expression", "min", "median", "itr/sec", "n_gc")] +``` + +```{r} +x <- runif(1e4) +bench::mark( + mean(x), + mean.default(x), + .Internal(mean(x)) +)[c("expression", "min", "median", "itr/sec", "n_gc")] +``` + +## Avoiding Input Coercion +- `as.data.frame()` is quite slow because it coerces each element into a data frame and then `rbind()`s them together +- instead, if you have a named list with vectors of equal length, you can directly transform it into a data frame + +```{r} +quickdf <- function(l) { + class(l) <- "data.frame" + attr(l, "row.names") <- .set_row_names(length(l[[1]])) + l +} +l <- lapply(1:26, function(i) runif(1e3)) +names(l) <- letters +bench::mark( + as.data.frame = as.data.frame(l), + quick_df = quickdf(l) +)[c("expression", "min", "median", "itr/sec", "n_gc")] +``` + +*Caveat!* This method is fast because it's dangerous! + +## Vectorise +- vectorisation means finding the existing R function that is implemented in C and most closely applies to your problem +- Vectorised functions that apply to many scenarios + - `rowSums()`, `colSums()`, `rowMeans()`, and `colMeans()` + - Vectorised subsetting can lead to big improvements in speed + - `cut()` and `findInterval()` for converting continuous variables to categorical + - Be aware of vectorised functions like `cumsum()` and `diff()` + - Matrix algebra is a general example of vectorisation + +## Avoiding copies + +- Whenever you use c(), append(), cbind(), rbind(), or paste() to create a bigger object, R must first allocate space for the new object and then copy the old object to its new home. + +```{r} +random_string <- function() { + paste(sample(letters, 50, replace = TRUE), collapse = "") +} +strings10 <- replicate(10, random_string()) +strings100 <- replicate(100, random_string()) +collapse <- function(xs) { + out <- "" + for (x in xs) { + out <- paste0(out, x) + } + out +} +bench::mark( + loop10 = collapse(strings10), + loop100 = collapse(strings100), + vec10 = paste(strings10, collapse = ""), + vec100 = paste(strings100, collapse = ""), + check = FALSE +)[c("expression", "min", "median", "itr/sec", "n_gc")] +``` + +## Case study: t-test + +```{r} +m <- 1000 +n <- 50 +X <- matrix(rnorm(m * n, mean = 10, sd = 3), nrow = m) +grp <- rep(1:2, each = n / 2) +``` + +```{r} +# formula interface +system.time( + for (i in 1:m) { + t.test(X[i, ] ~ grp)$statistic + } +) +# provide two vectors +system.time( + for (i in 1:m) { + t.test(X[i, grp == 1], X[i, grp == 2])$statistic + } +) +``` + +Add functionality to save values + +```{r} +compT <- function(i){ + t.test(X[i, grp == 1], X[i, grp == 2])$statistic +} +system.time(t1 <- purrr::map_dbl(1:m, compT)) +``` + +If you look at the source code of `stats:::t.test.default()`, you’ll see that it does a lot more than just compute the t-statistic. + +```{r} +# Do less work +my_t <- function(x, grp) { + t_stat <- function(x) { + m <- mean(x) + n <- length(x) + var <- sum((x - m) ^ 2) / (n - 1) + list(m = m, n = n, var = var) + } + g1 <- t_stat(x[grp == 1]) + g2 <- t_stat(x[grp == 2]) + se_total <- sqrt(g1$var / g1$n + g2$var / g2$n) + (g1$m - g2$m) / se_total +} +system.time(t2 <- purrr::map_dbl(1:m, ~ my_t(X[.,], grp))) +stopifnot(all.equal(t1, t2)) +``` + +This gives us a six-fold speed improvement! + +```{r} +# Vectorise it +rowtstat <- function(X, grp){ + t_stat <- function(X) { + m <- rowMeans(X) + n <- ncol(X) + var <- rowSums((X - m) ^ 2) / (n - 1) + list(m = m, n = n, var = var) + } + g1 <- t_stat(X[, grp == 1]) + g2 <- t_stat(X[, grp == 2]) + se_total <- sqrt(g1$var / g1$n + g2$var / g2$n) + (g1$m - g2$m) / se_total +} +system.time(t3 <- rowtstat(X, grp)) +stopifnot(all.equal(t1, t3)) +``` + +1000 times faster than when we started! + +## Other techniques +* [Read R blogs](http://www.r-bloggers.com/) to see what performance + problems other people have struggled with, and how they have made their + code faster. + +* Read other R programming books, like The Art of R Programming or Patrick Burns' + [_R Inferno_](http://www.burns-stat.com/documents/books/the-r-inferno/) to + learn about common traps. + +* Take an algorithms and data structure course to learn some + well known ways of tackling certain classes of problems. I have heard + good things about Princeton's + [Algorithms course](https://www.coursera.org/course/algs4partI) offered on + Coursera. + +* Learn how to parallelise your code. Two places to start are + Parallel R and Parallel Computing for Data Science + +* Read general books about optimisation like Mature optimisation + or the Pragmatic Programmer + +* Read more R code. StackOverflow, R Mailing List, DSLC, GitHub, etc. + +## Meeting Videos + +### Cohort 1 + +(no video) + +### Cohort 2 + +`r knitr::include_url("https://www.youtube.com/embed/fSdAqlkeq6I")` + +### Cohort 3 + +`r knitr::include_url("https://www.youtube.com/embed/yCkvUcT7wW8")` + +### Cohort 4 + +`r knitr::include_url("https://www.youtube.com/embed/LCaqvuv3JNg")` + +### Cohort 5 + +`r knitr::include_url("https://www.youtube.com/embed/pOaiDK7J7EE")` + +### Cohort 6 + +`r knitr::include_url("https://www.youtube.com/embed/UaXimKd3vg8")` + +<details> +<summary> Meeting chat log </summary> + +``` +00:24:42 Arthur Shaw: I wonder if there's a task view for R Universe: https://r-universe.dev/search/ +01:01:13 Arthur Shaw: https://www.alexejgossmann.com/benchmarking_r/ +01:04:34 Trevin: I agree that the chapter is a good jumping off point. Gonna have to dig into some of the listed resources 😄 +``` +</details> + +### Cohort 7 + +`r knitr::include_url("https://www.youtube.com/embed/rOkrHvN8Uqg")` + +<details> + +<summary>Meeting chat log</summary> +``` +00:23:48 Ron Legere: https://www.mathworks.com/help/matlab/matlab_prog/vectorization.html +``` +</details> diff --git a/25_Rewriting_R_code_in_C++.Rmd b/slides/25_Rewriting_R_code_in_C++.Rmd diff --git a/slides/_metadata.yml b/slides/_metadata.yml @@ -0,0 +1,16 @@ +format: + revealjs: + theme: [dark, custom.scss] + footer: <a href="/" target="advr_club-index">DSLC.io/advr</a> | <a href="https://DSLC.io" target="_blank">DSLC.io</a> + link-external-newwindow: true + transition: slide + incremental: false +execute: + eval: true + r: + echo: true + mermaid: + echo: false +knitr: + opts_chunk: + comment: "#>" diff --git a/slides/custom.scss b/slides/custom.scss @@ -0,0 +1,31 @@ +/*-- scss:defaults --*/ + +/*-- Fonts --*/ +$presentation-font-size-root: 36px; + +/*-- Headings --*/ +$presentation-h1-font-size: 2em; + +/*-- new vars --*/ +$highlight-font-color: #A0E9FF; + +/*-- scss:rules --*/ + +.reveal .footer { + font-size: .6em !important; +} + +.reveal .slide strong { + font-weight: 900; + color: $highlight-font-color; +} + +.reveal .slide h3 { + color: $highlight-font-color; +} + +/*-- Hide +.reveal .footer { + font-size: 30px !important; +} +--*/ diff --git a/images/01-hadley-image1.jpeg b/slides/images/01-hadley-image1.jpeg Binary files differ. diff --git a/images/01-hadley-image2.jpeg b/slides/images/01-hadley-image2.jpeg Binary files differ. diff --git a/images/02-character-2.png b/slides/images/02-character-2.png Binary files differ. diff --git a/images/02-copy_on_modify_fig2.png b/slides/images/02-copy_on_modify_fig2.png Binary files differ. diff --git a/images/02-l-modify-2.png b/slides/images/02-l-modify-2.png Binary files differ. diff --git a/images/02-trace.png b/slides/images/02-trace.png Binary files differ. diff --git a/images/06_forms.png b/slides/images/06_forms.png Binary files differ. diff --git a/images/06_functions.png b/slides/images/06_functions.png Binary files differ. diff --git a/images/10-1-factories.png b/slides/images/10-1-factories.png Binary files differ. diff --git a/images/10-2-how.jpg b/slides/images/10-2-how.jpg Binary files differ. diff --git a/images/10-3-procedure.png b/slides/images/10-3-procedure.png Binary files differ. diff --git a/images/11-function_operators.png b/slides/images/11-function_operators.png Binary files differ. diff --git a/images/11-maths_example.png b/slides/images/11-maths_example.png Binary files differ. diff --git a/images/14-four-pillars.png b/slides/images/14-four-pillars.png Binary files differ. diff --git a/images/14-r6-logo.png b/slides/images/14-r6-logo.png Binary files differ. diff --git a/images/14-r6_active_field.png b/slides/images/14-r6_active_field.png Binary files differ. diff --git a/images/14-r6_environment.png b/slides/images/14-r6_environment.png Binary files differ. diff --git a/images/16-objects.png b/slides/images/16-objects.png Binary files differ. diff --git a/images/16-oop.png b/slides/images/16-oop.png Binary files differ. diff --git a/images/16-trade-offs.png b/slides/images/16-trade-offs.png Binary files differ. diff --git a/images/23_code_faster.jpeg b/slides/images/23_code_faster.jpeg Binary files differ. diff --git a/images/23_microbenchmarking.jpeg b/slides/images/23_microbenchmarking.jpeg Binary files differ. diff --git a/images/9_2_3_map-arg.png b/slides/images/9_2_3_map-arg.png Binary files differ. diff --git a/images/9_5_1-reduce.png b/slides/images/9_5_1-reduce.png Binary files differ. diff --git a/images/ambig-order.png b/slides/images/ambig-order.png Binary files differ. diff --git a/images/base_types/base_types_Sankey_graph.png b/slides/images/base_types/base_types_Sankey_graph.png Binary files differ. diff --git a/images/base_types/john_chambers_about_objects.png b/slides/images/base_types/john_chambers_about_objects.png Binary files differ. diff --git a/images/base_types/sloop_john_b.png b/slides/images/base_types/sloop_john_b.png Binary files differ. diff --git a/images/base_types/standards.png b/slides/images/base_types/standards.png Binary files differ. diff --git a/images/browse.png b/slides/images/browse.png Binary files differ. diff --git a/images/browser.png b/slides/images/browser.png Binary files differ. diff --git a/images/browser2.png b/slides/images/browser2.png Binary files differ. diff --git a/images/call-call.png b/slides/images/call-call.png Binary files differ. diff --git a/images/case_study.jpg b/slides/images/case_study.jpg Binary files differ. diff --git a/images/complicated.png b/slides/images/complicated.png Binary files differ. diff --git a/images/debug-toolbar.png b/slides/images/debug-toolbar.png Binary files differ. diff --git a/images/fa.png b/slides/images/fa.png Binary files differ. diff --git a/images/forloop.png b/slides/images/forloop.png Binary files differ. diff --git a/images/lazy-evaluation.png b/slides/images/lazy-evaluation.png Binary files differ. diff --git a/images/locating-errors.png b/slides/images/locating-errors.png Binary files differ. diff --git a/images/map_variants.png b/slides/images/map_variants.png Binary files differ. diff --git a/images/non-interractive-debugging.png b/slides/images/non-interractive-debugging.png Binary files differ. diff --git a/images/options.png b/slides/images/options.png Binary files differ. diff --git a/images/pmap.png b/slides/images/pmap.png Binary files differ. diff --git a/images/prefix.png b/slides/images/prefix.png Binary files differ. diff --git a/images/print-debug.png b/slides/images/print-debug.png Binary files differ. diff --git a/images/print-debugging.png b/slides/images/print-debugging.png Binary files differ. diff --git a/images/print-recover.png b/slides/images/print-recover.png Binary files differ. diff --git a/images/recover.png b/slides/images/recover.png Binary files differ. diff --git a/images/reduce-init.png b/slides/images/reduce-init.png Binary files differ. diff --git a/images/reduce2-init.png b/slides/images/reduce2-init.png Binary files differ. diff --git a/images/show-traceback.png b/slides/images/show-traceback.png Binary files differ. diff --git a/images/simple.png b/slides/images/simple.png Binary files differ. diff --git a/images/subsetting/hadley-tweet.png b/slides/images/subsetting/hadley-tweet.png Binary files differ. diff --git a/images/subsetting/train-1.png b/slides/images/subsetting/train-1.png Binary files differ. diff --git a/images/subsetting/train-2.png b/slides/images/subsetting/train-2.png Binary files differ. diff --git a/images/subsetting/train-3.png b/slides/images/subsetting/train-3.png Binary files differ. diff --git a/images/trace-env.png b/slides/images/trace-env.png Binary files differ. diff --git a/images/traceback.png b/slides/images/traceback.png Binary files differ. diff --git a/images/translating/calculus_cat.png b/slides/images/translating/calculus_cat.png Binary files differ. diff --git a/images/translating/greek_letters.txt b/slides/images/translating/greek_letters.txt diff --git a/images/translating/tags.txt b/slides/images/translating/tags.txt diff --git a/images/translating/tags_r_venn.png b/slides/images/translating/tags_r_venn.png Binary files differ. diff --git a/images/vectors/atomic.png b/slides/images/vectors/atomic.png Binary files differ. diff --git a/images/vectors/attr-names-1.png b/slides/images/vectors/attr-names-1.png Binary files differ. diff --git a/images/vectors/attr-names-2.png b/slides/images/vectors/attr-names-2.png Binary files differ. diff --git a/images/vectors/attr.png b/slides/images/vectors/attr.png Binary files differ. diff --git a/images/vectors/bayes_rules_textbook.png b/slides/images/vectors/bayes_rules_textbook.png Binary files differ. diff --git a/images/vectors/culmen_depth.png b/slides/images/vectors/culmen_depth.png Binary files differ. diff --git a/images/vectors/lter_penguins.png b/slides/images/vectors/lter_penguins.png Binary files differ. diff --git a/images/vectors/lter_penguins_no_gentoo.png b/slides/images/vectors/lter_penguins_no_gentoo.png Binary files differ. diff --git a/images/vectors/summary-tree-atomic.png b/slides/images/vectors/summary-tree-atomic.png Binary files differ. diff --git a/images/vectors/summary-tree-s3-1.png b/slides/images/vectors/summary-tree-s3-1.png Binary files differ. diff --git a/images/vectors/summary-tree-s3-2.png b/slides/images/vectors/summary-tree-s3-2.png Binary files differ. diff --git a/images/vectors/summary-tree.png b/slides/images/vectors/summary-tree.png Binary files differ. diff --git a/images/vectors/surly_tibbles.png b/slides/images/vectors/surly_tibbles.png Binary files differ. diff --git a/images/view_expr.png b/slides/images/view_expr.png Binary files differ. diff --git a/images/view_expression.png b/slides/images/view_expression.png Binary files differ. diff --git a/images/walk.png b/slides/images/walk.png Binary files differ. diff --git a/images/walk2.png b/slides/images/walk2.png Binary files differ. diff --git a/images/we-did-it-celebration-meme.jpg b/slides/images/we-did-it-celebration-meme.jpg Binary files differ. diff --git a/images/whatif.png b/slides/images/whatif.png Binary files differ. diff --git a/images/whatif2.png b/slides/images/whatif2.png Binary files differ. diff --git a/images/with_abort.png b/slides/images/with_abort.png Binary files differ. diff --git a/scripts/16-example_accumulator_programming.R b/slides/scripts/16-example_accumulator_programming.R diff --git a/scripts/profiling-example.R b/slides/scripts/profiling-example.R diff --git a/styles.css b/styles.css diff --git a/videos/01.qmd b/videos/01.qmd @@ -0,0 +1,67 @@ +--- +title: Meetings +--- + +## Cohort 1 + +(no video recorded) + +## Cohort 2 + +{{< video https://www.youtube.com/embed/PCG52lU_YlA >}} + +## Cohort 3 + +{{< video https://www.youtube.com/embed/f6PuOnuZWBc >}} + +## Cohort 4 + +{{< video https://www.youtube.com/embed/qDaJvX-Mpls >}} + +## Cohort 5 + +{{< video https://www.youtube.com/embed/BvmiQlWOP5o >}} + +## Cohort 6 + +{{< video https://www.youtube.com/embed/dH72riiXrVI >}} + +<details> +<summary> Meeting chat log </summary> + +``` +00:14:40 SriRam: From Toronto, Civil Engineer. I use R for infrastructure planning/ GIS. Here coz of the ping 😄 , was not ready with a good computer with mic/audio ! +00:15:20 SriRam: I was with Ryan, Federica on other courses +00:23:21 SriRam: I think the only caution is about Copyright issues +00:31:32 Ryan Metcalf: Citation, giving credit back to source. Great comment SriRam. +00:34:33 SriRam: one = one, in my opinion +00:41:53 Ryan Metcalf: https://docs.google.com/spreadsheets/d/1_WFY82UxAdvP4GUdZ2luh15quwdO1n0Km3Q0tfYuqvc/edit#gid=0 +00:48:35 Arthur Shaw: The README has a nice step-by-step process at the bottom: https://github.com/r4ds/bookclub-advr#how-to-present. I've not done this myself yet, but it looks fairly straightforward. +00:54:13 lucus w: Thanks Ryan. Probably {usethis} will be easier. It looks straight forward +01:00:02 Moria W.: Thank you for sharing that. This has been good! +01:00:08 Vaibhav Janve: Thank you +01:00:44 Federica Gazzelloni: hi SriRam we are going.. +``` +</details> + +## Cohort 7 + +{{< video https://www.youtube.com/embed/vfTg6upHvO4 >}} +{{< video https://www.youtube.com/embed/3wRyE6-3OKQ >}} + +<details> + +<summary>Meeting chat log</summary> +``` +00:20:42 collinberke: https://rich-iannone.github.io/pointblank/ +00:27:36 Ryan Honomichl: brb +00:37:05 collinberke: https://rstudio.github.io/renv/articles/renv.html +00:51:52 Ryan Honomichl: gotta sign off I'll be ready to lead chapter 2 next week! +00:52:43 collinberke: https://r4ds.had.co.nz/iteration.html +00:59:44 collinberke: https://mastering-shiny.org/action-tidy.html +01:00:12 collinberke: https://dplyr.tidyverse.org/articles/programming.html +01:05:02 collinberke: https://usethis.r-lib.org/reference/create_from_github.html +01:05:53 collinberke: https://github.com/r4ds/bookclub-advr +01:06:28 Ron: I gotta run , fun conversation, and nice to meet you Matthew ! +``` +</details>