commit a5d14654fd8c6c674170529a4c974380e3018957
parent bd50c63e863e3817fdd7d9dee09192a8b081dafe
Author: DrEntropy <DrEntropy@users.noreply.github.com>
Date:   Mon, 12 Dec 2022 16:14:11 -0700
Chapter5 rjl (#41)
* Restructured notes,
* Clean up add exercises
* Adding figures back
* Remove flow chart
* minor edit
* Update deployment GHA.
Co-authored-by: Jon Harmon <jonthegeek@gmail.com>
Diffstat:
2 files changed, 164 insertions(+), 149 deletions(-)
diff --git a/.github/workflows/deploy_bookdown.yml b/.github/workflows/deploy_bookdown.yml
@@ -23,32 +23,27 @@ jobs:
 
       - name: Render Book
         run: Rscript -e 'bookdown::render_book("index.Rmd")'
-      - uses: actions/upload-artifact@v3
+      - uses: actions/upload-pages-artifact@v1
         with:
-          name: _book
           path: _book/
 
-# Need to first create an empty gh-pages branch
-# see https://pkgdown.r-lib.org/reference/deploy_site_github.html
-# and also add secrets for a GH_PAT and EMAIL to the repository
-# gh-action from Cecilapp/GitHub-Pages-deploy
-  checkout-and-deploy:
-   runs-on: ubuntu-latest
-   needs: bookdown
-   steps:
-     - name: Checkout
-       uses: actions/checkout@v3
-     - name: Download artifact
-       uses: actions/download-artifact@v3
-       with:
-         # Artifact name
-         name: _book # optional
-         # Destination path
-         path: _book # optional
-     - name: Deploy to GitHub Pages
-       uses: Cecilapp/GitHub-Pages-deploy@v3
-       env:
-         GITHUB_TOKEN: ${{ secrets.GH_PAT }}
-       with:
-         email: ${{ secrets.EMAIL }}
-         build_dir: _book/
+  deploy:
+    # Add a dependency to the build job
+    needs: bookdown
+
+    # Grant GITHUB_TOKEN the permissions required to make a Pages deployment
+    permissions:
+      pages: write      # to deploy to Pages
+      id-token: write   # to verify the deployment originates from an appropriate source
+
+    # Deploy to the github-pages environment
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+
+    # Specify runner + deployment step
+    runs-on: ubuntu-latest
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v1
diff --git a/05_Control_flow.Rmd b/05_Control_flow.Rmd
@@ -2,148 +2,119 @@
 
 **Learning objectives:**
 
-- What are the **tools** for controlling a flow
-- What is the difference between **choices (if, switch)** and **loops (for, while)**
-- How to use conditional tools in data analysis
+- Learn the **tools** for controlling the flow of execution.
 
----
-
-## Introduction
+- Learn some technical pitfalls and (perhaps lesser-known) useful features.
 
-In this chapter we will see how to use conditions for making data analysis.
-There are two main group of conditional tools: **choices** and **loops**. These are both very useful for making iterating data analysis such as multiple substitutions matching predefined inputs or performing more or less flexible indexing.
-
-```{r echo=FALSE,fig.align='center',fig.dim="100%"}
-knitr::include_graphics("images/whatif.png")
+```{r echo = FALSE, fig.align = 'left', out.width = '100%'}
+knitr::include_graphics("images/whatif2.png")
+```
+```{r echo = FALSE, fig.align = 'right', out.width = '100%'}
+knitr::include_graphics("images/forloop.png")
 ```
 
 ---
 
+## Introduction
+
+There are two main groups of flow control tools, **choices** and **loops**:
 
-## Choices and Loops
+- Choices (`if`, `switch`, `ifelse`, `dplyr::if_else`, `dplyr::case_when`) allow you to run different code depending on the input. 
+
+- Loops (`for`, `while`, `repeat`) allow you to run code repeatedly.
 
-Iterators of objects pointing to an element inside the container
 
-Use if to specify a block of code to be executed, if a specified condition is true. Use else to specify a block of code to be executed, if the same condition is false. Use else if to specify a new condition to test, if the first condition is false.
+---
 
-```{r echo=FALSE,fig.align='left',fig.dim="100%"}
-knitr::include_graphics("images/whatif2.png")
-```
-```{r echo=FALSE,fig.align='right',fig.dim="100%"}
-knitr::include_graphics("images/forloop.png")
-```
 
-----
+## Choices
 
 
-## Choises
 
-- `if()` and `ifelse()`
+`if()` and `else`
+
+Use `if` to specify a block of code to run when a condition is `TRUE`, and `else` to specify a block to run when that same condition is `FALSE`.
 
-    
-```{r 05-lib,include=FALSE}
-library(DiagrammeR)
-```
-<center>
-```{r echo=FALSE, fig.align='center', fig.dim="100%"}
-DiagrammeR("
-graph TD
-A{if}-->B(condition)
-     B-->C(true action)
-     
-D{if}-->E(condition)
-     E-->F(true action)
-E-->G((else))
-     G-->H(false action)
-     
-I{ifelse}-->L(condition)
-     L-->M(true action)
-     L-->N(false action)
-
-style A fill:#f96
-style D fill:#f96
-style I fill:#f96
-style G fill:#f96
-style B fill:#bbf,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
-style E fill:#bbf,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
-style L fill:#bbf,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
-
- ",height = '100%', width = '100%')
-```
-</center>
 
-`if (condition) true_action`
-<br>
-`if (condition) true_action else false_action`
 
 ```{r eval=FALSE, include=T}
-if (test_expression) {
-   yes
- }
-    
-    
 if (test_expression) {   
-  yes
-  } else if (other test_expression) {
-    no
-    } else {
-      other
-      }
-    
----  
-ifelse(test, yes, no)
+  true_action
+} else {
+  false_action
+}
 ```
 
-Note: **What is the difference?** 
+(Note: braces are only *needed* when a branch contains more than one expression.)
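+
+A minimal sketch of the note above (the variable names are arbitrary): because `if` returns the value of the branch it runs, a single-expression branch needs no braces and its result can be assigned directly. When the condition is `FALSE` and there is no `else`, `if` returns `NULL` invisibly.
+
+```{r}
+score <- 7
+
+# Single expressions need no braces; the chosen branch's value is returned
+size <- if (score > 5) "large" else "small"
+size
+
+# No else branch and a FALSE condition: the result is NULL
+nothing <- if (score > 100) "huge"
+is.null(nothing)
+```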
 
-`dplyr::if_else()` and base R `ifelse()` 
+The construct can be extended with `else if` to handle more alternatives:
 
->if_else is more strict. It checks that both alternatives are of the same type and otherwise throws an error, while ifelse will promote types as necessary. This may be a benefit in some circumstances, but may otherwise break scripts if you don't check for errors or explicitly force type conversion. 
-source: https://stackoverflow.com/questions/50646133/dplyr-if-else-vs-base-r-ifelse
+```{r, eval=FALSE}
+if (test_expression) {   
+  true_action
+} else if (other_test_expression) {
+  other_action
+} else {
+  false_action
+}
+```
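+
+A runnable version of the chain above, wrapped in a hypothetical `classify_temp()` function (the name and thresholds are made up for illustration). The conditions are checked from top to bottom and only the first branch whose condition is `TRUE` runs.
+
+```{r}
+# Hypothetical helper: label a single temperature in degrees Celsius
+classify_temp <- function(temp) {
+  if (temp < 0) {
+    "freezing"
+  } else if (temp < 20) {
+    "mild"      # only reached when temp >= 0
+  } else {
+    "hot"       # reached when both conditions above are FALSE
+  }
+}
+
+classify_temp(-5)
+classify_temp(15)
+classify_temp(30)
+```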
 
-**For example:**
+## Exercise {-}
+Why does this work?
+```
+x <- 1:10
+if (length(x)) "not empty" else "empty"
+#> [1] "not empty"
 
-```{r eval=FALSE, include=T}
-ifelse(c(TRUE,TRUE,FALSE),"a",3)
-dplyr::if_else(c(TRUE,TRUE,FALSE),"a",3)
+x <- numeric()
+if (length(x)) "not empty" else "empty"
+#> [1] "empty"
 ```
 
-It releases an error in `dplyr::if_else(c(TRUE, TRUE, FALSE), "a", 3)` :
+## Invalid inputs {-}
 
-`must be a character vector, not a double vector.`
+- The *condition* must evaluate to a *single* `TRUE` or `FALSE`; most other inputs produce an error.
 
+- The historical exception was a logical vector of length greater than 1, which only generated a warning unless the environment variable `_R_CHECK_LENGTH_1_CONDITION_` was set to turn it into an error. Since R 4.2.0 this is always an error:
 
+```{r, eval=FALSE}
+if (c(TRUE, FALSE)) 1
+#> Error in if (c(TRUE, FALSE)) 1 : the condition has length > 1
+```
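+
+The book lists other invalid conditions as well. This sketch checks three of them; each call is wrapped in a small hypothetical helper, `if_error()`, built on `tryCatch()` so the chunk runs to completion and returns the error message instead of stopping. The exact wording of the messages may differ between R versions and locales.
+
+```{r}
+# Hypothetical helper: evaluate `expr` and return its error message, if any
+if_error <- function(expr) {
+  tryCatch({
+    expr          # the promise is forced here, inside tryCatch()
+    "no error"
+  }, error = function(e) conditionMessage(e))
+}
+
+if_error(if ("x") 1)        # a string that cannot be read as logical
+if_error(if (logical()) 1)  # a zero-length condition
+if_error(if (NA) 1)         # a missing value
+```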
 
-An alternative to `ifelse` defined as **condition-vector pairs** and more broadly as a general vectorised `if` is:
-    
-- `case_when()` 
+## Vectorized choices {-}
 
-It allows you to vectorise multiple `if_else()` statements
+- `ifelse()` is a vectorized version of `if`:
 
+```{r, eval=FALSE}
+x <- 1:10
+ifelse(x %% 5 == 0, "XXX", as.character(x))
+#>  [1] "1"   "2"   "3"   "4"   "XXX" "6"   "7"   "8"   "9"   "XXX"
 
-<center>
-```{r echo=FALSE, fig.align='center', fig.dim="100%"}
-DiagrammeR("
-   graph TD
+ifelse(x %% 2 == 0, "even", "odd")
+#>  [1] "odd"  "even" "odd"  "even" "odd"  "even" "odd"  "even" "odd"  "even"
+```
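+
+One further property, sketched below: `ifelse()` propagates missing values, so an `NA` in the test vector yields an `NA` in the result rather than either branch.
+
+```{r}
+# The NA in the middle of the test vector is carried through to the output
+answers <- ifelse(c(TRUE, NA, FALSE), "yes", "no")
+answers
+```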
 
-I{ifelse}-->L(condition)
+- `dplyr::if_else()` is a stricter alternative to `ifelse()`.
 
-A{case_when}-->B(condition)
+- The book recommends using `ifelse()` "only when the yes and no vectors are the same type as it is otherwise hard to predict the output type."
 
-C{switch}-->D((list of conditions))
+- `dplyr::if_else()` enforces this recommendation.
 
-style I fill:#f27,stroke:#f66,stroke-width:3px
-style A fill:#f99,stroke:#f66,stroke-width:3px
-style C fill:#f96,stroke:#f66,stroke-width:3px
-style L fill:#bbf,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
-style B fill:#bbf,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
-style D fill:#bbf,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
+**For example:**
 
- ",height = '100%', width = '100%')
+```{r eval=FALSE, include=T}
+ifelse(c(TRUE, TRUE, FALSE), "a", 3)
+#> [1] "a" "a" "3"
+dplyr::if_else(c(TRUE, TRUE, FALSE), "a", 3)
+#> Error in `dplyr::if_else()`:
+#> ! `false` must be a character vector, not a double vector.
 ```
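+
+To see why the output type of `ifelse()` is "hard to predict", here is a base R sketch: the result type depends on which branches are actually used, and attributes such as the `Date` class are dropped because the result is built from the logical `test` vector.
+
+```{r}
+# Only the `yes` branch is used, so the result stays integer
+typeof(ifelse(TRUE, 1L, "a"))
+
+# Both branches are used, so the integer is coerced to character
+ifelse(c(TRUE, FALSE), 1L, "a")
+
+# The Date class is lost: the result is the underlying number of days
+d <- ifelse(TRUE, as.Date("2020-01-01"), as.Date("2021-01-01"))
+class(d)
+d
+```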
-</center>
+ 
+## Switch {-}
+
+Rather than stringing together long `if` / `else if` chains, you can often use `switch()`.
 
-Finally, the last tool just mentioned is `switch()`, here is an example on how to use it:
 
 ```{r message=FALSE, warning=FALSE}
 require(stats)
@@ -155,39 +126,42 @@ centre <- function(x, type) {
 }
 
 set.seed(123)
-x <- rcauchy(10)
+x <- rlnorm(100)
+
+centers <- data.frame(type = c('mean', 'median', 'trimmed'))
+centers$value <- sapply(centers$type, \(t) centre(x, t))
 
 require(ggplot2)
 ggplot(data = data.frame(x), aes(x))+
   geom_density()+
-  geom_vline(xintercept = c(centre(x, "mean"),
-                            centre(x, "median"),
-                            centre(x, "trimmed")),
-             size=0.5,linetype="dashed",
-             color=c("darkgreen","red","blue"))+
-  xlim(-10,10)+
+  geom_vline(data = centers,
+             mapping = aes(color = type, xintercept = value),
+             linewidth = 0.5, linetype = "dashed") +
+  xlim(-1, 10) +
   theme_bw()
 ```
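+
+The book's own `switch()` example, reproduced here as a sketch, shows two features the `centre()` example does not: when a branch is left empty, `switch()` falls through to the next non-empty value, and a final unnamed argument acts as the default (here it raises an error for unknown input).
+
+```{r}
+legs <- function(x) {
+  switch(x,
+    cow = ,
+    horse = ,
+    dog = 4,               # cow, horse and dog all fall through to 4
+    human = ,
+    chicken = 2,
+    plant = 0,
+    stop("Unknown input")  # unnamed last argument: the default
+  )
+}
+
+legs("cow")
+legs("chicken")
+```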
 
+## Using `dplyr::case_when` {-}
 
-## Loops
+- `case_when()` is a more general `if_else()` and can often be used in place of multiple chained `if_else()` calls or of `sapply()`-ing `switch()`.
 
-Iteration of a set of values with:
+- It uses a special syntax, `condition ~ value`, to allow any number of condition-vector pairs:
 
-- `for (var in seq) expr`
-- `while (cond) expr`
-- `repeat expr`
-- `break`
-- `next`
+```{r message=FALSE, warning=FALSE}
+centers <- data.frame(type = c('mean', 'median', 'trimmed'))
+centers$value <- dplyr::case_when(centers$type == 'mean' ~ mean(x),
+                                  centers$type == 'median' ~ median(x),
+                                  centers$type == 'trimmed' ~ mean(x, trim = 0.1))
+centers
+```
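+
+A FizzBuzz-style sketch of the fully vectorised form (the numbers and labels are arbitrary): conditions are tested in order, the first matching condition wins for each element, and a final `TRUE ~` pair catches everything else.
+
+```{r}
+n <- 1:15
+fizz <- dplyr::case_when(
+  n %% 15 == 0 ~ "fizz buzz",  # must come first, or "fizz" would match
+  n %% 3 == 0  ~ "fizz",
+  n %% 5 == 0  ~ "buzz",
+  TRUE         ~ as.character(n)  # default for all remaining elements
+)
+fizz
+```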
 
+ 
 
-`for (item in vector) perform_action`
+## Loops
 
-```{r eval=FALSE, include=T}
-? for (variable in vector) {
-  
-}
-```
+- Iteration over the elements of a vector:
+
+`for (item in vector) perform_action`
 
 **First example**
 ```{r}
@@ -198,6 +172,10 @@ for(i in 1:5) {
 
 
 **Second example**: terminate a *for loop* earlier
+
+- `next` skips the rest of the current iteration
+- `break` exits the loop entirely
+
 ```{r}
 for (i in 1:10) {
   if (i < 3) 
@@ -210,13 +188,55 @@ for (i in 1:10) {
 }
 ```
 
+## Exercise {-}
+
+When the following code is evaluated, what can you say about the vector being iterated?
+```
+xs <- c(1, 2, 3)
+for (x in xs) {
+  xs <- c(xs, x * 2)
+}
+xs
+#> [1] 1 2 3 2 4 6
+```
+
+## Pitfalls {-}
+
+- Preallocate output containers to avoid *slow* code. 
+
+- Beware that when `v` has length 0, `1:length(v)` evaluates to `1:0`, i.e. `c(1, 0)`, so the loop iterates backwards over indices 1 and 0; this is probably not what is intended. Use `seq_along(v)` instead.
+
+- When iterating over S3 vectors, loop over the indices and use `[[` yourself, because `for` strips attributes such as the class:
+
+```
+xs <- as.Date(c("2020-01-01", "2010-01-01"))
+for (x in xs) {
+  print(x)
+}
+#> [1] 18262
+#> [1] 14610
+```
+vs. 
+```
+for (i in seq_along(xs)) {
+  print(xs[[i]])
+}
+#> [1] "2020-01-01"
+#> [1] "2010-01-01"
+```
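+
+A base R sketch of the first two pitfalls, using a hypothetical `squares()` function: the output is preallocated with `vector()` and filled by index, and `seq_along()` gives zero iterations for an empty input instead of visiting indices 1 and 0.
+
+```{r}
+# Hypothetical helper: square each element of a numeric vector
+squares <- function(v) {
+  out <- vector("numeric", length(v))  # preallocate the output container
+  for (i in seq_along(v)) {            # zero iterations when v is empty
+    out[[i]] <- v[[i]]^2
+  }
+  out
+}
+
+squares(1:4)
+squares(numeric())
+
+# The pitfall: 1:length(v) counts down when v is empty
+v <- numeric()
+1:length(v)
+seq_along(v)
+```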
 
-**More tools**
+## Related tools {-}
 
 - `while(condition) action`: performs action while condition is TRUE.
 - `repeat(action)`: repeats action forever (i.e. until it encounters break).
 
->Generally speaking you shouldn’t need to use for loops for data analysis tasks, as map() and apply() already provide less flexible solutions to most problems. You’ll learn more in Chapter 9.
+- Note that any `for` loop can be rewritten as a `while` loop, and any `while` loop as a `repeat` loop (but not the other way around); *however*:
+
+>Good practice is to use the least-flexible solution to a problem, so you should use `for` wherever possible.
+
+- *But* you usually shouldn't need `for` loops for data analysis tasks at all, because `map()` and `apply()` already provide *less flexible* solutions to most problems (more in Chapter 9).
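+
+To make the `for` to `while` to `repeat` rewriting concrete, here is one sum computed three ways (a sketch; the variable names are arbitrary). Each step toward `repeat` needs more manual bookkeeping, which is why the less flexible form is preferred.
+
+```{r}
+v <- c(2, 4, 6)
+
+# for: the loop manages the index for us
+total_for <- 0
+for (i in seq_along(v)) total_for <- total_for + v[[i]]
+
+# while: we manage the index and the stopping condition
+total_while <- 0
+i <- 1
+while (i <= length(v)) {
+  total_while <- total_while + v[[i]]
+  i <- i + 1
+}
+
+# repeat: we must also break out of the loop explicitly
+total_repeat <- 0
+i <- 1
+repeat {
+  if (i > length(v)) break
+  total_repeat <- total_repeat + v[[i]]
+  i <- i + 1
+}
+
+c(total_for, total_while, total_repeat)
+```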
+
+
+
 
 ---