bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git
Log | Files | Refs | README | LICENSE

commit c6bf260c19a6aefd646ada8bb74ab92ea0d4b73b
parent e28ddb56a9a449b5c1a54515c86db259c96b6ba2
Author: Oluwafemi OYEDELE <oluwafemioyedele908@gmail.com>
Date:   Fri, 19 Aug 2022 13:01:31 +0100

Chapter9 (#19)

* Add my notes for this week that I am still working on

* update my notes

* update my notes

* Add my notes

* Update my chapter numbering

* Update my note

* add the map function

* Add pluck function

* Update note

* Updated my note

Co-authored-by: Jon Harmon <jonthegeek@gmail.com>
Diffstat:
M09_Functionals.Rmd | 427+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
Aimages/9_2_3_map-arg.png | 0
Aimages/9_5_1-reduce.png | 0
Aimages/map_variants.png | 0
Aimages/pmap.png | 0
Aimages/reduce-init.png | 0
Aimages/reduce2-init.png | 0
Aimages/walk.png | 0
Aimages/walk2.png | 0
9 files changed, 421 insertions(+), 6 deletions(-)

diff --git a/09_Functionals.Rmd b/09_Functionals.Rmd @@ -1,13 +1,428 @@ -# Functionals +# Functionals -**Learning objectives:** +## **Learning objectives:** {-} -- THESE ARE NICE TO HAVE BUT NOT ABSOLUTELY NECESSARY +9.1. **Introduction** -## SLIDE 1 +9.2. **map()** + +9.3. **purrr** style + +9.4. **map_** variants + +9.5. **reduce()** and **accumulate** family of functions + +- Some functions that weren't covered + + +## What are functionals {-} + +## Introduction + +__Functionals__ are functions that take function as input and return a vector as output. Functionals that you probably have used before are: `apply()`, `lapply()` or `tapply()`. + + +- alternatives to loops + +- a functional is better than a `for` loop is better than `while` is better than `repeat` + + +### Benefits {-} + + +- encourages function logic to be separated from iteration logic + +- can collapse into vectors/data frames easily + + +## Map + +`map()` has two arguments, a vector and a function. It performs the function on each element of the vector and returns a list. We can also pass in some additional argument into the function. + +```{r,echo=FALSE,warning=FALSE,message=FALSE} +knitr::include_graphics(path = 'images/9_2_3_map-arg.png') +``` + +```{r} +simple_map <- function(x, f, ...) { +out <- vector("list", length(x)) +for (i in seq_along(x)) { +out[[i]] <- f(x[[i]], ...) +} +out +} +``` +# **Benefit of using the map function in purrr** {-} + +- `purrr::map()` is equivalent to `lapply()` + +- returns a list and is the most general + +- the length of the input == the length of the output + + + + + +```{r load,echo=FALSE,warning=FALSE,message=FALSE} +library(tidyverse) +``` + +# **Atomic vectors** {-} + + +- has 4 variants to return atomic vectors + - `map_chr()` + - `map_dbl()` + - `map_int()` + - `map_lgl()` + +```{r} +triple <- function(x) x * 3 +map(.x=1:3, .f=triple) + +map_dbl(.x=1:3, .f=triple) + +map_lgl(.x=c(1, NA, 3), .f=is.na) +``` + +# Anonymous functions and shortcuts {-} + + **Anonymous functions** +```{r} +map_dbl(.x=mtcars, .f=function(x) mean(x, na.rm = TRUE)) %>% + head() +``` + +- the "twiddle" uses a twiddle `~` to set a formula +- can use `.x` to reference the input `map(.x = ..., .f = )` +```{r, eval=FALSE} +map_dbl(.x=mtcars, .f=~mean(.x, na.rm = TRUE)) +``` + +- can be simplified further as +```{r} +map_dbl(.x=mtcars, .f=mean, na.rm = TRUE) +``` +## Modify {-} +Sometimes we might want the output to be the same as the input, then in that case we can use the modify function rather than map + +```{r} +df <- data.frame(x=1:3,y=6:4) + +map(df, .f=~.x*3) + +modify(.x=df,.f=~.x*3) +``` + +## `purrr` style + +```{r} +mtcars %>% + map(head, 20) %>% # pull first 20 of each column + map_dbl(mean) %>% # mean of each vector + head() +``` + +An example from `tidytuesday` +```{r, eval=FALSE} +tt <- tidytuesdayR::tt_load("2020-06-30") + +# filter data & exclude columns with lost of nulls +list_df <- + map( + .x = tt[1:3], + .f = + ~ .x %>% + filter(issue <= 152 | issue > 200) %>% + mutate(timeframe = ifelse(issue <= 152, "first 5 years", "last 5 years")) %>% + select_if(~mean(is.na(.)) < 0.2) + ) + + + + +# write to global environment +iwalk( + .x = list_df, + .f = ~ assign(x = .y, value = .x, envir = globalenv()) +) +``` + +## `map_*()` variants + +There are many variants + +![](images/map_variants.png) + + + + +# `map2_*()` {-} + +- raise each value `.x` by 2 + +```{r} +map_dbl( + .x = 1:5, + .f = function(x) x ^ 2 +) +``` + +- raise each value `.x` by another value `.y` + +```{r} +map2_dbl( + .x = 1:5, + .y = 2:6, + .f = ~ (.x ^ .y) +) +``` + +--- + +## The benefit of using the map over apply family of function {-} +- It is written in C +- It preserves names +- We always know the return values +- We can apply function into multiple input value +- We can pass in some additional arguments to the function + + +# `walk()` {-} + + +- We use walk when we want to call a function for it side effect rather than it return value, like generating plots, `write.csv()` or `ggsave()`, `map()` will print more info than you may want + +```{r} +map(1:3, ~cat(.x, "\n")) +``` + +- for these cases, use `walk()` instead +```{r} +walk(1:3, ~cat(.x, "\n")) +``` + + +We can use pwalk to save a list of plot to disk +```{r} +plots <- mtcars %>% + split(.$cyl) %>% + map(~ggplot(.,aes(mpg,wt))+geom_point()) + +paths <- stringr::str_c(names(plots),'.png') + +pwalk(.l=list(paths,plots),.f=ggsave,path=tempdir()) + + +``` +- walk, walk2 and pwalk all invisibly return .x the first argument. This makes them suitable for use in the middle of pipelines. + +--- + +# `imap()` {-} + +- `imap()` is like `map2()`except that `.y` is derived from `names(.x)` if named or `seq_along(.x)` if not + +- These two produce the same result + +```{r} +imap_chr(.x = mtcars, .f = ~ paste(.y, "has a mean of", round(mean(.x), 1))) %>% +head() +``` + +--- + +# `pmap()` {-} + +- you can pass a named list or dataframe as arguments to a function + +- for example `runif()` has the parameters `n`, `min` and `max` + +```{r} +params <- tibble::tribble( + ~ n, ~ min, ~ max, + 1L, 1, 10, + 2L, 10, 100, + 3L, 100, 1000 +) + +pmap(params, runif) +``` + +- could also be + +```{r} +list( + n = 1:3, + min = 10 ^ (0:2), + max = 10 ^ (1:3) +) %>% +pmap(runif) +``` + +## `reduce()` family +The reduce() function is a powerful functional that allows you to abstract away from a sequence of functions that are applied in a fixed direction. + +reduce takes a vector as its first argument, a function as its second argument, and an optional .init argument, it will then apply this function repeatedly to a list until there is only a single element left. + +```{r,echo=FALSE,warning=FALSE,message=FALSE} +knitr::include_graphics(path = 'images/reduce-init.png') +``` + + +Let me really quickly demonstrate `reduce()` in action. + +Say you wanted to add up the numbers 1 through 5, but only using the plus operator +. You could do something like this + +```{r} +1 + 2 + 3 + 4 + 5 + +``` +Which is the same thing as this: +```{r} +set.seed(1234) + +reduce(1:5, `+`) +``` + +And if you want the start value to be something that’s not the first argument of the vector, pass that to the .init argument: + +```{r} + +identical( + 0.5 + 1 + 2 + 3 + 4 + 5, + reduce(1:5, `+`, .init = 0.5) +) + +``` +# ggplot2 Example with reduce {-} + +```{r} +ggplot(mtcars, aes(hp, mpg)) + + geom_point(size = 8, alpha = .5) + + geom_point(size = 4, alpha = .5) + + geom_point(size = 2, alpha = .5) + +``` +Let us use the reduce `function` +```{r} +reduce( + c(8, 4, 2), + ~ .x + geom_point(size = .y, alpha = .5), + .init = ggplot(mtcars, aes(hp, mpg)) +) + +``` + +```{r} +df <- list(age=tibble(name='john',age=30), + sex=tibble(name=c('john','mary'),sex=c('M','F'), + trt=tibble(name='Mary',treatment='A'))) + + +df |> reduce(.f = full_join) + +reduce(.x = df,.f = full_join) +``` + +- to see all intermediate steps, use **accumulate()** +```{r} +set.seed(1234) +accumulate(1:5, `+`) +``` + + +--- + +# Not covered: `map_df*()` variants {-} + +- `map_dfr()` = row bind the results + +- `map_dfc()` = column bind the results + +```{r} +col_stats <- function(n) { + head(mtcars, n) %>% + summarise_all(mean) %>% + mutate_all(floor) %>% + mutate(n = paste("N =", n)) +} + +map((1:2) * 10, col_stats) + +map_dfr((1:2) * 10, col_stats) +``` + +--- + +# Not covered: `pluck()` {-} + +- `pluck()` will pull a single element from a list + +```{r} +my_list <- list( + 1:3, + 10 + (1:5), + 20 + (1:10) +) + +pluck(my_list, 1) + +map(my_list, pluck, 1) + +map_dbl(my_list, pluck, 1) +``` + +--- + +# Not covered: `flatten()` {-} + +- `flatten()` will turn a list of lists into a simpler vector + +```{r} +my_list <- + list( + a = 1:3, + b = list(1:3) + ) + +map_if(my_list, is.list, pluck) + +map_if(my_list, is.list, flatten_int) + +map_if(my_list, is.list, flatten_int) %>% + flatten_int() +``` +## Dealing with Failures {-} + +## Safely {-} + +safely is an adverb, it takes a function (a verb) and returns a modified version. In this case, the modified function will never throw an error. Instead it always returns a list with two elements. + +- Result is the original result. If there is an error this will be NULL + +- Error is an error object. If the operation was successful this will be NULL. + +```{r} +A <- list(1,10,"a") + +map(.x = A,.f = safely(log)) + +``` + +## Possibly {-} +Possibly always succeeds. It is simpler than safely, because you can give it a default value to return when there is an error. + +```{r} +A <- list(1,10,"a") + + map_dbl(.x =A,.f = possibly(log,otherwise = NA_real_) ) + +``` + +--- -- ADD SLIDES AS SECTIONS (`##`). -- TRY TO KEEP THEM RELATIVELY SLIDE-LIKE; THESE ARE NOTES, NOT THE BOOK ITSELF. ## Meeting Videos diff --git a/images/9_2_3_map-arg.png b/images/9_2_3_map-arg.png Binary files differ. diff --git a/images/9_5_1-reduce.png b/images/9_5_1-reduce.png Binary files differ. diff --git a/images/map_variants.png b/images/map_variants.png Binary files differ. diff --git a/images/pmap.png b/images/pmap.png Binary files differ. diff --git a/images/reduce-init.png b/images/reduce-init.png Binary files differ. diff --git a/images/reduce2-init.png b/images/reduce2-init.png Binary files differ. diff --git a/images/walk.png b/images/walk.png Binary files differ. diff --git a/images/walk2.png b/images/walk2.png Binary files differ.