---
engine: knitr
title: Functionals
---

## Learning objectives:

- Define functionals.
- Use the `purrr::map()` family of functionals.
- Use the `purrr::walk()` family of functionals.
- Use the `purrr::reduce()` and `purrr::accumulate()` family of functionals.
- Use `purrr::safely()` and `purrr::possibly()` to deal with failure.

9.1. **Introduction**

9.2. **map()**

9.3. **purrr** style

9.4. **map_** variants

9.5. **reduce()** and **accumulate()** family of functions

- Some functions that weren't covered


## What are functionals {-}

## Introduction

__Functionals__ are functions that take a function as input and return a vector as output. Functionals that you have probably used before are `apply()`, `lapply()` and `tapply()`.


- alternatives to loops

- a functional is better than a `for` loop, which is better than `while`, which is better than `repeat`


### Benefits {-}


- encourages function logic to be separated from iteration logic

- can collapse into vectors/data frames easily


## Map

`map()` has two arguments, a vector and a function. It calls the function on each element of the vector and returns a list. We can also pass additional arguments through to the function.

```{r,echo=FALSE,warning=FALSE,message=FALSE}
knitr::include_graphics(path = 'images/9_2_3_map-arg.png')
```

```{r}
simple_map <- function(x, f, ...) {
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], ...)
  }
  out
}
```

## Benefit of using the map function in purrr {-}

- `purrr::map()` is equivalent to `lapply()`

- returns a list and is the most general

- the length of the input == the length of the output

- `map()` is more flexible, with additional arguments allowed

- `map()` has a host of extensions



```{r load,echo=FALSE,warning=FALSE,message=FALSE}
library(tidyverse)
```

## Atomic vectors {-}


- `map()` has 4 variants that return atomic vectors
    - `map_chr()`
    - `map_dbl()`
    - `map_int()`
    - `map_lgl()`

```{r}
triple <- function(x) x * 3
map(.x=1:3, .f=triple)

map_dbl(.x=1:3, .f=triple)

map_lgl(.x=c(1, NA, 3), .f=is.na)
```

## Anonymous functions and shortcuts {-}

**Anonymous functions**
```{r}
map_dbl(.x=mtcars, .f=function(x) mean(x, na.rm = TRUE)) |>
  head()
```

- the formula shortcut uses a "twiddle" (`~`) to create a formula
- can use `.x` to reference the input `map(.x = ..., .f = )`
```{r, eval=FALSE}
map_dbl(.x=mtcars, .f=~mean(.x, na.rm = TRUE))
```

- can be simplified further as
```{r}
map_dbl(.x=mtcars, .f=mean, na.rm = TRUE)
```

- what happens when we try a handful of variants of the task at hand? (how many unique values are there for each variable?)

Note that `.x` is the **name** of the first argument in `map()` (`.f` is the name of the second argument).
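
As a side note on anonymous functions, R 4.1.0 and later also provide the `\(x)` shorthand, which can be passed to `map()` in the same way as `function(x)`. The chunk below is a minimal sketch, assuming R >= 4.1.0 and an installed purrr; the names `n_unique` and `n_unique_alt` are only illustrative.

```{r}
library(purrr)

# `\(x)` is base R shorthand for `function(x)` (requires R >= 4.1.0)
n_unique <- map_int(mtcars, \(x) length(unique(x)))

# the argument name is arbitrary, exactly as with `function(unicorn)`
n_unique_alt <- map_int(mtcars, \(col) length(unique(col)))

# both spellings give the same named integer vector
identical(n_unique, n_unique_alt)
```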

```{r}
#| error: true
# the task
map_dbl(mtcars, function(x) length(unique(x)))
map_dbl(mtcars, function(unicorn) length(unique(unicorn)))
map_dbl(mtcars, ~length(unique(.x)))
map_dbl(mtcars, ~length(unique(..1)))
map_dbl(mtcars, ~length(unique(.)))

# not the task
map_dbl(mtcars, length)
# length(unique) evaluates to 1, so this extracts the first element of each column
map_dbl(mtcars, length(unique))
map_dbl(mtcars, 1)
```

```{r}
#| echo: false
#| message: false
#| warning: false
rm(x)
```

```{r}
#| error: true
# errors
map_dbl(mtcars, length(unique()))
map_dbl(mtcars, ~length(unique(x)))
```


## Modify {-}

Sometimes we want the output to have the same type as the input; in that case we can use `modify()` rather than `map()`.

```{r}
df <- data.frame(x=1:3,y=6:4)

map(df, .f=~.x*3)

modify(.x=df,.f=~.x*3)
```

Note that `modify()` always returns the same type as its input (which is not necessarily true with `map()`). Additionally, `modify()` does not actually change the value of `df`.
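
The difference can be checked directly. The sketch below uses a separate, hypothetical data frame `scores` so that `df` is left untouched, and it assumes purrr is installed; the names `tripled_list` and `tripled_df` are illustrative.

```{r}
library(purrr)

# illustrative data frame (not part of the original example)
scores <- data.frame(x = 1:3, y = 6:4)

tripled_list <- map(scores, ~ .x * 3)     # map() always returns a list
tripled_df   <- modify(scores, ~ .x * 3)  # modify() returns a data frame here

class(tripled_list)
class(tripled_df)

# `scores` itself only changes if the result is assigned back to it
identical(scores, data.frame(x = 1:3, y = 6:4))
```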

```{r}
df
```


## `purrr` style

```{r}
mtcars |>
  map(head, 20) |> # pull first 20 of each column
  map_dbl(mean) |> # mean of each vector
  head()
```

An example from `tidytuesday`
```{r, eval=FALSE}
#| warning: false
#| message: false

tt <- tidytuesdayR::tt_load("2020-06-30")

# filter data & exclude columns with lots of missing values
list_df <-
  map(
    .x = tt[1:3],
    .f =
      ~ .x |>
      filter(issue <= 152 | issue > 200) |>
      mutate(timeframe = ifelse(issue <= 152, "first 5 years", "last 5 years")) |>
      select_if(~mean(is.na(.x)) < 0.2)
  )


# write to global environment
iwalk(
  .x = list_df,
  .f = ~ assign(x = .y, value = .x, envir = globalenv())
)
```

## `map_*()` variants

There are many variants



## `map2_*()` {-}

- raise each value `.x` to the power of 2

```{r}
map_dbl(
  .x = 1:5,
  .f = function(x) x ^ 2
)
```

- raise each value `.x` to the power of another value `.y`

```{r}
map2_dbl(
  .x = 1:5,
  .y = 2:6,
  .f = ~ (.x ^ .y)
)
```


## The benefits of using the map family over the apply family of functions {-}
- It is written in C
- It preserves names
- We always know the return value type
- We can apply the function for multiple input values
- We can pass additional arguments into the function


## `walk()` {-}


- We use `walk()` when we want to call a function for its side effect(s) rather than its return value, like generating plots, `write.csv()`, or `ggsave()`. If you don't want a return value, `map()` will print more info than you may want.


```{r}
map(1:3, ~cat(.x, "\n"))
```

- for these cases, use `walk()` instead
```{r}
walk(1:3, ~cat(.x, "\n"))
```

`cat()` does have a result, it's just usually returned invisibly.

```{r}
cat("hello")

(cat("hello"))
```


We can use `pwalk()` to save a list of plots to disk. Note that the "p" in `pwalk()` means that we can pass more than 1 (or 2) variables to the function. Also note that the name of the first argument in all of the "p" functions is now `.l` (instead of `.x`).

```{r}
plots <- mtcars |>
  split(mtcars$cyl) |>
  map(~ggplot(.x, aes(mpg,wt)) +
        geom_point())

paths <- stringr::str_c(names(plots), '.png')

# pwalk() saves the plots and returns its input invisibly
pwalk(.l = list(paths,plots), .f = ggsave, path = tempdir())
# pmap() also saves the plots, but returns (and prints) a list of ggsave()'s return values
pmap(.l = list(paths,plots), .f = ggsave, path = tempdir())

```

- `walk()`, `walk2()` and `pwalk()` all invisibly return their first argument (`.x`, or `.l` for `pwalk()`). This makes them suitable for use in the middle of pipelines.

- note: it really is `.x` (or `.l`) that is returned invisibly. Hadley says:

> purrr provides the walk family of functions that ignore the return values of the `.f` and instead return `.x` invisibly.

The `NULL` values in the first `cat()` example are printed by `map()`, which collects the return value of each `cat()` call. `walk()` discards those values and returns its input instead, which you can see by wrapping the call in parentheses: `(walk(1:3, ~cat(.x, "\n")))`.

## `imap()` {-}

- `imap()` is like `map2()` except that `.y` is derived from `names(.x)` if named or `seq_along(.x)` if not.

- These two produce the same result

```{r}
imap_chr(.x = mtcars,
         .f = ~ paste(.y, "has a mean of", round(mean(.x), 1))) |>
  head()

map2_chr(.x = mtcars,
         .y = names(mtcars),
         .f = ~ paste(.y, "has a mean of", round(mean(.x), 1))) |>
  head()
```


## `pmap()` {-}

- you can pass a named list or data frame as arguments to a function

- for example `runif()` has the parameters `n`, `min` and `max`

```{r}
params <- tibble::tribble(
  ~ n, ~ min, ~ max,
   1L,     1,    10,
   2L,    10,   100,
   3L,   100,  1000
)

pmap(params, runif)
```

- could also be

```{r}
list(
  n = 1:3,
  min = 10 ^ (0:2),
  max = 10 ^ (1:3)
) |>
  pmap(runif)
```

- I like to use `expand_grid()` when I want all possible parameter combinations.

```{r}
expand_grid(n = 1:3,
            min = 10 ^ (0:1),
            max = 10 ^ (1:2))

expand_grid(n = 1:3,
            min = 10 ^ (0:1),
            max = 10 ^ (1:2)) |>
  pmap(runif)
```



## `reduce()` family

The `reduce()` function is a powerful functional that lets you abstract away a sequence of function calls that are applied in a fixed direction.

`reduce()` takes a vector as its first argument, a function as its second argument, and an optional `.init` argument last. It then applies the function repeatedly to the vector until there is only a single element left.

(Hint: start at the top of the image and read down.)

```{r,echo=FALSE,warning=FALSE,message=FALSE}
knitr::include_graphics(path = 'images/reduce-init.png')
```


Here is a quick demonstration of `reduce()` in action.

Say you wanted to add up the numbers 1 through 5 using only the plus operator `+`.
You could do something like:

```{r}
1 + 2 + 3 + 4 + 5

```

Which is the same as:

```{r}
reduce(1:5, `+`)
```

And if you want the start value to be something other than the first element of the vector, pass that value to the `.init` argument:

```{r}

identical(
  0.5 + 1 + 2 + 3 + 4 + 5,
  reduce(1:5, `+`, .init = 0.5)
)

```

## ggplot2 example with reduce {-}

```{r}
ggplot(mtcars, aes(hp, mpg)) +
  geom_point(size = 8, alpha = .5, color = "yellow") +
  geom_point(size = 4, alpha = .5, color = "red") +
  geom_point(size = 2, alpha = .5, color = "blue")

```

We can build the same plot with `reduce2()`. Note that `reduce2()` iterates over two vectors, so the function receives three arguments: the accumulated value (`..1`), which starts at the `.init` value, and one element from each vector (`..2` and `..3`).

```{r}
reduce2(
  c(8, 4, 2),
  c("yellow", "red", "blue"),
  ~ ..1 + geom_point(size = ..2, alpha = .5, color = ..3),
  .init = ggplot(mtcars, aes(hp, mpg))
)

```

```{r}
# three small tibbles that share a `name` column
# (each tibble() call is closed so that `trt` is its own list element)
df <- list(age = tibble(name = 'john', age = 30),
           sex = tibble(name = c('john', 'mary'), sex = c('M', 'F')),
           trt = tibble(name = 'Mary', treatment = 'A'))

df

df |> reduce(.f = full_join)

reduce(.x = df,.f = full_join)
```

- to see all intermediate steps, use **accumulate()**

```{r}
set.seed(1234)
accumulate(1:5, `+`)
```

```{r}
accumulate2(
  c(8, 4, 2),
  c("yellow", "red", "blue"),
  ~ ..1 + geom_point(size = ..2, alpha = .5, color = ..3),
  .init = ggplot(mtcars, aes(hp, mpg))
)
```


## `map_df*()` variants {-}

- `map_dfr()` = row bind the results

- `map_dfc()` = column bind the results

- Note that `map_dfr()` has been superseded by `map() |> list_rbind()` and `map_dfc()` has been superseded by `map() |> list_cbind()`

```{r}
col_stats <- function(n) {
  head(mtcars, n) |>
    summarise_all(mean) |>
    mutate_all(floor) |>
    mutate(n = paste("N =", n))
}

map((1:2) * 10, col_stats)

map_dfr((1:2) * 10, col_stats)

map((1:2) * 10, col_stats) |> list_rbind()
```

---

## `pluck()` {-}

- `pluck()` will pull a single element from a list

I like the example from the book because the starting object is not particularly easy to work with (as many JSON objects might not be).

```{r}
my_list <- list(
  list(-1, x = 1, y = c(2), z = "a"),
  list(-2, x = 4, y = c(5, 6), z = "b"),
  list(-3, x = 8, y = c(9, 10, 11))
)
my_list
```

Notice that the "first element" means something different in standard `pluck()` versus `map`ped `pluck()`.

```{r}
pluck(my_list, 1)

map(my_list, pluck, 1)

map_dbl(my_list, pluck, 1)
```

The `map()` functions also have shortcuts for extracting elements from vectors (powered by `purrr::pluck()`). Note that `map(my_list, 3)` is a shortcut for `map(my_list, pluck, 3)`.

```{r}
#| error: true

# Select by name
map_dbl(my_list, "x")

# Or by position
map_dbl(my_list, 1)

# Or by both
map_dbl(my_list, list("y", 1))

# You'll get an error if you try to retrieve an inner item that doesn't have
# a consistent length and you want a numeric output
map_dbl(my_list, list("y"))


# You'll get an error if a component doesn't exist:
map_chr(my_list, "z")
#> Error: Result 3 must be a single string, not NULL of length 0

# Unless you supply a .default value
map_chr(my_list, "z", .default = NA)
#> [1] "a" "b" NA
```


## Not covered: `flatten()` {-}

- `flatten()` will turn a list of lists into a simpler vector.

```{r}
my_list <-
  list(
    a = 1:3,
    b = list(1:3)
  )

my_list

map_if(my_list, is.list, pluck)

map_if(my_list, is.list, flatten_int)

map_if(my_list, is.list, flatten_int) |>
  flatten_int()
```

## Dealing with Failures {-}

## Safely {-}

`safely()` is an adverb. It takes a function (a verb) and returns a modified version. In this case, the modified function will never throw an error. Instead, it always returns a list with two elements.

- `result` is the original result. If there is an error, this will be `NULL`.

- `error` is an error object. If the operation was successful, `error` will be `NULL`.

```{r}
A <- list(1, 10, "a")

map(.x = A, .f = safely(log))

```

## Possibly {-}

`possibly()` always succeeds. It is simpler than `safely()`, because you give it a default value to return when there is an error.

```{r}
A <- list(1,10,"a")

map_dbl(.x = A, .f = possibly(log, otherwise = NA_real_) )

```
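
One possible follow-up to the `safely()` example is to split its output into the successful results and the inputs that failed. The chunk below is a minimal sketch, assuming purrr is installed; the names `safe_log`, `out`, `ok` and `results` are illustrative and not part of the original notes.

```{r}
library(purrr)

A <- list(1, 10, "a")

# wrap log() so that errors are captured instead of thrown
safe_log <- safely(log)
out <- map(A, safe_log)

# TRUE where no error was captured
ok <- map_lgl(out, ~ is.null(.x$error))

# results of the successful calls
results <- map_dbl(out[ok], "result")
results

# inputs that failed
A[!ok]
```

Keeping the logical vector `ok` around makes it easy to go back to the original inputs and inspect the ones that caused errors.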