bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git
Log | Files | Refs | README | LICENSE

html.json (27282B)


      1 {
      2   "hash": "991db3288a65eb2cbeef3e20bc63271f",
      3   "result": {
      4     "engine": "knitr",
      5     "markdown": "---\nengine: knitr\ntitle: Functionals\n---\n\n## Learning objectives:\n\n- Define functionals.\n- Use the `purrr::map()` family of functionals.\n- Use the `purrr::walk()` family of functionals.\n- Use the `purrr::reduce()` and `purrr::accumulate()` family of functionals.\n- Use `purrr::safely()` and `purrr::possibly()` to deal with failure.\n\n9.1. **Introduction**\n\n9.2.  **map()**\n\n9.3. **purrr** style\n\n9.4. **map_** variants\n\n9.5. **reduce()** and **accumulate** family of functions\n\n- Some functions that weren't covered\n\n\n## What are functionals {-}\n\n## Introduction \n\n__Functionals__ are functions that take function as input and return a vector as output. Functionals that you probably have used before are: `apply()`, `lapply()` or `tapply()`. \n\n\n- alternatives to loops\n\n- a functional is better than a `for` loop is better than `while` is better than `repeat`\n\n\n### Benefits {-}\n\n\n- encourages function logic to be separated from iteration logic\n\n- can collapse into vectors/data frames easily\n\n\n## Map\n\n`map()` has two arguments, a vector and a function. It performs the function on each element of the vector and returns a list. We can also pass in some additional argument into the function.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](images/9_2_3_map-arg.png){width=448}\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsimple_map <- function(x, f, ...) {\nout <- vector(\"list\", length(x))\nfor (i in seq_along(x)) {\nout[[i]] <- f(x[[i]], ...)\n}\nout\n}\n```\n:::\n\n\n## Benefit of using the map function in purrr {-}\n\n- `purrr::map()` is equivalent to `lapply()`\n\n- returns a list and is the most general\n\n- the length of the input == the length of the output\n\n- `map()` is more flexible, with additional arguments allowed\n\n- `map()` has a host of extensions\n\n\n\n\n::: {.cell}\n\n:::\n\n\n## Atomic vectors {-}\n\n\n- has 4 variants to return atomic vectors\n    - `map_chr()`\n    - `map_dbl()`\n    - `map_int()`\n    - `map_lgl()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntriple <- function(x) x * 3\nmap(.x=1:3, .f=triple)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 3\n#> \n#> [[2]]\n#> [1] 6\n#> \n#> [[3]]\n#> [1] 9\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(.x=1:3, .f=triple)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3 6 9\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_lgl(.x=c(1, NA, 3), .f=is.na)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE  TRUE FALSE\n```\n\n\n:::\n:::\n\n\n## Anonymous functions and shortcuts  {-}\n\n **Anonymous functions** \n\n::: {.cell}\n\n```{.r .cell-code}\nmap_dbl(.x=mtcars, .f=function(x) mean(x, na.rm = TRUE)) |> \n  head()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>        mpg        cyl       disp         hp       drat         wt \n#>  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250\n```\n\n\n:::\n:::\n\n\n- the \"twiddle\" uses a twiddle `~` to set a formula\n- can use `.x` to reference the input `map(.x = ..., .f = )`\n\n::: {.cell}\n\n```{.r .cell-code}\nmap_dbl(.x=mtcars,  .f=~mean(.x, na.rm = TRUE))\n```\n:::\n\n\n- can be simplified further as\n\n::: {.cell}\n\n```{.r .cell-code}\nmap_dbl(.x=mtcars, .f=mean, na.rm = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>        mpg        cyl       disp         hp       drat         wt       qsec \n#>  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 \n#>         vs         am       gear       carb \n#>   0.437500   0.406250   3.687500   2.812500\n```\n\n\n:::\n:::\n\n\n- what happens when we try a handful of variants of the task at hand?  (how many unique values are there for each variable?)\n\nNote that `.x` is the **name** of the first argument in `map()` (`.f` is the name of the second argument).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# the task\nmap_dbl(mtcars, function(x) length(unique(x)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb \n#>   25    3   27   22   22   29   30    2    2    3    6\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, function(unicorn) length(unique(unicorn)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb \n#>   25    3   27   22   22   29   30    2    2    3    6\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, ~length(unique(.x)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb \n#>   25    3   27   22   22   29   30    2    2    3    6\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, ~length(unique(..1)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb \n#>   25    3   27   22   22   29   30    2    2    3    6\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, ~length(unique(.)))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb \n#>   25    3   27   22   22   29   30    2    2    3    6\n```\n\n\n:::\n\n```{.r .cell-code}\n# not the task\nmap_dbl(mtcars, length)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb \n#>   32   32   32   32   32   32   32   32   32   32   32\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, length(unique))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>    mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb \n#>  21.00   6.00 160.00 110.00   3.90   2.62  16.46   0.00   1.00   4.00   4.00\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>    mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb \n#>  21.00   6.00 160.00 110.00   3.90   2.62  16.46   0.00   1.00   4.00   4.00\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#error\nmap_dbl(mtcars, length(unique()))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in unique.default(): argument \"x\" is missing, with no default\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(mtcars, ~length(unique(x)))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `map_dbl()`:\n#> ℹ In index: 1.\n#> ℹ With name: mpg.\n#> Caused by error in `.f()`:\n#> ! object 'x' not found\n```\n\n\n:::\n:::\n\n\n\n## Modify {-}\n\nSometimes we might want the output to be the same as the input, then in that case we can use the modify function rather than map\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(x=1:3,y=6:4)\n\nmap(df, .f=~.x*3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $x\n#> [1] 3 6 9\n#> \n#> $y\n#> [1] 18 15 12\n```\n\n\n:::\n\n```{.r .cell-code}\nmodify(.x=df,.f=~.x*3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   x  y\n#> 1 3 18\n#> 2 6 15\n#> 3 9 12\n```\n\n\n:::\n:::\n\n\nNote that `modify()` always returns the same type of output (which is not necessarily true with `map()`).  Additionally, `modify()` does not actually change the value of `df`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   x y\n#> 1 1 6\n#> 2 2 5\n#> 3 3 4\n```\n\n\n:::\n:::\n\n\n\n## `purrr` style\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmtcars |> \n  map(head, 20) |> # pull first 20 of each column\n  map_dbl(mean) |> # mean of each vector\n  head()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>       mpg       cyl      disp        hp      drat        wt \n#>  20.13000   6.20000 233.93000 136.20000   3.54500   3.39845\n```\n\n\n:::\n:::\n\n\nAn example from `tidytuesday`\n\n::: {.cell}\n\n```{.r .cell-code}\ntt <- tidytuesdayR::tt_load(\"2020-06-30\")\n\n# filter data & exclude columns with lost of nulls\nlist_df <- \n  map(\n    .x = tt[1:3], \n    .f = \n      ~ .x |> \n      filter(issue <= 152 | issue > 200) |> \n      mutate(timeframe = ifelse(issue <= 152, \"first 5 years\", \"last 5 years\")) |> \n      select_if(~mean(is.na(.x)) < 0.2) \n  )\n\n\n# write to global environment\niwalk(\n  .x = list_df,\n  .f = ~ assign(x = .y, value = .x, envir = globalenv())\n)\n```\n:::\n\n\n## `map_*()` variants \n\nThere are many variants\n\n![](images/map_variants.png)\n\n\n## `map2_*()` {-}\n\n- raise each value `.x` by 2\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmap_dbl(\n  .x = 1:5, \n  .f = function(x) x ^ 2\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1]  1  4  9 16 25\n```\n\n\n:::\n:::\n\n\n- raise each value `.x` by another value `.y`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmap2_dbl(\n  .x = 1:5, \n  .y = 2:6, \n  .f = ~ (.x ^ .y)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1]     1     8    81  1024 15625\n```\n\n\n:::\n:::\n\n\n\n## The benefit of using the map over apply family of function {-}\n- It is written in C\n- It preserves names\n- We always know the return value type\n- We can apply the function for multiple input values\n- We can pass additional arguments into the function\n\n\n## `walk()` {-}\n\n\n- We use `walk()` when we want to call a function for it side effect(s) rather than its return value, like generating plots, `write.csv()`, or `ggsave()`. If you don't want a return value, `map()` will print more info than you may want.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmap(1:3, ~cat(.x, \"\\n\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 1 \n#> 2 \n#> 3\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> NULL\n#> \n#> [[2]]\n#> NULL\n#> \n#> [[3]]\n#> NULL\n```\n\n\n:::\n:::\n\n\n- for these cases, use `walk()` instead\n\n::: {.cell}\n\n```{.r .cell-code}\nwalk(1:3, ~cat(.x, \"\\n\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 1 \n#> 2 \n#> 3\n```\n\n\n:::\n:::\n\n\n`cat()` does have a result, it's just usually returned invisibly.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncat(\"hello\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> hello\n```\n\n\n:::\n\n```{.r .cell-code}\n(cat(\"hello\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> hello\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n\nWe can use `pwalk()` to save a list of plot to disk.  Note that the \"p\" in `pwalk()` means that we have more than 1 (or 2) variables to pipe into the function.  Also note that the name of the first argument in all of the \"p\" functions is now `.l` (instead of `.x`).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplots <- mtcars |>  \n  split(mtcars$cyl) |>  \n  map(~ggplot(.x, aes(mpg,wt)) +\n        geom_point())\n\npaths <- stringr::str_c(names(plots), '.png')\n\npwalk(.l = list(paths,plots), .f = ggsave, path = tempdir())\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Saving 7 x 5 in image\n#> Saving 7 x 5 in image\n#> Saving 7 x 5 in image\n```\n\n\n:::\n\n```{.r .cell-code}\npmap(.l = list(paths,plots), .f = ggsave, path = tempdir())\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Saving 7 x 5 in image\n#> Saving 7 x 5 in image\n#> Saving 7 x 5 in image\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] \"C:\\\\Users\\\\jonth\\\\AppData\\\\Local\\\\Temp\\\\RtmpSMVSZz/4.png\"\n#> \n#> [[2]]\n#> [1] \"C:\\\\Users\\\\jonth\\\\AppData\\\\Local\\\\Temp\\\\RtmpSMVSZz/6.png\"\n#> \n#> [[3]]\n#> [1] \"C:\\\\Users\\\\jonth\\\\AppData\\\\Local\\\\Temp\\\\RtmpSMVSZz/8.png\"\n```\n\n\n:::\n:::\n\n\n- walk, walk2 and pwalk all invisibly return .x the first argument. This makes them suitable for use in the middle of pipelines.\n\n- note: I don't think that it is \"`.x`\" (or \"`.l`\") that they are returning invisibly.  But I'm not sure what it is.  Hadley says:\n\n> purrr provides the walk family of functions that ignore the return values of the `.f` and instead return `.x` invisibly.\n\nBut not in the first `cat()` example, it is the `NULL` values that get returned invisibly (those aren't the same as `.x`).\n\n## `imap()` {-}\n\n- `imap()` is like `map2()`except that `.y` is derived from `names(.x)` if named or `seq_along(.x)` if not.\n\n- These two produce the same result\n\n\n::: {.cell}\n\n```{.r .cell-code}\nimap_chr(.x = mtcars, \n         .f = ~ paste(.y, \"has a mean of\", round(mean(.x), 1))) |> \nhead()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>                        mpg                        cyl \n#>   \"mpg has a mean of 20.1\"    \"cyl has a mean of 6.2\" \n#>                       disp                         hp \n#> \"disp has a mean of 230.7\"   \"hp has a mean of 146.7\" \n#>                       drat                         wt \n#>   \"drat has a mean of 3.6\"     \"wt has a mean of 3.2\"\n```\n\n\n:::\n\n```{.r .cell-code}\nmap2_chr(.x = mtcars, \n         .y = names(mtcars),\n         .f = ~ paste(.y, \"has a mean of\", round(mean(.x), 1))) |> \nhead()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>                        mpg                        cyl \n#>   \"mpg has a mean of 20.1\"    \"cyl has a mean of 6.2\" \n#>                       disp                         hp \n#> \"disp has a mean of 230.7\"   \"hp has a mean of 146.7\" \n#>                       drat                         wt \n#>   \"drat has a mean of 3.6\"     \"wt has a mean of 3.2\"\n```\n\n\n:::\n:::\n\n\n\n## `pmap()` {-}\n\n- you can pass a named list or dataframe as arguments to a function\n\n- for example `runif()` has the parameters `n`, `min` and `max`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nparams <- tibble::tribble(\n  ~ n, ~ min, ~ max,\n   1L,     1,    10,\n   2L,    10,   100,\n   3L,   100,  1000\n)\n\npmap(params, runif)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 9.52234\n#> \n#> [[2]]\n#> [1] 49.53679 46.47017\n#> \n#> [[3]]\n#> [1] 488.8100 796.6801 282.7772\n```\n\n\n:::\n:::\n\n\n- could also be\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlist(\n  n = 1:3, \n  min = 10 ^ (0:2), \n  max = 10 ^ (1:3)\n) |> \npmap(runif)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 5.246834\n#> \n#> [[2]]\n#> [1] 73.39068 30.57879\n#> \n#> [[3]]\n#> [1] 169.6667 950.1126 820.9357\n```\n\n\n:::\n:::\n\n\n- I like to use `expand_grid()` when I want all possible parameter combinations.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nexpand_grid(n = 1:3,\n            min = 10 ^ (0:1),\n            max = 10 ^ (1:2))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 12 × 3\n#>        n   min   max\n#>    <int> <dbl> <dbl>\n#>  1     1     1    10\n#>  2     1     1   100\n#>  3     1    10    10\n#>  4     1    10   100\n#>  5     2     1    10\n#>  6     2     1   100\n#>  7     2    10    10\n#>  8     2    10   100\n#>  9     3     1    10\n#> 10     3     1   100\n#> 11     3    10    10\n#> 12     3    10   100\n```\n\n\n:::\n\n```{.r .cell-code}\nexpand_grid(n = 1:3,\n            min = 10 ^ (0:1),\n            max = 10 ^ (1:2)) |> \npmap(runif)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 9.474848\n#> \n#> [[2]]\n#> [1] 10.63548\n#> \n#> [[3]]\n#> [1] 10\n#> \n#> [[4]]\n#> [1] 92.44257\n#> \n#> [[5]]\n#> [1] 7.165047 6.201947\n#> \n#> [[6]]\n#> [1] 64.79074 16.54110\n#> \n#> [[7]]\n#> [1] 10 10\n#> \n#> [[8]]\n#> [1] 62.12314 52.31713\n#> \n#> [[9]]\n#> [1] 6.806213 5.541865 8.580469\n#> \n#> [[10]]\n#> [1]  7.10806 51.56879 85.70133\n#> \n#> [[11]]\n#> [1] 10 10 10\n#> \n#> [[12]]\n#> [1] 74.48871 11.65879 58.31278\n```\n\n\n:::\n:::\n\n\n\n\n## `reduce()` family\n\nThe `reduce()` function is a powerful functional that allows you to abstract away from a sequence of functions that are applied in a fixed direction.\n\n`reduce()` takes a vector as its first argument, a function as its second argument, and an optional `.init` argument last.  It will then apply the function repeatedly to the vector until there is only a single element left.\n\n(Hint: start at the top of the image and read down.)\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](images/reduce-init.png){width=508}\n:::\n:::\n\n\n\nLet me really quickly demonstrate `reduce()` in action.\n\nSay you wanted to add up the numbers 1 through 5 using only the plus operator `+`. You could do something like:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n1 + 2 + 3 + 4 + 5\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 15\n```\n\n\n:::\n:::\n\n\nWhich is the same as:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreduce(1:5, `+`)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 15\n```\n\n\n:::\n:::\n\n\nAnd if you want the start value to be something that is not the first argument of the vector, pass that value to the .init argument:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nidentical(\n  0.5 + 1 + 2 + 3 + 4 + 5,\n  reduce(1:5, `+`, .init = 0.5)\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n## ggplot2 example with reduce {-}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(mtcars, aes(hp, mpg)) + \n  geom_point(size = 8, alpha = .5, color = \"yellow\") +\n  geom_point(size = 4, alpha = .5, color = \"red\") +\n  geom_point(size = 2, alpha = .5, color = \"blue\")\n```\n\n::: {.cell-output-display}\n![](09_files/figure-html/unnamed-chunk-28-1.png){width=672}\n:::\n:::\n\n\nLet us use the `reduce()` function.  Note that `reduce2()` takes two arguments, but the first value (`..1`) is given by the `.init` value.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreduce2(\n  c(8, 4, 2),\n  c(\"yellow\", \"red\", \"blue\"),\n  ~ ..1 + geom_point(size = ..2, alpha = .5, color = ..3),\n  .init = ggplot(mtcars, aes(hp, mpg))\n)\n```\n\n::: {.cell-output-display}\n![](09_files/figure-html/unnamed-chunk-29-1.png){width=672}\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- list(age=tibble(name='john',age=30),\n    sex=tibble(name=c('john','mary'),sex=c('M','F'),\n    trt=tibble(name='Mary',treatment='A')))\n\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $age\n#> # A tibble: 1 × 2\n#>   name    age\n#>   <chr> <dbl>\n#> 1 john     30\n#> \n#> $sex\n#> # A tibble: 2 × 3\n#>   name  sex   trt$name $treatment\n#>   <chr> <chr> <chr>    <chr>     \n#> 1 john  M     Mary     A         \n#> 2 mary  F     Mary     A\n```\n\n\n:::\n\n```{.r .cell-code}\ndf |> reduce(.f = full_join)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Joining with `by = join_by(name)`\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 4\n#>   name    age sex   trt$name $treatment\n#>   <chr> <dbl> <chr> <chr>    <chr>     \n#> 1 john     30 M     Mary     A         \n#> 2 mary     NA F     Mary     A\n```\n\n\n:::\n\n```{.r .cell-code}\nreduce(.x = df,.f = full_join)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Joining with `by = join_by(name)`\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 2 × 4\n#>   name    age sex   trt$name $treatment\n#>   <chr> <dbl> <chr> <chr>    <chr>     \n#> 1 john     30 M     Mary     A         \n#> 2 mary     NA F     Mary     A\n```\n\n\n:::\n:::\n\n\n- to see all intermediate steps, use **accumulate()**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(1234)\naccumulate(1:5, `+`)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1]  1  3  6 10 15\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\naccumulate2(\n  c(8, 4, 2),\n  c(\"yellow\", \"red\", \"blue\"),\n  ~ ..1 + geom_point(size = ..2, alpha = .5, color = ..3),\n  .init = ggplot(mtcars, aes(hp, mpg))\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n```\n\n\n:::\n\n::: {.cell-output-display}\n![](09_files/figure-html/unnamed-chunk-32-1.png){width=672}\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> \n#> [[2]]\n```\n\n\n:::\n\n::: {.cell-output-display}\n![](09_files/figure-html/unnamed-chunk-32-2.png){width=672}\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> \n#> [[3]]\n```\n\n\n:::\n\n::: {.cell-output-display}\n![](09_files/figure-html/unnamed-chunk-32-3.png){width=672}\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> \n#> [[4]]\n```\n\n\n:::\n\n::: {.cell-output-display}\n![](09_files/figure-html/unnamed-chunk-32-4.png){width=672}\n:::\n:::\n\n\n\n## `map_df*()` variants {-}\n\n- `map_dfr()` = row bind the results\n\n- `map_dfc()` = column bind the results\n\n- Note that `map_dfr()` has been superseded by `map() |> list_rbind()` and `map_dfc()` has been superseded by `map() |> list_cbind()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncol_stats <- function(n) {\n  head(mtcars, n) |> \n    summarise_all(mean) |> \n    mutate_all(floor) |> \n    mutate(n = paste(\"N =\", n))\n}\n\nmap((1:2) * 10, col_stats)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#>   mpg cyl disp  hp drat wt qsec vs am gear carb      n\n#> 1  20   5  208 122    3  3   18  0  0    3    2 N = 10\n#> \n#> [[2]]\n#>   mpg cyl disp  hp drat wt qsec vs am gear carb      n\n#> 1  20   6  233 136    3  3   18  0  0    3    2 N = 20\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dfr((1:2) * 10, col_stats)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   mpg cyl disp  hp drat wt qsec vs am gear carb      n\n#> 1  20   5  208 122    3  3   18  0  0    3    2 N = 10\n#> 2  20   6  233 136    3  3   18  0  0    3    2 N = 20\n```\n\n\n:::\n\n```{.r .cell-code}\nmap((1:2) * 10, col_stats) |> list_rbind()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   mpg cyl disp  hp drat wt qsec vs am gear carb      n\n#> 1  20   5  208 122    3  3   18  0  0    3    2 N = 10\n#> 2  20   6  233 136    3  3   18  0  0    3    2 N = 20\n```\n\n\n:::\n:::\n\n\n---\n\n## `pluck()` {-}\n\n- `pluck()` will pull a single element from a list\n\nI like the example from the book because the starting object is not particularly easy to work with (as many JSON objects might not be).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_list <- list(\n  list(-1, x = 1, y = c(2), z = \"a\"),\n  list(-2, x = 4, y = c(5, 6), z = \"b\"),\n  list(-3, x = 8, y = c(9, 10, 11))\n)\nmy_list\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [[1]][[1]]\n#> [1] -1\n#> \n#> [[1]]$x\n#> [1] 1\n#> \n#> [[1]]$y\n#> [1] 2\n#> \n#> [[1]]$z\n#> [1] \"a\"\n#> \n#> \n#> [[2]]\n#> [[2]][[1]]\n#> [1] -2\n#> \n#> [[2]]$x\n#> [1] 4\n#> \n#> [[2]]$y\n#> [1] 5 6\n#> \n#> [[2]]$z\n#> [1] \"b\"\n#> \n#> \n#> [[3]]\n#> [[3]][[1]]\n#> [1] -3\n#> \n#> [[3]]$x\n#> [1] 8\n#> \n#> [[3]]$y\n#> [1]  9 10 11\n```\n\n\n:::\n:::\n\n\nNotice that the \"first element\" means something different in standard `pluck()` versus `map`ped `pluck()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\npluck(my_list, 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] -1\n#> \n#> $x\n#> [1] 1\n#> \n#> $y\n#> [1] 2\n#> \n#> $z\n#> [1] \"a\"\n```\n\n\n:::\n\n```{.r .cell-code}\nmap(my_list, pluck, 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] -1\n#> \n#> [[2]]\n#> [1] -2\n#> \n#> [[3]]\n#> [1] -3\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_dbl(my_list, pluck, 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -1 -2 -3\n```\n\n\n:::\n:::\n\n\nThe `map()` functions also have shortcuts for extracting elements from vectors (powered by `purrr::pluck()`).  Note that `map(my_list, 3)` is a shortcut for `map(my_list, pluck, 3)`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Select by name\nmap_dbl(my_list, \"x\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 4 8\n```\n\n\n:::\n\n```{.r .cell-code}\n# Or by position\nmap_dbl(my_list, 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] -1 -2 -3\n```\n\n\n:::\n\n```{.r .cell-code}\n# Or by both\nmap_dbl(my_list, list(\"y\", 1))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2 5 9\n```\n\n\n:::\n\n```{.r .cell-code}\n# You'll get an error if you try to retrieve an inside item that doesn't have \n# a consistent format and you want a numeric output\nmap_dbl(my_list, list(\"y\"))\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `map_dbl()`:\n#> ℹ In index: 2.\n#> Caused by error:\n#> ! Result must be length 1, not 2.\n```\n\n\n:::\n\n```{.r .cell-code}\n# You'll get an error if a component doesn't exist:\nmap_chr(my_list, \"z\")\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `map_chr()`:\n#> ℹ In index: 3.\n#> Caused by error:\n#> ! Result must be length 1, not 0.\n```\n\n\n:::\n\n```{.r .cell-code}\n#> Error: Result 3 must be a single string, not NULL of length 0\n\n# Unless you supply a .default value\nmap_chr(my_list, \"z\", .default = NA)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"a\" \"b\" NA\n```\n\n\n:::\n\n```{.r .cell-code}\n#> [1] \"a\" \"b\" NA\n```\n:::\n\n\n\n## Not covered: `flatten()` {-}\n\n- `flatten()` will turn a list of lists into a simpler vector.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_list <-\n  list(\n    a = 1:3,\n    b = list(1:3)\n  )\n\nmy_list\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] 1 2 3\n#> \n#> $b\n#> $b[[1]]\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_if(my_list, is.list, pluck)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] 1 2 3\n#> \n#> $b\n#> $b[[1]]\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_if(my_list, is.list, flatten_int)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $a\n#> [1] 1 2 3\n#> \n#> $b\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nmap_if(my_list, is.list, flatten_int) |> \n  flatten_int()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3 1 2 3\n```\n\n\n:::\n:::\n\n\n## Dealing with Failures {-}\n\n## Safely {-}\n\n`safely()` is an adverb.  It takes a function (a verb) and returns a modified version. In this case, the modified function will never throw an error. Instead it always returns a list with two elements.\n\n- `result` is the original result. If there is an error this will be NULL\n\n- `error` is an error object. If the operation was successful the \"`error`\" will be NULL.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nA <- list(1, 10, \"a\")\n\nmap(.x = A, .f = safely(log))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [[1]]$result\n#> [1] 0\n#> \n#> [[1]]$error\n#> NULL\n#> \n#> \n#> [[2]]\n#> [[2]]$result\n#> [1] 2.302585\n#> \n#> [[2]]$error\n#> NULL\n#> \n#> \n#> [[3]]\n#> [[3]]$result\n#> NULL\n#> \n#> [[3]]$error\n#> <simpleError in .Primitive(\"log\")(x, base): non-numeric argument to mathematical function>\n```\n\n\n:::\n:::\n\n\n## Possibly {-}\n\n`possibly()` always succeeds. It is simpler than `safely()`, because you can give it a default value to return when there is an error.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nA <- list(1,10,\"a\")\n\nmap_dbl(.x = A, .f = possibly(log, otherwise = NA_real_) )\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0.000000 2.302585       NA\n```\n\n\n:::\n:::\n\n",
      6     "supporting": [
      7       "09_files"
      8     ],
      9     "filters": [
     10       "rmarkdown/pagebreak.lua"
     11     ],
     12     "includes": {},
     13     "engineDependencies": {},
     14     "preserve": {},
     15     "postProcess": true
     16   }
     17 }