bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git
Log | Files | Refs | README | LICENSE

commit 7a1c1fe3dd18c20bd3c595b712a2a3870cbd4b8a
parent bca18621e37d40974bc265a35cdf1f26f3323fec
Author: Steffi LaZerte <steffi@steffi.ca>
Date:   Thu,  1 Aug 2024 12:17:21 -0400

Steffi's notes for Friday's cohort meeting (#67)


Diffstat:
M10_Function_factories.Rmd | 350++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------
1 file changed, 242 insertions(+), 108 deletions(-)

diff --git a/10_Function_factories.Rmd b/10_Function_factories.Rmd @@ -7,23 +7,17 @@ - Learn about non-obvious combination of function features - Generate a family of functions from data - - -## Introduction - - -In this chapter we'll talk about function factories. - -A type of functions that make other functions. - -We'll tackle some challegenges in understanding how to make a workflow of functions. - +```{r, message = FALSE} +library(rlang) +library(ggplot2) +library(scales) +``` ## What is a function factory? -A **function factory** is a function that makes functions +A **function factory** is a function that makes (returns) functions Factory made function are **manufactured functions**. @@ -50,124 +44,98 @@ cube <- power1(3) ``` `power1()` is the function factory and `square()` and `cube()` are manufactured functions. +## Important to remember -**What we know already: Function Features** +1. R has First-class functions (can be created with `function()` and `<-`) -1. first-class functions -```{r 10-ex2, eval=FALSE} -name <- function(variables) { - -} -``` +> R functions are objects in their own right, a language property often called “first-class functions” +> -- [Section 6.2.3](https://adv-r.hadley.nz/functions.html?q=first%20class#first-class-functions) + +2. Functions capture (enclose) environment in which they are created -2. function environment `fn_env()` ```{r 10-ex3} -y <- 1 -f <- function(x) x + y -rlang::fn_env(f) +f <- function(x) function(y) x + y +fn_env(f) # The function f() +fn_env(f()) # The function created by f() ``` -3. function execution environment +3. Functions create a new environment on each run ```{r 10-ex4} -h <- function(x) { - # 1. - a <- 2 # 2. assingned value in the environment - x + a +f <- function(x) { + function() x + 1 } -y <- h(1) # 3. +ff <- f(1) +ff() +ff() ``` +## Fundamentals - Environment -A **function factory** is defined as a **functional programming tools**, as well as **functionals**, and **function operators**. - -An application would be to tackle the understanding of: - -- **Box-Cox transformation** -- **maximum likelihood** problems -- **bootstrap resamples** +- Environment when function is created defines arguments in the function +- Use `env_print(fun)` and `fn_env()` to explore +```{r} +env_print(square) +fn_env(square)$exp +``` -### Factory fundamentals +![Blue indicates environment, arrows bindings](images/10-3-procedure.png){width=50% fig-align=center} -- Environments -- Force calculation -- Super assignments -- Cleaning up +## Fundamentals - Forcing +- Lazy evaluation means arguments only evaluated when used +- "[can] lead to a real head-scratcher of a bug" -```{r 10-lib} -library(rlang) -library(ggplot2) -library(scales) +```{r} +x <- 2 +square <- power1(x) +x <- 3 +square(4) ``` -Looking at manufactored functions, in our example the `square()` function, and imagine to do not know anything about it. The environment can be investigated with `env_print()` function. Two different enviroments can be found, with the same **parent**. +- *Only applies if passing object as argument* +- Here argument `2` evaluated when function called -```{r 10-6} -rlang::env_print(square) +```{r} +square <- power1(2) +x <- 3 +square(4) ``` -To know more about it, **exp** is visible as the engine of the function, the computed value (`square <- power1(2)`) can be retrieved like this: -```{r 10-7} -fn_env(square)$exp -``` -```{r 10-8, echo=FALSE, fig.align="center", fig.dim="50%"} -knitr::include_graphics("images/10-3-procedure.png") -``` +So use `force()`! (Unless you want it to change with the `x` in the parent environment) -To make sure every argument is evaluated, whenever x is assigned to a different value. -```{r 10-9, eval=FALSE} -force() -``` +## Forcing - Reiterated -```{r 10-10} +Only required if the argument is **not** evaluated before the new function is created: +```{r} power1 <- function(exp) { - function(x) { - x ^ exp - } + stopifnot(is.numeric(exp)) + function(x) x ^ exp } -square <- power1(x) + x <- 2 -square(2) -``` -Assign a value to the variable x, while it goes in the environment, see what's happen to our function: -```{r 10-11} +square <- power1(x) x <- 3 -square(2) +square(4) ``` -This is not necessarely wrong, but what if we'd like to set a variable **x** which can assume other values in the environment? +## Fundamentals - Stateful functions -```{r 10-12} -power2 <- function(exp) { - force(exp) - function(x) { - x ^ exp - } -} +Because -x <- 2 -square <- power2(x) -square(2) -``` -Look what's happen now: -```{r 10-13} -x <- 3 -square(2) -``` -Even if we assigned a new value to **x** and it is stored in the environment, our function doesn't take consideration of it and keep doing its calculation based on the first variable assignment. +- The enclosing environment is unique and constant, and +- We have `<<-` (super assignment) -**The super assignment operator** +We can *change* that enclosing environment and keep track of that state +across iterations (!) -```{r 10-14, eval=FALSE} -<<- -``` +- `<-` Assignment in *current* environment +- `<<-` Assignment in *parent* environment -Another way is to set the function to **mantain the state**, in a way that each invocation will be completely independent, and a new *environment* is created, because considered as a **fresh start** (6.4.3). ```{r 10-15} new_counter <- function() { - i <- 0 # first assignment inside the function (fresh start) + i <- 0 function() { i <<- i + 1 # second assignment (super assignment) i @@ -176,41 +144,207 @@ new_counter <- function() { counter_one <- new_counter() counter_two <- new_counter() -counter_one() -counter_one() -counter_two() +c(counter_one(), counter_one(), counter_one()) +c(counter_two(), counter_two(), counter_two()) ``` > "As soon as your function starts managing the state of multiple variables, it’s better to switch to R6" +## Fundamentals - Garbage collection + +- Because environment is attached to (enclosed by) function, temporary objects +don't go away. + **Cleaning up** using `rm()` inside a function: ```{r 10-16} -f1 <- function(n) { +f_dirty <- function(n) { x <- runif(n) m <- mean(x) function() m } -g1 <- f1(1e6) -lobstr::obj_size(g1) -#> 8,013,104 B - -f2 <- function(n) { +f_clean <- function(n) { x <- runif(n) m <- mean(x) - rm(x) + rm(x) # <---- Important part! function() m } -g2 <- f2(1e6) -lobstr::obj_size(g2) -#> 12,944 B +lobstr::obj_size(f_dirty(1e6)) +lobstr::obj_size(f_clean(1e6)) + +``` + + +## Useful Examples - Histograms and binwidth + +**Useful when...** + +- You need to pass a function +- You don't want to have to re-write the function every time + (the *default* behaviour of the function should be flexible) + + +For example, these bins are not appropriate +```{r} +#| fig-asp: 0.3 +sd <- c(1, 5, 15) +n <- 100 +df <- data.frame(x = rnorm(3 * n, sd = sd), sd = rep(sd, n)) + +ggplot(df, aes(x)) + + geom_histogram(binwidth = 2) + + facet_wrap(~ sd, scales = "free_x") + + labs(x = NULL) +``` + +We could just make a function... +```{r} +#| fig-asp: 0.3 +binwidth_bins <- function(x) (max(x) - min(x)) / 20 + +ggplot(df, aes(x = x)) + + geom_histogram(binwidth = binwidth_bins) + + facet_wrap(~ sd, scales = "free_x") + + labs(x = NULL) +``` + +But if we want to change the number of bins (20) we'd have to re-write the function +each time. + +If we use a factory, we don't have to do that. +```{r} +#| fig-asp: 0.3 +binwidth_bins <- function(n) { + force(n) + function(x) (max(x) - min(x)) / n +} + +ggplot(df, aes(x = x)) + + geom_histogram(binwidth = binwidth_bins(20)) + + facet_wrap(~ sd, scales = "free_x") + + labs(x = NULL, title = "20 bins") + +ggplot(df, aes(x = x)) + + geom_histogram(binwidth = binwidth_bins(5)) + + facet_wrap(~ sd, scales = "free_x") + + labs(x = NULL, title = "5 bins") ``` +> Similar benefit in Box-cox example + +## Useful Examples - Wrapper +**Useful when...** -### Example of: Graphical factories +- You want to create a function that wraps a bunch of other functions + +For example, `ggsave()` wraps a bunch of different graphics device functions: + +```{r} +# (Even more simplified) +plot_dev <- function(ext, dpi = 96) { + force(dpi) + + switch( + ext, + svg = function(filename, ...) svglite::svglite(file = filename, ...), + png = function(...) grDevices::png(..., res = dpi, units = "in"), + jpg = , + jpeg = function(...) grDevices::jpeg(..., res = dpi, units = "in"), + stop("Unknown graphics extension: ", ext, call. = FALSE) + ) +} +``` + +Then `ggsave()` uses + +``` +ggsave <- function(...) { + dev <- plot_dev(device, filename, dpi = dpi) + ... + dev(filename = filename, width = dim[1], height = dim[2], bg = bg, ...) + ... +} +``` + +Otherwise, would have to do something like like a bunch of if/else statements. + +## Useful Examples - Optimizing + +**Useful when...** + +- Want to pass function on to `optimise()`/`optimize()` +- Want to perform pre-computations to speed things up +- Want to re-use this for other datasets + +(*Skipping to final results from section*) + +Here, using MLE want to to find the most likely value of lambda for a Poisson distribution +and this data. +```{r} +x1 <- c(41, 30, 31, 38, 29, 24, 30, 29, 31, 38) +``` + +We'll create a function that creates a lambda assessment function for a given +data set. + +```{r} +ll_poisson <- function(x) { + n <- length(x) + sum_x <- sum(x) + c <- sum(lfactorial(x)) + + function(lambda) { + log(lambda) * sum_x - n * lambda - c + } +} +``` + +We can use this on different data sets, but here use ours `x1` +```{r} +ll <- ll_poisson(x1) +ll(10) # Log-probility of a lambda = 10 +``` + +Use `optimise()` rather than trial and error +```{r} +optimise(ll, c(0, 100), maximum = TRUE) +``` + +Result: Highest log-probability is -30.3, best lambda is 32.1 + + +## Function factories + functionals + +Combine functionals and function factories to turn data into many functions. + +```{r} +names <- list( + square = 2, + cube = 3, + root = 1/2, + cuberoot = 1/3, + reciprocal = -1 +) +funs <- purrr::map(names, power1) +names(funs) +funs$root(64) +funs$square(3) +``` + +Avoid the prefix with + +- `with()` - `with(funs, root(100))` + - Temporary, clear, short-term +- `attach()` - `attach(funs)` / `detach(funs)` + - Added to search path (like package function), cannot be overwritten, but can be attached multiple times! +- `rlang::env_bind` - `env_bind(globalenv(), !!!funs)` / `env_unbind(gloablenv(), names(funs))` + - Added to global env (like created function), can be overwritten + +<!-- +## EXTRA - Previous set of slides Graphical factories **useful function factories**, such as: @@ -495,7 +629,7 @@ rlang::env_unbind(globalenv(), names(funs)) ``` - +--> ## Meeting Videos