10.Rmd - bookclub-advr - DSLC Advanced R Book Club

10.Rmd (12741B)
      1 ---
      2 engine: knitr
      3 title: Function factories
      4 ---
      5 
      6 ## Learning objectives:
      7 
      8 - Understand what a function factory is
      9 - Recognise how function factories work
     10 - Learn about non-obvious combination of function features
     11 - Generate a family of functions from data
     12 
     13 ```{r, message = FALSE}
     14 library(rlang)
     15 library(ggplot2)
     16 library(scales)
     17 ```
     18 
     19 
     20 ## What is a function factory?
     21 
     22 
     23 A **function factory** is a function that makes (returns) functions
     24 
     25 Factory made function are **manufactured functions**.
     26 
     27 ```{r 10-1, echo=FALSE, fig.align='center', fig.dim="50%",fig.alt="https://epsis.com/no/operations-centers-focus-on-ways-of-working/",fig.cap="Function factory | Credits: epsis.com"}
     28 knitr::include_graphics("images/10-1-factories.png")
     29 ```
     30 
     31 
     32 
     33 ## How does a function factory work?
     34 ```{r 10-2, echo=FALSE, fig.align='center', fig.dim="100%",fig.cap="How does it work? | Credits: kakaakigas.com/how-it-works/"}
     35 knitr::include_graphics("images/10-2-how.jpg")
     36 ```
     37 
     38 ```{r 10-ex1}
     39 power1 <- function(exp) {
     40   function(x) {
     41     x ^ exp
     42   }
     43 }
     44 
     45 square <- power1(2)
     46 cube <- power1(3)
     47 ```
     48 `power1()` is the function factory and `square()` and `cube()` are manufactured functions.
     49 
     50 ## Important to remember
     51 
     52 1. R has First-class functions (can be created with `function()` and `<-`)
     53 
     54 > R functions are objects in their own right, a language property often called “first-class functions”  
     55 > -- [Section 6.2.3](https://adv-r.hadley.nz/functions.html?q=first%20class#first-class-functions)
     56 
     57 2. Functions capture (enclose) environment in which they are created
     58 
     59 ```{r 10-ex3}
     60 f <- function(x) function(y) x + y
     61 fn_env(f)    # The function f()
     62 fn_env(f())  # The function created by f()
     63 ```
     64 
     65 3. Functions create a new environment on each run
     66 ```{r 10-ex4}
     67 f <- function(x) {
     68   function() x + 1
     69 }
     70 ff <- f(1)
     71 ff()
     72 ff()
     73 ```
     74 
     75 
     76 ## Fundamentals - Environment
     77 
     78 - Environment when function is created defines arguments in the function
     79 - Use `env_print(fun)` and `fn_env()` to explore
     80 
     81 ```{r}
     82 env_print(square)
     83 fn_env(square)$exp
     84 ```
     85 
     86 ![Blue indicates environment, arrows bindings](images/10-3-procedure.png){width=50% fig-align=center}
     87 
     88 ## Fundamentals - Forcing
     89 
     90 - Lazy evaluation means arguments only evaluated when used
     91 - "[can] lead to a real head-scratcher of a bug"
     92 
     93 ```{r}
     94 x <- 2
     95 square <- power1(x)
     96 x <- 3
     97 square(4)
     98 ```
     99 
    100 - *Only applies if passing object as argument*
    101 - Here argument `2` evaluated when function called
    102 
    103 ```{r}
    104 square <- power1(2)
    105 x <- 3
    106 square(4)
    107 ```
    108 
    109 So use `force()`! (Unless you want it to change with the `x` in the parent environment)
    110 
    111 ## Forcing - Reiterated
    112 
    113 Only required if the argument is **not** evaluated before the new function is created:
    114 ```{r}
    115 power1 <- function(exp) {
    116   stopifnot(is.numeric(exp))
    117   function(x) x ^ exp
    118 }
    119 
    120 x <- 2
    121 square <- power1(x)
    122 x <- 3
    123 square(4)
    124 ```
    125 
    126 ## Fundamentals - Stateful functions
    127 
    128 Because
    129 
    130 - The enclosing environment is unique and constant, and
    131 - We have `<<-` (super assignment)
    132 
    133 We can *change* that enclosing environment and keep track of that state
    134 across iterations (!)
    135 
    136 - `<-` Assignment in *current* environment
    137 - `<<-` Assignment in *parent* environment
    138 
    139 ```{r 10-15}
    140 new_counter <- function() {
    141   i <- 0        
    142   function() {
    143     i <<- i + 1 # second assignment (super assignment)
    144     i
    145   }
    146 }
    147 
    148 counter_one <- new_counter()
    149 counter_two <- new_counter()
    150 c(counter_one(), counter_one(), counter_one())
    151 c(counter_two(), counter_two(), counter_two())
    152 ```
    153 
    154 
    155 > "As soon as your function starts managing the state of multiple variables, it’s better to switch to R6"
    156 
    157 ## Fundamentals - Garbage collection
    158 
    159 - Because environment is attached to (enclosed by) function, temporary objects
    160 don't go away.
    161 
    162 **Cleaning up** using `rm()` inside a function:
    163 ```{r 10-16}
    164 f_dirty <- function(n) {
    165   x <- runif(n)
    166   m <- mean(x)
    167   function() m
    168 }
    169 
    170 f_clean <- function(n) {
    171   x <- runif(n)
    172   m <- mean(x)
    173   rm(x)            # <---- Important part!
    174   function() m
    175 }
    176 
    177 lobstr::obj_size(f_dirty(1e6))
    178 lobstr::obj_size(f_clean(1e6))
    179 
    180 ```
    181 
    182 
    183 ## Useful Examples - Histograms and binwidth
    184 
    185 **Useful when...**  
    186 
    187 - You need to pass a function  
    188 - You don't want to have to re-write the function every time 
    189   (the *default* behaviour of the function should be flexible)
    190 
    191 
    192 For example, these bins are not appropriate
    193 ```{r}
    194 #| fig-asp: 0.3
    195 sd <- c(1, 5, 15)
    196 n <- 100
    197 df <- data.frame(x = rnorm(3 * n, sd = sd), sd = rep(sd, n))
    198 
    199 ggplot(df, aes(x)) + 
    200   geom_histogram(binwidth = 2) + 
    201   facet_wrap(~ sd, scales = "free_x") + 
    202   labs(x = NULL)
    203 ```
    204 
    205 We could just make a function...
    206 ```{r}
    207 #| fig-asp: 0.3
    208 binwidth_bins <- function(x) (max(x) - min(x)) / 20
    209 
    210 ggplot(df, aes(x = x)) + 
    211   geom_histogram(binwidth = binwidth_bins) + 
    212   facet_wrap(~ sd, scales = "free_x") + 
    213   labs(x = NULL)
    214 ```
    215 
    216 But if we want to change the number of bins (20) we'd have to re-write the function
    217 each time.
    218 
    219 If we use a factory, we don't have to do that.
    220 ```{r}
    221 #| fig-asp: 0.3
    222 binwidth_bins <- function(n) {
    223   force(n)
    224   function(x) (max(x) - min(x)) / n
    225 }
    226 
    227 ggplot(df, aes(x = x)) + 
    228   geom_histogram(binwidth = binwidth_bins(20)) + 
    229   facet_wrap(~ sd, scales = "free_x") + 
    230   labs(x = NULL, title = "20 bins")
    231 
    232 ggplot(df, aes(x = x)) + 
    233   geom_histogram(binwidth = binwidth_bins(5)) + 
    234   facet_wrap(~ sd, scales = "free_x") + 
    235   labs(x = NULL, title = "5 bins")
    236 ```
    237 
    238 > Similar benefit in Box-cox example
    239 
    240 ## Useful Examples - Wrapper
    241 
    242 **Useful when...**
    243 
    244 - You want to create a function that wraps a bunch of other functions
    245 
    246 For example, `ggsave()` wraps a bunch of different graphics device functions: 
    247 
    248 ```{r}
    249 # (Even more simplified)
    250 plot_dev <- function(ext, dpi = 96) {
    251   force(dpi)
    252   
    253   switch(
    254     ext,
    255     svg = function(filename, ...) svglite::svglite(file = filename, ...),
    256     png = function(...) grDevices::png(..., res = dpi, units = "in"),
    257     jpg = ,
    258     jpeg = function(...) grDevices::jpeg(..., res = dpi, units = "in"),
    259     stop("Unknown graphics extension: ", ext, call. = FALSE)
    260   )
    261 }
    262 ```
    263 
    264 Then `ggsave()` uses
    265 
    266 ```
    267 ggsave <- function(...) {
    268   dev <- plot_dev(device, filename, dpi = dpi)
    269   ...
    270   dev(filename = filename, width = dim[1], height = dim[2], bg = bg, ...)
    271   ...
    272 }
    273 ```
    274 
    275 Otherwise, would have to do something like like a bunch of if/else statements.
    276 
    277 ## Useful Examples - Optimizing
    278 
    279 **Useful when...**
    280 
    281 - Want to pass function on to `optimise()`/`optimize()`
    282 - Want to perform pre-computations to speed things up
    283 - Want to re-use this for other datasets
    284 
    285 (*Skipping to final results from section*)
    286 
    287 Here, using MLE want to to find the most likely value of lambda for a Poisson distribution
    288 and this data.
    289 ```{r}
    290 x1 <- c(41, 30, 31, 38, 29, 24, 30, 29, 31, 38)
    291 ```
    292 
    293 We'll create a function that creates a lambda assessment function for a given 
    294 data set.
    295 
    296 ```{r}
    297 ll_poisson <- function(x) {
    298   n <- length(x)
    299   sum_x <- sum(x)
    300   c <- sum(lfactorial(x))
    301 
    302   function(lambda) {
    303     log(lambda) * sum_x - n * lambda - c
    304   }
    305 }
    306 ```
    307 
    308 We can use this on different data sets, but here use ours `x1`
    309 ```{r}
    310 ll <- ll_poisson(x1)
    311 ll(10)  # Log-probility of a lambda = 10
    312 ```
    313 
    314 Use `optimise()` rather than trial and error
    315 ```{r}
    316 optimise(ll, c(0, 100), maximum = TRUE)
    317 ```
    318 
    319 Result: Highest log-probability is -30.3, best lambda is 32.1
    320 
    321 
    322 ## Function factories + functionals
    323 
    324 Combine functionals and function factories to turn data into many functions.
    325 
    326 ```{r}
    327 names <- list(
    328   square = 2, 
    329   cube = 3, 
    330   root = 1/2, 
    331   cuberoot = 1/3, 
    332   reciprocal = -1
    333 )
    334 funs <- purrr::map(names, power1)
    335 names(funs)
    336 funs$root(64)
    337 funs$square(3)
    338 ```
    339 
    340 Avoid the prefix with
    341 
    342 - `with()` - `with(funs, root(100))`
    343   - Temporary, clear, short-term
    344 - `attach()` - `attach(funs)` / `detach(funs)`
    345   - Added to search path (like package function), cannot be overwritten, but can be attached multiple times!
    346 - `rlang::env_bind` - `env_bind(globalenv(), !!!funs)` / `env_unbind(gloablenv(), names(funs))`
    347   - Added to global env (like created function), can be overwritten
    348 
    349 <!--
    350 ## EXTRA - Previous set of slides
    351 
    352 Graphical factories **useful function factories**, such as:
    353 
    354 1.  Labelling with:
    355 
    356     * formatter functions
    357     
    358 ```{r 10-19}
    359 y <- c(12345, 123456, 1234567)
    360 comma_format()(y)
    361 ```
    362 ```{r 10-20}
    363 number_format(scale = 1e-3, suffix = " K")(y)
    364 ```
    365 They are more commonly used inside a ggplot:
    366 ```{r 10-21, include=FALSE}
    367 df <- data.frame(x = 1, y = y)
    368 a_ggplot_object <- ggplot(df, aes(x, y)) + 
    369   geom_point() + 
    370   scale_x_continuous(breaks = 1, labels = NULL) +
    371   labs(x = NULL, y = NULL)
    372 ```
    373 
    374 ```{r 10-22, eval=T}
    375 a_ggplot_object + 
    376   scale_y_continuous(
    377   labels = comma_format()
    378 )
    379 ```
    380 
    381 2.  Using binwidth in facet histograms
    382 
    383       * binwidth_bins
    384       
    385 ```{r}
    386 binwidth_bins <- function(n) {
    387   force(n)
    388   
    389   function(x) {
    390     (max(x) - min(x)) / n
    391   }
    392 }
    393 ```
    394    
    395 Or use a concatenation of this typr of detecting number of bins functions:
    396 
    397       - nclass.Sturges()
    398       - nclass.scott()
    399       - nclass.FD()
    400       
    401 ```{r}
    402 base_bins <- function(type) {
    403   fun <- switch(type,
    404     Sturges = nclass.Sturges,
    405     scott = nclass.scott,
    406     FD = nclass.FD,
    407     stop("Unknown type", call. = FALSE)
    408   )
    409   
    410   function(x) {
    411     (max(x) - min(x)) / fun(x)
    412   }
    413 }
    414 ```
    415       
    416 
    417 3.  Internals:
    418 
    419       * ggplot2:::plot_dev()
    420 
    421 
    422 ## Non-obvious combinations
    423 
    424 
    425 - The **Box-Cox** transformation.
    426 - **Bootstrap** resampling.
    427 - **Maximum likelihood** estimation.
    428 
    429 
    430 ### Statistical factories
    431 
    432 The **Box-Cox** transformation towards normality:
    433 ```{r}
    434 boxcox1 <- function(x, lambda) {
    435   stopifnot(length(lambda) == 1)
    436   
    437   if (lambda == 0) {
    438     log(x)
    439   } else {
    440     (x ^ lambda - 1) / lambda
    441   }
    442 }
    443 ```
    444 
    445 
    446 ```{r}
    447 boxcox2 <- function(lambda) {
    448   if (lambda == 0) {
    449     function(x) log(x)
    450   } else {
    451     function(x) (x ^ lambda - 1) / lambda
    452   }
    453 }
    454 
    455 stat_boxcox <- function(lambda) {
    456   stat_function(aes(colour = lambda), fun = boxcox2(lambda), size = 1)
    457 }
    458 
    459 plot1 <- ggplot(data.frame(x = c(0, 5)), aes(x)) + 
    460   lapply(c(0.5, 1, 1.5), stat_boxcox) + 
    461   scale_colour_viridis_c(limits = c(0, 1.5))
    462 
    463 # visually, log() does seem to make sense as the transformation
    464 # for lambda = 0; as values get smaller and smaller, the function
    465 # gets close and closer to a log transformation
    466 plot2 <- ggplot(data.frame(x = c(0.01, 1)), aes(x)) + 
    467   lapply(c(0.5, 0.25, 0.1, 0), stat_boxcox) + 
    468   scale_colour_viridis_c(limits = c(0, 1.5))
    469 library(patchwork)
    470 plot1+plot2
    471 ```
    472 
    473 **Bootstrap generators**
    474 
    475 
    476 ```{r}
    477 boot_permute <- function(df, var) {
    478   n <- nrow(df)
    479   force(var)
    480   
    481   function() {
    482     col <- df[[var]]
    483     col[sample(n, replace = TRUE)]
    484   }
    485 }
    486 
    487 boot_mtcars1 <- boot_permute(mtcars, "mpg")
    488 head(boot_mtcars1())
    489 #> [1] 16.4 22.8 22.8 22.8 16.4 19.2
    490 head(boot_mtcars1())
    491 #> [1] 17.8 18.7 30.4 30.4 16.4 21.0
    492 ```
    493 ```{r}
    494 boot_model <- function(df, formula) {
    495   mod <- lm(formula, data = df)
    496   fitted <- unname(fitted(mod))
    497   resid <- unname(resid(mod))
    498   rm(mod)
    499 
    500   function() {
    501     fitted + sample(resid)
    502   }
    503 } 
    504 
    505 boot_mtcars2 <- boot_model(mtcars, mpg ~ wt)
    506 head(boot_mtcars2())
    507 #> [1] 25.0 24.0 21.7 19.2 24.9 16.0
    508 head(boot_mtcars2())
    509 #> [1] 27.4 21.0 20.3 19.4 16.3 21.3
    510 ```
    511 
    512 **Maximum likelihood estimation**
    513 
    514 $$P(\lambda,x)=\prod_{i=1}^{n}\frac{\lambda^{x_i}e^{-\lambda}}{x_i!}$$
    515 ```{r}
    516 lprob_poisson <- function(lambda, x) {
    517   n <- length(x)
    518   (log(lambda) * sum(x)) - (n * lambda) - sum(lfactorial(x))
    519 }
    520 ```
    521 
    522 ```{r}
    523 x1 <- c(41, 30, 31, 38, 29, 24, 30, 29, 31, 38)
    524 ```
    525 
    526 ```{r}
    527 lprob_poisson(10, x1)
    528 #> [1] -184
    529 lprob_poisson(20, x1)
    530 #> [1] -61.1
    531 lprob_poisson(30, x1)
    532 #> [1] -31
    533 ```
    534 
    535 ```{r}
    536 ll_poisson1 <- function(x) {
    537   n <- length(x)
    538 
    539   function(lambda) {
    540     log(lambda) * sum(x) - n * lambda - sum(lfactorial(x))
    541   }
    542 }
    543 ```
    544 ```{r}
    545 ll_poisson2 <- function(x) {
    546   n <- length(x)
    547   sum_x <- sum(x)
    548   c <- sum(lfactorial(x))
    549 
    550   function(lambda) {
    551     log(lambda) * sum_x - n * lambda - c
    552   }
    553 }
    554 ```
    555 
    556 ```{r}
    557 ll1 <- ll_poisson2(x1)
    558 
    559 ll1(10)
    560 #> [1] -184
    561 ll1(20)
    562 #> [1] -61.1
    563 ll1(30)
    564 #> [1] -31
    565 ```
    566 ```{r}
    567 optimise(ll1, c(0, 100), maximum = TRUE)
    568 #> $maximum
    569 #> [1] 32.1
    570 #> 
    571 #> $objective
    572 #> [1] -30.3
    573 ```
    574 ```{r}
    575 optimise(lprob_poisson, c(0, 100), x = x1, maximum = TRUE)
    576 #> $maximum
    577 #> [1] 32.1
    578 #> 
    579 #> $objective
    580 #> [1] -30.3
    581 ```
    582 
    583 ## Function factory applications
    584 
    585 
    586 Combine functionals and function factories to turn data into many functions.
    587 
    588 ### Function factories + functionals
    589 ```{r}
    590 names <- list(
    591   square = 2, 
    592   cube = 3, 
    593   root = 1/2, 
    594   cuberoot = 1/3, 
    595   reciprocal = -1
    596 )
    597 funs <- purrr::map(names, power1)
    598 
    599 funs$root(64)
    600 #> [1] 8
    601 funs$root
    602 #> function(x) {
    603 #>     x ^ exp
    604 #>   }
    605 #> <bytecode: 0x7fe85512a410>
    606 #> <environment: 0x7fe85b21f190>
    607 ```
    608 ```{r}
    609 with(funs, root(100))
    610 #> [1] 10
    611 ```
    612 
    613 ```{r}
    614 attach(funs)
    615 #> The following objects are masked _by_ .GlobalEnv:
    616 #> 
    617 #>     cube, square
    618 root(100)
    619 #> [1] 10
    620 detach(funs)
    621 ```
    622 
    623 
    624 ```{r}
    625 rlang::env_bind(globalenv(), !!!funs)
    626 root(100)
    627 #> [1] 10
    628 ```
    629 
    630 ```{r}
    631 rlang::env_unbind(globalenv(), names(funs))
    632 ```
    633 
    634 
    635 -->
	bookclub-advr DSLC Advanced R Book Club
	git clone https://git.eamoncaddigan.net/bookclub-advr.git
	Log \| Files \| Refs \| README \| LICENSE