10.Rmd (12741B)
1 --- 2 engine: knitr 3 title: Function factories 4 --- 5 6 ## Learning objectives: 7 8 - Understand what a function factory is 9 - Recognise how function factories work 10 - Learn about non-obvious combination of function features 11 - Generate a family of functions from data 12 13 ```{r, message = FALSE} 14 library(rlang) 15 library(ggplot2) 16 library(scales) 17 ``` 18 19 20 ## What is a function factory? 21 22 23 A **function factory** is a function that makes (returns) functions 24 25 Factory made function are **manufactured functions**. 26 27 ```{r 10-1, echo=FALSE, fig.align='center', fig.dim="50%",fig.alt="https://epsis.com/no/operations-centers-focus-on-ways-of-working/",fig.cap="Function factory | Credits: epsis.com"} 28 knitr::include_graphics("images/10-1-factories.png") 29 ``` 30 31 32 33 ## How does a function factory work? 34 ```{r 10-2, echo=FALSE, fig.align='center', fig.dim="100%",fig.cap="How does it work? | Credits: kakaakigas.com/how-it-works/"} 35 knitr::include_graphics("images/10-2-how.jpg") 36 ``` 37 38 ```{r 10-ex1} 39 power1 <- function(exp) { 40 function(x) { 41 x ^ exp 42 } 43 } 44 45 square <- power1(2) 46 cube <- power1(3) 47 ``` 48 `power1()` is the function factory and `square()` and `cube()` are manufactured functions. 49 50 ## Important to remember 51 52 1. R has First-class functions (can be created with `function()` and `<-`) 53 54 > R functions are objects in their own right, a language property often called “first-class functions” 55 > -- [Section 6.2.3](https://adv-r.hadley.nz/functions.html?q=first%20class#first-class-functions) 56 57 2. Functions capture (enclose) environment in which they are created 58 59 ```{r 10-ex3} 60 f <- function(x) function(y) x + y 61 fn_env(f) # The function f() 62 fn_env(f()) # The function created by f() 63 ``` 64 65 3. Functions create a new environment on each run 66 ```{r 10-ex4} 67 f <- function(x) { 68 function() x + 1 69 } 70 ff <- f(1) 71 ff() 72 ff() 73 ``` 74 75 76 ## Fundamentals - Environment 77 78 - Environment when function is created defines arguments in the function 79 - Use `env_print(fun)` and `fn_env()` to explore 80 81 ```{r} 82 env_print(square) 83 fn_env(square)$exp 84 ``` 85 86 {width=50% fig-align=center} 87 88 ## Fundamentals - Forcing 89 90 - Lazy evaluation means arguments only evaluated when used 91 - "[can] lead to a real head-scratcher of a bug" 92 93 ```{r} 94 x <- 2 95 square <- power1(x) 96 x <- 3 97 square(4) 98 ``` 99 100 - *Only applies if passing object as argument* 101 - Here argument `2` evaluated when function called 102 103 ```{r} 104 square <- power1(2) 105 x <- 3 106 square(4) 107 ``` 108 109 So use `force()`! (Unless you want it to change with the `x` in the parent environment) 110 111 ## Forcing - Reiterated 112 113 Only required if the argument is **not** evaluated before the new function is created: 114 ```{r} 115 power1 <- function(exp) { 116 stopifnot(is.numeric(exp)) 117 function(x) x ^ exp 118 } 119 120 x <- 2 121 square <- power1(x) 122 x <- 3 123 square(4) 124 ``` 125 126 ## Fundamentals - Stateful functions 127 128 Because 129 130 - The enclosing environment is unique and constant, and 131 - We have `<<-` (super assignment) 132 133 We can *change* that enclosing environment and keep track of that state 134 across iterations (!) 135 136 - `<-` Assignment in *current* environment 137 - `<<-` Assignment in *parent* environment 138 139 ```{r 10-15} 140 new_counter <- function() { 141 i <- 0 142 function() { 143 i <<- i + 1 # second assignment (super assignment) 144 i 145 } 146 } 147 148 counter_one <- new_counter() 149 counter_two <- new_counter() 150 c(counter_one(), counter_one(), counter_one()) 151 c(counter_two(), counter_two(), counter_two()) 152 ``` 153 154 155 > "As soon as your function starts managing the state of multiple variables, it’s better to switch to R6" 156 157 ## Fundamentals - Garbage collection 158 159 - Because environment is attached to (enclosed by) function, temporary objects 160 don't go away. 161 162 **Cleaning up** using `rm()` inside a function: 163 ```{r 10-16} 164 f_dirty <- function(n) { 165 x <- runif(n) 166 m <- mean(x) 167 function() m 168 } 169 170 f_clean <- function(n) { 171 x <- runif(n) 172 m <- mean(x) 173 rm(x) # <---- Important part! 174 function() m 175 } 176 177 lobstr::obj_size(f_dirty(1e6)) 178 lobstr::obj_size(f_clean(1e6)) 179 180 ``` 181 182 183 ## Useful Examples - Histograms and binwidth 184 185 **Useful when...** 186 187 - You need to pass a function 188 - You don't want to have to re-write the function every time 189 (the *default* behaviour of the function should be flexible) 190 191 192 For example, these bins are not appropriate 193 ```{r} 194 #| fig-asp: 0.3 195 sd <- c(1, 5, 15) 196 n <- 100 197 df <- data.frame(x = rnorm(3 * n, sd = sd), sd = rep(sd, n)) 198 199 ggplot(df, aes(x)) + 200 geom_histogram(binwidth = 2) + 201 facet_wrap(~ sd, scales = "free_x") + 202 labs(x = NULL) 203 ``` 204 205 We could just make a function... 206 ```{r} 207 #| fig-asp: 0.3 208 binwidth_bins <- function(x) (max(x) - min(x)) / 20 209 210 ggplot(df, aes(x = x)) + 211 geom_histogram(binwidth = binwidth_bins) + 212 facet_wrap(~ sd, scales = "free_x") + 213 labs(x = NULL) 214 ``` 215 216 But if we want to change the number of bins (20) we'd have to re-write the function 217 each time. 218 219 If we use a factory, we don't have to do that. 220 ```{r} 221 #| fig-asp: 0.3 222 binwidth_bins <- function(n) { 223 force(n) 224 function(x) (max(x) - min(x)) / n 225 } 226 227 ggplot(df, aes(x = x)) + 228 geom_histogram(binwidth = binwidth_bins(20)) + 229 facet_wrap(~ sd, scales = "free_x") + 230 labs(x = NULL, title = "20 bins") 231 232 ggplot(df, aes(x = x)) + 233 geom_histogram(binwidth = binwidth_bins(5)) + 234 facet_wrap(~ sd, scales = "free_x") + 235 labs(x = NULL, title = "5 bins") 236 ``` 237 238 > Similar benefit in Box-cox example 239 240 ## Useful Examples - Wrapper 241 242 **Useful when...** 243 244 - You want to create a function that wraps a bunch of other functions 245 246 For example, `ggsave()` wraps a bunch of different graphics device functions: 247 248 ```{r} 249 # (Even more simplified) 250 plot_dev <- function(ext, dpi = 96) { 251 force(dpi) 252 253 switch( 254 ext, 255 svg = function(filename, ...) svglite::svglite(file = filename, ...), 256 png = function(...) grDevices::png(..., res = dpi, units = "in"), 257 jpg = , 258 jpeg = function(...) grDevices::jpeg(..., res = dpi, units = "in"), 259 stop("Unknown graphics extension: ", ext, call. = FALSE) 260 ) 261 } 262 ``` 263 264 Then `ggsave()` uses 265 266 ``` 267 ggsave <- function(...) { 268 dev <- plot_dev(device, filename, dpi = dpi) 269 ... 270 dev(filename = filename, width = dim[1], height = dim[2], bg = bg, ...) 271 ... 272 } 273 ``` 274 275 Otherwise, would have to do something like like a bunch of if/else statements. 276 277 ## Useful Examples - Optimizing 278 279 **Useful when...** 280 281 - Want to pass function on to `optimise()`/`optimize()` 282 - Want to perform pre-computations to speed things up 283 - Want to re-use this for other datasets 284 285 (*Skipping to final results from section*) 286 287 Here, using MLE want to to find the most likely value of lambda for a Poisson distribution 288 and this data. 289 ```{r} 290 x1 <- c(41, 30, 31, 38, 29, 24, 30, 29, 31, 38) 291 ``` 292 293 We'll create a function that creates a lambda assessment function for a given 294 data set. 295 296 ```{r} 297 ll_poisson <- function(x) { 298 n <- length(x) 299 sum_x <- sum(x) 300 c <- sum(lfactorial(x)) 301 302 function(lambda) { 303 log(lambda) * sum_x - n * lambda - c 304 } 305 } 306 ``` 307 308 We can use this on different data sets, but here use ours `x1` 309 ```{r} 310 ll <- ll_poisson(x1) 311 ll(10) # Log-probility of a lambda = 10 312 ``` 313 314 Use `optimise()` rather than trial and error 315 ```{r} 316 optimise(ll, c(0, 100), maximum = TRUE) 317 ``` 318 319 Result: Highest log-probability is -30.3, best lambda is 32.1 320 321 322 ## Function factories + functionals 323 324 Combine functionals and function factories to turn data into many functions. 325 326 ```{r} 327 names <- list( 328 square = 2, 329 cube = 3, 330 root = 1/2, 331 cuberoot = 1/3, 332 reciprocal = -1 333 ) 334 funs <- purrr::map(names, power1) 335 names(funs) 336 funs$root(64) 337 funs$square(3) 338 ``` 339 340 Avoid the prefix with 341 342 - `with()` - `with(funs, root(100))` 343 - Temporary, clear, short-term 344 - `attach()` - `attach(funs)` / `detach(funs)` 345 - Added to search path (like package function), cannot be overwritten, but can be attached multiple times! 346 - `rlang::env_bind` - `env_bind(globalenv(), !!!funs)` / `env_unbind(gloablenv(), names(funs))` 347 - Added to global env (like created function), can be overwritten 348 349 <!-- 350 ## EXTRA - Previous set of slides 351 352 Graphical factories **useful function factories**, such as: 353 354 1. Labelling with: 355 356 * formatter functions 357 358 ```{r 10-19} 359 y <- c(12345, 123456, 1234567) 360 comma_format()(y) 361 ``` 362 ```{r 10-20} 363 number_format(scale = 1e-3, suffix = " K")(y) 364 ``` 365 They are more commonly used inside a ggplot: 366 ```{r 10-21, include=FALSE} 367 df <- data.frame(x = 1, y = y) 368 a_ggplot_object <- ggplot(df, aes(x, y)) + 369 geom_point() + 370 scale_x_continuous(breaks = 1, labels = NULL) + 371 labs(x = NULL, y = NULL) 372 ``` 373 374 ```{r 10-22, eval=T} 375 a_ggplot_object + 376 scale_y_continuous( 377 labels = comma_format() 378 ) 379 ``` 380 381 2. Using binwidth in facet histograms 382 383 * binwidth_bins 384 385 ```{r} 386 binwidth_bins <- function(n) { 387 force(n) 388 389 function(x) { 390 (max(x) - min(x)) / n 391 } 392 } 393 ``` 394 395 Or use a concatenation of this typr of detecting number of bins functions: 396 397 - nclass.Sturges() 398 - nclass.scott() 399 - nclass.FD() 400 401 ```{r} 402 base_bins <- function(type) { 403 fun <- switch(type, 404 Sturges = nclass.Sturges, 405 scott = nclass.scott, 406 FD = nclass.FD, 407 stop("Unknown type", call. = FALSE) 408 ) 409 410 function(x) { 411 (max(x) - min(x)) / fun(x) 412 } 413 } 414 ``` 415 416 417 3. Internals: 418 419 * ggplot2:::plot_dev() 420 421 422 ## Non-obvious combinations 423 424 425 - The **Box-Cox** transformation. 426 - **Bootstrap** resampling. 427 - **Maximum likelihood** estimation. 428 429 430 ### Statistical factories 431 432 The **Box-Cox** transformation towards normality: 433 ```{r} 434 boxcox1 <- function(x, lambda) { 435 stopifnot(length(lambda) == 1) 436 437 if (lambda == 0) { 438 log(x) 439 } else { 440 (x ^ lambda - 1) / lambda 441 } 442 } 443 ``` 444 445 446 ```{r} 447 boxcox2 <- function(lambda) { 448 if (lambda == 0) { 449 function(x) log(x) 450 } else { 451 function(x) (x ^ lambda - 1) / lambda 452 } 453 } 454 455 stat_boxcox <- function(lambda) { 456 stat_function(aes(colour = lambda), fun = boxcox2(lambda), size = 1) 457 } 458 459 plot1 <- ggplot(data.frame(x = c(0, 5)), aes(x)) + 460 lapply(c(0.5, 1, 1.5), stat_boxcox) + 461 scale_colour_viridis_c(limits = c(0, 1.5)) 462 463 # visually, log() does seem to make sense as the transformation 464 # for lambda = 0; as values get smaller and smaller, the function 465 # gets close and closer to a log transformation 466 plot2 <- ggplot(data.frame(x = c(0.01, 1)), aes(x)) + 467 lapply(c(0.5, 0.25, 0.1, 0), stat_boxcox) + 468 scale_colour_viridis_c(limits = c(0, 1.5)) 469 library(patchwork) 470 plot1+plot2 471 ``` 472 473 **Bootstrap generators** 474 475 476 ```{r} 477 boot_permute <- function(df, var) { 478 n <- nrow(df) 479 force(var) 480 481 function() { 482 col <- df[[var]] 483 col[sample(n, replace = TRUE)] 484 } 485 } 486 487 boot_mtcars1 <- boot_permute(mtcars, "mpg") 488 head(boot_mtcars1()) 489 #> [1] 16.4 22.8 22.8 22.8 16.4 19.2 490 head(boot_mtcars1()) 491 #> [1] 17.8 18.7 30.4 30.4 16.4 21.0 492 ``` 493 ```{r} 494 boot_model <- function(df, formula) { 495 mod <- lm(formula, data = df) 496 fitted <- unname(fitted(mod)) 497 resid <- unname(resid(mod)) 498 rm(mod) 499 500 function() { 501 fitted + sample(resid) 502 } 503 } 504 505 boot_mtcars2 <- boot_model(mtcars, mpg ~ wt) 506 head(boot_mtcars2()) 507 #> [1] 25.0 24.0 21.7 19.2 24.9 16.0 508 head(boot_mtcars2()) 509 #> [1] 27.4 21.0 20.3 19.4 16.3 21.3 510 ``` 511 512 **Maximum likelihood estimation** 513 514 $$P(\lambda,x)=\prod_{i=1}^{n}\frac{\lambda^{x_i}e^{-\lambda}}{x_i!}$$ 515 ```{r} 516 lprob_poisson <- function(lambda, x) { 517 n <- length(x) 518 (log(lambda) * sum(x)) - (n * lambda) - sum(lfactorial(x)) 519 } 520 ``` 521 522 ```{r} 523 x1 <- c(41, 30, 31, 38, 29, 24, 30, 29, 31, 38) 524 ``` 525 526 ```{r} 527 lprob_poisson(10, x1) 528 #> [1] -184 529 lprob_poisson(20, x1) 530 #> [1] -61.1 531 lprob_poisson(30, x1) 532 #> [1] -31 533 ``` 534 535 ```{r} 536 ll_poisson1 <- function(x) { 537 n <- length(x) 538 539 function(lambda) { 540 log(lambda) * sum(x) - n * lambda - sum(lfactorial(x)) 541 } 542 } 543 ``` 544 ```{r} 545 ll_poisson2 <- function(x) { 546 n <- length(x) 547 sum_x <- sum(x) 548 c <- sum(lfactorial(x)) 549 550 function(lambda) { 551 log(lambda) * sum_x - n * lambda - c 552 } 553 } 554 ``` 555 556 ```{r} 557 ll1 <- ll_poisson2(x1) 558 559 ll1(10) 560 #> [1] -184 561 ll1(20) 562 #> [1] -61.1 563 ll1(30) 564 #> [1] -31 565 ``` 566 ```{r} 567 optimise(ll1, c(0, 100), maximum = TRUE) 568 #> $maximum 569 #> [1] 32.1 570 #> 571 #> $objective 572 #> [1] -30.3 573 ``` 574 ```{r} 575 optimise(lprob_poisson, c(0, 100), x = x1, maximum = TRUE) 576 #> $maximum 577 #> [1] 32.1 578 #> 579 #> $objective 580 #> [1] -30.3 581 ``` 582 583 ## Function factory applications 584 585 586 Combine functionals and function factories to turn data into many functions. 587 588 ### Function factories + functionals 589 ```{r} 590 names <- list( 591 square = 2, 592 cube = 3, 593 root = 1/2, 594 cuberoot = 1/3, 595 reciprocal = -1 596 ) 597 funs <- purrr::map(names, power1) 598 599 funs$root(64) 600 #> [1] 8 601 funs$root 602 #> function(x) { 603 #> x ^ exp 604 #> } 605 #> <bytecode: 0x7fe85512a410> 606 #> <environment: 0x7fe85b21f190> 607 ``` 608 ```{r} 609 with(funs, root(100)) 610 #> [1] 10 611 ``` 612 613 ```{r} 614 attach(funs) 615 #> The following objects are masked _by_ .GlobalEnv: 616 #> 617 #> cube, square 618 root(100) 619 #> [1] 10 620 detach(funs) 621 ``` 622 623 624 ```{r} 625 rlang::env_bind(globalenv(), !!!funs) 626 root(100) 627 #> [1] 10 628 ``` 629 630 ```{r} 631 rlang::env_unbind(globalenv(), names(funs)) 632 ``` 633 634 635 -->