bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git
Log | Files | Refs | README | LICENSE

06.Rmd (11290B)


      1 ---
      2 engine: knitr
      3 title: Functions
      4 ---
      5 
      6 ## Learning objectives:
      7 
      8 - How to make functions in R
      9 - What are the parts of a function
     10 - Nested functions
     11 
     12 ## How to make a simple function in R
     13 
     14 **Function components**
     15 
     16 Functions have three parts, `formals()`, `body()`, and `environment()`.
     17 
     18 ```{r 06-c01, echo=FALSE, fig.align='center', fig.dim="100%"}
     19 DiagrammeR::mermaid("
     20    graph LR
     21      A{formals}-->B(body)
     22      B-->C(environment)
     23 style A fill:#bbf,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
     24 ",height = '100%', width = '100%')
     25 ```
     26 
     27 --------------------------------------------------------------------------------
     28 
     29 **Example: [coffee rations](https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-07/coffee_ratings.csv)**
     30 
     31 ```{r, echo = FALSE, message = FALSE}
     32 coffee_ratings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-07/coffee_ratings.csv')
     33 library(tidyverse)
     34 ```
     35 
     36 ```{r, function_example}
     37 avg_points <- function(species){
     38   # this function is for calculating the mean
     39   avg <-  coffee_ratings |>
     40     filter(species == species) |>
     41     summarise(mean = mean(total_cup_points))
     42   return(avg)
     43 }
     44 ```
     45 
     46 
     47 ```{r}
     48 avg_points("Arabica")
     49 ```
     50 
     51 ---------------------------------------------------------------------------------
     52 
     53 ```{r}
     54 formals(avg_points)
     55 ```
     56 
     57 ```{r}
     58 body(avg_points)
     59 ```
     60 
     61 ```{r}
     62 environment(avg_points)
     63 ```
     64 
     65 
     66 -----------------------------------------------------------------------------------
     67 
     68 Functions uses attributes, one attribute used by base R is `srcref`, short for **source reference**. 
     69 It points to the source code used to create the function. 
     70 It contains code comments and other formatting.
     71 
     72 ```{r}
     73 
     74 #| eval=FALSE
     75 my_f <- function(){
     76   #test
     77   return(1)
     78 }
     79 
     80 attr(my_f, "srcref")
     81 ```
     82 
     83 ## Primitive functions
     84 
     85 Are the core function in base R, such as `sum()`
     86 ```{r}
     87 sum
     88 ```
     89 
     90 Type of primitives:
     91 
     92 - builtin 
     93 - special
     94 
     95 ```{r}
     96 typeof(sum)
     97 ```
     98 
     99 
    100 These core functions have components to NULL.
    101 
    102 ## Anonymous functions
    103 
    104 If you don't provide a name to a function
    105 
    106 ```{r}
    107 lapply(mtcars |> select(mpg, cyl), function(x) length(unique(x)))
    108 ```
    109 
    110 
    111 ```{r}
    112 vector_len <- function(x) {
    113   length(unique(x))
    114 }
    115 ```
    116 
    117 ```{r}
    118 lapply(mtcars |> select(mpg, cyl), vector_len)
    119 ```
    120 
    121 -------------------------------------------------------------------------------------------
    122 
    123 **Invoking a function**
    124 
    125 ```{r}
    126 args <- unique(coffee_ratings$species)[[1]] |> as.list()
    127 
    128 do.call(avg_points, args)
    129 ```
    130 
    131 `do.call` is used a lot with apply's family 
    132 
    133 ## Function composition
    134 
    135 ```{r nestedfunctioncalls}
    136 x <- runif(100)
    137 # Nested function calls
    138 square <- function(x) x^2
    139 deviation <- function(x) x - mean(x)
    140 sqrt(mean(square(deviation(x))))
    141 ```
    142 
    143 ```{r intermediateresults}
    144 # intermediate result
    145 out <- deviation(x)
    146 out <- square(out)
    147 out <- mean(out)
    148 out <- sqrt(out)
    149 out
    150 ```
    151 
    152 
    153 ```{r pipebinaryoperator}
    154 # Pipe workflow
    155 x |> 
    156   deviation() |>
    157   square() |>
    158   mean() |>
    159   sqrt()
    160 ```
    161 
    162 ## More about functions insights
    163 
    164 ### Lexical scoping
    165 
    166 **Rules**
    167 
    168 - Name masking
    169 - Functions versus variables
    170 - A fresh start
    171 - Dynamic lookup
    172 
    173 --------------------------------------------------------------------------------
    174 
    175 #### Name masking
    176 
    177 Save us from our polluted global environment! 
    178 
    179 ```{r namemasking}
    180 bob <- 1; alice <- 2
    181 
    182 g02 <- function() {bob <- "test"; alice <- "bacon"; c(bob, alice)}
    183 
    184 g03 <- function() {bob <- "test"; c(bob, alice)}
    185 
    186 g02()
    187 g03()
    188 ```
    189 
    190 Note: also applied for functions names
    191 
    192 ------------------------------------------------------------------------------------
    193 
    194 #### Functions versus variables
    195 
    196 ::: {.callout-caution}
    197 Do not be too smart!
    198 :::
    199 
    200 ```{r donotdothat}
    201 g09 <- function(x) x + 100
    202 g10 <- function() {
    203   g09 <- 10
    204   g09(g09)
    205 }
    206 g10()
    207 ```
    208 
    209 ------------------------------------------------------------------------------------
    210 
    211 #### Fresh start:
    212 
    213 ```{r}
    214 g11 <- function() {
    215   if (!exists("a")) {
    216     print("I am here")
    217     a <- 1
    218   } else {
    219     print("Dead branch!")
    220     a <- a + 1
    221   }
    222   a
    223 }
    224 
    225 g11()
    226 g11()
    227 ```
    228 
    229 -------------------------------------------------------------------------------------
    230 
    231 #### Dynamic scoping:
    232 
    233 **"When"**: R looks for the value **when** the function is run 
    234 
    235 This function 
    236 ```{r}
    237 g12 <- function() x + 1
    238 x <- 15
    239 g12()
    240 x <- 20
    241 g12()
    242 ```
    243 
    244 ------------------------------------------------------------------------------------
    245 
    246 #### Debugging
    247 
    248 ```{r}
    249 codetools::findGlobals(g12)
    250 ```
    251 
    252 You can change the function’s environment to an environment which contains nothing:
    253 ```{r}
    254 # environment(g12) <- emptyenv()
    255 # g12()
    256 # Error in x + 1 : could not find function "+"
    257 ```
    258 
    259 ------------------------------------------------------------------------------------
    260 
    261 ## Lazy Evaluation:
    262 
    263 
    264 ::: {.callout-important}
    265 function arguments are only evaluated when accessed
    266 :::
    267 
    268 ------------------------------------------------------------------------------------
    269 
    270 #### Promises 
    271 
    272 See you at chapter 20! 
    273 
    274 Promises: multiple definitions? 
    275 
    276 - [Futurverse](https://journal.r-project.org/archive/2021/RJ-2021-048/RJ-2021-048.pdf)
    277 
    278 - [Promises](https://rstudio.github.io/promises/) in Shiny 
    279 
    280 ----------------------------------------------------------------------------------
    281 
    282 #### Default arguments
    283 
    284 Arguments defined by other arguments or variables in Functions
    285 
    286 ::: {.callout-caution}
    287 Not recommended by Hadley!
    288 :::
    289 
    290 ----------------------------------------------------------------------------------
    291 
    292 #### Missing arguments 
    293 
    294 `missing()` checks if an argument is missing (TRUE/FALSE) then you can "branch" according to it
    295 
    296 ::: {.callout-caution}
    297 Do not be like `sample()` ! 
    298 :::
    299 
    300 ------------------------------------------------------------------------------------
    301 
    302 ### ... (dot-dot-dot)
    303 
    304 **Example**
    305 
    306 ```{r}
    307 i01 <- function(y, z) {
    308   list(y = y, z = z)
    309 }
    310 i02 <- function(x, ...) {
    311   i01(...)
    312 }
    313 str(i02(x = 1, y = 2, z = 3))
    314 ```
    315 
    316 Used a lot with higher ordered functions!  
    317 
    318 -----------------------------------------------------------------------------------
    319 
    320 ### Exiting a function
    321 
    322 1. Implicit or explicit returns
    323 
    324 2. Invisibility (`<-` most famous function that returns an invisible value)
    325 
    326 3. `stop()` to stop a function with an error.
    327 
    328 4. Exit handlers (`on.exit()`)
    329 
    330 ------------------------------------------------------------------------------------
    331 
    332 ### Function forms
    333 
    334 >Everything that exists is an object.
    335 Everything that happens is a function call.
    336 — John Chambers
    337 
    338 
    339 ```{r echo=FALSE}
    340 knitr::include_graphics("images/06_forms.png")
    341 ```
    342 
    343 ---
    344 
    345 
    346 ## Case Study: SIR model function
    347 
    348 This is an interesting example taken from a course on Coursera: [Infectious disease modelling-ICL](https://www.coursera.org/specializations/infectious-disease-modelling)
    349 
    350 The purpose of this example is to show how to make a model passing through making a function.
    351 
    352 First we need to load some useful libraries:
    353 
    354 ```{r message=FALSE, warning=FALSE, paged.print=FALSE}
    355 library(deSolve)
    356 library(reshape2)
    357 ```
    358 
    359 
    360 Then set the model inputs:
    361 
    362 - population size (N)
    363 - number of susceptable (S)
    364 - infected (I)
    365 - recovered (R)
    366 
    367 And add the model parameters:
    368 
    369 - infection rate ($\beta$)
    370 - recovery rate ($\gamma$)
    371 
    372 
    373 ```{r}
    374 N<- 100000                  # population
    375 
    376 state_values<- c(S = N -1,   # susceptible
    377                  I = 1,      # infected
    378                  R = 0)      # recovered
    379 
    380 parameters<- c(beta = 1/2,  # infection rate days^-1
    381                gamma = 1/4) # recovery rate days^-1
    382 ```
    383 
    384 
    385 Then we set the **time** as an important factor, which defines the length of time we are looking at this model run. It is intended as the time range in which the infections spread out, let’s say that we are aiming to investigate an infection period of 100 days.
    386 
    387 ```{r}
    388 times<- seq(0, 100, by = 1)
    389 ```
    390 
    391 Finally, we set up the **SIR model**, the susceptable, infected and recovered model. How do we do that is passing the paramenters through a function of the time, and state.
    392 
    393 Within the model function we calculate one more paramenter, the **force of infection**: $\lambda$ 
    394 
    395 ```{r}
    396 sir_model<- function(time, state, parameters){
    397   with(as.list(c(state, parameters)),{
    398     N<- S + I + R
    399     lambda = beta * I/N    # force of infection
    400     dS<- - lambda * S 
    401     dI<- lambda * S - gamma * I
    402     dR<- gamma * I
    403     return(list(c(dS,dI,dR)))
    404   })
    405 }
    406 ```
    407 
    408 
    409 Once we have our **SIR model** function ready, we can calculate the **output** of the model, with the help of the function `ode()` from {deSolve} package.
    410 
    411 ```{r}
    412 output<- as.data.frame(ode(y = state_values,
    413                            times = times,
    414                            func = sir_model,
    415                            parms = parameters))
    416 output %>% head
    417 ```
    418 
    419 In addition to our builtin SIR model function we can have a look at:
    420 
    421 ```{r eval=FALSE, include=T}
    422 ?deSolve::ode()
    423 ```
    424 
    425 It solves **Ordinary Differential Equations**. 
    426 
    427 ```{r}
    428 deSolve:::ode
    429 ```
    430 
    431 ```{r}
    432 methods("ode")
    433 ```
    434 
    435 
    436 ---
    437 
    438 With the help of the {reshape2} package we use the function `melt()` to reshape the output:
    439 
    440 ```{r}
    441 melt(output,id="time") %>% head
    442 ```
    443 
    444 
    445 The same as usign `pivot_longer()` function.
    446 ```{r}
    447 output%>%
    448   pivot_longer(cols = c("S","I","R"),
    449                names_to="variable",
    450                values_to="values") %>%
    451   arrange(desc(variable)) %>%
    452   head
    453 ```
    454 
    455 ---
    456 
    457 
    458 Before to proceed with the visualization of the SIR model output we do a bit of investigations.
    459 
    460 **What if we want to see how `melt()` function works?**
    461 
    462 **What instruments we can use to see inside the function and understand how it works?**
    463 
    464 
    465 Using just the function name **melt** or `structure()` function with *melt* as an argument, we obtain the same output. To select just the argument of the function we can do `args(melt)`
    466 
    467 ```{r}
    468 reshape2:::melt
    469 ```
    470 
    471 
    472 ```{r}
    473 body(melt)
    474 ```
    475 
    476 ```{r}
    477 formals(melt)
    478 ```
    479 
    480 ```{r}
    481 environment(melt)
    482 ```
    483 
    484 
    485 ```{r}
    486 typeof(melt)
    487 ```
    488 
    489 > "R functions simulate a closure by keeping an explicit reference to the environment that was active when the function was defined."
    490 
    491 
    492 ref: [closures](https://www.r-bloggers.com/2015/03/using-closures-as-objects-in-r/)
    493 
    494 
    495 
    496 Try with `methods()`, or `print(methods(melt))`: Non-visible functions are asterisked!
    497 
    498 > The S3 method name is followed by an asterisk * if the method definition is not exported from the package namespace in which the method is defined.
    499 
    500 ```{r}
    501 methods("melt", data)
    502 ```
    503 
    504 ```{r}
    505 methods(class="table")
    506 ```
    507 
    508 ```{r eval=FALSE, include=T}
    509 help(UseMethod)
    510 ```
    511 
    512 
    513 We can access to some of the above calls with `getAnywhere()`, for example here is done for "melt.data.frame":
    514 ```{r}
    515 getAnywhere("melt.data.frame")
    516 ```
    517 
    518 
    519 References:
    520 
    521 - [stackoverflow article](https://stackoverflow.com/questions/11173683/how-can-i-read-the-source-code-for-an-r-function)
    522 - [Rnews bulletin: R Help Desk](https://www.r-project.org/doc/Rnews/Rnews_2006-4.pdf)
    523 
    524 
    525 
    526 ---
    527 
    528 Going back to out model output visualization.
    529 
    530 ```{r}
    531 output_full<- melt(output,id="time")
    532 ```
    533 
    534 ```{r}
    535 output_full$proportion<- output_full$value/sum(state_values)
    536 ```
    537 
    538 ```{r}
    539 ggplot(data = output, aes(x = time, y = I)) + 
    540   geom_line() + 
    541   xlab("Time(days)") +
    542   ylab("Number of Infected") + 
    543   labs("SIR Model: prevalence of infection")
    544 ```
    545 
    546 ```{r}
    547 ggplot(output_full, aes(time, proportion, color = variable, group = variable)) + 
    548   geom_line() + 
    549   xlab("Time(days)") +
    550   ylab("Prevalence") + 
    551   labs(color = "Compartment", title = "SIR Model")
    552 ```