---
engine: knitr
title: R6
---

## Learning objectives:

```{r include=FALSE}
library(ids)
```

- Discuss how to construct an R6 class.
- Survey the different mechanisms of an R6 class (e.g. initialization, print, public, private, and active fields and methods).
- Observe various examples using R6's mechanisms to create R6 classes, objects, fields, and methods.
- Observe the consequences of R6's reference semantics.
- Review the book's arguments on the use of R6 over reference classes.

## A review of OOP

![](images/14-four-pillars.png)

* **A PIE**: Abstraction, Polymorphism, Inheritance, Encapsulation.

## Introducing R6 

![](images/14-r6-logo.png)

* R6 classes are not built into base R.
  * R6 is a separate [package](https://r6.r-lib.org/).
  * You have to install and attach it before use.
  * If a package uses R6 objects, R6 must be listed as a dependency in its `DESCRIPTION` file.

```{r eval=FALSE}
install.packages("R6")
```

```{r}
library(R6)
```

* R6 has two special properties:
  1. It uses an encapsulated OOP paradigm.
     * Methods belong to objects, not generics.
     * Fields and methods are accessed as `object$field` and `object$method()`.
  2. R6 objects are mutable.
     * They are modified in place.
     * They follow reference semantics.
* R6 is similar to OOP in other languages.
* However, its use can lead to non-idiomatic R code.
  * Tradeoff: you get a familiar OOP paradigm, but sacrifice the conventions R users are used to.
  * Example: [Microsoft365R](https://github.com/Azure/Microsoft365R).
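
To make "methods belong to objects, not generics" concrete, here is a minimal sketch that contrasts an S3 generic with an R6 method. The `describe()` generic and the `Dog` class are hypothetical names invented for this illustration.

```r
library(R6)

# Functional (S3) style: the generic owns the method and the object
# is passed in as an argument.
describe <- function(x, ...) UseMethod("describe")
describe.dog <- function(x, ...) paste("S3 dog named", x$name)
s3_dog <- structure(list(name = "Rex"), class = "dog")
describe(s3_dog)

# Encapsulated (R6) style: the method lives inside the object and is
# reached with `$`.
Dog <- R6Class("Dog", list(
  name = NULL,
  initialize = function(name) {
    self$name <- name
  },
  describe = function() {
    paste("R6 dog named", self$name)
  }
))
r6_dog <- Dog$new("Rex")
r6_dog$describe()
```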

## Constructing an R6 class, the basics

* Really simple to do, just use the `R6::R6Class()` function.

```{r}
Accumulator <- R6Class("Accumulator", list(
  sum = 0,
  add = function(x = 1) {
    self$sum <- self$sum + x
    invisible(self)
  }
))
```

* Two important arguments:
  1. `classname` - A string used to name the class (optional, but it improves error messages and lets R6 objects work with S3 generics).
  2. `public` - A list of methods (functions) and fields (anything else).
* Suggested style conventions to follow:
  * Class names should use `UpperCamelCase`.
  * Methods and fields should use `snake_case`.
  * Always assign the result of `R6Class()` to a variable with the same name as the class.
* You can use `self$` to access methods and fields of the current object.

## Constructing an R6 object

* Just use `$new()`.

```{r}
x <- Accumulator$new()
```

```{r}
x$add(4)
x$sum
```

## R6 objects and method chaining

* All side-effect R6 methods should return `self` invisibly.
* This allows for method chaining.

```{r eval=FALSE}
x$add(10)$add(10)$sum
# [1] 24
```

* To improve readability, put each call on its own line:

```{r eval=FALSE}
# Method chaining
x$
  add(10)$
  add(10)$
  sum
# [1] 44
```

## R6 useful methods

* `$print()` - Overrides the default printing behaviour.
  * `$print()` should always return `invisible(self)`.
* `$initialize()` - Overrides the default behaviour of `$new()`.
  * Also provides a place to validate inputs.

## Constructing a bank account class

```{r}
BankAccount <- R6Class("BankAccount", list(
  owner = NULL,
  type = NULL,
  balance = 0,
  initialize = function(owner, type) {
    # Validate the inputs before storing them on the object
    stopifnot(is.character(owner), length(owner) == 1)
    stopifnot(is.character(type), length(type) == 1)

    self$owner <- owner
    self$type <- type
  },
  deposit = function(amount) {
    self$balance <- self$balance + amount
    invisible(self)
  },
  withdraw = function(amount) {
    self$balance <- self$balance - amount
    invisible(self)
  }
))
```

## Simple transactions

```{r}
collinsavings <- BankAccount$new("Collin", type = "Savings")
collinsavings$deposit(10)
collinsavings
```

```{r}
collinsavings$withdraw(10)
collinsavings
```

## Modifying the `$print()` method 

```{r}
BankAccount <- R6Class("BankAccount", list(
  owner = NULL,
  type = NULL,
  balance = 0,
  initialize = function(owner, type) {
    stopifnot(is.character(owner), length(owner) == 1)
    stopifnot(is.character(type), length(type) == 1)

    self$owner <- owner
    self$type <- type
  },
  deposit = function(amount) {
    self$balance <- self$balance + amount
    invisible(self)
  },
  withdraw = function(amount) {
    self$balance <- self$balance - amount
    invisible(self)
  },
  print = function(...) {
    cat("Account owner: ", self$owner, "\n", sep = "")
    cat("Account type: ", self$type, "\n", sep = "")
    cat("  Balance: ", self$balance, "\n", sep = "")
    invisible(self)
  }
))
```

* Important point: Methods are bound to individual objects.
  * `collinsavings` was created before the class was redefined, so it keeps the default print method; only objects created afterwards, such as `hadleychecking`, use the new `$print()`.
  * Reference semantics vs. copy-on-modify.

```{r eval=FALSE}
# Created from the earlier class definition: default printing
collinsavings

hadleychecking <- BankAccount$new("Hadley", type = "Checking")

# Created from the new class definition: custom printing
hadleychecking
```
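
The same effect can be reproduced with a smaller class. This is a minimal sketch using a hypothetical `Greeter` class (not part of the bank account example) to show that an existing object keeps the methods it was created with.

```r
library(R6)

Greeter <- R6Class("Greeter", list(
  greet = function() "hello"
))
before <- Greeter$new()

# Redefine the class under the same name with a different method body.
Greeter <- R6Class("Greeter", list(
  greet = function() "hi there"
))
after <- Greeter$new()

before$greet()  # still "hello": the method was bound when `before` was created
after$greet()   # "hi there"
```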

## How does this work? 

* [Winston Chang's 2017 useR talk](https://www.youtube.com/watch?v=3GEFd8rZQgY&list=WL&index=11)

* [R6 objects are just environments with a particular structure.](https://youtu.be/3GEFd8rZQgY?t=759)
 
![](images/14-r6_environment.png)

## Adding methods after class creation

* Use `$set()` to add fields and methods to a class after it has been created.
* Keep in mind that elements added with `$set()` are only available to objects created afterwards.

```{r eval=FALSE}
Accumulator <- R6Class("Accumulator")
Accumulator$set("public", "sum", 0)
Accumulator$set("public", "add", function(x = 1) {
  self$sum <- self$sum + x
  invisible(self)
})
```
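
A minimal sketch of the "only new objects" caveat, assuming a hypothetical `Counter` class: an object created before the `$set()` call never receives the added method.

```r
library(R6)

Counter <- R6Class("Counter", list(n = 0))
first <- Counter$new()

# Add a method to the class after `first` already exists.
Counter$set("public", "increment", function() {
  self$n <- self$n + 1
  invisible(self)
})
second <- Counter$new()

"increment" %in% names(first)   # FALSE: created before `$set()`
"increment" %in% names(second)  # TRUE: created after `$set()`
second$increment()$n            # 1
```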

## Inheritance

* To inherit behaviour from an existing class, provide the class object via the `inherit` argument.
* This example is also a good starting point for showing how to debug an R6 class.

```{r eval=FALSE}
BankAccountOverDraft <- R6Class("BankAccountOverDraft",
  inherit = BankAccount,
  public = list(
    withdraw = function(amount) {
      # Refuse any withdrawal that would make the balance negative
      if ((self$balance - amount) < 0) {
        stop("Overdraft")
      }
      self$balance <- self$balance - amount
      invisible(self)
    }
  )
)
```
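
The subclass above repeats the parent's balance arithmetic. As an alternative, a method in a subclass can delegate to the parent class through `super$`. The sketch below uses hypothetical `Account` and `SafeAccount` classes so that it runs on its own. It is one possible way to write the override, not the version used in these notes.

```r
library(R6)

Account <- R6Class("Account", list(
  balance = 0,
  withdraw = function(amount) {
    self$balance <- self$balance - amount
    invisible(self)
  }
))

SafeAccount <- R6Class("SafeAccount",
  inherit = Account,
  public = list(
    withdraw = function(amount) {
      if (self$balance - amount < 0) {
        stop("Overdraft", call. = FALSE)
      }
      # Reuse the parent's implementation instead of repeating it.
      super$withdraw(amount)
    }
  )
)

acct <- SafeAccount$new()
acct$balance <- 20
acct$withdraw(5)$balance  # 15
```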

### Future instances debugging

* Use the generator's `$debug()` method; it applies to objects created afterwards.

```{r eval=FALSE}
BankAccountOverDraft$debug("withdraw")
x <- BankAccountOverDraft$new("x", type = "Savings")
x$withdraw(20)

# Turn debugging off
BankAccountOverDraft$undebug("withdraw")
```

### Individual object debugging

* Use the `debug()` function.

```{r eval=FALSE}
x <- BankAccountOverDraft$new("x", type = "Savings")
# Turn on debugging
debug(x$withdraw)
x$withdraw(10)

# Turn off debugging
undebug(x$withdraw)
x$withdraw(5)
```

### Test out our debugged class

```{r eval=FALSE}
collinsavings <- BankAccountOverDraft$new("Collin", type = "Savings")
collinsavings
# Errors with "Overdraft": the balance is 0
collinsavings$withdraw(10)
collinsavings
collinsavings$deposit(5)
collinsavings
# Succeeds: brings the balance back to 0
collinsavings$withdraw(5)
```

## Introspection

* Every R6 object has an S3 class that reflects its hierarchy of R6 classes.
* Use the `class()` function to determine class (and all classes it inherits from).

```{r eval=FALSE}
class(collinsavings)
```

* You can also list all methods and fields of an R6 object with `names()`.

```{r eval=FALSE}
names(collinsavings)
```
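
Since the chunks above are not evaluated, here is a self-contained sketch of what introspection returns. The `Animal` and `Cat` classes are hypothetical and exist only for this illustration.

```r
library(R6)

Animal <- R6Class("Animal", list(
  name = NULL,
  initialize = function(name) {
    self$name <- name
  }
))
Cat <- R6Class("Cat",
  inherit = Animal,
  public = list(
    speak = function() "meow"
  )
)

tom <- Cat$new("Tom")
class(tom)               # "Cat" "Animal" "R6"
inherits(tom, "Animal")  # TRUE
names(tom)               # includes "name", "speak", "initialize", "clone"
```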

## Controlling access

* R6 provides two other arguments:
  * `private` - create fields and methods only available from within the class.
  * `active` - allows you to use accessor functions to define dynamic or active fields.

## Privacy

* Private fields and methods - elements that can only be accessed from within the class, not from the outside.
* We need to know two things to use private elements:
  1. `private`'s interface is just like `public`'s interface.
     * List of methods (functions) and fields (everything else).
  2. You use `private$` instead of `self$`.
     * You cannot access private fields or methods outside of the class.
* Why might you want to keep your methods and fields private?
  * You'll want to be clear about what is ok for others to access, especially if you have a complex system of classes.
  * It's easier to refactor private fields and methods, as you know others are not relying on them.
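
A minimal sketch of a private field, assuming a hypothetical `Safe` class: the stored code is reachable through `private$` inside the methods, but not through `$` from the outside.

```r
library(R6)

Safe <- R6Class("Safe",
  public = list(
    initialize = function(code) {
      private$code <- code
    },
    unlock = function(attempt) {
      # Methods can read private fields through `private$`.
      identical(attempt, private$code)
    }
  ),
  private = list(
    code = NULL
  )
)

s <- Safe$new("1234")
s$unlock("1234")  # TRUE
s$unlock("0000")  # FALSE
s$code            # NULL: the private field is not visible from outside
```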

## Active fields

* Active fields allow you to define components that look like fields from the outside, but are defined with functions, like methods.
* Implemented using active bindings.
* Each active binding is a function that takes a single argument `value`.
  * If `value` is `missing()`, the field is being read; otherwise it is being modified.
* Great when used in conjunction with private fields.
  * This allows for additional checks.
  * For example, we can use them to make a read-only field and to validate inputs.
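
The two uses named above (a validated field and a read-only field) can be sketched as follows. The `Temperature` class and its field names are hypothetical and were chosen only for this illustration.

```r
library(R6)

Temperature <- R6Class("Temperature",
  public = list(
    initialize = function(celsius) {
      private$.celsius <- celsius
    }
  ),
  private = list(
    .celsius = NULL
  ),
  active = list(
    # Read/write field that validates its input.
    celsius = function(value) {
      if (missing(value)) {
        private$.celsius
      } else {
        stopifnot(is.numeric(value), length(value) == 1)
        private$.celsius <- value
        invisible(self)
      }
    },
    # Read-only field derived from the private value.
    fahrenheit = function(value) {
      if (!missing(value)) {
        stop("`$fahrenheit` is read only", call. = FALSE)
      }
      private$.celsius * 9 / 5 + 32
    }
  )
)

t1 <- Temperature$new(100)
t1$fahrenheit    # 212
t1$celsius <- 0  # goes through the validating branch
t1$fahrenheit    # 32
```

Assigning to `t1$fahrenheit`, or assigning a non-numeric value to `t1$celsius`, stops with an error.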

## Adding a read-only bank account number

```{r eval=FALSE}
BankAccount <- R6Class("BankAccount",
  public = list(
    owner = NULL,
    type = NULL,
    balance = 0,
    initialize = function(owner, type, acct_num = NULL) {
      private$acct_num <- acct_num
      self$owner <- owner
      self$type <- type
    },
    deposit = function(amount) {
      self$balance <- self$balance + amount
      invisible(self)
    },
    withdraw = function(amount) {
      self$balance <- self$balance - amount
      invisible(self)
    },
    print = function(...) {
      cat("Account owner: ", self$owner, "\n", sep = "")
      cat("Account type: ", self$type, "\n", sep = "")
      cat("Account #: ", private$acct_num, "\n", sep = "")
      cat("  Balance: ", self$balance, "\n", sep = "")
      invisible(self)
    }
  ),
  private = list(
    acct_num = NULL
  ),
  active = list(
    # Accessing `$create_acct_num` assigns the account number once;
    # any later access stops instead of overwriting it.
    create_acct_num = function(value) {
      if (is.null(private$acct_num)) {
        private$acct_num <- ids::uuid()
      } else {
        stop("`$acct_num` already assigned")
      }
    }
  )
)
```

```{r eval=FALSE}
collinsavings <- BankAccount$new("Collin", type = "Savings")
# First access runs the active binding and assigns the account number
collinsavings$create_acct_num
# Any further access stops because the account number is already assigned
# (the error is raised before the `()` call is ever reached)
collinsavings$create_acct_num()
collinsavings$print()
```

## How does an active field work?

* I'm not sold on this diagram, as I don't know whether `active` gets its own environment.
  * Any ideas?

![](images/14-r6_active_field.png)
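
One way to probe the question above is to inspect an object directly. Because R6 objects are environments and active fields are implemented with active bindings, base R's `bindingIsActive()` can report how a name is stored. The `Clock` class below is a hypothetical example. The result suggests, without claiming to describe R6's internals in full, that an active field sits in the same environment as the public fields rather than in one of its own.

```r
library(R6)

Clock <- R6Class("Clock",
  public = list(
    label = "wall clock"
  ),
  active = list(
    now = function(value) {
      if (!missing(value)) {
        stop("`$now` is read only", call. = FALSE)
      }
      Sys.time()
    }
  )
)

clock <- Clock$new()

# The active field is a name in the object's own environment ...
"now" %in% names(clock)          # TRUE
# ... but it is stored as an active binding rather than a plain value.
bindingIsActive("now", clock)    # TRUE
bindingIsActive("label", clock)  # FALSE
```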

## Reference semantics

* Big difference to note about R6 objects in relation to other objects:
  * R6 objects have reference semantics.
* The primary consequence of reference semantics is that objects are not copied when modified.
* If you want to copy an R6 object, you need to use `$clone()`.
  * `$clone()` does not recursively clone nested R6 objects; use `$clone(deep = TRUE)` for that.
* There are some other less obvious consequences:
  * It's harder to reason about code that uses R6 objects, as you need more context.
  * It makes sense to think about when an R6 object is deleted; you can write a `$finalize()` method to clean up after yourself.
  * If one of the fields is an R6 object, you must create it inside `$initialize()`, not `R6Class()`.
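
A minimal sketch of the difference between assigning an R6 object to a second name and cloning it, assuming a hypothetical `Tally` class:

```r
library(R6)

Tally <- R6Class("Tally", list(
  count = 0,
  bump = function() {
    self$count <- self$count + 1
    invisible(self)
  }
))

original <- Tally$new()
alias <- original         # a second name for the same object
copy <- original$clone()  # an independent copy

original$bump()
original$count  # 1
alias$count     # 1: the alias sees the change
copy$count      # 0: the clone does not
```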

## R6 makes it harder to reason about code

* Reference semantics makes code harder to reason about.

```{r eval=FALSE}
# `f()` and `List` are placeholders; they are not defined in these notes.
x <- list(a = 1)
y <- list(b = 2)

# Here we know the final line only modifies z
z <- f(x, y)

# vs.

x <- List$new(a = 1)
y <- List$new(b = 2)

# If f() calls methods of x or y, it might modify them as well as z.
z <- f(x, y)
```

* I understand the basics, but not necessarily the tradeoffs.
  * Anyone care to fill me in?
  * Is this a limitation of abstraction?
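
A small runnable sketch may help with the question above. The `Box` class and the `add_one()` helper are hypothetical. The same function call leaves a list untouched but silently changes an R6 object, which is why the caller needs to know what the function does internally.

```r
library(R6)

Box <- R6Class("Box", list(value = 0))

# Hypothetical helper: it looks like a pure function, but it assigns
# into its argument.
add_one <- function(box) {
  box$value <- box$value + 1
  box$value
}

plain <- list(value = 0)
boxed <- Box$new()

add_one(plain)  # modifies a local copy of the list
add_one(boxed)  # modifies the caller's object in place

plain$value  # 0
boxed$value  # 1
```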

## Better sense of what's going on by looking at a finalizer

* Because R6 objects are not copied on modify, they are only deleted once.
* We can use this characteristic to complement our `$initialize()` with a `$finalize()` method.
  * i.e., to clean up after we delete an R6 object.
  * This could be a way to close a database connection.

```{r eval=FALSE}
TemporaryFile <- R6Class("TemporaryFile", list(
  path = NULL,
  initialize = function() {
    self$path <- tempfile()
  },
  finalize = function() {
    message("Cleaning up ", self$path)
    unlink(self$path)
  }
))
```

```{r eval=FALSE}
tf <- TemporaryFile$new()
# The finalizer runs at the first garbage collection after the object has
# been unbound from all names (or when R exits), not at `rm()` itself.
rm(tf)
invisible(gc())
```

## Consequences of R6 fields

* If you use an R6 object as the default value of a field, it will be shared across all instances of the class.

```{r eval=FALSE}
TemporaryDatabase <- R6Class("TemporaryDatabase", list(
  con = NULL,
  # Created once, when the class is defined, then shared by every instance
  file = TemporaryFile$new(),
  initialize = function() {
    self$con <- DBI::dbConnect(RSQLite::SQLite(), dbname = self$file$path)
  },
  finalize = function() {
    DBI::dbDisconnect(self$con)
  }
))

db_a <- TemporaryDatabase$new()
db_b <- TemporaryDatabase$new()

db_a$file$path == db_b$file$path
#> [1] TRUE
```

* To fix this, we need to move the `TemporaryFile$new()` call into `$initialize()`.

```{r eval=FALSE}
TemporaryDatabase <- R6Class("TemporaryDatabase", list(
  con = NULL,
  file = NULL,
  initialize = function() {
    # Each instance now gets its own temporary file
    self$file <- TemporaryFile$new()
    self$con <- DBI::dbConnect(RSQLite::SQLite(), dbname = self$file$path)
  },
  finalize = function() {
    DBI::dbDisconnect(self$con)
  }
))

db_a <- TemporaryDatabase$new()
db_b <- TemporaryDatabase$new()

db_a$file$path == db_b$file$path
#> [1] FALSE
```
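
The database example needs DBI and RSQLite to run. The same sharing behaviour can be sketched with R6 alone, using hypothetical `Basket`, `SharedShopper`, and `OwnShopper` classes:

```r
library(R6)

Basket <- R6Class("Basket", list(
  items = character(),
  add = function(item) {
    self$items <- c(self$items, item)
    invisible(self)
  }
))

# Shared: the Basket is created once, when the class is defined.
SharedShopper <- R6Class("SharedShopper", list(
  basket = Basket$new()
))

# Separate: each object builds its own Basket in `$initialize()`.
OwnShopper <- R6Class("OwnShopper", list(
  basket = NULL,
  initialize = function() {
    self$basket <- Basket$new()
  }
))

shared_a <- SharedShopper$new()
shared_b <- SharedShopper$new()
shared_a$basket$add("milk")
shared_b$basket$items  # "milk": both objects point at the same Basket

own_a <- OwnShopper$new()
own_b <- OwnShopper$new()
own_a$basket$add("milk")
own_b$basket$items     # character(0): own_b has its own Basket
```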

## Why use R6?

* The book mentions that R6 is similar to the built-in reference classes (RC).
* Then why use R6?
* R6 is simpler. 
  * RC requires you to understand S4.
* [Comprehensive documentation](https://r6.r-lib.org/articles/Introduction.html).
* Simpler mechanisms for cross-package subclassing, which just works.
* R6 keeps public and private fields in separate environments; RC stacks everything in the same environment.
* [R6 is faster](https://r6.r-lib.org/articles/Performance.html).
* RC is tied to R, so any bug fixes need a newer version of R.
  * This is especially important if you're writing packages that need to work with multiple R versions.
* R6 and RC are similar, so if you do need RC, learning it will only take a small amount of additional effort.