bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git

html.json (15479B)


      1 {
      2   "hash": "f015c42b2d4738df55458227dca78b47",
      3   "result": {
      4     "engine": "knitr",
      5     "markdown": "---\nengine: knitr\ntitle: R6\n---\n\n## Learning objectives:\n\n\n\n- Discuss how to construct an R6 class.\n- Give an overview of the different mechanisms of an R6 class (e.g. initialization, print, public, private, and active fields and methods).\n- Observe various examples using R6's mechanisms to create R6 classes, objects, fields, and methods.\n- Observe the consequences of R6's reference semantics.\n- Review the book's arguments on the use of R6 over reference classes.\n\n## A review of OOP\n\n![](images/14-four-pillars.png)\n\n* **A PIE**: Abstraction, Polymorphism, Inheritance, Encapsulation\n\n## Introducing R6 \n\n![](images/14-r6-logo.png)\n\n* R6 classes are not built into base R.\n  * R6 is a separate [package](https://r6.r-lib.org/).\n  * You have to install and attach it before use.\n  * If R6 objects are used in a package, R6 needs to be specified as a dependency in the `DESCRIPTION` file.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ninstall.packages(\"R6\")\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(R6)\n```\n:::\n\n\n* R6 classes have two special properties:\n  1. They use an encapsulated OOP paradigm.\n     * Methods belong to objects, not generics.\n     * Methods are called as `object$method()` and fields are accessed as `object$field`.\n  2. R6 objects are mutable.\n     * They are modified in place.\n     * They follow reference semantics.\n* R6 is similar to OOP in other languages.\n* However, its use can lead to non-idiomatic R code.\n  * Tradeoffs - it follows a familiar OOP paradigm but sacrifices what R users are used to.\n  * [Microsoft365R](https://github.com/Azure/Microsoft365R).\n\n## Constructing an R6 class, the basics\n\n* Really simple to do, just use the `R6::R6Class()` function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nAccumulator <- R6Class(\"Accumulator\", list(\n  sum = 0,\n  add = function(x = 1) {\n    self$sum <- self$sum + x\n    invisible(self)\n  }\n))\n```\n:::\n\n\n* Two important arguments:\n  1. `classname` - A string used to name the class (not needed but suggested)\n  2. 
`public` - A list of methods (functions) and fields (anything else)\n* Suggested style conventions to follow:\n  * Class names should follow `UpperCamelCase`.\n  * Methods and fields should use `snake_case`.\n  * Always assign the result of `R6Class()` to a variable with the same name as the class.\n* You can use `self$` to access methods and fields of the current object.\n\n## Constructing an R6 object\n\n* Just use `$new()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- Accumulator$new()\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx$add(4)\nx$sum\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 4\n```\n\n\n:::\n:::\n\n\n## R6 objects and method chaining\n\n* R6 methods called for their side effects should return `self` invisibly.\n* This allows for method chaining.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx$add(10)$add(10)$sum\n# [1] 24\n```\n:::\n\n\n* To improve readability:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Method chaining\nx$\n  add(10)$\n  add(10)$\n  sum\n# [1] 44\n```\n:::\n\n\n## R6 useful methods\n\n* `$print()` - Modifies the default printing method.\n  * `$print()` should always return `invisible(self)`.\n* `$initialize()` - Overrides the default behaviour of `$new()`.\n  * Also provides a space to validate inputs.\n\n## Constructing a bank account class\n\n\n::: {.cell}\n\n```{.r .cell-code}\nBankAccount <- R6Class(\"BankAccount\", list(\n  owner = NULL,\n  type = NULL,\n  balance = 0,\n  initialize = function(owner, type) {\n    # Inputs are only validated here, not stored, so both fields stay NULL\n    stopifnot(is.character(owner), length(owner) == 1)\n    stopifnot(is.character(type), length(type) == 1)\n  },\n  deposit = function(amount) {\n    self$balance <- self$balance + amount\n    invisible(self)\n  },\n  withdraw = function(amount) {\n    self$balance <- self$balance - amount\n    invisible(self)\n  }\n))\n```\n:::\n\n\n## Simple transactions\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncollinsavings <- BankAccount$new(\"Collin\", type = \"Savings\")\ncollinsavings$deposit(10)\ncollinsavings\n```\n\n::: 
{.cell-output .cell-output-stdout}\n\n```\n#> <BankAccount>\n#>   Public:\n#>     balance: 10\n#>     clone: function (deep = FALSE) \n#>     deposit: function (amount) \n#>     initialize: function (owner, type) \n#>     owner: NULL\n#>     type: NULL\n#>     withdraw: function (amount)\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncollinsavings$withdraw(10)\ncollinsavings\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> <BankAccount>\n#>   Public:\n#>     balance: 0\n#>     clone: function (deep = FALSE) \n#>     deposit: function (amount) \n#>     initialize: function (owner, type) \n#>     owner: NULL\n#>     type: NULL\n#>     withdraw: function (amount)\n```\n\n\n:::\n:::\n\n\n## Modifying the `$print()` method \n\n\n::: {.cell}\n\n```{.r .cell-code}\nBankAccount <- R6Class(\"BankAccount\", list(\n  owner = NULL,\n  type = NULL,\n  balance = 0,\n  initialize = function(owner, type) {\n    stopifnot(is.character(owner), length(owner) == 1)\n    stopifnot(is.character(type), length(type) == 1)\n\n    self$owner <- owner\n    self$type <- type\n  },\n  deposit = function(amount) {\n    self$balance <- self$balance + amount\n    invisible(self)\n  },\n  withdraw = function(amount) {\n    self$balance <- self$balance - amount\n    invisible(self)\n  },\n  print = function(...) {\n    cat(\"Account owner: \", self$owner, \"\\n\", sep = \"\")\n    cat(\"Account type: \", self$type, \"\\n\", sep = \"\")\n    cat(\"  Balance: \", self$balance, \"\\n\", sep = \"\")\n    invisible(self)\n  }\n))\n```\n:::\n\n\n* Important point: Methods are bound to individual objects.\n  * Reference semantics vs. copy-on-modify.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncollinsavings\n\nhadleychecking <- BankAccount$new(\"Hadley\", type = \"Checking\")\n\nhadleychecking\n```\n:::\n\n\n## How does this work? 
\n\n* [Winston Chang's 2017 useR talk](https://www.youtube.com/watch?v=3GEFd8rZQgY&list=WL&index=11)\n\n* [R6 objects are just environments with a particular structure.](https://youtu.be/3GEFd8rZQgY?t=759)\n\n![](images/14-r6_environment.png)\n\n## Adding methods after class creation\n\n* Use `$set()` to add fields and methods after creation.\n* Keep in mind that elements added with `$set()` are only available to objects created afterwards.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nAccumulator <- R6Class(\"Accumulator\")\n# Add a public field, then a public method, to the existing class\nAccumulator$set(\"public\", \"sum\", 0)\nAccumulator$set(\"public\", \"add\", function(x = 1) {\n  self$sum <- self$sum + x\n  invisible(self)\n})\n```\n:::\n\n\n## Inheritance\n\n* To inherit behaviour from an existing class, provide the class object via the `inherit` argument.\n* This example also shows how to [debug]() an R6 class.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nBankAccountOverDraft <- R6Class(\"BankAccountOverDraft\",\n  inherit = BankAccount,\n  public = list(\n    withdraw = function(amount) {\n      if ((self$balance - amount) < 0) {\n        stop(\"Overdraft\")\n      }\n      # self$balance() <- self$withdraw()\n      self$balance <- self$balance - amount\n      invisible(self)\n    }\n  )\n)\n```\n:::\n\n\n### Future instances debugging\n\n\n::: {.cell}\n\n```{.r .cell-code}\nBankAccountOverDraft$debug(\"withdraw\")\nx <- BankAccountOverDraft$new(\"x\", type = \"Savings\")\nx$withdraw(20)\n\n# Turn debugging off\nBankAccountOverDraft$undebug(\"withdraw\")\n```\n:::\n\n\n### Individual object debugging\n\n* Use the `debug()` function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- BankAccountOverDraft$new(\"x\", type = \"Savings\")\n# Turn on debugging\ndebug(x$withdraw)\nx$withdraw(10)\n\n# Turn off debugging\nundebug(x$withdraw)\nx$withdraw(5)\n```\n:::\n\n\n### Test out our debugged class\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncollinsavings <- BankAccountOverDraft$new(\"Collin\", type = 
\"Savings\")\ncollinsavings\ncollinsavings$withdraw(10)\ncollinsavings\ncollinsavings$deposit(5)\ncollinsavings\ncollinsavings$withdraw(5)\n```\n:::\n\n\n## Introspection\n\n* Every R6 object has an S3 class that reflects its hierarchy of R6 classes.\n* Use the `class()` function to determine class (and all classes it inherits from).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nclass(collinsavings)\n```\n:::\n\n\n* You can also list all methods and fields of an R6 object with `names()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames(collinsavings)\n```\n:::\n\n\n## Controlling access\n\n* R6 provides two other arguments:\n  * `private` - create fields and methods only available from within the class.\n  * `active` - allows you to use accessor functions to define dynamic or active fields.\n\n## Privacy\n\n* Private fields and methods - elements that can only be accessed from within the class, not from the outside.\n* We need to know two things to use private elements:\n  1. `private`'s interface is just like `public`'s interface.\n     * List of methods (functions) and fields (everything else).\n  2. 
You use `private$` instead of `self$`\n     * You cannot access private fields or methods outside of the class.\n* Why might you want to keep your methods and fields private?\n  * You'll want to be clear what is ok for others to access, especially if you have a complex system of classes.\n  * It's easier to refactor private fields and methods, as you know others are not relying on it.\n\n## Active fields\n\n* Active fields allow you to define components that look like fields from the outside, but are defined with functions, like methods.\n* Implemented using active bindings.\n* Each active binding is a function that takes a single argument `value`.\n* Great when used in conjunction with private fields.\n  * This allows for additional checks.\n  * For example, we can use them to make a read-only field and to validate inputs.\n\n## Adding a read-only bank account number\n\n\n::: {.cell}\n\n```{.r .cell-code}\nBankAccount <- R6Class(\"BankAccount\", public = list(\n  owner = NULL,\n  type = NULL,\n  balance = 0,\n  initialize = function(owner, type, acct_num = NULL) {\n    private$acct_num <- acct_num\n    self$owner <- owner\n    self$type <- type\n  },\n  deposit = function(amount) {\n    self$balance <- self$balance + amount\n    invisible(self)\n  },\n  withdraw = function(amount) {\n    self$balance <- self$balance - amount\n    invisible(self)\n  },\n  print = function(...) 
{\n    cat(\"Account owner: \", self$owner, \"\\n\", sep = \"\")\n    cat(\"Account type: \", self$type, \"\\n\", sep = \"\")\n    cat(\"Account #: \", private$acct_num, \"\\n\", sep = \"\")\n    cat(\"  Balance: \", self$balance, \"\\n\", sep = \"\")\n    invisible(self)\n  }\n  ),\n  private = list(\n    acct_num = NULL\n  ),\n  active = list(\n    create_acct_num = function(value) {\n      if (is.null(private$acct_num)) {\n        private$acct_num <- ids::uuid()\n      } else {\n        stop(\"`$acct_num` already assigned\")\n      }\n    }\n  )\n)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncollinsavings <- BankAccount$new(\"Collin\", type = \"Savings\")\ncollinsavings$create_acct_num\n# Stops because account number is assigned\ncollinsavings$create_acct_num()\ncollinsavings$print()\n```\n:::\n\n\n## How does an active field work?\n\n* Not sold on this, as I don't know if `active` gets its own environment. \n  * Any ideas?\n\n![](images/14-r6_active_field.png)\n\n## Reference semantics\n\n* Big difference to note about R6 objects in relation to other objects:\n  * R6 objects have reference semantics.\n* The primary consequence of reference semantics is that objects are not copied when modified.\n* If you want to copy an R6 object, you need to use `$clone()`.\n* There are some other less obvious consequences:\n  * It's harder to reason about code that uses R6 objects, as you need more context.\n  * It makes sense to think about when an R6 object is deleted; you can use `$finalize()` to clean up after yourself.\n  * If one of the fields is an R6 object, you must create it inside `$initialize()`, not `R6Class()`.\n\n## R6 makes it harder to reason about code\n\n* Reference semantics makes code harder to reason about.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- list(a = 1)\ny <- list(b = 2)\n\n# Here we know the final line only modifies z\nz <- f(x, y)\n\n# vs.\n\nx <- List$new(a = 1)\ny <- List$new(b = 2)\n\n# If f() calls methods of x or y, it might modify\n# them 
as well as z. Is this a limitation of\n# abstraction?\nz <- f(x, y)\n```\n:::\n\n\n* I understand the basics, but not necessarily the tradeoffs.\n  * Anyone care to fill me in?\n  * Is this a limitation of abstraction?\n\n## Better sense of what's going on by looking at a finalizer\n\n* Since R6 objects are not copied on modify, they are only deleted once.\n* We can use this characteristic to complement our `$initialize()` with a `$finalize()` method.\n  * i.e., to clean up after we delete an R6 object.\n  * This could be a way to close a database connection.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nTemporaryFile <- R6Class(\"TemporaryFile\", list(\n  path = NULL,\n  initialize = function() {\n    self$path <- tempfile()\n  },\n  finalize = function() {\n    message(\"Cleaning up \", self$path)\n    unlink(self$path)\n  }\n))\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntf <- TemporaryFile$new()\n# The finalizer will clean up once the R6 object is deleted\n# and garbage collected.\nrm(tf)\n```\n:::\n\n\n## Consequences of R6 fields\n\n* If you use an R6 object as the default value of a field, it will be shared across all instances of the class.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nTemporaryDatabase <- R6Class(\"TemporaryDatabase\", list(\n  con = NULL,\n  file = TemporaryFile$new(),\n  initialize = function() {\n    self$con <- DBI::dbConnect(RSQLite::SQLite(), path = self$file$path)\n  },\n  finalize = function() {\n    DBI::dbDisconnect(self$con)\n  }\n))\n\ndb_a <- TemporaryDatabase$new()\ndb_b <- TemporaryDatabase$new()\n\ndb_a$file$path == db_b$file$path\n#> [1] TRUE\n```\n:::\n\n\n* To fix this, we need to move the `TemporaryFile$new()` call into `$initialize()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nTemporaryDatabase <- R6Class(\"TemporaryDatabase\", list(\n  con = NULL,\n  file = NULL,\n  initialize = function() {\n    self$file <- TemporaryFile$new()\n    self$con <- DBI::dbConnect(RSQLite::SQLite(), path = self$file$path)\n  },\n  finalize = function() {\n    
DBI::dbDisconnect(self$con)\n  }\n))\n\ndb_a <- TemporaryDatabase$new()\ndb_b <- TemporaryDatabase$new()\n\ndb_a$file$path == db_b$file$path\n#> [1] FALSE\n```\n:::\n\n\n## Why use R6?\n\n* The book mentions that R6 is similar to the built-in reference classes (RC).\n* Then why use R6?\n* R6 is simpler.\n  * RC requires you to understand S4.\n* [Comprehensive documentation](https://r6.r-lib.org/articles/Introduction.html).\n* Simpler mechanisms for cross-package subclassing, which just works.\n* R6 keeps public and private fields in separate environments; RC stacks everything in the same environment.\n* [R6 is faster](https://r6.r-lib.org/articles/Performance.html).\n* RC is tied to R, so any bug fixes to RC need a newer version of R.\n  * This is especially important if you're writing packages that need to work with multiple R versions.\n* R6 and RC are similar, so if you need RC, learning it will only require a small amount of additional effort.\n",
      6     "supporting": [
      7       "14_files"
      8     ],
      9     "filters": [
     10       "rmarkdown/pagebreak.lua"
     11     ],
     12     "includes": {},
     13     "engineDependencies": {},
     14     "preserve": {},
     15     "postProcess": true
     16   }
     17 }