bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git
Log | Files | Refs | README | LICENSE

13.Rmd (8830B)


      1 ---
      2 engine: knitr
      3 title: S3
      4 ---
      5 
      6 # Introduction
      7 
      8 ## Basics
      9 
     10 - Has class
     11 - Uses a generic function to decide on method
     12   - method = implementation for a specific class
     13   - dispatch = process of searching for right method
     14 
     15 ## Classes
     16 
     17 **Theory:**
     18 
     19 What is class?
     20 
     21   - No formal definition in S3
     22   - Simply set class attribute
     23 
     24 How to set class?
     25 
     26   - At time of object creation
     27   - After object creation
     28   
     29 ```{r}
     30 # at time of object creation
     31 x <- structure(list(), class = "my_class")
     32 
     33 # after object creation
     34 x <- list()
     35 class(x) <- "my_class"
     36 ```
     37 
     38 Some advice on style:
     39 
     40   - Rules: Can be any string
     41   - Advice: Consider using/including package name to avoid collision with name of another class (e.g., `blob`, which defines a single class; haven has `labelled` and `haven_labelled`)
     42   - Convention: letters and `_`; avoid `.` since it might be confused as separator between generic and class name
     43 
     44 **Practice:**
     45 
     46 How to compose a class in practice?
     47 
     48 - **Constructor**, which helps the developer create new object of target class. Provide always.
     49 - **Validator**, which checks that values in constructor are valid. May not be necessary for simple classes.
     50 - **Helper**, which helps users create new objects of target class. May be relevant only for user-facing classes.
     51 
     52 ### Constructors
     53 
     54 Help developers construct an object of the target class:
     55 
     56 ```{r}
     57 new_difftime <- function(x = double(), units = "secs") {
     58   # check inputs
     59   # issue generic system error if unexpected type or value
     60   stopifnot(is.double(x))
     61   units <- match.arg(units, c("secs", "mins", "hours", "days", "weeks"))
     62 
     63   # construct instance of target class
     64   structure(x,
     65     class = "difftime",
     66     units = units
     67   )
     68 }
     69 ```
     70 
     71 ### Validators
     72 
     73 Contrast a constructor, aimed at quickly creating instances of a class, which only checks type of inputs ...
     74 
     75 ```{r, error = TRUE}
     76 new_factor <- function(x = integer(), levels = character()) {
     77   stopifnot(is.integer(x))
     78   stopifnot(is.character(levels))
     79 
     80   structure(
     81     x,
     82     levels = levels,
     83     class = "factor"
     84   )
     85 }
     86 
     87 # error messages are for system default and developer-facing
     88 new_factor(1:5, "a")
     89 ```
     90 
     91 
     92 ... with a validator, aimed at emitting errors if inputs pose problems, which makes more expensive checks
     93 
     94 ```{r, error = TRUE}
     95 validate_factor <- function(x) {
     96   values <- unclass(x)
     97   levels <- attr(x, "levels")
     98 
     99   if (!all(!is.na(values) & values > 0)) {
    100     stop(
    101       "All `x` values must be non-missing and greater than zero",
    102       call. = FALSE
    103     )
    104   }
    105 
    106   if (length(levels) < max(values)) {
    107     stop(
    108       "There must be at least as many `levels` as possible values in `x`",
    109       call. = FALSE
    110     )
    111   }
    112 
    113   x
    114 }
    115 
    116 # error messages are informative and user-facing
    117 validate_factor(new_factor(1:5, "a"))
    118 ```
    119 
    120 Maybe there is a typo in the `validate_factor()` function? Do the integers need to start at 1 and be consecutive?  
    121 
    122 * If not, then `length(levels) < max(values)` should be `length(levels) < length(values)`, right?
    123 * If so, why do the integers need to start at 1 and be consecutive?  And if they need to be as such, we should tell the user, right?
    124 
    125 ```{r, error = TRUE}
    126 validate_factor(new_factor(1:3, levels = c("a", "b", "c")))
    127 validate_factor(new_factor(10:12, levels = c("a", "b", "c")))
    128 ```
    129 
    130 
    131 ### Helpers
    132 
    133 Some desired virtues:
    134 
    135 - Have the same name as the class
    136 - Call the constructor and validator, if the latter exists.
    137 - Issue error informative, user-facing error messages
    138 - Adopt thoughtful/useful defaults or type conversion
    139 
    140 
    141 Exercise 5 in 13.3.4
    142 
    143 Q: Read the documentation for `utils::as.roman()`. How would you write a constructor for this class? Does it need a validator? What might a helper do?
    144 
    145 A: This function transforms numeric input into Roman numbers. It is built on the integer type, which results in the following constructor.
    146  
    147  
    148 ```{r}
    149 new_roman <- function(x = integer()) {
    150   stopifnot(is.integer(x))
    151   structure(x, class = "roman")
    152 }
    153 ```
    154 
    155 The documentation tells us, that only values between 1 and 3899 are uniquely represented, which we then include in our validation function.
    156 
    157 ```{r}
    158 validate_roman <- function(x) {
    159   values <- unclass(x)
    160   
    161   if (any(values < 1 | values > 3899)) {
    162     stop(
    163       "Roman numbers must fall between 1 and 3899.",
    164       call. = FALSE
    165     )
    166   }
    167   x
    168 }
    169 ```
    170 
    171 For convenience, we allow the user to also pass real values to a helper function.
    172 
    173 ```{r, error = TRUE}
    174 roman <- function(x = integer()) {
    175   x <- as.integer(x)
    176   
    177   validate_roman(new_roman(x))
    178 }
    179 
    180 # Test
    181 roman(c(1, 753, 2024))
    182 
    183 roman(0)
    184 ```
    185 
    186 
    187 
    188 ## Generics and methods
    189 
    190 **Generic functions:**
    191 
    192 - Consist of a call to `UseMethod()`
    193 - Pass arguments from the generic to the dispatched method "auto-magically"
    194 
    195 ```{r}
    196 my_new_generic <- function(x) {
    197   UseMethod("my_new_generic")
    198 }
    199 ```
    200 
    201 ### Method dispatch
    202 
    203 - `UseMethod()` creates a vector of method names
    204 - Dispatch 
    205   - Examines all methods in the vector
    206   - Selects a method
    207 
    208 ```{r}
    209 x <- Sys.Date()
    210 sloop::s3_dispatch(print(x))
    211 ```
    212 
    213 ### Finding methods
    214 
    215 While `sloop::s3_dispatch()` gives the specific method selected for a specific call, on can see the methods defined:
    216 
    217 - For a generic
    218 ```{r}
    219 sloop::s3_methods_generic("mean")
    220 ```
    221 - For a class
    222 ```{r}
    223 sloop::s3_methods_class("ordered")
    224 ```
    225 
    226 ### Creating methods
    227 
    228 Two rules:
    229 
    230 - Only write a method if you own the generic. Otherwise, bad manners.
    231 - Method must have same arguments as its generic--with one important exception: `...`
    232 
    233 **Example from text:**
    234 
    235 I thought it would be good for us to work through this problem.
    236 
    237 > Carefully read the documentation for `UseMethod()` and explain why the following code returns the results that it does. What two usual rules of function evaluation does `UseMethod()` violate?
    238 
    239 ```{r}
    240 g <- function(x) {
    241   x <- 10
    242   y <- 10
    243   UseMethod("g")
    244 }
    245 g.default <- function(x) c(x = x, y = y)
    246 
    247 x <- 1
    248 y <- 1
    249 g(x)
    250 g.default(x)
    251 ```
    252 
    253 
    254 
    255 **Examples caught in the wild:**
    256 
    257 - [`haven::zap_label`](https://github.com/tidyverse/haven/blob/main/R/zap_label.R), which removes column labels
    258 - [`dplyr::mutate`](https://github.com/tidyverse/dplyr/blob/main/R/mutate.R)
    259 - [`tidyr::pivot_longer`](https://github.com/tidyverse/tidyr/blob/main/R/pivot-long.R)
    260 
    261 ## Object styles
    262 
    263 ## Inheritance
    264 
    265 Three ideas:
    266 
    267 1. Class is a vector of classes
    268 ```{r}
    269 class(ordered("x"))
    270 class(Sys.time())
    271 ```
    272 2. Dispatch moves through class vector until it finds a defined method
    273 ```{r}
    274 sloop::s3_dispatch(print(ordered("x")))
    275 ```
    276 3. Method can delegate to another method via `NextMethod()`, which is indicated by `->` as below:
    277 ```{r}
    278 sloop::s3_dispatch(ordered("x")[1])
    279 ```
    280 
    281 ### `NextMethod()`
    282 
    283 Consider `secret` class that masks each character of the input with `x` in output
    284 
    285 ```{r}
    286 new_secret <- function(x = double()) {
    287   stopifnot(is.double(x))
    288   structure(x, class = "secret")
    289 }
    290 
    291 print.secret <- function(x, ...) {
    292   print(strrep("x", nchar(x)))
    293   invisible(x)
    294 }
    295 
    296 y <- new_secret(c(15, 1, 456))
    297 y
    298 ```
    299 
    300 Notice that the `[` method is problematic in that it does not preserve the `secret` class.  Additionally, it returns `15` as the first element instead of `xx`.
    301 
    302 ```{r}
    303 sloop::s3_dispatch(y[1])
    304 y[1]
    305 ```
    306 
    307 Fix this with a `[.secret` method:
    308 
    309 The first fix (not run) is inefficient because it creates a copy of `y`.
    310 
    311 ```{r eval = FALSE}
    312 # not run
    313 `[.secret` <- function(x, i) {
    314   x <- unclass(x)
    315   new_secret(x[i])
    316 }
    317 ```
    318 
    319 `NextMethod()` is more efficient.
    320 
    321 ```{r}
    322 `[.secret` <- function(x, i) {
    323   # first, dispatch to `[`
    324   # then, coerce subset value to `secret` class
    325   new_secret(NextMethod())
    326 }
    327 ```
    328 
    329 Notice that `[.secret` is selected for dispatch, but that the method delegates to the internal `[`.
    330 
    331 ```{r}
    332 sloop::s3_dispatch(y[1])
    333 y[1]
    334 ```
    335 
    336 
    337 ### Allowing subclassing
    338 
    339 Continue the example above to have a `supersecret` subclass that hides even the number of characters in the input (e.g., `123` -> `xxxxx`, 12345678 -> `xxxxx`, 1 -> `xxxxx`).
    340 
    341 To allow for this subclass, the constructor function needs to include two additional arguments:
    342 
    343 - `...` for passing an arbitrary set of arguments to different subclasses
    344 - `class` for defining the subclass
    345 
    346 ```{r}
    347 new_secret <- function(x, ..., class = character()) {
    348   stopifnot(is.double(x))
    349 
    350   structure(
    351     x,
    352     ...,
    353     class = c(class, "secret")
    354   )
    355 }
    356 ```
    357 
    358 To create the subclass, simply invoke the parent class constructor inside of the subclass constructor:
    359 
    360 ```{r}
    361 new_supersecret <- function(x) {
    362   new_secret(x, class = "supersecret")
    363 }
    364 
    365 print.supersecret <- function(x, ...) {
    366   print(rep("xxxxx", length(x)))
    367   invisible(x)
    368 }
    369 ```
    370 
    371 But this means the subclass inherits all parent methods and needs to overwrite all parent methods with subclass methods that return the sublclass rather than the parent class.
    372 
    373 There's no easy solution to this problem in base R.
    374 
    375 There is a solution in the vectors package: `vctrs::vec_restore()`
    376 
    377 <!-- TODO: read docs/vignettes to be able to summarize how this works -->