bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git

12.Rmd (7739B)


---
engine: knitr
title: Base types
---

## Learning objectives:

- Understand what OOP means--at the very least for R
- Know how to discern an object's nature--base or OO--and type

![John Chambers, creator of S programming language](images/base_types/john_chambers_about_objects.png)

<details>
<summary>Session Info</summary>
```{r}
library("DiagrammeR")
```

```{r}
utils::sessionInfo()
```

</details>


## Why OOP is hard in R

- Multiple OOP systems exist: S3, R6, S4, and (now/soon) S7.
- Multiple preferences: some users prefer one system; others, another.
- R's OOP systems are different enough that prior OOP experience may not transfer well.

[![XKCD 927](images/base_types/standards.png)](https://xkcd.com/927/)


## OOP: Big Ideas

1. **Polymorphism.** A function has a single interface (outside) but contains (inside) several class-specific implementations.
```{r, eval=FALSE}
# imagine a function with object x as an argument
# from the outside, users interact with the same function
# but inside the function, there are provisions to deal with objects of different classes
some_function <- function(x) {
  if (is.numeric(x)) {
    # implementation for numeric x
  } else if (is.character(x)) {
    # implementation for character x
  } # ... and so on for other classes
}
```

<details>
<summary>Example of polymorphism</summary>

```{r polymorphism_example}
# data frame
summary(mtcars[, 1:4])

# statistical model
lin_fit <- lm(mpg ~ hp, data = mtcars)
summary(lin_fit)
```

</details>

2. **Encapsulation.** An object "encapsulates"--that is, encloses in an inviolate capsule--both data and the methods that act on that data. Think of a REST API: a client interacts with an API only through a set of discrete endpoints (i.e., things to get or set), but the server does not otherwise give access to its internal workings or state. As with an API, this creates a separation of concerns: users pass inputs through a standard interface and consume the results, without needing to know the object's internals.
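
In R, the polymorphism idea above is usually written with a generic function and methods rather than `if`/`else` branches. A minimal sketch, assuming a hypothetical generic called `describe()` (it is not part of base R or of the original material):

```r
# Hypothetical generic: one interface, several class-specific implementations
describe <- function(x) UseMethod("describe")

# One method per class; describe.default() catches everything else
describe.numeric <- function(x) paste("numeric with mean", mean(x))
describe.character <- function(x) paste("character with", length(x), "elements")
describe.default <- function(x) "something else"

# Same call from the outside, different implementation on the inside
num_result <- describe(c(1, 2, 3))
chr_result <- describe(c("a", "b"))
other_result <- describe(NULL)
```

Adding support for a new class then means adding a new method, without touching the existing ones.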

## OOP: Properties

### Objects have class

- Class defines:
  - Methods (i.e., what can be done with the object)
  - Fields (i.e., data that define an instance of the class)
- An object is an instance of a class

### Class is inherited

- An object's behavior is defined:
  - By the object's own class (e.g., ordered factor)
  - By the parent of the object's class (e.g., factor)
- Inheritance matters for method dispatch
  - If a method is defined for an object's class, use that method
  - If no method is defined for the object's class, use the method of the parent class
  - The process of finding a method is called dispatch
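
The dispatch rules above can be sketched with an ordered factor, whose class vector lists the child class first and the parent second. The generic `n_levels()` and its methods are hypothetical names made up for this illustration:

```r
# An ordered factor inherits from factor
sizes <- factor(c("small", "large"), levels = c("small", "large"), ordered = TRUE)
sizes_class <- class(sizes)

# Hypothetical generic with a method for the parent class only
n_levels <- function(x) UseMethod("n_levels")
n_levels.factor <- function(x) paste("factor with", nlevels(x), "levels")

# No n_levels.ordered() exists yet, so dispatch falls back to n_levels.factor()
fallback_result <- n_levels(sizes)

# Once a method for the child class exists, dispatch finds it first;
# NextMethod() then delegates to the parent's method
n_levels.ordered <- function(x) paste("ordered", NextMethod())
child_result <- n_levels(sizes)
```

Here `sizes_class` is `c("ordered", "factor")`, which is the order in which dispatch looks for methods.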

## OOP in R: Two Paradigms

**1. Encapsulated OOP**

- Objects "encapsulate"
  - Methods (i.e., what can be done)
  - Fields (i.e., data on which things are done)
- Calls communicate this encapsulation, since form follows function
  - Form: `object.method(arg1, arg2)`
  - Function: for `object`, apply `method` for `object`'s class with arguments `arg1` and `arg2`

**2. Functional OOP**

- Methods belong to "generic" functions
- From the outside, calls look like regular function calls: `generic(object, arg2, arg3)`
- From the inside, components are also functions
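
To compare the two call forms side by side, here is a rough sketch using only tools that ship with R: a reference class for the encapsulated style (where R spells the call `object$method(arg)` rather than `object.method(arg)`) and an S3 generic for the functional style. The names `Counter`, `increment()` and `tally` are hypothetical:

```r
# Encapsulated OOP: methods live inside the object (RC, from the methods package)
Counter <- methods::setRefClass(
  "Counter",
  fields = list(count = "numeric"),
  methods = list(
    add = function(n = 1) {
      count <<- count + n   # modifies the object in place
      invisible(.self)
    }
  )
)
counter <- Counter$new(count = 0)
counter$add(2)              # form: object$method(arg)
rc_count <- counter$count

# Functional OOP: methods belong to a generic function (S3)
increment <- function(x, n = 1) UseMethod("increment")
increment.tally <- function(x, n = 1) {
  x$count <- x$count + n    # modifies a copy, not the caller's object
  x
}
tally <- structure(list(count = 0), class = "tally")
tally2 <- increment(tally, 2)   # form: generic(object, arg)
```

After running this, `counter` itself has changed, whereas `tally` is untouched and the updated value lives in `tally2`.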

### Concept Map

```{r, echo = FALSE, eval = TRUE}
DiagrammeR::mermaid("
graph LR

OOP --> encapsulated_OOP
OOP --> functional_OOP

functional_OOP --> S3
functional_OOP --> S4

encapsulated_OOP --> R6
encapsulated_OOP --> RC
")
```

<details>
<summary>Mermaid code</summary>
```{r, echo = TRUE, eval = FALSE}
DiagrammeR::mermaid("
graph LR

OOP --> encapsulated_OOP
OOP --> functional_OOP

functional_OOP --> S3
functional_OOP --> S4

encapsulated_OOP --> R6
encapsulated_OOP --> RC
")
```
</details>

## OOP in base R

- **S3**
  - Paradigm: functional OOP
  - Noteworthy: R's first OOP system
  - Use case: low-cost solution for common problems
  - Downsides: no guarantees (relies on conventions rather than formal definitions)
- **S4**
  - Paradigm: functional OOP
  - Noteworthy: formal rewrite of S3, used by `Bioconductor`
  - Use case: "more guarantees and greater encapsulation" than S3
  - Downsides: higher setup cost than S3
- **RC** (reference classes)
  - Paradigm: encapsulated OOP
  - Noteworthy: a special type of S4 object that is mutable--in other words, it can be modified in place (instead of following R's usual copy-on-modify behavior)
  - Use cases: problems that are hard to tackle with functional OOP (in S3 and S4)
  - Downsides: harder to reason about (because of modify-in-place logic)
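
The "no guarantees" versus "more guarantees" contrast between S3 and S4 can be sketched as follows. The `Person` class and its validity rule are hypothetical, chosen only for illustration:

```r
# S3: the class is just an attribute, so nothing stops an invalid object
bad_date <- structure("not a date", class = "Date")
s3_accepts_anything <- inherits(bad_date, "Date")

# S4: a formal definition with typed slots and a validity check
methods::setClass(
  "Person",
  slots = c(name = "character", age = "numeric"),
  validity = function(object) {
    if (length(object@age) == 1 && object@age >= 0) {
      TRUE
    } else {
      "age must be a single non-negative number"
    }
  }
)
alice <- methods::new("Person", name = "Alice", age = 30)

# A failed validity check raises an error at construction time
s4_error <- tryCatch(
  methods::new("Person", name = "Bob", age = -1),
  error = function(e) conditionMessage(e)
)
```

The extra setup (`setClass()`, slots, validity) is the "higher setup cost" listed above; the payoff is that malformed objects are rejected when they are created.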

## OOP in packages

- **R6**
  - Paradigm: encapsulated OOP
  - Noteworthy: resolves issues with RC
- **S7** (formerly R7)
  - Paradigm: functional OOP
  - Noteworthy:
    - best parts of S3 and S4
    - ease of S3
    - power of S4
    - See more in [rstudio::conf(2022) talk](https://www.rstudio.com/conference/2022/talks/introduction-to-r7/)
- **R.oo**
  - Paradigm: hybrid functional and encapsulated (?)
- **proto**
  - Paradigm: prototype OOP
  - Noteworthy: OOP style used in `ggplot2`

## How can you tell if an object is base or OOP?

### Functions

Two functions:

- `base::is.object()`, which returns `TRUE` or `FALSE` depending on whether the object is an OO object
- `sloop::otype()`, which reports the kind of object: `"base"`, `"S3"`, etc.

A few examples:

```{r}
# Example 1: a base object
is.object(1:10)
sloop::otype(1:10)

# Example 2: an OO object
is.object(mtcars)
sloop::otype(mtcars)
```

### sloop

* **S** **L**anguage **O**bject-**O**riented **P**rogramming

[![Sloop John B](images/base_types/sloop_john_b.png)](https://en.wikipedia.org/wiki/Sloop_John_B)

### Class

OO objects have a "class" attribute:

```{r}
# base object has no class
attr(1:10, "class")

# OO object has one or more classes
attr(mtcars, "class")
```
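
One subtlety worth a quick sketch: `class()` still returns something for a base object, because it falls back to an implicit class when the attribute is missing. The variable names below are only for illustration:

```r
# class() reports an implicit class even when there is no "class" attribute
implicit_class <- class(1:10)
has_class_attr <- !is.null(attr(1:10, "class"))

# Removing the attribute turns an OO object back into a base object
plain_list <- unclass(mtcars)
was_oo <- is.object(mtcars)
still_oo <- is.object(plain_list)
```

So `attr(x, "class")` (or `is.object()`) is the safer check for whether an object is OO; `class(x)` alone cannot tell the two apart.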

## What about types?

Only OO objects have a "class" attribute, but every object--whether base or OO--has a base type, which `typeof()` reports.
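
As a small sketch of that point, `typeof()` can be applied to familiar OO objects to see the base type they are built on (the variable names are just for illustration):

```r
# OO objects are built on top of base types
df_type <- typeof(mtcars)                    # a data frame is a list
factor_type <- typeof(factor("x"))           # a factor is an integer vector
date_type <- typeof(as.Date("2024-01-01"))   # a Date is a double

# Base objects have a type too, just no class attribute
int_type <- typeof(1:10)
```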

### Vectors

```{r}
typeof(NULL)
typeof(c("a", "b", "c"))
typeof(1L)
typeof(1i)
```


### Functions

```{r}
# "normal" function
my_fun <- function(x) { x + 1 }
typeof(my_fun)
# internal function
typeof(`[`)
# primitive function
typeof(sum)
```

### Environments

```{r}
typeof(globalenv())
```


### S4

```{r}
mle_obj <- stats4::mle(function(x = 1) (x - 2) ^ 2)
typeof(mle_obj)
```


### Language components

```{r}
typeof(quote(a))
typeof(quote(a + 1))
typeof(formals(my_fun))
```

### Concept Map

![Base types in R](images/base_types/base_types_Sankey_graph.png)

<details>
<summary>Sankey graph code</summary>

The graph above was made with [SankeyMATIC](https://sankeymatic.com/)

```
// toggle "Show Values"
// set Default Flow Colors from "each flow's Source"

base\ntypes [8] vectors
base\ntypes [3] functions
base\ntypes [1] environments
base\ntypes [1] S4 OOP
base\ntypes [3] language\ncomponents
base\ntypes [6] C components

vectors [1] NULL
vectors [1] logical
vectors [1] integer
vectors [1] double
vectors [1] complex
vectors [1] character
vectors [1] list
vectors [1] raw

functions [1] closure
functions [1] special
functions [1] builtin

environments [1] environment

S4 OOP [1] S4

language\ncomponents [1] symbol
language\ncomponents [1] language
language\ncomponents [1] pairlist

C components [1] externalptr
C components [1] weakref
C components [1] bytecode
C components [1] promise
C components [1] ...
C components [1] any
```

</details>

## Be careful about the numeric type

1. Often "numeric" is treated as synonymous with double:

```{r}
# create a double and an integer object
one <- 1
oneL <- 1L
typeof(one)
typeof(oneL)

# check their type after as.numeric()
one |> as.numeric() |> typeof()
oneL |> as.numeric() |> typeof()
```

2. In S3 and S4, "numeric" is taken as either integer or double when choosing methods:

```{r}
sloop::s3_class(1)
sloop::s3_class(1L)
```

3. `is.numeric()` tests whether an object behaves like a number:

```{r}
typeof(factor("x"))
is.numeric(factor("x"))
```

But Advanced R consistently uses numeric to mean an object of type integer or double.
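
The second and third meanings above can be checked with base R alone. In this sketch the generic `kind()` and its methods are hypothetical; the point is that a single `numeric` method is picked up for both integer and double inputs:

```r
# is.numeric() is TRUE for both integer and double vectors...
int_is_numeric <- is.numeric(1L)
dbl_is_numeric <- is.numeric(1)
# ...but FALSE for a factor, even though it is built on an integer vector
fct_is_numeric <- is.numeric(factor("x"))

# In S3 dispatch, a "numeric" method covers integer and double inputs
kind <- function(x) UseMethod("kind")
kind.numeric <- function(x) "numeric method"
kind.default <- function(x) "default method"

kind_int <- kind(1L)
kind_dbl <- kind(1)
kind_chr <- kind("a")
```

Both `kind_int` and `kind_dbl` come from `kind.numeric()`, while `kind_chr` falls through to `kind.default()`.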