bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git

html.json (14990B)


{
  "hash": "a94a54d500276d02c4752c2293baa2b5",
  "result": {
    "engine": "knitr",
    "markdown": "---\nengine: knitr\ntitle: Base types\n---\n\n## Learning objectives:\n\n- Understand what OOP means--at the very least for R\n- Know how to discern an object's nature--base or OO--and type\n\n![John Chambers, creator of S programming language](images/base_types/john_chambers_about_objects.png)\n\n<details>\n<summary>Session Info</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"DiagrammeR\")\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nutils::sessionInfo()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> R version 4.5.1 (2025-06-13 ucrt)\n#> Platform: x86_64-w64-mingw32/x64\n#> Running under: Windows 11 x64 (build 26100)\n#> \n#> Matrix products: default\n#>   LAPACK version 3.12.1\n#> \n#> locale:\n#> [1] LC_COLLATE=English_United States.utf8 \n#> [2] LC_CTYPE=English_United States.utf8   \n#> [3] LC_MONETARY=English_United States.utf8\n#> [4] LC_NUMERIC=C                          \n#> [5] LC_TIME=English_United States.utf8    \n#> \n#> time zone: America/Chicago\n#> tzcode source: internal\n#> \n#> attached base packages:\n#> [1] stats     graphics  grDevices utils     datasets  methods   base     \n#> \n#> other attached packages:\n#> [1] DiagrammeR_1.0.11\n#> \n#> loaded via a namespace (and not attached):\n#>  [1] digest_0.6.37      RColorBrewer_1.1-3 R6_2.6.1           fastmap_1.2.0     \n#>  [5] xfun_0.52          magrittr_2.0.3     glue_1.8.0         knitr_1.50        \n#>  [9] htmltools_0.5.8.1  rmarkdown_2.29     cli_3.6.5          visNetwork_2.1.2  \n#> [13] compiler_4.5.1     tools_4.5.1        evaluate_1.0.4     yaml_2.3.10       \n#> [17] rlang_1.1.6        jsonlite_2.0.0     htmlwidgets_1.6.4  keyring_1.4.1\n```\n\n\n:::\n:::\n\n\n</details>\n\n\n## Why OOP is hard in R\n\n- Multiple OOP systems exist: S3, R6, S4, and (now/soon) S7.\n- Multiple preferences: some users prefer one system; others, another.\n- R's OOP systems are different enough that prior OOP experience may not transfer well.\n\n[![XKCD 
927](images/base_types/standards.png)](https://xkcd.com/927/)\n\n\n## OOP: Big Ideas\n\n1. **Polymorphism.** A function has a single interface (outside) but contains (inside) several class-specific implementations.\n\n::: {.cell}\n\n```{.r .cell-code}\n# imagine a function with object x as an argument\n# from the outside, users interact with the same function\n# but inside the function, there are provisions to deal with objects of different classes\nsome_function <- function(x) {\n  if (is.numeric(x)) {\n    # implementation for numeric x\n  } else if (is.character(x)) {\n    # implementation for character x\n  } else {\n    # ... and so on for other classes\n  }\n}\n```\n:::\n\n\n<details>\n<summary>Example of polymorphism</summary>\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# data frame\nsummary(mtcars[,1:4])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>       mpg             cyl             disp             hp       \n#>  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  \n#>  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  \n#>  Median :19.20   Median :6.000   Median :196.3   Median :123.0  \n#>  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  \n#>  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  \n#>  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0\n```\n\n\n:::\n\n```{.r .cell-code}\n# statistical model\nlin_fit <- lm(mpg ~ hp, data = mtcars)\nsummary(lin_fit)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> \n#> Call:\n#> lm(formula = mpg ~ hp, data = mtcars)\n#> \n#> Residuals:\n#>     Min      1Q  Median      3Q     Max \n#> -5.7121 -2.1122 -0.8854  1.5819  8.2360 \n#> \n#> Coefficients:\n#>             Estimate Std. Error t value Pr(>|t|)    \n#> (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***\n#> hp          -0.06823    0.01012  -6.742 1.79e-07 ***\n#> ---\n#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n#> \n#> Residual standard error: 3.863 on 30 degrees of freedom\n#> Multiple R-squared:  0.6024,\tAdjusted R-squared:  0.5892 \n#> F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07\n```\n\n\n:::\n:::\n\n\n</details>\n\n2. **Encapsulation.** An object \"encapsulates\"--that is, encloses in an inviolate capsule--both data and the methods that act on that data. Think of a REST API: a client interacts with an API only through a set of discrete endpoints (i.e., things to get or set), but the server does not otherwise give access to its internal workings or state. As with an API, this creates a separation of concerns: OOP functions take inputs and yield results; users only consume those results.\n\n## OOP: Properties\n\n### Objects have class\n\n- Class defines:\n  - Methods (i.e., what can be done with the object)\n  - Fields (i.e., data that defines an instance of the class)\n- An object is an instance of a class\n\n### Class is inherited\n\n- Class is defined:\n  - By an object's class (e.g., ordered factor)\n  - By the parent of the object's class (e.g., factor)\n- Inheritance matters for method dispatch\n  - If a method is defined for an object's class, use that method\n  - If an object's class doesn't have a method, use the method of the parent class\n  - The process of finding a method is called dispatch\n\n## OOP in R: Two Paradigms\n\n**1. Encapsulated OOP**\n\n- Objects \"encapsulate\"\n  - Methods (i.e., what can be done)\n  - Fields (i.e., data on which things are done)\n- Calls communicate this encapsulation, since form follows function\n  - Form: `object.method(arg1, arg2)`\n  - Function: for `object`, apply `method` for `object`'s class with arguments `arg1` and `arg2`\n\n**2. 
Functional OOP**\n\n- Methods belong to \"generic\" functions\n- From the outside, calls look like regular function calls: `generic(object, arg2, arg3)`\n- From the inside, components are also functions\n\n### Concept Map\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n```{=html}\n<div class=\"DiagrammeR html-widget html-fill-item\" id=\"htmlwidget-f19ffbf1ff0c903a1236\" style=\"width:100%;height:464px;\"></div>\n<script type=\"application/json\" data-for=\"htmlwidget-f19ffbf1ff0c903a1236\">{\"x\":{\"diagram\":\"\\ngraph LR\\n\\nOOP --> encapsulated_OOP\\nOOP --> functional_OOP\\n\\nfunctional_OOP --> S3\\nfunctional_OOP --> S4\\n\\nencapsulated_OOP --> R6\\nencapsulated_OOP --> RC\\n\"},\"evals\":[],\"jsHooks\":[]}</script>\n```\n\n:::\n:::\n\n\n<details>\n<summary>Mermaid code</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nDiagrammeR::mermaid(\"\ngraph LR\n\nOOP --> encapsulated_OOP\nOOP --> functional_OOP\n\nfunctional_OOP --> S3\nfunctional_OOP --> S4\n\nencapsulated_OOP --> R6\nencapsulated_OOP --> RC\n\")\n```\n:::\n\n</details>\n\n## OOP in base R\n\n- **S3**\n  - Paradigm: functional OOP\n  - Noteworthy: R's first OOP system\n  - Use case: low-cost solution for common problems\n  - Downsides: no guarantees\n- **S4**\n  - Paradigm: functional OOP\n  - Noteworthy: rewrite of S3, used by `Bioconductor`\n  - Use case: \"more guarantees and greater encapsulation\" than S3\n  - Downsides: higher setup cost than S3\n- **RC**\n  - Paradigm: encapsulated OOP\n  - Noteworthy: a special type of S4 object that is mutable--in other words, it can be modified in place (instead of R's usual copy-on-modify behavior)\n  - Use cases: problems that are hard to tackle with functional OOP (in S3 and S4)\n  - Downsides: harder to reason about (because of modify-in-place logic)\n\n## OOP in packages\n\n- **R6**\n  - Paradigm: encapsulated OOP\n  - Noteworthy: resolves issues with RC\n- **S7** (formerly R7)\n  - Paradigm: functional OOP\n  - Noteworthy: \n    - best parts of S3 and S4\n    - ease of S3\n    - 
power of S4\n    - See more in [rstudio::conf(2022) talk](https://www.rstudio.com/conference/2022/talks/introduction-to-r7/)\n- **R.oo**\n  - Paradigm: hybrid functional and encapsulated (?)\n- **proto**\n  - Paradigm: prototype OOP\n  - Noteworthy: OOP style used in `ggplot2`\n\n## How can you tell if an object is base or OOP?\n\n### Functions\n\nTwo functions:\n\n- `base::is.object()`, which returns TRUE or FALSE depending on whether the object is an OO object\n- `sloop::otype()`, which reports the type of object: `\"base\"`, `\"S3\"`, etc.\n\nA few examples:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Example 1: a base object\nis.object(1:10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n\n```{.r .cell-code}\nsloop::otype(1:10)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"base\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# Example 2: an OO object\nis.object(mtcars)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nsloop::otype(mtcars)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"S3\"\n```\n\n\n:::\n:::\n\n\n### sloop\n\n* **S** **L**anguage **O**bject-**O**riented **P**rogramming\n\n[![Sloop John B](images/base_types/sloop_john_b.png)](https://en.wikipedia.org/wiki/Sloop_John_B)\n\n### Class\n\nOO objects have a \"class\" attribute:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# base object has no class attribute\nattr(1:10, \"class\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n\n```{.r .cell-code}\n# OO object has one or more classes\nattr(mtcars, \"class\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"data.frame\"\n```\n\n\n:::\n:::\n\n\n## What about types?\n\nOnly OO objects have a \"class\" attribute, but every object--whether base or OO--has a base type.\n\n### Vectors\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(NULL)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"NULL\"\n```\n\n\n:::\n\n```{.r 
.cell-code}\ntypeof(c(\"a\", \"b\", \"c\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"character\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(1L)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(1i)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"complex\"\n```\n\n\n:::\n:::\n\n\n\n### Functions\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# \"normal\" function\nmy_fun <- function(x) { x + 1 }\ntypeof(my_fun)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"closure\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# internal function\ntypeof(`[`)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"special\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# primitive function\ntypeof(sum)    \n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"builtin\"\n```\n\n\n:::\n:::\n\n\n### Environments\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(globalenv())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"environment\"\n```\n\n\n:::\n:::\n\n\n\n### S4\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmle_obj <- stats4::mle(function(x = 1) (x - 2) ^ 2)\ntypeof(mle_obj)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"S4\"\n```\n\n\n:::\n:::\n\n\n\n### Language components\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(quote(a))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"symbol\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(quote(a + 1))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"language\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(formals(my_fun))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"pairlist\"\n```\n\n\n:::\n:::\n\n\n### Concept Map\n\n![Base types in R](images/base_types/base_types_Sankey_graph.png)\n\n<details>\n<summary>Sankey graph code</summary>\n\nThe graph above was made with [SankeyMATIC](https://sankeymatic.com/)\n\n```\n// toggle \"Show Values\"\n// set 
Default Flow Colors from \"each flow's Source\"\n\nbase\\ntypes [8] vectors\nbase\\ntypes [3] functions\nbase\\ntypes [1] environments\nbase\\ntypes [1] S4 OOP\nbase\\ntypes [3] language\\ncomponents\nbase\\ntypes [6] C components\n\nvectors [1] NULL\nvectors [1] logical\nvectors [1] integer\nvectors [1] double\nvectors [1] complex\nvectors [1] character\nvectors [1] list\nvectors [1] raw\n\nfunctions [1] closure\nfunctions [1] special\nfunctions [1] builtin\n\nenvironments [1] environment\n\nS4 OOP [1] S4\n\nlanguage\\ncomponents [1] symbol\nlanguage\\ncomponents [1] language\nlanguage\\ncomponents [1] pairlist\n\nC components [1] externalptr\nC components [1] weakref\nC components [1] bytecode\nC components [1] promise\nC components [1] ...\nC components [1] any\n```\n\n</details>\n\n## Be careful about the numeric type\n\n1. Often \"numeric\" is treated as a synonym for double:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# create a double and an integer object\none <- 1\noneL <- 1L\ntypeof(one)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(oneL)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# check their type after as.numeric()\none |> as.numeric() |> typeof()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n\n```{.r .cell-code}\noneL |> as.numeric() |> typeof()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n:::\n\n\n2. In S3 and S4, \"numeric\" is shorthand for either integer or double when choosing methods:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsloop::s3_class(1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"  \"numeric\"\n```\n\n\n:::\n\n```{.r .cell-code}\nsloop::s3_class(1L)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\" \"numeric\"\n```\n\n\n:::\n:::\n\n\n3. 
`is.numeric()` tests whether an object behaves like a number\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(factor(\"x\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nis.numeric(factor(\"x\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n:::\n\n\nBut Advanced R consistently uses numeric to mean integer or double type.\n",
    "supporting": [
      "12_files"
    ],
    "filters": [
      "rmarkdown/pagebreak.lua"
    ],
    "includes": {
      "include-in-header": [
        "<link href=\"../site_libs/htmltools-fill-0.5.8.1/fill.css\" rel=\"stylesheet\" />\n<script src=\"../site_libs/htmlwidgets-1.6.4/htmlwidgets.js\"></script>\n<script src=\"../site_libs/d3-3.3.8/d3.min.js\"></script>\n<script src=\"../site_libs/dagre-0.4.0/dagre-d3.min.js\"></script>\n<link href=\"../site_libs/mermaid-0.3.0/dist/mermaid.css\" rel=\"stylesheet\" />\n<script src=\"../site_libs/mermaid-0.3.0/dist/mermaid.slim.min.js\"></script>\n<link href=\"../site_libs/DiagrammeR-styles-0.2/styles.css\" rel=\"stylesheet\" />\n<script src=\"../site_libs/chromatography-0.1/chromatography.js\"></script>\n<script src=\"../site_libs/DiagrammeR-binding-1.0.11/DiagrammeR.js\"></script>\n"
      ]
    },
    "engineDependencies": {},
    "preserve": {},
    "postProcess": true
  }
}