bookclub-advr

DSLC Advanced R Book Club
git clone https://git.eamoncaddigan.net/bookclub-advr.git

html.json (100881B)


{
  "hash": "f9d9a682b3ccae551ce90320fcaa1d33",
  "result": {
    "engine": "knitr",
    "markdown": "---\nengine: knitr\ntitle: Vectors\n---\n\n## Learning objectives:\n\n-   Learn about different types of vectors and their attributes\n-   Navigate through vector types and their value types\n-   Venture into factors and date-time objects\n-   Discuss the differences between data frames and tibbles\n-   Do not get absorbed by the `NA` and `NULL` black hole\n\n\n## Session Info\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"dplyr\")\nlibrary(\"gt\")\nlibrary(\"palmerpenguins\")\n```\n:::\n\n\n\n<details>\n<summary>Session Info</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nutils::sessionInfo()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> R version 4.5.1 (2025-06-13 ucrt)\n#> Platform: x86_64-w64-mingw32/x64\n#> Running under: Windows 11 x64 (build 26100)\n#> \n#> Matrix products: default\n#>   LAPACK version 3.12.1\n#> \n#> locale:\n#> [1] LC_COLLATE=English_United States.utf8 \n#> [2] LC_CTYPE=English_United States.utf8   \n#> [3] LC_MONETARY=English_United States.utf8\n#> [4] LC_NUMERIC=C                          \n#> [5] LC_TIME=English_United States.utf8    \n#> \n#> time zone: America/Chicago\n#> tzcode source: internal\n#> \n#> attached base packages:\n#> [1] stats     graphics  grDevices utils     datasets  methods   base     \n#> \n#> other attached packages:\n#> [1] palmerpenguins_0.1.1 gt_1.0.0             dplyr_1.1.4         \n#> \n#> loaded via a namespace (and not attached):\n#>  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     tidyselect_1.2.1 \n#>  [5] xfun_0.53         magrittr_2.0.3    glue_1.8.0        tibble_3.3.0     \n#>  [9] knitr_1.50        pkgconfig_2.0.3   htmltools_0.5.8.1 rmarkdown_2.29   \n#> [13] generics_0.1.4    lifecycle_1.0.4   xml2_1.3.8        cli_3.6.5        \n#> [17] vctrs_0.6.5       compiler_4.5.1    tools_4.5.1       pillar_1.11.0    \n#> [21] evaluate_1.0.4    yaml_2.3.10       rlang_1.1.6       jsonlite_2.0.0   \n#> [25] keyring_1.4.1\n```\n\n\n:::\n:::\n\n</details>\n\n## 
Aperitif\n\n![Palmer Penguins](images/vectors/lter_penguins.png)\n\n## Counting Penguins\n\nThis code counts the Gentoo penguins in the `penguins` data set: there are 124 of them.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsum(\"Gentoo\" == penguins$species)\n# output: 124\n```\n:::\n\n\n## In\n\nA subtle error can arise if we try `%in%` here instead.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nspecies_vector <- penguins |> select(species)\nprint(\"Gentoo\" %in% species_vector)\n# output: FALSE\n```\n:::\n\n\n![Where did the penguins go?](images/vectors/lter_penguins_no_gentoo.png)\n\n## Fix: base R\n\nFlatten the one-column tibble into a vector with base R's `unlist()`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nspecies_unlist <- penguins |> select(species) |> unlist()\nprint(\"Gentoo\" %in% species_unlist)\n# output: TRUE\n```\n:::\n\n\n## Fix: dplyr\n\nOr extract the column as a vector with dplyr's `pull()`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nspecies_pull <- penguins |> select(species) |> pull()\nprint(\"Gentoo\" %in% species_pull)\n# output: TRUE\n```\n:::\n\n\n## Motivation\n\n* What are the different types of vectors?\n* How does this affect accessing vectors?\n\n<details>\n<summary>Side Quest: Looking up the `%in%` operator</summary>\nTo look up the help page for the `%in%` operator with `?`, wrap the operator in backticks:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n?`%in%`\n```\n:::\n\n\nThe help page shows that `%in%` is a wrapper around the `match()` function.\n\n</details>\n\n\n## Types of Vectors\n\n![Image Credit: Advanced R](images/vectors/summary-tree.png) \n\nTwo main types:\n\n-   **Atomic**: All elements have the same type.\n-   **List**: Elements can have different types.\n\nClosely related but not technically a vector:\n\n-   **NULL**: Represents the absence of a vector; always length zero.\n\n\n## Types of Atomic Vectors (1/2)\n\n![Image Credit: Advanced R](images/vectors/summary-tree-atomic.png){width=50%} \n\n## Types of Atomic Vectors (2/2)\n\n-   **Logical**: `TRUE`/`FALSE`\n-   **Integer**: Numeric (discrete, no decimals)\n-   **Double**: Numeric (continuous, decimals)\n-   **Character**: String\n\n## Vectors of Length One\n\n**Scalars** are vectors that consist of a single value.\n\n## Logicals\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlgl1 <- TRUE\nlgl2 <- T # abbreviation for TRUE\nlgl3 <- FALSE\nlgl4 <- F # abbreviation for FALSE\n```\n:::\n\n\n## Doubles\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# integer, decimal, scientific, or hexadecimal format\ndbl1 <- 1 # integer format\ndbl2 <- 1.234 # decimal\ndbl3 <- 1.234e0 # scientific format\ndbl4 <- 0xcafe # hexadecimal format\n```\n:::\n\n\n## Integers\n\nInteger literals end in `L` and cannot have fractional values.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nint1 <- 1L\nint2 <- 1234L\nint3 <- 1234e0L\nint4 <- 0xcafeL\n```\n:::\n\n\n<details>\n<summary>Pop Quiz: Why \"L\" for integers?</summary>\nWickham notes that the use of `L` dates back to the **C** programming language and its \"long int\" type for memory allocation.\n</details>\n\n## Strings\n\nStrings can use single or double quotes, and special characters are escaped with `\\`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr1 <- \"hello\" # double quotes\nstr2 <- 'hello' # single quotes\nstr3 <- \"مرحبًا\" # Unicode\nstr4 <- \"\\U0001f605\" # sweaty_smile 😅\n```\n:::\n\n\n## Longer 1/2\n\nThere are several ways to make longer vectors:\n\n**1. With single values** inside `c()`, short for combine.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlgl_var <- c(TRUE, FALSE)\nint_var <- c(1L, 6L, 10L)\ndbl_var <- c(1, 2.5, 4.5)\nchr_var <- c(\"these are\", \"some strings\")\n```\n:::\n\n\n![Image Credit: Advanced R](images/vectors/atomic.png) \n\n## Longer 2/2\n\n**2. 
With other vectors**\n\n\n::: {.cell}\n\n```{.r .cell-code}\nc(c(1, 2), c(3, 4)) # output is not nested\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3 4\n```\n\n\n:::\n:::\n\n\n## Type and Length\n\nWe can determine the type of a vector with `typeof()` and its length with `length()`\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n```{=html}\n<div id=\"scliwijqex\" style=\"padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;\">\n<style>#scliwijqex table {\n  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';\n  -webkit-font-smoothing: antialiased;\n  -moz-osx-font-smoothing: grayscale;\n}\n\n#scliwijqex thead, #scliwijqex tbody, #scliwijqex tfoot, #scliwijqex tr, #scliwijqex td, #scliwijqex th {\n  border-style: none;\n}\n\n#scliwijqex p {\n  margin: 0;\n  padding: 0;\n}\n\n#scliwijqex .gt_table {\n  display: table;\n  border-collapse: collapse;\n  line-height: normal;\n  margin-left: auto;\n  margin-right: auto;\n  color: #333333;\n  font-size: 16px;\n  font-weight: normal;\n  font-style: normal;\n  background-color: #FFFFFF;\n  width: auto;\n  border-top-style: solid;\n  border-top-width: 2px;\n  border-top-color: #A8A8A8;\n  border-right-style: none;\n  border-right-width: 2px;\n  border-right-color: #D3D3D3;\n  border-bottom-style: solid;\n  border-bottom-width: 2px;\n  border-bottom-color: #A8A8A8;\n  border-left-style: none;\n  border-left-width: 2px;\n  border-left-color: #D3D3D3;\n}\n\n#scliwijqex .gt_caption {\n  padding-top: 4px;\n  padding-bottom: 4px;\n}\n\n#scliwijqex .gt_title {\n  color: #333333;\n  font-size: 125%;\n  font-weight: initial;\n  padding-top: 4px;\n  padding-bottom: 4px;\n  padding-left: 5px;\n  padding-right: 5px;\n  border-bottom-color: #FFFFFF;\n  border-bottom-width: 0;\n}\n\n#scliwijqex .gt_subtitle {\n  color: #333333;\n  font-size: 
85%;\n  font-weight: initial;\n  padding-top: 3px;\n  padding-bottom: 5px;\n  padding-left: 5px;\n  padding-right: 5px;\n  border-top-color: #FFFFFF;\n  border-top-width: 0;\n}\n\n#scliwijqex .gt_heading {\n  background-color: #FFFFFF;\n  text-align: center;\n  border-bottom-color: #FFFFFF;\n  border-left-style: none;\n  border-left-width: 1px;\n  border-left-color: #D3D3D3;\n  border-right-style: none;\n  border-right-width: 1px;\n  border-right-color: #D3D3D3;\n}\n\n#scliwijqex .gt_bottom_border {\n  border-bottom-style: solid;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n}\n\n#scliwijqex .gt_col_headings {\n  border-top-style: solid;\n  border-top-width: 2px;\n  border-top-color: #D3D3D3;\n  border-bottom-style: solid;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n  border-left-style: none;\n  border-left-width: 1px;\n  border-left-color: #D3D3D3;\n  border-right-style: none;\n  border-right-width: 1px;\n  border-right-color: #D3D3D3;\n}\n\n#scliwijqex .gt_col_heading {\n  color: #333333;\n  background-color: #FFFFFF;\n  font-size: 100%;\n  font-weight: normal;\n  text-transform: inherit;\n  border-left-style: none;\n  border-left-width: 1px;\n  border-left-color: #D3D3D3;\n  border-right-style: none;\n  border-right-width: 1px;\n  border-right-color: #D3D3D3;\n  vertical-align: bottom;\n  padding-top: 5px;\n  padding-bottom: 6px;\n  padding-left: 5px;\n  padding-right: 5px;\n  overflow-x: hidden;\n}\n\n#scliwijqex .gt_column_spanner_outer {\n  color: #333333;\n  background-color: #FFFFFF;\n  font-size: 100%;\n  font-weight: normal;\n  text-transform: inherit;\n  padding-top: 0;\n  padding-bottom: 0;\n  padding-left: 4px;\n  padding-right: 4px;\n}\n\n#scliwijqex .gt_column_spanner_outer:first-child {\n  padding-left: 0;\n}\n\n#scliwijqex .gt_column_spanner_outer:last-child {\n  padding-right: 0;\n}\n\n#scliwijqex .gt_column_spanner {\n  border-bottom-style: solid;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n  
vertical-align: bottom;\n  padding-top: 5px;\n  padding-bottom: 5px;\n  overflow-x: hidden;\n  display: inline-block;\n  width: 100%;\n}\n\n#scliwijqex .gt_spanner_row {\n  border-bottom-style: hidden;\n}\n\n#scliwijqex .gt_group_heading {\n  padding-top: 8px;\n  padding-bottom: 8px;\n  padding-left: 5px;\n  padding-right: 5px;\n  color: #333333;\n  background-color: #FFFFFF;\n  font-size: 100%;\n  font-weight: initial;\n  text-transform: inherit;\n  border-top-style: solid;\n  border-top-width: 2px;\n  border-top-color: #D3D3D3;\n  border-bottom-style: solid;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n  border-left-style: none;\n  border-left-width: 1px;\n  border-left-color: #D3D3D3;\n  border-right-style: none;\n  border-right-width: 1px;\n  border-right-color: #D3D3D3;\n  vertical-align: middle;\n  text-align: left;\n}\n\n#scliwijqex .gt_empty_group_heading {\n  padding: 0.5px;\n  color: #333333;\n  background-color: #FFFFFF;\n  font-size: 100%;\n  font-weight: initial;\n  border-top-style: solid;\n  border-top-width: 2px;\n  border-top-color: #D3D3D3;\n  border-bottom-style: solid;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n  vertical-align: middle;\n}\n\n#scliwijqex .gt_from_md > :first-child {\n  margin-top: 0;\n}\n\n#scliwijqex .gt_from_md > :last-child {\n  margin-bottom: 0;\n}\n\n#scliwijqex .gt_row {\n  padding-top: 8px;\n  padding-bottom: 8px;\n  padding-left: 5px;\n  padding-right: 5px;\n  margin: 10px;\n  border-top-style: solid;\n  border-top-width: 1px;\n  border-top-color: #D3D3D3;\n  border-left-style: none;\n  border-left-width: 1px;\n  border-left-color: #D3D3D3;\n  border-right-style: none;\n  border-right-width: 1px;\n  border-right-color: #D3D3D3;\n  vertical-align: middle;\n  overflow-x: hidden;\n}\n\n#scliwijqex .gt_stub {\n  color: #333333;\n  background-color: #FFFFFF;\n  font-size: 100%;\n  font-weight: initial;\n  text-transform: inherit;\n  border-right-style: solid;\n  border-right-width: 2px;\n 
 border-right-color: #D3D3D3;\n  padding-left: 5px;\n  padding-right: 5px;\n}\n\n#scliwijqex .gt_stub_row_group {\n  color: #333333;\n  background-color: #FFFFFF;\n  font-size: 100%;\n  font-weight: initial;\n  text-transform: inherit;\n  border-right-style: solid;\n  border-right-width: 2px;\n  border-right-color: #D3D3D3;\n  padding-left: 5px;\n  padding-right: 5px;\n  vertical-align: top;\n}\n\n#scliwijqex .gt_row_group_first td {\n  border-top-width: 2px;\n}\n\n#scliwijqex .gt_row_group_first th {\n  border-top-width: 2px;\n}\n\n#scliwijqex .gt_summary_row {\n  color: #333333;\n  background-color: #FFFFFF;\n  text-transform: inherit;\n  padding-top: 8px;\n  padding-bottom: 8px;\n  padding-left: 5px;\n  padding-right: 5px;\n}\n\n#scliwijqex .gt_first_summary_row {\n  border-top-style: solid;\n  border-top-color: #D3D3D3;\n}\n\n#scliwijqex .gt_first_summary_row.thick {\n  border-top-width: 2px;\n}\n\n#scliwijqex .gt_last_summary_row {\n  padding-top: 8px;\n  padding-bottom: 8px;\n  padding-left: 5px;\n  padding-right: 5px;\n  border-bottom-style: solid;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n}\n\n#scliwijqex .gt_grand_summary_row {\n  color: #333333;\n  background-color: #FFFFFF;\n  text-transform: inherit;\n  padding-top: 8px;\n  padding-bottom: 8px;\n  padding-left: 5px;\n  padding-right: 5px;\n}\n\n#scliwijqex .gt_first_grand_summary_row {\n  padding-top: 8px;\n  padding-bottom: 8px;\n  padding-left: 5px;\n  padding-right: 5px;\n  border-top-style: double;\n  border-top-width: 6px;\n  border-top-color: #D3D3D3;\n}\n\n#scliwijqex .gt_last_grand_summary_row_top {\n  padding-top: 8px;\n  padding-bottom: 8px;\n  padding-left: 5px;\n  padding-right: 5px;\n  border-bottom-style: double;\n  border-bottom-width: 6px;\n  border-bottom-color: #D3D3D3;\n}\n\n#scliwijqex .gt_striped {\n  background-color: rgba(128, 128, 128, 0.05);\n}\n\n#scliwijqex .gt_table_body {\n  border-top-style: solid;\n  border-top-width: 2px;\n  border-top-color: 
#D3D3D3;\n  border-bottom-style: solid;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n}\n\n#scliwijqex .gt_footnotes {\n  color: #333333;\n  background-color: #FFFFFF;\n  border-bottom-style: none;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n  border-left-style: none;\n  border-left-width: 2px;\n  border-left-color: #D3D3D3;\n  border-right-style: none;\n  border-right-width: 2px;\n  border-right-color: #D3D3D3;\n}\n\n#scliwijqex .gt_footnote {\n  margin: 0px;\n  font-size: 90%;\n  padding-top: 4px;\n  padding-bottom: 4px;\n  padding-left: 5px;\n  padding-right: 5px;\n}\n\n#scliwijqex .gt_sourcenotes {\n  color: #333333;\n  background-color: #FFFFFF;\n  border-bottom-style: none;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n  border-left-style: none;\n  border-left-width: 2px;\n  border-left-color: #D3D3D3;\n  border-right-style: none;\n  border-right-width: 2px;\n  border-right-color: #D3D3D3;\n}\n\n#scliwijqex .gt_sourcenote {\n  font-size: 90%;\n  padding-top: 4px;\n  padding-bottom: 4px;\n  padding-left: 5px;\n  padding-right: 5px;\n}\n\n#scliwijqex .gt_left {\n  text-align: left;\n}\n\n#scliwijqex .gt_center {\n  text-align: center;\n}\n\n#scliwijqex .gt_right {\n  text-align: right;\n  font-variant-numeric: tabular-nums;\n}\n\n#scliwijqex .gt_font_normal {\n  font-weight: normal;\n}\n\n#scliwijqex .gt_font_bold {\n  font-weight: bold;\n}\n\n#scliwijqex .gt_font_italic {\n  font-style: italic;\n}\n\n#scliwijqex .gt_super {\n  font-size: 65%;\n}\n\n#scliwijqex .gt_footnote_marks {\n  font-size: 75%;\n  vertical-align: 0.4em;\n  position: initial;\n}\n\n#scliwijqex .gt_asterisk {\n  font-size: 100%;\n  vertical-align: 0;\n}\n\n#scliwijqex .gt_indent_1 {\n  text-indent: 5px;\n}\n\n#scliwijqex .gt_indent_2 {\n  text-indent: 10px;\n}\n\n#scliwijqex .gt_indent_3 {\n  text-indent: 15px;\n}\n\n#scliwijqex .gt_indent_4 {\n  text-indent: 20px;\n}\n\n#scliwijqex .gt_indent_5 {\n  text-indent: 25px;\n}\n\n#scliwijqex 
.katex-display {\n  display: inline-flex !important;\n  margin-bottom: 0.75em !important;\n}\n\n#scliwijqex div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {\n  height: 0px !important;\n}\n</style>\n<table class=\"gt_table\" data-quarto-disable-processing=\"false\" data-quarto-bootstrap=\"false\">\n  <thead>\n    <tr class=\"gt_heading\">\n      <td colspan=\"4\" class=\"gt_heading gt_title gt_font_normal gt_bottom_border\" style>Types of Atomic Vectors<span class=\"gt_footnote_marks\" style=\"white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;\"><sup>1</sup></span></td>\n    </tr>\n    \n    <tr class=\"gt_col_headings\">\n      <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_names\">name</th>\n      <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_values\">value</th>\n      <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_type\">typeof()</th>\n      <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_length\">length()</th>\n    </tr>\n  </thead>\n  <tbody class=\"gt_table_body\">\n    <tr><td headers=\"var_names\" class=\"gt_row gt_center\">lgl_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">TRUE, FALSE</td>\n<td headers=\"var_type\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">logical</td>\n<td headers=\"var_length\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">2</td></tr>\n    <tr><td headers=\"var_names\" class=\"gt_row gt_center\">int_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">1L, 6L, 10L</td>\n<td headers=\"var_type\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">integer</td>\n<td headers=\"var_length\" class=\"gt_row gt_center\" 
style=\"background-color: #E0FFFF;\">3</td></tr>\n    <tr><td headers=\"var_names\" class=\"gt_row gt_center\">dbl_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">1, 2.5, 4.5</td>\n<td headers=\"var_type\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">double</td>\n<td headers=\"var_length\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">3</td></tr>\n    <tr><td headers=\"var_names\" class=\"gt_row gt_center\">chr_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">'these are', 'some strings'</td>\n<td headers=\"var_type\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">character</td>\n<td headers=\"var_length\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">2</td></tr>\n  </tbody>\n  \n  <tfoot class=\"gt_footnotes\">\n    <tr>\n      <td class=\"gt_footnote\" colspan=\"4\"><span class=\"gt_footnote_marks\" style=\"white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;\"><sup>1</sup></span> Source: https://adv-r.hadley.nz/index.html</td>\n    </tr>\n  </tfoot>\n</table>\n</div>\n```\n\n:::\n:::\n\n\n## Side Quest: Penguins\n\n<details>\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(penguins$species)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(penguins$species)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(species_unlist)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(species_unlist)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(species_pull)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(species_pull)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 
\"factor\"\n```\n\n\n:::\n:::\n\n\n</details>\n\n## Missing values: Contagion\n\nMost computations that involve a missing value return a missing value, unless you handle it explicitly.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# contagion\n5*NA\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] NA\n```\n\n\n:::\n\n```{.r .cell-code}\nsum(c(1, 2, NA, 3))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] NA\n```\n\n\n:::\n:::\n\n\n## Missing values: Contagion Exceptions\n\nThe exceptions are operations whose result is the same whatever value the `NA` stands for:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nNA ^ 0\n#> [1] 1\nNA | TRUE\n#> [1] TRUE\nNA & FALSE\n#> [1] FALSE\n```\n:::\n\n\n\n#### Inoculation\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsum(c(1, 2, NA, 3), na.rm = TRUE)\n# output: 6\n```\n:::\n\n\nTo search for missing values, use `is.na()`. Comparing with `==` does not work, because any comparison with `NA` is itself `NA`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- c(NA, 5, NA, 10)\nx == NA\n# output: NA NA NA NA [BATMAN!]\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.na(x)\n# output: TRUE FALSE TRUE FALSE\n```\n:::\n\n\n## Missing Values: NA Types\n\n<details>\nEach atomic type has its own missing value:\n\n-   Logical: `NA`\n-   Integer: `NA_integer_`\n-   Double: `NA_real_`\n-   Character: `NA_character_`\n\nThis rarely matters, because `NA` is automatically coerced to the correct type when needed.\n\nIt can matter for functions that are strict about types, such as `dplyr::if_else()`.\n</details>\n\n\n## Testing (1/2)\n\n**What type of vector `is.*()` it?**\n\nTest the data type:\n\n-   Logical: `is.logical()`\n-   Integer: `is.integer()`\n-   Double: `is.double()`\n-   Character: `is.character()`\n\n\n## Testing (2/2)\n\n**What type of object is it?**\n\nDon't test objects with these tools:\n\n-   `is.vector()`\n-   `is.atomic()`\n-   `is.numeric()`\n\nThey don’t test if you have a vector, atomic vector, or numeric vector; you’ll need to carefully read the documentation to figure out what they actually do (preview: *attributes*).\n\n## Side Quest: rlang `is_*()`\n\n<details>\n<summary>Maybe use `{rlang}`?</summary>\n\n-   `rlang::is_vector`\n-   `rlang::is_atomic`\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# vector\nrlang::is_vector(c(1, 2))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nrlang::is_vector(list(1, 2))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\n# atomic\nrlang::is_atomic(c(1, 2))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nrlang::is_atomic(list(1, \"a\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n:::\n\n\nSee the [rlang type predicates reference](https://rlang.r-lib.org/reference/type-predicates.html) for more.\n</details>\n\n\n## Coercion\n\n* When types are combined, R coerces them to the most flexible type present, in the order character → double → integer → logical (most to least flexible).\n\n* R can coerce either automatically or explicitly.\n\n#### **Automatic**\n\nTwo contexts for automatic coercion:\n\n1.  Combination\n2.  Mathematical operations\n\n\n\n## Coercion by Combination:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(c(TRUE, \"TRUE\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  chr [1:2] \"TRUE\" \"TRUE\"\n```\n\n\n:::\n:::\n\n\n## Coercion by Mathematical operations:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# a logical vector recording whether an attribute is present\nhas_attribute <- c(TRUE, FALSE, TRUE, TRUE)\n\n# number of elements with the attribute (TRUE counts as 1, FALSE as 0)\nsum(has_attribute)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3\n```\n\n\n:::\n:::\n\n\n## **Explicit**\n\n<!--\n\nUse `as.*()`\n\n-   Logical: `as.logical()`\n-   Integer: `as.integer()`\n-   Double: `as.double()`\n-   Character: `as.character()`\n\n-->\n\n\n::: {.cell}\n::: {.cell-output-display}\n\n```{=html}\n<div id=\"prgzooqwyi\" style=\"padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;\">\n<style>#prgzooqwyi table {\n  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe 
UI Symbol', 'Noto Color Emoji';\n  -webkit-font-smoothing: antialiased;\n  -moz-osx-font-smoothing: grayscale;\n}\n\n#prgzooqwyi thead, #prgzooqwyi tbody, #prgzooqwyi tfoot, #prgzooqwyi tr, #prgzooqwyi td, #prgzooqwyi th {\n  border-style: none;\n}\n\n#prgzooqwyi p {\n  margin: 0;\n  padding: 0;\n}\n\n#prgzooqwyi .gt_table {\n  display: table;\n  border-collapse: collapse;\n  line-height: normal;\n  margin-left: auto;\n  margin-right: auto;\n  color: #333333;\n  font-size: 16px;\n  font-weight: normal;\n  font-style: normal;\n  background-color: #FFFFFF;\n  width: auto;\n  border-top-style: solid;\n  border-top-width: 2px;\n  border-top-color: #A8A8A8;\n  border-right-style: none;\n  border-right-width: 2px;\n  border-right-color: #D3D3D3;\n  border-bottom-style: solid;\n  border-bottom-width: 2px;\n  border-bottom-color: #A8A8A8;\n  border-left-style: none;\n  border-left-width: 2px;\n  border-left-color: #D3D3D3;\n}\n\n#prgzooqwyi .gt_caption {\n  padding-top: 4px;\n  padding-bottom: 4px;\n}\n\n#prgzooqwyi .gt_title {\n  color: #333333;\n  font-size: 125%;\n  font-weight: initial;\n  padding-top: 4px;\n  padding-bottom: 4px;\n  padding-left: 5px;\n  padding-right: 5px;\n  border-bottom-color: #FFFFFF;\n  border-bottom-width: 0;\n}\n\n#prgzooqwyi .gt_subtitle {\n  color: #333333;\n  font-size: 85%;\n  font-weight: initial;\n  padding-top: 3px;\n  padding-bottom: 5px;\n  padding-left: 5px;\n  padding-right: 5px;\n  border-top-color: #FFFFFF;\n  border-top-width: 0;\n}\n\n#prgzooqwyi .gt_heading {\n  background-color: #FFFFFF;\n  text-align: center;\n  border-bottom-color: #FFFFFF;\n  border-left-style: none;\n  border-left-width: 1px;\n  border-left-color: #D3D3D3;\n  border-right-style: none;\n  border-right-width: 1px;\n  border-right-color: #D3D3D3;\n}\n\n#prgzooqwyi .gt_bottom_border {\n  border-bottom-style: solid;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n}\n\n#prgzooqwyi .gt_col_headings {\n  border-top-style: solid;\n  border-top-width: 
2px;\n  border-top-color: #D3D3D3;\n  border-bottom-style: solid;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n  border-left-style: none;\n  border-left-width: 1px;\n  border-left-color: #D3D3D3;\n  border-right-style: none;\n  border-right-width: 1px;\n  border-right-color: #D3D3D3;\n}\n\n#prgzooqwyi .gt_col_heading {\n  color: #333333;\n  background-color: #FFFFFF;\n  font-size: 100%;\n  font-weight: normal;\n  text-transform: inherit;\n  border-left-style: none;\n  border-left-width: 1px;\n  border-left-color: #D3D3D3;\n  border-right-style: none;\n  border-right-width: 1px;\n  border-right-color: #D3D3D3;\n  vertical-align: bottom;\n  padding-top: 5px;\n  padding-bottom: 6px;\n  padding-left: 5px;\n  padding-right: 5px;\n  overflow-x: hidden;\n}\n\n#prgzooqwyi .gt_column_spanner_outer {\n  color: #333333;\n  background-color: #FFFFFF;\n  font-size: 100%;\n  font-weight: normal;\n  text-transform: inherit;\n  padding-top: 0;\n  padding-bottom: 0;\n  padding-left: 4px;\n  padding-right: 4px;\n}\n\n#prgzooqwyi .gt_column_spanner_outer:first-child {\n  padding-left: 0;\n}\n\n#prgzooqwyi .gt_column_spanner_outer:last-child {\n  padding-right: 0;\n}\n\n#prgzooqwyi .gt_column_spanner {\n  border-bottom-style: solid;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n  vertical-align: bottom;\n  padding-top: 5px;\n  padding-bottom: 5px;\n  overflow-x: hidden;\n  display: inline-block;\n  width: 100%;\n}\n\n#prgzooqwyi .gt_spanner_row {\n  border-bottom-style: hidden;\n}\n\n#prgzooqwyi .gt_group_heading {\n  padding-top: 8px;\n  padding-bottom: 8px;\n  padding-left: 5px;\n  padding-right: 5px;\n  color: #333333;\n  background-color: #FFFFFF;\n  font-size: 100%;\n  font-weight: initial;\n  text-transform: inherit;\n  border-top-style: solid;\n  border-top-width: 2px;\n  border-top-color: #D3D3D3;\n  border-bottom-style: solid;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n  border-left-style: none;\n  border-left-width: 
1px;\n  border-left-color: #D3D3D3;\n  border-right-style: none;\n  border-right-width: 1px;\n  border-right-color: #D3D3D3;\n  vertical-align: middle;\n  text-align: left;\n}\n\n#prgzooqwyi .gt_empty_group_heading {\n  padding: 0.5px;\n  color: #333333;\n  background-color: #FFFFFF;\n  font-size: 100%;\n  font-weight: initial;\n  border-top-style: solid;\n  border-top-width: 2px;\n  border-top-color: #D3D3D3;\n  border-bottom-style: solid;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n  vertical-align: middle;\n}\n\n#prgzooqwyi .gt_from_md > :first-child {\n  margin-top: 0;\n}\n\n#prgzooqwyi .gt_from_md > :last-child {\n  margin-bottom: 0;\n}\n\n#prgzooqwyi .gt_row {\n  padding-top: 8px;\n  padding-bottom: 8px;\n  padding-left: 5px;\n  padding-right: 5px;\n  margin: 10px;\n  border-top-style: solid;\n  border-top-width: 1px;\n  border-top-color: #D3D3D3;\n  border-left-style: none;\n  border-left-width: 1px;\n  border-left-color: #D3D3D3;\n  border-right-style: none;\n  border-right-width: 1px;\n  border-right-color: #D3D3D3;\n  vertical-align: middle;\n  overflow-x: hidden;\n}\n\n#prgzooqwyi .gt_stub {\n  color: #333333;\n  background-color: #FFFFFF;\n  font-size: 100%;\n  font-weight: initial;\n  text-transform: inherit;\n  border-right-style: solid;\n  border-right-width: 2px;\n  border-right-color: #D3D3D3;\n  padding-left: 5px;\n  padding-right: 5px;\n}\n\n#prgzooqwyi .gt_stub_row_group {\n  color: #333333;\n  background-color: #FFFFFF;\n  font-size: 100%;\n  font-weight: initial;\n  text-transform: inherit;\n  border-right-style: solid;\n  border-right-width: 2px;\n  border-right-color: #D3D3D3;\n  padding-left: 5px;\n  padding-right: 5px;\n  vertical-align: top;\n}\n\n#prgzooqwyi .gt_row_group_first td {\n  border-top-width: 2px;\n}\n\n#prgzooqwyi .gt_row_group_first th {\n  border-top-width: 2px;\n}\n\n#prgzooqwyi .gt_summary_row {\n  color: #333333;\n  background-color: #FFFFFF;\n  text-transform: inherit;\n  padding-top: 8px;\n  
padding-bottom: 8px;\n  padding-left: 5px;\n  padding-right: 5px;\n}\n\n#prgzooqwyi .gt_first_summary_row {\n  border-top-style: solid;\n  border-top-color: #D3D3D3;\n}\n\n#prgzooqwyi .gt_first_summary_row.thick {\n  border-top-width: 2px;\n}\n\n#prgzooqwyi .gt_last_summary_row {\n  padding-top: 8px;\n  padding-bottom: 8px;\n  padding-left: 5px;\n  padding-right: 5px;\n  border-bottom-style: solid;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n}\n\n#prgzooqwyi .gt_grand_summary_row {\n  color: #333333;\n  background-color: #FFFFFF;\n  text-transform: inherit;\n  padding-top: 8px;\n  padding-bottom: 8px;\n  padding-left: 5px;\n  padding-right: 5px;\n}\n\n#prgzooqwyi .gt_first_grand_summary_row {\n  padding-top: 8px;\n  padding-bottom: 8px;\n  padding-left: 5px;\n  padding-right: 5px;\n  border-top-style: double;\n  border-top-width: 6px;\n  border-top-color: #D3D3D3;\n}\n\n#prgzooqwyi .gt_last_grand_summary_row_top {\n  padding-top: 8px;\n  padding-bottom: 8px;\n  padding-left: 5px;\n  padding-right: 5px;\n  border-bottom-style: double;\n  border-bottom-width: 6px;\n  border-bottom-color: #D3D3D3;\n}\n\n#prgzooqwyi .gt_striped {\n  background-color: rgba(128, 128, 128, 0.05);\n}\n\n#prgzooqwyi .gt_table_body {\n  border-top-style: solid;\n  border-top-width: 2px;\n  border-top-color: #D3D3D3;\n  border-bottom-style: solid;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n}\n\n#prgzooqwyi .gt_footnotes {\n  color: #333333;\n  background-color: #FFFFFF;\n  border-bottom-style: none;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n  border-left-style: none;\n  border-left-width: 2px;\n  border-left-color: #D3D3D3;\n  border-right-style: none;\n  border-right-width: 2px;\n  border-right-color: #D3D3D3;\n}\n\n#prgzooqwyi .gt_footnote {\n  margin: 0px;\n  font-size: 90%;\n  padding-top: 4px;\n  padding-bottom: 4px;\n  padding-left: 5px;\n  padding-right: 5px;\n}\n\n#prgzooqwyi .gt_sourcenotes {\n  color: #333333;\n  
background-color: #FFFFFF;\n  border-bottom-style: none;\n  border-bottom-width: 2px;\n  border-bottom-color: #D3D3D3;\n  border-left-style: none;\n  border-left-width: 2px;\n  border-left-color: #D3D3D3;\n  border-right-style: none;\n  border-right-width: 2px;\n  border-right-color: #D3D3D3;\n}\n\n#prgzooqwyi .gt_sourcenote {\n  font-size: 90%;\n  padding-top: 4px;\n  padding-bottom: 4px;\n  padding-left: 5px;\n  padding-right: 5px;\n}\n\n#prgzooqwyi .gt_left {\n  text-align: left;\n}\n\n#prgzooqwyi .gt_center {\n  text-align: center;\n}\n\n#prgzooqwyi .gt_right {\n  text-align: right;\n  font-variant-numeric: tabular-nums;\n}\n\n#prgzooqwyi .gt_font_normal {\n  font-weight: normal;\n}\n\n#prgzooqwyi .gt_font_bold {\n  font-weight: bold;\n}\n\n#prgzooqwyi .gt_font_italic {\n  font-style: italic;\n}\n\n#prgzooqwyi .gt_super {\n  font-size: 65%;\n}\n\n#prgzooqwyi .gt_footnote_marks {\n  font-size: 75%;\n  vertical-align: 0.4em;\n  position: initial;\n}\n\n#prgzooqwyi .gt_asterisk {\n  font-size: 100%;\n  vertical-align: 0;\n}\n\n#prgzooqwyi .gt_indent_1 {\n  text-indent: 5px;\n}\n\n#prgzooqwyi .gt_indent_2 {\n  text-indent: 10px;\n}\n\n#prgzooqwyi .gt_indent_3 {\n  text-indent: 15px;\n}\n\n#prgzooqwyi .gt_indent_4 {\n  text-indent: 20px;\n}\n\n#prgzooqwyi .gt_indent_5 {\n  text-indent: 25px;\n}\n\n#prgzooqwyi .katex-display {\n  display: inline-flex !important;\n  margin-bottom: 0.75em !important;\n}\n\n#prgzooqwyi div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {\n  height: 0px !important;\n}\n</style>\n<table class=\"gt_table\" data-quarto-disable-processing=\"false\" data-quarto-bootstrap=\"false\">\n  <thead>\n    <tr class=\"gt_heading\">\n      <td colspan=\"6\" class=\"gt_heading gt_title gt_font_normal gt_bottom_border\" style>Coercion of Atomic Vectors<span class=\"gt_footnote_marks\" style=\"white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;\"><sup>1</sup></span></td>\n    </tr>\n    \n 
   <tr class=\"gt_col_headings\">\n      <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_names\">name</th>\n      <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"var_values\">value</th>\n      <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"as_logical\">as.logical()</th>\n      <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"as_integer\">as.integer()</th>\n      <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"as_double\">as.double()</th>\n      <th class=\"gt_col_heading gt_columns_bottom_border gt_center\" rowspan=\"1\" colspan=\"1\" scope=\"col\" id=\"as_character\">as.character()</th>\n    </tr>\n  </thead>\n  <tbody class=\"gt_table_body\">\n    <tr><td headers=\"var_names\" class=\"gt_row gt_center\">lgl_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">TRUE, FALSE</td>\n<td headers=\"as_logical\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">TRUE FALSE</td>\n<td headers=\"as_integer\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">1 0</td>\n<td headers=\"as_double\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">1 0</td>\n<td headers=\"as_character\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">'TRUE' 'FALSE'</td></tr>\n    <tr><td headers=\"var_names\" class=\"gt_row gt_center\">int_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">1L, 6L, 10L</td>\n<td headers=\"as_logical\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">TRUE TRUE TRUE</td>\n<td headers=\"as_integer\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">1 6 10</td>\n<td headers=\"as_double\" class=\"gt_row gt_center\" style=\"background-color: 
#F9E3D6;\">1 6 10</td>\n<td headers=\"as_character\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">'1' '6' '10'</td></tr>\n    <tr><td headers=\"var_names\" class=\"gt_row gt_center\">dbl_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">1, 2.5, 4.5</td>\n<td headers=\"as_logical\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">TRUE TRUE TRUE</td>\n<td headers=\"as_integer\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">1 2 4</td>\n<td headers=\"as_double\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">1.0 2.5 4.5</td>\n<td headers=\"as_character\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">'1' '2.5' '4.5'</td></tr>\n    <tr><td headers=\"var_names\" class=\"gt_row gt_center\">chr_var</td>\n<td headers=\"var_values\" class=\"gt_row gt_center\">'these are', 'some strings'</td>\n<td headers=\"as_logical\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">NA NA</td>\n<td headers=\"as_integer\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">NA_integer</td>\n<td headers=\"as_double\" class=\"gt_row gt_center\" style=\"background-color: #F9E3D6;\">NA_double</td>\n<td headers=\"as_character\" class=\"gt_row gt_center\" style=\"background-color: #E0FFFF;\">'these are', 'some strings'</td></tr>\n  </tbody>\n  \n  <tfoot class=\"gt_footnotes\">\n    <tr>\n      <td class=\"gt_footnote\" colspan=\"6\"><span class=\"gt_footnote_marks\" style=\"white-space:nowrap;font-style:italic;font-weight:normal;line-height:0;\"><sup>1</sup></span> Source: https://adv-r.hadley.nz/index.html</td>\n    </tr>\n  </tfoot>\n</table>\n</div>\n```\n\n:::\n:::\n\n\nBut note that coercion may fail in one of two ways, or both:\n\n-   With warning/error\n-   NAs\n\n\n::: {.cell}\n\n```{.r .cell-code}\nas.integer(c(1, 2, \"three\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1]  1  2 NA\n```\n\n\n:::\n:::\n\n\n## Exercises 1/5\n\n1. 
How do you create raw and complex scalars?\n\n<details><summary>Answer(s)</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nas.raw(42)\n#> [1] 2a\ncharToRaw(\"A\")\n#> [1] 41\ncomplex(length.out = 1, real = 1, imaginary = 1)\n#> [1] 1+1i\n```\n:::\n\n</details>\n\n## Exercises 2/5\n\n2. Test your knowledge of the vector coercion rules by predicting the output of the following uses of `c()`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nc(1, FALSE)\nc(\"a\", 1)\nc(TRUE, 1L)\n```\n:::\n\n\n<details><summary>Answer(s)</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nc(1, FALSE)      # will be coerced to double    -> 1 0\nc(\"a\", 1)        # will be coerced to character -> \"a\" \"1\"\nc(TRUE, 1L)      # will be coerced to integer   -> 1 1\n```\n:::\n\n</details>\n\n## Exercises 3/5\n\n3. Why is `1 == \"1\"` true? Why is `-1 < FALSE` true? Why is `\"one\" < 2` false?\n\n<details><summary>Answer(s)</summary>\nThese comparisons are carried out by the operator functions `==` and `<`, which coerce their arguments to a common type. In the examples above, these types are character, double, and character: `1` is coerced to `\"1\"`, `FALSE` is represented as `0`, and `2` turns into `\"2\"`. Numbers precede letters in lexicographic order (which may depend on the locale), so `\"one\" < \"2\"` is false.\n\n</details>\n\n## Exercises 4/5\n\n4. Why is the default missing value, NA, a logical vector? What’s special about logical vectors?\n\n<details><summary>Answer(s)</summary>\nThe presence of missing values shouldn’t affect the type of an object. Recall that there is a type hierarchy for coercion from character → double → integer → logical. When combining `NA`s with other atomic types, the `NA`s will be coerced to integer (`NA_integer_`), double (`NA_real_`) or character (`NA_character_`) and not the other way round. If `NA` were a character and added to a set of other values, all of these would be coerced to character as well.\n</details>\n\n## Exercises 5/5\n\n5. 
Precisely what do `is.atomic()`, `is.numeric()`, and `is.vector()` test for?\n\n<details><summary>Answer(s)</summary>\n\n* `is.atomic()` tests if an object is an atomic vector. Atomic vectors are objects of type logical, integer, double, complex, character or raw. (Before R 4.4.0, `is.atomic(NULL)` also returned `TRUE`.)\n* `is.numeric()` tests if an object has type integer or double and is not of class `factor`, `Date`, `POSIXt` or `difftime`.\n* `is.vector()` tests if an object is a vector or an expression and has no attributes, apart from names. Vectors are atomic vectors or lists.\n \n</details>\n\n\n## Attributes\n\nAttributes are name-value pairs that attach metadata to an object (vector).\n\n* **Name-value pairs**: attributes have a name and a value\n* **Metadata**: not data itself, but data about the data\n \n## Getting and Setting\n\nThree functions:\n\n1. retrieve and modify single attributes with `attr()`\n2. retrieve en masse with `attributes()`\n3. set en masse with `structure()`\n\n## Single attribute\n\nUse `attr()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# some object\na <- c(1, 2, 3)\n\n# set attribute\nattr(x = a, which = \"attribute_name\") <- \"some attribute\"\n\n# get attribute\nattr(a, \"attribute_name\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"some attribute\"\n```\n\n\n:::\n:::\n\n\n## Multiple attributes\n\n`structure()`: set multiple attributes, `attributes()`: get multiple attributes\n\n:::: columns\n::: column\n\n::: {.cell}\n\n```{.r .cell-code}\na <- 1:3\nattr(a, \"x\") <- \"abcdef\"\nattr(a, \"x\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"abcdef\"\n```\n\n\n:::\n\n```{.r .cell-code}\nattr(a, \"y\") <- 4:6\nstr(attributes(a))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 2\n#>  $ x: chr \"abcdef\"\n#>  $ y: int [1:3] 4 5 6\n```\n\n\n:::\n\n```{.r .cell-code}\nb <- structure(\n  1:3, \n  x = \"abcdef\",\n  y = 4:6\n)\nidentical(a, b)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 
TRUE\n```\n\n\n:::\n:::\n\n:::\n\n::: column\n![Image Credit: Advanced R](images/vectors/attr.png) \n:::\n::::\n\n\n## Why\n\nThree particularly important attributes: \n\n1. **names** - a character vector giving each element a name\n2. **dimension** - (or dim) turns vectors into matrices and arrays \n3. **class** - powers the S3 object system (we'll learn more about this in chapter 13)\n\nMost attributes are lost by most operations.  Only two attributes are routinely preserved: names and dimension.\n\n## Names\n\n~~Three~~ Four ways to name:\n\n:::: columns\n\n::: {.column width=\"50%\"}\n\n::: {.cell}\n\n```{.r .cell-code}\n# (1) On creation: \nx <- c(A = 1, B = 2, C = 3)\nx\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> A B C \n#> 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\n# (2) Assign to names():\ny <- 1:3\nnames(y) <- c(\"a\", \"b\", \"c\")\ny\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c \n#> 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\n# (3) Inline:\nz <- setNames(1:3, c(\"a\", \"b\", \"c\"))\nz\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c \n#> 1 2 3\n```\n\n\n:::\n:::\n\n:::\n\n::: {.column width=\"50%\"}\n![proper diagram](images/vectors/attr-names-1.png) \n:::\n\n::::\n\n## rlang Names\n\n:::: columns\n\n::: {.column width=\"50%\"}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# (4) Inline with {rlang}:\na <- 1:3\nrlang::set_names(\n  a,\n  c(\"a\", \"b\", \"c\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> a b c \n#> 1 2 3\n```\n\n\n:::\n:::\n\n\n:::\n\n::: {.column width=\"50%\"}\n![simplified diagram](images/vectors/attr-names-2.png) \n:::\n\n::::\n\n\n## Removing names\n\n* `x <- unname(x)` or `names(x) <- NULL`\n* Thematically but not directly related: labelled class vectors with `haven::labelled()`\n\n\n## Dimensions: `matrix()` and `array()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Two scalar arguments specify row and column sizes\nx <- matrix(1:6, nrow = 2, ncol = 3)\nx\n```\n\n::: 
{.cell-output .cell-output-stdout}\n\n```\n#>      [,1] [,2] [,3]\n#> [1,]    1    3    5\n#> [2,]    2    4    6\n```\n\n\n:::\n\n```{.r .cell-code}\n# One vector argument to describe all dimensions\ny <- array(1:12, c(2, 3, 2)) # rows, columns, no of arrays\ny\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> , , 1\n#> \n#>      [,1] [,2] [,3]\n#> [1,]    1    3    5\n#> [2,]    2    4    6\n#> \n#> , , 2\n#> \n#>      [,1] [,2] [,3]\n#> [1,]    7    9   11\n#> [2,]    8   10   12\n```\n\n\n:::\n:::\n\n\n## Dimensions: assign to `dim()`\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# You can also modify an object in place by setting dim()\nz <- 1:6\ndim(z) <- c(2, 3) # rows, columns\nz\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>      [,1] [,2] [,3]\n#> [1,]    1    3    5\n#> [2,]    2    4    6\n```\n\n\n:::\n\n```{.r .cell-code}\na <- 1:12\ndim(a) <- c(2, 3, 2) # rows, columns, no of arrays\na\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> , , 1\n#> \n#>      [,1] [,2] [,3]\n#> [1,]    1    3    5\n#> [2,]    2    4    6\n#> \n#> , , 2\n#> \n#>      [,1] [,2] [,3]\n#> [1,]    7    9   11\n#> [2,]    8   10   12\n```\n\n\n:::\n:::\n\n\n\n## Functions for working with vectors, matrices and arrays (1/2):\n\nVector | Matrix\t| Array\n:----- | :---------- | :-----\n`names()` | `rownames()`, `colnames()` | `dimnames()`\n`length()` | `nrow()`, `ncol()` | `dim()`\n`c()` | `rbind()`, `cbind()` | `abind::abind()`\n— | `t()` | `aperm()`\n`is.null(dim(x))` | `is.matrix()` | `is.array()`\n\n* **Caution**: Vector without `dim` set has `NULL` dimensions, not `1`.\n* One dimension?\n\n## Functions for working with vectors, matrices and arrays (2/2):\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(1:3)                   # 1d vector\n#>  int [1:3] 1 2 3\nstr(matrix(1:3, ncol = 1)) # column vector\n#>  int [1:3, 1] 1 2 3\nstr(matrix(1:3, nrow = 1)) # row vector\n#>  int [1, 1:3] 1 2 3\nstr(array(1:3, 3))         # \"array\" vector\n#>  int [1:3(1d)] 1 2 
3\n```\n:::\n\n\n\n## Exercises 1/4\n\n1. How is `setNames()` implemented? Read the source code.\n\n<details><summary>Answer(s)</summary>\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetNames <- function(object = nm, nm) {\n  names(object) <- nm\n  object\n}\n```\n:::\n\n\n- The data argument comes first, so it works well with the pipe.\n- The first argument is optional: it defaults to `nm`, so a vector can be named with its own values.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsetNames( , c(\"a\", \"b\", \"c\"))\n#>   a   b   c \n#> \"a\" \"b\" \"c\"\n```\n:::\n\n</details>\n\n## Exercises 1/4 (cont)\n\n1. How is `unname()` implemented? Read the source code.\n\n<details><summary>Answer(s)</summary>\n\n\n::: {.cell}\n\n```{.r .cell-code}\nunname <- function(obj, force = FALSE) {\n  if (!is.null(names(obj))) \n    names(obj) <- NULL\n  if (!is.null(dimnames(obj)) && (force || !is.data.frame(obj))) \n    dimnames(obj) <- NULL\n  obj\n}\n```\n:::\n\n`unname()` sets existing `names` or `dimnames` to `NULL` (for data frames, `dimnames` are only removed when `force = TRUE`).\n</details>\n\n## Exercises 2/4\n\n2. What does `dim()` return when applied to a 1-dimensional vector? When might you use `NROW()` or `NCOL()`?\n\n<details><summary>Answer(s)</summary>\n\n> `dim()` returns `NULL` when applied to a 1d vector.\n\n`NROW()` and `NCOL()` treat `NULL` and vectors as if they had dimensions (a vector counts as a single column):\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- 1:10\nnrow(x)\n#> NULL\nncol(x)\n#> NULL\nNROW(x)\n#> [1] 10\nNCOL(x)\n#> [1] 1\n```\n:::\n\n\n</details>\n\n## Exercises 3/4\n\n3. 
How would you describe the following three objects? What makes them different from `1:5`?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx1 <- array(1:5, c(1, 1, 5))\nx2 <- array(1:5, c(1, 5, 1))\nx3 <- array(1:5, c(5, 1, 1))\n```\n:::\n\n\n<details><summary>Answer(s)</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\nx1 <- array(1:5, c(1, 1, 5))  # 1 row,  1 column,  5 in third dim.\nx2 <- array(1:5, c(1, 5, 1))  # 1 row,  5 columns, 1 in third dim.\nx3 <- array(1:5, c(5, 1, 1))  # 5 rows, 1 column,  1 in third dim.\n```\n:::\n\nUnlike `1:5`, which has no `dim` attribute, each object is a three-dimensional array holding the same five values along a different dimension.\n\n</details>\n\n## Exercises 4/4\n\n4. 
An early draft used this code to illustrate `structure()`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstructure(1:5, comment = \"my attribute\")\n#> [1] 1 2 3 4 5\n```\n:::\n\n\nWhy don't you see the comment attribute on print? Is the attribute missing, or is there something else special about it?\n\n<details><summary>Answer(s)</summary>\nThe documentation states (see `?comment`):\n\n> Contrary to other attributes, the comment is not printed (by print or print.default).\n\n</details>\n\n## Exercises 4/4 (cont)\n\n<details><summary>Answer(s)</summary>\nAlso, from `?attributes`:\n\n> Note that some attributes (namely class, comment, dim, dimnames, names, row.names and tsp) are treated specially and have restrictions on the values which can be set.\n\nThe attribute is not missing; retrieve it with `attributes()` or `attr()`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfoo <- structure(1:5, comment = \"my attribute\")\n\nattributes(foo)\n#> $comment\n#> [1] \"my attribute\"\nattr(foo, which = \"comment\")\n#> [1] \"my attribute\"\n```\n:::\n\n\n</details>\n\n\n\n## **Class** - S3 atomic vectors\n\n![](images/vectors/summary-tree-s3-1.png) \n\nCredit: [Advanced R](https://adv-r.hadley.nz/index.html) by Hadley Wickham\n\n**Having a class attribute turns an object into an S3 object.**\n\nWhat makes S3 atomic vectors different?\n\n1. They behave differently from a regular vector when passed to a generic function\n2. They often store additional information in other attributes\n\n\n## Four important S3 vectors used in base R:\n\n1. **Factors** (categorical data)\n2. **Dates**\n3. **Date-times** (POSIXct)\n4. 
**Durations** (difftime)\n\n## Factors\n\nA factor is a vector used to store categorical data that can contain only predefined values.\n\nFactors are integer vectors with:\n\n-   Class: \"factor\"\n-   Attributes: \"levels\", or the set of allowed values\n\n## Factors examples\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncolors = c('red', 'blue', 'green','red','red', 'green')\ncolors_factor <- factor(\n  x = colors, levels = c('red', 'blue', 'green', 'yellow')\n)\n```\n:::\n\n\n:::: columns\n\n::: column\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntable(colors)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> colors\n#>  blue green   red \n#>     1     2     3\n```\n\n\n:::\n\n```{.r .cell-code}\ntable(colors_factor)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> colors_factor\n#>    red   blue  green yellow \n#>      3      1      2      0\n```\n\n\n:::\n:::\n\n:::\n\n::: column\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(colors_factor)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(colors_factor)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n```{.r .cell-code}\nattributes(colors_factor)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $levels\n#> [1] \"red\"    \"blue\"   \"green\"  \"yellow\"\n#> \n#> $class\n#> [1] \"factor\"\n```\n\n\n:::\n:::\n\n:::\n::::\n\n## Custom Order\n\nFactors can be ordered. 
This can be useful for models or visualizations where order matters.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvalues <- c('high', 'med', 'low', 'med', 'high', 'low', 'med', 'high')\nordered_factor <- ordered(\n  x = values,\n  levels = c('low', 'med', 'high') # in order\n)\nordered_factor\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] high med  low  med  high low  med  high\n#> Levels: low < med < high\n```\n\n\n:::\n\n```{.r .cell-code}\ntable(values)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> values\n#> high  low  med \n#>    3    2    3\n```\n\n\n:::\n\n```{.r .cell-code}\ntable(ordered_factor)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> ordered_factor\n#>  low  med high \n#>    2    3    3\n```\n\n\n:::\n:::\n\n\n## Dates\n\nDates are:\n\n-   Double vectors\n-   With class \"Date\"\n-   No other attributes\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnotes_date <- Sys.Date()\n\n# type\ntypeof(notes_date)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# attributes (class only)\nattributes(notes_date)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $class\n#> [1] \"Date\"\n```\n\n\n:::\n:::\n\n\n## Dates Unix epoch\n\nThe double component represents the number of days since the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time) `1970-01-01`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndate <- as.Date(\"1970-02-01\")\nunclass(date)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 31\n```\n\n\n:::\n:::\n\n\n## Date-times\n\nThere are two date-time representations in base R:\n\n-   POSIXct, where \"ct\" denotes *calendar time*\n-   POSIXlt, where \"lt\" designates *local time*\n\n<!--\n\nJust for fun:\n\"How to pronounce 'POSIXct'?\"\nhttps://www.howtopronounce.com/posixct\n\n-->\n\n## Date-times: POSIXct\n\nWe'll focus on POSIXct because it is:\n\n-   The simplest\n-   Built on an atomic (double) vector\n-   The most appropriate for use in a data frame\n\nLet's now build 
and deconstruct a Date-time\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Build\nnote_date_time <- as.POSIXct(\n  x = Sys.time(), # time\n  tz = \"America/New_York\" # time zone, used only for formatting\n)\n\n# Inspect\nnote_date_time\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"2025-09-03 07:11:21 EDT\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# - type\ntypeof(note_date_time)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# - attributes\nattributes(note_date_time)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $class\n#> [1] \"POSIXct\" \"POSIXt\" \n#> \n#> $tzone\n#> [1] \"America/New_York\"\n```\n\n\n:::\n\n```{.r .cell-code}\nstructure(note_date_time, tzone = \"Europe/Paris\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"2025-09-03 13:11:21 CEST\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndate_time <- as.POSIXct(\"2024-02-22 12:34:56\", tz = \"EST\")\nunclass(date_time)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1708623296\n#> attr(,\"tzone\")\n#> [1] \"EST\"\n```\n\n\n:::\n:::\n\n\n\n## Durations\n\nDurations represent the amount of time between pairs of dates or date-times.\n\n-   Double vectors\n-   Class: \"difftime\"\n-   Attributes: \"units\", or the unit of duration (e.g., weeks, hours, minutes, seconds, etc.)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Construct\none_minute <- as.difftime(1, units = \"mins\")\n# Inspect\none_minute\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Time difference of 1 mins\n```\n\n\n:::\n\n```{.r .cell-code}\n# Dissect\n# - type\ntypeof(one_minute)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"double\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# - attributes\nattributes(one_minute)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $class\n#> [1] \"difftime\"\n#> \n#> $units\n#> [1] \"mins\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r 
.cell-code}\ntime_since_1970_02_01 <- notes_date - date\ntime_since_1970_02_01\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> Time difference of 20303 days\n```\n\n\n:::\n:::\n\n\n\nSee also:\n\n-   [`lubridate::make_difftime()`](https://lubridate.tidyverse.org/reference/make_difftime.html)\n-   [`clock::date_time_build()`](https://clock.r-lib.org/reference/date_time_build.html)\n\n\n## Exercises 1/3\n\n1. What sort of object does `table()` return? What is its type? What attributes does it have? How does the dimensionality change as you tabulate more variables?\n\n<details><summary>Answer(s)</summary>\n\n`table()` returns a contingency table of its input variables. It is implemented as an integer vector with class `table` and dimensions (which make it act like an array). Its attributes are `dim` (dimensions), `dimnames` (one set of names for each input variable), and `class`. Each tabulated variable adds one dimension, whose length is the number of unique values (factor levels) of that variable.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- table(mtcars[c(\"vs\", \"cyl\", \"am\")])\n\ntypeof(x)\n#> [1] \"integer\"\nattributes(x)\n#> $dim\n#> [1] 2 3 2\n#> \n#> $dimnames\n#> $dimnames$vs\n#> [1] \"0\" \"1\"\n#> \n#> $dimnames$cyl\n#> [1] \"4\" \"6\" \"8\"\n#> \n#> $dimnames$am\n#> [1] \"0\" \"1\"\n#> \n#> \n#> $class\n#> [1] \"table\"\n```\n:::\n\n</details>\n\n## Exercises 2/3\n\n2. 
What happens to a factor when you modify its levels?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf1 <- factor(letters)\nlevels(f1) <- rev(levels(f1))\n```\n:::\n\n\n<details><summary>Answer(s)</summary>\nThe underlying integer values stay the same, but the levels are changed, making it look like the data has changed.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf1 <- factor(letters)\nf1\n#>  [1] a b c d e f g h i j k l m n o p q r s t u v w x y z\n#> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z\nas.integer(f1)\n#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25\n#> [26] 26\n\nlevels(f1) <- rev(levels(f1))\nf1\n#>  [1] z y x w v u t s r q p o n m l k j i h g f e d c b a\n#> Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a\nas.integer(f1)\n#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25\n#> [26] 26\n```\n:::\n\n</details>\n\n## Exercises 3/3\n\n3. What does this code do? How do `f2` and `f3` differ from `f1`?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf2 <- rev(factor(letters))\nf3 <- factor(letters, levels = rev(letters))\n```\n:::\n\n\n<details><summary>Answer(s)</summary>\nFor `f2` and `f3` either the order of the factor elements or its levels are being reversed. 
For `f1` both transformations are occurring.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Reverse element order\n(f2 <- rev(factor(letters)))\n#>  [1] z y x w v u t s r q p o n m l k j i h g f e d c b a\n#> Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z\nas.integer(f2)\n#>  [1] 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2\n#> [26]  1\n\n# Reverse factor levels (when creating factor)\n(f3 <- factor(letters, levels = rev(letters)))\n#>  [1] a b c d e f g h i j k l m n o p q r s t u v w x y z\n#> Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a\nas.integer(f3)\n#>  [1] 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2\n#> [26]  1\n```\n:::\n\n</details>\n\n\n## Lists\n\n* Sometimes called a *generic vector* or *recursive vector*\n* Recall ([section 2.3.3](https://adv-r.hadley.nz/names-values.html#list-references)): each element is really a *reference* to another object\n* Can be composed of elements of different types (as opposed to atomic vectors, whose elements must all be of the same type)\n\n## Constructing\n\nSimple lists:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Construct\nsimple_list <- list(\n  c(TRUE, FALSE),   # logicals\n  1:20,             # integers\n  c(1.2, 2.3, 3.4), # doubles\n  c(\"primo\", \"secundo\", \"tercio\") # characters\n)\n\nsimple_list\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1]  TRUE FALSE\n#> \n#> [[2]]\n#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20\n#> \n#> [[3]]\n#> [1] 1.2 2.3 3.4\n#> \n#> [[4]]\n#> [1] \"primo\"   \"secundo\" \"tercio\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# Inspect\n# - type\ntypeof(simple_list)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"list\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# - structure\nstr(simple_list)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 4\n#>  $ : logi [1:2] TRUE FALSE\n#>  $ : int [1:20] 1 2 3 4 5 6 7 8 9 10 ...\n#>  $ : num [1:3] 1.2 2.3 3.4\n#>  $ 
: chr [1:3] \"primo\" \"secundo\" \"tercio\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# Accessing\nsimple_list[1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1]  TRUE FALSE\n```\n\n\n:::\n\n```{.r .cell-code}\nsimple_list[2]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20\n```\n\n\n:::\n\n```{.r .cell-code}\nsimple_list[3]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 1.2 2.3 3.4\n```\n\n\n:::\n\n```{.r .cell-code}\nsimple_list[4]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] \"primo\"   \"secundo\" \"tercio\"\n```\n\n\n:::\n\n```{.r .cell-code}\nsimple_list[[1]][2]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] FALSE\n```\n\n\n:::\n\n```{.r .cell-code}\nsimple_list[[2]][8]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 8\n```\n\n\n:::\n\n```{.r .cell-code}\nsimple_list[[3]][2]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2.3\n```\n\n\n:::\n\n```{.r .cell-code}\nsimple_list[[4]][3]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"tercio\"\n```\n\n\n:::\n:::\n\n\n## Even Simpler List\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Construct\nsimpler_list <- list(TRUE, FALSE, \n                    1, 2, 3, 4, 5, \n                    1.2, 2.3, 3.4, \n                    \"primo\", \"secundo\", \"tercio\")\n\n# Accessing\nsimpler_list[1]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nsimpler_list[5]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 3\n```\n\n\n:::\n\n```{.r .cell-code}\nsimpler_list[9]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 2.3\n```\n\n\n:::\n\n```{.r .cell-code}\nsimpler_list[11]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] \"primo\"\n```\n\n\n:::\n:::\n\n\n## Nested lists:\n\n\n::: 
{.cell}\n\n```{.r .cell-code}\nnested_list <- list(\n  # first level\n  list(\n    # second level\n    list(\n      # third level\n      list(1)\n    )\n  )\n)\n\nstr(nested_list)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 1\n#>  $ :List of 1\n#>   ..$ :List of 1\n#>   .. ..$ :List of 1\n#>   .. .. ..$ : num 1\n```\n\n\n:::\n:::\n\n\nLike JSON, lists can be nested to any depth.\n\n## Combined lists\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlist_comb1 <- list(list(1, 2), list(3, 4)) # with list()\nlist_comb2 <- c(list(1, 2), list(3, 4)) # with c()\n\n# compare structure\nstr(list_comb1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 2\n#>  $ :List of 2\n#>   ..$ : num 1\n#>   ..$ : num 2\n#>  $ :List of 2\n#>   ..$ : num 3\n#>   ..$ : num 4\n```\n\n\n:::\n\n```{.r .cell-code}\nstr(list_comb2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 4\n#>  $ : num 1\n#>  $ : num 2\n#>  $ : num 3\n#>  $ : num 4\n```\n\n\n:::\n\n```{.r .cell-code}\n# does this work if they are different data types?\nlist_comb3 <- c(list(1, 2), list(TRUE, FALSE))\nstr(list_comb3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> List of 4\n#>  $ : num 1\n#>  $ : num 2\n#>  $ : logi TRUE\n#>  $ : logi FALSE\n```\n\n\n:::\n:::\n\n\n## Testing\n\nCheck whether an object is a list:\n\n-   `is.list()`\n-   `rlang::is_list()`\n\nBoth perform the same check, but the latter can also check the number of elements:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# is list\nbase::is.list(list_comb2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\nrlang::is_list(list_comb2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\n# is list of 4 elements\nrlang::is_list(x = list_comb2, n = 4)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n\n```{.r .cell-code}\n# is a vector (of a special type)\n# remember the family 
tree?\nrlang::is_vector(list_comb2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n## Coercion\n\nUse `as.list()` to turn each element of a vector into its own list element; `list()` instead wraps the whole vector as a single element.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlist(1:3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 1 2 3\n```\n\n\n:::\n\n```{.r .cell-code}\nas.list(1:3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [[1]]\n#> [1] 1\n#> \n#> [[2]]\n#> [1] 2\n#> \n#> [[3]]\n#> [1] 3\n```\n\n\n:::\n:::\n\n\n## Matrices and arrays\n\nAlthough not often used, the dimension attribute can be added to create **list-matrices** or **list-arrays**.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nl <- list(1:3, \"a\", TRUE, 1.0)\ndim(l) <- c(2, 2); l\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>      [,1]      [,2]\n#> [1,] integer,3 TRUE\n#> [2,] \"a\"       1\n```\n\n\n:::\n\n```{.r .cell-code}\nl[[1, 1]]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3\n```\n\n\n:::\n:::\n\n\n\n## Exercises 1/3\n\n1. List all the ways that a list differs from an atomic vector.\n\n<details><summary>Answer(s)</summary>\n\n* Atomic vectors are always homogeneous (all elements must be of the same type). Lists may be heterogeneous (the elements can be of different types) as described in the introduction of the vectors chapter.\n* Atomic vectors point to one address in memory, while lists contain a separate reference for each element. (This was described in the list sections of the vectors and the names and values chapters.)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlobstr::ref(1:2)\n#> [1:0x7fcd936f6e80] <int>\nlobstr::ref(list(1:2, 2))\n#> █ [1:0x7fcd93d53048] <list> \n#> ├─[2:0x7fcd91377e40] <int> \n#> └─[3:0x7fcd93b41eb0] <dbl>\n```\n:::\n\n\n\n* Subsetting with out-of-bounds and `NA` indices leads to different output. For example, `[` returns `NA` for atomic vectors and `NULL` for lists. 
(This is described in more detail within the subsetting chapter.)\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Subsetting atomic vectors\n(1:2)[3]\n#> [1] NA\n(1:2)[NA]\n#> [1] NA NA\n\n# Subsetting lists\nas.list(1:2)[3]\n#> [[1]]\n#> NULL\nas.list(1:2)[NA]\n#> [[1]]\n#> NULL\n#> \n#> [[2]]\n#> NULL\n```\n:::\n\n\n\n</details>\n\n## Exercises 2/3\n\n2. Why do you need to use `unlist()` to convert a list to an atomic vector? Why doesn’t `as.vector()` work?\n\n<details><summary>Answer(s)</summary>\nA list is already a vector, though not an atomic one, so `as.vector()` has nothing to convert. Note that `as.vector()` and `is.vector()` use different definitions of “vector”!\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.vector(as.vector(mtcars))\n#> [1] FALSE\n```\n:::\n\n\n</details>\n\n## Exercises 3/3\n\n3. Compare and contrast `c()` and `unlist()` when combining a date and date-time into a single vector.\n\n<details><summary>Answer(s)</summary>\nDate and date-time objects are both built upon doubles. Dates store the number of days since the reference date 1970-01-01 (also known as “the Epoch”), while date-time objects (POSIXct) store the time difference to this date in seconds.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndate    <- as.Date(\"1970-01-02\")\ndttm_ct <- as.POSIXct(\"1970-01-01 01:00\", tz = \"UTC\")\n\n# Internal representations\nunclass(date)\n#> [1] 1\nunclass(dttm_ct)\n#> [1] 3600\n#> attr(,\"tzone\")\n#> [1] \"UTC\"\n```\n:::\n\n\nAs the `c()` generic only dispatches on its first argument, combining date and date-time objects via `c()` could lead to surprising results in older R versions (pre R 4.0.0):\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Output in R version 3.6.2\nc(date, dttm_ct)  # equal to c.Date(date, dttm_ct)\n#> [1] \"1970-01-02\" \"1979-11-10\"\nc(dttm_ct, date)  # equal to c.POSIXct(dttm_ct, date)\n#> [1] \"1970-01-01 02:00:00 CET\" \"1970-01-01 01:00:01 CET\"\n```\n:::\n\n\nIn the first statement above, c.Date() is executed, which incorrectly treats the underlying double of dttm_ct 
(3600) as days instead of seconds. Conversely, when a date is passed to `c.POSIXct()`, one day is counted as only one second.\n\nWe can highlight these mechanics with the following code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Output in R version 3.6.2\nunclass(c(date, dttm_ct))  # internal representation\n#> [1] 1 3600\ndate + 3599  # day 3600 after the Epoch\n#> [1] \"1979-11-10\"\n```\n:::\n\n\nAs of R 4.0.0, these issues have been resolved: `c.POSIXct()` and `c.Date()` now convert their inputs to POSIXct and Date, respectively, before combining them.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nc(dttm_ct, date)\n#> [1] \"1970-01-01 01:00:00 UTC\" \"1970-01-02 00:00:00 UTC\"\nunclass(c(dttm_ct, date))\n#> [1]  3600 86400\n\nc(date, dttm_ct)\n#> [1] \"1970-01-02\" \"1970-01-01\"\nunclass(c(date, dttm_ct))\n#> [1] 1 0\n```\n:::\n\n\nHowever, as `c()` strips the time zone (and other attributes) of POSIXct objects, some caution is still recommended.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n(dttm_ct <- as.POSIXct(\"1970-01-01 01:00\", tz = \"HST\"))\n#> [1] \"1970-01-01 01:00:00 HST\"\nattributes(c(dttm_ct))\n#> $class\n#> [1] \"POSIXct\" \"POSIXt\"\n```\n:::\n\n\nA package that deals with these kinds of problems in more depth and provides a structural solution for them is the {vctrs} package, which is also used throughout the tidyverse.\n\nLet’s look at `unlist()`, which operates on list input.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Attributes are stripped\nunlist(list(date, dttm_ct))  \n#> [1]     1 39600\n```\n:::\n\n\nWe see again that dates and date-times are internally stored as doubles. Unfortunately, this is all we are left with when `unlist()` strips the attributes of the list.\n\nTo summarise: `c()` coerces types and strips time zones. Errors may have occurred in older R versions because of inappropriate method dispatch or immature methods. 
`unlist()` strips attributes.\n</details>\n\n\n## Data frames and tibbles\n\n![](images/vectors/summary-tree-s3-2.png) \n\nCredit: [Advanced R](https://adv-r.hadley.nz/index.html) by Hadley Wickham\n\n## Data frame\n\nA data frame is a:\n\n-   Named list of vectors (the names are the column names)\n-   Attributes:\n    -   (column) `names`\n    -   `row.names`\n    -   `class`: \"data.frame\"\n\n## Data frame, examples 1/2:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Construct\ndf <- data.frame(\n  col1 = c(1, 2, 3),              # named atomic vector\n  col2 = c(\"un\", \"deux\", \"trois\") # another named atomic vector\n  # ,stringsAsFactors = FALSE # the default since R 4.0.0\n)\n# Inspect\ndf\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   col1  col2\n#> 1    1    un\n#> 2    2  deux\n#> 3    3 trois\n```\n\n\n:::\n\n```{.r .cell-code}\n# Deconstruct\n# - type\ntypeof(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"list\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# - attributes\nattributes(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> $names\n#> [1] \"col1\" \"col2\"\n#> \n#> $class\n#> [1] \"data.frame\"\n#> \n#> $row.names\n#> [1] 1 2 3\n```\n\n\n:::\n:::\n\n\n\n## Data frame, examples 2/2:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrownames(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"1\" \"2\" \"3\"\n```\n\n\n:::\n\n```{.r .cell-code}\ncolnames(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"col1\" \"col2\"\n```\n\n\n:::\n\n```{.r .cell-code}\nnames(df) # Same as colnames(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"col1\" \"col2\"\n```\n\n\n:::\n\n```{.r .cell-code}\nnrow(df) \n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 3\n```\n\n\n:::\n\n```{.r .cell-code}\nncol(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n\n```{.r .cell-code}\nlength(df) # Same as ncol(df)\n```\n\n::: {.cell-output 
.cell-output-stdout}\n\n```\n#> [1] 2\n```\n\n\n:::\n:::\n\n\nUnlike other lists, the length of each vector must be the same (i.e., as many vector elements as rows in the data frame).\n\n## Tibble\n\nDesigned to relieve some of the frustrations and pain points caused by data frames, tibbles are data frames that are:\n\n-   Lazy (do less)\n-   Surly (complain more)\n\n## Lazy\n\nTibbles do not:\n\n-   Coerce strings to factors\n-   Transform non-syntactic names\n-   Recycle vectors of length greater than 1\n\n## ! Coerce strings\n\n\n::: {.cell}\n\n```{.r .cell-code}\nchr_col <- c(\"don't\", \"factor\", \"me\", \"bro\")\n\n# data frame\ndf <- data.frame(\n  a = chr_col,\n  # before R 4.0.0, this was the default\n  stringsAsFactors = TRUE\n)\n\n# tibble\ntbl <- tibble::tibble(\n  a = chr_col\n)\n\n# contrast the structure\nstr(df$a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  Factor w/ 4 levels \"bro\",\"don't\",..: 2 3 4 1\n```\n\n\n:::\n\n```{.r .cell-code}\nstr(tbl$a)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  chr [1:4] \"don't\" \"factor\" \"me\" \"bro\"\n```\n\n\n:::\n:::\n\n\n## ! Transform non-syntactic names\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# data frame\ndf <- data.frame(\n  `1` = c(1, 2, 3)\n)\n\n# tibble\ntbl <- tibble::tibble(\n  `1` = c(1, 2, 3)\n)\n\n# contrast the names\nnames(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"X1\"\n```\n\n\n:::\n\n```{.r .cell-code}\nnames(tbl)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"1\"\n```\n\n\n:::\n:::\n\n\n## ! Recycle vectors of length greater than 1\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# data frame\ndf <- data.frame(\n  col1 = c(1, 2, 3, 4),\n  col2 = c(1, 2)\n)\n\n# tibble\ntbl <- tibble::tibble(\n  col1 = c(1, 2, 3, 4),\n  col2 = c(1, 2)\n)\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in `tibble::tibble()`:\n#> ! 
Tibble columns must have compatible sizes.\n#> • Size 4: Existing data.\n#> • Size 2: Column `col2`.\n#> ℹ Only values of size one are recycled.\n```\n\n\n:::\n:::\n\n\n## Surly\n\nTibbles do only what they're asked and complain if what they're asked doesn't make sense:\n\n-   Subsetting always yields a tibble\n-   Complains if it cannot find a column\n\n## Subsetting always yields a tibble\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# data frame\ndf <- data.frame(\n  col1 = c(1, 2, 3, 4)\n)\n\n# tibble\ntbl <- tibble::tibble(\n  col1 = c(1, 2, 3, 4)\n)\n\n# contrast\ndf_col <- df[, \"col1\"]\nstr(df_col)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  num [1:4] 1 2 3 4\n```\n\n\n:::\n\n```{.r .cell-code}\ntbl_col <- tbl[, \"col1\"]\nstr(tbl_col)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tibble [4 × 1] (S3: tbl_df/tbl/data.frame)\n#>  $ col1: num [1:4] 1 2 3 4\n```\n\n\n:::\n\n```{.r .cell-code}\n# to select a vector, do one of these instead\ntbl_col_1 <- tbl[[\"col1\"]]\nstr(tbl_col_1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  num [1:4] 1 2 3 4\n```\n\n\n:::\n\n```{.r .cell-code}\ntbl_col_2 <- dplyr::pull(tbl, col1)\nstr(tbl_col_2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>  num [1:4] 1 2 3 4\n```\n\n\n:::\n:::\n\n\n## Complains if it cannot find a column\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames(df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"col1\"\n```\n\n\n:::\n\n```{.r .cell-code}\ndf$col  # partial match: returns col1\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 1 2 3 4\n```\n\n\n:::\n\n```{.r .cell-code}\nnames(tbl)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"col1\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntbl$col  # no partial matching for tibbles\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n#> Warning: Unknown or uninitialised column: `col`.\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n## One more difference\n\n**`tibble()` allows you to refer to variables 
created during construction**\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntibble::tibble(\n  x = 1:3,\n  y = x * 2 # x refers to the line above\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 2\n#>       x     y\n#>   <int> <dbl>\n#> 1     1     2\n#> 2     2     4\n#> 3     3     6\n```\n\n\n:::\n:::\n\n\n<details>\n<summary>Side Quest: Row Names</summary>\n\n- character vector containing only unique values\n- get and set with `rownames()`\n- can use them to subset rows\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf3 <- data.frame(\n  age = c(35, 27, 18),\n  hair = c(\"blond\", \"brown\", \"black\"),\n  row.names = c(\"Bob\", \"Susan\", \"Sam\")\n)\ndf3\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>       age  hair\n#> Bob    35 blond\n#> Susan  27 brown\n#> Sam    18 black\n```\n\n\n:::\n\n```{.r .cell-code}\nrownames(df3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Bob\"   \"Susan\" \"Sam\"\n```\n\n\n:::\n\n```{.r .cell-code}\ndf3[\"Bob\", ]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>     age  hair\n#> Bob  35 blond\n```\n\n\n:::\n\n```{.r .cell-code}\nrownames(df3) <- c(\"Susan\", \"Bob\", \"Sam\")\nrownames(df3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"Susan\" \"Bob\"   \"Sam\"\n```\n\n\n:::\n\n```{.r .cell-code}\ndf3[\"Bob\", ]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>     age  hair\n#> Bob  27 brown\n```\n\n\n:::\n:::\n\n\nThere are three reasons why row names are undesirable:\n\n1. Metadata is data, so storing it in a different way to the rest of the data is fundamentally a bad idea.\n2. Row names are a poor abstraction for labelling rows because they only work when a row can be identified by a single string. This fails in many cases.\n3. Row names must be unique, so any duplication of rows (e.g. 
from bootstrapping) will create new row names.\n\n</details>\n\n\n## Tibbles: Printing\n\nData frames and tibbles print differently. Note that `tibble::as_tibble()` drops the row names by default:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf3\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>       age  hair\n#> Susan  35 blond\n#> Bob    27 brown\n#> Sam    18 black\n```\n\n\n:::\n\n```{.r .cell-code}\ntibble::as_tibble(df3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 3 × 2\n#>     age hair \n#>   <dbl> <chr>\n#> 1    35 blond\n#> 2    27 brown\n#> 3    18 black\n```\n\n\n:::\n:::\n\n\n\n## Tibbles: Subsetting\n\nData frames have two undesirable subsetting behaviours:\n\n1. When you subset columns with `df[, vars]`, you will get a vector if `vars` selects one variable and a data frame otherwise, unless you always remember to use `df[, vars, drop = FALSE]`.\n2. When you attempt to extract a single column with `df$x` and there is no column `x`, a data frame will instead select a variable whose name starts with `x`, as long as exactly one does. Otherwise, `df$x` will return `NULL`.\n\nTibbles tweak these behaviours so that `[` always returns a tibble, and `$` doesn’t do partial matching and warns if it can’t find a variable (*this is what makes tibbles surly*).\n\n## Tibbles: Testing\n\nTest for a data frame with `is.data.frame()`: it returns `TRUE` for both data frames and tibbles.\n\nTest for a tibble with `tibble::is_tibble()`: only tibbles are tibbles. 
Vanilla data frames are not.\n\n## Tibbles: Coercion\n\n-   To data frame: `as.data.frame()`\n-   To tibble: `tibble::as_tibble()`\n\n## Tibbles: List Columns\n\nList-columns are allowed in data frames, but you have to do a little extra work: either add the list-column after creation or wrap the list in `I()`. Tibbles accept list-columns directly in `tibble()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf4 <- data.frame(x = 1:3)\ndf4$y <- list(1:2, 1:3, 1:4)\ndf4\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   x          y\n#> 1 1       1, 2\n#> 2 2    1, 2, 3\n#> 3 3 1, 2, 3, 4\n```\n\n\n:::\n\n```{.r .cell-code}\ndf5 <- data.frame(\n  x = 1:3, \n  y = I(list(1:2, 1:3, 1:4))\n)\ndf5\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   x          y\n#> 1 1       1, 2\n#> 2 2    1, 2, 3\n#> 3 3 1, 2, 3, 4\n```\n\n\n:::\n:::\n\n\n## Tibbles: Matrix and data frame columns\n\n- As long as the number of rows matches the data frame, it’s also possible to have a matrix or data frame as a column of a data frame.\n- As with list-columns, you must either add the column after creation or wrap it in `I()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndfm <- data.frame(\n  x = 1:3 * 10,\n  y = I(matrix(1:9, nrow = 3))\n)\n\ndfm$z <- data.frame(a = 3:1, b = letters[1:3], stringsAsFactors = FALSE)\n\nstr(dfm)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> 'data.frame':\t3 obs. of  3 variables:\n#>  $ x: num  10 20 30\n#>  $ y: 'AsIs' int [1:3, 1:3] 1 2 3 4 5 6 7 8 9\n#>  $ z:'data.frame':\t3 obs. of  2 variables:\n#>   ..$ a: int  3 2 1\n#>   ..$ b: chr  \"a\" \"b\" \"c\"\n```\n\n\n:::\n\n```{.r .cell-code}\ndfm$y\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>      [,1] [,2] [,3]\n#> [1,]    1    4    7\n#> [2,]    2    5    8\n#> [3,]    3    6    9\n```\n\n\n:::\n\n```{.r .cell-code}\ndfm$z\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   a b\n#> 1 3 a\n#> 2 2 b\n#> 3 1 c\n```\n\n\n:::\n:::\n\n\n\n## Exercises 1/4\n\n1. Can you have a data frame with zero rows? 
What about zero columns?\n\n<details><summary>Answer(s)</summary>\nYes, you can create these data frames easily, either during creation or via subsetting. Even both dimensions can be zero. Create a 0-row, 0-column, or an empty data frame directly:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata.frame(a = integer(), b = logical())\n#> [1] a b\n#> <0 rows> (or 0-length row.names)\n\ndata.frame(row.names = 1:3)  # or data.frame()[1:3, ]\n#> data frame with 0 columns and 3 rows\n\ndata.frame()\n#> data frame with 0 columns and 0 rows\n```\n:::\n\n\nCreate similar data frames by subsetting the respective dimension with either 0, `NULL`, `FALSE` or a valid 0-length atomic (`logical(0)`, `character(0)`, `integer(0)`, `double(0)`). Negative integer sequences that drop every row (or column) would also work. The following example uses a zero:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmtcars[0, ]\n#>  [1] mpg  cyl  disp hp   drat wt   qsec vs   am   gear carb\n#> <0 rows> (or 0-length row.names)\n\nmtcars[ , 0]  # or mtcars[0]\n#> data frame with 0 columns and 32 rows\n\nmtcars[0, 0]\n#> data frame with 0 columns and 0 rows\n```\n:::\n\n\n\n</details>\n\n## Exercises 2/4\n\n2. What happens if you attempt to set rownames that are not unique?\n\n<details><summary>Answer(s)</summary>\nMatrices can have duplicated row names, so this does not cause problems.\n\nData frames, however, require unique rownames and you get different results depending on how you attempt to set them. 
If you set them directly or via `row.names()`, you get an error:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata.frame(row.names = c(\"x\", \"y\", \"y\"))\n#> Error in data.frame(row.names = c(\"x\", \"y\", \"y\")): duplicate row.names: y\n\ndf <- data.frame(x = 1:3)\nrow.names(df) <- c(\"x\", \"y\", \"y\")\n#> Warning: non-unique value when setting 'row.names': 'y'\n#> Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed\n```\n:::\n\n\nIf you use subsetting, `[` automatically deduplicates:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrow.names(df) <- c(\"x\", \"y\", \"z\")\ndf[c(1, 1, 1), , drop = FALSE]\n#>     x\n#> x   1\n#> x.1 1\n#> x.2 1\n```\n:::\n\n\n</details>\n\n## Exercises 3/4\n\n3. If `df` is a data frame, what can you say about `t(df)`, and `t(t(df))`? Perform some experiments, making sure to try different column types.\n\n<details><summary>Answer(s)</summary>\nBoth `t(df)` and `t(t(df))` return matrices:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf <- data.frame(x = 1:3, y = letters[1:3])\nis.matrix(df)\n#> [1] FALSE\nis.matrix(t(df))\n#> [1] TRUE\nis.matrix(t(t(df)))\n#> [1] TRUE\n```\n:::\n\n\nThe dimensions follow the usual transposition rules:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndim(df)\n#> [1] 3 2\ndim(t(df))\n#> [1] 2 3\ndim(t(t(df)))\n#> [1] 3 2\n```\n:::\n\n\nBecause the output is a matrix, every column is coerced to the same type. (`t.data.frame()` is implemented via `as.matrix()`, which is described below.)\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf\n#>   x y\n#> 1 1 a\n#> 2 2 b\n#> 3 3 c\nt(df)\n#>   [,1] [,2] [,3]\n#> x \"1\"  \"2\"  \"3\" \n#> y \"a\"  \"b\"  \"c\"\n```\n:::\n\n\n</details>\n\n## Exercises 4/4\n\n4. What does `as.matrix()` do when applied to a data frame with columns of different types? 
How does it differ from `data.matrix()`?\n\n<details><summary>Answer(s)</summary>\nThe type of the result of `as.matrix()` depends on the types of the input columns (see `?as.matrix`):\n\n> The method for data frames will return a character matrix if there is only atomic columns and any non-(numeric/logical/complex) column, applying as.vector to factors and format to other non-character columns. Otherwise the usual coercion hierarchy (logical < integer < double < complex) will be used, e.g. all-logical data frames will be coerced to a logical matrix, mixed logical-integer will give an integer matrix, etc.\n\nOn the other hand, `data.matrix()` will always return a numeric matrix (see `?data.matrix`).\n\n> Return the matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Factors and ordered factors are replaced by their internal codes. […] Character columns are first converted to factors and then to integers.\n\nWe can illustrate and compare the mechanics of these functions using a concrete example. `as.matrix()` makes it possible to retrieve most of the original information from the data frame but leaves us with characters. 
To retrieve all information from `data.matrix()`’s output, we would need a lookup table for each column.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndf_coltypes <- data.frame(\n  a = c(\"a\", \"b\"),\n  b = c(TRUE, FALSE),\n  c = c(1L, 0L),\n  d = c(1.5, 2),\n  e = factor(c(\"f1\", \"f2\"))\n)\n\nas.matrix(df_coltypes)\n#>      a   b       c   d     e   \n#> [1,] \"a\" \"TRUE\"  \"1\" \"1.5\" \"f1\"\n#> [2,] \"b\" \"FALSE\" \"0\" \"2.0\" \"f2\"\ndata.matrix(df_coltypes)\n#>      a b c   d e\n#> [1,] 1 1 1 1.5 1\n#> [2,] 2 0 0 2.0 2\n```\n:::\n\n\n</details>\n\n\n## `NULL`\n\n`NULL` is a special type of object that:\n\n-   has length 0\n-   cannot have attributes\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(NULL)\n#> [1] \"NULL\"\n\nlength(NULL)\n#> [1] 0\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nx <- NULL\nattr(x, \"y\") <- 1\n```\n\n::: {.cell-output .cell-output-error}\n\n```\n#> Error in attr(x, \"y\") <- 1: attempt to set an attribute on NULL\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nis.null(NULL)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE\n```\n\n\n:::\n:::\n\n\n\n## Digestif\n\nLet's use some of this chapter's skills on the `penguins` data.\n\n## Attributes\n\n`penguins_raw` carries an extra `spec` attribute (the column specification recorded by readr when the data were imported), which `str()` prints by default:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(penguins_raw)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tibble [344 × 17] (S3: tbl_df/tbl/data.frame)\n#>  $ studyName          : chr [1:344] \"PAL0708\" \"PAL0708\" \"PAL0708\" \"PAL0708\" ...\n#>  $ Sample Number      : num [1:344] 1 2 3 4 5 6 7 8 9 10 ...\n#>  $ Species            : chr [1:344] \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" ...\n#>  $ Region             : chr [1:344] \"Anvers\" \"Anvers\" \"Anvers\" \"Anvers\" ...\n#>  $ Island             : chr [1:344] \"Torgersen\" \"Torgersen\" \"Torgersen\" \"Torgersen\" ...\n#>  $ Stage              : chr [1:344] \"Adult, 1 Egg Stage\" 
\"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" ...\n#>  $ Individual ID      : chr [1:344] \"N1A1\" \"N1A2\" \"N2A1\" \"N2A2\" ...\n#>  $ Clutch Completion  : chr [1:344] \"Yes\" \"Yes\" \"Yes\" \"Yes\" ...\n#>  $ Date Egg           : Date[1:344], format: \"2007-11-11\" \"2007-11-11\" ...\n#>  $ Culmen Length (mm) : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...\n#>  $ Culmen Depth (mm)  : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...\n#>  $ Flipper Length (mm): num [1:344] 181 186 195 NA 193 190 181 195 193 190 ...\n#>  $ Body Mass (g)      : num [1:344] 3750 3800 3250 NA 3450 ...\n#>  $ Sex                : chr [1:344] \"MALE\" \"FEMALE\" \"FEMALE\" NA ...\n#>  $ Delta 15 N (o/oo)  : num [1:344] NA 8.95 8.37 NA 8.77 ...\n#>  $ Delta 13 C (o/oo)  : num [1:344] NA -24.7 -25.3 NA -25.3 ...\n#>  $ Comments           : chr [1:344] \"Not enough blood for isotopes.\" NA NA \"Adult not sampled.\" ...\n#>  - attr(*, \"spec\")=List of 3\n#>   ..$ cols   :List of 17\n#>   .. ..$ studyName          : list()\n#>   .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#>   .. ..$ Sample Number      : list()\n#>   .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#>   .. ..$ Species            : list()\n#>   .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#>   .. ..$ Region             : list()\n#>   .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#>   .. ..$ Island             : list()\n#>   .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#>   .. ..$ Stage              : list()\n#>   .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#>   .. ..$ Individual ID      : list()\n#>   .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#>   .. ..$ Clutch Completion  : list()\n#>   .. .. 
..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#>   .. ..$ Date Egg           :List of 1\n#>   .. .. ..$ format: chr \"\"\n#>   .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_date\" \"collector\"\n#>   .. ..$ Culmen Length (mm) : list()\n#>   .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#>   .. ..$ Culmen Depth (mm)  : list()\n#>   .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#>   .. ..$ Flipper Length (mm): list()\n#>   .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#>   .. ..$ Body Mass (g)      : list()\n#>   .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#>   .. ..$ Sex                : list()\n#>   .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#>   .. ..$ Delta 15 N (o/oo)  : list()\n#>   .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#>   .. ..$ Delta 13 C (o/oo)  : list()\n#>   .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_double\" \"collector\"\n#>   .. ..$ Comments           : list()\n#>   .. .. ..- attr(*, \"class\")= chr [1:2] \"collector_character\" \"collector\"\n#>   ..$ default: list()\n#>   .. 
..- attr(*, \"class\")= chr [1:2] \"collector_guess\" \"collector\"\n#>   ..$ skip   : num 1\n#>   ..- attr(*, \"class\")= chr \"col_spec\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(penguins_raw, give.attr = FALSE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> tibble [344 × 17] (S3: tbl_df/tbl/data.frame)\n#>  $ studyName          : chr [1:344] \"PAL0708\" \"PAL0708\" \"PAL0708\" \"PAL0708\" ...\n#>  $ Sample Number      : num [1:344] 1 2 3 4 5 6 7 8 9 10 ...\n#>  $ Species            : chr [1:344] \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" \"Adelie Penguin (Pygoscelis adeliae)\" ...\n#>  $ Region             : chr [1:344] \"Anvers\" \"Anvers\" \"Anvers\" \"Anvers\" ...\n#>  $ Island             : chr [1:344] \"Torgersen\" \"Torgersen\" \"Torgersen\" \"Torgersen\" ...\n#>  $ Stage              : chr [1:344] \"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" \"Adult, 1 Egg Stage\" ...\n#>  $ Individual ID      : chr [1:344] \"N1A1\" \"N1A2\" \"N2A1\" \"N2A2\" ...\n#>  $ Clutch Completion  : chr [1:344] \"Yes\" \"Yes\" \"Yes\" \"Yes\" ...\n#>  $ Date Egg           : Date[1:344], format: \"2007-11-11\" \"2007-11-11\" ...\n#>  $ Culmen Length (mm) : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...\n#>  $ Culmen Depth (mm)  : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...\n#>  $ Flipper Length (mm): num [1:344] 181 186 195 NA 193 190 181 195 193 190 ...\n#>  $ Body Mass (g)      : num [1:344] 3750 3800 3250 NA 3450 ...\n#>  $ Sex                : chr [1:344] \"MALE\" \"FEMALE\" \"FEMALE\" NA ...\n#>  $ Delta 15 N (o/oo)  : num [1:344] NA 8.95 8.37 NA 8.77 ...\n#>  $ Delta 13 C (o/oo)  : num [1:344] NA -24.7 -25.3 NA -25.3 ...\n#>  $ Comments           : chr [1:344] \"Not enough blood for isotopes.\" NA NA \"Adult not sampled.\" ...\n```\n\n\n:::\n:::\n\n\n## Data Frames vs Tibbles\n\n\n::: {.cell}\n\n```{.r 
.cell-code}\npenguins_df <- data.frame(penguins)\npenguins_tb <- penguins #i.e. penguins was already a tibble\n```\n:::\n\n\n## Printing\n\n* Tip: print out these results in RStudio under different editor themes\n\n\n::: {.cell}\n\n```{.r .cell-code}\nprint(penguins_df) #don't run this\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhead(penguins_df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#>   species    island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g\n#> 1  Adelie Torgersen           39.1          18.7               181        3750\n#> 2  Adelie Torgersen           39.5          17.4               186        3800\n#> 3  Adelie Torgersen           40.3          18.0               195        3250\n#> 4  Adelie Torgersen             NA            NA                NA          NA\n#> 5  Adelie Torgersen           36.7          19.3               193        3450\n#> 6  Adelie Torgersen           39.3          20.6               190        3650\n#>      sex year\n#> 1   male 2007\n#> 2 female 2007\n#> 3 female 2007\n#> 4   <NA> 2007\n#> 5 female 2007\n#> 6   male 2007\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\npenguins_tb\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 344 × 8\n#>    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g\n#>    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>\n#>  1 Adelie  Torgersen           39.1          18.7               181        3750\n#>  2 Adelie  Torgersen           39.5          17.4               186        3800\n#>  3 Adelie  Torgersen           40.3          18                 195        3250\n#>  4 Adelie  Torgersen           NA            NA                  NA          NA\n#>  5 Adelie  Torgersen           36.7          19.3               193        3450\n#>  6 Adelie  Torgersen           39.3          20.6               190        3650\n#>  7 Adelie  Torgersen           38.9          17.8      
         181        3625\n#>  8 Adelie  Torgersen           39.2          19.6               195        4675\n#>  9 Adelie  Torgersen           34.1          18.1               193        3475\n#> 10 Adelie  Torgersen           42            20.2               190        4250\n#> # ℹ 334 more rows\n#> # ℹ 2 more variables: sex <fct>, year <int>\n```\n\n\n:::\n:::\n\n\n## Atomic Vectors\n\n\n::: {.cell}\n\n```{.r .cell-code}\nspecies_vector_df <- penguins_df |> select(species)\nspecies_unlist_df <- penguins_df |> select(species) |> unlist()\nspecies_pull_df   <- penguins_df |> select(species) |> pull()\n\nspecies_vector_tb <- penguins_tb |> select(species)\nspecies_unlist_tb <- penguins_tb |> select(species) |> unlist()\nspecies_pull_tb   <- penguins_tb |> select(species) |> pull()\n```\n:::\n\n\n<details>\n<summary>`typeof()` and `class()`</summary>\n\n::: {.cell}\n\n```{.r .cell-code}\ntypeof(species_vector_df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"list\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(species_vector_df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"data.frame\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(species_unlist_df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(species_unlist_df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(species_pull_df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(species_pull_df)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(species_vector_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"list\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(species_vector_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"tbl_df\"     \"tbl\"        
\"data.frame\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(species_unlist_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(species_unlist_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n\n```{.r .cell-code}\ntypeof(species_pull_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"integer\"\n```\n\n\n:::\n\n```{.r .cell-code}\nclass(species_pull_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"factor\"\n```\n\n\n:::\n:::\n\n\n</details>\n\n## Column Names\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncolnames(penguins_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] \"species\"           \"island\"            \"bill_length_mm\"   \n#> [4] \"bill_depth_mm\"     \"flipper_length_mm\" \"body_mass_g\"      \n#> [7] \"sex\"               \"year\"\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames(penguins_tb) == colnames(penguins_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnames(penguins_df) == names(penguins_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE\n```\n\n\n:::\n:::\n\n\n## What if we only invoke a partial name of a column of a tibble?\n\n\n::: {.cell}\n\n```{.r .cell-code}\npenguins_tb$y \n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n![tibbles are surly!](images/vectors/surly_tibbles.png)\n\n* What if we only invoke a partial name of a column of a data frame?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhead(penguins_df$y) #instead of `year`\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 2007 2007 2007 2007 2007 2007\n```\n\n\n:::\n:::\n\n\n* Is this evaluation in alphabetical order or column order?\n\n\n::: {.cell}\n\n```{.r .cell-code}\npenguins_df_se_sp <- 
penguins_df |> select(sex, species)\npenguins_df_sp_se <- penguins_df |> select(species, sex)\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhead(penguins_df_se_sp$s)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhead(penguins_df_sp_se$s)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> NULL\n```\n\n\n:::\n:::\n\n\nNeither: `$` partial matching only succeeds when exactly one column name starts with the given prefix. Both data frames contain two such columns (`sex` and `species`), so the match is ambiguous and `NULL` is returned regardless of column order.\n\n## Chapter Quiz 1/5\n\n1. What are the four common types of atomic vectors? What are the two rare types?\n\n<details><summary>Answer(s)</summary>\nThe four common types of atomic vector are logical, integer, double and character. The two rarer types are complex and raw.\n</details>\n\n## Chapter Quiz 2/5\n\n2. What are attributes? How do you get them and set them?\n\n<details><summary>Answer(s)</summary>\nAttributes allow you to associate arbitrary additional metadata with any object. You can get and set individual attributes with `attr(x, \"y\")` and `attr(x, \"y\") <- value`; or you can get and set all attributes at once with `attributes()`.\n</details>\n\n## Chapter Quiz 3/5\n\n3. How is a list different from an atomic vector? How is a matrix different from a data frame?\n\n<details><summary>Answer(s)</summary>\nThe elements of a list can be any type (even a list); the elements of an atomic vector are all of the same type. Similarly, every element of a matrix must be the same type; in a data frame, different columns can have different types.\n</details>\n\n## Chapter Quiz 4/5\n\n4. Can you have a list that is a matrix? Can a data frame have a column that is a matrix?\n\n<details><summary>Answer(s)</summary>\nYou can make a list-array by assigning dimensions to a list. You can make a matrix a column of a data frame with `df$x <- matrix()`, or by using `I()` when creating a new data frame `data.frame(x = I(matrix()))`.\n</details>\n\n## Chapter Quiz 5/5\n\n5. 
How do tibbles behave differently from data frames?\n\n<details><summary>Answer(s)</summary>\nTibbles have an enhanced print method, never coerce strings to factors, and provide stricter subsetting methods.\n</details>\n",
    "supporting": [],
    "filters": [
      "rmarkdown/pagebreak.lua"
    ],
    "includes": {
      "include-after-body": [
        "\n<script>\n  // htmlwidgets need to know to resize themselves when slides are shown/hidden.\n  // Fire the \"slideenter\" event (handled by htmlwidgets.js) when the current\n  // slide changes (different for each slide format).\n  (function () {\n    // dispatch for htmlwidgets\n    function fireSlideEnter() {\n      const event = window.document.createEvent(\"Event\");\n      event.initEvent(\"slideenter\", true, true);\n      window.document.dispatchEvent(event);\n    }\n\n    function fireSlideChanged(previousSlide, currentSlide) {\n      fireSlideEnter();\n\n      // dispatch for shiny\n      if (window.jQuery) {\n        if (previousSlide) {\n          window.jQuery(previousSlide).trigger(\"hidden\");\n        }\n        if (currentSlide) {\n          window.jQuery(currentSlide).trigger(\"shown\");\n        }\n      }\n    }\n\n    // hookup for slidy\n    if (window.w3c_slidy) {\n      window.w3c_slidy.add_observer(function (slide_num) {\n        // slide_num starts at position 1\n        fireSlideChanged(null, w3c_slidy.slides[slide_num - 1]);\n      });\n    }\n\n  })();\n</script>\n\n"
      ]
    },
    "engineDependencies": {},
    "preserve": {},
    "postProcess": true
  }
}