---
engine: knitr
title: Names and values
---

## Learning objectives

- Distinguish between an *object* and its *name*.
- Identify when data are *copied* versus *modified*.
- Trace and identify the memory used by R.

The `{lobstr}` package will help us throughout the chapter.

```{r}
library(lobstr)
```

## Syntactic names are easier to create and work with than non-syntactic names

- Syntactic names: `my_variable`, `x`, `cpp11`, `.by`.
- Can't use names listed in `?Reserved`.

- Non-syntactic names need to be surrounded by backticks.

## Names are *bound to* values with `<-`

```{r}
a <- c(1, 2, 3)
a
obj_addr(a)
```

## Many names can be bound to the same values

```{r}
b <- a
obj_addr(a)
obj_addr(b)
```

## If shared values are modified, the object is copied to a new address

```{r}
b[[1]] <- 5
obj_addr(a)
obj_addr(b)
```

## Memory addresses can differ even if objects seem the same

```{r}
a <- 1:10
b <- a
c <- 1:10

obj_addr(a)
obj_addr(b)
obj_addr(c)
```

## Functions have a single address regardless of how they're referenced

```{r}
obj_addr(mean)
obj_addr(base::mean)
obj_addr(get("mean"))
```

## Unlike most objects, environments keep the same memory address on modify

```{r}
d <- new.env()
obj_addr(d)
e <- d
e[['a']] <- 1
obj_addr(e)
obj_addr(d)
d[['a']]
```

## Use `tracemem` to validate if values are copied or modified

```{r}
#| eval: false
x <- runif(10)
tracemem(x)
#> [1] "<000001F4185B4B08>"
y <- x
x[[1]] <- 10
#> tracemem[0x000001f4185b4b08 -> 0x000001f4185b4218]:
untracemem(x)
```

## `tracemem` shows internal C code minimizes copying

```{r}
#| eval: false
y <- as.list(x)
tracemem(y)
#> [1] "<000001AD67FDCD38>"
medians <- vapply(x, median, numeric(1))
for (i in 1:5) {
  y[[i]] <- y[[i]] - medians[[i]]
}
#> tracemem[0x000001ad67fdcd38 -> 0x000001ad61982638]:
untracemem(y)
```

## A function's environment follows copy-on-modify rules

:::: columns

::: column
```{r}
f <- function(a) {
  a
}

x <- c(1, 2, 3)
z <- f(x) # No change in value

obj_addr(x)
obj_addr(z) # No address change
```
:::

::: column

:::

::::

::: notes
- Diagrams will be explained more in chapter 7.
- `a` points to the same address as `x`.
- If `a` were modified inside the function, `z` would have a new address.
:::

## `ref()` shows the memory address of a list and its *elements*

:::: columns

::: column
```{r}
l1 <- list(1, 2, 3)
obj_addr(l1)
l2 <- l1
l2[[3]] <- 4
ref(l1, l2)
```
:::

::: column
{width=50%}
:::

::::

## Since dataframes are lists of (column) vectors, mutating a column modifies only that column

```{r}
d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))
d2 <- d1
d2[, 2] <- d2[, 2] * 2
ref(d1, d2)
```

## Since dataframes are lists of (column) vectors, mutating a row copies every column

```{r}
d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))
d2 <- d1
d2[1, ] <- d2[1, ] * 2
ref(d1, d2)
```

::: notes
- Here "mutate" means "change", not `dplyr::mutate()`.
:::

## Characters are unique due to the global string pool

:::: columns

::: column
```{r}
x <- 1:4
ref(x)
y <- 1:4
ref(y)
x <- c("a", "a", "b")
ref(x, character = TRUE)
y <- c("a")
ref(y, character = TRUE)
```
:::

::: column

:::

::::

::: notes
- "a" is always at the same address.
- Each member of a character vector has its own address (kind of list-like).
:::

## Memory amount can also be measured, using `lobstr::obj_size`

```{r}
banana <- "bananas bananas bananas"
obj_addr(banana)
obj_size(banana)
```

## Alternative representations (ALTREPs) represent vector values efficiently

```{r}
x <- 1:10
obj_size(x)
y <- 1:10000
obj_size(y)
```

## We can measure memory & speed using `bench::mark()`

```{r}
med <- function(d, medians) {
  for (i in seq_along(medians)) {
    d[[i]] <- d[[i]] - medians[[i]]
  }
}
x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))
y <- as.list(x)

bench::mark(
  "data.frame" = med(x, medians),
  "list" = med(y, medians)
)[, c("min", "median", "mem_alloc")]
```

::: notes
- The thing to see: the list version uses less RAM and is faster.
:::

## The garbage collector `gc()` explicitly clears out unbound objects

```{r}
x <- 1:3
x <- 2:4 # "1:3" is orphaned
rm(x) # "2:4" is orphaned
gc()
lobstr::mem_used() # Wrapper around gc()
```

::: aside
R runs the garbage collector automatically, so you never *need* to call `gc()`.
:::

::: notes
- `mem_used()` multiplies Ncells "used" by either 28 (32-bit architecture) or 56 (64-bit architecture) and Vcells "used" by 8, adds them, and converts to Mb.
:::
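
The arithmetic described in the notes can be reproduced by hand as a rough check. This is a minimal sketch, not the package's actual source: it assumes the "used" counts sit in the first column of the matrix returned by `gc()`, and it picks the 28 or 56 byte node size from `.Machine$sizeof.pointer`. The names `usage`, `node_bytes`, and `by_hand` are illustrative only.

```{r}
library(lobstr)

# gc() returns a matrix with one row for Ncells and one for Vcells;
# the first column is assumed to hold the "used" counts.
usage <- gc()

# Assumption: 56 bytes per Ncell on a 64-bit build, 28 on a 32-bit build.
node_bytes <- if (.Machine$sizeof.pointer == 8L) 56 else 28

# Ncells * node size + Vcells * 8 bytes, as described in the notes.
by_hand <- usage["Ncells", 1] * node_bytes + usage["Vcells", 1] * 8

# Compare against lobstr's own figure, converted to a plain number of bytes.
by_hand
as.numeric(mem_used())
```

The two numbers should be close but need not match exactly, because `mem_used()` triggers its own call to `gc()` and memory use shifts slightly between calls.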