commit 1294434bc010f048d1ede6b11cee225bbdaef060
parent ba2aa53434f47791c352a422080d0642c0856e29
Author: DrEntropy <DrEntropy@users.noreply.github.com>
Date:   Tue, 13 Jun 2023 04:11:53 -0700
notes for the last chapter ! (#54)
* Made changes for this cohort
* Add silly meme
Diffstat:
3 files changed, 249 insertions(+), 108 deletions(-)
diff --git a/25_Rewriting_R_code_in_C++.Rmd b/25_Rewriting_R_code_in_C++.Rmd
@@ -2,17 +2,19 @@
 
 **Learning objectives:**
 
--   how to improve performance by rewriting key functions in C++
--   how to use [{Rcpp} package](https://www.jstatsoft.org/index.php/jss/article/view/v040i08/475) (with key contributions by Doug Bates, John Chambers, and JJ Allaire)
--   how to check who's faster
+-   Learn to improve performance by rewriting bottlenecks in C++
+
+-   Introduction to the [{Rcpp} package](https://www.rcpp.org/)
 
 ## Introduction
 
-In this chapter we'll learn how to rewrite **R** code in **C++** for making it faster! We'll use the **Rcpp package** which provides **API for comparison**.
+In this chapter we'll learn how to rewrite **R** code in **C++** to make it faster using the **Rcpp package**. The **Rcpp**  package makes it simple to connect C++ to R! With C++ you can fix:
 
-A closer look at C++ will provide an overview of the language and its key conventions to focus on the differences with R. We will also have look at the standard template library STL, the C++ library to use, which provides a set of extremely useful data structures and algorithms.
+-   Loops that can't be easily vectorised because subsequent iterations depend on previous ones.
 
-A very interesting part will involve comparing the two codes benchmarking the two implementations yields with `bench::mark()`.
+-   Recursive functions, or problems which involve calling functions millions of times. The overhead of calling a function in C++ is much lower than in R.
+
+-   Problems that require advanced data structures and algorithms that R doesn't provide. Through the **standard template library (STL)**, C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues.
 
 <center>Like how?</center>
 
@@ -20,15 +22,7 @@ A very interesting part will involve comparing the two codes benchmarking the tw
 
 <center></center>
 
-## What C++ can handle
-
-The C language was originally implemented by Dennis Ritchie for Linux at the end of '70s. C++ is the object oriented version of C.
-
--   Loops that can't be easily vectorised because subsequent iterations depend on previous ones.
-
--   Recursive functions, or problems which involve calling functions millions of times. The overhead of calling a function in C++ is much lower than in R.
-
--   Problems that require advanced data structures and algorithms that R doesn't provide. Through the **standard template library (STL)**, C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues.
+ 
 
 ## Getting started with C++
 
@@ -42,33 +36,8 @@ Install a C++ compiler:
 -   Xcode, on Mac
 -   Sudo apt-get install r-base-dev or similar, on Linux.
 
-### C++ conventions
-
-<center> </center>
-
-<center> </center>
-
-<center></center>
-
-
-
--   Use = for assignment, not \<-.
--   Scalars and vectors are different:
-    -   scalar equivalents of numeric, integer, character,
-    -   logical vectors are: double, int, String, and bool.
--   explicitly use a `return` statement to return a value from a function.
--   Every statement is terminated by a ;
--   The for statement has a different syntax: for(init; check; increment)
--   vector indices start at 0
--   methods are called with .
--   total += x[i] is equivalent to total = total + x[i].
--   in-place operators are -=, \*=, and /=
--   uses pow(), not \^, for exponentiation
--   comment block: /\*\*\* R \# This is R code \*/
 
-## Examples with the cppFunction function
-
-### Make the sum function
+### First example {-}
 
 Rcpp compiling the C++ code:
 
@@ -83,30 +52,23 @@ add
 add(1, 2, 3)
 ```
 
-### Build a simple numerical function without arguments
+Some things to note:
 
-In R:
 
-```{r}
-one <- function() 1L
-one()+100
-```
-
-In C++:
+-   The syntax to create a function is different.
+-   Types of inputs and outputs must be explicitly declared.
+-   Use `=` for assignment, not `<-`.
+-   Every statement is terminated by a `;`.
+-   C++ has its own names for the types we are used to:
+    -   scalar types are `int`, `double`, `bool` and `String`
+    -   vector types (for Rcpp) are `IntegerVector`, `NumericVector`, `LogicalVector` and `CharacterVector`
+    -   Other R types are available in C++: `List`, `Function`, `DataFrame`, and more.
+    
+-   Explicitly use a `return` statement to return a value from a function.
 
-    int one() {
-         return 1;
-              }
+ 
 
-Translation:
-
-```{r}
-cppFunction('int one() {
-  return 1;
-}')
-```
-
-### The sign function
+## Example with scalar input and output {-}
 
 ```{r}
 signR <- function(x) {
@@ -139,7 +101,9 @@ cppFunction('int signC(int x) {
 }')
 ```
 
-### Sum of a sequence: sumR vs sumC
+* Note that the `if` syntax is identical! Not everything is different!
+
+## Vector input, scalar output {-}
 
 ```{r}
 sumR <- function(x) {
@@ -167,6 +131,15 @@ cppFunction('double sumC(NumericVector x) {
 }')
 ```
 
+Some observations:
+
+-   Vector indices *start at 0*.
+-   The `for` statement has a different syntax: `for(init; check; increment)`.
+-   Methods are called with `.`
+-   `total += x[i]` is equivalent to `total = total + x[i]`.
+-   Other in-place operators are `-=`, `*=`, and `/=`.
+
+
 To check for the fastest way we can use:
 
 ```{r eval=FALSE}
@@ -182,7 +155,7 @@ bench::mark(
 )
 ```
 
-### Euclidean distance: pdistR versus pdistC
+## Vector input and output {-}
 
 ```{r}
 pdistR <- function(x, ys) {
@@ -202,6 +175,8 @@ cppFunction('NumericVector pdistC(double x, NumericVector ys) {
 }')
 ```
 
+Note: C++ uses `pow()`, not `^`, for exponentiation.
+
 ```{r}
 y <- runif(1e6)
 bench::mark(
@@ -210,37 +185,37 @@ bench::mark(
 )[1:6]
 ```
 
-## Source your C++ code
+## Source your C++ code {-}
 
 Source stand-alone C++ files into R using `sourceCpp()`
 
-<center> </center>
-
-<center> </center>
-
-<center></center>
-
 
 C++ files have extension `.cpp`
 
-```{r eval=FALSE}
+```
 #include <Rcpp.h>
 using namespace Rcpp;
 ```
 
 And for each function that you want available within R, you need to prefix it with:
 
-```{r eval=FALSE}
+```
 // [[Rcpp::export]]
 ```
 
-To call the files: 
+Inside a `.cpp` file you can include `R` code using special comments:
+
+```
+/*** R
+rcode here
+*/
+```
+
 
-- in R use `source(echo = TRUE)` 
-- in C++ use `sourceCpp("path/to/file.cpp")`
 
-### Example
+### Example {-}
 
+This block in R Markdown uses `{Rcpp}` as shorthand for `engine = "Rcpp"`.
 
 ```{Rcpp}
 #include <Rcpp.h>
@@ -266,9 +241,23 @@ bench::mark(
 */
 ```
 
+NOTE: For some reason, although the R code above runs, `knit` doesn't include the output. Why?
+
+```{r}
+x <- runif(1e5)
+bench::mark(
+  mean(x),
+  meanC(x)
+)
+```
+
+
+
 ## Data frames, functions, and attributes
 
-Example of Data frames
+### Lists and Dataframes {-}
+
+Contrived example to illustrate how to access a data frame from C++:
 
 ```{Rcpp}
 #include <Rcpp.h>
@@ -295,42 +284,193 @@ mod <- lm(mpg ~ wt, data = mtcars)
 mpe(mod)
 ```
 
+- Note that you must *cast* the values to the required type. C++ needs to know the types in advance.
+
+### Functions {-}
+
+```{Rcpp}
+#include <Rcpp.h>
+using namespace Rcpp;
+
+// [[Rcpp::export]]
+RObject callWithOne(Function f) {
+  return f(1);
+}
+```
+
+
+```{r}
+callWithOne(function(x) x + 1)
+```
+
+
+* Other values can be accessed from C++, including:
+
+   * attributes (use `.attr()`; `.names()` is an alias for the `names` attribute).
+   * `Environment`, `DottedPair`, `Language`, `Symbol`, etc.
+
 ## Missing values
 
-Dealing with missing values can differs if:
+### Missing values behave differently for C++ scalars {-}
+
+* Scalar NAs in C++: `NA_LOGICAL`, `NA_INTEGER`, `NA_REAL`, `NA_STRING`.
+
+* Integers (`int`) store R's NA as the smallest integer. It is better to use a length-1 `IntegerVector`.
+* Doubles use an IEEE 754 NaN, which behaves a bit differently in logical expressions (but fine in math expressions).
+
+```{r}
+evalCpp("NA_REAL || FALSE")
+```
+
+* Strings are a class from Rcpp, so they handle missing values fine.
 
--   scalars:
-    -   integers
-    -   doubles
--   strings
--   Boolean
--   vectors
+* `bool` can only hold two values, so be careful. Consider using vectors of length 1 or coercing to `int`.
+
+
+### Vectors
+
+* Vectors are all types introduced by Rcpp, and they know how to handle missing values if you use the specific NA for that vector type.
+
+```{Rcpp}
+#include <Rcpp.h>
+using namespace Rcpp;
+
+// [[Rcpp::export]]
+List missing_sampler() {
+  return List::create(
+    NumericVector::create(NA_REAL),
+    IntegerVector::create(NA_INTEGER),
+    LogicalVector::create(NA_LOGICAL),
+    CharacterVector::create(NA_STRING)
+  );
+}
+```
+
+```{r}
+str(missing_sampler())
+```
 
 ## Standard Template Library
 
-STL is the fundamental library in C++, it provides a set of useful data structures and algorithms.
+STL provides powerful data structures and algorithms for C++.  
+
+### Iterators {-}
+
+Iterators are used extensively in the STL to abstract away details of underlying data structures.
+
+If you have an iterator `it`, you can:
+
+- Get the value by 'dereferencing' with `*it`
+- Advance to the next value with `++it`
+- Compare iterators (locations) with `==`
+
+
+### Algorithms {-}
 
+* The real power of iterators comes from using them with STL algorithms. 
+ 
+* A good reference is <https://en.cppreference.com/w/cpp/algorithm>
 
-- Using iterators, the next step up from basic loops:
-    - NumericVector::iterator
-    - LogicalVector::iterator
-    - CharacterVector::iterator
-- Algorithms
-- Data structures:
-    - vector
-    - unordered_set
-    - unordered_map
-- Vectors
-- Map
+* The book provides examples using `accumulate` and `upper_bound`.
 
-A good resource is **Effective STL by Scott Meyers**.
-And one more about the STL data structures is [the container](https://en.cppreference.com/w/cpp/container)
+* Another Example:
+
+```{Rcpp}
+
+#include <algorithm>
+#include <Rcpp.h>
+
+using namespace Rcpp;
+ 
+ 
+// Explicit iterator version
+ 
+// [[Rcpp::export]]
+NumericVector square_C_it(NumericVector x){
+  NumericVector out(x.size());
+  // Each container has its own iterator type
+  NumericVector::iterator in_it;
+  NumericVector::iterator out_it;
+  
+  for(in_it = x.begin(), out_it = out.begin(); in_it != x.end();  ++in_it, ++out_it) {
+    *out_it = pow(*in_it,2);
+  }
+  
+  return out;
+  
+}
+ 
+ 
+// Use algorithm 'transform'
+  
+// [[Rcpp::export]]
+NumericVector square_C(NumericVector x) {
+ 
+  NumericVector out(x.size());
+ 
+ 
+  std::transform(x.begin(),x.end(), out.begin(),
+            [](double v) -> double { return v*v; });
+  return out;
+}
+```
+
+```{r}
+square_C(c(1.0,2.0,3.0))
+```
+```{r}
+square_C_it(c(1.0,2.0,3.0))
+```
+
+## Data Structures {-}
+
+STL provides a large set of data structures. Some of the most important:
+
+* `std::vector` - like an `R` vector, except knows how to grow efficiently
+
+* `std::unordered_set` - unique set of values. Ordered version `std::set`. Unordered is more efficient.
+
+* `std::map` - Mostly similar to `R` lists; provides an association between a key and a value. There is also an unordered version.
+
+A quick example illustrating the `map`:
+
+```{Rcpp}
+#include <Rcpp.h>
+using namespace Rcpp;
+
+// [[Rcpp::export]]
+std::map<double, int> tableC(NumericVector x) {
+  // Note the types are <key, value>
+  std::map<double, int> counts;
+
+  int n = x.size();
+  for (int i = 0; i < n; i++) {
+    counts[x[i]]++;
+  }
+
+  return counts;
+}
+```
+
+
+```{r}
+res = tableC(c(1,1,2,1,4,5))
+res
+```
+
+* Note that, in this case, the map is converted to a named vector on return.
+
+ 
+To learn more about the STL data structures see [containers](https://en.cppreference.com/w/cpp/container) at `cppreference`
 
 ## Case Studies
 
+
+
 Real life uses of C++ to replace slow R code.
 
-### Case study 1: Gibbs sampler
+
+## Case study 1: Gibbs sampler {-}
 
 The [Gibbs sampler](https://en.wikipedia.org/wiki/Gibbs_sampling) is a method for estimating parameters expectations. It is a **MCMC algorithm** that has been adapted to sample from multidimensional target distributions. Gibbs sampling generates a **Markov chain** of samples, each of which is correlated with nearby samples. 
 
@@ -398,7 +538,7 @@ bench::mark(
 )
 ```
 
-### Case study 2: predict a model response from three inputs
+## Case study 2: predict a model response from three inputs {-}
 
 [Rcpp is smoking fast for agent based models in data frames](https://gweissman.github.io/post/rcpp-is-smoking-fast-for-agent-based-models-in-data-frames/) by Gary Weissman, MD, MSHP.
 
@@ -427,7 +567,7 @@ vacc1 <- function(age, female, ily) {
 }
 ```
 
-R code without a for loop:
+Vectorized R code:
 
 ```{r}
 vacc2 <- function(age, female, ily) {
@@ -494,14 +634,15 @@ bench::mark(
 
 ## Resources
 
--   [Rcpp: Seamless R and C++ Integration](https://www.jstatsoft.org/index.php/jss/article/view/v040i08/475)
--   [cpp-tutorial](https://www.learncpp.com/cpp-tutorial/introduction-to-function-parameters-and-arguments/)
+-   [Rcpp: Seamless R and C++ Integration](https://www.rcpp.org/)
+-   [cpp-tutorial](https://www.learncpp.com) is often recommended. Lots of ads though!
 -   [cpp-reference](https://en.cppreference.com/w/cpp)
-- A good resource is **Effective STL by Scott Meyers**
-- the STL data structures found in [the container](https://en.cppreference.com/w/cpp/container)
-- [Exposing C++ functions and classes
-with Rcpp modules](https://cran.rstudio.com/web/packages/Rcpp/vignettes/Rcpp-modules.pdf)
-- All gifs are from: https://giphy.com/
+-   [C++20 for Programmers](https://www.pearson.com/en-us/subject-catalog/p/c20-for-programmers-an-objects-natural-approach/P200000000211/9780137570461) is a newer book that covers modern C++ for people who already program in another language.
+ 
+## Op Success!
+
+
+
 
 ## Meeting Videos
 
diff --git a/images/case_study.jpg b/images/case_study.jpg
Binary files differ.
diff --git a/images/we-did-it-celebration-meme.jpg b/images/we-did-it-celebration-meme.jpg
Binary files differ.