blogposts

Code used to generate some blog posts.
git clone https://git.eamoncaddigan.net/blogposts.git
Log | Files | Refs | README | LICENSE

2015-12-19-riddler-2.Rmd (3915B)


      1 ---
      2 title: "Riddler #2"
      3 author: "Eamon Caddigan"
      4 date: "2015-12-19"
      5 output: html_document
      6 layout: post
      7 summary: Solving and simulating a probability puzzle.
      8 categories: R stats
      9 ---
     10 
     11 FiveThirtyEight has a new puzzle feature called The Riddler. This week they posted [their second puzzle](http://fivethirtyeight.com/features/which-geyser-gushes-first/), which involves probability:
     12 
     13 > You arrive at the beautiful Three Geysers National Park. You read a placard
     14 > explaining that the three eponymous geysers — creatively named A, B and C —
     15 > erupt at intervals of precisely two hours, four hours and six hours,
     16 > respectively. However, you just got there, so you have no idea how the three
     17 > eruptions are staggered. Assuming they each started erupting at some
     18 > independently random point in history, what are the probabilities that A, B 
     19 > and C, respectively, will be the first to erupt after your arrival?
     20 
     21 ### Analytic solution
     22 
     23 Responses were due last night at midnight, so I hope I'm not spoiling anything by sharing mine.
     24 
     25 When you arrive at the park, the first eruption of geyser A is likely to occur at any time within the next two hours with a uniform probability. Denoting the number of hours until the first eruption of the geyser as “A”, A ~ Uniform(0, 2). Similarly, B ~ Uniform(0, 4) and C ~ Uniform(0, 6).
     26  
     27 To compute the probability of geyser A erupting first, separately consider the cases where geysers B and C do and do not erupt within the first two hours of waiting.
     28 
     29 Using Bayes Theorem:
     30 
     31 $$Pr(A first \cap B<2, C<2) = Pr(A first | B < 2, C < 2) Pr(B < 2, C < 2)$$
     32 
     33 Assuming independence of the geysers:
     34 
     35 $$= Pr(A first | B < 2, C < 2) Pr(B < 2) Pr(C < 2)$$
     36 
     37 Considering the case here, in which all three geysers erupt in the first two hours, the probability of any geyser erupting first is 1/3. The probability of geysers B and C erupting in the first two hours is easy to calculate.
     38 
     39 $$= (1/3) \times (1/2) \times (1/3) = 1/18$$
     40 
     41 Using similar logic:
     42 
     43 $$Pr(A first \cap B > 2, C < 2) = (1/2) \times (1/2) \times (1/3) = 1/12\\
     44 Pr(A first \cap B < 2, C > 2) = (1/2) \times (1/2) \times (2/3) = 1/6\\
     45 Pr(A first \cap B > 2, C > 2) = 1 \times (1/2) \times (2/3) = 1/3$$
     46  
     47 These disjoint events partition the sample space, so the law of total probability dictates:
     48 
     49 $$Pr(A first) = 1/18 + 1/12 + 1/6 + 1/3 = 23/36 \approx 0.6389$$
     50  
     51 Do the same thing to calculate the probability of geyser B erupting before the others:
     52 
     53 $$Pr(B first \cap B < 2, C < 2) = (1/3) \times (1/2) \times (1/3) = 1/18\\
     54 Pr(B first \cap B < 2, C > 2) = (1/2) \times (1/2) \times (2/3) = 1/6$$
     55 
     56 Note that when geyser B does not erupt within the first two hours, another geyser is guaranteed to erupt before it:
     57 
     58 $$Pr(B first \cap B > 2, C < 2) = 0\\
     59 Pr(B first \cap B > 2, C > 2) = 0$$
     60 
     61 Therefore:
     62 
     63 $$Pr(B first) = 1/18 + 1/6 = 4/18 \approx 0.2222$$
     64  
     65 And geyser C is similar:
     66 
     67 $$Pr(C first \cap B < 2, C < 2) = (1/3) \times (1/2) \times (1/3) = 1/18\\
     68 Pr(C first \cap B > 2, C < 2) = (1/2) \times (1/2) \times (1/3) = 1/12\\
     69 Pr(C first) = 1/18 + 1/12 = 5/36 \approx 0.1389$$
     70 
     71 ### Quick simulation
     72 
     73 It never hurts to check results with a simulation. After all, math is hard and programming is easy.
     74 
     75 ```{r, warning=FALSE, message=FALSE}
     76 library(dplyr)
     77 library(ggplot2)
     78 
     79 numSamples <- 1e6
     80 
     81 # Generate random wait times for each geyser's next eruption
     82 geysers <- data_frame(A = runif(numSamples, 0, 2),
     83                       B = runif(numSamples, 0, 4),
     84                       C = runif(numSamples, 0, 6))
     85 geyserNames <- colnames(geysers)
     86 
     87 # Identify the geyser with the smallest time-to eruption
     88 geysers <- geysers %>%
     89   mutate(first_geyser = geyserNames[max.col(-geysers[, geyserNames])])
     90 
     91 # Plot the results
     92 geysers %>% 
     93   ggplot(aes(x = first_geyser)) + geom_bar()
     94 
     95 # Count observations and estimate probabilities
     96 geysers %>% 
     97   count(first_geyser) %>% 
     98   mutate(prob_first = n / numSamples)
     99 ```
    100 
    101 Close enough!