www.eamoncaddigan.net

Content and configuration for https://www.eamoncaddigan.net
git clone https://git.eamoncaddigan.net/www.eamoncaddigan.net.git
Log | Files | Refs | Submodules | README

index.md (4126B)


      1 ---
      2 title: "Riddler #2"
      3 date: "2015-12-19"
      4 description: Solving and simulating a probability puzzle.
      5 categories:
      6 - Data Science
      7 tags:
      8 - Statistics
      9 katex: true
     10 ---
     11 
     12 FiveThirtyEight has a new puzzle feature called The Riddler. This week they
     13 posted [their second
     14 puzzle](http://fivethirtyeight.com/features/which-geyser-gushes-first/), which
     15 involves probability:
     16 
     17 > You arrive at the beautiful Three Geysers National Park. You read a placard
     18 > explaining that the three eponymous geysers — creatively named A, B and C —
     19 > erupt at intervals of precisely two hours, four hours and six hours,
     20 > respectively. However, you just got there, so you have no idea how the three
     21 > eruptions are staggered. Assuming they each started erupting at some
     22 > independently random point in history, what are the probabilities that A, B 
     23 > and C, respectively, will be the first to erupt after your arrival?
     24 
     25 ### Analytic solution
     26 
     27 Responses were due last night at midnight, so I hope I'm not spoiling anything
     28 by sharing mine.
     29 
     30 When you arrive at the park, the first eruption of geyser A is likely to occur
     31 at any time within the next two hours with a uniform probability. Denoting the
     32 number of hours until the first eruption of the geyser as “A”, A ~ Uniform(0,
     33 2). Similarly, B ~ Uniform(0, 4) and C ~ Uniform(0, 6).
     34  
     35 To compute the probability of geyser A erupting first, separately consider the
     36 cases where geysers B and C do and do not erupt within the first two hours of
     37 waiting.
     38 
     39 Using Bayes Theorem:
     40 
     41 $$Pr(A first \cap B<2, C<2) = Pr(A first | B < 2, C < 2) Pr(B < 2, C < 2)$$
     42 
     43 Assuming independence of the geysers:
     44 
     45 $$= Pr(A first | B < 2, C < 2) Pr(B < 2) Pr(C < 2)$$
     46 
     47 Considering the case here, in which all three geysers erupt in the first two
     48 hours, the probability of any geyser erupting first is 1/3. The probability of
     49 geysers B and C erupting in the first two hours is easy to calculate.
     50 
     51 $$= (1/3) \times (1/2) \times (1/3) = 1/18$$
     52 
     53 Using similar logic:
     54 
     55 $$Pr(A first \cap B > 2, C < 2) = (1/2) \times (1/2) \times (1/3) = 1/12$$
     56 $$Pr(A first \cap B < 2, C > 2) = (1/2) \times (1/2) \times (2/3) = 1/6$$
     57 $$Pr(A first \cap B > 2, C > 2) = 1 \times (1/2) \times (2/3) = 1/3$$
     58  
     59 These disjoint events partition the sample space, so the law of total
     60 probability dictates:
     61 
     62 $$Pr(A first) = 1/18 + 1/12 + 1/6 + 1/3 = 23/36 \approx 0.6389$$
     63  
     64 Do the same thing to calculate the probability of geyser B erupting before the
     65 others:
     66 
     67 $$Pr(B first \cap B < 2, C < 2) = (1/3) \times (1/2) \times (1/3) = 1/18$$
     68 $$Pr(B first \cap B < 2, C > 2) = (1/2) \times (1/2) \times (2/3) = 1/6$$
     69 
     70 Note that when geyser B does not erupt within the first two hours, another
     71 geyser is guaranteed to erupt before it:
     72 
     73 $$Pr(B first \cap B > 2, C < 2) = 0$$
     74 $$Pr(B first \cap B > 2, C > 2) = 0$$
     75 
     76 Therefore:
     77 
     78 $$Pr(B first) = 1/18 + 1/6 = 4/18 \approx 0.2222$$
     79  
     80 And geyser C is similar:
     81 
     82 $$Pr(C first \cap B < 2, C < 2) = (1/3) \times (1/2) \times (1/3) = 1/18$$
     83 $$Pr(C first \cap B > 2, C < 2) = (1/2) \times (1/2) \times (1/3) = 1/12$$
     84 $$Pr(C first) = 1/18 + 1/12 = 5/36 \approx 0.1389$$
     85 
     86 ### Quick simulation
     87 
     88 It never hurts to check results with a simulation. After all, math is hard and
     89 programming is easy.
     90 
     91 ```r
     92 library(dplyr)
     93 library(ggplot2)
     94 
     95 numSamples <- 1e6
     96 
     97 # Generate random wait times for each geyser's next eruption
     98 geysers <- data_frame(A = runif(numSamples, 0, 2),
     99                       B = runif(numSamples, 0, 4),
    100                       C = runif(numSamples, 0, 6))
    101 geyserNames <- colnames(geysers)
    102 
    103 # Identify the geyser with the smallest time-to eruption
    104 geysers <- geysers %>%
    105   mutate(first_geyser = geyserNames[max.col(-geysers[, geyserNames])])
    106 
    107 # Plot the results
    108 geysers %>% 
    109   ggplot(aes(x = first_geyser)) + geom_bar()
    110 ```
    111 
    112 ![simulation results](unnamed-chunk-1-1.png) 
    113 
    114 ```r
    115 # Count observations and estimate probabilities
    116 geysers %>% 
    117   count(first_geyser) %>% 
    118   mutate(prob_first = n / numSamples)
    119 ```
    120 ```
    121 ## Source: local data frame [3 x 3]
    122 ## 
    123 ##   first_geyser      n prob_first
    124 ## 1            A 640093   0.640093
    125 ## 2            B 221439   0.221439
    126 ## 3            C 138468   0.138468
    127 ```
    128 
    129 Close enough! 
    130