index.md (4126B)
1 --- 2 title: "Riddler #2" 3 date: "2015-12-19" 4 description: Solving and simulating a probability puzzle. 5 categories: 6 - Data Science 7 tags: 8 - Statistics 9 katex: true 10 --- 11 12 FiveThirtyEight has a new puzzle feature called The Riddler. This week they 13 posted [their second 14 puzzle](http://fivethirtyeight.com/features/which-geyser-gushes-first/), which 15 involves probability: 16 17 > You arrive at the beautiful Three Geysers National Park. You read a placard 18 > explaining that the three eponymous geysers — creatively named A, B and C — 19 > erupt at intervals of precisely two hours, four hours and six hours, 20 > respectively. However, you just got there, so you have no idea how the three 21 > eruptions are staggered. Assuming they each started erupting at some 22 > independently random point in history, what are the probabilities that A, B 23 > and C, respectively, will be the first to erupt after your arrival? 24 25 ### Analytic solution 26 27 Responses were due last night at midnight, so I hope I'm not spoiling anything 28 by sharing mine. 29 30 When you arrive at the park, the first eruption of geyser A is likely to occur 31 at any time within the next two hours with a uniform probability. Denoting the 32 number of hours until the first eruption of the geyser as “A”, A ~ Uniform(0, 33 2). Similarly, B ~ Uniform(0, 4) and C ~ Uniform(0, 6). 34 35 To compute the probability of geyser A erupting first, separately consider the 36 cases where geysers B and C do and do not erupt within the first two hours of 37 waiting. 38 39 Using Bayes Theorem: 40 41 $$Pr(A first \cap B<2, C<2) = Pr(A first | B < 2, C < 2) Pr(B < 2, C < 2)$$ 42 43 Assuming independence of the geysers: 44 45 $$= Pr(A first | B < 2, C < 2) Pr(B < 2) Pr(C < 2)$$ 46 47 Considering the case here, in which all three geysers erupt in the first two 48 hours, the probability of any geyser erupting first is 1/3. The probability of 49 geysers B and C erupting in the first two hours is easy to calculate. 50 51 $$= (1/3) \times (1/2) \times (1/3) = 1/18$$ 52 53 Using similar logic: 54 55 $$Pr(A first \cap B > 2, C < 2) = (1/2) \times (1/2) \times (1/3) = 1/12$$ 56 $$Pr(A first \cap B < 2, C > 2) = (1/2) \times (1/2) \times (2/3) = 1/6$$ 57 $$Pr(A first \cap B > 2, C > 2) = 1 \times (1/2) \times (2/3) = 1/3$$ 58 59 These disjoint events partition the sample space, so the law of total 60 probability dictates: 61 62 $$Pr(A first) = 1/18 + 1/12 + 1/6 + 1/3 = 23/36 \approx 0.6389$$ 63 64 Do the same thing to calculate the probability of geyser B erupting before the 65 others: 66 67 $$Pr(B first \cap B < 2, C < 2) = (1/3) \times (1/2) \times (1/3) = 1/18$$ 68 $$Pr(B first \cap B < 2, C > 2) = (1/2) \times (1/2) \times (2/3) = 1/6$$ 69 70 Note that when geyser B does not erupt within the first two hours, another 71 geyser is guaranteed to erupt before it: 72 73 $$Pr(B first \cap B > 2, C < 2) = 0$$ 74 $$Pr(B first \cap B > 2, C > 2) = 0$$ 75 76 Therefore: 77 78 $$Pr(B first) = 1/18 + 1/6 = 4/18 \approx 0.2222$$ 79 80 And geyser C is similar: 81 82 $$Pr(C first \cap B < 2, C < 2) = (1/3) \times (1/2) \times (1/3) = 1/18$$ 83 $$Pr(C first \cap B > 2, C < 2) = (1/2) \times (1/2) \times (1/3) = 1/12$$ 84 $$Pr(C first) = 1/18 + 1/12 = 5/36 \approx 0.1389$$ 85 86 ### Quick simulation 87 88 It never hurts to check results with a simulation. After all, math is hard and 89 programming is easy. 90 91 ```r 92 library(dplyr) 93 library(ggplot2) 94 95 numSamples <- 1e6 96 97 # Generate random wait times for each geyser's next eruption 98 geysers <- data_frame(A = runif(numSamples, 0, 2), 99 B = runif(numSamples, 0, 4), 100 C = runif(numSamples, 0, 6)) 101 geyserNames <- colnames(geysers) 102 103 # Identify the geyser with the smallest time-to eruption 104 geysers <- geysers %>% 105 mutate(first_geyser = geyserNames[max.col(-geysers[, geyserNames])]) 106 107 # Plot the results 108 geysers %>% 109 ggplot(aes(x = first_geyser)) + geom_bar() 110 ``` 111 112 ![simulation results](unnamed-chunk-1-1.png) 113 114 ```r 115 # Count observations and estimate probabilities 116 geysers %>% 117 count(first_geyser) %>% 118 mutate(prob_first = n / numSamples) 119 ``` 120 ``` 121 ## Source: local data frame [3 x 3] 122 ## 123 ## first_geyser n prob_first 124 ## 1 A 640093 0.640093 125 ## 2 B 221439 0.221439 126 ## 3 C 138468 0.138468 127 ``` 128 129 Close enough! 130