index.md (9468B)
1 --- 2 title: "Bootstrap analysis of anti-vaccination belief changes" 3 description: Another way of looking at the antivaccination data of Horne, et al. 4 date: 2015-09-17 5 categories: 6 - Data Science 7 - Science 8 tags: 9 - Psychology 10 - Statistics 11 - R 12 --- 13 14 ## Introduction 15 16 In a [previous post]({{< ref "/posts/antivax-attitudes/index.md" 17 >}}), I presented an analysis of data by [Horne, Powell, Hummel & Holyoak, 18 (2015)](http://www.pnas.org/content/112/33/10321.abstract) that investigated 19 changes in attitudes toward childhood vaccinations. The previous analysis 20 used Bayesian estimation to show a credible increase in pro-vaccination 21 attitudes following a "disease risk" intervention, but not an "autism 22 correction" intervention. 23 24 Some of my friends offered insightful comments, and one [pointed 25 out](https://twitter.com/johnclevenger/status/639795727439429632) what appeared 26 to be a failure of random assignment. Participants in the "disease risk" group 27 happened to have lower scores on the pre-intervention survey and therefore had 28 more room for improvement. This is a fair criticism, but a subsequent analysis 29 showed that post-intervention scores were also higher for the "disease risk" 30 group, which addresses this issue. 31 32 ![Posterior of final score differences](bayesian_ending_scores.png) 33 34 ### Bootstrapping 35 36 Interpreting differences in parameter values isn't always straightforward, so I 37 thought it'd be worthwhile to try a different approach. Instead of fitting a 38 generative model to the sample, we can use bootstrapping to estimate the 39 unobserved population parameters. 40 [Bootstrapping](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) is 41 conceptually simple; I feel it would have much wider adoption today had 42 computers been around in the early days of statistics. 43 44 Here is code that uses bootstrapping to estimate the probability of each 45 response on the pre-intervention survey (irrespective of survey question or 46 intervention group assignment). The sample mean is already an unbiased 47 estimator of the population mean, so bootstrapping isn't necessary in this 48 first example. However, this provides a simple illustration of how the 49 technique works: sample observations *with replacement* from the data, 50 calculate a statistic on this new data, and repeat. The mean of the observed 51 statistic values provides an estimate of the population statistic, and the 52 distribution of statistic values provides a measure of certainty. 53 54 55 ```r 56 numBootstraps <- 1e5 # Should be a big number 57 numObservations <- nrow(questionnaireData) 58 uniqueResponses <- sort(unique(questionnaireData$pretest_response)) 59 interventionLevels <- levels(questionnaireData$intervention) 60 61 # The observed proportion of responses at each level 62 pretestResponseProbs <- as.numeric(table(questionnaireData$pretest_response)) / 63 numObservations 64 ``` 65 ```r 66 # Bootstrap to find the probability that each response will be given to pre-test 67 # questions. 68 pretestData <- array(data = 0, 69 dim = c(numBootstraps, 70 length(uniqueResponses)), 71 dimnames = list(NULL, 72 paste(uniqueResponses))) 73 74 # Run the bootstrap 75 for (ii in seq_len(numBootstraps)) { 76 bootSamples <- sample(questionnaireData$pretest_response, 77 numObservations, 78 replace = TRUE) 79 bootSamplesTabulated <- table(bootSamples) 80 pretestData[ii, names(bootSamplesTabulated)] <- bootSamplesTabulated 81 } 82 83 # Convert the counts to probabilities 84 pretestData <- pretestData / numObservations 85 ``` 86 87 ![Bootstrap estimaes of responses](pretest_plot-1.png) 88 89 As expected, the bootstrap estimates for the proportion of responses at each 90 level almost exactly match the observed data. There are supposed to be 91 error-bars around the points, which show the bootstrap estimates, but they're 92 obscured by the points themselves. 93 94 ## Changes in vaccination attitudes 95 96 Due to chance alone, the three groups of participants (the control group, the 97 "autism correction" group, and the "disease risk" group) showed different 98 patterns of responses to the pre-intervention survey. To mitigate this issue, 99 the code below estimates the transition probabilities from each response on the 100 pre-intervention survey to each response on the post-intervention survey, and 101 does so separately for the groups. These are conditional probabilities, e.g., 102 `P(post-intervention rating = 4 | pre-intervention rating = 3)`. 103 104 The conditional probabilities are then combined with the observed 105 pre-intervention response probabilities to calculate the joint probability of 106 each response transition (e.g., `P(post-intervention rating = 4 ∩ 107 pre-intervention rating = 3)`). Importantly, since the prior is agnostic to 108 subjects' group assignment, these joint probability estimates are less-affected 109 by biases that would follow from a failure of random assignment. 110 111 ```r 112 # preintervention responses x intervention groups x bootstraps x postintervention responses 113 posttestData <- array(data = 0, 114 dim = c(length(uniqueResponses), 115 length(interventionLevels), 116 numBootstraps, 117 length(uniqueResponses)), 118 dimnames = list(paste(uniqueResponses), 119 interventionLevels, 120 NULL, 121 paste(uniqueResponses))) 122 123 for (pretestResponse in seq_along(uniqueResponses)) { 124 for (interventionLevel in seq_along(interventionLevels)) { 125 # Get the subset of data for each combination of intervention and 126 # pre-intervention response level. 127 questionnaireDataSubset <- filter(questionnaireData, 128 intervention == interventionLevels[interventionLevel], 129 pretest_response == pretestResponse) 130 numObservationsSubset <- nrow(questionnaireDataSubset) 131 132 # Run the bootstrap 133 for (ii in seq_len(numBootstraps)) { 134 bootSamples <- sample(questionnaireDataSubset$posttest_response, 135 numObservationsSubset, 136 replace = TRUE) 137 bootSamplesTabulated <- table(bootSamples) 138 posttestData[pretestResponse, 139 interventionLevel, 140 ii, 141 names(bootSamplesTabulated)] <- bootSamplesTabulated 142 } 143 144 # Convert the counts to probabilities 145 posttestData[pretestResponse, interventionLevel, , ] <- 146 posttestData[pretestResponse, interventionLevel, , ] / numObservationsSubset 147 } 148 149 # Convert the conditional probabilities to joint probabilities using the 150 # observed priors on each pretest response. 151 posttestData[pretestResponse, , , ] <- posttestData[pretestResponse, , , ] * 152 pretestResponseProbs[pretestResponse] 153 } 154 ``` 155 156 With the transition probabilities sampled, it's possible to test the 157 hypothesis: **"participants are more likely to shift toward a more pro-vaccine 158 attitude following a 'disease risk' intervention than participants in control 159 and 'autism correction' groups."** We'll use the previously-run bootstrap 160 samples to compute the each group's probability of increasing scores. 161 162 ```r 163 posttestIncrease <- array(data = 0, 164 dim = c(numBootstraps, 165 length(interventionLevels)), 166 dimnames = list(NULL, 167 interventionLevels)) 168 169 for (interventionLevel in seq_along(interventionLevels)) { 170 for (pretestResponse in seq_along(uniqueResponses)) { 171 for (posttestResponse in seq_along(uniqueResponses)) { 172 if (posttestResponse > pretestResponse) { 173 posttestIncrease[, interventionLevel] <- posttestIncrease[, interventionLevel] + 174 posttestData[pretestResponse, interventionLevel, , posttestResponse] 175 } 176 } 177 } 178 } 179 ``` 180 181 This estimates the probability that post-intervention responses from the 182 "disease risk" group have a greater probability of being higher than their 183 pre-intervention counterparts than responses from the "autism correction" 184 group. 185 186 ```r 187 sum(posttestIncrease[, which(interventionLevels == "Disease Risk")] > 188 posttestIncrease[, which(interventionLevels == "Autism Correction")]) / 189 nrow(posttestIncrease) 190 ``` 191 192 ``` 193 ## [1] 0.99992 194 ``` 195 196 Below is a visualization of the bootstrap distributions. This illustrates the 197 certainty of the estimates of the probability that participants would express 198 stronger pro-vaccination attitudes after the interventions. 199 200 ![Bootstrap estimates of probability of ratings increase](posttest_plot-1.png) 201 202 ## Conclusion 203 204 Bootstrapping shows that a "disease risk" intervention has a stronger effect 205 than others in shifting participants' pro-vaccination attitudes. This analysis 206 collapses across the five survey questions used by Horne and colleagues, but it 207 would be straightforward to extend this code to estimate attitude change 208 probabilities separately for each question. 209 210 Although there are benefits to analyzing data with nonparametric methods, the 211 biggest shortcoming of the approach I've used here is that it can not estimate 212 the size of the attitude changes. Instead, it estimates that probability of 213 pro-vaccination attitude changes occurring, and the difference in these 214 probabilities between the groups. This is a great example of why it's important 215 to keep your question in mind while analyzing data. 216