www.eamoncaddigan.net

Content and configuration for https://www.eamoncaddigan.net
git clone https://git.eamoncaddigan.net/www.eamoncaddigan.net.git
Log | Files | Refs | Submodules | README

index.md (9468B)


      1 ---
      2 title: "Bootstrap analysis of anti-vaccination belief changes"
      3 description: Another way of looking at the antivaccination data of Horne, et al.
      4 date: 2015-09-17
      5 categories:
      6 - Data Science
      7 - Science
      8 tags:
      9 - Psychology
     10 - Statistics
     11 - R
     12 ---
     13 
     14 ## Introduction
     15 
     16 In a [previous post]({{< ref "/posts/antivax-attitudes/index.md"
     17 >}}), I presented an analysis of data by [Horne, Powell, Hummel & Holyoak,
     18 (2015)](http://www.pnas.org/content/112/33/10321.abstract) that investigated
     19 changes in attitudes toward childhood vaccinations. The previous analysis
     20 used Bayesian estimation to show a credible increase in pro-vaccination
     21 attitudes following a "disease risk" intervention, but not an "autism
     22 correction" intervention.
     23 
     24 Some of my friends offered insightful comments, and one [pointed
     25 out](https://twitter.com/johnclevenger/status/639795727439429632) what appeared
     26 to be a failure of random assignment. Participants in the "disease risk" group
     27 happened to have lower scores on the pre-intervention survey and therefore had
     28 more room for improvement. This is a fair criticism, but a subsequent analysis
     29 showed that post-intervention scores were also higher for the "disease risk"
     30 group, which addresses this issue.
     31 
     32 ![Posterior of final score differences](bayesian_ending_scores.png)
     33 
     34 ### Bootstrapping 
     35 
     36 Interpreting differences in parameter values isn't always straightforward, so I
     37 thought it'd be worthwhile to try a different approach. Instead of fitting a
     38 generative model to the sample, we can use bootstrapping to estimate the
     39 unobserved population parameters.
     40 [Bootstrapping](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) is
     41 conceptually simple; I feel it would have much wider adoption today had
     42 computers been around in the early days of statistics.
     43 
     44 Here is code that uses bootstrapping to estimate the probability of each
     45 response on the pre-intervention survey (irrespective of survey question or
     46 intervention group assignment). The sample mean is already an unbiased
     47 estimator of the population mean, so bootstrapping isn't necessary in this
     48 first example. However, this provides a simple illustration of how the
     49 technique works: sample observations *with replacement* from the data,
     50 calculate a statistic on this new data, and repeat. The mean of the observed
     51 statistic values provides an estimate of the population statistic, and the
     52 distribution of statistic values provides a measure of certainty. 
     53 
     54 
     55 ```r
     56 numBootstraps <- 1e5  # Should be a big number
     57 numObservations <- nrow(questionnaireData)
     58 uniqueResponses <- sort(unique(questionnaireData$pretest_response))
     59 interventionLevels <- levels(questionnaireData$intervention)
     60 
     61 # The observed proportion of responses at each level
     62 pretestResponseProbs <- as.numeric(table(questionnaireData$pretest_response)) / 
     63   numObservations
     64 ```
     65 ```r
     66 # Bootstrap to find the probability that each response will be given to pre-test
     67 # questions.
     68 pretestData <- array(data = 0,
     69                      dim = c(numBootstraps,
     70                              length(uniqueResponses)),
     71                      dimnames = list(NULL,
     72                                      paste(uniqueResponses)))
     73 
     74 # Run the bootstrap
     75 for (ii in seq_len(numBootstraps)) {
     76   bootSamples <- sample(questionnaireData$pretest_response, 
     77                         numObservations, 
     78                         replace = TRUE)
     79   bootSamplesTabulated <- table(bootSamples)
     80   pretestData[ii, names(bootSamplesTabulated)] <- bootSamplesTabulated
     81 }
     82 
     83 # Convert the counts to probabilities
     84 pretestData <- pretestData / numObservations
     85 ```
     86 
     87 ![Bootstrap estimaes of responses](pretest_plot-1.png)
     88 
     89 As expected, the bootstrap estimates for the proportion of responses at each
     90 level almost exactly match the observed data. There are supposed to be
     91 error-bars around the points, which show the bootstrap estimates, but they're
     92 obscured by the points themselves.
     93 
     94 ## Changes in vaccination attitudes
     95 
     96 Due to chance alone, the three groups of participants (the control group, the
     97 "autism correction" group, and the "disease risk" group) showed different
     98 patterns of responses to the pre-intervention survey. To mitigate this issue,
     99 the code below estimates the transition probabilities from each response on the
    100 pre-intervention survey to each response on the post-intervention survey, and
    101 does so separately for the groups. These are conditional probabilities, e.g.,
    102 `P(post-intervention rating = 4 | pre-intervention rating = 3)`. 
    103 
    104 The conditional probabilities are then combined with the observed
    105 pre-intervention response probabilities to calculate the joint probability of
    106 each response transition (e.g., `P(post-intervention rating = 4 ∩
    107 pre-intervention rating = 3)`). Importantly, since the prior is agnostic to
    108 subjects' group assignment, these joint probability estimates are less-affected
    109 by biases that would follow from a failure of random assignment.
    110 
    111 ```r
    112 # preintervention responses x intervention groups x bootstraps x postintervention responses
    113 posttestData <- array(data = 0,
    114                       dim = c(length(uniqueResponses),
    115                               length(interventionLevels),
    116                               numBootstraps, 
    117                               length(uniqueResponses)),
    118                       dimnames = list(paste(uniqueResponses),
    119                                       interventionLevels,
    120                                       NULL,
    121                                       paste(uniqueResponses)))
    122 
    123 for (pretestResponse in seq_along(uniqueResponses)) {
    124   for (interventionLevel in seq_along(interventionLevels)) {
    125     # Get the subset of data for each combination of intervention and
    126     # pre-intervention response level.
    127     questionnaireDataSubset <- filter(questionnaireData,
    128                                       intervention == interventionLevels[interventionLevel],
    129                                       pretest_response == pretestResponse)
    130     numObservationsSubset <- nrow(questionnaireDataSubset)
    131     
    132     # Run the bootstrap
    133     for (ii in seq_len(numBootstraps)) {
    134       bootSamples <- sample(questionnaireDataSubset$posttest_response, 
    135                             numObservationsSubset, 
    136                             replace = TRUE)
    137       bootSamplesTabulated <- table(bootSamples)
    138       posttestData[pretestResponse, 
    139                    interventionLevel, 
    140                    ii, 
    141                    names(bootSamplesTabulated)] <- bootSamplesTabulated
    142     }
    143     
    144     # Convert the counts to probabilities
    145     posttestData[pretestResponse, interventionLevel, , ] <- 
    146       posttestData[pretestResponse, interventionLevel, , ] / numObservationsSubset
    147   }
    148   
    149   # Convert the conditional probabilities to joint probabilities using the 
    150   # observed priors on each pretest response.
    151   posttestData[pretestResponse, , , ] <- posttestData[pretestResponse, , , ] *
    152     pretestResponseProbs[pretestResponse]
    153 }
    154 ```
    155 
    156 With the transition probabilities sampled, it's possible to test the
    157 hypothesis: **"participants are more likely to shift toward a more pro-vaccine
    158 attitude following a 'disease risk' intervention than participants in control
    159 and 'autism correction' groups."** We'll use the previously-run bootstrap
    160 samples to compute the each group's probability of increasing scores.
    161 
    162 ```r
    163 posttestIncrease <- array(data = 0, 
    164                           dim = c(numBootstraps,
    165                                   length(interventionLevels)),
    166                           dimnames = list(NULL,
    167                                           interventionLevels))
    168 
    169 for (interventionLevel in seq_along(interventionLevels)) {
    170   for (pretestResponse in seq_along(uniqueResponses)) {
    171     for (posttestResponse in seq_along(uniqueResponses)) {
    172       if (posttestResponse > pretestResponse) {
    173         posttestIncrease[, interventionLevel] <- posttestIncrease[, interventionLevel] +
    174           posttestData[pretestResponse, interventionLevel, , posttestResponse]
    175       }
    176     }
    177   }
    178 }
    179 ```
    180 
    181 This estimates the probability that post-intervention responses from the
    182 "disease risk" group have a greater probability of being higher than their
    183 pre-intervention counterparts than responses from the "autism correction"
    184 group.
    185 
    186 ```r
    187 sum(posttestIncrease[, which(interventionLevels == "Disease Risk")] > 
    188       posttestIncrease[, which(interventionLevels == "Autism Correction")]) / 
    189   nrow(posttestIncrease)
    190 ```
    191 
    192 ```
    193 ## [1] 0.99992
    194 ```
    195 
    196 Below is a visualization of the bootstrap distributions. This illustrates the
    197 certainty of the estimates of the probability  that participants would express
    198 stronger pro-vaccination attitudes after the interventions. 
    199 
    200 ![Bootstrap estimates of probability of ratings increase](posttest_plot-1.png)
    201 
    202 ## Conclusion
    203 
    204 Bootstrapping shows that a "disease risk" intervention has a stronger effect
    205 than others in shifting participants' pro-vaccination attitudes. This analysis
    206 collapses across the five survey questions used by Horne and colleagues, but it
    207 would be straightforward to extend this code to estimate attitude change
    208 probabilities separately for each question. 
    209 
    210 Although there are benefits to analyzing data with nonparametric methods, the
    211 biggest shortcoming of the approach I've used here is that it can not estimate
    212 the size of the attitude changes. Instead, it estimates that probability of
    213 pro-vaccination attitude changes occurring, and the difference in these
    214 probabilities between the groups. This is a great example of why it's important
    215 to keep your question in mind while analyzing data. 
    216