small_things

A repo to stash small (single file) coding solutions.
git clone https://git.eamoncaddigan.net/small_things.git
Log | Files | Refs | README | LICENSE

PSweight_issues.Rmd (3937B)


      1 ---
      2 title: "PSweight Issues"
      3 author: "Eamon Caddigan, Ph.D."
      4 date: "5/16/2022"
      5 output: html_document
      6 ---
      7 
      8 ```{r setup, include=FALSE}
      9 knitr::opts_chunk$set(echo = TRUE)
     10 ```
     11 
     12 PSweight 1.1.6 has issues that prevent it from completing analyses in some environments. The newest version of the package, 1.1.7, cannot currently be installed on some machines that I use because of separate dependency issues.
     13 
     14 This document examines how to replicate some of the functionality of PSweight so that data can be analyzed (specifically, propensity score weighting with overlap weights) without depending on it.
     15 
     16 I'm going to follow the example provided in the PSweight vignette.
     17 
     18 ```{r libraries}
     19 library(PSweight)
     20 library(dplyr)
     21 library(ggplot2)
     22 ```
     23 
     24 # Propensity Scores
     25 
     26 By default, `SumStat` and `PSweight` just use a GLM with the logistic link function.
     27 
     28 ```{r propensity}
     29 ps.any <- Dany ~ white + maemp + as.factor(scht) + as.factor(qmab) +
     30   as.factor(qmab2) + as.factor(qvab) + as.factor(qvab2) + paed_u + maed_u +
     31   agepa + agema + sib_u + paed_u * agepa + maed_u * agema
     32 
     33 bal.any <- SumStat(ps.formula = ps.any, data = NCDS,
     34                    weight = "overlap")
     35 
     36 prop_model <- glm(ps.any, family = binomial(link = "logit"), data = NCDS)
     37 propensities <- predict(prop_model, type = "response")
     38 overlap_weights = ifelse(NCDS$Dany == 1,
     39                          1 - propensities,
     40                          propensities)
     41 
     42 all.equal(unname(bal.any$propensity[, "1"]),
     43           unname(propensities))
     44 ```
     45 
     46 # Density and Love Plots
     47 
     48 Density plots are trivially easy.
     49 
     50 ```{r density_plot}
     51 NCDS %>%
     52   mutate(propensity = propensities,
     53          Dany = as.factor(Dany)) %>%
     54   ggplot(aes(propensity)) +
     55   geom_density(aes(fill = Dany),
     56                alpha = 0.2) +
     57   theme_minimal()
     58 ```
     59 
     60 Love plots require more work. The default balance plot uses the average standardized difference (ASD) with weighted variance, and we'll use that too.
     61 
     62 ```{r love_plot}
     63 # Return the weighted estimate of x at z level z_i using weights w
     64 weighted_est <- function(z, w, x, z_i) {
     65   stopifnot(z %in% c(0, 1),
     66             z_i %in% z)
     67   sum(w[z == z_i] * x[z == z_i]) / sum(w[z == z_i])
     68 }
     69 
     70 # Return the weighted difference in x using the weights w and groupings z
     71 weighted_diffrence <- function(z, w, x) {
     72   weighted_est(z, w, x, 1) - weighted_est(z, w, x, 0)
     73 }
     74 
     75 asd <- function(z, w, x) {
     76   # TODO: weighted variance is tricky and I don't want it to be a blocker
     77   abs(weighted_difference(z, w, x)) / sqrt(NA)
     78 }
     79 ```
     80 
     81 I'll come back to this (maybe)
     82 
     83 # Weighted differences
     84 
     85 The whole point of this is to look at weighted differences (and doubly robust estimators). Point estimates are easy!
     86 
     87 ```{r weighted_difference_psweight}
     88 ato.any <- PSweight(ps.formula = ps.any, yname = "wage", data = NCDS, bootstrap = TRUE)
     89 
     90 ato.any
     91 ```
     92 
     93 ```{r diff_point_est}
     94 c(weighted_est(NCDS$Dany, overlap_weights, NCDS$wage, 0),
     95   weighted_est(NCDS$Dany, overlap_weights, NCDS$wage, 1))
     96 ```
     97 
     98 Variance estimates are trickier. The sandwich estimator is computationally simpler but more difficult to implement, and bootstrapping is the opposite. I'm going with the latter.
     99 
    100 ```{r diff_summary}
    101 summary(ato.any)
    102 ```
    103 
    104 ```{r}
    105 diff_reps <- replicate(
    106   100,
    107   {
    108     n <- sample.int(nrow(NCDS), replace = TRUE)
    109     prop_model <- glm(ps.any, family = binomial(link = "logit"), data = NCDS[n, ])
    110     propensities <- predict(prop_model, type = "response")
    111     overlap_weights = ifelse(NCDS$Dany[n] == 1,
    112                              1 - propensities,
    113                              propensities)
    114     weighted_diffrence(NCDS$Dany[n], overlap_weights, NCDS$wage[n])
    115   }
    116 )
    117 
    118 diff_results <- c(Estimate = weighted_diffrence(NCDS$Dany, overlap_weights, NCDS$wage),
    119                   Std.Err = sd(diff_reps),
    120                   lwr = quantile(diff_reps, 0.025),
    121                   upr = quantile(diff_reps, 0.975),
    122                   p.val = mean(diff_reps < 0))
    123 
    124 diff_results
    125 ```
    126 
    127 Pretty close, we'll use this.