PSweight_issues.Rmd (3937B)
1 --- 2 title: "PSweight Issues" 3 author: "Eamon Caddigan, Ph.D." 4 date: "5/16/2022" 5 output: html_document 6 --- 7 8 ```{r setup, include=FALSE} 9 knitr::opts_chunk$set(echo = TRUE) 10 ``` 11 12 PSweight 1.1.6 has issues that prevent it from completing analyses in some environments. The newest version of the package, 1.1.7, cannot currently be installed on some machines that I use because of separate dependency issues. 13 14 This document examines how to replicate some of the functionality of PSweight so that data can be analyzed (specifically, propensity score weighting with overlap weights) without depending on it. 15 16 I'm going to follow the example provided in the PSweight vignette. 17 18 ```{r libraries} 19 library(PSweight) 20 library(dplyr) 21 library(ggplot2) 22 ``` 23 24 # Propensity Scores 25 26 By default, `SumStat` and `PSweight` just use a GLM with the logistic link function. 27 28 ```{r propensity} 29 ps.any <- Dany ~ white + maemp + as.factor(scht) + as.factor(qmab) + 30 as.factor(qmab2) + as.factor(qvab) + as.factor(qvab2) + paed_u + maed_u + 31 agepa + agema + sib_u + paed_u * agepa + maed_u * agema 32 33 bal.any <- SumStat(ps.formula = ps.any, data = NCDS, 34 weight = "overlap") 35 36 prop_model <- glm(ps.any, family = binomial(link = "logit"), data = NCDS) 37 propensities <- predict(prop_model, type = "response") 38 overlap_weights = ifelse(NCDS$Dany == 1, 39 1 - propensities, 40 propensities) 41 42 all.equal(unname(bal.any$propensity[, "1"]), 43 unname(propensities)) 44 ``` 45 46 # Density and Love Plots 47 48 Density plots are trivially easy. 49 50 ```{r density_plot} 51 NCDS %>% 52 mutate(propensity = propensities, 53 Dany = as.factor(Dany)) %>% 54 ggplot(aes(propensity)) + 55 geom_density(aes(fill = Dany), 56 alpha = 0.2) + 57 theme_minimal() 58 ``` 59 60 Love plots require more work. The default balance plot uses the average standardized difference (ASD) with weighted variance, and we'll use that too. 61 62 ```{r love_plot} 63 # Return the weighted estimate of x at z level z_i using weights w 64 weighted_est <- function(z, w, x, z_i) { 65 stopifnot(z %in% c(0, 1), 66 z_i %in% z) 67 sum(w[z == z_i] * x[z == z_i]) / sum(w[z == z_i]) 68 } 69 70 # Return the weighted difference in x using the weights w and groupings z 71 weighted_diffrence <- function(z, w, x) { 72 weighted_est(z, w, x, 1) - weighted_est(z, w, x, 0) 73 } 74 75 asd <- function(z, w, x) { 76 # TODO: weighted variance is tricky and I don't want it to be a blocker 77 abs(weighted_difference(z, w, x)) / sqrt(NA) 78 } 79 ``` 80 81 I'll come back to this (maybe) 82 83 # Weighted differences 84 85 The whole point of this is to look at weighted differences (and doubly robust estimators). Point estimates are easy! 86 87 ```{r weighted_difference_psweight} 88 ato.any <- PSweight(ps.formula = ps.any, yname = "wage", data = NCDS, bootstrap = TRUE) 89 90 ato.any 91 ``` 92 93 ```{r diff_point_est} 94 c(weighted_est(NCDS$Dany, overlap_weights, NCDS$wage, 0), 95 weighted_est(NCDS$Dany, overlap_weights, NCDS$wage, 1)) 96 ``` 97 98 Variance estimates are trickier. The sandwich estimator is computationally simpler but more difficult to implement, and bootstrapping is the opposite. I'm going with the latter. 99 100 ```{r diff_summary} 101 summary(ato.any) 102 ``` 103 104 ```{r} 105 diff_reps <- replicate( 106 100, 107 { 108 n <- sample.int(nrow(NCDS), replace = TRUE) 109 prop_model <- glm(ps.any, family = binomial(link = "logit"), data = NCDS[n, ]) 110 propensities <- predict(prop_model, type = "response") 111 overlap_weights = ifelse(NCDS$Dany[n] == 1, 112 1 - propensities, 113 propensities) 114 weighted_diffrence(NCDS$Dany[n], overlap_weights, NCDS$wage[n]) 115 } 116 ) 117 118 diff_results <- c(Estimate = weighted_diffrence(NCDS$Dany, overlap_weights, NCDS$wage), 119 Std.Err = sd(diff_reps), 120 lwr = quantile(diff_reps, 0.025), 121 upr = quantile(diff_reps, 0.975), 122 p.val = mean(diff_reps < 0)) 123 124 diff_results 125 ``` 126 127 Pretty close, we'll use this.