Stratification, Regression, and all that

When is a regression a good plan for causal inference? Not always, it turns out.

This week we cast our critical eye on our old friend, the regression model. It is helpful to think of regression models as a more sophisticated form of stratification, but they do more than that. And also less. Knowing what they actually do from a causal inference perspective makes us a little more cautious, but also more realistic about what regression can offer us.
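To make the stratification analogy concrete, here is a minimal simulation (my own illustration, not from the readings) in which the treatment effect is constant. A size-weighted stratified difference in means and a regression with stratum dummies then recover the same answer:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
z = rng.integers(0, 3, n)                          # discrete confounder defining strata
d = rng.binomial(1, np.array([0.2, 0.5, 0.8])[z])  # treatment probability depends on z
y = 2.0 * d + 1.5 * z + rng.normal(size=n)         # constant true effect of 2.0

# Stratified estimate: within-stratum mean differences, weighted by stratum size
strata = [z == k for k in range(3)]
diffs = [y[s & (d == 1)].mean() - y[s & (d == 0)].mean() for s in strata]
sizes = [s.mean() for s in strata]
print("stratified:", np.dot(sizes, diffs))

# Regression estimate: OLS of y on d plus stratum dummies and an intercept
X = np.column_stack([d, z == 1, z == 2, np.ones(n)]).astype(float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("regression:", beta[0])                      # both are close to 2.0
```

When effects vary across strata the two can diverge, a point we return to below.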

In particular, we may want to use regression models indirectly, for example to generate weights such as propensity scores, or to combine several regression models in the same analysis, as when studying questions of mediation.
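As a sketch of this indirect use (the variable names and numbers are invented for illustration), the following fits a logistic regression for the treatment and turns the fitted propensity scores into inverse probability weights:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=(n, 2))                        # observed confounders
p = 1 / (1 + np.exp(-(x[:, 0] - 0.5 * x[:, 1])))   # true propensity score
d = rng.binomial(1, p)                             # treatment assignment
y = 1.0 * d + x.sum(axis=1) + rng.normal(size=n)   # true effect of 1.0

# Use a regression model indirectly: estimate propensity scores ...
e = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]

# ... then use them as inverse probability weights, Horvitz-Thompson style
ate = np.mean(d * y / e) - np.mean((1 - d) * y / (1 - e))
print("naive difference:", y[d == 1].mean() - y[d == 0].mean())  # confounded
print("IPW estimate    :", ate)                                  # close to 1.0
```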

For any regression model we’ll also want to ask two big questions. First, since not everything that should be conditioned on is in the model, how sensitive are our causal inferences to what is unobserved? Second, should everything that is in the model be there? For example, conditioning on collider variables will wreck even the most careful analysis and, as is often the case, there will be no statistical warning that anything has gone wrong. We’ll consider a general typology of controls, good and bad, and why it is unwise to interpret the ‘effects’ of control variables.
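A small simulation makes the collider point concrete (a hypothetical setup, not from the readings): the treatment has no effect on the outcome, yet conditioning on a common consequence of both manufactures one, and nothing in the regression output flags the problem:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
d = rng.normal(size=n)          # 'treatment': true effect on y is zero
y = rng.normal(size=n)          # outcome, generated independently of d
c = d + y + rng.normal(size=n)  # collider: a common consequence of d and y

def ols(X, y):
    """Coefficients from OLS of y on X plus an intercept."""
    X = np.column_stack([X, np.ones(len(y))])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print("y ~ d    :", ols(d[:, None], y)[0])               # about 0: correct
print("y ~ d + c:", ols(np.column_stack([d, c]), y)[0])  # about -0.5: spurious
```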

We’ll also ask how representative regression estimators of causal effects are when those effects are heterogeneous, and note that here, perhaps surprisingly, some cases are more influential than others.
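One way to see this influence, sketched below with made-up numbers, uses the standard result that OLS with stratum dummies weights each stratum’s effect by the conditional variance of treatment within it, so strata where treatment is rare count for less:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
z = rng.binomial(1, 0.5, n)       # two equal-sized strata
p = np.where(z == 1, 0.5, 0.05)   # treatment much rarer in stratum 0
d = rng.binomial(1, p)
tau = np.where(z == 1, 1.0, 3.0)  # heterogeneous effects: 1 and 3
y = tau * d + z + rng.normal(size=n)

# OLS of y on d with a stratum dummy and an intercept
X = np.column_stack([d, z, np.ones(n)]).astype(float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

w = p * (1 - p)                   # Var(d | z): the implicit OLS weights
print("average effect    :", tau.mean())                 # 2.0
print("OLS coefficient   :", beta[0])                    # about 1.32
print("variance-weighted :", (w * tau).sum() / w.sum())  # matches OLS
```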

Readings

Lecture