Machine Learning and Big Data Changes Everything

Or maybe it doesn’t. Let’s see.

This week we see how our causal inferences could be helped or hindered by taking advantage of more powerful tools for regression from machine learning, and making use of increasingly massive data sets.

More powerful models are usually more ‘flexible’ models, so it will be important to see what the costs and benefits of non-linear regression are for causal inference. More flexible models allow us to go wrong more drastically, so questions of model fitting and overfitting will be important here.
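
To make the overfitting worry concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption chosen for illustration, not part of the course material: the toy data-generating process (y = sin(x) plus noise), the use of k-nearest-neighbour regression as the ‘flexible’ model, and the helper names `make_data`, `knn_predict` and `mse`. Smaller k means a more flexible fit.

```python
import math
import random

def make_data(n, rng):
    """Draw (x, y) pairs with y = sin(x) + Gaussian noise (an assumed toy process)."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 6)
        data.append((x, math.sin(x) + rng.gauss(0, 0.5)))
    return data

def knn_predict(train, x, k):
    """Average the y values of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of k-NN predictions (fitted on train) over data."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)

rng = random.Random(1)
train = make_data(300, rng)
test = make_data(500, rng)

# k=1 is the most flexible fit; k=15 smooths over neighbouring points.
results = {k: (mse(train, train, k), mse(train, test, k)) for k in (1, 15)}
for k, (train_err, test_err) in results.items():
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

With k=1 every training point is its own nearest neighbour, so the training error is zero, yet the error on fresh data should be noticeably worse than for the smoother k=15 fit. The apparent accuracy of the most flexible model is the illusion.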

Bigger data will certainly make our inferences more precise, but it is an open question whether it can make us righter. A causal effect is identified when, if we had all the data we could possibly want, some quantity estimated from a model would be the causal effect we are interested in. Conversely, if the effect is not identified then no amount of additional data will make it so. So we might be skeptical that more precision in our estimation due to more data will help. However, in policy domains we are often interested in a causal effect in some groups more than in others, so more data may help us identify these group effects better. Unless we overfit the model, in which case our precision is an illusion.
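
The point that more data cannot rescue an unidentified effect can be seen in a small simulation. This is a sketch under stated assumptions, not a general result: the linear data-generating process, the coefficient values, and the function name `naive_slope` are all made up for illustration. An unobserved confounder `u` drives both the treatment `x` and the outcome `y`, and the true causal effect of `x` on `y` is set to 1.0.

```python
import random

def naive_slope(n, seed=0):
    """OLS slope of y on x when an unobserved confounder u drives both."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                       # unobserved confounder
        x = u + rng.gauss(0, 1)                   # treatment depends on u
        y = 1.0 * x + 2.0 * u + rng.gauss(0, 1)   # true causal effect of x is 1.0
        xs.append(x)
        ys.append(y)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

small = naive_slope(200)       # noisy estimate
large = naive_slope(200_000)   # very precise estimate of the wrong quantity
print(f"n=200: {small:.3f}   n=200000: {large:.3f}   true effect: 1.000")
```

Under these assumptions the regression slope converges to cov(x, y) / var(x) = (2 + 2) / 2 = 2.0, not to the causal effect of 1.0. Going from 200 to 200,000 observations tightens the estimate around 2.0 without moving it any closer to the truth.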


Three excellent machine learning textbooks

These two are freely downloadable

And this one is not

