Using prediction to assess a model

Comparing data to expectation instead of nothingness…

Updated: 2022-01-17

Prediction is a great topic to bring up if you want to get scientists, clinicians, and engineers to turn on each other immediately.

I’ve gotten that consistently for the last six years of my work. It was frustrating at first but, in hindsight, it galvanized my interest in statistics. Specifically, my interest in reframing statistical thinking.

Prediction

Prediction is fairly straightforward: I said X would happen and X happened. If you’re predicting the flip of a coin, then you have a 1/2 chance of being right about the coin flip even if you have no real knowledge.

But imagine you’re flipping 100 coins and I tell you “Fifty percent of them will be heads.” What’s the probability that I would be right, meaning exactly fifty heads, given I don’t actually know anything about the future? It’s around 8%.
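That 8% can be checked with the binomial formula. Here is a minimal sketch, assuming fair, independent coins; the function name `prob_exact_heads` is just a label I picked for illustration.

```python
from math import comb

def prob_exact_heads(n_flips: int, n_heads: int, p_heads: float = 0.5) -> float:
    """Binomial probability of exactly n_heads heads in n_flips independent flips."""
    return comb(n_flips, n_heads) * p_heads**n_heads * (1 - p_heads) ** (n_flips - n_heads)

# Assumption: fair coins (p_heads = 0.5), each flip independent of the others.
p = prob_exact_heads(100, 50)
print(f"{p:.4f}")  # prints 0.0796
```

So even the single most likely head count only shows up about one time in twelve or thirteen.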

Now imagine I tell you “The first fifty coins will be heads and the last fifty will be tails.” This is a very, very different statement from the one I made above. So you flip the coins and, lo and behold, the first fifty are heads and the last fifty are tails. The probability that I would get that right given I don’t have any extra knowledge is astronomically small: $2^{-100}$, which is less than one part in a trillion billion.
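A claim about specific flips is much easier to compute: every flip I pin down multiplies my chance of being right by 1/2. A rough sketch, again assuming fair, independent coins (the helper name `prob_specific_outcomes` is made up for this example):

```python
from fractions import Fraction

def prob_specific_outcomes(n_specified: int) -> Fraction:
    """Chance of guessing n_specified particular fair-coin flips correctly."""
    return Fraction(1, 2) ** n_specified

# Pinning down only the first fifty flips.
first_fifty = prob_specific_outcomes(50)
# Pinning down all one hundred flips (fifty heads, then fifty tails).
full_sequence = prob_specific_outcomes(100)

print(f"{float(first_fifty):.2e}")    # prints 8.88e-16
print(f"{float(full_sequence):.2e}")  # prints 7.89e-31
```

For scale, a trillion billion is $10^{21}$. Pinning down only the first fifty flips gives roughly one in $10^{15}$; pinning down the whole sequence gives roughly one in $10^{30}$.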

Scientific Understanding Through Prediction

Just because a model achieved very high predictive accuracy doesn’t mean the structure of the model reflects the actual reality of what’s happening.

But flip that.

If you actually had a model that really was reflective of what was happening in reality, then it would have to have predictive value. In other words, if you have an understanding of a system that is better than yesterday’s understanding, then you should be able to predict the behavior of that system better than yesterday’s prediction. If you come up with a mechanistic model of the system you’re studying and it performs worse in predicting the behavior of the system than yesterday’s mechanistic model, then I would argue you don’t actually have a mechanistic model. Barring experimental confounds and issues, I can’t think of a situation where an improved understanding of a system will yield worse predictive performance.
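One way to make “predicts better than yesterday” concrete is to score both models on the same held-out observations. The sketch below uses mean log loss, which is one common choice and not necessarily the right metric for your problem. The outcomes and both sets of predicted probabilities are invented purely for illustration.

```python
from math import log

def mean_log_loss(predicted_probs, outcomes):
    """Average negative log-likelihood of binary outcomes; lower is better."""
    total = 0.0
    for p, y in zip(predicted_probs, outcomes):
        # Penalize by the probability the model gave to what actually happened.
        total += -log(p if y == 1 else 1.0 - p)
    return total / len(outcomes)

# Hypothetical held-out observations (1 = the event happened).
outcomes = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

# Hypothetical "yesterday's model": no real knowledge, always says 50%.
yesterday = [0.5] * len(outcomes)
# Hypothetical "today's model": predicted probability of the event for each case.
today = [0.8, 0.7, 0.3, 0.9, 0.2, 0.6, 0.8, 0.4, 0.7, 0.9]

loss_yesterday = mean_log_loss(yesterday, outcomes)
loss_today = mean_log_loss(today, outcomes)
print(loss_today < loss_yesterday)  # prints True
```

The always-50% model plays the role of the coin-flip guesser from earlier: it is the expectation you compare against, rather than comparing against nothing.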

So, if you’re studying something, and especially if you’re a physician-scientist, pursuing prediction first and then studying the structure of your model enables you to be practically useful while also building scientific understanding.