Discussion about this post

Will Braid

Great investigation, thanks for sharing!

> Notably, if you test twenty different aggregation techniques, you would expect one of them to come in under the 0.05 p-value that usually designates statistically significant differences between groups due to random noise alone — even if there was no actual difference.

Nit: p-values quantify *sampling* error. If you take the best of 20 techniques on a single sample, it's hard to tell whether the result is over-fit without additional data. You need a better understanding of how independent the techniques are, or -- more likely -- a separate held-out set of evaluation data to check whether your model is over-tuned. [And in the linked paper (https://goodjudgment.io/docs/Goldstein-et-al-2015.pdf) the MPDDA result for "GJP Best Method" has p < 0.001.]
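
To put a number on the quoted claim: with twenty independent tests and no true effect, the chance that at least one comes in under p = 0.05 is 1 - 0.95^20 ≈ 0.64. Here is a minimal simulation sketch of that; the normal scores, t-tests, and sample sizes are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_methods, n_obs = 2_000, 20, 100

hits = 0
for _ in range(n_trials):
    # Twenty "techniques", all drawing scores from the same distribution,
    # so any significant difference is pure sampling noise.
    p_values = [
        stats.ttest_ind(rng.normal(size=n_obs), rng.normal(size=n_obs)).pvalue
        for _ in range(n_methods)
    ]
    if min(p_values) < 0.05:
        hits += 1

print(f"P(at least one p < 0.05 | no real effect): {hits / n_trials:.2f}")
print(f"Analytic, for independent tests: {1 - 0.95 ** n_methods:.2f}")  # ~0.64
```

Will's independence caveat shows up here too: if the techniques are evaluated on shared data, the tests are correlated, the effective number of independent comparisons shrinks, and the false-positive rate falls below the 0.64 that the independent-tests formula gives.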

Vaclav

Interesting article, but I'm stuck on the 'Big if true' paragraph. I see someone else already pointed out the maths error, but I think the main point of the paragraph is also flawed.

Yes, if the event happens in the final month then your superforecaster will be penalised relative to the non-superforecaster. But how is this different from any other case where a low-probability event happens?

If we've reached the final month and the event hasn't yet happened, then 95% of the time it will continue to not happen, and the superforecaster will score much better than the non-superforecaster. Five percent of the time it will happen, and the superforecaster's score will suffer -- but that's the nature of prediction under uncertainty. Even the best possible predictor will sometimes be 'wrong', and their skill will only be clearly evident when the results of many predictions are aggregated.

I think you're suggesting this example is different, but I don't see how. The superforecaster has a better expected score, whether we're looking at the year as a whole or any slice of it that begins with the result still undecided.
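
To make that expected-score claim concrete, here is a minimal Brier-score sketch, assuming (as in the example) a 5% chance the event occurs in the final month; the two forecast values, 5% and 20%, are hypothetical stand-ins for the superforecaster and the non-superforecaster, not figures from the article:

```python
# Expected Brier score for the final month, given the event hasn't happened yet.
# Assumptions: a 5% true event probability, and forecasts of 5% vs 20% --
# illustrative numbers, not figures from the article.

P_EVENT = 0.05  # true chance the event occurs in the remaining month

def expected_brier(forecast: float, p: float = P_EVENT) -> float:
    """Expected Brier score for `forecast` when the event has probability `p`.
    Lower is better."""
    return p * (forecast - 1) ** 2 + (1 - p) * forecast ** 2

print(f"superforecaster (5%):      {expected_brier(0.05):.4f}")  # 0.0475
print(f"non-superforecaster (20%): {expected_brier(0.20):.4f}")  # 0.0700

# If the event does occur (5% of the time), the single realised scores flip:
# (0.05 - 1)**2 = 0.9025 vs (0.20 - 1)**2 = 0.6400 -- the superforecaster looks
# worse on that one outcome, but the expectation above still favours them.
```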
