Twenty years ago everything I learned about statistics involved normal distributions: means, standard deviations, F-statistics and their associated probabilities. And I've always regarded parametric statistics as amazingly robust to minor, or even major, violations of assumptions about normality, equality of variances, etc.

And 20 years of practical experience analyzing data has convinced me that parametric statistics are indeed amazingly robust--provided you're making inferences about central tendency and the data are unimodal (i.e., they don't have to follow a perfect bell curve, but they do have to have a single hump somewhere in the middle). But making inferences out in the tails? Egads, don't go there. We reject null hypotheses out in the tails, but P = 0.0002 doesn't really mean 2 in 10,000; it just means far enough below 5% that we can turf it.

And now the new craze is information theory, using Akaike's Information Criterion (AIC) instead of arbitrary P values. And when you do multiple resampling from your data to check your assumptions about data dispersion (i.e., bootstrapping), lo and behold you find that your data are inevitably over-dispersed, often by 80 to 300%. At least that's been my experience with every real-world biological data set I've worked with. But I'd be willing to bet a 6-pack of good beer that's also the case in financial data.
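To make the bootstrap check concrete, here's a minimal sketch in Python with NumPy. The data are simulated negative-binomial counts standing in for a real biological data set, and every number (seed, sample size, distribution parameters) is made up for illustration; for a Poisson-like process the variance-to-mean ratio should hover around 1, so a ratio of 2 is roughly 100% over-dispersion, right in the range I'm describing:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical count data: negative binomial, so the true variance (10)
# exceeds the true mean (5) -- i.e., built-in over-dispersion
data = rng.negative_binomial(n=5, p=0.5, size=200)

def dispersion_ratio(x):
    # For Poisson-distributed counts this ratio is ~1;
    # values well above 1 indicate over-dispersion
    return x.var(ddof=1) / x.mean()

# Bootstrap: resample with replacement, recompute the ratio each time
boot = np.array([
    dispersion_ratio(rng.choice(data, size=data.size, replace=True))
    for _ in range(2000)
])

print(f"observed variance/mean ratio: {dispersion_ratio(data):.2f}")
print(f"bootstrap 95% interval: ({np.quantile(boot, 0.025):.2f}, "
      f"{np.quantile(boot, 0.975):.2f})")
```

If the bootstrap interval sits entirely above 1, you've got over-dispersion you can't wish away.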

And the information theoretic approach is to calculate a variance inflation factor (residual deviance divided by its degrees of freedom), divide it into the residual deviance, and sweep all the annoying variance under the rug. And then proceed to estimating means and main effects, full speed ahead.
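The arithmetic of that rug-sweeping is simple enough to write down. This is a sketch of the standard quasi-likelihood adjustment (c-hat and QAIC); the function names and all the input numbers are mine, invented for illustration, not any library's API:

```python
def c_hat(residual_deviance, residual_df):
    """Variance inflation factor: residual deviance over its degrees of freedom."""
    return residual_deviance / residual_df

def qaic(log_likelihood, k, c):
    """Quasi-AIC: deflate the log-likelihood by c-hat, then apply the usual
    penalty of 2 per estimated parameter."""
    return -2.0 * log_likelihood / c + 2 * k

# Hypothetical model fit: deviance 412 on 180 df gives c-hat ~2.3,
# i.e., ~130% over-dispersed
c = c_hat(residual_deviance=412.0, residual_df=180)
print(f"QAIC: {qaic(log_likelihood=-206.0, k=4, c=c):.1f}")  # prints "QAIC: 188.0"
```

Divide by c-hat and the over-dispersion vanishes from the model comparison, which is exactly my complaint: the extra variance is still out there in the tails, whether the criterion sees it or not.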

I guess what I'm trying to say is that humans, scientists even, seem to be psychologically predisposed to sweep annoyingly infrequent events under the rug, and ill-equipped to think meaningfully about risk. It doesn't bode well for progress.

Todd