Presumed Averageness: The Mis-Application of Classical Hypothesis Testing in Education

December 4, 2013

CEPR Faculty Director Thomas Kane discusses the importance of statistical significance in the following Brookings Institution paper. 

Imagine yourself having had a heart attack.  An ambulance arrives to transport you to a hospital emergency room.  Your ambulance driver asks you to choose between two hospitals, Hospital A or Hospital B.  At Hospital A, the mortality rate for heart attack patients is 75 percent.  At Hospital B, the mortality rate is just 20 percent.  But mortality rates are imperfect measures, based on a finite number of admissions.  If neither rate were “statistically significantly” different from average, would you be indifferent about which hospital you were delivered to?

Don’t ask your social scientist friends to help you with your dilemma. When asked for expert advice, they apply the rules of classical hypothesis testing, which require that a difference be large enough to have no more than a 5 percent chance of being a fluke before it is accepted as statistically significant. (For examples, see Schochet and Chiang (2010), Hill (2009), and Baker et al. (2010).) In many areas of science, it makes sense to assume that a medical procedure does not work, or that a vaccine is ineffective, or that the existing theory is correct, until the evidence is very strong that the original presumption (the null hypothesis) is wrong. That is why the classical hypothesis test places the burden of proof so heavily on the alternative hypothesis, and preserves the null hypothesis until the evidence is overwhelmingly to the contrary. But that’s not the right standard to use in choosing between two hospitals.
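To make the hospital dilemma concrete, here is a minimal sketch of how a classical test might treat the two mortality rates. The admission counts (4 and 5 patients) and the 45 percent average mortality rate are hypothetical numbers chosen only for illustration; none of them come from the paper. The function `exact_two_sided_p` is likewise an illustrative helper, implementing one common form of the two-sided exact binomial test.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k deaths among n admissions if the true rate is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_two_sided_p(k, n, p0):
    """Two-sided exact binomial p-value under the null hypothesis rate p0.

    Sums the probabilities of every outcome that is no more likely
    than the observed one (a small tolerance guards against float ties).
    """
    observed = binom_pmf(k, n, p0)
    total = sum(
        binom_pmf(i, n, p0)
        for i in range(n + 1)
        if binom_pmf(i, n, p0) <= observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical inputs (assumptions, not figures from the paper).
AVERAGE_RATE = 0.45                            # assumed average mortality rate
p_a = exact_two_sided_p(3, 4, AVERAGE_RATE)    # Hospital A: 3 of 4 died (75 percent)
p_b = exact_two_sided_p(1, 5, AVERAGE_RATE)    # Hospital B: 1 of 5 died (20 percent)

# The same 75 percent rate observed over 100 admissions, for contrast.
p_a_large = exact_two_sided_p(75, 100, AVERAGE_RATE)

print(f"Hospital A p-value: {p_a:.3f}")
print(f"Hospital B p-value: {p_b:.3f}")
print(f"Hospital A with 100 admissions: {p_a_large:.2e}")
```

Under these assumed counts, both p-values exceed 0.05, so neither hospital would be judged statistically significantly different from average, even though their observed rates are 75 percent and 20 percent. With 100 admissions, the same 75 percent rate clears the threshold easily. The test is answering a question about the burden of proof, not about which hospital a patient should prefer.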

