Paradoxes, Fallacies and Other Surprises
An attempt at a taxonomy of statistical paradoxes, fallacies and other surprises.
- Florida capital sentencing of convicted murderers (suppression effect of a confounder): there is no marginal relationship between the race of the accused and the rate of capital sentencing, but the relationship becomes very strong when controlling for a confounding factor, the race of the victim.
- Berkeley graduate admissions: overall, female candidates had a lower acceptance rate, but there was little or no gender effect within departments. Women tend to apply to departments that are harder to get into (that have low admission rates for both men and women). Controlling for department makes the appearance of gender discrimination disappear. But is this the right analysis? That depends on the mechanism through which discrimination occurs. If university budgeting decisions systematically favour departments that teach topics that are more appealing to men, then department is a mediator, and controlling for it masks the effect of discrimination. Different models are needed to identify 'micro-discrimination' at the departmental level and 'macro-discrimination' at the university level. This amounts to asking whether department should be treated as a confounder (and included in the model) or as a mediator (and excluded) in order to identify a causal effect of gender. Conditioning is not automatically the right thing to do: for mechanisms in which department is a mediator or a collider, it must be omitted; for mechanisms in which it is a confounder, it must be included.
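The two patterns above can be reproduced with small tables of counts. The counts below are entirely hypothetical (they are NOT the real Florida or Berkeley data) and were chosen only so that the arithmetic is easy to check: in the first table the marginal rates are identical while the within-stratum rates differ sharply (suppression); in the second, a marginal gap appears although the within-stratum rates are identical. The code only computes rates; whether the marginal or the conditional comparison is the causally relevant one is the modelling question discussed above.

```python
from fractions import Fraction

# Hypothetical counts, keyed by (stratum, group) -> (successes, cases).

# Suppression: defendant race vs death sentence, stratified by victim race.
sentencing = {
    ("white victim", "black defendant"): (20, 100),
    ("white victim", "white defendant"): (24, 400),
    ("black victim", "black defendant"): (4, 400),
    ("black victim", "white defendant"): (0, 100),
}

# Marginal gap with no within-stratum gap: gender vs admission, by department.
admissions = {
    ("dept A", "men"): (480, 800),
    ("dept A", "women"): (120, 200),
    ("dept B", "men"): (40, 200),
    ("dept B", "women"): (160, 800),
}


def marginal_rate(table, group):
    """Success rate for `group`, pooling over all strata."""
    hits = sum(h for (_, g), (h, _) in table.items() if g == group)
    total = sum(n for (_, g), (_, n) in table.items() if g == group)
    return Fraction(hits, total)


def conditional_rate(table, stratum, group):
    """Success rate for `group` within a single stratum."""
    hits, n = table[(stratum, group)]
    return Fraction(hits, n)


for name, table in [("sentencing", sentencing), ("admissions", admissions)]:
    groups = sorted({g for _, g in table})
    strata = sorted({s for s, _ in table})
    print(name)
    for g in groups:
        within = [float(conditional_rate(table, s, g)) for s in strata]
        print(f"  {g}: marginal {float(marginal_rate(table, g)):.3f}, within strata {within}")
```

In the admissions table the gap arises entirely because women (hypothetically) apply mostly to the harder department B; the table itself cannot say whether department should be conditioned on.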
Example of conditioning, or selection, on a collider variable.
Discrepancy between global and local relationships. Some classical examples:
- Globally, the distribution of heights can remain the same from generation to generation, even though tall parents get the impression that their children are, on average, shorter than themselves and short parents get the impression that their children are, on average, taller than themselves. Likewise, tall children get the impression that their parents are shorter than themselves and short children get the impression that their parents are taller than themselves, on average. Thus parents get the impression, perfectly legitimately from their point of view, that the distribution of heights is being compressed towards the mean, and children get the impression, also perfectly legitimately, that their parents' heights were more compressed towards the mean.
- Kahneman's pilot instructors got the impression that criticism improved the performance of student pilots while praise made it worse. Kahneman thought the causal effect should be in the opposite direction. Regression to the mean shows why Kahneman's belief and the instructors' impression are not inconsistent. The resolution of the paradox lies partly in realizing that the instructors are noticing an 'observational' relationship that runs in the opposite direction to the possible causal relationship. Thus there is a connection with Simpson's Paradox.
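Both examples can be sketched with one simulation of exchangeable pairs (a, b), assuming both members are standard normal with a hypothetical correlation rho = 0.5. Read (a, b) as (parent height, child height), or as (score on one flight, score on the next) with feedback having no causal effect at all. Units that are extreme on a are less extreme on b, units that are extreme on b were less extreme on a, and yet the spread is identical in both columns.

```python
import random
import statistics


def simulate_pairs(n=50_000, rho=0.5, seed=2):
    """Pairs (a, b), both N(0, 1), with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        a = rng.gauss(0, 1)
        b = rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)  # b is also N(0, 1)
        pairs.append((a, b))
    return pairs


pairs = simulate_pairs()
cut = 1.0  # "tall" (or "praised for a good flight"): more than 1 sd above the mean

# Looking forwards: units selected for a high a are, on average, closer to the mean on b.
top_a = [(a, b) for a, b in pairs if a > cut]
fwd_a = statistics.fmean([a for a, _ in top_a])
fwd_b = statistics.fmean([b for _, b in top_a])

# Looking backwards: units selected for a high b were, on average, closer to the mean on a.
top_b = [(a, b) for a, b in pairs if b > cut]
back_b = statistics.fmean([b for _, b in top_b])
back_a = statistics.fmean([a for a, _ in top_b])

# The overall spread is the same for a and b.
sd_a = statistics.stdev([a for a, _ in pairs])
sd_b = statistics.stdev([b for _, b in pairs])

print(f"a > {cut}: mean a = {fwd_a:.2f}, mean b = {fwd_b:.2f}")
print(f"b > {cut}: mean b = {back_b:.2f}, mean a = {back_a:.2f}")
print(f"sd(a) = {sd_a:.3f}, sd(b) = {sd_b:.3f}")
```

For the pilots, the students who flew best (and were praised) tend to do worse next time, and those who flew worst (and were criticised) tend to do better, even though feedback has no effect in this model.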
When comparing two groups using a pretest and a posttest, should we compare gain scores between groups or should we regress the posttest on groups using the pretest as a covariate?
Lord (1967) originally graphed a hypothetical scenario in which male and female students were weighed at the beginning and end of an academic year, with gender tested as a predictor of weight gain in response to the cafeteria diet provided at the school. At the start of the school year, females had a lower average weight than males, and each group's average weight at the end of the year was about the same as at the start. A difference-score model, which uses post minus pre as the outcome, concluded that there was no difference in weight gain between males and females, and therefore no systematic influence of gender on changes in weight. An ANCOVA, which regresses posttest weight on pretest weight as well as on gender, concluded that wherever males and females start the school year with the same initial weight, males are predicted to gain significantly more weight by the end of the year, leading to the conclusion that gender has a substantial influence on weight gain.
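A minimal sketch of Lord's setup with made-up numbers: the group means (130 and 170), the common standard deviation (15) and the pre/post correlation (0.5) are all assumptions, chosen so that each group's weight distribution is identical at the start and end of the year. The gain-score comparison then finds essentially no difference, while the ANCOVA coefficient for males (computed with the usual pooled within-group slope, which is what ordinary least squares gives for post ~ male + pre) is clearly positive.

```python
import random
import statistics


def simulate_lord(n_per_group=5_000, rho=0.5, sd=15.0, seed=3):
    """Rows (is_male, pre, post); each group has the same distribution at pre and post."""
    rng = random.Random(seed)
    data = []
    for is_male, mu in ((0, 130.0), (1, 170.0)):  # hypothetical group means
        for _ in range(n_per_group):
            pre = rng.gauss(mu, sd)
            # Same mean and sd at posttest, correlation rho with pretest, no group-specific change.
            post = mu + rho * (pre - mu) + sd * (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
            data.append((is_male, pre, post))
    return data


def gain_score_difference(data):
    """Mean gain (post - pre) for males minus mean gain for females."""
    gains = {0: [], 1: []}
    for g, pre, post in data:
        gains[g].append(post - pre)
    return statistics.fmean(gains[1]) - statistics.fmean(gains[0])


def ancova_difference(data):
    """Male coefficient in the OLS fit post ~ male + pre (common slope)."""
    rows = {0: [], 1: []}
    for g, pre, post in data:
        rows[g].append((pre, post))
    means = {g: (statistics.fmean([p for p, _ in r]), statistics.fmean([q for _, q in r]))
             for g, r in rows.items()}
    # Pooled within-group slope of post on pre.
    sxy = sum((p - means[g][0]) * (q - means[g][1]) for g, r in rows.items() for p, q in r)
    sxx = sum((p - means[g][0]) ** 2 for g, r in rows.items() for p, _ in r)
    slope = sxy / sxx
    # Raw posttest gap minus the part predicted from the pretest gap.
    return (means[1][1] - means[0][1]) - slope * (means[1][0] - means[0][0])


data = simulate_lord()
gain_diff = gain_score_difference(data)
ancova_diff = ancova_difference(data)
print(f"gain-score difference: {gain_diff:.2f}, ANCOVA difference: {ancova_diff:.2f}")
```

With these assumptions the ANCOVA difference is roughly (170 - 130) - 0.5 x (170 - 130) = 20: comparing a male and a female of the same starting weight, the female is above her group mean and regresses down, while the male is below his and regresses up.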
A more recent example uses MLB 2016 data: Wright (2017) compared the change in batting averages from the first half of Major League Baseball’s 2016 season to the second half for pitchers and for position players. An ANCOVA that adjusts for the first-half average concluded that wherever a pitcher and a position player start with the same first-half batting average, the position player is predicted to have the higher second-half average. However, the raw data show that pitchers actually improve slightly, from .143 in the first half to .166 in the second half, while position players get slightly worse, from .267 in the first half to .262 in the second half. The gain-score approach concludes that there is no difference in the change in batting averages between position players and pitchers.
Base rate paradoxes
Prosecutor's Fallacy: The p-value for testing the hypothesis that Sally Clark was innocent of the murder of her two children could legitimately be interpreted as being in the vicinity of 1/100,000. However, there is also a legitimate argument that the probability of her innocence is very close to 1. These two seemingly contradictory results are not inconsistent with each other.
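The reconciliation is Bayes' theorem: a small probability of the evidence under innocence says little on its own if the evidence is even rarer, a priori, under the alternative. In the sketch below the 1/100,000 figure is the likelihood of two infant deaths under innocence, and the prior probability of a double murder (1 in 10 million) is a made-up number used only for illustration.

```python
def posterior_innocence(p_evidence_if_innocent, prior_guilty, p_evidence_if_guilty=1.0):
    """P(innocent | two infant deaths) by Bayes' theorem.

    p_evidence_if_innocent: probability of two natural infant deaths in one family.
    prior_guilty: prior probability that a mother murders two of her children.
    p_evidence_if_guilty: probability of two deaths given a double murder (1 by construction).
    """
    prior_innocent = 1.0 - prior_guilty
    num = p_evidence_if_innocent * prior_innocent
    den = num + p_evidence_if_guilty * prior_guilty
    return num / den


# Hypothetical inputs: "p-value" 1e-5, prior probability of double murder 1e-7.
p = posterior_innocence(1e-5, 1e-7)
print(f"P(innocent | two deaths) = {p:.3f}")
```

Only the ratio of the two small probabilities matters; if both were 1e-5 the posterior would be about one half.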
Representativeness heuristic: This is a concept formulated by Tversky and Kahneman. One could view the heuristic as forming judgements based on relative likelihood alone, thus ignoring the base rate or 'prior', the foundational fallacy of frequentism.
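A small sketch in the style of the familiar librarian-versus-farmer illustration, with all numbers hypothetical: the description fits the librarian stereotype eight times better than the farmer one, but farmers are assumed to outnumber librarians 20 to 1. Choosing by likelihood alone picks "librarian"; weighting by the base rate makes "farmer" the more probable answer.

```python
def posterior_from_likelihoods(likelihoods, priors):
    """Normalise likelihood x prior over a finite set of hypotheses."""
    joint = {h: likelihoods[h] * priors[h] for h in likelihoods}
    total = sum(joint.values())
    return {h: v / total for h, v in joint.items()}


# Hypothetical: how well the description fits each stereotype ...
likelihoods = {"librarian": 0.8, "farmer": 0.1}
# ... and hypothetical base rates (20 farmers per librarian).
priors = {"librarian": 1 / 21, "farmer": 20 / 21}

judged_by_likelihood = max(likelihoods, key=likelihoods.get)  # the heuristic's answer
posterior = posterior_from_likelihoods(likelihoods, priors)
print(judged_by_likelihood, posterior)
```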
Also known as the Monty Hall problem or the Principle of Restricted Choice in bridge. This is a very revealing paradox that illustrates the importance of taking into account the probabilistic mechanism that generated the information, in addition to the information itself, if the information does not induce a partition of the space of possibilities.
It illustrates the crucial role of statistical modelling, but perhaps not in a way that is supportive of frequentist inference.
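The role of the mechanism can be seen by simulating the Monty Hall version under two assumed host behaviours that produce exactly the same observed information (a goat behind an opened door). If the host knowingly opens a goat door, switching wins about 2/3 of the time; if the host opens one of the other two doors at random and happens to reveal a goat, switching wins only about 1/2 of the time.

```python
import random


def switch_win_rate(n_trials, host_knows, seed=0):
    """Win rate of always switching, over trials in which a goat was revealed."""
    rng = random.Random(seed)
    wins = kept = 0
    for _ in range(n_trials):
        car = rng.randrange(3)
        pick = 0  # by symmetry the contestant always picks door 0
        others = [1, 2]
        if host_knows:
            # Host deliberately opens a goat door among the two others.
            opened = rng.choice([d for d in others if d != car])
        else:
            # Host opens one of the two other doors at random.
            opened = rng.choice(others)
            if opened == car:
                continue  # car revealed: discard, since we condition on seeing a goat
        kept += 1
        switch_to = 3 - pick - opened  # the remaining closed door (doors sum to 0 + 1 + 2 = 3)
        wins += switch_to == car
    return wins / kept


knowing = switch_win_rate(100_000, host_knows=True)
random_host = switch_win_rate(100_000, host_knows=False)
print(f"knowing host: {knowing:.3f}, random host: {random_host:.3f}")
```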
Students at a university report an average class size of 100. Professors report an average class size of 50. Are students likely to be exaggerating and professors underestimating the size of their classes?
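A plausible resolution is that neither group is misreporting: professors average over classes, while students average over students, so a large class is reported once by its professor but by every one of its students. The class sizes below are hypothetical, chosen to reproduce the 50 versus 100 figures exactly.

```python
# Hypothetical department: four seminars of 25 students and one lecture of 150.
class_sizes = [25, 25, 25, 25, 150]

# Professors' view: each class counts once.
professor_average = sum(class_sizes) / len(class_sizes)

# Students' view: a class of size k is reported by k students.
student_average = sum(k * k for k in class_sizes) / sum(class_sizes)

print(professor_average, student_average)
```

In general the student-weighted average is E[X^2]/E[X] = mean + variance/mean, here 50 + 2500/50 = 100, so the two averages coincide only when all classes are the same size.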
Paradoxes of measures of central tendency
A random sample of 100 taxpayers reveals an average income of $30,000 although the government knows that the average income is $60,000. Is the sample likely to be biased and/or respondents understating their income? Or is there another plausible explanation?
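One plausible explanation, assumed here for illustration, is a heavily right-skewed income distribution. The sample mean is unbiased, but much of the population mean is carried by a few very large incomes that a sample of 100 usually misses, so the typical sample mean is well below the population mean. The sketch assumes a Pareto distribution with a hypothetical tail index of 1.1, scaled so that the population mean is $60,000.

```python
import random
import statistics

ALPHA = 1.1                              # hypothetical Pareto tail index (very heavy tail)
POP_MEAN = 60_000.0                      # population mean income from the example
X_MIN = POP_MEAN * (ALPHA - 1) / ALPHA   # Pareto mean is ALPHA * X_MIN / (ALPHA - 1)


def sample_means(n_samples=2_000, sample_size=100, seed=4):
    """Means of repeated random samples of incomes."""
    rng = random.Random(seed)
    # random.paretovariate(alpha) draws from a Pareto distribution with minimum 1.
    return [
        statistics.fmean([X_MIN * rng.paretovariate(ALPHA) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


means = sample_means()
share_below = sum(m < POP_MEAN for m in means) / len(means)
typical = statistics.median(means)
print(f"share of samples with mean below {POP_MEAN:,.0f}: {share_below:.2f}")
print(f"median sample mean: {typical:,.0f}")
```

The rare samples that do catch an extreme income have means far above $60,000, which is how the average over all samples stays unbiased.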
It is possible to build a model in which the parameter space and the support are both equal to the natural numbers, and in which a confidence procedure that has at least 2/3 probability of coverage for all θ (i.e. 2/3 confidence) has only 1/3 posterior probability for all y under a uniform prior for θ. Thus confidence and credibility can be strongly inconsistent, in contrast with the intuition based on compact models, in which mean credibility must equal mean confidence.
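One construction with these properties, assumed here for illustration (it may not be the model the author has in mind), arranges {0, 1, 2, ...} as a binary tree in which node n has children 2n+1 and 2n+2 and the root 0 is treated as its own parent. Given θ, Y is equally likely to be the parent of θ or either child of θ, and the confidence procedure reports the parent of the observed y. The code checks coverage and posterior mass exactly with fractions; the uniform prior is improper, but each posterior is proper because only three values of θ can produce a given y.

```python
from fractions import Fraction


def parent(n):
    """Parent of node n in the binary tree on {0, 1, 2, ...}; the root 0 is its own parent."""
    return 0 if n == 0 else (n - 1) // 2


def sampling_dist(theta):
    """P(Y = y | theta): Y is uniform over {parent(theta), 2*theta + 1, 2*theta + 2}."""
    return {y: Fraction(1, 3) for y in (parent(theta), 2 * theta + 1, 2 * theta + 2)}


def conf_set(y):
    """Confidence procedure: report the parent of the observed node."""
    return {parent(y)}


def coverage(theta):
    """Frequentist coverage P(theta in conf_set(Y) | theta)."""
    return sum(p for y, p in sampling_dist(theta).items() if theta in conf_set(y))


def posterior_mass_of_conf_set(y):
    """Posterior probability of conf_set(y) under the flat (improper) prior on theta."""
    # Only these thetas give y positive probability: the parent of y and the two children of y.
    candidates = {parent(y), 2 * y + 1, 2 * y + 2}
    lik = {t: sampling_dist(t).get(y, Fraction(0)) for t in candidates}
    total = sum(lik.values())
    return sum(w for t, w in lik.items() if t in conf_set(y)) / total


print([coverage(t) for t in range(5)])                    # 1 at the root, 2/3 elsewhere
print([posterior_mass_of_conf_set(y) for y in range(5)])  # 1/3 for every y
```

Coverage is exactly 2/3 for every θ except the root, where it is 1, while the posterior mass of the reported set is exactly 1/3 for every y.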