Simpson's Paradox
Reversed Trends, Lurking Variables, and the Berkeley Admissions Case — A TLDR Primer
You're staring at a data set that seems to say one thing; then someone splits it by group and it says the exact opposite. Which answer do you trust? That unsettling situation has a name, **Simpson's Paradox**, and it has misled analysts in medical research, legal disputes, and data science. If you've hit it in an AP Statistics course, a college intro class, or a data-analysis project and found the textbook explanation more confusing than the paradox itself, this guide is for you.
This TLDR primer walks you through Simpson's Paradox from the ground up — no filler, no multi-chapter detour into unrelated probability theory. You'll see a clean numerical example that makes the reversal undeniable, then get the weighted-average arithmetic that explains exactly *why* it happens. From there the guide introduces the causal language you need — lurking variables, confounding, aggregation bias — so you can reason about the paradox, not just recognize it.
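As a minimal sketch of that weighted-average arithmetic (the notation is chosen for this illustration, not taken from the guide): for a category T (a treatment, say) and subgroups g, let s be the number of successes and n the number of cases. The pooled rate is then an average of the subgroup rates, weighted by each category's own case mix. The second line uses hypothetical numbers in which A beats B in both subgroups yet loses overall, because most of A's cases sit in the low-rate subgroup.

```latex
\bar{r}_T = \frac{\sum_g s_{T,g}}{\sum_g n_{T,g}} = \sum_g w_{T,g}\, r_{T,g},
\qquad r_{T,g} = \frac{s_{T,g}}{n_{T,g}}, \qquad w_{T,g} = \frac{n_{T,g}}{\sum_h n_{T,h}}

% Hypothetical rates: group 1 (A: 0.9, B: 0.8), group 2 (A: 0.4, B: 0.3)
\bar{r}_A = 0.25(0.9) + 0.75(0.4) = 0.525 \;<\; 0.675 = 0.75(0.8) + 0.25(0.3) = \bar{r}_B
```

The reversal needs two ingredients: subgroup rates that differ a lot (0.9 vs. 0.4), and categories whose weights are spread across those subgroups in very different proportions.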
The centerpiece is the famous 1973 UC Berkeley admissions case: a real-world instance where pooled data suggested gender discrimination, but disaggregated data told a more complicated story. Understanding what researchers actually found — and what it means for causal inference — is the kind of statistical reasoning that separates strong data thinkers from people who just run numbers.
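The Berkeley pattern can be reproduced in miniature. The sketch below uses invented counts for two hypothetical departments (they are not the 1973 Berkeley figures). The helper names `rate`, `pooled_rate`, and `is_simpson_reversal` are illustrative, not from any library. The code checks whether one group is ahead in every department but behind once the departments are pooled.

```python
from fractions import Fraction

# Hypothetical admissions counts, invented for illustration (NOT the 1973 Berkeley data).
# department -> applicant group -> (admitted, applied)
applications = {
    "Dept X (less selective)": {"men": (480, 600), "women": (85, 100)},
    "Dept Y (more selective)": {"men": (40, 200), "women": (150, 600)},
}


def rate(admitted: int, applied: int) -> Fraction:
    """Exact admission rate for one (admitted, applied) pair."""
    return Fraction(admitted, applied)


def pooled_rate(data: dict, group: str) -> Fraction:
    """Admission rate for `group` after adding up every department's counts."""
    admitted = sum(dept[group][0] for dept in data.values())
    applied = sum(dept[group][1] for dept in data.values())
    return Fraction(admitted, applied)


def is_simpson_reversal(data: dict, a: str, b: str) -> bool:
    """True when one group has the higher rate in every department
    but the lower rate once the departments are pooled."""
    diffs = [rate(*dept[b]) - rate(*dept[a]) for dept in data.values()]
    pooled_diff = pooled_rate(data, b) - pooled_rate(data, a)
    b_ahead_everywhere = all(d > 0 for d in diffs)
    a_ahead_everywhere = all(d < 0 for d in diffs)
    return (b_ahead_everywhere and pooled_diff < 0) or (a_ahead_everywhere and pooled_diff > 0)


# Print per-department rates, pooled rates, and whether a reversal occurs.
for name, dept in applications.items():
    print(f"{name}: men {float(rate(*dept['men'])):.0%}, women {float(rate(*dept['women'])):.0%}")
print(f"Pooled: men {float(pooled_rate(applications, 'men')):.0%}, "
      f"women {float(pooled_rate(applications, 'women')):.0%}")
print("Simpson reversal:", is_simpson_reversal(applications, "men", "women"))
```

With these invented counts, women are admitted at a higher rate in both departments (85% vs. 80% and 25% vs. 20%). Their pooled rate, however, is about 34% against 65% for men, because most of the women applied to the more selective department. Exact `Fraction` arithmetic is used so the comparisons are not affected by floating-point rounding.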
The guide closes with practical decision rules for when to pool versus split data, and a quick tour of where Simpson's Paradox appears in medicine, sports analytics, public policy, and machine learning.
Short by design, built for the student who needs to understand this concept — not skim it. Grab your copy and close the gap.
- Recognize Simpson's Paradox in two-way tables and rate comparisons
- Compute and compare conditional vs. marginal rates correctly
- Identify lurking (confounding) variables that drive the reversal
- Decide when to pool data and when to keep it disaggregated
- Apply the paradox to real cases like UC Berkeley admissions and medical treatments
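To make the distinction between conditional and marginal rates concrete, here is a short sketch with invented treatment data. It computes three numbers per treatment: the rate within each severity stratum (conditional), the pooled rate (marginal), and a directly standardized rate that weights both treatments' stratum rates by the same combined case mix. Direct standardization is one common adjustment, named here as a swapped-in technique rather than as the guide's own decision rule. All counts and function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical recovery counts, invented for illustration.
# treatment -> severity stratum -> (recovered, treated)
outcomes = {
    "A": {"mild": (90, 100), "severe": (60, 200)},
    "B": {"mild": (340, 400), "severe": (25, 100)},
}
STRATA = ("mild", "severe")


def conditional_rate(data: dict, treatment: str, stratum: str) -> Fraction:
    """Recovery rate for one treatment within one severity stratum."""
    recovered, treated = data[treatment][stratum]
    return Fraction(recovered, treated)


def marginal_rate(data: dict, treatment: str) -> Fraction:
    """Pooled recovery rate: implicitly weights strata by this treatment's own case mix."""
    recovered = sum(data[treatment][s][0] for s in STRATA)
    treated = sum(data[treatment][s][1] for s in STRATA)
    return Fraction(recovered, treated)


def standardized_rate(data: dict, treatment: str) -> Fraction:
    """Direct standardization: weight each treatment's stratum rates by the
    SAME case mix (here, all patients from both treatments combined)."""
    stratum_sizes = {s: sum(data[t][s][1] for t in data) for s in STRATA}
    total = sum(stratum_sizes.values())
    return sum(
        Fraction(stratum_sizes[s], total) * conditional_rate(data, treatment, s)
        for s in STRATA
    )


for t in outcomes:
    print(t,
          "mild", float(conditional_rate(outcomes, t, "mild")),
          "severe", float(conditional_rate(outcomes, t, "severe")),
          "marginal", float(marginal_rate(outcomes, t)),
          "standardized", float(standardized_rate(outcomes, t)))
```

In this made-up data, A wins in both strata and loses on the marginal rate (50% vs. 73%). Once both treatments are weighted by the same case mix, A wins again (67.5% vs. 62.5%). Adjusting this way makes sense when severity is a confounder that influenced which treatment patients received. It is not the right move when the splitting variable is itself an effect of the treatment.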
- 1. The Paradox in One Picture. Introduces Simpson's Paradox with a small, concrete numerical example showing a trend that reverses when data is split into groups.
- 2. Rates, Weights, and Why the Reversal Happens. Unpacks the arithmetic of weighted averages to show mechanically how unequal group sizes can flip an overall comparison.
- 3. Lurking Variables and Confounding. Introduces the causal language of lurking and confounding variables that explain why subgroup splits matter.
- 4. Case Study: UC Berkeley Admissions (1973). Walks through the famous Berkeley gender-bias case to show Simpson's Paradox at full scale and what the reversal actually meant.
- 5. Should You Pool or Split? Making the Right Call. Gives practical rules for deciding whether the aggregated or disaggregated view answers the question you actually care about.
- 6. Where the Paradox Shows Up and Why It Matters. A quick tour of Simpson's Paradox in medicine, sports stats, public policy, and machine learning, and what to watch for.