How Statistics Can Lie: The Most Common Ways Data Misleads
Statistics can mislead through sampling bias, confounded correlations, misleading charts, and p-hacking. Learn to recognize the most common statistical errors and manipulations.
Numbers Don't Lie — But People Do With Numbers
During World War II, statistician Abraham Wald was asked to help the US military decide where to add armor to bombers returning from missions. The military proposed reinforcing the areas most riddled with bullet holes — the wings and tail sections. Wald pointed out the error: the planes they were examining had survived. The bullet holes showed where a bomber could take damage and still fly home. The areas with no bullet holes on surviving planes — the engines and cockpit — were exactly where the lost planes had been hit. Reinforcing the observed damage would have been precisely wrong. Wald's correction saved lives. The fallacy he identified — survivorship bias — continues to distort decisions in business, medicine, and public policy.
Statistics is the science of learning from data under uncertainty. Done carefully, it is among humanity's most powerful tools for understanding the world. Done carelessly or dishonestly, it is among the most effective tools for misleading people with the authority of numbers. Darrell Huff's 1954 book How to Lie with Statistics remains in print partly because every technique it describes is still widely deployed.
Sampling Bias: Whose Data Is This?
Every statistical study draws conclusions from a sample and applies them to a population. If the sample is not representative of the population, the conclusions are wrong — regardless of how impressive the sample size looks.
The most famous sampling failure in history: the 1936 Literary Digest poll of 2.4 million Americans predicted Alf Landon would defeat Franklin Roosevelt in a landslide. Roosevelt won 61% of the vote. The Digest's sample came from telephone directories and automobile registration lists — systematically overrepresenting wealthy voters who disproportionately favored Landon. Meanwhile, George Gallup correctly predicted Roosevelt's victory using a sample of just 50,000, drawn to represent the actual voting population. Sample quality matters more than sample size.
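The point that sample quality beats sample size can be illustrated with a small simulation. This is only a sketch with invented numbers: the electorate, the 30% share of wealthy voters, the support rates, and the chances of appearing in the sampling frame are all assumptions chosen for illustration, not figures from the 1936 election.

```python
import random


def simulate_poll(seed=0):
    """Compare a huge biased sample with a small random one (all numbers are made up)."""
    rng = random.Random(seed)

    # Hypothetical electorate: 30% wealthy; candidate A is supported by
    # 35% of wealthy voters and 70% of everyone else (assumed values).
    population = []
    for _ in range(200_000):
        wealthy = rng.random() < 0.30
        supports_a = rng.random() < (0.35 if wealthy else 0.70)
        population.append((wealthy, supports_a))
    true_share = sum(s for _, s in population) / len(population)

    # Biased sampling frame: wealthy voters are far more likely to be listed
    # (assumed 80% versus 10%), loosely mimicking phone and car registries.
    frame = [s for w, s in population if rng.random() < (0.80 if w else 0.10)]
    big_biased = rng.sample(frame, 50_000)

    # Small simple random sample drawn from the whole electorate.
    small_random = rng.sample(population, 1_000)

    return {
        "true": true_share,
        "big_biased": sum(big_biased) / len(big_biased),
        "small_random": sum(s for _, s in small_random) / len(small_random),
    }


if __name__ == "__main__":
    for name, share in simulate_poll().items():
        print(f"{name:>12}: {share:.3f}")
```

Under these assumptions the 50,000-person biased sample misses the true support for candidate A by a wide margin, while the 1,000-person random sample lands close to it.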
Common Sampling Problems
- Voluntary response bias: surveys that let respondents self-select attract people with strong opinions, overrepresenting extreme views. Online polls are particularly vulnerable.
- Survivorship bias: analyzing only survivors (successful companies, completed drug trials, veteran pilots) ignores the failures that might reverse the conclusion.
- Selection bias: hospital studies of disease severity are biased toward severe cases because people with mild cases often never go to a hospital.
- Non-response bias: people who don't respond to surveys often differ systematically from those who do — higher workload, different opinions, different demographics.
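Survivorship bias in particular is easy to reproduce. The sketch below invents 1,000 hypothetical investment funds with no skill at all (yearly returns are pure noise around 0%) and an assumed rule that a fund closes if its average return falls below -2%. Both the return distribution and the closure rule are illustrative assumptions.

```python
import random
import statistics


def survivorship_demo(n_funds=1000, n_years=5, seed=1):
    """Return (mean return of all funds, mean return of survivors, survivor count)."""
    rng = random.Random(seed)

    # Every hypothetical fund has zero true skill: returns are noise around 0%.
    funds = [[rng.gauss(0.0, 0.10) for _ in range(n_years)] for _ in range(n_funds)]
    avg_returns = [statistics.fmean(years) for years in funds]

    # Assumed closure rule: funds averaging below -2% per year disappear.
    survivors = [r for r in avg_returns if r >= -0.02]

    return statistics.fmean(avg_returns), statistics.fmean(survivors), len(survivors)


if __name__ == "__main__":
    all_mean, survivor_mean, n_survivors = survivorship_demo()
    print(f"all funds:      {all_mean:+.3%}")
    print(f"survivors only: {survivor_mean:+.3%} ({n_survivors} funds)")
```

Looking only at the funds that still exist makes the average fund appear skilled, even though the simulation gave none of them any skill.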
Correlation and Causation: A Classic Confusion
Statistics measures relationships between variables. It cannot, by itself, establish causation. Two variables correlate when they move together; one causes the other only when changing one produces a change in the other independent of all other factors. These are different claims requiring different evidence.
Nicolas Cage film releases correlate with swimming pool drownings (r = 0.666 across 1999–2009 data). Margarine consumption correlates with divorce rates in Maine. Ice cream sales correlate with drowning deaths. These correlations are real and can be computed from genuine data. None of them reflect causal relationships — they reflect shared seasonality (more ice cream and more swimming in summer), small samples, or pure coincidence across multiple tested variable pairs.
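The "coincidence across many tested pairs" explanation can be checked with random numbers. The sketch below generates 200 unrelated series of 11 points each (11 is chosen to match the number of yearly values in 1999–2009; the count of 200 series is an arbitrary assumption) and searches every pair for the strongest correlation.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def strongest_chance_correlation(n_series=200, n_points=11, seed=0):
    """Search all pairs of pure-noise series and return the largest |r| found."""
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)]
    best = 0.0
    for i in range(n_series):
        for j in range(i + 1, n_series):
            best = max(best, abs(pearson_r(series[i], series[j])))
    return best


if __name__ == "__main__":
    print(f"strongest correlation among unrelated series: {strongest_chance_correlation():.3f}")
```

With nearly 20,000 pairs to choose from and only 11 points per series, some pair of completely unrelated series ends up more strongly correlated than the r = 0.666 quoted above.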
The confounding variable problem is more serious in applied settings. Early studies showed wine drinkers had lower heart disease rates. Conclusion in popular media: wine protects the heart. Confound: wine drinkers in 1990s studies tended to have higher incomes, better diet, and more regular medical care — factors that independently reduce heart disease risk. Later controlled studies accounting for these confounders found the protective effect much smaller and contested.
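A confounder of this kind can be sketched in a few lines. In the simulation below, wine has no effect on heart disease by construction: income drives both the chance of drinking wine and the chance of disease. Every probability here is an invented assumption, not an estimate from any study.

```python
import random


def confounding_demo(n=100_000, seed=2):
    """Compare disease rates of wine drinkers and abstainers, raw and within income groups."""
    rng = random.Random(seed)
    counts = {}  # (high_income, drinks_wine) -> [people, cases]

    for _ in range(n):
        high_income = rng.random() < 0.5
        # Assumed: higher earners are more likely to drink wine.
        drinks_wine = rng.random() < (0.7 if high_income else 0.2)
        # Assumed: disease risk depends only on income, never on wine.
        sick = rng.random() < (0.05 if high_income else 0.15)
        cell = counts.setdefault((high_income, drinks_wine), [0, 0])
        cell[0] += 1
        cell[1] += sick

    def rate(keys):
        people = sum(counts[k][0] for k in keys)
        cases = sum(counts[k][1] for k in keys)
        return cases / people

    return {
        # Raw comparison, ignoring income.
        "raw_gap": rate([(True, False), (False, False)]) - rate([(True, True), (False, True)]),
        # Same comparison within each income group (stratification).
        "gap_high_income": rate([(True, False)]) - rate([(True, True)]),
        "gap_low_income": rate([(False, False)]) - rate([(False, True)]),
    }


if __name__ == "__main__":
    for name, gap in confounding_demo().items():
        print(f"{name:>16}: {gap:+.3f}")
```

The raw comparison shows abstainers with a clearly higher disease rate, yet within each income group the gap shrinks to roughly zero. Stratifying on the confounder is one simple way to see whether an apparent effect survives.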
P-Hacking and the Replication Crisis
In null hypothesis significance testing, a p-value below 0.05 is conventionally called "statistically significant": if the null hypothesis were true, a result at least this extreme would occur less than 5% of the time. It is not the probability that the null hypothesis is true. The problem: if you test enough hypotheses, some will reach p < 0.05 by pure chance alone.
Run 20 independent tests on completely random data. On average, one will show p < 0.05, and the chance that at least one does is about 64% (1 - 0.95^20). If a researcher tests 20 variations of a hypothesis, reports only the one that worked, and presents it as a single pre-planned test, the real false positive rate is far above the nominal 5%. This practice is known as p-hacking or data dredging. It is one suspected contributor to poor replication: a 2015 project published in Science attempted to replicate 100 published psychology studies, and only 36% of the replications produced statistically significant results (39% were judged to have replicated the original finding). The replication crisis affects psychology, nutrition science, medicine, and economics.
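The arithmetic behind running 20 tests on random data can be verified both analytically and by simulation. The sketch relies on one standard fact, stated here as an assumption of the model: when the null hypothesis is true and the test is valid, the p-value is uniformly distributed between 0 and 1, so each test can be represented by a single random draw.

```python
import random


def chance_of_any_false_positive(n_tests=20, alpha=0.05, n_studies=20_000, seed=3):
    """Fraction of simulated studies in which at least one null test reaches p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Each rng.random() stands in for the p-value of one test on pure noise.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_studies


# Closed-form answer for independent tests: 1 - (1 - alpha)^n
analytic = 1 - (1 - 0.05) ** 20

if __name__ == "__main__":
    print(f"analytic:  {analytic:.3f}")
    print(f"simulated: {chance_of_any_false_positive():.3f}")
```

Both numbers come out near 0.64: a researcher who quietly tries 20 independent analyses on noise will find at least one "significant" result about two times out of three.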
| Problem | Description | Example | Partial Remedy |
|---|---|---|---|
| P-hacking | Testing many hypotheses; reporting only significant ones | Testing 20 diet variables; reporting the one that hits p < 0.05 | Pre-registration; Bonferroni correction |
| HARKing | Hypothesizing After Results are Known; presenting post-hoc findings as pre-planned | Noticing an unexpected pattern and claiming it was the original hypothesis | Pre-registration; open data |
| Publication bias | Journals favor significant results; null results go unpublished | 10 studies find no effect; 1 finds an effect; only the 1 gets published | Registered reports; pre-results peer review |
| Small sample sizes | Small studies have low statistical power unless the effect is large; the significant results they do produce are more likely to be noise or overestimates | Brain imaging studies with 10–15 participants | Power calculations; larger studies |
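The "power calculations" remedy in the last row can be sketched with a normal approximation. The function below estimates the power of a two-sided comparison of two equal-sized groups for a given standardized effect size. It is an approximation that treats the test statistic as normally distributed (an exact t-test calculation gives slightly lower power for small groups), and the function name and the example effect size of 0.5 are illustrative choices.

```python
import math
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison (normal approximation)."""
    standard_normal = NormalDist()
    z_crit = standard_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true effect equals effect_size.
    shift = effect_size * math.sqrt(n_per_group / 2)
    # Probability of landing beyond either critical value.
    return standard_normal.cdf(shift - z_crit) + standard_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    for n in (15, 64):
        print(f"n = {n:>2} per group: power is about {approx_power(0.5, n):.2f}")
```

For a medium effect (0.5 standard deviations), 15 participants per group gives a power of roughly 0.28, while about 64 per group is needed to reach the conventional 0.80.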
Misleading Averages: Which Average?
The word "average" conceals three different statistics with very different properties. Each tells a different story about the same data.
- Mean: sum divided by count. Sensitive to extreme values. When Jeff Bezos enters a room of 100 people, the average wealth in the room increases by more than a billion dollars (a fortune above $100 billion spread over 101 people), but no one in the room became richer. Mean income in highly unequal countries vastly overstates typical living standards.
- Median: the middle value when data is sorted. Not affected by extremes. US median household income (~$74,000 in 2022) is more representative of a typical American family than the mean (~$105,000), which is pulled upward by very high earners.
- Mode: the most common value. Most useful for categorical or binned data. The most common US household income bracket might fall around $40,000–60,000, below both the median and the mean.
A company reporting "our average employee earns about $95,000" might be hiding a bimodal distribution: 90% of employees earn $45,000 and 10% earn $540,000. The mean ($94,500) is technically accurate and deeply misleading, because nine out of ten employees earn less than half of it.
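The salary example can be checked directly with Python's standard `statistics` module. The payroll below is a hypothetical company of 100 employees built from the percentages in the text.

```python
import statistics


def summarize(values):
    """Return the three common 'averages' of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


# Hypothetical payroll: 90 employees at $45,000 and 10 at $540,000.
salaries = [45_000] * 90 + [540_000] * 10
summary = summarize(salaries)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name:>6}: ${value:,.0f}")
```

The mean comes out at $94,500 while the median and the mode are both $45,000, so the headline "average" describes almost no actual employee.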
Misleading Charts and Visual Deception
Visual representation of data is among the most powerful communication tools humans have developed — and among the most frequently manipulated.
| Visual Technique | How It Misleads | Detection |
|---|---|---|
| Truncated Y-axis | Starts axis at a non-zero value to exaggerate differences | Check if Y-axis starts at 0; if not, evaluate the relative scale |
| Dual axes | Two independent Y-axes on one chart can make unrelated variables appear correlated | Verify both scales independently |
| Cherry-picked time range | Selecting start/end dates to show desired trend | Request full historical data; check what preceded the shown range |
| 3D pie charts | Perspective distorts slice areas; front slices look larger | Use 2D equivalents; check actual percentages |
| Area vs. length confusion | When icons or bubbles are scaled in both width and height, a value twice as large is drawn with four times the area | Check whether the value is encoded by height, width, or area |
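The effect of a truncated axis can be quantified. The sketch below compares how much taller one bar is drawn than another with how much larger the underlying value is. The ratio of the two is often called the "lie factor", a measure proposed by Edward Tufte. The function names and the example values (50 and 52, with the axis starting at 48) are illustrative assumptions.

```python
def drawn_bar_ratio(a, b, axis_start=0.0):
    """How many times taller bar b is drawn than bar a when the axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)


def lie_factor(a, b, axis_start=0.0):
    """Relative change shown in the chart divided by the relative change in the data."""
    drawn_a, drawn_b = a - axis_start, b - axis_start
    shown_change = (drawn_b - drawn_a) / drawn_a
    actual_change = (b - a) / a
    return shown_change / actual_change


if __name__ == "__main__":
    print("honest axis:   ", drawn_bar_ratio(50, 52), lie_factor(50, 52))
    print("truncated axis:", drawn_bar_ratio(50, 52, 48), lie_factor(50, 52, 48))
```

With the axis starting at zero, a 4% difference is drawn as a 4% difference. With the axis starting at 48, the second bar is drawn twice as tall as the first, exaggerating the change by a factor of about 25.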
The principle underlying all these failures is the same. Statistical results are representations: they summarize vast complexity into numbers, and every summarization involves choices about what to include, exclude, emphasize, and ignore. Those choices are never fully value-neutral. The appropriate response to statistics isn't credulity or cynicism but a set of concrete questions: Who collected this data and how? What was the sample? What confounders were controlled for? Was this hypothesis pre-registered? Is the axis truncated? What would the opposite conclusion look like, and why was it not found? Numbers require context. Context requires asking questions. Asking questions is the only defense statistics has against its own misuse.