How Statistics Can Lie: The Most Common Ways Data Misleads

Numbers Don't Lie — But People Do With Numbers

During World War II, statistician Abraham Wald was asked to help the US military decide where to add armor to bombers returning from missions. The military proposed reinforcing the areas most riddled with bullet holes — the wings and tail sections. Wald pointed out the error: the planes they were examining had survived. The bullet holes showed where a bomber could take damage and still fly home. The areas with no bullet holes on surviving planes — the engines and cockpit — were exactly where the lost planes had been hit. Reinforcing the observed damage would have been precisely wrong. Wald's correction saved lives. The fallacy he identified — survivorship bias — continues to distort decisions in business, medicine, and public policy.

Statistics is the science of learning from data under uncertainty. Done carefully, it is among humanity's most powerful tools for understanding the world. Done carelessly or dishonestly, it is among the most effective tools for misleading people with the authority of numbers. Darrell Huff's 1954 book How to Lie with Statistics remains in print partly because every technique it describes is still widely deployed.

Sampling Bias: Whose Data Is This?

Every statistical study draws conclusions from a sample and applies them to a population. If the sample is not representative of the population, the conclusions are wrong — regardless of how impressive the sample size looks.

The most famous sampling failure in history: the 1936 Literary Digest poll, based on about 2.4 million returned ballots, predicted that Alf Landon would defeat Franklin Roosevelt in a landslide. Roosevelt won about 61% of the popular vote. The Digest's mailing list came from telephone directories and automobile registration lists, which systematically overrepresented wealthy voters who disproportionately favored Landon. Meanwhile, George Gallup correctly predicted Roosevelt's victory using a sample of just 50,000, drawn to represent the actual voting population. Sample quality matters more than sample size.
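
The contrast between a large biased sample and a small representative one can be sketched with a simulation. Everything below is an assumption made for illustration (the electorate size, the 30% wealthy share, the support rates, and the reach of the biased frame); none of it is historical data.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical electorate: 30% of voters are wealthy; 35% of wealthy voters
# and 70% of everyone else support candidate A.
POPULATION_SIZE = 200_000
voters = []
for _ in range(POPULATION_SIZE):
    wealthy = random.random() < 0.30
    supports_a = random.random() < (0.35 if wealthy else 0.70)
    voters.append((wealthy, supports_a))


def share_for_a(sample):
    """Fraction of a sample that supports candidate A."""
    return sum(supports for _, supports in sample) / len(sample)


true_share = share_for_a(voters)

# Biased frame: every wealthy voter is reachable, but only about 1 in 5 of
# the others (a stand-in for phone and car-registration lists).
frame = [v for v in voters if v[0] or random.random() < 0.20]
big_biased_sample = random.sample(frame, 50_000)

# Small simple random sample drawn from the whole electorate.
small_random_sample = random.sample(voters, 2_000)

biased_error = abs(share_for_a(big_biased_sample) - true_share)
random_error = abs(share_for_a(small_random_sample) - true_share)

print(f"true share for A:       {true_share:.3f}")
print(f"biased sample, n=50000: {share_for_a(big_biased_sample):.3f}")
print(f"random sample, n=2000:  {share_for_a(small_random_sample):.3f}")
```

Under these assumptions the 50,000-person sample calls the race for the wrong candidate, while the 2,000-person random sample lands close to the true share. Adding more respondents from the same biased frame would not help.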

Common Sampling Problems

Voluntary response bias: surveys that let respondents self-select attract people with strong opinions, overrepresenting extreme views. Online polls are particularly vulnerable.
Survivorship bias: analyzing only survivors (successful companies, completed drug trials, veteran pilots) ignores the failures that might reverse the conclusion.
Selection bias: hospital studies of disease severity are biased toward severe cases because people with mild cases rarely seek hospital care.
Non-response bias: people who don't respond to surveys often differ systematically from those who do — higher workload, different opinions, different demographics.
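
Survivorship bias from the list above can be illustrated with a small simulation. The setup is hypothetical: the funds, their return distribution, and the closure rule are assumptions chosen for the sketch, not real market data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical setup: 10,000 funds with no skill at all. Each fund's five
# yearly returns are pure noise around 0% (standard deviation 20%).
NUM_FUNDS = 10_000
average_returns = [
    statistics.fmean([random.gauss(0.0, 0.20) for _ in range(5)])
    for _ in range(NUM_FUNDS)
]

# Assumed closure rule: funds averaging worse than -5% a year are shut down
# and vanish from the database an analyst would look at today.
survivors = [r for r in average_returns if r >= -0.05]

all_mean = statistics.fmean(average_returns)
survivor_mean = statistics.fmean(survivors)

print(f"funds still visible:      {len(survivors)} of {NUM_FUNDS}")
print(f"mean return, all funds:   {all_mean:+.2%}")
print(f"mean return, survivors:   {survivor_mean:+.2%}")
```

The surviving funds show a clearly positive average return even though no fund had any edge. The apparent performance comes entirely from deleting the failures.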

Correlation and Causation: A Classic Confusion

Statistics measures relationships between variables. It cannot, by itself, establish causation. Two variables correlate when they move together; one causes the other only when changing one produces a change in the other independent of all other factors. These are different claims requiring different evidence.

Nicolas Cage film releases correlate with swimming pool drownings (r = 0.666 across 1999–2009 data). Margarine consumption correlates with divorce rates in Maine. Ice cream sales correlate with drowning deaths. These correlations are real and can be computed from genuine data. None of them reflect causal relationships — they reflect shared seasonality (more ice cream and more swimming in summer), small samples, or pure coincidence across multiple tested variable pairs.
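
How easily coincidental correlations appear when many variable pairs are tested can be sketched as follows. The series are pure random noise; the count of 100 series is an assumption, and the length of 11 points mirrors an annual series covering 1999–2009.

```python
import itertools
import math
import random

random.seed(2)  # fixed seed so the sketch is repeatable


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# 100 unrelated random "yearly" series with 11 points each.
NUM_SERIES, NUM_YEARS = 100, 11
series = [
    [random.gauss(0, 1) for _ in range(NUM_YEARS)] for _ in range(NUM_SERIES)
]

# Correlate every series with every other one: 4,950 pairs in total.
correlations = [pearson(a, b) for a, b in itertools.combinations(series, 2)]
strong = [r for r in correlations if abs(r) >= 0.666]
strongest = max(correlations, key=abs)

print(f"pairs tested:          {len(correlations)}")
print(f"pairs with |r| >= 0.666: {len(strong)}")
print(f"strongest correlation: {strongest:+.3f}")
```

With only 11 data points per series, a sizable number of pairs clear the r = 0.666 mark by chance alone. Reporting the strongest pair without mentioning the thousands that were tested is what makes such a correlation look meaningful.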

The confounding variable problem is more serious in applied settings. Early studies showed wine drinkers had lower heart disease rates. Conclusion in popular media: wine protects the heart. Confound: wine drinkers in 1990s studies tended to have higher incomes, better diet, and more regular medical care — factors that independently reduce heart disease risk. Later controlled studies accounting for these confounders found the protective effect much smaller and contested.
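
A minimal sketch of confounding, assuming a world in which wine has no effect on heart disease at all and income drives both wine drinking and heart health. All probabilities are hypothetical and chosen only to make the pattern visible.

```python
import random

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical probabilities. Wine has NO causal effect here.
P_HIGH_INCOME = 0.40
P_WINE = {True: 0.60, False: 0.20}     # keyed by high_income
P_DISEASE = {True: 0.05, False: 0.15}  # keyed by high_income

people = []
for _ in range(200_000):
    high_income = random.random() < P_HIGH_INCOME
    drinks_wine = random.random() < P_WINE[high_income]
    heart_disease = random.random() < P_DISEASE[high_income]
    people.append((high_income, drinks_wine, heart_disease))


def disease_rate(rows):
    """Share of rows with heart disease."""
    return sum(disease for _, _, disease in rows) / len(rows)


def wine_gap(rows):
    """Disease rate of wine drinkers minus that of non-drinkers."""
    drinkers = [r for r in rows if r[1]]
    abstainers = [r for r in rows if not r[1]]
    return disease_rate(drinkers) - disease_rate(abstainers)


naive_gap = wine_gap(people)
gap_high_income = wine_gap([r for r in people if r[0]])
gap_low_income = wine_gap([r for r in people if not r[0]])

print(f"naive gap (all people):  {naive_gap:+.3f}")
print(f"gap within high income:  {gap_high_income:+.3f}")
print(f"gap within low income:   {gap_low_income:+.3f}")
```

The naive comparison makes wine look protective, but the gap shrinks to roughly zero once drinkers and non-drinkers are compared within the same income group. Stratifying like this only handles confounders that were measured; real studies face unmeasured ones as well.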

P-Hacking and the Replication Crisis

In null hypothesis significance testing, a p-value below 0.05 is conventionally called "statistically significant": a result at least this extreme would occur less than 5% of the time if the null hypothesis were true. The problem: if you test enough hypotheses, some will reach p < 0.05 by pure chance alone.

Run 20 independent tests of completely random data. On average, one will show p < 0.05, and the chance that at least one does is about 64% (1 - 0.95^20). If a researcher tests 20 variations of a hypothesis, reports only the one that worked, and presents it as a single pre-planned test, the false positive rate inflates dramatically. This practice, known as p-hacking or data dredging, is one suspected contributor to the replication crisis: a 2015 project published in Science attempted to replicate 100 published psychology studies and reproduced statistically significant results in only about 36–39% of them. The replication crisis affects psychology, nutrition science, medicine, and economics.
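
The arithmetic behind the 20-test scenario can be checked directly. The sketch below relies on one standard fact: when the null hypothesis is true and the test statistic is continuous, p-values are uniformly distributed between 0 and 1, so a uniform random number can stand in for each p-value. The trial count is an arbitrary choice.

```python
import random

random.seed(4)  # fixed seed so the sketch is repeatable

ALPHA = 0.05
NUM_TESTS = 20

# Analytic chance of at least one false positive among 20 independent tests.
analytic_rate = 1 - (1 - ALPHA) ** NUM_TESTS


def any_false_positive(threshold):
    """Simulate 20 null tests; True if any p-value falls below threshold."""
    return any(random.random() < threshold for _ in range(NUM_TESTS))


TRIALS = 10_000
uncorrected = sum(any_false_positive(ALPHA) for _ in range(TRIALS)) / TRIALS
# Bonferroni correction: divide the threshold by the number of tests.
bonferroni = (
    sum(any_false_positive(ALPHA / NUM_TESTS) for _ in range(TRIALS)) / TRIALS
)

print(f"analytic rate of >=1 false positive: {analytic_rate:.3f}")
print(f"simulated, threshold 0.05:           {uncorrected:.3f}")
print(f"simulated, Bonferroni threshold:     {bonferroni:.3f}")
```

A researcher who runs 20 null tests and reports whichever one "worked" finds something to report in roughly two out of three attempts. The Bonferroni threshold brings the family-wide false positive rate back to about 5%, at the cost of making each individual test harder to pass.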

Problem	Description	Example	Partial Remedy
P-hacking	Testing many hypotheses; reporting only significant ones	Testing 20 diet variables; reporting the one that hits p < 0.05	Pre-registration; Bonferroni correction
HARKing	Hypothesizing After Results are Known; presenting post-hoc findings as pre-planned	Noticing an unexpected pattern and claiming it was the original hypothesis	Pre-registration; open data
Publication bias	Journals favor significant results; null results go unpublished	10 studies find no effect; 1 finds an effect; only the 1 gets published	Registered reports; pre-results peer review
Small sample sizes	Small studies (often N < 30) tend to have low statistical power, so significant results may be noise or overestimates	Brain imaging studies with 10–15 participants	Power calculations; larger studies
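
The publication bias and small-sample rows in the table interact, which a simulation can make concrete. This is a sketch under stated assumptions: a true effect of 0.2 standard deviations, 15 participants per study, a z-test that treats the standard deviation as known, and journals that publish only results with p < 0.05.

```python
import math
import random
import statistics

random.seed(5)  # fixed seed so the sketch is repeatable

TRUE_EFFECT = 0.2  # assumed small true effect, in standard-deviation units
N_PER_STUDY = 15   # small sample per study
NUM_STUDIES = 5_000
standard_normal = statistics.NormalDist()

estimates, published = [], []
for _ in range(NUM_STUDIES):
    sample = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(N_PER_STUDY)]
    estimate = statistics.fmean(sample)
    # z-test against zero, treating the standard deviation (1.0) as known.
    z = estimate * math.sqrt(N_PER_STUDY)
    p_value = 2 * (1 - standard_normal.cdf(abs(z)))
    estimates.append(estimate)
    if p_value < 0.05:  # only "significant" studies get published
        published.append(estimate)

share_significant = len(published) / NUM_STUDIES
mean_all = statistics.fmean(estimates)
mean_published = statistics.fmean(published)

print(f"share of studies reaching p < 0.05: {share_significant:.1%}")
print(f"mean estimate, all studies:         {mean_all:.2f}")
print(f"mean estimate, published only:      {mean_published:.2f}")
```

Across all studies the average estimate sits near the true effect, but the published subset overstates it several times over. With samples this small, only the studies that happened to overestimate the effect can reach significance, so a literature filtered on p < 0.05 is biased by construction.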

Misleading Averages: Which Average?

The word "average" conceals three different statistics with very different properties. Each tells a different story about the same data.

Mean: sum divided by count. Sensitive to extreme values. When Jeff Bezos enters a room of 100 people, the average wealth in the room increases by over a billion dollars, but no one in the room became richer. Mean income in highly unequal countries vastly overstates typical living standards.
Median: the middle value when data is sorted. Not affected by extremes. US median household income (~$74,000 in 2022) is more representative of a typical American family than the mean (~$105,000), which is pulled upward by very high earners.
Mode: the most common value. Useful for categorical data. The modal US household income might cluster around $40,000–60,000 even if mean and median differ.

A company reporting "our average employee earns about $95,000" might be hiding a bimodal distribution: 90% of employees earn $45,000 and 10% earn $540,000, for a mean of $94,500. The mean is technically accurate and deeply misleading.
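
The three averages for this hypothetical payroll can be computed with Python's standard `statistics` module. The headcount of 100 is an assumption used to turn the 90%/10% split into concrete numbers.

```python
import statistics

# Hypothetical payroll: 90 employees at $45,000 and 10 at $540,000.
salaries = [45_000] * 90 + [540_000] * 10

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
mode_salary = statistics.mode(salaries)

# Share of employees who earn less than the "average" salary.
share_below_mean = sum(s < mean_salary for s in salaries) / len(salaries)

print(f"mean:   ${mean_salary:,.0f}")
print(f"median: ${median_salary:,.0f}")
print(f"mode:   ${mode_salary:,.0f}")
print(f"employees earning less than the mean: {share_below_mean:.0%}")
```

The mean comes out at more than twice the median and the mode, and nine in ten employees earn less than the figure the company would advertise as its average.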

Misleading Charts and Visual Deception

Visual representation of data is among the most powerful communication tools humans have developed — and among the most frequently manipulated.

Visual Technique	How It Misleads	Detection
Truncated Y-axis	Starts axis at a non-zero value to exaggerate differences	Check if Y-axis starts at 0; if not, evaluate the relative scale
Dual axes	Two independent Y-axes on one chart can make unrelated variables appear correlated	Verify both scales independently
Cherry-picked time range	Selecting start/end dates to show desired trend	Request full historical data; check what preceded the shown range
3D pie charts	Perspective distorts slice areas; front slices look larger	Use 2D equivalents; check actual percentages
Area vs. length confusion	When icons or bubbles represent data, area often misrepresents the number	Check what the visual dimension actually encodes
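
Two rows of the table can be quantified with simple arithmetic. The sketch below uses the "lie factor" described by Edward Tufte (the size of the effect shown in the graphic divided by the size of the effect in the data) for the truncated axis, and a squared ratio for icons scaled in both dimensions. The function names and the example values 50 and 52 are illustrative assumptions.

```python
def relative_change(first, second):
    """Change from first to second, as a fraction of first."""
    return (second - first) / first


def lie_factor(first, second, axis_start):
    """Ratio of the change shown by bar heights to the change in the data."""
    shown = relative_change(first - axis_start, second - axis_start)
    actual = relative_change(first, second)
    return shown / actual


def area_ratio(value_ratio):
    """Area ratio when an icon's width and height both scale with the value."""
    return value_ratio ** 2


# Hypothetical values: 50 and 52, a 4% increase.
honest = lie_factor(50, 52, axis_start=0)
truncated = lie_factor(50, 52, axis_start=48)

print(f"lie factor, axis from 0:  {honest:.1f}")
print(f"lie factor, axis from 48: {truncated:.1f}")
print(f"icon area ratio when the value doubles: {area_ratio(2)}")
```

Starting the axis at 48 makes the second bar twice as tall as the first for a 4% difference in the data, a visual exaggeration of about 25 times. An icon that doubles in both width and height covers four times the area, so a doubling in the data reads as a quadrupling.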

The principle underlying all these failures is the same. Statistical results are representations — they summarize vast complexity into numbers, and every summarization involves choices about what to include, exclude, emphasize, and ignore. Those choices are never fully value-neutral. The appropriate response to statistics isn't credulity or cynicism but a set of concrete questions: Who collected this data and how? What was the sample? What confounders were tested? Was this hypothesis pre-registered? Is the axis truncated? What would the opposite conclusion look like, and why was it not found? Numbers require context. Context requires asking questions. Asking questions is the only defense statistics has against its own misuse.
