How Bayesian Statistics Updates Beliefs With New Evidence
Bayesian statistics provides a mathematical framework for updating beliefs as evidence arrives. From spam filters to medical screening, Bayes' theorem shapes modern inference.
A Presbyterian Minister's Formula Changed Science
Thomas Bayes was an 18th-century English Presbyterian minister with a talent for mathematics. After his death in 1761, his friend Richard Price found an unpublished manuscript among Bayes's papers and presented it to the Royal Society in 1763. The paper described a method for calculating the probability of a cause given an observed effect—inverting the usual direction of probabilistic reasoning. Pierre-Simon Laplace independently developed the same ideas into a comprehensive framework by 1812. Two centuries later, Bayes' theorem underpins spam filters processing billions of emails daily, medical diagnostic algorithms, machine learning systems, and the statistical backbone of modern science.
The Formula Itself
Bayes' theorem is concise. For a hypothesis H given observed evidence E:
P(H|E) = P(E|H) × P(H) / P(E)
Each term has a specific meaning:
- P(H|E), the posterior probability: the updated probability of the hypothesis after observing the evidence. This is what you want to calculate.
- P(H), the prior probability: your belief in the hypothesis before seeing the evidence.
- P(E|H), the likelihood: the probability of observing the evidence if the hypothesis is true.
- P(E), the marginal likelihood: the total probability of observing the evidence, summed over all possible hypotheses and weighted by how probable each hypothesis is.
The logic flows naturally. Start with what you believe (prior). Observe evidence. Update your belief proportionally to how well the evidence fits your hypothesis versus all alternatives. The result is a new belief (posterior) that incorporates both prior knowledge and new data.
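The update can be sketched in a few lines of Python. This is a minimal illustration, assuming only two competing hypotheses (H and not-H), so that P(E) expands by the law of total probability into P(E|H) × P(H) + P(E|not H) × P(not H). The function name `posterior` and the example numbers are made up for illustration.

```python
def posterior(prior: float, likelihood: float, likelihood_if_false: float) -> float:
    """Return P(H|E) for a binary hypothesis via Bayes' theorem.

    prior               -- P(H)
    likelihood          -- P(E|H)
    likelihood_if_false -- P(E|not H)
    """
    # Marginal likelihood P(E), summed over both hypotheses.
    evidence = likelihood * prior + likelihood_if_false * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return likelihood * prior / evidence


# Hypothetical numbers: a 30% prior, and evidence four times as likely
# under H (0.8) as under not-H (0.2).
updated = posterior(prior=0.30, likelihood=0.8, likelihood_if_false=0.2)
print(f"P(H|E) = {updated:.3f}")
```

With these inputs the belief rises from 30% to about 63%: the evidence favors H, but the modest prior keeps the posterior well short of certainty.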
The Medical Screening Problem That Fools Doctors
A disease affects 1 in 1,000 people. A screening test detects the disease 99% of the time (sensitivity) and correctly identifies healthy people 95% of the time (specificity). You test positive. What's the probability you actually have the disease?
Most people—including many physicians in documented studies—guess around 95%. The actual answer is approximately 2%.
| Group | Population (per 100,000) | Test Result | Count |
|---|---|---|---|
| Truly sick | 100 | Positive (true positive) | 99 |
| Truly sick | 100 | Negative (false negative) | 1 |
| Truly healthy | 99,900 | Positive (false positive) | 4,995 |
| Truly healthy | 99,900 | Negative (true negative) | 94,905 |
Of 5,094 total positive results, only 99 are true positives. That's 99/5,094 = 1.94%. The low base rate (prior) overwhelms the test's accuracy. This result surprises people because they ignore the prior probability and focus only on the test's sensitivity. Bayes' theorem forces you to account for both.
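The table can be reproduced with a short script. The sketch below uses the article's figures (prevalence 1 in 1,000, sensitivity 99%, specificity 95%); the function names are illustrative. It also reruns the same test at two higher, hypothetical base rates to show how strongly the answer depends on the prior.

```python
def screening_counts(population: int, prevalence: float,
                     sensitivity: float, specificity: float) -> dict:
    """Expected counts for each cell of the screening table."""
    sick = population * prevalence
    healthy = population - sick
    return {
        "true_positive": sick * sensitivity,
        "false_negative": sick * (1 - sensitivity),
        "false_positive": healthy * (1 - specificity),
        "true_negative": healthy * specificity,
    }


def positive_predictive_value(counts: dict) -> float:
    """P(sick | positive test) = true positives / all positives."""
    positives = counts["true_positive"] + counts["false_positive"]
    return counts["true_positive"] / positives


counts = screening_counts(100_000, prevalence=0.001,
                          sensitivity=0.99, specificity=0.95)
print(f"P(disease | positive) = {positive_predictive_value(counts):.2%}")

# Same test, different (hypothetical) base rates.
for prevalence in (0.001, 0.01, 0.1):
    c = screening_counts(100_000, prevalence, 0.99, 0.95)
    print(f"prevalence {prevalence:>5.1%} -> {positive_predictive_value(c):.1%}")
```

The same test gives a posterior of roughly 2%, 17%, and 69% as prevalence rises from 0.1% to 1% to 10%, which is the base-rate effect in numbers.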
Spam Filters: Bayesian Reasoning at Scale
Paul Graham's 2002 essay "A Plan for Spam" demonstrated that a simple Bayesian classifier could filter spam email with over 99.5% accuracy. The method treats each word in an email as evidence and calculates the posterior probability that the email is spam.
The filter learns from examples:
- Words like "viagra," "winner," and "Nigerian" strongly increase spam probability
- Words like "meeting," "Tuesday," and the recipient's company name decrease it
- The prior is updated continuously as the user marks emails as spam or not-spam
- New word frequencies shift the posterior for future classifications
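The idea of treating words as evidence can be sketched as a small naive Bayes classifier. This is not Graham's exact algorithm: it is a generic version that assumes words are independent given the class and uses add-one (Laplace) smoothing. The training messages are a hypothetical toy corpus.

```python
import math
from collections import Counter

# Hypothetical toy training data; a real filter learns from thousands of messages.
SPAM = ["winner claim your prize now", "viagra winner offer", "claim prize offer now"]
HAM = ["meeting on tuesday", "tuesday project meeting notes", "lunch after the meeting"]


def word_counts(messages):
    """Count how often each word appears across a list of messages."""
    return Counter(word for msg in messages for word in msg.split())


spam_counts, ham_counts = word_counts(SPAM), word_counts(HAM)
vocabulary = set(spam_counts) | set(ham_counts)


def log_likelihood(words, counts):
    """Sum of log P(word | class) with add-one smoothing; unseen words are skipped."""
    total = sum(counts.values()) + len(vocabulary)
    return sum(math.log((counts[w] + 1) / total) for w in words if w in vocabulary)


def spam_probability(message, prior_spam=0.5):
    """Posterior P(spam | words), assuming words are independent given the class."""
    words = message.lower().split()
    log_spam = math.log(prior_spam) + log_likelihood(words, spam_counts)
    log_ham = math.log(1 - prior_spam) + log_likelihood(words, ham_counts)
    # Normalize in log space to avoid underflow on long messages.
    top = max(log_spam, log_ham)
    spam_weight = math.exp(log_spam - top)
    ham_weight = math.exp(log_ham - top)
    return spam_weight / (spam_weight + ham_weight)


print(f"{spam_probability('claim your prize'):.3f}")
print(f"{spam_probability('meeting tuesday'):.3f}")
```

Marking a new message as spam or not-spam corresponds to adding it to `SPAM` or `HAM` and recounting, which shifts the word probabilities used for later classifications.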
Bayesian methods remain one component of spam detection at major email providers, typically combined with other machine learning techniques. The scale is large: Google reported in 2019 that adding machine learning models let Gmail block roughly 100 million additional spam messages per day.
Bayesian vs. Frequentist: The Philosophical Divide
Statistics has two major philosophical camps, and the disagreement runs deeper than most scientific disputes.
| Feature | Frequentist Approach | Bayesian Approach |
|---|---|---|
| Probability means | Long-run frequency of events | Degree of belief or uncertainty |
| Parameters are | Fixed but unknown constants | Random variables with probability distributions |
| Prior information | Not formally incorporated | Explicitly included via prior distribution |
| Result format | p-value, confidence interval | Posterior distribution, credible interval |
| Sample size dependence | Many standard methods rely on large-sample approximations | Valid at any sample size, though with small samples the prior carries more weight |
| Multiple testing | Requires explicit correction (Bonferroni, etc.) | Often addressed with hierarchical priors that shrink estimates, rather than with a separate correction |
Frequentists argue that priors introduce subjectivity. Bayesians counter that ignoring prior information is itself a subjective choice—and often a bad one. A scientist studying a new drug knows something about pharmacology before running the trial. Pretending otherwise throws away useful information.
The debate has cooled in practice. Most modern statisticians use whichever framework better suits the problem.
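A small example makes the contrast concrete. The sketch below assumes a hypothetical trial in which 3 of 3 patients respond to a drug, and a uniform Beta(1, 1) prior on the response rate. Both the data and the choice of prior are illustrative assumptions.

```python
def beta_binomial_update(alpha: float, beta: float, successes: int, failures: int):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data
    yields a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


successes, failures = 3, 0  # hypothetical small sample: 3 of 3 patients respond

# Frequentist point estimate (maximum likelihood): the observed proportion.
mle = successes / (successes + failures)

# Bayesian estimate starting from a uniform Beta(1, 1) prior.
post_a, post_b = beta_binomial_update(1, 1, successes, failures)

print(f"MLE: {mle:.2f}")                                   # 1.00
print(f"Posterior mean: {beta_mean(post_a, post_b):.2f}")  # 0.80
```

The maximum-likelihood estimate says the drug works every time; the posterior mean of 0.80 reflects that three patients are too few to rule out failures. A more informative prior, such as one built from earlier pharmacology data, would pull the estimate further.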
MCMC: Making Bayesian Methods Practical
For most real-world problems, the posterior distribution cannot be calculated analytically. The integral in the denominator of Bayes' theorem—P(E), the marginal likelihood—is often impossibly complex for high-dimensional models. Markov Chain Monte Carlo (MCMC) methods solve this by sampling.
MCMC algorithms generate sequences of random samples from the posterior distribution without computing it directly. The Metropolis-Hastings algorithm (1953/1970) and the Gibbs sampler (1984) made Bayesian analysis practical for complex models. Modern implementations run on standard laptops in minutes for models that would have required supercomputers in the 1990s.
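A minimal random-walk Metropolis sampler shows the core trick: only ratios of posterior values are needed, so P(E) never has to be computed. The sketch below assumes a hypothetical coin that lands heads 7 times in 10 flips, with a uniform prior on its bias. That case has a known exact answer (a Beta(8, 4) posterior), which makes the sampler easy to check. The step size, burn-in length, and seed are arbitrary illustrative choices.

```python
import math
import random


def log_posterior(theta: float, heads: int, tails: int) -> float:
    """Unnormalized log posterior for a coin's bias under a uniform prior.

    The normalizing constant P(E) is omitted: it cancels in the
    acceptance ratio used by the sampler.
    """
    if not 0.0 < theta < 1.0:
        return -math.inf
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def metropolis(heads, tails, n_samples=20_000, step=0.1, burn_in=1_000, seed=0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples


samples = metropolis(heads=7, tails=3)
estimate = sum(samples) / len(samples)
print(f"MCMC posterior mean:  {estimate:.3f}")
print(f"Exact posterior mean: {8 / 12:.3f}")  # Beta(8, 4) has mean 8 / 12
```

For a one-parameter problem like this the sampler is unnecessary, since the exact posterior is available. The same loop applies unchanged when `log_posterior` describes a model with no closed-form answer.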
Key MCMC applications include:
- Phylogenetic tree estimation in evolutionary biology (MrBayes software)
- Cosmological parameter estimation from cosmic microwave background radiation data
- Financial risk modeling with correlated asset returns
- Climate model calibration using observational data
- Archaeological dating when combining radiocarbon measurements with stratigraphic information
Machine Learning and the Bayesian Revival
Bayesian methods have become foundational in modern machine learning. Bayesian neural networks assign probability distributions to weights rather than point estimates, providing built-in uncertainty quantification. Gaussian processes—entirely Bayesian models—are standard for regression with small datasets and for optimizing expensive functions (Bayesian optimization).
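The idea of a distribution over weights can be illustrated with the simplest possible case: a linear model with a single weight, standing in for a full Bayesian neural network. The sketch assumes a Gaussian prior on the weight and Gaussian noise of known variance, for which the posterior has a closed form. The data points and variance values are hypothetical.

```python
import math


def weight_posterior(xs, ys, prior_var=1.0, noise_var=0.25):
    """Posterior mean and variance of w in y = w * x + noise.

    Assumes a N(0, prior_var) prior on w and Gaussian noise with known
    variance noise_var (both values are illustrative).
    """
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    mean = sum(x * y for x, y in zip(xs, ys)) / noise_var / precision
    return mean, 1.0 / precision


def predict(x_new, w_mean, w_var, noise_var=0.25):
    """Predictive mean and standard deviation at x_new."""
    variance = x_new * x_new * w_var + noise_var
    return w_mean * x_new, math.sqrt(variance)


xs, ys = [0.5, 1.0, 1.5], [1.1, 1.9, 3.2]  # hypothetical observations
w_mean, w_var = weight_posterior(xs, ys)

for x_new in (1.0, 10.0):
    mu, sd = predict(x_new, w_mean, w_var)
    print(f"x = {x_new:>4}: prediction {mu:.2f} +/- {sd:.2f}")
```

Because the weight is uncertain, the predictive spread grows as `x_new` moves away from the training data. A point estimate of the weight would report the same noise level everywhere.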
The practical implications are significant. A self-driving car that uses Bayesian methods can distinguish between "I'm 99% confident that's a pedestrian" and "I'm 60% confident that's a pedestrian but it might be a mailbox." The distinction between a confident prediction and an uncertain one matters when lives depend on the decision.
Bayes' theorem is 260 years old. It spent most of that time in relative obscurity, overshadowed by frequentist methods that dominated 20th-century statistics. The computational revolution rescued it. With enough processing power to run MCMC algorithms, the minister's formula finally had the machinery to match its ambition—and that ambition turns out to be exactly what the age of data demands.