How Bayesian Statistics Updates Beliefs With New Evidence

A Presbyterian Minister's Formula Changed Science

Thomas Bayes was an 18th-century English Presbyterian minister with a talent for mathematics. After his death in 1761, his friend Richard Price found an unpublished manuscript among Bayes's papers and presented it to the Royal Society in 1763. The paper described a method for calculating the probability of a cause given an observed effect—inverting the usual direction of probabilistic reasoning. Pierre-Simon Laplace independently developed the same ideas into a comprehensive framework by 1812. Two centuries later, Bayes' theorem underpins spam filters processing billions of emails daily, medical diagnostic algorithms, machine learning systems, and the statistical backbone of modern science.

The Formula Itself

Bayes' theorem is concise. For a hypothesis H given observed evidence E:

P(H|E) = P(E|H) × P(H) / P(E)

Each term has a specific meaning:

P(H|E) — Posterior probability: The updated probability of the hypothesis after observing the evidence. This is what you want to calculate.
P(H) — Prior probability: Your belief in the hypothesis before seeing the evidence
P(E|H) — Likelihood: The probability of observing the evidence if the hypothesis is true
P(E) — Marginal likelihood: The total probability of observing the evidence under all possible hypotheses

The logic flows naturally. Start with what you believe (prior). Observe evidence. Update your belief proportionally to how well the evidence fits your hypothesis versus all alternatives. The result is a new belief (posterior) that incorporates both prior knowledge and new data.
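The denominator P(E) can be written out explicitly. As a worked form for the simplest case, assume only two competing hypotheses: H and its negation (written ¬H here, a symbol the formula above does not use). The law of total probability then gives:

```latex
P(E) = P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)

P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}
```

The numerator reappears as one term of the denominator, which is why the posterior always lands between 0 and 1. With more than two hypotheses, the denominator simply gains one term per hypothesis.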

The Medical Screening Problem That Fools Doctors

A disease affects 1 in 1,000 people. A screening test detects the disease 99% of the time (sensitivity) and correctly identifies healthy people 95% of the time (specificity). You test positive. What's the probability you actually have the disease?

Most people—including many physicians in documented studies—guess around 95%. The actual answer is approximately 2%.

Group	Population (per 100,000)	Test Result	Count
Truly sick	100	Positive (true positive)	99
Truly sick	100	Negative (false negative)	1
Truly healthy	99,900	Positive (false positive)	4,995
Truly healthy	99,900	Negative (true negative)	94,905

Of the 5,094 total positive results (99 + 4,995), only 99 are true positives. That's 99/5,094 = 1.94%. The low base rate (prior) overwhelms the test's accuracy: healthy people outnumber sick people so heavily that even a 5% false positive rate produces far more false alarms than true detections. This result surprises people because they ignore the prior probability and focus only on the test's accuracy. Bayes' theorem forces you to account for both.
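The same arithmetic can be sketched in a few lines of Python. The function name `posterior_from_test` is made up for this illustration, and the second call assumes a hypothetical follow-up test that is statistically independent of the first, which real repeat tests often are not.

```python
def posterior_from_test(prevalence, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' theorem."""
    true_pos = sensitivity * prevalence                     # P(E|H) * P(H)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)    # P(E|not H) * P(not H)
    return true_pos / (true_pos + false_pos)

# Numbers from the screening problem: 1 in 1,000 prevalence,
# 99% sensitivity, 95% specificity.
first = posterior_from_test(0.001, 0.99, 0.95)

# Assumption for illustration only: a second, independent positive result.
# Yesterday's posterior becomes today's prior.
second = posterior_from_test(first, 0.99, 0.95)

print(f"after one positive test:  {first:.2%}")
print(f"after two positive tests: {second:.2%}")
```

Feeding the first posterior back in as the prior is the "updating" in the title: each new piece of evidence starts from where the last one left off.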

Spam Filters: Bayesian Reasoning at Scale

Paul Graham's 2002 essay "A Plan for Spam" demonstrated that a simple Bayesian classifier could filter spam email with over 99.5% accuracy. The method treats each word in an email as evidence and calculates the posterior probability that the email is spam.

The filter learns from examples:

Words like "viagra," "winner," and "Nigerian" strongly increase spam probability
Words like "meeting," "Tuesday," and the recipient's company name decrease it
The per-word spam probabilities are re-estimated continuously as the user marks emails as spam or not-spam
New word frequencies shift the posterior for future classifications
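A word-based filter of this kind can be sketched as a naive Bayes classifier. This is a minimal illustration under stated assumptions, not Graham's exact scoring scheme: it treats words as independent given the class, uses add-one smoothing, and trains on four invented messages. The names `train` and `spam_probability` are hypothetical.

```python
import math
from collections import Counter

def train(messages):
    """Count word occurrences per class from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals

def spam_probability(text, counts, totals):
    """Posterior P(spam | words), assuming words are independent given the class."""
    vocab = set(counts[True]) | set(counts[False])
    n_msgs = totals[True] + totals[False]
    log_score = {}
    for label in (True, False):
        # Log prior: the fraction of training messages carrying this label.
        score = math.log(totals[label] / n_msgs)
        n_words = sum(counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing; the extra +1 reserves a slot for unseen words
            # so that a new word never forces the probability to zero.
            p_word = (counts[label][word] + 1) / (n_words + len(vocab) + 1)
            score += math.log(p_word)
        log_score[label] = score
    # Normalize the two joint scores into a posterior for the spam class.
    top = max(log_score.values())
    spam = math.exp(log_score[True] - top)
    ham = math.exp(log_score[False] - top)
    return spam / (spam + ham)

# Invented training data, for illustration only.
training = [
    ("winner claim your prize now", True),
    ("viagra winner offer", True),
    ("meeting moved to tuesday", False),
    ("agenda for tuesday meeting", False),
]
counts, totals = train(training)
print(spam_probability("winner prize offer", counts, totals))
print(spam_probability("tuesday meeting agenda", counts, totals))
```

Working in log space avoids numerical underflow when a long email multiplies hundreds of small probabilities together. Calling `train` again on a larger list of marked messages is the toy equivalent of the filter learning from user feedback.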

Bayesian methods remain one widely used component of spam detection, typically combined with other machine learning techniques. The scale is considerable: worldwide email traffic is estimated at over 300 billion messages per day, and Google reported in 2019 that adding TensorFlow-based models to Gmail's existing filters blocked roughly 100 million additional spam messages daily.

Bayesian vs. Frequentist: The Philosophical Divide

Statistics has two major philosophical camps, and the disagreement runs deeper than most scientific disputes.

Feature	Frequentist Approach	Bayesian Approach
Probability means	Long-run frequency of events	Degree of belief or uncertainty
Parameters are	Fixed but unknown constants	Random variables with probability distributions
Prior information	Not formally incorporated	Explicitly included via prior distribution
Result format	p-value, confidence interval	Posterior distribution, credible interval
Sample size dependence	Many standard methods rely on large-sample approximations	Works naturally with small samples (prior carries more weight)
Multiple testing	Requires correction (Bonferroni, etc.)	Often handled through hierarchical priors that shrink estimates, though not automatically

Frequentists argue that priors introduce subjectivity. Bayesians counter that ignoring prior information is itself a subjective choice—and often a bad one. A scientist studying a new drug knows something about pharmacology before running the trial. Pretending otherwise throws away useful information.
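How much a prior matters depends on how much data there is. The sketch below uses the Beta-binomial conjugate pair, a standard textbook model chosen here for illustration; the trial numbers and the Beta(3, 7) prior are hypothetical, not taken from any real study.

```python
def beta_update(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data
    gives a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical small trial: 4 of 5 patients respond.
mle = 4 / (4 + 1)  # frequentist point estimate from the data alone

# Hypothetical informative prior: earlier work suggests about a 30% response
# rate, encoded as Beta(3, 7), which carries the weight of about 10 patients.
informed_small = beta_update(3, 7, successes=4, failures=1)
# Flat Beta(1, 1) prior for comparison.
flat_small = beta_update(1, 1, successes=4, failures=1)

# Same response rate, but 100 times more data: 400 of 500 respond.
informed_large = beta_update(3, 7, successes=400, failures=100)

print("data-only estimate:       ", mle)
print("informed prior, 5 cases:  ", round(beta_mean(*informed_small), 3))
print("flat prior, 5 cases:      ", round(beta_mean(*flat_small), 3))
print("informed prior, 500 cases:", round(beta_mean(*informed_large), 3))
```

With five patients the informed prior pulls the estimate well below the raw 80%. With five hundred, the data dominate and the choice of prior barely registers, which is the practical reason the subjectivity objection weakens as evidence accumulates.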

The debate has cooled in practice. Most modern statisticians use whichever framework better suits the problem.

MCMC: Making Bayesian Methods Practical

For most real-world problems, the posterior distribution cannot be calculated analytically. The denominator of Bayes' theorem, the marginal likelihood P(E), becomes a sum or integral over every possible parameter value, and for high-dimensional models that integral is usually intractable. Markov Chain Monte Carlo (MCMC) methods sidestep it by sampling.

MCMC algorithms generate sequences of random samples from the posterior distribution without computing it directly. They need the posterior only up to a constant factor (likelihood times prior), so P(E) never has to be evaluated. The Metropolis-Hastings algorithm (1953/1970) and the Gibbs sampler (1984) made Bayesian analysis practical for complex models. Modern implementations run on standard laptops in minutes for models that would have required supercomputers in the 1990s.
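A random-walk Metropolis sampler, the simplest member of this family, fits in a short sketch. The example is deliberately tiny and rests on assumptions chosen for illustration: a coin with unknown bias, a flat prior, 7 heads and 3 tails of invented data, and untuned settings for step size and burn-in. The function names are hypothetical.

```python
import math
import random

def log_unnormalized_posterior(theta, heads, tails):
    """Log of likelihood times prior for a coin bias theta, flat prior on (0, 1)."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def metropolis(heads, tails, n_samples=20000, step=0.2, seed=0):
    """Random-walk Metropolis. The Gaussian proposal is symmetric, so the
    acceptance rule only needs the ratio of posterior values."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_unnormalized_posterior(theta, heads, tails)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_unnormalized_posterior(proposal, heads, tails)
        # Accept with probability min(1, ratio). P(E) cancels in the ratio,
        # which is why it never needs to be computed.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples

samples = metropolis(heads=7, tails=3)
kept = samples[2000:]  # discard early samples as burn-in
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 3))
```

This toy problem has a known answer, a Beta(8, 4) posterior with mean 8/12, so the sampler's output can be checked against it. Real applications use MCMC precisely because no such closed form exists.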

Key MCMC applications include:

Phylogenetic tree estimation in evolutionary biology (MrBayes software)
Cosmological parameter estimation from cosmic microwave background radiation data
Financial risk modeling with correlated asset returns
Climate model calibration using observational data
Archaeological dating when combining radiocarbon measurements with stratigraphic information

Machine Learning and the Bayesian Revival

Bayesian methods have become foundational in modern machine learning. Bayesian neural networks assign probability distributions to weights rather than point estimates, providing built-in uncertainty quantification. Gaussian processes—entirely Bayesian models—are standard for regression with small datasets and for optimizing expensive functions (Bayesian optimization).

The practical implications are significant. A self-driving car that uses Bayesian methods can distinguish between "I'm 99% confident that's a pedestrian" and "I'm 60% confident that's a pedestrian but it might be a mailbox." The distinction between a confident prediction and an uncertain one matters when lives depend on the decision.
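The difference between a confident and an uncertain prediction can be made concrete with a small Gaussian process. The sketch below is a bare-bones illustration in pure Python, assuming a zero-mean prior, a squared-exponential kernel with unit variance, and three invented training points; production code would use a numerical library and a Cholesky factorization instead of the hand-written `solve`.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel with unit prior variance."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def gp_predict(train_x, train_y, query, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at one query point."""
    n = len(train_x)
    gram = [[rbf(train_x[i], train_x[j]) + (noise if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    k_star = [rbf(x, query) for x in train_x]
    mean = sum(k * w for k, w in zip(k_star, solve(gram, train_y)))
    variance = rbf(query, query) - sum(
        k * v for k, v in zip(k_star, solve(gram, k_star)))
    return mean, max(variance, 0.0)  # clamp tiny negative rounding errors

# Invented observations, for illustration only.
train_x = [-1.0, 0.0, 1.0]
train_y = [0.5, 1.0, 0.5]

near = gp_predict(train_x, train_y, query=0.0)  # at an observed point
far = gp_predict(train_x, train_y, query=6.0)   # far from all data
print("near the data: mean=%.3f variance=%.6f" % near)
print("far from data: mean=%.3f variance=%.6f" % far)
```

Near the observations the variance collapses toward zero; far from them it returns to the prior variance of 1, and the model in effect reports that it does not know. A point-estimate model would return a single number in both cases.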

Bayes' theorem is 260 years old. It spent most of that time in relative obscurity, overshadowed by frequentist methods that dominated 20th-century statistics. The computational revolution rescued it. With enough processing power to run MCMC algorithms, the minister's formula finally had the machinery to match its ambition—and that ambition turns out to be exactly what the age of data demands.
