Bayesian Probability: How to Update Beliefs With New Evidence

A Reverend's Posthumous Revolution

Thomas Bayes never published his most famous work. The Presbyterian minister died in 1761, and his friend Richard Price found the manuscript among his papers. Price edited it and submitted it to the Royal Society in 1763. The paper introduced a method for calculating the probability of causes given observed effects, inverting the usual direction of probabilistic reasoning. More than two centuries later, Bayesian methods are used in spam filters, medical diagnostics, self-driving cars, and search engines. The approach Bayes sketched in the 1700s became one of the mathematical foundations of modern artificial intelligence.

The Formula and Its Components

Bayes' theorem expresses the probability of a hypothesis given observed evidence. The formula is: P(H|E) = P(E|H) × P(H) / P(E). Each component carries specific meaning.

Term	Name	Meaning
P(H|E)	Posterior probability	Probability of the hypothesis after observing evidence
P(E|H)	Likelihood	Probability of the evidence if the hypothesis is true
P(H)	Prior probability	Probability of the hypothesis before observing evidence
P(E)	Marginal likelihood	Total probability of the evidence under all hypotheses

The theorem is mathematically uncontroversial. It follows directly from the definition of conditional probability. The controversy lies in interpretation — specifically, in what counts as a valid prior probability.
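The formula can be applied mechanically once the priors and likelihoods are written down. The following is a minimal sketch for a finite set of mutually exclusive hypotheses; the function name `bayes_update` and the two-bag marble example are illustrative assumptions, not part of the original discussion.

```python
def bayes_update(priors, likelihoods):
    """Return posterior probabilities for mutually exclusive hypotheses.

    priors:      dict mapping hypothesis name -> P(H)
    likelihoods: dict mapping hypothesis name -> P(E|H)
    """
    # Numerator of Bayes' theorem for each hypothesis: P(E|H) * P(H)
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    # Marginal likelihood P(E): the numerators summed over all hypotheses
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("the evidence has zero probability under every hypothesis")
    # Posterior P(H|E) = P(E|H) * P(H) / P(E)
    return {h: joint[h] / evidence for h in joint}


# Hypothetical example: two bags of marbles, each chosen with probability 0.5.
# Bag A is 75% red and bag B is 25% red. One marble is drawn and it is red.
posterior = bayes_update(
    priors={"bag A": 0.5, "bag B": 0.5},
    likelihoods={"bag A": 0.75, "bag B": 0.25},
)
print(posterior)
```

Here P(E) is never supplied directly: it falls out as the sum of P(E|H) × P(H) over the hypotheses, which is why the posteriors always sum to one.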

Frequentist vs. Bayesian: Two Philosophies

Statistics has two dominant schools. They disagree about fundamental questions.

Frequentists define probability as the long-run frequency of events. A fair coin has a 50 percent probability of heads because, over thousands of flips, roughly half will be heads. Frequentists reject assigning probabilities to hypotheses or parameters: a parameter has one fixed value, even if that value is unknown. On this view there is no meaningful sense in which a physical constant has a 95 percent probability of falling in some range.

Bayesians define probability as a degree of belief. Probability quantifies uncertainty about any proposition, including one-time events and unknown parameters. It is perfectly meaningful to say there is a 70 percent probability that it will rain tomorrow or a 90 percent probability that a defendant is guilty. Evidence updates these beliefs systematically through Bayes' theorem.

Aspect	Frequentist	Bayesian
Probability definition	Long-run frequency	Degree of belief
Parameters	Fixed but unknown	Random variables with distributions
Prior information	Not formally incorporated	Encoded as prior distribution
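Interval estimates	Confidence interval: 95% of intervals built this way contain the true value	Credible interval: 95% probability the value is in the interval
Small samples	Many standard methods rely on large-sample approximations	Usable, though the prior then carries more weight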
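The frequentist reading of an interval is a statement about the procedure, not about any single interval. A small simulation can make that concrete. This is a sketch under assumed values: the true mean, the known standard deviation, the sample size, and the function name `coverage` are all hypothetical choices for illustration.

```python
import math
import random


def coverage(true_mean=10.0, sigma=2.0, n=30, trials=2000, seed=0):
    """Fraction of 95% confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = 1.96  # two-sided 95% critical value of the standard normal
    half_width = z * sigma / math.sqrt(n)  # sigma is assumed known here
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # The true mean is fixed; the interval is what varies between trials
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / trials


rate = coverage()
print(f"fraction of intervals containing the true mean: {rate:.3f}")
```

The printed fraction should land close to 0.95. Each individual interval either contains the true mean or does not, which is exactly the distinction the frequentist column draws.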

The Medical Testing Problem

Bayesian reasoning reveals counterintuitive truths about diagnostic testing. Suppose a disease affects 1 in 1,000 people. A test for this disease has 99 percent sensitivity (correctly identifies 99 percent of sick people) and 95 percent specificity (correctly identifies 95 percent of healthy people). A patient tests positive. Intuition suggests they almost certainly have the disease. Bayes' theorem says otherwise.

Prior probability of disease — 0.001 (1 in 1,000)
Probability of positive test given disease — 0.99
Probability of positive test given no disease — 0.05 (false positive rate)
Total probability of positive test — (0.001 × 0.99) + (0.999 × 0.05) = 0.05094
Posterior probability of disease given positive test — 0.00099 / 0.05094 ≈ 0.019 or 1.9%

A positive result means the patient has less than a 2 percent chance of actually having the disease. The low base rate overwhelms the test's accuracy. In a population of 100,000, about 99 of the 100 sick people test positive, but so do about 4,995 of the 99,900 healthy people, so false positives outnumber true positives roughly 50 to 1. This result has profound implications for mass screening programs, criminal forensics, and any domain where rare events are tested for in large populations.
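The calculation above can be reproduced in a few lines. The sketch below uses the prevalence, sensitivity, and specificity from the example; the function name and the extra base rates in the loop are assumptions added to show how strongly the answer depends on prevalence.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) for a single test."""
    # P(positive and disease)
    true_positive = prevalence * sensitivity
    # P(positive and no disease); the false positive rate is 1 - specificity
    false_positive = (1 - prevalence) * (1 - specificity)
    # Divide by the total probability of a positive test
    return true_positive / (true_positive + false_positive)


p = posterior_given_positive(prevalence=0.001, sensitivity=0.99, specificity=0.95)
print(f"posterior at 1 in 1,000 prevalence: {p:.4f}")

# Same test, different hypothetical base rates
for prevalence in (0.001, 0.01, 0.1):
    result = posterior_given_positive(prevalence, 0.99, 0.95)
    print(f"prevalence {prevalence}: posterior {result:.3f}")
```

The first line prints a posterior of about 0.0194, matching the hand calculation. As the assumed prevalence rises, the same test becomes far more informative.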

Real-World Applications

Machine Learning and AI

Naive Bayes classifiers use Bayes' theorem to categorize text, detect spam, and analyze sentiment. Despite their simplicity, they perform surprisingly well. Bayesian neural networks assign probability distributions to network weights rather than fixed values, providing uncertainty estimates alongside predictions. This matters in safety-critical applications like autonomous driving and medical diagnosis.
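To show how little machinery a Naive Bayes classifier needs, here is a from-scratch sketch of the multinomial variant with add-one smoothing. The training sentences, labels, and function names are invented for illustration; a real system would more likely use an established library than this toy code.

```python
import math
from collections import Counter, defaultdict


def train(documents):
    """documents: list of (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in documents:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # log P(label): the prior, estimated from label frequencies
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # skip words never seen during training
            # log P(word | label) with add-one (Laplace) smoothing
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    # The shared denominator P(text) is omitted: it does not change the ranking
    return max(scores, key=scores.get)


# Hypothetical toy training set
training = [
    ("win money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(training)
print(classify("claim free money", model))
print(classify("monday meeting", model))
```

The "naive" part is the assumption that words are independent given the label, which lets the likelihood factor into a product (a sum in log space). Working with logarithms avoids numerical underflow on longer texts.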

Legal Reasoning

Courts implicitly use Bayesian reasoning when evaluating evidence. DNA evidence is often expressed as a likelihood ratio: the probability of the observed match if the defendant is the source, divided by the probability of the match if an unrelated random person is the source. Forensic statisticians advocate for explicit Bayesian frameworks to prevent common errors like the prosecutor's fallacy, in which the rarity of a DNA profile is confused with the probability of innocence.
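The odds form of Bayes' theorem (posterior odds = prior odds × likelihood ratio) shows why a rare profile alone does not settle the question. The numbers below are hypothetical and chosen only for illustration: a random match probability of one in a million, and a prior that treats the defendant as one of ten million equally likely sources.

```python
def posterior_probability(prior_probability, likelihood_ratio):
    """Update a prior probability using the odds form of Bayes' theorem."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back to a probability
    return posterior_odds / (1 + posterior_odds)


# Hypothetical values, not drawn from any real case
random_match_probability = 1e-6
# Assumes the defendant's DNA would match with certainty if they are the source
likelihood_ratio = 1.0 / random_match_probability
prior = 1 / 10_000_000

p_source = posterior_probability(prior, likelihood_ratio)
print(f"P(defendant is the source | match) = {p_source:.3f}")
```

Under these assumptions the posterior is only about 9 percent, even though the match probability is one in a million. Reading "one in a million" as the probability of innocence is the prosecutor's fallacy; other evidence that raises the prior would change the result substantially.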

Search and Rescue

The U.S. Coast Guard uses Bayesian search theory to locate missing vessels. Prior probabilities are assigned to grid squares based on last known position, drift patterns, and weather data. Each unsuccessful search updates the probability map, concentrating subsequent efforts on the most likely remaining areas. The same approach guided the search that located the wreckage of Air France Flight 447 in 2011, nearly two years after the crash.
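The map update after an unsuccessful search is itself a direct application of Bayes' theorem. The sketch below is a simplified illustration, not a description of any operational system: the three-cell grid, its prior probabilities, and the 80 percent detection probability are all assumed values.

```python
def update_after_failed_search(probabilities, searched_cell, detection_probability):
    """Return the updated probability map after searching one cell without success."""
    p_searched = probabilities[searched_cell]
    # P(not found) = 1 - P(object in cell) * P(detect | object in cell)
    p_not_found = 1 - p_searched * detection_probability
    if p_not_found == 0:
        raise ValueError("a failed search is impossible under these inputs")
    updated = {}
    for cell, p in probabilities.items():
        if cell == searched_cell:
            # The object could still be here, but only if the search missed it
            updated[cell] = p * (1 - detection_probability) / p_not_found
        else:
            # Unsearched cells gain probability through renormalization
            updated[cell] = p / p_not_found
    return updated


# Hypothetical prior over three grid cells
grid = {"A": 0.5, "B": 0.3, "C": 0.2}
after = update_after_failed_search(grid, searched_cell="A", detection_probability=0.8)
print(after)
```

After the failed search of cell A, its probability drops from 0.5 to about 0.17 and cell B becomes the most likely location, so the next search effort would shift there.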

Choosing Priors: The Contentious Step

The prior probability is both Bayesian analysis's greatest strength and its most criticized element. Critics argue that priors inject subjectivity into scientific analysis. Supporters counter that all statistical methods embed assumptions — Bayesian methods simply make them explicit.

Informative priors — Based on previous research or expert knowledge. A prior for average human body temperature centers on 37°C because centuries of measurement support that value.
Weakly informative priors — Broad distributions that constrain parameters to physically plausible ranges without strongly favoring specific values.
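Non-informative (flat) priors — Assign equal probability to all parameter values, letting the data dominate. These can be mathematically problematic: a flat prior over an unbounded range does not integrate to one, and a prior that is flat in one parameterization is not flat in another.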
Conjugate priors — Mathematical convenience choices that produce posterior distributions in the same family as the prior, simplifying computation.
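The effect of the prior is easiest to see with a conjugate pair. For a coin's heads probability, a Beta prior combined with binomial data yields a Beta posterior whose parameters are the prior parameters plus the observed counts. The sketch below compares three priors on the same data; the specific Beta parameters and the data (7 heads, 3 tails) are assumptions chosen for illustration.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior plus binomial counts."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical priors on the probability of heads
priors = {
    "flat": (1, 1),                 # uniform on [0, 1]
    "weakly informative": (2, 2),   # gently favors values near 0.5
    "informative": (50, 50),        # strong belief that the coin is fair
}

heads, tails = 7, 3  # hypothetical data
posterior_means = {}
for name, (alpha, beta) in priors.items():
    post_alpha, post_beta = beta_binomial_update(alpha, beta, heads, tails)
    posterior_means[name] = beta_mean(post_alpha, post_beta)
    print(f"{name}: posterior Beta({post_alpha}, {post_beta}), "
          f"mean {posterior_means[name]:.3f}")
```

With only ten flips, the posterior mean ranges from about 0.67 under the flat prior to about 0.52 under the informative one. With more data, the three posteriors would converge, which is the usual argument that prior choice matters most when data are scarce.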

From Controversy to Consensus

For most of the twentieth century, Bayesian methods were marginalized in academic statistics. Computational limitations made Bayesian calculations intractable for complex problems. The adoption of Markov chain Monte Carlo (MCMC) algorithms by statisticians in the 1990s changed that. The underlying algorithms were older (the Metropolis algorithm dates to 1953), but cheaper computing made them practical for routine use. MCMC methods allow computers to approximate posterior distributions for models with thousands of parameters. Software packages like BUGS, Stan, and PyMC made Bayesian analysis accessible to researchers across disciplines.
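The core idea of MCMC fits in a short program. The sketch below is a bare random-walk Metropolis sampler for the heads probability of a coin under a flat prior; it is far simpler than what Stan or PyMC actually run, and the data (7 heads, 3 tails), step size, burn-in length, and seed are assumed values.

```python
import math
import random


def log_posterior(theta, heads, tails):
    """Unnormalized log posterior for a coin's heads probability, flat prior."""
    if not 0 < theta < 1:
        return float("-inf")  # zero posterior density outside (0, 1)
    return heads * math.log(theta) + tails * math.log(1 - theta)


def metropolis(heads, tails, n_samples=20000, step=0.2, seed=1):
    """Random-walk Metropolis sampler; returns the full chain of samples."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    samples = []
    for _ in range(n_samples):
        # Propose a nearby value with a symmetric Gaussian step
        proposal = theta + rng.gauss(0, step)
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


chain = metropolis(heads=7, tails=3)
kept = chain[2000:]  # discard an assumed burn-in period
estimate = sum(kept) / len(kept)
print(f"estimated posterior mean: {estimate:.3f}")
```

For this model the exact posterior is Beta(8, 4) with mean 8/12, so the estimate can be checked against the closed form. The point of MCMC is that the same loop works when no closed form exists, because it only needs the posterior up to a normalizing constant.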

Today the frequentist-Bayesian divide has softened considerably. Many statisticians use both approaches depending on the problem. Bayesian methods are widely used in machine learning, signal processing, and other fields where incorporating prior knowledge improves predictions. The framework Thomas Bayes outlined in the eighteenth century has become a standard language for reasoning under uncertainty in the twenty-first.