Bayes' Theorem: How to Update Beliefs With New Evidence

Bayes' theorem describes how to rationally update probability estimates when new evidence arrives. Learn the formula, its intuition, and its applications in medicine and AI.

The InfoNexus Editorial Team | May 17, 2026 | 9 min read

The Disease That Probably Isn't There

A medical test for a rare disease is 99% accurate — it correctly identifies 99% of infected people and correctly clears 99% of healthy people. You test positive. What's the probability you actually have the disease? Most people answer: 99%. The correct answer, if the disease affects 1 in 10,000 people, is approximately 1%. Bayes' theorem explains why — and why the gap between the intuitive answer and the correct one causes genuine harm in medical diagnosis, criminal justice, and everyday reasoning.

Thomas Bayes was an 18th-century English statistician and minister who never published his most important work. His paper "An Essay Towards Solving a Problem in the Doctrine of Chances" was submitted to the Royal Society posthumously by his friend Richard Price in 1763, two years after Bayes's death. Pierre-Simon Laplace independently derived and extensively developed the same result in 1774. The theorem bearing Bayes's name became one of the most contested ideas in mathematics — a century-long debate between frequentists and Bayesians about the fundamental nature of probability — and one of the most practically consequential tools in modern data science, machine learning, and medical statistics.

The Formula

Bayes' theorem provides a rule for calculating conditional probability — the probability of event A given that event B has occurred, written P(A|B):

P(A|B) = P(B|A) × P(A) / P(B)
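
The formula follows in two steps from the definition of conditional probability. A short derivation in LaTeX, assuming P(A) and P(B) are both nonzero so that each conditional probability is defined:

```latex
\begin{align*}
P(A \cap B) &= P(A \mid B)\,P(B) = P(B \mid A)\,P(A) \\
\Rightarrow\quad P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad P(B) > 0
\end{align*}
```

The first line writes the joint probability of A and B in two equivalent ways; dividing both sides by P(B) gives the theorem.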

In inference problems, the formula is usually written in terms of hypothesis H and evidence E:

P(H|E) = P(E|H) × P(H) / P(E)

Where:

  • P(H) is the prior probability — your probability that the hypothesis H is true before seeing evidence E
  • P(E|H) is the likelihood — the probability of observing evidence E if H were true
  • P(E) is the marginal likelihood — the total probability of observing E under all possible hypotheses
  • P(H|E) is the posterior probability — your updated probability that H is true after observing E

The theorem formalizes a simple idea: your belief after seeing evidence should be proportional to your prior belief times how well the hypothesis explains the evidence.
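
For a yes-or-no hypothesis, the four quantities above fit in a few lines of Python. This is a minimal sketch, not a library API: the function name `posterior` and its parameter names are chosen here for illustration, and the numbers in the usage line are hypothetical.

```python
def posterior(prior: float, likelihood: float, likelihood_if_false: float) -> float:
    """Return P(H|E) for a binary hypothesis H.

    prior               -- P(H)
    likelihood          -- P(E|H)
    likelihood_if_false -- P(E|not H)
    """
    # Marginal likelihood P(E), summed over both hypotheses
    # (law of total probability).
    evidence = likelihood * prior + likelihood_if_false * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("P(E) is zero; the posterior is undefined")
    return likelihood * prior / evidence


# Hypothetical numbers: a 30% prior, and evidence that is four times
# likelier if H is true (0.8) than if it is false (0.2).
print(round(posterior(prior=0.3, likelihood=0.8, likelihood_if_false=0.2), 3))  # 0.632
```

The evidence here favors H four to one, which lifts a 30% prior to roughly 63%: a substantial update, but well short of certainty.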

The Medical Test Example: Working Through It

Return to the disease test. Let D = has the disease, D̄ = doesn't, T = tests positive. Given:

  • P(D) = 0.0001 (1 in 10,000 prevalence)
  • P(T|D) = 0.99 (test sensitivity: correctly detects 99% of cases)
  • P(T|D̄) = 0.01 (false positive rate: incorrectly flags 1% of healthy people)

Compute P(D|T) — probability of disease given positive test — using Bayes:

P(T) = P(T|D) × P(D) + P(T|D̄) × P(D̄) = (0.99 × 0.0001) + (0.01 × 0.9999) = 0.000099 + 0.009999 = 0.010098

P(D|T) = P(T|D) × P(D) / P(T) = (0.99 × 0.0001) / 0.010098 = 0.000099 / 0.010098 ≈ 0.0098 ≈ 1%

Despite the test's 99% accuracy, a positive result means roughly a 1% chance of disease. Why? Because the disease is so rare that even a 1% false positive rate generates about 100 false alarms for every true positive: among 1,000,000 people tested, roughly 9,999 healthy people test positive against 99 infected ones. The prior (base rate) dominates when rare conditions are tested. This is not a flaw in the mathematics; it is the mathematics correctly accounting for rarity.
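
The same result can be checked by counting people rather than multiplying probabilities. The sketch below assumes a hypothetical population of 1,000,000 (any size gives the same ratio) and uses the prevalence and error rates from the example; the variable names are illustrative.

```python
population = 1_000_000           # assumed size, for illustration only
prevalence = 1 / 10_000          # P(D)
sensitivity = 0.99               # P(T|D)
false_positive_rate = 0.01       # P(T|not D)

infected = population * prevalence                 # about 100 people
healthy = population - infected                    # about 999,900 people
true_positives = infected * sensitivity            # about 99 people
false_positives = healthy * false_positive_rate    # about 9,999 people

# Of everyone who tests positive, what share is actually infected?
p_disease_given_positive = true_positives / (true_positives + false_positives)

print(f"true positives:  {true_positives:.0f}")
print(f"false positives: {false_positives:.0f}")
print(f"P(D|T) = {p_disease_given_positive:.4f}")  # 0.0098
```

Counting this way makes the base-rate effect visible: the few infected people who test positive are vastly outnumbered by healthy people who are wrongly flagged.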

Bayesian Updating: Sequential Inference

Bayes' theorem's deepest power is its recursive nature. Each posterior becomes the new prior for the next piece of evidence. This sequential updating formalizes learning as an accumulation of evidence.

Suppose you've taken the test and received a positive result, so your probability of disease is now about 1%. You then take a second test, also positive, whose errors are independent of the first. Apply Bayes again with the rounded posterior P(D) = 0.01 as the new prior:

P(D|T₂) = (0.99 × 0.01) / [(0.99 × 0.01) + (0.01 × 0.99)] = 0.0099 / 0.0198 = 0.5

Two positive tests on a 1-in-10,000 disease put you at 50-50. A third positive test would push the probability to ~99%. Each piece of evidence updates the probability rationally, with earlier uncertainty gradually resolving as evidence accumulates.
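
Sequential updating is a short loop: feed each posterior back in as the next prior. This sketch assumes the tests are conditionally independent given disease status and share the same 99% sensitivity and 1% false positive rate; the helper name `update` is illustrative. Because it carries the unrounded posterior forward, the second step gives 0.495 rather than the 0.5 obtained from the rounded prior of 0.01.

```python
def update(prior: float, sensitivity: float = 0.99,
           false_positive_rate: float = 0.01) -> float:
    """One Bayesian update of P(D) after a positive test result."""
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


belief = 1 / 10_000   # starting prior: disease prevalence
history = []
for test_number in range(1, 4):
    belief = update(belief)   # the posterior becomes the next prior
    history.append(belief)
    print(f"after positive test {test_number}: P(D) = {belief:.4f}")

# after positive test 1: P(D) = 0.0098
# after positive test 2: P(D) = 0.4950
# after positive test 3: P(D) = 0.9898
```

If the tests shared a common cause of error (for example, the same lab contamination), the independence assumption would fail and the probability would rise more slowly than shown.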

Applications Across Disciplines

| Field | Application | Prior | Likelihood | Posterior |
| --- | --- | --- | --- | --- |
| Medicine | Diagnostic testing | Disease prevalence | Test sensitivity/specificity | Probability of disease given test result |
| Spam filtering | Email classification | Proportion of spam in email | Word frequencies in spam vs. legitimate email | Probability this email is spam |
| Finance | Risk assessment | Historical default rates | Company-specific signals | Updated default probability |
| Criminal justice | Forensic evidence | Base rate of guilt given prior evidence | Probability of DNA match if guilty vs. innocent | Probability of guilt given DNA evidence |
| Cosmology | Parameter estimation | Theoretical prior on physical constants | Probability of observed CMB data given parameters | Best-fit cosmological parameters |
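
To make the spam-filtering case concrete, here is a toy naive Bayes classifier. Everything in it is an assumption for illustration: the word probabilities are invented, the 50% spam prior is arbitrary, and words are treated as independent given the class, which real email only approximates.

```python
import math

# Hypothetical per-word probabilities, invented for illustration only.
P_WORD_GIVEN_SPAM = {"free": 0.30, "winner": 0.10, "meeting": 0.01}
P_WORD_GIVEN_HAM = {"free": 0.03, "winner": 0.001, "meeting": 0.10}
P_SPAM = 0.5  # assumed prior share of spam


def spam_probability(words):
    """Return P(spam | words) under the naive independence assumption."""
    # Work in log space so that long emails do not underflow to zero.
    log_spam = math.log(P_SPAM)
    log_ham = math.log(1.0 - P_SPAM)
    for word in words:
        if word in P_WORD_GIVEN_SPAM:  # skip words with no estimate
            log_spam += math.log(P_WORD_GIVEN_SPAM[word])
            log_ham += math.log(P_WORD_GIVEN_HAM[word])
    # Normalize: spam / (spam + ham), rewritten in terms of the log difference.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


print(f"{spam_probability(['free', 'winner']):.3f}")  # 0.999
print(f"{spam_probability(['meeting']):.3f}")         # 0.091
```

Each word acts as one piece of evidence, multiplying the prior odds by its likelihood ratio, which is the same posterior-becomes-prior pattern applied once per word.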

The Prosecutor's Fallacy: Bayes in Court

Misapplication of conditional probability in legal reasoning is common enough to have a name: the prosecutor's fallacy. It conflates P(evidence|innocent) with P(innocent|evidence). A forensic expert testifies that a DNA match has a 1-in-1-million probability of occurring by chance. This is P(match|innocent). The jury hears: "there's only a 1-in-1-million chance the defendant is innocent." That would be P(innocent|match), an entirely different quantity that depends on the prior probability of guilt, which in turn depends on all other evidence in the case.

The distinction matters enormously. In a city of 1 million people, a 1-in-1-million false match rate means roughly one other person in the city could have matched by chance. If, before the DNA evidence, every resident were considered equally likely to be the source, the posterior probability of guilt from the DNA evidence alone would be around 50%, far from certain. The UK case of Sally Clark, convicted of murdering her two children in 1999 based partly on a pediatrician's testimony that the probability of two natural infant deaths in the same family was 1 in 73 million (itself a statistical error), illustrates the grave consequences of ignoring base rates and Bayesian reasoning. Clark's conviction was overturned in 2003.
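
The gap between the two conditional probabilities can be computed directly. This sketch assumes no evidence other than the DNA match, so every one of the city's 1,000,000 residents starts with the same chance of being the source, and it assumes the true source always matches. Both are simplifications for illustration.

```python
population = 1_000_000
p_match_if_innocent = 1 / 1_000_000   # the figure quoted in testimony
p_match_if_guilty = 1.0               # assumption: the true source always matches

# Assumption: with no other evidence, each resident is equally likely
# to be the source before the DNA result is known.
prior_guilt = 1 / population

numerator = p_match_if_guilty * prior_guilt
evidence = numerator + p_match_if_innocent * (1 - prior_guilt)
p_guilt_given_match = numerator / evidence
p_innocent_given_match = 1 - p_guilt_given_match

print(f"P(match|innocent) = {p_match_if_innocent:.6f}")    # 0.000001
print(f"P(innocent|match) = {p_innocent_given_match:.3f}")  # 0.500
```

The two quantities differ by a factor of about half a million. Other evidence in the case would change the prior and therefore the result, which is exactly why the match probability cannot be read as a probability of innocence.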

Bayesian vs. Frequentist Statistics

The Bayesian approach treats probability as a degree of belief, updated by evidence. The frequentist approach treats probability as the long-run frequency of events in repeated experiments and, because a hypothesis is either true or false rather than a repeatable event, does not assign probabilities to hypotheses. This philosophical difference produces real methodological differences:

  • Frequentist p-values ask: "If the null hypothesis were true, how often would data this extreme occur?" They cannot say how likely the hypothesis is.
  • Bayesian posteriors ask: "Given the data, how probable is each hypothesis?" They directly answer the question scientists usually want answered.
  • Bayesian inference requires specifying a prior — a choice frequentists criticize as subjective. Bayesians respond that all statistical analysis embeds assumptions; at least Bayesian priors are explicit.
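
A small coin-flip example shows the two questions side by side. The data (60 heads in 100 flips) are hypothetical, and the Bayesian half rests on two loudly stated assumptions: the only alternative to a fair coin is one with a 60% heads rate, and both hypotheses start with equal prior probability.

```python
from math import comb

flips, heads = 100, 60   # hypothetical data

# Frequentist: if the coin is fair, how often would we see 60 or more heads?
p_value = sum(comb(flips, k) for k in range(heads, flips + 1)) / 2**flips


def binomial_likelihood(p: float) -> float:
    """P(exactly `heads` heads in `flips` flips | heads probability p)."""
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)


# Bayesian: given the data, how probable is the fair-coin hypothesis?
prior_fair = 0.5                        # assumed equal priors
like_fair = binomial_likelihood(0.5)
like_biased = binomial_likelihood(0.6)  # assumed single alternative
posterior_fair = (like_fair * prior_fair) / (
    like_fair * prior_fair + like_biased * (1 - prior_fair)
)

print(f"one-sided p-value:   {p_value:.3f}")
print(f"P(fair coin | data): {posterior_fair:.3f}")
```

Under these assumptions the p-value comes out near 0.03 while the posterior probability that the coin is fair is near 0.12. The two numbers answer different questions, and the posterior would shift with a different prior or a different alternative hypothesis.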

In practice, both frameworks are used depending on context. Clinical trials typically use frequentist hypothesis tests because of regulatory conventions. Machine learning, scientific parameter estimation, and spam filtering commonly use Bayesian approaches because they naturally incorporate prior information and update sequentially. The theorem Bayes never published has become the mathematical lingua franca of rational belief revision: a framework for thinking clearly about uncertainty in any domain where evidence arrives incrementally and decisions must be made before certainty is possible.
