How Bayes' Theorem Works and Why It Changes How You Think About Evidence

Bayes' theorem describes how to update beliefs when new evidence arrives. Learn the math behind it, why human intuition gets it wrong, and why it underlies modern AI and medicine.

The InfoNexus Editorial Team · May 10, 2026 · 9 min read

Updating What You Believe

Most of us were taught to think about probability as a fixed property of the world — the probability that a fair coin lands heads is exactly 50%, full stop. Bayesian probability offers a different interpretation: probability represents a degree of belief, and beliefs should be updated rationally as new evidence arrives. The mathematical engine for this updating is Bayes' theorem, named after the Reverend Thomas Bayes, an 18th-century English statistician and minister whose work was published posthumously in 1763.

At its core, Bayes' theorem tells you how to revise a probability estimate when you learn something new. It connects three quantities: the probability you assigned to a hypothesis before seeing the evidence (the prior), the probability that you would see that evidence if the hypothesis were true (the likelihood), and the probability you should assign to the hypothesis after seeing the evidence (the posterior). The theorem is a precise recipe for rational belief revision.

The Formula and What It Means

The formal statement of Bayes' theorem is: P(H|E) = P(E|H) x P(H) / P(E). Translated: the probability of a hypothesis H given evidence E equals the probability of E given H, multiplied by the prior probability of H, divided by the overall probability of E. Each term has a name and a meaning:

  • P(H|E) — the posterior: what you should believe about the hypothesis after seeing the evidence
  • P(H) — the prior: what you believed about the hypothesis before seeing the evidence
  • P(E|H) — the likelihood: how probable the evidence would be if the hypothesis were true
  • P(E) — the marginal probability of the evidence: how probable the evidence is overall, across all possible hypotheses
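The formula is not an extra assumption. It follows in one line from the definition of conditional probability: write the joint probability of H and E in two ways, then divide by P(E), which must be nonzero.

```latex
P(H \cap E) = P(H \mid E)\,P(E) = P(E \mid H)\,P(H)
\;\;\Longrightarrow\;\;
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}, \qquad P(E) > 0
```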

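The denominator P(E) acts as a normalizing constant: it ensures that the posterior probabilities of all the competing hypotheses sum to one. In practice, it is usually computed with the law of total probability, by summing P(E|H) x P(H) over a set of mutually exclusive and exhaustive hypotheses.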
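A minimal Python sketch of this recipe for a discrete set of hypotheses follows. The function name `bayes_update` and the three hypotheses with their probabilities are hypothetical, chosen only to show the mechanics. The code assumes the hypotheses are mutually exclusive and exhaustive, so the priors sum to one.

```python
def bayes_update(priors, likelihoods):
    """Return posterior probabilities for a set of competing hypotheses.

    priors: dict mapping each hypothesis to P(H); assumed mutually exclusive
            and exhaustive, so the values sum to 1.
    likelihoods: dict mapping each hypothesis to P(E | H) for the observed evidence.
    """
    # Numerators of Bayes' theorem: P(E | H) * P(H) for each hypothesis.
    joint = {h: likelihoods[h] * p for h, p in priors.items()}
    # Marginal probability of the evidence, P(E), by the law of total probability.
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    # Dividing by P(E) normalizes the posteriors so they sum to one.
    return {h: j / evidence for h, j in joint.items()}


# Hypothetical example: three candidate explanations for one observation.
priors = {"A": 0.5, "B": 0.3, "C": 0.2}
likelihoods = {"A": 0.1, "B": 0.4, "C": 0.8}
posterior = bayes_update(priors, likelihoods)
print({h: round(p, 3) for h, p in posterior.items()})
# {'A': 0.152, 'B': 0.364, 'C': 0.485}
```

Hypothesis C ends up most probable even though A started with the highest prior, because C predicts the evidence eight times better than A does.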

A Medical Example: Why Intuition Fails

The power of Bayes' theorem becomes clear — and the failure of intuition most obvious — in medical testing scenarios. Suppose a disease affects 1 in 1,000 people in the general population. A test for this disease is 99% accurate: it correctly identifies 99% of people who have the disease, and correctly reports negative for 99% of healthy people (1% false positive rate). You test positive. What is the probability you actually have the disease?

Most people guess something close to 99%; after all, the test is 99% accurate. The Bayesian calculation tells a very different story. Imagine testing 100,000 people. On average, 100 of them have the disease, and the test correctly identifies 99 of those. Of the 99,900 healthy people, 1% (999) falsely test positive. So among all 1,098 positive results (99 + 999), only 99 are true positives, and the probability of actually having the disease given a positive test is 99/1,098, about 9%. The intuitive answer of 99% is a textbook case of base rate neglect: people systematically underweight or ignore the prior probability (here, 1 in 1,000) when interpreting evidence. Bayes' theorem forces you to account for it.
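The same calculation can be checked in a few lines of Python. This sketch uses exact fractions so that the natural-frequency count and the direct formula can be compared without rounding error; the variable names are illustrative.

```python
from fractions import Fraction

# Numbers from the medical example.
prevalence = Fraction(1, 1000)          # P(disease): the prior
sensitivity = Fraction(99, 100)         # P(positive | disease)
false_positive_rate = Fraction(1, 100)  # P(positive | healthy)

# Natural-frequency version: imagine testing 100,000 people.
population = 100_000
sick = population * prevalence                               # 100
true_positives = sick * sensitivity                          # 99
false_positives = (population - sick) * false_positive_rate  # 999
freq_answer = true_positives / (true_positives + false_positives)

# Direct application of Bayes' theorem, with P(positive) from total probability.
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
bayes_answer = sensitivity * prevalence / p_positive

print(freq_answer, f"{float(bayes_answer):.1%}")  # 11/122 9.0%
```

Both routes give exactly 99/1,098, which reduces to 11/122.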

The Role of Priors

The prior probability — what you believed before seeing the evidence — is where Bayesian reasoning gets philosophically interesting and sometimes controversial. In the medical example, the prior is the disease's prevalence in the population. But what is the appropriate prior when reasoning about a novel scientific hypothesis, a criminal suspect's guilt, or a new business idea?

Bayesians argue that any rational reasoner should be able to specify a prior: it simply represents their initial state of uncertainty, which may be informed by background knowledge, past experience, or theoretical considerations. Skeptics worry that priors can be arbitrary or manipulated, that different priors can lead to dramatically different posteriors even from the same evidence, and that the choice of prior imports subjective judgments into what should be objective inference. This debate between Bayesian and frequentist statistics, often framed (somewhat simplistically) as subjective versus objective, has shaped the philosophy of science and the practice of statistics for decades.
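To see how much the prior matters, the sketch below reuses the test from the medical example (99% sensitivity, 1% false positive rate) and varies only the prior. The two larger priors are hypothetical; a higher prior might apply, for instance, to a patient who already shows symptoms rather than a randomly screened member of the public.

```python
def posterior_given_positive(prior, sensitivity=0.99, false_positive_rate=0.01):
    """P(disease | positive test) for a given prior, by Bayes' theorem."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


# Same evidence (one positive result), three different priors.
for prior in (0.001, 0.01, 0.1):
    print(f"prior {prior:>5}: posterior {posterior_given_positive(prior):.1%}")
```

With identical evidence, the posterior moves from roughly 9% to 50% to about 92%.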

Sequential Updating

One of Bayes' theorem's most elegant features is that it applies sequentially: today's posterior becomes tomorrow's prior. Each new piece of evidence updates your beliefs incrementally, and the order in which you receive the evidence does not matter. Provided each update conditions on what you have already observed, you end up in the same place whether you learn A then B or B then A. This makes Bayesian reasoning well suited to settings where evidence accumulates gradually: scientific research, medical diagnosis over multiple tests, intelligence analysis, and machine learning.
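A small sketch can check the order-independence claim. It assumes two hypothetical tests whose results are conditionally independent given the true disease status (their error rates are invented for illustration), and it uses exact fractions so the comparison is not muddied by floating-point rounding.

```python
from fractions import Fraction


def update(prior, p_e_given_h, p_e_given_not_h):
    """One Bayesian update for a yes/no hypothesis; returns P(H | E)."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1 - prior))


# Hypothetical evidence: positive results from two different tests,
# assumed conditionally independent given disease status.
prior = Fraction(1, 1000)
test_a = (Fraction(99, 100), Fraction(1, 100))  # (P(+ | H), P(+ | not H))
test_b = (Fraction(90, 100), Fraction(5, 100))

a_then_b = update(update(prior, *test_a), *test_b)
b_then_a = update(update(prior, *test_b), *test_a)

# Batch update: under conditional independence the joint likelihood is a product.
batch = update(prior, test_a[0] * test_b[0], test_a[1] * test_b[1])

print(a_then_b == b_then_a == batch, f"{float(a_then_b):.1%}")
```

All three routes give the same answer, 66/103 (about 64%): a positive result on a second, independent test lifts the probability far above the 9% left by the first.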

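This sequential property also captures something important about how rational belief revision should work. Two Bayesian reasoners who start with very different priors will tend to converge on the same posterior as they accumulate sufficient evidence, as long as both update rationally on the same data, the evidence can actually distinguish between the hypotheses, and neither assigns zero prior probability to a hypothesis the other considers possible. (A hypothesis given a prior of exactly zero can never be revived by any evidence, because its posterior is always zero.) Evidence, applied consistently through Bayes' theorem, eventually overwhelms differences in starting assumptions. This offers one explanation of how scientific communities with different theoretical priors can reach consensus through accumulated experimental evidence.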
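Convergence can be illustrated with a coin whose heads probability is unknown. The sketch assumes a Beta-Binomial model, a standard textbook choice not mentioned above: a Beta(a, b) prior updated on coin flips gives a Beta(a + heads, b + tails) posterior with a simple closed-form mean. The two priors and the 60%-heads data are hypothetical. Both priors put nonzero density on every heads probability between 0 and 1, which is what allows the data to pull them together.

```python
def beta_posterior_mean(a, b, heads, tails):
    """Posterior mean of a coin's heads probability under a Beta(a, b) prior.

    The Beta prior is conjugate to coin-flip data, so the posterior is
    Beta(a + heads, b + tails) and its mean has a closed form.
    """
    return (a + heads) / (a + b + heads + tails)


# Two hypothetical reasoners with very different priors about the same coin.
optimist = (20, 2)  # prior mean ~0.91: strongly expects heads
skeptic = (2, 20)   # prior mean ~0.09: strongly expects tails

# Both see the same data (60% heads) in growing amounts.
for n in (0, 10, 100, 1000, 10000):
    heads = 6 * n // 10  # integer arithmetic avoids float rounding
    tails = n - heads
    m1 = beta_posterior_mean(*optimist, heads, tails)
    m2 = beta_posterior_mean(*skeptic, heads, tails)
    print(f"n={n:>5}: optimist {m1:.3f}, skeptic {m2:.3f}, gap {m1 - m2:.3f}")
```

In this model the gap between the two estimates is exactly 18 / (22 + n), so it shrinks steadily toward zero as the number of flips n grows, while both estimates approach 0.6.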

Bayesian Methods in Science and AI

Bayes' theorem is not just a philosophical tool; it has become a core method in statistics, machine learning, and scientific inference. In clinical trials, Bayesian adaptive designs let researchers update their estimates of a treatment's effectiveness as data accumulates and stop a trial early if the evidence becomes decisive, potentially saving time and sparing patients from continued exposure to an inferior treatment. Bayesian networks, graphical models that represent probabilistic relationships between variables, are used in medical diagnosis, fault detection, natural language processing, and many other applications.
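As an illustration of how a Bayesian network answers a diagnostic question, the sketch below uses the classic textbook rain/sprinkler/wet-grass network with made-up probabilities and computes posteriors by brute-force enumeration of the joint distribution. Real systems use dedicated libraries and more efficient inference algorithms; nothing here reflects any particular library's API.

```python
from itertools import product

# Hypothetical two-cause network: Rain -> WetGrass <- Sprinkler.
# Rain and Sprinkler are independent root nodes; WetGrass depends on both.
P_RAIN = 0.2
P_SPRINKLER = 0.3
# P(WetGrass = True | Rain, Sprinkler), keyed by (rain, sprinkler).
P_WET = {
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.80,
    (False, False): 0.01,
}


def joint(rain, sprinkler, wet):
    """Joint probability as the product of each node's conditional probability."""
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    p_wet = P_WET[(rain, sprinkler)]
    return p * (p_wet if wet else 1 - p_wet)


def query_rain(wet, sprinkler=None):
    """P(Rain = True | WetGrass = wet [, Sprinkler = sprinkler]) by enumeration."""
    num = den = 0.0
    for r, s in product((True, False), repeat=2):
        if sprinkler is not None and s != sprinkler:
            continue  # skip worlds inconsistent with the observed evidence
        p = joint(r, s, wet)
        den += p
        if r:
            num += p
    return num / den


print(f"P(rain | wet grass)            = {query_rain(True):.3f}")
print(f"P(rain | wet grass, sprinkler) = {query_rain(True, sprinkler=True):.3f}")
```

Seeing wet grass raises the probability of rain from the prior 0.2 to about 0.48, but additionally learning that the sprinkler was on drops it to about 0.24, because the sprinkler already explains the wet grass. This pattern is known as explaining away.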

Modern machine learning borrows heavily from Bayesian ideas, even when it is not explicitly Bayesian. Training a neural network adjusts its internal parameters in light of evidence (the training data), which parallels Bayesian updating, although standard training produces a single best-fit set of parameters rather than a full posterior distribution over them. Probabilistic programming languages like Stan, PyMC (formerly PyMC3), and Pyro make it possible to build explicit Bayesian models of complex data-generating processes, with priors specified for every uncertain quantity and posteriors computed by numerical algorithms such as Markov chain Monte Carlo sampling and variational inference.
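Probabilistic programming languages rely on samplers to approximate posteriors that have no closed form; Stan and PyMC, for example, default to the No-U-Turn Sampler, a Hamiltonian Monte Carlo variant. The plain-Python sketch below swaps in the simplest member of that family, random-walk Metropolis, and does not use any of those libraries. It estimates the posterior mean of a coin's heads probability after a hypothetical 7 heads and 3 tails under a uniform prior, a case where the exact answer (a Beta(8, 4) posterior with mean 2/3) is known and serves as a check.

```python
import math
import random


def log_posterior(theta, heads, tails):
    """Unnormalized log posterior for a coin's heads probability theta.

    Assumes a uniform prior on (0, 1), so the posterior is proportional to the
    likelihood theta**heads * (1 - theta)**tails. P(E) is never needed.
    """
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero prior probability outside (0, 1)
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def metropolis(heads, tails, n_samples=50_000, step=0.1, burn_in=1_000, seed=0):
    """Random-walk Metropolis sampler: draws approximately from the posterior."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio); exp(-inf) is 0.0.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples


samples = metropolis(heads=7, tails=3)
estimate = sum(samples) / len(samples)
exact = (7 + 1) / (7 + 3 + 2)  # mean of the Beta(8, 4) posterior
print(f"MCMC estimate {estimate:.3f} vs exact {exact:.3f}")
```

The sampler only ever needs the ratio of posterior densities, so the hard-to-compute P(E) cancels out; this is a main reason MCMC methods are practical for complex models.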

Bayesian Thinking in Everyday Life

You do not need to do any explicit math to benefit from Bayesian thinking. The key insight is that evidence should shift your beliefs according to how well it discriminates between hypotheses, that is, how much more probable it is if one hypothesis is true than if another is. If something was already very likely, weak evidence for it should not dramatically increase your confidence. If something was very unlikely, even strong evidence may not make it probable. When evaluating a claim, ask: what would the world look like if this were true, and does what I see match that? If an alternative explanation predicts the same evidence equally well, seeing that evidence should not favor one explanation over another.
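The odds form of Bayes' theorem makes this intuition concrete: posterior odds equal prior odds multiplied by the likelihood ratio, P(E|H) divided by P(E|alternative). The sketch below is a minimal illustration; it assumes H and a single alternative are the only two possibilities, and the function name and example numbers are hypothetical.

```python
def update_odds(prior_prob, likelihood_ratio):
    """Odds form of Bayes' theorem: posterior odds = prior odds * likelihood ratio.

    likelihood_ratio is P(E | H) / P(E | alternative). Assumes H and the
    alternative are exhaustive, and 0 < prior_prob < 1.
    """
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)  # convert odds back to probability


print(f"{update_odds(0.30, 1.0):.3f}")    # 0.300: both explanations predict E equally
print(f"{update_odds(0.95, 2.0):.3f}")    # 0.974: weak evidence for a likely claim
print(f"{update_odds(0.001, 50.0):.3f}")  # 0.048: strong evidence for an unlikely claim
```

A likelihood ratio of 1 leaves the belief unchanged, and even evidence 50 times more probable under H leaves a 1-in-1,000 claim below 5%.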

This kind of reasoning helps guard against several common cognitive errors: confirmation bias (seeking only evidence that supports your prior beliefs), base rate neglect (ignoring how unlikely something was before you saw the evidence), and the prosecutor's fallacy (confusing P(evidence | not guilty) with P(not guilty | evidence)). Bayes' theorem does not make these errors impossible — humans are not calculating machines — but understanding it gives you a framework for recognizing and correcting them.
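The prosecutor's fallacy is easy to demonstrate with numbers. The scenario below is entirely hypothetical: a forensic match that occurs by chance in 1 of every 1,000,000 innocent people, a culprit who always matches, a pool of 10,000,000 possible sources, and no other evidence, so each person starts with the same prior probability of guilt.

```python
# Hypothetical scenario (illustrative numbers, not real case data).
population = 10_000_000
p_match_given_innocent = 1 / 1_000_000  # chance of a coincidental match
p_match_given_guilty = 1.0              # the culprit always matches

# With no other evidence, everyone in the pool is equally likely to be guilty.
prior_guilty = 1 / population

# Bayes' theorem, with P(match) from the law of total probability.
p_match = p_match_given_guilty * prior_guilty + p_match_given_innocent * (1 - prior_guilty)
p_innocent_given_match = p_match_given_innocent * (1 - prior_guilty) / p_match

print(f"P(match | innocent) = {p_match_given_innocent:.6f}")
print(f"P(innocent | match) = {p_innocent_given_match:.3f}")
```

A one-in-a-million chance of a coincidental match sounds damning, yet with roughly ten innocent matches expected in the pool alongside the one guilty person, a match alone leaves the probability of innocence at about 10/11, or 91%.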
