Bayesian Statistics: Updating Beliefs with Evidence

Probability as a State of Knowledge

A drug company tests a new compound. The trial shows a positive result. How confident should regulators be that the drug actually works? Classical frequentist statistics gives one answer. Bayesian statistics gives a different, arguably more intuitive one — by explicitly incorporating prior knowledge about how often drug candidates actually work, and updating that knowledge with the trial evidence.

Bayesian statistics treats probability not as a long-run frequency of events but as a quantified degree of belief — a measure of how confident a reasoning agent is that something is true, given available information. This philosophical difference has profound practical consequences. It allows statisticians to make direct probability statements about hypotheses ("there is a 94% probability the drug is effective"), include prior information systematically, and update beliefs incrementally as new data arrives — in a framework grounded in mathematics rather than informal judgment.

The Bayesian Framework

The entire framework rests on one equation — Bayes' theorem — applied to the relationship between data and hypotheses:

P(θ | data) = P(data | θ) × P(θ) / P(data)

Where:

P(θ): Prior probability — beliefs about parameter θ before observing data; encodes existing knowledge or uncertainty
P(data | θ): Likelihood — the probability of observing the data given parameter θ; comes from the statistical model
P(data): Marginal likelihood (evidence) — a normalizing constant ensuring the posterior integrates to 1
P(θ | data): Posterior probability — updated beliefs about θ after incorporating the observed data; the goal of Bayesian analysis

The posterior combines prior and likelihood multiplicatively. A likelihood sharply peaked by abundant data can overwhelm a diffuse prior; a strong prior can moderate the pull of a small or noisy sample. This balance mirrors how scientific knowledge accumulates: as data accumulate, posteriors concentrate around the true parameter value largely regardless of the initial prior, provided the prior assigns nonzero probability near that value and standard regularity conditions hold.
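As a rough illustration of the update, the sketch below applies Bayes' theorem on a discrete grid of candidate θ values. The binomial model, the flat prior, and the trial counts (7 of 10 and 70 of 100 responders) are assumptions chosen for the example, and the helper names are hypothetical.

```python
from math import comb

def grid_posterior(successes, trials, grid, prior):
    """Posterior over a discrete grid of theta values via Bayes' theorem."""
    failures = trials - successes
    # Likelihood P(data | theta) under a binomial model, one value per grid point
    likelihood = [comb(trials, successes) * t**successes * (1 - t)**failures
                  for t in grid]
    # Numerator of Bayes' theorem: likelihood times prior
    unnormalized = [lik * p for lik, p in zip(likelihood, prior)]
    # P(data): the sum over the grid stands in for the integral over theta
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]

grid = [i / 100 for i in range(101)]         # candidate theta values 0.00 to 1.00
flat_prior = [1 / len(grid)] * len(grid)     # equal weight on every grid point

small = grid_posterior(7, 10, grid, flat_prior)    # 7 responders out of 10
large = grid_posterior(70, 100, grid, flat_prior)  # same rate, ten times the data

def mass_between(posterior, lo, hi):
    """Posterior probability that theta lies in [lo, hi]."""
    return sum(p for t, p in zip(grid, posterior) if lo <= t <= hi)

print(mass_between(small, 0.6, 0.8))
print(mass_between(large, 0.6, 0.8))
```

With the larger sample, far more posterior mass falls near 0.7, which is the concentration effect described above. A grid only works for one or two parameters; it is used here because it keeps every term of the theorem visible.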

Choosing Priors

The prior distribution is simultaneously the most powerful and most controversial aspect of Bayesian analysis. Critics argue it introduces subjectivity; proponents argue it provides a principled mechanism for incorporating genuine prior knowledge and that all statistical analyses make assumptions, whether explicit or not.

Prior Type	Description	When to Use
Informative prior	Reflects genuine prior knowledge (e.g., from previous studies)	When reliable prior information exists
Weakly informative prior	Provides some regularization without strongly constraining posterior	Default choice in many practical analyses
Non-informative (flat) prior	Assigns roughly equal probability to all parameter values	When prior knowledge is genuinely absent
Conjugate prior	From a family where prior × likelihood yields posterior in same family	Analytical convenience; enables closed-form solutions
Jeffreys prior	Invariant to reparameterization; based on Fisher information	Objective Bayesian analysis

Conjugate priors simplify computation dramatically. For a binomial likelihood (counts of successes), the Beta distribution is the conjugate prior, yielding a Beta posterior: a Beta(α, β) prior updated with s successes and f failures becomes Beta(α + s, β + f). For a normal likelihood with known variance, a normal prior on the mean is conjugate. These analytically tractable cases allow exact Bayesian inference without numerical methods, though they apply only to a limited set of models.
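The Beta-binomial case can be sketched in a few lines. The Beta(2, 2) prior and the counts below are assumed values for illustration, and the function name is hypothetical.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data gives a Beta posterior."""
    return alpha + successes, beta + failures

# Assumed weakly informative prior Beta(2, 2), then 7 successes and 3 failures
alpha_post, beta_post = beta_binomial_update(2, 2, successes=7, failures=3)
posterior_mean = alpha_post / (alpha_post + beta_post)
print(alpha_post, beta_post)   # 9 5

# Updating in two batches reaches the same posterior as one combined update
step1 = beta_binomial_update(2, 2, successes=4, failures=1)
step2 = beta_binomial_update(*step1, successes=3, failures=2)
print(step2 == (alpha_post, beta_post))   # True
```

The second half shows why conjugacy suits incremental updating: yesterday's posterior serves as today's prior, and the order in which the data arrive does not matter.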

Bayesian vs. Frequentist Inference

The two schools of statistical thought produce different answers to similar questions, and the differences matter in practice:

Concept	Frequentist	Bayesian
Probability	Long-run frequency of repeatable events	Degree of belief; applies to unique events
Parameter status	Fixed unknown constant	Random variable with a distribution
Uncertainty interval	Confidence interval: 95% of such intervals contain the true parameter	Credible interval: 95% posterior probability that parameter lies in interval
Hypothesis testing	p-value: probability of data at least this extreme under H₀	Bayes factor: ratio of the probability of the data under H₁ to that under H₀
Prior information	Not formally incorporated	Explicitly incorporated via prior distribution

The frequentist confidence interval is routinely misinterpreted as a Bayesian credible interval. A 95% confidence interval does not mean there is a 95% probability the true parameter lies in that specific interval — it means the procedure, if repeated many times, would produce intervals containing the true parameter in 95% of cases. The Bayesian credible interval is the direct probability statement most users want: given the data and prior, there is a 95% posterior probability the parameter lies in this range.
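A credible interval can be read directly off the posterior. The sketch below assumes a hypothetical Beta(9, 5) posterior (for example, a Beta(2, 2) prior updated with 7 successes and 3 failures) and estimates an equal-tailed 95% interval from random draws; the helper name and the seed are illustrative choices.

```python
import random

def equal_tailed_interval(samples, level=0.95):
    """Empirical equal-tailed interval: cut (1 - level) / 2 from each tail."""
    ordered = sorted(samples)
    tail = (1 - level) / 2
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1 - tail) * len(ordered)) - 1]
    return lower, upper

rng = random.Random(42)                       # fixed seed for reproducibility
draws = [rng.betavariate(9, 5) for _ in range(100_000)]
lower, upper = equal_tailed_interval(draws)
print(f"95% credible interval: ({lower:.3f}, {upper:.3f})")
```

By construction, about 95% of the posterior draws fall between `lower` and `upper`, which is exactly the direct probability statement about the parameter that a confidence interval does not provide.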

Markov Chain Monte Carlo: Making Bayesian Analysis Practical

For all but the simplest models, computing the posterior analytically is impossible. The marginal likelihood P(data) — the normalizing constant — requires integrating the likelihood over all possible parameter values, often a high-dimensional integral with no closed form.

Markov Chain Monte Carlo (MCMC) methods solve this computationally. Rather than computing the posterior distribution analytically, MCMC generates samples from it by constructing a Markov chain that has the posterior as its stationary distribution. As the chain runs, samples accumulate from the posterior, enabling estimation of any posterior quantity (means, credible intervals, predictive distributions) from the empirical distribution of samples.

Metropolis-Hastings algorithm (Metropolis et al., 1953; generalized by Hastings, 1970): The foundational MCMC method; proposes moves through parameter space and accepts or rejects each move probabilistically to ensure samples come from the correct target distribution
Gibbs sampling: A special case applicable when full conditional distributions (the distribution of each parameter given all others and the data) are available in closed form; samples each parameter in turn
Hamiltonian Monte Carlo (HMC): Uses the gradient of the log posterior to simulate Hamiltonian dynamics (borrowed from physics), proposing distant moves that are still likely to be accepted and dramatically reducing autocorrelation in chains; implemented in Stan and PyMC
No-U-Turn Sampler (NUTS): An adaptive extension of HMC that automatically tunes simulation length; the default sampler in Stan
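To make the accept/reject step concrete, here is a minimal random-walk Metropolis sketch (the symmetric-proposal special case of Metropolis-Hastings). It targets an assumed Beta-binomial posterior so the answer can be checked against the known posterior mean; the step size, burn-in length, and function names are illustrative choices rather than recommendations.

```python
import math
import random

def log_unnormalized_posterior(theta, successes=7, failures=3, alpha=2.0, beta=2.0):
    """log(likelihood * prior) up to a constant; P(data) is never needed."""
    if not 0.0 < theta < 1.0:
        return -math.inf                      # zero posterior density outside (0, 1)
    return ((successes + alpha - 1) * math.log(theta)
            + (failures + beta - 1) * math.log(1 - theta))

def metropolis(log_target, start, n_samples, step=0.2, burn_in=1000, seed=0):
    rng = random.Random(seed)
    current, current_lp = start, log_target(start)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = current + rng.gauss(0.0, step)    # symmetric random-walk proposal
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current))
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:                             # discard the warm-up draws
            samples.append(current)
    return samples

samples = metropolis(log_unnormalized_posterior, start=0.5, n_samples=20_000)
print(sum(samples) / len(samples))   # should be close to the exact mean 9/14
```

The sampler only ever evaluates ratios of the unnormalized posterior, which is why the intractable normalizing constant drops out. Samplers such as HMC and NUTS replace the blind random-walk proposal with gradient-guided moves, so in practice a library like Stan or PyMC is the usual choice.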

Applications Across Disciplines

Bayesian methods now permeate scientific practice across fields:

Medical diagnosis: Bayesian reasoning is essential for interpreting diagnostic tests; a test's sensitivity and specificity combine with disease prevalence (the prior) via Bayes' theorem to give the posterior probability of disease given a positive test result
Clinical trials: Bayesian adaptive trial designs adjust sample allocation based on accumulating evidence, potentially reducing trial size and exposing fewer patients to inferior treatments
Gravitational wave detection: LIGO data analysis uses Bayesian inference to estimate parameters of merging black holes or neutron stars from noisy signals
Machine learning: Bayesian neural networks quantify prediction uncertainty; Gaussian processes provide a fully Bayesian approach to regression; variational inference enables scalable approximate Bayesian methods in large models
Spam filtering: Naive Bayes classifiers update word-probability estimates as new spam examples are observed, adapting to changing spam patterns
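The medical diagnosis case above reduces to a single application of Bayes' theorem. In the sketch below, the prevalence, sensitivity, and specificity are hypothetical numbers picked for illustration and do not describe any real test.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) from Bayes' theorem."""
    true_positive = sensitivity * prevalence                # P(+ | disease) * P(disease)
    false_positive = (1 - specificity) * (1 - prevalence)   # P(+ | healthy) * P(healthy)
    return true_positive / (true_positive + false_positive)

p = posterior_given_positive(prevalence=0.01, sensitivity=0.95, specificity=0.90)
print(round(p, 4))   # 0.0876
```

Even with a fairly accurate test, a positive result here implies under a 9% chance of disease, because the low prevalence (the prior) means false positives outnumber true positives.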

Bayesian Model Comparison

Bayesian statistics provides a natural tool for model selection: the Bayes factor. For two competing models M₁ and M₂, the Bayes factor BF₁₂ = P(data | M₁) / P(data | M₂) quantifies how much more probable the observed data are under M₁ than under M₂, where each P(data | M) is the marginal likelihood obtained by averaging the likelihood over that model's prior. Unlike a comparison of maximized likelihoods, the Bayes factor penalizes model complexity without an explicit penalty term: a more complex model spreads its prior probability across a larger parameter space, so it must fit the data substantially better to compensate. This built-in Occam's Razor makes Bayesian model comparison a principled approach to choosing between competing scientific theories.
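A small coin-flip example illustrates the complexity penalty. The two models are assumptions made for this sketch: M₁ fixes θ = 0.5, while M₂ places a uniform prior on θ, for which the binomial marginal likelihood has the closed form 1 / (n + 1). The function names are hypothetical.

```python
from math import comb, factorial

def marginal_fixed(k, n, theta=0.5):
    """P(data | M1): binomial probability with theta fixed, no free parameter."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def marginal_uniform(k, n):
    """P(data | M2): binomial likelihood averaged over a Uniform(0, 1) prior on theta.

    The integral of C(n, k) * theta^k * (1 - theta)^(n - k) over [0, 1]
    equals C(n, k) * k! * (n - k)! / (n + 1)!, which simplifies to 1 / (n + 1).
    """
    return comb(n, k) * factorial(k) * factorial(n - k) / factorial(n + 1)

def bayes_factor(k, n):
    """BF12: evidence for the fixed-theta model relative to the flexible one."""
    return marginal_fixed(k, n) / marginal_uniform(k, n)

print(round(bayes_factor(5, 10), 3))   # 2.707, balanced data favor the simpler M1
print(round(bayes_factor(9, 10), 3))   # 0.107, lopsided data favor the flexible M2
```

With 5 heads in 10 flips, M₂ can match the data just as well at its best-fitting θ, yet it still loses because its prior mass is spread over many values of θ that fit poorly. Bayes factors are sensitive to the prior chosen for each model, so the uniform prior here is part of what is being compared.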
