The Birthday Paradox: Why 23 People Make a 50% Match Likely
Why only 23 people are needed for a 50% chance of a shared birthday, the probability calculation, real-world applications in cryptography and hashing, and common misconceptions.
23 People, 365 Days — 50.7% Chance Two Share a Birthday
The Birthday Paradox is the surprising fact that in a group of just 23 people, the probability that two of them share the same birthday exceeds 50%. With 57 people, the probability reaches 99%. With 366 people, by the pigeonhole principle, it is guaranteed. These numbers feel wrong to most people because intuition frames the question differently: "What are the chances someone shares MY birthday?" That question would indeed require around 253 people for a 50% chance. The paradox lies in comparing any pair, not a specific pair. The number of possible pairs grows quadratically — 23 people create 253 distinct pairings, and any one of those 253 pairs might match.
The Calculation
The easiest approach calculates the complementary probability — the chance that no two people share a birthday — then subtracts from 1.
- Person 1's birthday: can be anything — 365/365 probability = 1.0000
- Person 2's birthday: must differ from person 1 — 364/365 = 0.9973
- Person 3's birthday: must differ from persons 1 and 2 — 363/365 = 0.9945
- Person 23's birthday: must differ from all previous 22 — 343/365 = 0.9397
The probability that all 23 birthdays are unique: (365/365) × (364/365) × (363/365) × ... × (343/365) = approximately 0.4927.
Therefore, the probability that at least two people share a birthday = 1 − 0.4927 = 0.5073, or 50.7%.
| Group Size | Probability of At Least One Shared Birthday |
|---|---|
| 10 people | 11.7% |
| 20 people | 41.1% |
| 23 people | 50.7% |
| 30 people | 70.6% |
| 40 people | 89.1% |
| 57 people | 99.0% |
| 70 people | 99.9% |
Why Intuition Fails: Pairs vs. Targets
The mismatch between intuition and reality stems from confusing two different questions. People naturally frame the birthday problem as a targeting question: what is the chance that someone in the room matches me? With 22 other people, each with a 1/365 chance of matching your birthday, the probability of at least one match is 1 − (364/365)²² ≈ 5.9% — tiny and unsurprising.
The birthday paradox asks a different question: what is the chance that any two people in the room match each other? With 23 people, there are 23×22/2 = 253 possible pairs. Each pair has a 1/365 chance of sharing a birthday. With 253 independent shots, the combined probability approaches 50%. The quadratic growth in the number of pairs — not the linear growth in group size — drives the rapid probability increase.
The Birthday Attack in Cryptography
The Birthday Paradox is not merely a probability curiosity. It directly threatens cryptographic hash functions. A hash function takes arbitrary input and produces a fixed-length output called a digest. A "collision" occurs when two different inputs produce the same digest. For a hash function with an n-bit output, the naive assumption might be that finding a collision requires testing 2ⁿ inputs. The Birthday Paradox shows that collisions become likely after testing only approximately 2^(n/2) inputs.
| Hash Function | Output Size | Theoretical Collision Resistance | Birthday Attack Complexity |
|---|---|---|---|
| MD5 (deprecated) | 128 bits | 2¹²⁸ operations | ~2⁶⁴ operations (broken in practice) |
| SHA-1 (deprecated) | 160 bits | 2¹⁶⁰ operations | ~2⁸⁰ operations (broken by Google/CWI 2017) |
| SHA-256 | 256 bits | 2²⁵⁶ operations | ~2¹²⁸ operations (computationally infeasible) |
| SHA-3 (Keccak) | 256–512 bits | 2²⁵⁶–2⁵¹² operations | ~2¹²⁸–2²⁵⁶ operations (secure) |
Google and CWI Amsterdam demonstrated the first practical SHA-1 collision in 2017 (the "SHAttered" attack), producing two different PDF files with identical SHA-1 hashes after approximately 9.2 × 10¹⁸ SHA-1 computations. The attack cost an estimated $110,000 in cloud computing time — feasible for well-funded adversaries. Birthday-bound complexity is why modern security standards require 256-bit or larger hash outputs.
Assumptions and Real-World Deviations
The 50.7% figure for 23 people assumes birthdays are uniformly distributed across all 365 days (ignoring leap years). Real birthdays are not uniform.
- Births peak in late summer in the United States — August and September consistently show higher birth rates
- The least common birthday in the US is December 25, followed by January 1 — likely due to scheduled C-section and induction avoidance around holidays
- Non-uniform distribution generally increases the probability of a match — when some days are more common than others, the pigeonhole effect strengthens
- Leap year birthdays (February 29) affect about 1 in 1,461 people; ignoring them introduces a small underestimate in collision probability
Extensions: Near-Miss and Half-Birthday Variants
The birthday problem extends naturally to "near misses" — matching within a window of k days rather than exactly. With a 7-day window, the probability of a near-miss birthday match in a group of just 14 people exceeds 50%. This variant is relevant in scheduling problems, where approximate rather than exact matches create conflicts. The same mathematical framework also applies to IP address collisions in network routing, coupon collector problems in combinatorics, and random number generator cycle detection — anywhere distinct items are drawn from a finite pool and collisions have practical consequences.
Related Articles
mathematics
Bayesian Inference: Priors, Posteriors, and Updating Beliefs with Data
How Bayes' theorem works, what prior and posterior distributions mean, the base rate neglect problem in medical testing, the Bayesian vs. frequentist debate, and MCMC computation.
9 min read
mathematics
Fermat's Last Theorem: 358 Years From Margin Note to Proof
How Fermat's 1637 claim sat unproven for 358 years until Andrew Wiles secretly spent 7 years connecting elliptic curves, modular forms, and the Shimura-Taniyama-Weil conjecture.
9 min read
mathematics
Gödel's Incompleteness Theorems: The Limits of Mathematical Truth
Gödel's two incompleteness theorems explained, the self-referential proof method, what they mean for formal systems, and their influence on mathematics and computing.
9 min read
mathematics
P vs NP: The Million-Dollar Problem at the Heart of Computer Science
What P vs NP means, examples of P and NP problems, why the question matters, Cook's theorem, and the implications if P equals NP or does not.
9 min read