BGP: The Fragile Protocol That Holds the Internet Together
Understand how the Border Gateway Protocol routes traffic between autonomous systems, why BGP hijacking threatens internet stability, and how RPKI aims to fix it.
The Internet's Postal System Runs on Trust
On October 4, 2021, Facebook disappeared from the internet for nearly six hours. The cause was a routine maintenance change that went wrong and led Facebook's network to withdraw the BGP routes that told the rest of the internet how to reach its servers. Approximately 3.5 billion users lost access. The outage cost the company an estimated $100 million in revenue. This single incident showed that the Border Gateway Protocol, a system sketched in 1989 on three napkins at a lunch meeting, remains the critical backbone of global internet routing.
BGP connects roughly 75,000 autonomous systems worldwide. Every email, video stream, and web page depends on it. Yet the protocol has no built-in security verification.
Autonomous Systems and Path Selection
The internet is not a single network. It is a collection of independently operated networks called autonomous systems (ASes). Each AS is identified by a unique number (ASN) and controlled by a single organization — an internet service provider, a corporation, a university, or a content delivery network.
BGP enables these autonomous systems to exchange routing information. Each BGP router maintains a routing table that maps IP address prefixes to AS paths. When a router learns multiple routes to the same destination, it selects the best path by comparing a series of attributes in order, moving on to the next attribute only when the current one ties.
BGP Path Selection Criteria
- Highest local preference: Network operators assign preference values to influence which paths their routers favor.
- Shortest AS path: A route traversing three ASes is preferred over one traversing five. AS-path length is only a rough proxy for quality, since it says nothing about latency or capacity.
- Lowest origin type: Routes whose ORIGIN attribute is IGP are preferred over EGP, and EGP over INCOMPLETE (typically redistributed routes).
- Lowest Multi-Exit Discriminator (MED): When multiple links connect two ASes, MED values guide traffic to the preferred entry point.
- eBGP over iBGP: Externally learned routes take precedence over internally learned ones.
- Nearest next hop (IGP metric): Hot-potato routing pushes traffic to the closest exit point.
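The ordered comparison above can be sketched as a sort key. This is a simplified illustration, not a router's actual decision process: the `Route` class and its field names are hypothetical, real implementations add further steps (vendor-specific weights, router ID tie-breakers), and MED is normally compared only between routes from the same neighboring AS.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# ORIGIN codes, lower is preferred: IGP (0), EGP (1), INCOMPLETE (2).
ORIGIN_IGP, ORIGIN_EGP, ORIGIN_INCOMPLETE = 0, 1, 2


@dataclass(frozen=True)
class Route:
    """Hypothetical, simplified view of one candidate path to a prefix."""
    prefix: str
    local_pref: int
    as_path: Tuple[int, ...]
    origin: int
    med: int
    learned_via_ebgp: bool
    igp_metric: int


def preference_key(route: Route) -> tuple:
    """Build a key in which a smaller tuple means a more preferred route."""
    return (
        -route.local_pref,                    # highest local preference
        len(route.as_path),                   # shortest AS path
        route.origin,                         # lowest origin type
        route.med,                            # lowest MED
        0 if route.learned_via_ebgp else 1,   # eBGP over iBGP
        route.igp_metric,                     # nearest next hop
    )


def best_path(candidates: Sequence[Route]) -> Route:
    """Pick the most preferred route among candidates for one prefix."""
    return min(candidates, key=preference_key)


# Example data uses documentation-range prefixes and AS numbers.
candidates = [
    Route("203.0.113.0/24", 100, (64500, 64501, 64502), ORIGIN_IGP, 0, True, 10),
    Route("203.0.113.0/24", 100, (64510, 64502), ORIGIN_IGP, 0, True, 20),
    Route("203.0.113.0/24", 200, (64496, 64497, 64498, 64502), ORIGIN_IGP, 0, False, 5),
]
chosen = best_path(candidates)
print(chosen.as_path)  # (64496, 64497, 64498, 64502)
```

The third route wins despite having the longest AS path, because local preference is compared first. Between the first two routes alone, the two-hop path would be chosen.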
How BGP Messages Flow
BGP operates over TCP port 179. Two routers establish a BGP session through a three-step process: TCP handshake, OPEN message exchange, and KEEPALIVE confirmation. Once established, the session persists indefinitely unless interrupted.
| Message Type | Purpose | When Sent |
|---|---|---|
| OPEN | Establishes session parameters (ASN, hold time, router ID) | Session initiation |
| UPDATE | Announces new routes or withdraws previously announced routes | Route changes |
| KEEPALIVE | Confirms the session is still active | Periodically, at about one third of the hold time (60 seconds is a common vendor default) |
| NOTIFICATION | Reports errors and terminates the session | Error conditions |
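All four message types share the same fixed header defined in RFC 4271: a 16-byte marker of all ones, a 2-byte total length, and a 1-byte type code. The sketch below builds a KEEPALIVE and parses a header. The function names are hypothetical, and the 4096-byte upper bound assumes a session without the extended message capability.

```python
import struct

MARKER = b"\xff" * 16   # 16-byte marker, all ones
HEADER_LEN = 19         # marker (16) + length (2) + type (1)
MAX_MESSAGE_LEN = 4096  # assumes no extended-message support
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def build_keepalive() -> bytes:
    """A KEEPALIVE is just the 19-byte header with type code 4."""
    return MARKER + struct.pack("!HB", HEADER_LEN, 4)


def parse_header(data: bytes) -> tuple:
    """Return (message type name, total length) for one BGP message header."""
    if len(data) < HEADER_LEN:
        raise ValueError("truncated BGP header")
    marker, length, type_code = struct.unpack("!16sHB", data[:HEADER_LEN])
    if marker != MARKER:
        raise ValueError("bad marker")
    if not HEADER_LEN <= length <= MAX_MESSAGE_LEN:
        raise ValueError("length out of range")
    if type_code not in MESSAGE_TYPES:
        raise ValueError("unknown message type")
    return MESSAGE_TYPES[type_code], length


print(parse_header(build_keepalive()))  # ('KEEPALIVE', 19)
```

A real speaker would read exactly `length` bytes from the TCP stream before handing the body to a type-specific parser.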
Route convergence — the time it takes for all routers to agree on the network topology after a change — can take minutes. During convergence, packets may be dropped, routed in loops, or delivered along suboptimal paths.
BGP Hijacking: When Trust Breaks Down
BGP was designed when the internet consisted of a small number of trusted academic and government networks. It has no native mechanism to verify that a network actually owns the IP addresses it claims to route. This trust-based model enables BGP hijacking.
| Incident | Year | Description |
|---|---|---|
| Pakistan YouTube block | 2008 | Pakistan Telecom announced a more specific route for YouTube's IP space, accidentally black-holing YouTube traffic globally for two hours |
| China Telecom rerouting | 2010 | China Telecom advertised routes for approximately 37,000 prefixes belonging to other networks for 18 minutes |
| Amazon Route 53 hijack (MyEtherWallet) | 2018 | Hijacked routes for Amazon Route 53 DNS address space let attackers answer DNS queries for MyEtherWallet and send users to a phishing server hosted in Russia, enabling cryptocurrency theft |
| Rostelecom hijack | 2020 | Traffic to over 200 CDN and cloud providers was rerouted through Russian infrastructure for over an hour |
Hijacks can be accidental or deliberate. Both are dangerous. An accidental misconfiguration can take down major services. A deliberate hijack can intercept sensitive data or redirect users to malicious servers.
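A more-specific announcement, as in the 2008 YouTube incident, works because routers forward on the longest matching prefix, whoever announced it. The sketch below illustrates that rule with Python's standard `ipaddress` module. The prefixes (private address space) and AS numbers (documentation range) are hypothetical stand-ins, not the ones involved in any real incident.

```python
import ipaddress

# Hypothetical routing table of (prefix, origin AS) entries.
routes = [
    (ipaddress.ip_network("10.10.152.0/22"), 64500),  # legitimate announcement
    (ipaddress.ip_network("10.10.153.0/24"), 64511),  # hijacker's more-specific
]


def lookup(destination: str, table):
    """Return the (prefix, origin AS) entry with the longest matching prefix."""
    address = ipaddress.ip_address(destination)
    matches = [entry for entry in table if address in entry[0]]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)


print(lookup("10.10.153.7", routes)[1])  # 64511: the /24 beats the /22
print(lookup("10.10.152.7", routes)[1])  # 64500: only the /22 covers this address
```

Traffic for addresses inside the /24 follows the hijacker's route even though the legitimate /22 is still in the table.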
Security Solutions and Their Adoption
The internet community has developed several mechanisms to address BGP's trust deficit. Progress has been slow.
Resource Public Key Infrastructure (RPKI)
RPKI allows IP address holders to cryptographically sign Route Origin Authorizations (ROAs), declaring which ASes are authorized to announce their prefixes. Receiving networks can then validate incoming BGP announcements against these ROAs and reject unauthorized ones.
Adoption is growing but incomplete. By 2024, approximately 50 percent of global routes had valid ROAs, up from 10 percent in 2019. Major providers including AT&T, NTT, and Cloudflare perform RPKI validation. Full protection requires two things: origin networks must create ROAs, and transit networks must enforce validation.
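Route origin validation can be sketched in a few lines. The outcome names follow RFC 6811 (valid, invalid, not-found), but the `Roa` tuple and `validate_origin` function are hypothetical simplifications: a real validator works from cryptographically verified ROA data fetched from RPKI repositories rather than a hand-written list.

```python
import ipaddress
from typing import NamedTuple


class Roa(NamedTuple):
    """Hypothetical, already-verified ROA payload."""
    prefix: ipaddress.IPv4Network
    max_length: int
    asn: int


def validate_origin(prefix: str, origin_asn: int, roas) -> str:
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(prefix)
    covering = [
        roa for roa in roas
        if roa.prefix.version == route.version and route.subnet_of(roa.prefix)
    ]
    if not covering:
        return "not-found"  # no ROA covers this address space
    for roa in covering:
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid"        # covered, but wrong origin or too specific


roas = [Roa(ipaddress.ip_network("198.51.100.0/24"), 24, 64500)]

print(validate_origin("198.51.100.0/24", 64500, roas))  # valid
print(validate_origin("198.51.100.0/24", 64511, roas))  # invalid
print(validate_origin("198.51.100.0/25", 64500, roas))  # invalid
print(validate_origin("192.0.2.0/24", 64500, roas))     # not-found
```

Networks that enforce validation typically drop invalid routes and still accept not-found ones, which is why coverage depends on address holders publishing ROAs.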
Additional Security Measures
- BGPsec — Extends RPKI to validate the entire AS path, not just the origin. Computationally expensive and rarely deployed.
- IRR filtering — Internet Routing Registries maintain databases of expected routing policies. Operators can filter routes against these databases.
- Prefix length filtering — Rejecting announcements for overly specific prefixes (longer than /24 for IPv4) limits the impact of hijacks.
- MANRS (Mutually Agreed Norms for Routing Security) — A voluntary initiative where network operators commit to implementing routing security best practices.
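Prefix length filtering from the list above is the simplest of these measures to illustrate. In this minimal sketch, the /24 limit for IPv4 comes from the text, while the /48 limit for IPv6 is an assumption based on common operator practice. The function name is hypothetical.

```python
import ipaddress

# Longest prefix accepted per IP version. The IPv6 value is an assumption.
MAX_PREFIX_LENGTH = {4: 24, 6: 48}


def accept_prefix(prefix: str) -> bool:
    """Accept an announcement only if it is not more specific than the limit."""
    network = ipaddress.ip_network(prefix)
    return network.prefixlen <= MAX_PREFIX_LENGTH[network.version]


print(accept_prefix("203.0.113.0/24"))    # True
print(accept_prefix("203.0.113.128/25"))  # False
print(accept_prefix("2001:db8::/64"))     # False
```

A hijack announced as a /24 still passes this filter, so it complements origin validation rather than replacing it.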
BGP and Internet Outages
BGP misconfigurations cause significant outages with alarming regularity. In June 2019, routes leaked from a small Pennsylvania ISP were accepted and propagated by a major transit provider, causing a chain reaction that disrupted Cloudflare, Amazon, and other services. The entire event traced back to a single router with improper route leak prevention.
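Route leak prevention usually comes down to an export rule based on business relationships, often described as valley-free routing: routes learned from customers may be announced to anyone, while routes learned from peers or providers may be announced only to customers. The sketch below encodes that rule. The `Relationship` labels and `may_export` function are hypothetical names, and real policies are expressed in router configuration rather than Python.

```python
from enum import Enum


class Relationship(Enum):
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def may_export(learned_from: Relationship, export_to: Relationship) -> bool:
    """Customer routes go to everyone; peer and provider routes go only to customers."""
    if learned_from is Relationship.CUSTOMER:
        return True
    return export_to is Relationship.CUSTOMER


# Re-announcing one provider's routes to another provider is the classic leak.
print(may_export(Relationship.PROVIDER, Relationship.PROVIDER))  # False
print(may_export(Relationship.CUSTOMER, Relationship.PROVIDER))  # True
```

A network that skips this check can end up offering transit between its upstream providers, pulling in traffic it has no capacity to carry.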
Content providers like Google, Amazon, and Microsoft mitigate BGP risks through massive global networks with redundant paths, private peering arrangements, and real-time route monitoring systems. Smaller networks lack these resources, making them more vulnerable to both accidental and malicious route manipulation.
The Protocol That Refuses Replacement
Replacing BGP is effectively impossible. The protocol's simplicity and flexibility made it universal, and that universality makes migration to any alternative prohibitively complex. Instead, the internet community layers security mechanisms on top of BGP's original trust-based design. RPKI adoption keeps growing. Monitoring platforms like BGPStream and RIPE RIS detect anomalies in near real-time. But the fundamental architecture remains unchanged from those napkin sketches in 1989. The internet's most critical routing decisions still depend on a protocol that takes its neighbors at their word.