BGP: The Fragile Protocol That Holds the Internet Together

The Internet's Postal System Runs on Trust

On October 4, 2021, Facebook disappeared from the internet for nearly six hours. The cause was a routine BGP configuration change that went wrong, withdrawing the routes that told the rest of the internet how to reach Facebook's servers. Approximately 3.5 billion users lost access. The outage cost the company an estimated $100 million in revenue. The incident showed how much still rests on the Border Gateway Protocol. Designed in 1989 on three napkins at a lunch meeting, it remains the critical backbone of global internet routing.

BGP connects roughly 75,000 autonomous systems worldwide. Every email, video stream, and web page depends on it. Yet the protocol has no built-in security verification.

Autonomous Systems and Path Selection

The internet is not a single network. It is a collection of independently operated networks called autonomous systems (ASes). Each AS is identified by a unique number (ASN) and controlled by a single organization — an internet service provider, a corporation, a university, or a content delivery network.

BGP enables these autonomous systems to exchange routing information. Each BGP router maintains a routing table that maps IP address prefixes to AS paths. When a router learns multiple routes to the same destination, it compares them one attribute at a time, in the order listed below, and picks the route that wins at the first attribute where the candidates differ.

BGP Path Selection Criteria

Highest local preference — Network operators assign preference values to influence which paths their routers favor.
Shortest AS path — A route traversing three ASes is preferred over one traversing five. AS-path length is only a rough proxy for distance; it says nothing about latency or bandwidth.
Lowest origin type — Routes whose origin attribute is IGP are preferred over EGP, and EGP over INCOMPLETE (typically routes redistributed into BGP from another source).
Lowest Multi-Exit Discriminator (MED) — When multiple links connect two ASes, MED values guide traffic to the preferred entry point.
eBGP over iBGP — Externally learned routes take precedence over internally learned ones.
Nearest next hop (IGP metric) — Hot-potato routing pushes traffic to the closest exit point.
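
The criteria above can be read as an ordered comparison, which maps naturally onto a sort key. The following is a minimal sketch, not a faithful router implementation: the Route class and its field names are hypothetical, the ASNs come from the range reserved for documentation, and MED is compared across all candidates even though routers normally compare it only between routes from the same neighboring AS. Vendor-specific steps and final tie-breakers such as router ID are left out.

```python
from dataclasses import dataclass

# Origin codes, lower is better: IGP < EGP < INCOMPLETE.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass(frozen=True)
class Route:
    """Hypothetical container for the attributes used in path selection."""
    prefix: str
    as_path: tuple
    local_pref: int = 100
    origin: str = "igp"
    med: int = 0
    learned_via_ebgp: bool = True
    igp_metric: int = 0


def preference_key(route):
    """Smallest tuple wins, so 'higher is better' fields are negated."""
    return (
        -route.local_pref,                    # highest local preference
        len(route.as_path),                   # shortest AS path
        ORIGIN_RANK[route.origin],            # lowest origin type
        route.med,                            # lowest MED (simplified)
        0 if route.learned_via_ebgp else 1,   # eBGP over iBGP
        route.igp_metric,                     # nearest next hop
    )


def best_path(candidates):
    """Pick the preferred route among candidates for the same prefix."""
    return min(candidates, key=preference_key)


routes = [
    Route("203.0.113.0/24", as_path=(64500, 64510, 64509)),
    Route("203.0.113.0/24", as_path=(64501, 64509)),
    Route("203.0.113.0/24", as_path=(64502, 64503, 64504, 64509), local_pref=200),
]
print(best_path(routes).as_path)
```

In this example the longest AS path wins because its local preference is higher, which shows why the order of the criteria matters: a later attribute is consulted only when every earlier one ties.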

How BGP Messages Flow

BGP operates over TCP port 179. Two routers establish a BGP session through a three-step process: TCP handshake, OPEN message exchange, and KEEPALIVE confirmation. Once established, the session stays up for as long as each router hears a KEEPALIVE or UPDATE from its peer before the negotiated hold time expires.

Message Type	Purpose	When Sent
OPEN	Establishes session parameters (ASN, hold time, router ID)	Session initiation
UPDATE	Announces new routes or withdraws previously announced routes	Route changes
KEEPALIVE	Confirms the session is still active	Periodically, at about one third of the hold time (60 seconds is a common default)
NOTIFICATION	Reports errors and terminates the session	Error conditions
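
To make the message types in the table concrete, here is a minimal sketch of the fixed 19-octet BGP header (a 16-octet marker of all ones, a 2-octet length, and a 1-octet type) as defined in the base BGP-4 specification. The helper names are hypothetical, the OPEN message carries no optional parameters, and my_asn is assumed to fit in 16 bits. It builds and parses bytes only; it does not open a TCP session.

```python
import ipaddress
import struct

MARKER = b"\xff" * 16                 # marker field is all ones
HEADER = struct.Struct("!16sHB")      # marker, length, type (19 octets)
TYPE_NAMES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}
MAX_MESSAGE = 4096                    # limit in the base specification


def build_message(msg_type, body=b""):
    """Prefix a message body with the common BGP header."""
    length = HEADER.size + len(body)
    if length > MAX_MESSAGE:
        raise ValueError("message exceeds 4096 octets")
    return HEADER.pack(MARKER, length, msg_type) + body


def build_open(my_asn, hold_time, router_id):
    """OPEN body: version 4, ASN, hold time, router ID, no optional parameters."""
    body = struct.pack(
        "!BHH4sB",
        4,
        my_asn,
        hold_time,
        ipaddress.IPv4Address(router_id).packed,
        0,
    )
    return build_message(1, body)


def parse_header(data):
    """Return (type name, total length) after basic header checks."""
    marker, length, msg_type = HEADER.unpack_from(data)
    if marker != MARKER:
        raise ValueError("bad marker")
    if not HEADER.size <= length <= MAX_MESSAGE:
        raise ValueError("bad length")
    return TYPE_NAMES.get(msg_type, "UNKNOWN"), length


keepalive = build_message(4)          # a KEEPALIVE is a bare header
open_msg = build_open(64500, 180, "192.0.2.1")
print(parse_header(keepalive), parse_header(open_msg))
```

A KEEPALIVE has no body at all, which is why it is cheap enough to send on a timer for the lifetime of the session.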

Route convergence — the time it takes for all routers to agree on the network topology after a change — can take minutes. During convergence, packets may be dropped, routed in loops, or delivered along suboptimal paths.

BGP Hijacking: When Trust Breaks Down

BGP was designed when the internet consisted of a small number of trusted academic and government networks. It has no native mechanism to verify that a network actually owns the IP addresses it claims to route. This trust-based model enables BGP hijacking.

Incident	Year	Description
Pakistan YouTube block	2008	Pakistan Telecom announced a more specific route for YouTube's IP space, accidentally black-holing YouTube traffic globally for two hours
China Telecom rerouting	2010	China Telecom advertised routes for approximately 37,000 prefixes belonging to other networks for 18 minutes
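Amazon Route 53 hijack (MyEtherWallet)	2018	A hijacker announced prefixes belonging to Amazon's Route 53 DNS service and answered queries for MyEtherWallet with the address of a phishing server hosted in Russia, enabling cryptocurrency theft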
Rostelecom hijack	2020	Traffic to over 200 CDN and cloud providers was rerouted through Russian infrastructure for over an hour

Hijacks can be accidental or deliberate. Both are dangerous. An accidental misconfiguration can take down major services. A deliberate hijack can intercept sensitive data or redirect users to malicious servers.
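
A more specific announcement, as in the 2008 incident, works because routers forward each packet along the longest matching prefix, regardless of who announced it. The sketch below illustrates that rule with Python's ipaddress module. The prefixes and ASNs are hypothetical, drawn from private and documentation ranges, and the table is a plain dictionary rather than a real routing table.

```python
import ipaddress


def longest_prefix_match(table, destination):
    """Return (network, origin ASN) for the most specific covering prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, origin) for net, origin in table.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda item: item[0].prefixlen)


# The legitimate origin announces a /22.
table = {ipaddress.ip_network("10.10.0.0/22"): 64500}
before = longest_prefix_match(table, "10.10.1.25")

# A hijacker announces a /24 carved out of that /22.
table[ipaddress.ip_network("10.10.1.0/24")] = 64511
after = longest_prefix_match(table, "10.10.1.25")

print(before[1], after[1])  # 64500 64511
```

Addresses in the /22 that fall outside the hijacked /24 still reach the legitimate origin, which is why a narrow hijack can go unnoticed by everyone except the users of the affected addresses.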

Security Solutions and Their Adoption

The internet community has developed several mechanisms to address BGP's trust deficit. Progress has been slow.

Resource Public Key Infrastructure (RPKI)

RPKI allows IP address holders to cryptographically sign Route Origin Authorizations (ROAs), declaring which ASes are authorized to announce their prefixes. Receiving networks can then validate incoming BGP announcements against these ROAs and reject unauthorized ones.
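
The validation step can be sketched as a three-way classification: an announcement is valid if some ROA covers its prefix with a matching origin AS and a permitted prefix length, invalid if ROAs cover the prefix but none match, and not-found if no ROA covers it at all. The code below is an illustration under stated assumptions. The ROA class, its field names, and the sample data are hypothetical, and a real validator also handles certificate chains and expiry, which are omitted here.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    """Hypothetical, already-verified Route Origin Authorization."""
    prefix: object      # an ipaddress network object
    max_length: int     # longest prefix length the holder authorizes
    asn: int            # AS authorized to originate the prefix


def validate_origin(roas, prefix, origin_asn):
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        if roa.prefix.version != route.version:
            continue
        if not route.subnet_of(roa.prefix):
            continue
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


roas = [ROA(ipaddress.ip_network("10.10.0.0/22"), max_length=22, asn=64500)]

print(validate_origin(roas, "10.10.0.0/22", 64500))  # valid
print(validate_origin(roas, "10.10.1.0/24", 64511))  # invalid
print(validate_origin(roas, "10.10.1.0/24", 64500))  # invalid
print(validate_origin(roas, "192.0.2.0/24", 64500))  # not-found
```

The third case shows the effect of the maximum length: even the authorized AS fails validation if it announces a prefix more specific than the ROA allows. The not-found result is what keeps partial adoption workable, since networks usually still accept routes that no ROA covers.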

Adoption is growing but incomplete. By 2024, approximately 50 percent of global routes had valid ROAs, up from 10 percent in 2019. Major providers including AT&T, NTT, and Cloudflare perform RPKI validation. Protection depends on both halves being in place: origin networks must create ROAs, and transit networks must reject announcements that fail validation.

Additional Security Measures

BGPsec — Extends RPKI to validate the entire AS path, not just the origin. Computationally expensive and rarely deployed.
IRR filtering — Internet Routing Registries maintain databases of expected routing policies. Operators can filter routes against these databases.
Prefix length filtering — Rejecting announcements for overly specific prefixes (longer than /24 for IPv4) limits the impact of hijacks.
MANRS (Mutually Agreed Norms for Routing Security) — A voluntary initiative where network operators commit to implementing routing security best practices.
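
Of the measures above, prefix length filtering is the simplest to express in code. The sketch below rejects IPv4 announcements longer than /24, as described in the list. The /48 limit for IPv6 is an assumption based on common operator practice, not something stated in this article, and the function name is hypothetical.

```python
import ipaddress

# Longest acceptable prefix length per IP version.
# The IPv6 value is an assumed convention, not taken from the text.
MAX_PREFIX_LEN = {4: 24, 6: 48}


def accept_prefix(prefix):
    """Return True if the announced prefix is not overly specific."""
    net = ipaddress.ip_network(prefix)
    return net.prefixlen <= MAX_PREFIX_LEN[net.version]


for announced in ("10.10.1.0/24", "10.10.1.128/25", "2001:db8::/32", "2001:db8::/64"):
    print(announced, accept_prefix(announced))
```

A filter like this limits how finely an attacker can slice someone else's address space, but it does not stop a hijack announced at /24 or shorter, which is why it is used alongside origin validation rather than in place of it.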

BGP and Internet Outages

BGP misconfigurations cause significant outages with alarming regularity. In June 2019, routes leaked by a small Pennsylvania ISP reached a major transit provider, which accepted and re-announced them, setting off a chain reaction that disrupted Cloudflare, Amazon, and other services. The entire event traced back to missing route leak filters on a small number of BGP sessions.
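
Route leak prevention usually comes down to an export rule based on business relationships, often described as valley-free routing: routes learned from a customer may be announced to anyone, while routes learned from a peer or a provider may be announced only to customers. The sketch below encodes that rule. The Rel enum and may_export function are hypothetical names, and real deployments express the same policy through router filters or BGP communities rather than application code.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship with a BGP neighbor."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def may_export(learned_from, export_to):
    """Valley-free export rule; a violation is a route leak."""
    if learned_from is Rel.CUSTOMER:
        return True                      # customer routes go to everyone
    return export_to is Rel.CUSTOMER     # peer/provider routes go only downhill


# A network re-announcing one provider's routes to another provider:
print(may_export(Rel.PROVIDER, Rel.PROVIDER))  # False
# A network announcing a customer's routes to its provider:
print(may_export(Rel.CUSTOMER, Rel.PROVIDER))  # True
```

The first case is the classic leak: a small network accidentally offers itself as transit between two much larger ones, and traffic follows if the receiving side does not filter.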

Content providers like Google, Amazon, and Microsoft mitigate BGP risks through massive global networks with redundant paths, private peering arrangements, and real-time route monitoring systems. Smaller networks lack these resources, making them more vulnerable to both accidental and malicious route manipulation.

The Protocol That Refuses Replacement

Replacing BGP is effectively impossible. The protocol's simplicity and flexibility made it universal, and that universality makes migration to any alternative prohibitively complex. Instead, the internet community layers security mechanisms on top of BGP's original trust-based design. RPKI adoption accelerates each year. Monitoring platforms like BGPStream and RIPE RIS detect anomalies in near real-time. But the fundamental architecture remains unchanged from those napkin sketches in 1989. The internet's most critical routing decisions still depend on a protocol that takes its neighbors at their word.