How Databases Store and Retrieve Millions of Records Instantly

Finding One Record in a Billion

Amazon's DynamoDB handles over 89 million requests per second at peak, returning results in single-digit milliseconds. PostgreSQL running on commodity hardware can perform a lookup across 100 million rows in under a millisecond. These aren't acts of raw computational power — they're the result of decades of research into how data is physically organized on disk and in memory, and how query engines navigate that organization. The mechanisms behind these numbers reshape what software can do at scale.

How Data Lives on Disk

Storage in relational databases is organized around pages — fixed-size blocks of data, typically 8KB (PostgreSQL) or 16KB (MySQL/InnoDB). All I/O between disk and memory happens at the page granularity, never byte-by-byte. A table's rows are packed into heap pages; as rows are inserted they fill pages sequentially. A page holds a page header, line pointer array (offsets to each row's start), and the actual row data growing from the other end.

This paging architecture mirrors how operating systems manage memory and disk I/O. Fetching one row requires loading its entire page into memory — which is why locating 100 rows scattered across 100 different pages (a full-table scan over sparse results) is far more expensive than fetching 100 contiguous rows from a few pages.

The B-Tree Index: The Core Data Structure

Without indexes, every query requires scanning every row in a table. For a million-row table, that means reading every page on every query — unacceptable for any responsive application. Indexes solve this by creating auxiliary data structures that map column values to row locations.

The dominant index structure in relational databases is the B-tree (Balanced Tree), specifically the B+ tree variant. A B+ tree has these properties:

All data (row pointers) resides in leaf nodes; internal nodes contain only keys for navigation.
Leaf nodes are linked, allowing efficient range scans by following the linked list rather than traversing the tree repeatedly.
The tree is always balanced — all leaf nodes are at the same depth, guaranteeing O(log n) lookup time regardless of data distribution.
For a table of 1 billion rows with a B-tree of order 500, the tree has roughly 3 levels (log₅₀₀(1,000,000,000) ≈ 3). Three page reads locate any row.

Creating an index on a column inserts every value into this structure along with a pointer (either physical page address or primary key, depending on the engine) to the corresponding row. Each INSERT, UPDATE, or DELETE to an indexed column must also update the B-tree — the performance cost of indexes during writes.

Buffer Pool: Keeping Hot Data in Memory

Disk access is measured in milliseconds; RAM access in nanoseconds — roughly 100,000× faster. Database engines maintain a buffer pool (InnoDB) or shared buffer cache (PostgreSQL) — a region of RAM that caches recently or frequently accessed pages. When a query needs a page, the engine checks the buffer pool first; a cache hit avoids disk I/O entirely.

Buffer pools use eviction policies (typically LRU variants, sometimes with adaptive adjustments) to decide which pages to retain when memory fills. PostgreSQL's default shared_buffers is 128MB, though production configurations typically allocate 25% of total RAM. A well-configured database serving a typical web application achieves buffer pool hit rates of 95–99%, meaning most queries never touch disk.

Query Planning and Optimization

When you submit a SQL query, the database engine doesn't execute it directly. It parses the query into an abstract syntax tree, analyzes it, and runs it through the query planner/optimizer — a component that evaluates multiple possible execution plans and picks the cheapest one based on cost estimates.

Execution Strategy	When Used	Cost Profile
Sequential scan	Retrieving large fraction of table	High I/O, predictable
Index scan	Fetching few rows by indexed column	Low I/O for sparse results
Index-only scan	All needed columns in index	No heap access required
Bitmap heap scan	Moderate row count from index	Batch heap access, efficient
Hash join	Equi-joins on larger tables	Memory-intensive, fast
Merge join	Sorted input, equi or range joins	CPU-efficient, requires sorted data

The planner uses table statistics — row counts, value distribution histograms, null fractions — stored in system catalogs. ANALYZE (PostgreSQL) or ANALYZE TABLE (MySQL) refreshes these statistics. Outdated statistics cause the planner to choose suboptimal plans, which is a common source of query performance regressions after bulk data loads.

Transactions and ACID Properties

Databases guarantee correctness through ACID properties — Atomicity, Consistency, Isolation, Durability. The mechanisms enforcing these guarantees have direct performance implications:

Write-Ahead Log (WAL): Every change is first written to a sequential log file before modifying data pages. WAL enables durability (log survives crash) and atomicity (uncommitted changes can be rolled back). Sequential log writes are far faster than random page writes.
MVCC (Multi-Version Concurrency Control): Rather than locking rows for reads, PostgreSQL and InnoDB maintain multiple versions of rows. Readers see a consistent snapshot without blocking writers. Each row stores transaction IDs indicating visibility; old versions are periodically cleaned up by VACUUM or purge threads.

NoSQL Approaches to the Same Problem

NoSQL databases trade SQL's flexibility for targeted performance gains.

NoSQL Type	Example	Storage Structure	Strength
Key-Value	Redis, DynamoDB	Hash table / LSM tree	Sub-millisecond single-key lookup
Document	MongoDB	B-tree with BSON documents	Flexible schema, nested data
Column-family	Cassandra, HBase	LSM tree, SSTable files	Write-heavy workloads, wide rows
Graph	Neo4j	Adjacency list with native pointers	Traversing relationships

LSM trees (Log-Structured Merge Trees), used by Cassandra, RocksDB, and LevelDB, replace B-trees for write-heavy workloads. Writes go first to an in-memory structure (memtable), then flush to immutable sorted files (SSTables). Background compaction merges SSTables and removes deleted records. This converts random writes to sequential writes — dramatically faster on both HDDs and SSDs — at the cost of slower reads that may need to consult multiple SSTables.

The choice between storage engines, index types, and database paradigms ultimately depends on workload shape: read-heavy vs. write-heavy, point lookups vs. range scans, strict consistency vs. eventual consistency. No single architecture excels at everything — which is why most large-scale applications use multiple database systems simultaneously.

How Databases Store and Retrieve Millions of Records Instantly

Finding One Record in a Billion

How Data Lives on Disk

The B-Tree Index: The Core Data Structure

Buffer Pool: Keeping Hot Data in Memory

Query Planning and Optimization

Transactions and ACID Properties

NoSQL Approaches to the Same Problem

Related Articles

APIs Explained: How Software Systems Talk to Each Other

How Chess Engines Outthink Human Grandmasters at Every Level

How Fiber Optic Cables Transmit Data at Light Speed

How Memory Chips Store and Retrieve Information