How Databases Store and Retrieve Millions of Records Instantly

Databases use B-tree indexes, buffer pools, and query planners to retrieve records in microseconds. Learn how relational and NoSQL engines actually store and find data.

The InfoNexus Editorial TeamMay 17, 20269 min read

Finding One Record in a Billion

Amazon's DynamoDB handles over 89 million requests per second at peak, returning results in single-digit milliseconds. PostgreSQL running on commodity hardware can perform a lookup across 100 million rows in under a millisecond. These aren't acts of raw computational power — they're the result of decades of research into how data is physically organized on disk and in memory, and how query engines navigate that organization. The mechanisms behind these numbers reshape what software can do at scale.

How Data Lives on Disk

Storage in relational databases is organized around pages — fixed-size blocks of data, typically 8KB (PostgreSQL) or 16KB (MySQL/InnoDB). All I/O between disk and memory happens at the page granularity, never byte-by-byte. A table's rows are packed into heap pages; as rows are inserted they fill pages sequentially. A page holds a page header, line pointer array (offsets to each row's start), and the actual row data growing from the other end.

This paging architecture mirrors how operating systems manage memory and disk I/O. Fetching one row requires loading its entire page into memory — which is why locating 100 rows scattered across 100 different pages (a full-table scan over sparse results) is far more expensive than fetching 100 contiguous rows from a few pages.

The B-Tree Index: The Core Data Structure

Without indexes, every query requires scanning every row in a table. For a million-row table, that means reading every page on every query — unacceptable for any responsive application. Indexes solve this by creating auxiliary data structures that map column values to row locations.

The dominant index structure in relational databases is the B-tree (Balanced Tree), specifically the B+ tree variant. A B+ tree has these properties:

  • All data (row pointers) resides in leaf nodes; internal nodes contain only keys for navigation.
  • Leaf nodes are linked, allowing efficient range scans by following the linked list rather than traversing the tree repeatedly.
  • The tree is always balanced — all leaf nodes are at the same depth, guaranteeing O(log n) lookup time regardless of data distribution.
  • For a table of 1 billion rows with a B-tree of order 500, the tree has roughly 3 levels (log₅₀₀(1,000,000,000) ≈ 3). Three page reads locate any row.

Creating an index on a column inserts every value into this structure along with a pointer (either physical page address or primary key, depending on the engine) to the corresponding row. Each INSERT, UPDATE, or DELETE to an indexed column must also update the B-tree — the performance cost of indexes during writes.

Buffer Pool: Keeping Hot Data in Memory

Disk access is measured in milliseconds; RAM access in nanoseconds — roughly 100,000× faster. Database engines maintain a buffer pool (InnoDB) or shared buffer cache (PostgreSQL) — a region of RAM that caches recently or frequently accessed pages. When a query needs a page, the engine checks the buffer pool first; a cache hit avoids disk I/O entirely.

Buffer pools use eviction policies (typically LRU variants, sometimes with adaptive adjustments) to decide which pages to retain when memory fills. PostgreSQL's default shared_buffers is 128MB, though production configurations typically allocate 25% of total RAM. A well-configured database serving a typical web application achieves buffer pool hit rates of 95–99%, meaning most queries never touch disk.

Query Planning and Optimization

When you submit a SQL query, the database engine doesn't execute it directly. It parses the query into an abstract syntax tree, analyzes it, and runs it through the query planner/optimizer — a component that evaluates multiple possible execution plans and picks the cheapest one based on cost estimates.

Execution StrategyWhen UsedCost Profile
Sequential scanRetrieving large fraction of tableHigh I/O, predictable
Index scanFetching few rows by indexed columnLow I/O for sparse results
Index-only scanAll needed columns in indexNo heap access required
Bitmap heap scanModerate row count from indexBatch heap access, efficient
Hash joinEqui-joins on larger tablesMemory-intensive, fast
Merge joinSorted input, equi or range joinsCPU-efficient, requires sorted data

The planner uses table statistics — row counts, value distribution histograms, null fractions — stored in system catalogs. ANALYZE (PostgreSQL) or ANALYZE TABLE (MySQL) refreshes these statistics. Outdated statistics cause the planner to choose suboptimal plans, which is a common source of query performance regressions after bulk data loads.

Transactions and ACID Properties

Databases guarantee correctness through ACID properties — Atomicity, Consistency, Isolation, Durability. The mechanisms enforcing these guarantees have direct performance implications:

  • Write-Ahead Log (WAL): Every change is first written to a sequential log file before modifying data pages. WAL enables durability (log survives crash) and atomicity (uncommitted changes can be rolled back). Sequential log writes are far faster than random page writes.
  • MVCC (Multi-Version Concurrency Control): Rather than locking rows for reads, PostgreSQL and InnoDB maintain multiple versions of rows. Readers see a consistent snapshot without blocking writers. Each row stores transaction IDs indicating visibility; old versions are periodically cleaned up by VACUUM or purge threads.

NoSQL Approaches to the Same Problem

NoSQL databases trade SQL's flexibility for targeted performance gains.

NoSQL TypeExampleStorage StructureStrength
Key-ValueRedis, DynamoDBHash table / LSM treeSub-millisecond single-key lookup
DocumentMongoDBB-tree with BSON documentsFlexible schema, nested data
Column-familyCassandra, HBaseLSM tree, SSTable filesWrite-heavy workloads, wide rows
GraphNeo4jAdjacency list with native pointersTraversing relationships

LSM trees (Log-Structured Merge Trees), used by Cassandra, RocksDB, and LevelDB, replace B-trees for write-heavy workloads. Writes go first to an in-memory structure (memtable), then flush to immutable sorted files (SSTables). Background compaction merges SSTables and removes deleted records. This converts random writes to sequential writes — dramatically faster on both HDDs and SSDs — at the cost of slower reads that may need to consult multiple SSTables.

The choice between storage engines, index types, and database paradigms ultimately depends on workload shape: read-heavy vs. write-heavy, point lookups vs. range scans, strict consistency vs. eventual consistency. No single architecture excels at everything — which is why most large-scale applications use multiple database systems simultaneously.

databasesSQLdata engineeringsoftware

Related Articles