How Version Control Systems Track and Manage Code Changes
Version control systems record every code change with author, timestamp, and context. Learn how Git's DAG model, branching, merging, and distributed architecture work.
100 Million Developers. One Tool. Built in About Ten Days.
Linus Torvalds created Git in April 2005 in approximately 10 days, after the free-of-charge license for BitKeeper, the proprietary tool then used to manage the Linux kernel source, was withdrawn. His requirements were specific: support distributed development across thousands of contributors, handle the kernel's 6.7 million lines of code efficiently, and be faster than any existing system. Git met all three requirements and became the dominant version control system globally. As of 2024, GitHub alone hosts over 420 million repositories and serves more than 100 million developers.
Version control systems (VCS) record changes to files over time, enabling teams to review history, revert mistakes, work on parallel features simultaneously, and merge independent work streams back together. Without version control, collaborative software development at any meaningful scale is effectively impossible.
The Evolution of Version Control Models
| Generation | Examples | Model | Limitation |
|---|---|---|---|
| Local VCS | RCS | Tracks changes in a database on local disk | No collaboration; single machine |
| Centralized VCS | CVS, Subversion (SVN) | Single central server holds all history; clients check out files | Central server is single point of failure; no offline commits |
| Distributed VCS | Git, Mercurial | Every client holds complete repository history; no central server required | Higher disk usage; steeper learning curve |
Subversion (SVN), the dominant centralized VCS before Git, stored changes as differences (deltas) from a central baseline. Every commit required network access to the server. If the server was down, committing, branching, and browsing history all stopped. Git's distributed model means every developer has a full local copy of all history (commits, branches, and tags), which enables fast local operations and offline work.
Git's Data Model: A DAG of Immutable Objects
Git's storage model is the foundation of everything else it does. Rather than modeling history as a series of file differences, Git records a snapshot of the entire repository at each commit (delta compression is applied later, purely as a storage optimization). Four types of objects are stored in Git's object store, each identified by a hash of its contents, which is SHA-1 by default.
- Blob objects: Store the raw content of individual files — no filename, no metadata, just content; identical content across different files produces a single shared blob
- Tree objects: Represent directory structures, mapping filenames to blob hashes for files and nested tree hashes for subdirectories
- Commit objects: Point to a root tree snapshot, reference parent commit(s), and record author, committer, timestamp, and the commit message
- Tag objects: Created by annotated tags; each records the tagged object (usually a commit), the tagger, a message, and optionally a signature. Lightweight tags are plain references and create no object
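The content addressing described in the list above can be illustrated with a minimal Python sketch. It assumes the classic SHA-1 object format, in which Git hashes a short header (the word `blob`, a space, the byte length, and a NUL byte) followed by the raw content. The function name `git_blob_hash` is hypothetical, chosen for illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes a header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two files with identical content map to the same blob ID,
# regardless of their names or locations.
readme = git_blob_hash(b"hello world\n")
copy_of_readme = git_blob_hash(b"hello world\n")
print(readme == copy_of_readme)  # True

# The empty blob has a well-known ID.
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

In a SHA-1 repository, the same IDs should come back from `git hash-object` when it is given the same bytes, which is one way to check the sketch against a real installation.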
Commits form a Directed Acyclic Graph (DAG). Each commit points to its parent(s): none for the initial (root) commit, one for ordinary commits, and two or more for merge commits. This graph structure is the complete, immutable history of the repository. Because objects are identified by cryptographic hashes of their contents, any modification to any historical commit changes its hash, which changes its child commit's parent reference, which changes the child's hash. Unauthorized history modification is therefore detectable.
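The hash-chaining argument can be made concrete with a toy model. The sketch below does not use Git's real commit format; `commit_id` is a hypothetical helper that hashes a simplified record containing a tree label, the parent IDs, and a message. It only aims to show why editing an old commit changes the ID of every descendant.

```python
import hashlib


def commit_id(tree: str, parents: list[str], message: str) -> str:
    """Hash a simplified commit record (not Git's actual commit encoding)."""
    record = "\n".join(
        [f"tree {tree}", *(f"parent {p}" for p in parents), "", message]
    )
    return hashlib.sha1(record.encode("utf-8")).hexdigest()


# Build a three-commit chain: root <- second <- third.
root = commit_id("tree-v1", [], "initial import")
second = commit_id("tree-v2", [root], "add parser")
third = commit_id("tree-v3", [second], "fix parser bug")

# Tamper with the root's message and rebuild the descendants on top of it.
forged_root = commit_id("tree-v1", [], "initial import (edited)")
forged_second = commit_id("tree-v2", [forged_root], "add parser")
forged_third = commit_id("tree-v3", [forged_second], "fix parser bug")

# Every descendant ID changes, so the branch tip no longer matches.
print(third != forged_third)  # True
```

Anyone who remembers the original tip ID can therefore tell that the history behind it was altered, without comparing the commits one by one.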
Branching and Merging
A Git branch is simply a lightweight pointer to a specific commit, stored as a reference that holds a 40-character SHA-1 hash. Creating a new branch is near-instantaneous and copies no files. In SVN, by contrast, a branch is a directory copy inside the central repository: cheap for the server to create, but creating it or switching to it requires a network round-trip.
- HEAD: A special pointer that indicates the currently checked-out branch or, in detached state, a specific commit; when a new commit is made, the current branch moves to the new commit and HEAD follows it
- Fast-forward merge: When the target branch is a direct ancestor of the source branch, Git simply moves the pointer forward — no merge commit needed
- Three-way merge: When branches have diverged, Git finds the common ancestor commit and merges changes from both branches relative to it; conflicts occur when both branches modified the same lines differently
- Rebase: Replays commits from one branch on top of another, creating a linear history without merge commits — rewrites commit hashes and should never be done on shared/published branches
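The pointer model behind these operations can be sketched on a toy commit graph. Everything here is illustrative: the single-letter commit IDs, the `parents` and `branches` dictionaries, and the helper names are assumptions, and real Git uses considerably more efficient graph traversal.

```python
# Toy commit graph: each commit ID maps to its list of parent IDs.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],  # main continues from B
    "D": ["B"],  # feature branches off B
    "E": ["D"],
}

# A branch is only a name pointing at a commit.
branches = {"main": "C", "feature": "E", "hotfix": "B"}


def ancestors(commit: str) -> set[str]:
    """Return the commit plus everything reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def can_fast_forward(target: str, source: str) -> bool:
    """True when the target branch tip is an ancestor of the source tip."""
    return branches[target] in ancestors(branches[source])


def merge_base(a: str, b: str) -> str:
    """Return a common ancestor that is not an ancestor of another common ancestor."""
    common = ancestors(a) & ancestors(b)
    best = [c for c in common
            if not any(c in ancestors(other) for other in common if other != c)]
    return sorted(best)[0]  # sorted only to make the choice deterministic


print(can_fast_forward("hotfix", "main"))   # True
print(can_fast_forward("main", "feature"))  # False
print(merge_base(branches["main"], branches["feature"]))  # B

# A fast-forward merge is nothing more than moving the pointer.
branches["hotfix"] = branches["main"]
print(branches["hotfix"])  # C
```

When `can_fast_forward` is false, as for `main` and `feature` here, a three-way merge would compare both tips against the merge base `B`.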
Distributed Workflows and Remote Repositories
Git's distributed nature supports multiple collaboration workflows. In practice, most teams still designate one "bare" remote repository (typically on GitHub, GitLab, or Bitbucket) as the shared coordination point. Developers clone, pull, and push, but every operation against the remote is explicit, and the full history exists locally.
| Workflow | Description | Common Use |
|---|---|---|
| Centralized workflow | All developers commit to a single main branch | Small teams, simple projects |
| Feature branch workflow | Each feature developed in a dedicated branch, merged via pull request | Most professional teams |
| Gitflow | Defined branches for features, releases, hotfixes, and main | Projects with scheduled release cycles |
| Forking workflow | Developers fork the project and submit pull requests from their fork | Open source projects (Linux, CPython) |
| Trunk-based development | All developers commit to main frequently using feature flags | High-velocity teams with CI/CD pipelines |
Git Internals: What Makes It Fast
Git's performance advantage over predecessors comes from several design decisions. Objects are stored in a content-addressable store keyed by hash, so an object's location follows directly from its ID: a loose object's path is derived from its hash, and packfile indexes are sorted by hash for fast lookup. The packfile format compresses similar objects together using delta compression, allowing hundreds of thousands of file versions to be stored compactly. The reflog, a local record of every reference update, provides a safety net for recovering from mistakes such as accidental branch deletion or commits lost to a hard reset or rebase.
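As a rough illustration of content-addressable storage, the sketch below mimics how a loose (unpacked) blob is laid out: the object ID determines the path, and the header plus content is zlib-compressed. It assumes SHA-1 object IDs and does not cover packfiles or delta compression. The function names are hypothetical.

```python
import hashlib
import zlib


def encode_loose_object(content: bytes) -> tuple[str, bytes]:
    """Return (relative path, compressed bytes) for a blob stored as a loose object."""
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()
    # The first two hex digits name a directory; the remaining 38 name the file.
    path = f"objects/{object_id[:2]}/{object_id[2:]}"
    return path, zlib.compress(store)


def decode_loose_object(compressed: bytes) -> bytes:
    """Inflate a loose object and strip the header to recover the content."""
    store = zlib.decompress(compressed)
    _header, _, content = store.partition(b"\0")  # split at the first NUL only
    return content


path, data = encode_loose_object(b"hello world\n")
print(path)  # objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
print(decode_loose_object(data))  # b'hello world\n'
```

Because the path is computed from the hash, no search is needed to find a loose object once its ID is known.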
The staging area (also called the index) is one of Git's most distinctive features and a frequent source of confusion for new users. The index is an intermediate layer between the working directory and the repository: changes must be explicitly staged before they are included in a commit. This allows granular commits that include only logically related changes, even when the working directory contains multiple unrelated modifications.
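A toy model may help make the three layers concrete. The dictionaries and the `stage` and `commit` helpers below are hypothetical stand-ins, not Git's actual index format; they only show that a commit snapshots the index rather than the working directory.

```python
# Toy model: dictionaries mapping file paths to content.
working_dir = {
    "parser.py": "v2 of parser",
    "README.md": "typo fixed",
    "notes.txt": "scratch",
}
index = {"parser.py": "v1 of parser", "README.md": "original"}  # matches the last commit
snapshots = [dict(index)]  # committed snapshots, oldest first


def stage(path: str) -> None:
    """Copy one file's current content from the working directory into the index."""
    index[path] = working_dir[path]


def commit() -> dict[str, str]:
    """Record a snapshot of the index, ignoring unstaged working-directory changes."""
    snapshot = dict(index)
    snapshots.append(snapshot)
    return snapshot


stage("README.md")  # stage only the documentation fix
snapshot = commit()

print(snapshot["README.md"])    # typo fixed
print(snapshot["parser.py"])    # v1 of parser
print("notes.txt" in snapshot)  # False
```

The unstaged parser change and the untracked notes file stay in the working directory, available for a later, separate commit.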
GitHub's pull request workflow, which wraps Git's branching and merging capabilities in a code review interface, has become the standard unit of software contribution, both commercially and in open source. The combination of git blame (identifying who last changed each line), git bisect (binary search through history to find the commit that introduced a bug), and rich diff views makes version control's historical record a living diagnostic tool, not merely an archive.
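The binary search behind git bisect can be sketched in a few lines. This is a simplified model that assumes a linear history, a known-good oldest commit, a known-bad newest commit, and a bug that stays present once introduced. The commit names and the `is_bad` test are hypothetical.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search a linear history (oldest first) for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and every commit after
    the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(commits[middle]):
            bad = middle
        else:
            good = middle
    return commits[bad]


history = [f"c{i}" for i in range(16)]  # c0 (oldest) .. c15 (newest)
tested = []


def is_bad(commit: str) -> bool:
    """Stand-in for running the test suite; the bug is assumed to start at c11."""
    tested.append(commit)
    return int(commit[1:]) >= 11


print(first_bad_commit(history, is_bad))  # c11
print(len(tested))  # 4
```

Sixteen commits need only four test runs here, which is why bisecting stays practical even across thousands of commits.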