How Git Version Control Tracks Every Change in a Codebase
Git stores code history as a directed acyclic graph of snapshots, not diffs. Learn how commits, branches, merges, and distributed workflows actually function under the hood.
Built to Survive a Linux Kernel Crisis
Linus Torvalds created Git in April 2005 in just ten days. The impetus was urgent: BitKeeper, the proprietary version control system used to manage the Linux kernel's 6 million lines of code, had revoked the kernel team's free license. Torvalds, dissatisfied with every existing alternative, wrote his own. His design requirements were uncompromising: it had to be fast, support distributed workflows with no central server, and guarantee data integrity through cryptographic hashing. Git met all three goals, and they help explain why GitHub alone now hosts over 300 million Git repositories.
Objects, Not Diffs
Most people think of version control as storing the differences between file versions, a sequence of edits. Git works differently. It stores snapshots of the complete file tree at each point in time. When you make a commit, Git records the state of every tracked file and stores that state as a tree of objects in the hidden .git/ directory.
Git uses four object types, each identified by its SHA-1 (and increasingly SHA-256) hash:
- Blob: Stores the raw content of a single file. No filename, no metadata — just bytes.
- Tree: Stores a directory listing: filenames, associated blob hashes, and file permissions. Trees can reference other trees (subdirectories).
- Commit: Points to a tree (the root of the snapshot) and to zero or more parent commits (none for the initial commit, two or more for a merge), and records metadata (author, committer, timestamps, message).
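- Tag: An annotated tag object names another object, usually a commit, and carries a tagger, a message, and optionally a signature. Lightweight tags are plain refs and create no object.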
If a file has not changed between two commits, both commits' trees point to the same blob object. Git does not duplicate unchanged content. This makes Git's storage surprisingly efficient despite being snapshot-based rather than diff-based. Git also uses delta compression in packfiles — grouped storage of objects — when packing repositories, bringing storage down further for large histories.
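The content addressing described above can be sketched in a few lines of Python. This is a minimal illustration for SHA-1 repositories, not Git's implementation: the helper names (`git_object_id`, `blob_id`, `tree_id`) and the file names are made up for the example, and the tree helper assumes a flat directory of regular files with mode 100644.

```python
import hashlib


def git_object_id(obj_type, payload):
    """SHA-1 object id: hash of the header "<type> <size>\\0" plus the payload."""
    header = obj_type.encode() + b" " + str(len(payload)).encode() + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def blob_id(content):
    """A blob is just file bytes: no filename, no metadata."""
    return git_object_id("blob", content)


def tree_id(entries):
    """Hash a flat directory of regular files.

    `entries` maps filename -> blob id (hex). Mode 100644 is assumed for
    every entry; subdirectories and executables are left out of this sketch.
    """
    payload = b""
    for name in sorted(entries):
        payload += b"100644 " + name.encode() + b"\x00" + bytes.fromhex(entries[name])
    return git_object_id("tree", payload)


# Two snapshots that differ only in main.py (hypothetical file names).
readme = b"hello world\n"
snapshot_1 = {"README": blob_id(readme), "main.py": blob_id(b"print(1)\n")}
snapshot_2 = {"README": blob_id(readme), "main.py": blob_id(b"print(2)\n")}

# The object store needs three blobs, not four: README is shared.
stored_blobs = set(snapshot_1.values()) | set(snapshot_2.values())
print(len(stored_blobs))
print(tree_id(snapshot_1) != tree_id(snapshot_2))
```

Because the README bytes are identical in both snapshots, both trees reference the same blob id, while the two trees themselves get different ids.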
The Directed Acyclic Graph
A Git repository's history is a directed acyclic graph (DAG). Each commit node has zero or more parent pointers (directed edges pointing backward in time): none for the initial commit, one for an ordinary commit, two or more for a merge. The graph is acyclic because a commit's hash depends on its parents' hashes, so a commit cannot be its own ancestor. Branches are simply named pointers (lightweight text files containing a commit SHA) that move forward as commits are added. HEAD is a special pointer indicating the current working position, usually pointing to a branch.
This design makes branching almost free. With SHA-1, creating a branch writes a 41-byte file: 40 hexadecimal characters plus a newline. Switching branches updates the working directory to match the target commit's tree. No history is copied and no network operation is involved.
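As a rough model of the graph and its pointers, the toy class below keeps commits as a mapping from id to parent ids and branches as a mapping from name to commit id. The `Repo` class, its method names, and the short commit labels (`c1`, `c2`, `c3`) are assumptions made for illustration; real Git uses full hashes and stores refs as files.

```python
class Repo:
    """Toy repository: a commit graph plus named pointers into it."""

    def __init__(self):
        self.parents = {}    # commit id -> list of parent commit ids
        self.refs = {}       # branch name -> commit id
        self.head = "main"   # HEAD names the current branch

    def commit(self, commit_id):
        """Add a commit on the current branch and advance that branch."""
        tip = self.refs.get(self.head)
        self.parents[commit_id] = [tip] if tip else []
        self.refs[self.head] = commit_id

    def branch(self, name):
        """Create a branch: one new pointer, nothing else is copied."""
        self.refs[name] = self.refs[self.head]

    def switch(self, name):
        self.head = name

    def ancestors(self, commit_id):
        """Every commit reachable by following parent pointers."""
        seen, stack = set(), [commit_id]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents[current])
        return seen


repo = Repo()
repo.commit("c1")          # root commit: no parents
repo.commit("c2")
repo.branch("feature")     # feature and main both point at c2
repo.switch("feature")
repo.commit("c3")          # only feature moves forward

# What a SHA-1 branch file holds: 40 hex characters and a newline.
ref_file = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad\n"
print(repo.refs, len(ref_file))
```

After the last commit, `main` still points at `c2` while `feature` points at `c3`, and both share the same ancestors up to `c2`.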
Branching and Merging
Branches enable parallel work without interference. A developer creates a feature branch, makes commits, and later merges back into the main branch. Git offers several merge strategies:
- Fast-forward merge: When the target branch has no divergent commits, Git simply advances its pointer to the tip of the incoming branch. No merge commit is created.
- Three-way merge: When both branches have diverged, Git finds the most recent common ancestor commit, then computes changes on both sides and combines them. If the same lines changed on both sides, a conflict occurs and the developer resolves it manually.
- Rebase: Instead of merging, rebasing replays commits from one branch onto another, rewriting their parent pointers. This produces a linear history without merge commits, at the cost of rewriting commit hashes — which means rebased commits are different objects than their originals.
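The choice between a fast-forward and a three-way merge can be sketched as a graph query followed by a comparison against the common ancestor. The code below is a simplified illustration: the function names are hypothetical, commits are short labels, and the merge works on whole files, whereas Git merges changes within files line by line.

```python
def ancestors(parents, commit_id):
    """All commits reachable from commit_id, including itself."""
    seen, stack = set(), [commit_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


def merge_kind(parents, target, incoming):
    """Classify merging `incoming` into `target`."""
    if incoming in ancestors(parents, target):
        return "already up to date"
    if target in ancestors(parents, incoming):
        return "fast-forward"
    return "three-way"


def three_way(base, ours, theirs):
    """Merge snapshots (path -> content) at whole-file granularity.

    Returns (merged, conflicts). A path conflicts when both sides changed
    it relative to the base and ended up with different content.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o        # both sides agree
        elif o == b:
            result = t        # only theirs changed
        elif t == b:
            result = o        # only ours changed
        else:
            conflicts.append(path)
            continue
        if result is not None:   # None means the file was deleted
            merged[path] = result
    return merged, conflicts


# History: A <- B, then C and D both branch off B.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
base = {"a.txt": "1", "b.txt": "1"}
clean = three_way(base, {"a.txt": "2", "b.txt": "1"}, {"a.txt": "1", "b.txt": "3"})
clash = three_way(base, {"a.txt": "2", "b.txt": "1"}, {"a.txt": "3", "b.txt": "1"})
print(merge_kind(parents, "B", "C"), merge_kind(parents, "C", "D"))
print(clean, clash)
```

Merging `C` into `B` is a fast-forward because `B` is an ancestor of `C`. Merging `D` into `C` needs a three-way merge with `B` as the base.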
Distributed Architecture
Git has no central server in its model — every clone is a full copy of the repository, including complete history. Remote servers like GitHub, GitLab, or Bitbucket are simply repositories configured as remote references. Collaboration works through push and pull operations that synchronize object databases between repositories.
| Operation | Direction | What Happens |
|---|---|---|
| git clone | Remote to local | Copies all objects and refs to new local repo |
| git fetch | Remote to local | Downloads new objects/refs without modifying working tree |
| git pull | Remote to local | fetch + merge (or rebase if configured) |
| git push | Local to remote | Uploads local commits to remote; rejected if non-fast-forward |
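The operations in the table can be modeled as copying objects between two stores and moving refs. The sketch below is an assumption-laden toy, not Git's wire protocol: `Repo`, `clone`, `fetch`, and `push` are hypothetical helpers, commits are short labels, and the remote-tracking name `origin/main` is hard-wired by prefixing the branch name.

```python
class Repo:
    def __init__(self):
        self.objects = {}   # commit id -> list of parent commit ids
        self.refs = {}      # branch name -> commit id

    def commit(self, branch, commit_id):
        tip = self.refs.get(branch)
        self.objects[commit_id] = [tip] if tip else []
        self.refs[branch] = commit_id

    def is_ancestor(self, old, new):
        """True if `old` is reachable from `new` via parent pointers."""
        seen, stack = set(), [new]
        while stack:
            current = stack.pop()
            if current == old:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.objects.get(current, []))
        return False


def clone(remote):
    """Copy every object and ref into a new repository."""
    local = Repo()
    local.objects = dict(remote.objects)
    local.refs = dict(remote.refs)
    return local


def fetch(local, remote, branch):
    """Download missing objects; local branches stay where they are."""
    for commit_id, parent_ids in remote.objects.items():
        local.objects.setdefault(commit_id, parent_ids)
    local.refs["origin/" + branch] = remote.refs[branch]


def push(local, remote, branch):
    """Upload commits; refuse if the remote tip would not fast-forward."""
    new_tip, old_tip = local.refs[branch], remote.refs.get(branch)
    if old_tip is not None and not local.is_ancestor(old_tip, new_tip):
        return False
    for commit_id, parent_ids in local.objects.items():
        remote.objects.setdefault(commit_id, parent_ids)
    remote.refs[branch] = new_tip
    return True


server = Repo()
server.commit("main", "c1")
alice, bob = clone(server), clone(server)

alice.commit("main", "c2")
first_push = push(alice, server, "main")    # accepted: c1 is an ancestor of c2

bob.commit("main", "c3")
second_push = push(bob, server, "main")     # rejected: c3 does not contain c2

fetch(bob, server, "main")                  # bob now has c2 as origin/main
bob.objects["m1"] = ["c3", "c2"]            # merge commit with two parents
bob.refs["main"] = "m1"
third_push = push(bob, server, "main")      # accepted after merging
print(first_push, second_push, third_push)
```

The rejected push illustrates why a non-fast-forward update requires fetching and integrating the remote's commits first.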
The Staging Area
Git's three-stage workflow — working directory, staging area (index), committed history — is often misunderstood. The index is a binary file (.git/index) that holds a snapshot of what the next commit will contain. git add stages changes by updating the index. git commit turns the index into a commit object. This design allows developers to commit only a subset of their working directory changes — staging specific files or even specific lines — giving fine-grained control over commit granularity.
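A small model makes the three stages concrete. In this sketch the working directory, index, and history are plain Python structures, and `add` and `commit` are hypothetical stand-ins for git add and git commit. The real index is a binary file with more fields per entry.

```python
import hashlib


def blob_id(content):
    """Git-style blob id for SHA-1 repositories."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


working_dir = {"app.py": b"print('v2')\n", "notes.txt": b"draft\n"}
index = {}      # path -> blob id: what the next commit will contain
history = []    # each commit freezes a copy of the index


def add(path):
    """Stage the file's current content by recording its blob id."""
    index[path] = blob_id(working_dir[path])


def commit(message):
    """Turn the index into a commit; the working directory is not consulted."""
    history.append({"message": message, "tree": dict(index)})


add("app.py")                               # stage one file only
working_dir["app.py"] = b"print('v3')\n"    # keep editing after staging
commit("Update app")

print(history[-1]["tree"])
```

The commit contains the staged `v2` content of `app.py` only: the later edit and the untouched `notes.txt` remain in the working directory, unstaged.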
Data Integrity Through Hashing
Every object in Git is identified by a cryptographic hash of its contents (SHA-1 by default). A commit hash is deterministic: given identical content, metadata, and parent hashes, two independently created commits on different machines will produce the same hash. This makes data corruption detectable: if a single bit in a stored object flips, its hash no longer matches, and Git reports the object as corrupt when it next verifies it, for example during git fsck. It also makes tampering with history visible: changing any commit changes its hash, which changes every descendant commit's hash, producing a completely different chain that diverges from the original.
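The chaining effect can be demonstrated with a simplified commit format. The sketch below is illustrative only: `serialize_commit` omits the author, committer, and timestamp lines that real commits carry, every commit points at the empty tree, and the helper names are made up for the example.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of Git's empty tree


def serialize_commit(tree, parent_ids, message):
    """Simplified commit bytes: tree line, parent lines, blank line, message."""
    lines = ["tree " + tree] + ["parent " + p for p in parent_ids] + ["", message, ""]
    body = "\n".join(lines).encode()
    return b"commit " + str(len(body)).encode() + b"\x00" + body


def object_id(raw):
    return hashlib.sha1(raw).hexdigest()


def build_chain(messages):
    """Return (ids, store) for a linear history with one commit per message."""
    ids, store, parent = [], {}, None
    for message in messages:
        raw = serialize_commit(EMPTY_TREE, [parent] if parent else [], message)
        parent = object_id(raw)
        ids.append(parent)
        store[parent] = raw
    return ids, store


def corrupted(store):
    """Ids whose stored bytes no longer hash to the key they are filed under."""
    return [oid for oid, raw in store.items() if object_id(raw) != oid]


original, store = build_chain(["init", "add feature", "fix bug"])
rebuilt, _ = build_chain(["init", "add feature", "fix bug"])
tampered, _ = build_chain(["init", "add backdoor", "fix bug"])

# Flip one bit in the stored bytes of the newest commit.
damaged = bytearray(store[original[2]])
damaged[-2] ^= 0x01
store[original[2]] = bytes(damaged)

print(original == rebuilt)           # same inputs give the same ids
print(original[2] == tampered[2])    # an edited ancestor changes descendants
print(corrupted(store))
```

Rewriting the second commit leaves the first id unchanged but alters the second and third, and the bit flip is caught because the stored bytes no longer hash to their id.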
| Feature | Git | SVN | Mercurial |
|---|---|---|---|
| History model | Snapshots (DAG) | Diffs | Snapshots (DAG) |
| Distribution | Fully distributed | Centralized | Fully distributed |
| Branching cost | ~41 bytes | Cheap server-side copy of a directory | ~100 bytes |
| Merge tracking | Native (graph) | Manual / property-based | Native |
| Market dominance (2025) | >95% of open source | Declining | Niche |
Git's combination of snapshot storage, cryptographic integrity, and cheap branching produced something the software industry had lacked: a version control system fast enough and flexible enough that developers actually use it for everything — not just final releases, but experimental features, documentation drafts, configuration files, and infrastructure code. Torvalds estimated it took ten days to make Git self-hosting. It took about three years to take over the industry.