How Git Version Control Tracks Every Change in a Codebase

Built to Survive a Linux Kernel Crisis

Git was created by Linus Torvalds in April 2005 in just ten days. The impetus was urgent: the free license for BitKeeper, the proprietary version control system used to manage the Linux kernel's 6 million lines of code, had been revoked for the kernel team. Torvalds, dissatisfied with every existing alternative, wrote his own. His design requirements were uncompromising: it had to be fast, support distributed workflows with no central server, and guarantee data integrity through cryptographic hashing. Git met all three goals, and those goals help explain why GitHub alone now hosts over 300 million Git repositories.

Objects, Not Diffs

Most people think of version control as storing the differences between file versions — a sequence of edits. Git works differently. It stores snapshots of the complete file tree at each point in time. When you make a commit, Git takes a picture of every file in your repository and stores that state as a tree of objects in the hidden .git/ directory.

Git uses four object types, each identified by a SHA-1 hash of its contents (SHA-256 is available as a newer, optional object format):

Blob: Stores the raw content of a single file. No filename, no metadata — just bytes.
Tree: Stores a directory listing: filenames, associated blob hashes, and file permissions. Trees can reference other trees (subdirectories).
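Commit: Points to a tree (the root of the snapshot), zero or more parent commit hashes (none for the first commit, two or more for a merge), and metadata (author, committer, timestamps, message).
Tag: An annotated tag object that names another object, usually a commit, and records a tagger, a message, and optionally a signature. Lightweight tags are plain references and create no object.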

If a file has not changed between two commits, both commits' trees point to the same blob object. Git does not duplicate unchanged content. This makes Git's storage surprisingly efficient despite being snapshot-based rather than diff-based. In addition, when Git packs a repository it groups objects into packfiles and delta-compresses similar objects against each other, which reduces storage further for large histories.
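
The naming and sharing of blobs can be sketched in a few lines of Python. The hash formula (SHA-1 over a "blob <size>" header, a NUL byte, and the content) is the one Git uses for blobs; everything else is an assumption for illustration: ObjectStore and snapshot are hypothetical names, and the "tree" is reduced to a plain filename-to-id mapping rather than Git's real tree format.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy content-addressed store (hypothetical, not Git's on-disk format)."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        oid = git_blob_id(content)
        # Identical content hashes to the same id, so it is stored only once.
        self.blobs.setdefault(oid, content)
        return oid


def snapshot(store: ObjectStore, files: dict[str, bytes]) -> dict[str, str]:
    """Simplified 'tree': map each filename to the id of its blob."""
    return {name: store.put(data) for name, data in files.items()}


store = ObjectStore()
tree1 = snapshot(store, {"README": b"hello world\n", "main.py": b"print(1)\n"})
tree2 = snapshot(store, {"README": b"hello world\n", "main.py": b"print(2)\n"})

print(tree1["README"] == tree2["README"])  # True: unchanged file, same blob
print(len(store.blobs))                    # 3 blobs stored, not 4
```

Two snapshots containing four file versions produce only three stored blobs, because the unchanged README is addressed by the same id in both.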

The Directed Acyclic Graph

A Git repository's history is a directed acyclic graph (DAG). Each commit node has zero or more parent pointers (directed edges pointing backward in time): the initial commit has none, an ordinary commit has one, and a merge commit has two or more. The graph is acyclic because a commit's hash depends on its parents' hashes, so a commit can never be its own ancestor. Branches are simply named pointers (lightweight text files containing a commit SHA) that move forward as commits are added. HEAD is a special pointer indicating the current working position, usually pointing to a branch.
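
As a rough illustration of this graph, the following Python sketch models commits with parent pointers and branches as name-to-id entries. The Commit class, the commit and ancestors helpers, and the id formula (a hash of the parent ids plus the message) are assumptions made for the example; real Git hashes more fields.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple[str, ...] = ()  # ids of parent commits; empty for a root

    @property
    def oid(self) -> str:
        # Toy id: hash of parent ids plus message (real Git hashes more).
        payload = "\n".join(self.parents + (self.message,)).encode()
        return hashlib.sha1(payload).hexdigest()


commits: dict[str, Commit] = {}


def commit(message: str, *parents: str) -> str:
    c = Commit(message, tuple(parents))
    commits[c.oid] = c
    return c.oid


def ancestors(oid: str) -> set[str]:
    """All commits reachable by following parent edges, including oid."""
    seen: set[str] = set()
    stack = [oid]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(commits[current].parents)
    return seen


root = commit("initial")
a = commit("add feature", root)
b = commit("fix bug", root)
merge = commit("merge feature", b, a)

branches = {"main": merge, "feature": a}  # a branch is a name -> commit id
head = "main"                             # HEAD names the current branch

print(len(ancestors(branches[head])))        # 4
print(b in ancestors(branches["feature"]))   # False
```

Because each id is derived from the parent ids, a commit can only refer to commits that already exist, which is why cycles cannot form.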

This design makes branching almost free. With SHA-1 object ids, creating a branch writes a 41-byte file: 40 hexadecimal characters plus a newline. Nothing is copied and no network operation is needed. Switching branches updates the working directory to match the target commit's tree, rewriting only the files that differ.
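
The size of a branch can be checked with a small Python sketch that writes a reference file by hand in a temporary directory. The directory layout mirrors the loose-ref layout under .git/, but the commit id is a placeholder and no real repository is involved; Git may also keep references in a packed file, which this sketch ignores.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    heads = git_dir / "refs" / "heads"
    heads.mkdir(parents=True)

    tip = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"  # placeholder 40-hex id

    # "Creating a branch": write the commit id plus a newline to a new file.
    (heads / "feature").write_bytes(tip.encode("ascii") + b"\n")
    # HEAD usually holds a symbolic reference to the current branch.
    (git_dir / "HEAD").write_bytes(b"ref: refs/heads/feature\n")

    branch_size = (heads / "feature").stat().st_size
    head_target = (git_dir / "HEAD").read_text().strip().removeprefix("ref: ")
    resolved = (git_dir / head_target).read_text().strip()

print(branch_size)      # 41
print(resolved == tip)  # True
```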

Branching and Merging

Branches enable parallel work without interference. A developer creates a feature branch, makes commits, and later integrates the work back into the main branch. Git offers several ways to do this:

Fast-forward merge: When the target branch has no divergent commits, Git simply advances its pointer to the tip of the incoming branch. No merge commit is created.
Three-way merge: When both branches have diverged, Git finds the most recent common ancestor commit, then computes changes on both sides and combines them. If the same lines changed on both sides, a conflict occurs and the developer resolves it manually.
Rebase: Instead of merging, rebasing replays commits from one branch onto another, rewriting their parent pointers. This produces a linear history without merge commits, at the cost of rewriting commit hashes — which means rebased commits are different objects than their originals.
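
The first two cases can be sketched in Python under some simplifying assumptions. The function names are hypothetical, the history is a plain mapping from commit id to parent ids, the common ancestor's files are passed in directly rather than computed, and the merge works on whole files, whereas Git combines changes at the level of line ranges within a file.

```python
def ancestors(parents: dict[str, list[str]], tip: str) -> set[str]:
    """Every commit reachable from tip via parent edges, including tip."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def can_fast_forward(parents: dict[str, list[str]],
                     target: str, incoming: str) -> bool:
    """True when target's tip is already an ancestor of incoming's tip."""
    return target in ancestors(parents, incoming)


def three_way_merge(base: dict, ours: dict, theirs: dict):
    """File-level three-way merge; returns (merged files, conflict paths)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (or neither changed it)
            result = o
        elif o == b:      # only their side changed it
            result = t
        elif t == b:      # only our side changed it
            result = o
        else:             # both changed it differently
            conflicts.append(path)
            continue
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts


parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"]}
print(can_fast_forward(parents, "B", "C"))  # True: C builds directly on B
print(can_fast_forward(parents, "C", "D"))  # False: C and D have diverged

base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
ours = {"a.txt": "2", "b.txt": "1", "c.txt": "2"}
theirs = {"a.txt": "1", "b.txt": "3", "c.txt": "4"}
merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # {'a.txt': '2', 'b.txt': '3'}
print(conflicts)  # ['c.txt']
```

The comparison against the base is what distinguishes "one side changed this" from "both sides changed this", which a two-way comparison of the branch tips cannot do.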

Distributed Architecture

Git has no central server in its model — every clone is a full copy of the repository, including complete history. Remote servers like GitHub, GitLab, or Bitbucket are simply repositories configured as remote references. Collaboration works through push and pull operations that synchronize object databases between repositories.

Operation	Direction	What Happens
git clone	Remote to local	Copies all objects and refs to new local repo
git fetch	Remote to local	Downloads new objects/refs without modifying working tree
git pull	Remote to local	fetch + merge (or rebase if configured)
git push	Local to remote	Uploads local commits to remote; rejected if non-fast-forward
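
The operations in the table can be approximated with a toy model in Python. Repo, clone, fetch, and push here are hypothetical stand-ins, not Git's implementation: commit ids are short strings such as "c1", each commit has at most one parent, and the remote is always called "origin".

```python
class Repo:
    """Toy repository: commit id -> parent ids, plus branch refs."""

    def __init__(self) -> None:
        self.objects: dict[str, list[str]] = {}
        self.refs: dict[str, str] = {}

    def commit(self, oid: str, branch: str) -> None:
        parent = self.refs.get(branch)
        self.objects[oid] = [parent] if parent else []
        self.refs[branch] = oid

    def is_ancestor(self, old: str, new: str) -> bool:
        stack, seen = [new], set()
        while stack:
            c = stack.pop()
            if c == old:
                return True
            if c not in seen and c in self.objects:
                seen.add(c)
                stack.extend(self.objects[c])
        return False


def clone(remote: Repo) -> Repo:
    """Copy every object and ref into a new local repository."""
    local = Repo()
    local.objects = dict(remote.objects)
    local.refs = dict(remote.refs)
    return local


def fetch(local: Repo, remote: Repo) -> int:
    """Copy missing objects and update origin/* refs; return the count."""
    new = {k: v for k, v in remote.objects.items() if k not in local.objects}
    local.objects.update(new)
    for name, oid in remote.refs.items():
        local.refs[f"origin/{name}"] = oid  # local branches stay untouched
    return len(new)


def push(local: Repo, remote: Repo, branch: str) -> bool:
    """Update the remote branch only if that is a fast-forward."""
    old, new = remote.refs.get(branch), local.refs[branch]
    if old is not None and not local.is_ancestor(old, new):
        return False  # rejected: remote has commits we do not build on
    remote.objects.update(local.objects)
    remote.refs[branch] = new
    return True


origin = Repo()
origin.commit("c1", "main")
alice, bob = clone(origin), clone(origin)

alice.commit("c2", "main")
bob.commit("c3", "main")

print(push(alice, origin, "main"))  # True: c2 builds on c1
print(push(bob, origin, "main"))    # False: c3 does not contain c2
print(fetch(bob, origin))           # 1: bob receives c2
print(bob.refs["main"])             # c3: fetch left bob's branch alone
```

After the rejected push, bob would need to merge or rebase onto the fetched commit before pushing again, which is the step that git pull automates.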

The Staging Area

Git's three-stage workflow — working directory, staging area (index), committed history — is often misunderstood. The index is a binary file (.git/index) that holds a snapshot of what the next commit will contain. git add stages changes by updating the index. git commit turns the index into a commit object. This design allows developers to commit only a subset of their working directory changes — staging specific files or even specific lines — giving fine-grained control over commit granularity.
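
The three stages can be modeled with plain dictionaries in Python. This is only a sketch: the real index is a binary file that records blob ids and file metadata, while the hypothetical add, commit, and unstaged helpers below store file contents directly.

```python
working = {"app.py": "v2", "notes.txt": "draft", "README": "v1"}
index = {"app.py": "v1", "README": "v1"}   # state as of the last commit
history: list[dict[str, str]] = [dict(index)]


def add(path: str) -> None:
    """Stage one path: copy its working-directory content into the index."""
    index[path] = working[path]


def commit() -> dict[str, str]:
    """Record the index, not the working directory, as the next snapshot."""
    snap = dict(index)
    history.append(snap)
    return snap


def unstaged() -> list[str]:
    """Paths whose working content differs from what is staged."""
    return sorted(p for p in working if index.get(p) != working[p])


add("app.py")
snap = commit()
print(snap)        # {'app.py': 'v2', 'README': 'v1'}
print(unstaged())  # ['notes.txt']
```

Only app.py was staged, so the new snapshot picks up its change while notes.txt stays out of history until it is added.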

Data Integrity Through Hashing

Every object in Git is identified by the SHA-1 hash of its contents, prefixed with a short header giving the object's type and size. A commit hash is deterministic: given identical content, metadata, and parent hashes, two independently created commits on different machines will produce the same SHA-1. This makes data corruption detectable: if a single bit in a stored object flips, its hash no longer matches its name, and an integrity check such as git fsck will flag the object as corrupt. It also makes tampering with history visible: changing any commit changes its hash, which changes every descendant commit's hash, producing a completely different chain that diverges from the original.
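
The chaining effect can be demonstrated with a simplified commit id in Python. The commit_id and chain functions are hypothetical, and the hashed text contains only a tree name, a parent id, and a message; real commits also include author and committer lines, timestamps, and an object header.

```python
import hashlib
from typing import Optional


def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """Toy commit id: SHA-1 over the tree, the parent id, and the message."""
    body = f"tree {tree}\nparent {parent or ''}\n\n{message}\n".encode()
    return hashlib.sha1(body).hexdigest()


def chain(messages: list[str]) -> list[str]:
    """Build a linear history and return the id of each commit in order."""
    ids, parent = [], None
    for i, msg in enumerate(messages):
        parent = commit_id(f"tree-{i}", parent, msg)
        ids.append(parent)
    return ids


original = chain(["init", "add login", "fix typo"])
rebuilt = chain(["init", "add login", "fix typo"])
tampered = chain(["init", "add backdoor", "fix typo"])

print(original == rebuilt)         # True: same inputs give the same ids
print(original[0] == tampered[0])  # True: commits before the change match
print(original[1] == tampered[1])  # False: the altered commit gets a new id
print(original[2] == tampered[2])  # False: so does every descendant
```

The third commit has the same message and tree in both chains, yet its id differs because its parent id differs.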

Feature	Git	SVN	Mercurial
History model	Snapshots (DAG)	Diffs	Snapshots (DAG)
Distribution	Fully distributed	Centralized	Fully distributed
Branching cost	~41 bytes	Cheap server-side copy (each branch is a directory path)	~100 bytes
Merge tracking	Native (graph)	Manual / property-based	Native
Market dominance (2025)	>95% of open source	Declining	Niche

Git's combination of snapshot storage, cryptographic integrity, and cheap branching produced something the software industry had lacked: a version control system fast enough and flexible enough that developers actually use it for everything — not just final releases, but experimental features, documentation drafts, configuration files, and infrastructure code. Torvalds estimated it took ten days to make Git self-hosting. It took about three years to take over the industry.
