How Git Version Control Tracks Every Change in a Codebase

Git stores code history as a directed acyclic graph of snapshots, not diffs. Learn how commits, branches, merges, and distributed workflows actually function under the hood.

The InfoNexus Editorial Team | May 17, 2026 | 9 min read

Built to Survive a Linux Kernel Crisis

Git was created by Linus Torvalds in April 2005 in just ten days. The impetus was urgent: BitKeeper, the proprietary version control system used to manage the Linux kernel's 6 million lines of code, had revoked the kernel team's free license. Torvalds, dissatisfied with every existing alternative, wrote his own. His design requirements were uncompromising: it had to be fast, support distributed workflows with no central server, and guarantee data integrity through cryptographic hashing. Git met all three goals, and those goals explain why it now underpins over 300 million repositories on GitHub alone.

Objects, Not Diffs

Most people think of version control as storing the differences between file versions — a sequence of edits. Git works differently. It stores snapshots of the complete file tree at each point in time. When you make a commit, Git takes a picture of every file in your repository and stores that state as a tree of objects in the hidden .git/ directory.

Git uses four object types, each identified by its SHA-1 (and increasingly SHA-256) hash:

  • Blob: Stores the raw content of a single file. No filename, no metadata — just bytes.
  • Tree: Stores a directory listing: filenames, associated blob hashes, and file permissions. Trees can reference other trees (subdirectories).
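  • Commit: Points to a tree (the root of the snapshot), zero or more parent commit hashes (none for the first commit, two or more for a merge), and metadata (author, timestamp, message).
  • Tag: An annotated tag is a named object that points to another object, usually a commit, with a message and an optional signature. Lightweight tags are plain references and create no object.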

If a file has not changed between two commits, both commits' trees point to the same blob object. Git does not duplicate unchanged content. This makes Git's storage surprisingly efficient despite being snapshot-based rather than diff-based. Git also uses delta compression in packfiles — grouped storage of objects — when packing repositories, bringing storage down further for large histories.
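As a rough illustration of content addressing, the Python sketch below hashes blobs the way Git does in SHA-1 repositories: the object type, a space, the byte length, a NUL byte, then the content. ObjectStore is a hypothetical in-memory stand-in for .git/objects, and the file names are made up for the example. Real Git also zlib-compresses each object and may pack them.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    # Git hashes a short header ("<type> <size>" plus a NUL byte) followed by the body.
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


class ObjectStore:
    """Hypothetical in-memory object database keyed by object id."""

    def __init__(self):
        self.objects = {}

    def put(self, obj_type: str, body: bytes) -> str:
        oid = git_object_id(obj_type, body)
        # Identical content maps to the same id, so it is stored only once.
        self.objects.setdefault(oid, (obj_type, body))
        return oid


store = ObjectStore()

# First snapshot: two files.
readme_v1 = store.put("blob", b"hello\n")
main_v1 = store.put("blob", b"print('v1')\n")

# Second snapshot: README unchanged, main.py edited.
readme_v2 = store.put("blob", b"hello\n")
main_v2 = store.put("blob", b"print('v2')\n")

# The id of an empty blob, as any SHA-1 Git repository would compute it.
empty_blob = git_object_id("blob", b"")

print(readme_v1 == readme_v2)   # the unchanged file reuses one blob
print(len(store.objects))       # number of distinct blobs stored
```

Two snapshots of two files produce only three blobs here, because the unchanged README hashes to the same id both times.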

The Directed Acyclic Graph

A Git repository's history is a directed acyclic graph (DAG). Each commit node has zero or more parent pointers, which are directed edges pointing backward in time: the first commit has none, an ordinary commit has one, and a merge commit has two or more. The graph is acyclic, so no commit can be its own ancestor. Branches are simply named pointers (lightweight text files containing a commit SHA) that move forward as commits are added. HEAD is a special pointer indicating the current working position, usually pointing to a branch.

This design makes branching almost free. Creating a branch writes a single 41-byte file: a 40-character SHA-1 hash plus a newline. Switching branches updates the working directory to match the target commit's tree, rewriting only the files that differ. Neither operation copies the repository or touches the network.
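The pointer model can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Git is implemented: Repo and Commit are hypothetical names, commit ids are sequential strings instead of hashes, and there is no working directory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple = ()  # parent commit ids; empty for the first commit


class Repo:
    """Toy repository: a commit table plus named pointers into it."""

    def __init__(self):
        self.commits = {}    # commit id -> Commit
        self.branches = {}   # branch name -> commit id
        self.head = "main"   # HEAD names the current branch

    def commit(self, message: str) -> str:
        tip = self.branches.get(self.head)
        parents = (tip,) if tip else ()
        cid = f"c{len(self.commits) + 1}"  # stand-in for a real hash
        self.commits[cid] = Commit(message, parents)
        self.branches[self.head] = cid     # the current branch moves forward
        return cid

    def branch(self, name: str) -> None:
        # Creating a branch only records one more pointer.
        self.branches[name] = self.branches[self.head]

    def switch(self, name: str) -> None:
        self.head = name

    def log(self, name: str) -> list:
        # Walk first-parent links back to the first commit.
        out, cid = [], self.branches[name]
        while cid:
            out.append(cid)
            parents = self.commits[cid].parents
            cid = parents[0] if parents else None
        return out


repo = Repo()
repo.commit("initial")
repo.commit("add readme")
repo.branch("feature")
repo.switch("feature")
repo.commit("feature work")

# A branch file holds 40 hex characters plus a newline.
ref_file_contents = "a" * 40 + "\n"

print(repo.branches)
print(repo.log("feature"))
print(len(ref_file_contents))
```

After the last commit, main still points at the second commit while feature has moved on to the third; nothing else in the commit table changed.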

Branching and Merging

Branches enable parallel work without interference. A developer creates a feature branch, makes commits, and later merges back into the main branch. Git offers several merge strategies:

  • Fast-forward merge: When the target branch has no divergent commits, Git simply advances its pointer to the tip of the incoming branch. No merge commit is created.
  • Three-way merge: When both branches have diverged, Git finds the most recent common ancestor commit, then computes changes on both sides and combines them. If the same lines changed on both sides, a conflict occurs and the developer resolves it manually.
  • Rebase: Instead of merging, rebasing replays commits from one branch onto another, rewriting their parent pointers. This produces a linear history without merge commits, at the cost of rewriting commit hashes — which means rebased commits are different objects than their originals.
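The choice between a fast-forward and a three-way merge comes down to ancestry in the commit graph. The Python sketch below is a simplified illustration: the history dictionary and the function names are hypothetical, and the ancestor search is a plain graph walk rather than Git's optimized algorithm.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit`, including the commit itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }


def merge_kind(parents, target, incoming):
    """Classify what merging `incoming` into `target` would require."""
    if incoming in ancestors(parents, target):
        return "already up to date"
    if target in ancestors(parents, incoming):
        return "fast-forward"   # target only needs its pointer advanced
    return "three-way"          # both sides have commits the other lacks


# Hypothetical history: commit id -> list of parent ids.
history = {
    "A": [], "B": ["A"], "C": ["B"],   # main:    A <- B <- C
    "D": ["B"], "E": ["D"],            # feature: B <- D <- E
}

ff = merge_kind(history, "B", "E")        # main still at B
diverged = merge_kind(history, "C", "E")  # main has moved to C
base = merge_bases(history, "C", "E")

print(ff, diverged, base)
```

When main is still at B, merging E only advances the pointer. Once main has moved to C, both sides have diverged and B is the common ancestor a three-way merge would compare against.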

Distributed Architecture

Git has no central server in its model — every clone is a full copy of the repository, including complete history. Remote servers like GitHub, GitLab, or Bitbucket are simply repositories configured as remote references. Collaboration works through push and pull operations that synchronize object databases between repositories.

Operation | Direction       | What Happens
git clone | Remote to local | Copies all objects and refs to new local repo
git fetch | Remote to local | Downloads new objects/refs without modifying working tree
git pull  | Remote to local | fetch + merge (or rebase if configured)
git push  | Local to remote | Uploads local commits to remote; rejected if non-fast-forward
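The fetch and push rows above can be modelled roughly in Python. Everything here is an assumption made for illustration: repositories are plain dictionaries, commit ids are single letters, and the two functions only mimic the behaviour described, not Git's wire protocol.

```python
def reachable(commits, tip):
    """Ids reachable from `tip` by following parent links (tip included)."""
    seen, stack = set(), [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid])
    return seen


def fetch(local, remote, branch):
    """Copy commits the local repo lacks; move only the remote-tracking ref."""
    for cid in reachable(remote["commits"], remote["refs"][branch]):
        local["commits"].setdefault(cid, remote["commits"][cid])
    local["refs"][f"origin/{branch}"] = remote["refs"][branch]


def push(local, remote, branch):
    """Upload commits; refuse unless the remote tip is an ancestor of ours."""
    new_tip = local["refs"][branch]
    old_tip = remote["refs"].get(branch)
    if old_tip is not None and old_tip not in reachable(local["commits"], new_tip):
        return "rejected (non-fast-forward)"
    for cid in reachable(local["commits"], new_tip):
        remote["commits"].setdefault(cid, local["commits"][cid])
    remote["refs"][branch] = new_tip
    return "ok"


# Hypothetical starting point: the local repo is a fresh clone of the remote.
remote = {"commits": {"A": [], "B": ["A"]}, "refs": {"main": "B"}}
local = {"commits": dict(remote["commits"]),
         "refs": {"main": "B", "origin/main": "B"}}

# A colleague pushes C to the remote while we commit D locally.
remote["commits"]["C"] = ["B"]
remote["refs"]["main"] = "C"
local["commits"]["D"] = ["B"]
local["refs"]["main"] = "D"

first_push = push(local, remote, "main")   # remote tip C is unknown to us

fetch(local, remote, "main")               # brings in C, leaves main alone
main_after_fetch = local["refs"]["main"]

# Merge origin/main into main by hand: M has both D and C as parents.
local["commits"]["M"] = ["D", "C"]
local["refs"]["main"] = "M"
second_push = push(local, remote, "main")

print(first_push, main_after_fetch, second_push)
```

The first push is refused because the remote tip is not part of the local history. After fetching and merging, the remote tip becomes an ancestor of the local tip and the push goes through.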

The Staging Area

Git's three-stage workflow — working directory, staging area (index), committed history — is often misunderstood. The index is a binary file (.git/index) that holds a snapshot of what the next commit will contain. git add stages changes by updating the index. git commit turns the index into a commit object. This design allows developers to commit only a subset of their working directory changes — staging specific files or even specific lines — giving fine-grained control over commit granularity.
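A minimal Python sketch of the three stages, under loud assumptions: the index is modelled as a dictionary from path to blob id rather than Git's binary format, commits are plain dictionaries, and the file names are invented.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # Same header-plus-content hashing Git uses for blobs in SHA-1 repositories.
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()


working_dir = {"app.py": b"print('new')\n", "notes.txt": b"draft\n"}
index = {}     # path -> blob id: what the next commit will contain
commits = []   # each commit freezes a copy of the index


def add(path: str) -> None:
    """Stage the file's current content by recording its blob id in the index."""
    index[path] = blob_id(working_dir[path])


def commit(message: str) -> None:
    """Turn the current index into a snapshot; the working directory is not consulted."""
    commits.append({"message": message, "snapshot": dict(index)})


add("app.py")
staged_id = index["app.py"]

# Keep editing after staging: the index still holds the staged version.
working_dir["app.py"] = b"print('newer')\n"

commit("Update app")
snapshot = commits[-1]["snapshot"]

print(sorted(snapshot))                  # only the staged path is committed
print(snapshot["app.py"] == staged_id)   # and at its staged content
```

The commit contains app.py as it was when staged. The later edit and the never-staged notes.txt remain working directory changes only.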

Data Integrity Through Hashing

Every object in Git is identified by the SHA-1 hash of its contents. A commit hash is deterministic — given identical content, metadata, and parent hashes, two independently created commits on different machines will produce the same SHA-1. This makes data corruption detectable: if a single bit in a stored object flips, its hash no longer matches, and Git will report the repository as corrupt. It also makes tampering with history visible: changing any commit changes its hash, which changes every descendant commit's hash, producing a completely different chain that diverges from the original.
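The chain effect can be demonstrated with a simplified Python sketch. The commit body here is an assumption: it keeps only the tree, parent, and message fields and omits the author and committer lines a real commit carries, and the tree ids are placeholder strings.

```python
import hashlib


def commit_id(tree: str, parents: list, message: str) -> str:
    # Simplified commit body: real commits also include author and committer lines.
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents] + ["", message]
    body = "\n".join(lines).encode()
    header = f"commit {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def build_chain(messages: list) -> list:
    """Create a linear history; each commit's id depends on its parent's id."""
    ids, parents = [], []
    for n, msg in enumerate(messages):
        cid = commit_id(f"tree-{n}", parents, msg)  # placeholder tree ids
        ids.append(cid)
        parents = [cid]
    return ids


original = build_chain(["init", "add parser", "fix bug"])
rebuilt = build_chain(["init", "add parser", "fix bug"])
tampered = build_chain(["init", "add parser (edited)", "fix bug"])

print(original == rebuilt)            # identical inputs give identical ids
print(original[0] == tampered[0])     # commits before the edit are unaffected
print(original[2] == tampered[2])     # descendants of the edit get new ids
```

Editing the second commit's message changes its id, and because the third commit embeds that id as its parent, the third id changes as well even though its own message did not.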

Feature                 | Git                  | SVN                      | Mercurial
History model           | Snapshots (DAG)      | Diffs                    | Snapshots (DAG)
Distribution            | Fully distributed    | Centralized              | Fully distributed
Branching cost          | ~41 bytes            | Full copy of working dir | ~100 bytes
Merge tracking          | Native (graph)       | Manual / property-based  | Native
Market dominance (2025) | >95% of open source  | Declining                | Niche

Git's combination of snapshot storage, cryptographic integrity, and cheap branching produced something the software industry had lacked: a version control system fast enough and flexible enough that developers actually use it for everything — not just final releases, but experimental features, documentation drafts, configuration files, and infrastructure code. Torvalds estimated it took ten days to make Git self-hosting. It took about three years to take over the industry.
