How Facial Recognition Technology Identifies People

A Face as a Mathematical Object

In 2014, Facebook's DeepFace system reached 97.35% accuracy on the Labeled Faces in the Wild (LFW) face-verification benchmark, nearly matching the roughly 97.5% that humans achieved on the same tightly cropped face images. That benchmark moment signaled that facial recognition had crossed from research curiosity into practical deployment. Today the technology processes hundreds of millions of face matches daily, from unlocking smartphones to scanning crowds at airports. Understanding how it works requires looking at faces the way computers do: not as portraits, but as geometric maps made of numbers.

The Four-Stage Pipeline

Most facial recognition systems run through the same four core stages, regardless of vendor or application.

Stage 1 — Face Detection

Before any recognition can happen, the system must find faces in an image. Early detectors, like the Viola-Jones algorithm from 2001, used Haar-like features and a cascade of classifiers to scan images at multiple scales. Modern systems use convolutional neural networks (CNNs) that simultaneously detect dozens of faces in a single video frame with sub-millisecond latency on dedicated hardware.
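The Viola-Jones detector keeps Haar-like features cheap with an integral image (summed-area table): once it is built, the pixel sum of any rectangle takes four lookups, whatever the rectangle's size. The sketch below, assuming NumPy and a grayscale image stored as a 2-D integer array, computes one two-rectangle feature. The helper names are illustrative, and a real detector evaluates thousands of such features inside a trained AdaBoost cascade, which is not shown here.

```python
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    """Summed-area table padded with a leading row and column of zeros,
    so ii[y, x] equals the sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left corner is (x, y): four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_vertical_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: top half minus bottom half.
    A dark eye band above brighter cheeks yields a strongly negative value."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)

# Toy 8x8 patch: a dark band (eyes) above a bright band (cheeks).
patch = np.vstack([np.full((4, 8), 30), np.full((4, 8), 200)])
ii = integral_image(patch)
print(two_rect_vertical_feature(ii, 0, 0, 8, 8))  # -5440
```

Because every rectangle sum costs the same, the detector can rescan the image at many window sizes without recomputing pixel totals.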

Stage 2 — Facial Alignment

Raw images vary in pose, lighting, and angle. Alignment normalizes these variables by locating facial landmarks (typically 68 to 468 anchor points, depending on the model) and rotating, scaling, and shifting the face into a canonical position. Key landmarks include the corners of the eyes, the tip of the nose, and the edges of the lips. This step dramatically improves downstream accuracy because the feature extractor always sees a consistently oriented face.
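One common way to perform this normalization is a similarity transform (rotation, uniform scale, and translation). The sketch below, assuming NumPy, derives that transform from just two landmarks, the eye centers, by treating 2-D points as complex numbers. The canonical eye positions for a 112×112 crop and the detected coordinates are hypothetical values chosen for illustration; production pipelines usually fit the transform by least squares over five or more landmarks.

```python
import numpy as np

def similarity_from_eyes(src_left, src_right, dst_left, dst_right):
    """2x3 similarity transform (rotation, uniform scale, translation) that maps
    two detected eye centers exactly onto two canonical eye positions."""
    s1, s2 = complex(*src_left), complex(*src_right)
    d1, d2 = complex(*dst_left), complex(*dst_right)
    a = (d2 - d1) / (s2 - s1)  # rotation and scale combined in one complex number
    b = d1 - a * s1            # translation
    return np.array([[a.real, -a.imag, b.real],
                     [a.imag,  a.real, b.imag]])

def apply_transform(M, points):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    pts = np.asarray(points, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]

# Hypothetical canonical eye positions for a 112x112 aligned crop.
CANON_LEFT, CANON_RIGHT = (38.0, 52.0), (74.0, 52.0)

# Illustrative detected eye centers in a tilted face (pixel coordinates).
left_eye, right_eye = (120.0, 210.0), (185.0, 180.0)

M = similarity_from_eyes(left_eye, right_eye, CANON_LEFT, CANON_RIGHT)
print(apply_transform(M, [left_eye, right_eye]))  # approximately [[38, 52], [74, 52]]
```

The resulting 2×3 matrix has the shape OpenCV's `cv2.warpAffine` expects, so it could be passed to that function to resample the whole face crop into the canonical frame.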

Stage 3 — Feature Extraction and Embedding

This is the core of modern facial recognition. A deep neural network, typically a convolutional backbone such as ResNet-50 or MobileNetV3 trained with a loss function such as ArcFace, processes the aligned face image and compresses it into a compact numerical vector called a face embedding. Typical embedding sizes range from 128 to 512 floating-point numbers. The network learns these numbers so that, in the resulting high-dimensional space, images of the same person cluster together and images of different people sit far apart. The embedding does not store the image itself; it stores a compact summary of identity-relevant features.
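Embeddings are usually L2-normalized so that they lie on a unit hypersphere, where the dot product of two embeddings equals their cosine similarity. In the sketch below, which assumes NumPy, random vectors stand in for the output of a real embedding network (no model is run), and the noise level used to simulate a second photo of the same person is an arbitrary choice.

```python
import numpy as np

EMBED_DIM = 512  # upper end of the typical 128-512 range

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length so dot products become cosine similarities."""
    return v / max(np.linalg.norm(v), eps)

rng = np.random.default_rng(0)
# Hypothetical stand-ins for network outputs: person A, a noisy second
# capture of person A, and an unrelated person B.
person_a = rng.normal(size=EMBED_DIM)
a_again = person_a + 0.3 * rng.normal(size=EMBED_DIM)
person_b = rng.normal(size=EMBED_DIM)

ea, ea2, eb = (l2_normalize(v) for v in (person_a, a_again, person_b))
print(f"same person:      {ea @ ea2:.3f}")  # close to 1
print(f"different person: {ea @ eb:.3f}")   # close to 0
```

In high dimensions, unrelated vectors are nearly orthogonal, which is why a well-trained embedding space can separate millions of identities.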

Stage 4 — Matching

The generated embedding is compared against a database of stored embeddings using a similarity or distance measure, most commonly cosine similarity or Euclidean distance. A similarity score above a configured threshold (or a distance below it) counts as a match. In 1:1 verification (is this the person they claim to be?), a single comparison is made. In 1:N identification (who is this person among millions?), approximate nearest-neighbor search libraries such as FAISS from Meta enable fast lookup across databases with hundreds of millions of entries.
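Both matching modes can be sketched in a few lines, assuming embeddings held as NumPy arrays. On unit vectors, squared Euclidean distance equals 2 - 2 × cosine similarity, so the two metrics rank candidates identically. The 0.5 threshold and the synthetic gallery are hypothetical; real thresholds are tuned per model to hit a target false match rate, and `identify` here uses exhaustive exact search rather than approximate search.

```python
import numpy as np

def normalize_rows(x):
    """L2-normalize a vector or each row of a matrix."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def verify(probe, claimed_template, threshold=0.5):
    """1:1 verification: one cosine comparison against the claimed identity."""
    score = float(normalize_rows(probe) @ normalize_rows(claimed_template))
    return score >= threshold, score

def identify(probe, gallery, ids, k=3):
    """1:N identification by exhaustive search: score the probe against
    every enrolled template and return the k best (id, score) pairs."""
    scores = normalize_rows(gallery) @ normalize_rows(probe)
    top = np.argsort(-scores)[:k]
    return [(ids[i], float(scores[i])) for i in top]

rng = np.random.default_rng(42)
gallery = rng.normal(size=(1000, 512))              # 1,000 synthetic enrolled identities
ids = [f"person_{i:04d}" for i in range(1000)]
probe = gallery[123] + 0.3 * rng.normal(size=512)   # new capture of person_0123

print(identify(probe, gallery, ids, k=1))
print(verify(probe, gallery[123]))  # genuine claim
print(verify(probe, gallery[7]))    # impostor claim
```

FAISS's `IndexFlatIP` performs this same exact inner-product search at scale; its approximate index types (for example IVF and HNSW variants) trade a small amount of recall for much faster lookups on very large galleries.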

Training: How Models Learn Faces

The neural networks powering facial recognition are trained on massive labeled datasets. MS-Celeb-1M contains roughly 10 million images of 100,000 celebrities. VGGFace2 holds 3.3 million images across 9,131 identities. During training, the model learns to minimize intra-class variation (same person in different conditions) while maximizing inter-class separation (different people).

Loss functions drive this learning. Plain softmax cross-entropy loss was the early standard. ArcFace loss, introduced in 2018, penalizes the angle between each embedding and its own identity's learned class center by an additive margin, which pulls embeddings of the same person tighter together and pushes different identities further apart, producing significantly better generalization. Models trained with ArcFace on large datasets routinely exceed 99% accuracy on standard benchmarks like LFW (Labeled Faces in the Wild).
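The ArcFace logit fits in a short function. After normalizing embeddings and per-identity class centers, the logit for class j is s·cos(θ_j); ArcFace replaces the true class's logit with s·cos(θ_y + m), so an embedding must sit at a smaller angle to its own center to reach the same loss. The sketch below assumes NumPy and uses s = 64 and m = 0.5, values commonly used with ArcFace. Practical implementations also add a safeguard for angles where θ_y + m would exceed π, which is omitted here, and training would compute gradients with a deep-learning framework rather than NumPy.

```python
import numpy as np

def arcface_logits(embeddings, class_weights, labels, s=64.0, m=0.5):
    """Additive angular margin (ArcFace) logits.
    embeddings: (batch, dim); class_weights: (num_classes, dim), one learned
    center per identity; labels: (batch,) integer identity labels."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = np.clip(x @ w.T, -1.0, 1.0)   # cos(theta_j) for every class j
    theta = np.arccos(cos)
    rows = np.arange(len(labels))
    out = cos.copy()
    # Add the margin to the angle of the true class only: cos(theta_y + m).
    out[rows, labels] = np.cos(theta[rows, labels] + m)
    return s * out

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy, computed stably by subtracting the row maximum."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 128))                        # 10 synthetic identities
labels = np.array([0, 1, 2, 3, 4])
emb = W[labels] + 1.5 * rng.normal(size=(5, 128))     # noisy embeddings near their centers

no_margin = arcface_logits(emb, W, labels, m=0.0)     # plain normalized softmax
with_margin = arcface_logits(emb, W, labels, m=0.5)
plain_loss = softmax_cross_entropy(no_margin, labels)
margin_loss = softmax_cross_entropy(with_margin, labels)
print(f"loss without margin: {plain_loss:.4f}, with margin: {margin_loss:.4f}")
```

The same embeddings incur a higher loss once the margin is applied, so training keeps pulling them toward their class centers even after plain softmax would consider them correctly classified.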

Accuracy by Demographic Group

High benchmark accuracy masks a critical problem: performance varies substantially across demographic groups. A 2019 NIST study evaluated 189 commercial algorithms and found demographic differentials in the majority of them.

Demographic group	False non-match rate
White males (baseline)	~0.1%
Black females	5–10× baseline
East Asian faces	2–5× baseline
Elderly (70+)	3–7× baseline
Children under 12	Up to 10× baseline

These disparities are widely attributed, at least in part, to training data imbalance. Datasets historically over-represented lighter-skinned male faces, so models learned those features most reliably. The consequences are not abstract: wrongful arrests in Detroit, New Orleans, and Georgia between 2020 and 2023 involved facial recognition misidentifications of Black people.
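A false non-match rate is the fraction of genuine comparisons (two images of the same person) that score below the match threshold. The sketch below, assuming NumPy, computes it per group with one shared threshold. The group names and scores are made-up illustrative values, not NIST data; they show how a group whose genuine scores sit lower, as happens when a model learned that group's features less reliably, ends up with more of them under the shared threshold.

```python
import numpy as np

def false_non_match_rate(genuine_scores, threshold):
    """Fraction of same-person comparisons wrongly rejected (score below threshold)."""
    scores = np.asarray(genuine_scores, dtype=float)
    return float(np.mean(scores < threshold))

def fnmr_by_group(scores_by_group, threshold):
    """Apply one shared threshold to every group's genuine scores."""
    return {g: false_non_match_rate(s, threshold) for g, s in scores_by_group.items()}

# Hypothetical genuine-pair similarity scores for two illustrative groups.
genuine = {
    "group_a": [0.82, 0.91, 0.77, 0.88, 0.35, 0.93, 0.86, 0.79, 0.90, 0.84],
    "group_b": [0.71, 0.45, 0.80, 0.38, 0.66, 0.42, 0.85, 0.74, 0.49, 0.69],
}
rates = fnmr_by_group(genuine, threshold=0.5)
print(rates)  # {'group_a': 0.1, 'group_b': 0.4}
```

Real evaluations use far larger samples, but the mechanism is the same: a single operating threshold turns a shift in one group's score distribution directly into a higher error rate for that group.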

Real-World Deployments

Border control: U.S. Customs and Border Protection uses facial recognition at over 200 airports, matching travelers against passport photos.
Law enforcement: Clearview AI scraped over 30 billion images from the public internet to build a police-accessible identification database.
Retail: Some chains use live recognition to alert staff to individuals previously linked to theft.
Financial services: Banks use it for remote account opening, comparing a live selfie video to government ID.

Performance Benchmarks

Benchmark	Top System Accuracy	Human Performance
LFW (standard)	99.86%	~99.2%
IJB-C (harder, unconstrained)	96–98%	~85%
MegaFace (1M distractors)	~98.7%	~91.4%
Real-world surveillance	60–85%	Varies widely

Adversarial Attacks and Evasion

Facial recognition is not foolproof. Researchers have demonstrated multiple evasion techniques:

Adversarial patches: Small printed patterns placed on or near the face can make detectors miss the face entirely or cause recognizers to misidentify it.
Infrared makeup: Certain infrared-reflective patterns disrupt NIR-based recognition used in low-light cameras.
Deepfake injection: Synthetic face videos injected into video streams can spoof remote verification systems.
Physical disguises: Hairstyle changes, glasses, and masks can substantially reduce match rates, especially when the system uses a strict (high) similarity threshold.
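Adversarial patches build on gradient-guided perturbations. The simplest such technique is the fast gradient sign method (FGSM), sketched below as a digital attack rather than a physical patch. It assumes NumPy and a hypothetical toy "embedding network" that is a single linear map, so the gradient of the match score is exact and easy to read; the dimensions, perturbation budget, and unnormalized inner-product score are illustrative choices, not a real recognizer.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM_IN, DIM_EMB = 256, 64

# Hypothetical toy "embedding network": a single linear map, embedding = W @ x.
W = rng.normal(size=(DIM_EMB, DIM_IN)) / np.sqrt(DIM_IN)

def match_score(x, template):
    """Unnormalized match score: inner product of the input's embedding with a template."""
    return float(template @ (W @ x))

def fgsm_evasion(x, template, eps):
    """One FGSM-style step: change every input value by at most eps in the
    direction that lowers the match score. For this linear toy model the
    gradient of match_score with respect to x is exactly W.T @ template."""
    grad = W.T @ template
    return x - eps * np.sign(grad)

face = rng.normal(size=DIM_IN)   # stand-in for a flattened face image
template = W @ face              # template enrolled from the same input
eps = 0.25
adv = fgsm_evasion(face, template, eps)

print(f"original score:    {match_score(face, template):.2f}")
print(f"adversarial score: {match_score(adv, template):.2f}")
```

For this linear model the score drops by exactly eps times the L1 norm of the gradient, so many small per-pixel changes add up to a large change in the score. Patch attacks apply the same gradient-guided idea to deep networks but concentrate the change in a printable region.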

Regulation and the Road Ahead

Facial recognition sits at the intersection of capability and consent. The EU AI Act, finalized in 2024, prohibits real-time remote biometric identification in publicly accessible spaces for law enforcement purposes except in narrowly defined cases, and classifies most other remote biometric identification systems as high-risk. Several U.S. cities, including San Francisco, Boston, and Portland, have passed municipal bans on government use. Illinois' Biometric Information Privacy Act (BIPA) allows individuals to sue companies that collect biometric data without consent; the resulting settlements include $650 million paid by Facebook, and Google and other companies have also faced BIPA suits.

The technology keeps improving. Next-generation systems increasingly handle partial occlusions, low-resolution images, and unusual angles that would have defeated 2018-era models. The capability gap between what facial recognition can do and what societies have decided it should do remains one of the defining technology policy tensions of the 2020s.