Deepfakes: The Technology Behind Synthetic Media and How to Detect It

Understand how deepfake technology uses neural networks to generate synthetic media, the threats it poses, and the detection methods being developed to counter it.

The InfoNexus Editorial Team | May 20, 2026 | 9 min read

When Seeing Stopped Being Believing

In December 2017, a Reddit user operating under the name "deepfakes" posted manipulated videos that swapped celebrities' faces onto other bodies using open-source machine learning tools. The posts were removed, but the technology had been released into the wild. Within months, freely available software enabled anyone with a consumer-grade graphics card to produce convincing face-swapped video. By 2023, the number of deepfake videos online had surpassed 500,000 — a 550% increase from 2019, according to cybersecurity firm Sensity AI.

The term "deepfake" combines "deep learning" and "fake." It broadly refers to any synthetic media generated or manipulated by artificial intelligence, including face swaps, voice clones, lip-synced video, and fully generated images of people who do not exist.

The Neural Network Architecture

Most face-swap systems rely on one of two architectures: autoencoders or generative adversarial networks (GANs). Each approaches the problem differently, but both learn to map one person's facial features onto another's. Diffusion models, a third and more recent family, generate images by progressively removing noise.

| Architecture | Mechanism | Strengths | Weaknesses |
| --- | --- | --- | --- |
| Autoencoder | Two networks share an encoder but use separate decoders; the encoder learns a compressed facial representation | Requires less training data; faster to train | Lower quality at high resolutions; visible artifacts at face boundaries |
| GAN (Generative Adversarial Network) | A generator creates fake images while a discriminator tries to detect them; they improve through competition | Higher visual quality; better at fine details | Training instability; requires more computational resources |
| Diffusion Model | Progressively denoises random noise into a target image guided by conditioning inputs | Highest quality output; flexible conditioning | Slow generation; very high compute cost |

The autoencoder approach dominated early deepfake tools like FaceSwap and DeepFaceLab. The shared encoder learns a generalized representation of human faces, while each decoder specializes in reconstructing one specific person's face. Feeding person A's face through person B's decoder produces the swap.
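
The routing described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of FaceSwap or DeepFaceLab: the class name `FaceSwapAutoencoder`, the layer sizes, and the random untrained weights are all assumptions chosen to show the data flow only.

```python
import random

def make_linear(n_in, n_out, rng):
    """Random n_out x n_in weight matrix; a stand-in for a trained network."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

def apply_linear(weights, vec):
    """Matrix-vector product: one linear layer with no bias or activation."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

class FaceSwapAutoencoder:
    """Hypothetical model: one shared encoder, one decoder per identity."""

    def __init__(self, face_dim=16, latent_dim=4, identities=("A", "B"), seed=0):
        rng = random.Random(seed)
        self.encoder = make_linear(face_dim, latent_dim, rng)
        self.decoders = {
            name: make_linear(latent_dim, face_dim, rng) for name in identities
        }

    def encode(self, face):
        # Shared by every identity: compresses a face to a small latent vector.
        return apply_linear(self.encoder, face)

    def decode(self, latent, identity):
        # Identity-specific: rebuilds a face in the style of one person.
        return apply_linear(self.decoders[identity], latent)

model = FaceSwapAutoencoder()
face_a = [0.1 * i for i in range(16)]   # placeholder for a flattened crop of person A
latent = model.encode(face_a)
rebuilt_a = model.decode(latent, "A")   # path used when training on person A
swapped = model.decode(latent, "B")     # the swap: A's latent through B's decoder
```

In a real system each decoder would be trained to reconstruct only its own person's faces, which is why sending person A's latent vector through decoder B yields B's appearance with A's pose and expression.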

GANs, introduced by Ian Goodfellow and colleagues in 2014, work through adversarial training between two networks. The generator learns to produce increasingly realistic forgeries while the discriminator learns to detect flaws. This arms race between the two networks produces outputs that can fool human observers.
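
The competition can be made concrete by looking at the two loss values each network tries to minimize. The sketch below assumes the common binary cross-entropy formulation with the non-saturating generator loss; the function names and the sample discriminator scores are illustrative, not taken from any particular deepfake tool.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Cross-entropy loss: real images should score near 1, generated ones near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating loss: low when the discriminator scores fakes near 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Early in training (made-up scores): the discriminator separates the classes easily.
early = (discriminator_loss([0.9, 0.95], [0.1, 0.05]), generator_loss([0.1, 0.05]))

# Idealized equilibrium: the discriminator can only guess, scoring everything 0.5.
balanced = (discriminator_loss([0.5, 0.5], [0.5, 0.5]), generator_loss([0.5, 0.5]))
```

Here `early` pairs a small discriminator loss with a large generator loss, while `balanced` shows the point at which the discriminator can no longer tell the two sources apart.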

Beyond Face Swaps: The Full Spectrum

Deepfake technology extends well beyond putting one face on another body. The toolkit has expanded rapidly:

  • Voice cloning: Systems like those developed by Resemble AI or ElevenLabs can replicate a person's voice from as little as three seconds of audio
  • Lip sync manipulation: Software can alter mouth movements in existing video to match a new audio track
  • Full body puppetry: Motion transfer systems map one person's body movements onto video of another
  • Text-to-video generation: Models like Sora generate entire video sequences from text prompts, producing people and scenes that never existed

Audio deepfakes may pose the greatest near-term threat. Voice authentication systems used by banks and customer service centers have been fooled by cloned voices in documented tests. A 2023 study by University College London found that human listeners correctly identified AI-generated speech only 73% of the time.

Real-World Damage Already Inflicted

Deepfakes have moved from curiosity to weapon. Non-consensual intimate imagery constitutes an estimated 96% of all deepfake videos online, according to a 2019 report by Deeptrace, the company later renamed Sensity AI. Targets are overwhelmingly women. Political manipulation is the second major vector.

Notable incidents include:

  • In 2022, a deepfake video of Ukrainian President Volodymyr Zelensky ordering soldiers to surrender circulated on social media during the Russian invasion
  • In 2024, AI-generated robocalls impersonating U.S. President Joe Biden urged New Hampshire voters not to participate in the primary election
  • In 2019, fraudsters used a cloned version of a CEO's voice to authorize a wire transfer, stealing $243,000 from a British energy company
  • In 2024, a Hong Kong finance worker was tricked into transferring $25 million after a video call in which deepfaked colleagues appeared to authorize the transaction

Detection Methods and Their Limits

Detecting deepfakes is fundamentally an arms race. As generation improves, detection must evolve in parallel. Current detection approaches fall into several categories:

| Detection Method | How It Works | Accuracy Range |
| --- | --- | --- |
| Biological signal analysis | Detects absence of natural patterns like blinking, pulse-related skin color changes | 60–85% (easily defeated by newer models) |
| Frequency domain analysis | Examines spectral artifacts introduced by neural network upsampling | 75–92% |
| Facial landmark inconsistency | Identifies geometric distortions in facial proportions during movement | 70–88% |
| Provenance-based (C2PA/Content Credentials) | Embeds cryptographic metadata at capture time to verify authenticity | High for participating platforms; zero for content without metadata |
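
As a rough illustration of the frequency domain row, the sketch below measures how much of a signal's energy sits in its upper frequency band. It is a toy under stated assumptions: it works on a single 1-D row of pixel values rather than a 2-D image spectrum, the `high_frequency_ratio` helper and its cutoff are hypothetical, and the "artifact" is a simple alternating pattern standing in for periodic upsampling residue. Real detectors typically feed full spectra to a trained classifier.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Naive O(n^2) discrete Fourier transform; magnitudes for bins 0..n//2."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        mags.append(abs(coeff))
    return mags

def high_frequency_ratio(signal, cutoff=0.5):
    """Share of non-DC spectral energy in the bins above `cutoff` of the band."""
    energy = [m * m for m in dft_magnitudes(signal)[1:]]  # drop the DC bin
    total = sum(energy)
    if total == 0:
        return 0.0
    start = int(len(energy) * cutoff)
    return sum(energy[start:]) / total

n = 64
# A smooth intensity profile: two slow cycles across the row.
smooth = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
# The same profile plus a faint pixel-to-pixel alternation (toy artifact).
with_artifact = [s + 0.2 * (-1) ** t for t, s in enumerate(smooth)]

ratio_smooth = high_frequency_ratio(smooth)
ratio_artifact = high_frequency_ratio(with_artifact)
```

The alternation is hard to see in the raw values but shows up as a clear rise in `ratio_artifact` relative to `ratio_smooth`, which is the kind of cue a frequency-based detector looks for.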

Machine learning detectors trained on current deepfake methods often fail when confronted with outputs from newer generation tools. A detector trained on GAN outputs may miss diffusion-model forgeries entirely. Generalization remains the central unsolved problem.

The Provenance Approach

Rather than asking "is this fake?" after the fact, the provenance approach asks "can we prove this is real?" It is promoted by the Adobe-led Content Authenticity Initiative and standardized by the Coalition for Content Provenance and Authenticity (C2PA), whose members include Adobe and the BBC. The C2PA standard embeds cryptographic signatures at the point of capture, creating a verifiable chain of provenance. If a photo was taken on a C2PA-enabled camera, any subsequent manipulation that is not itself recorded and signed breaks the signature chain. This approach sidesteps the detection arms race entirely, but it only works when the original content is captured with compliant hardware.
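
The underlying idea, binding content to a signature so that later changes are detectable, can be sketched with Python's standard library. This is not the C2PA manifest format: C2PA relies on certificate-backed public-key signatures, whereas this example substitutes an HMAC with a shared secret purely to stay self-contained. `DEVICE_KEY`, `sign_capture`, `verify`, and the metadata fields are all hypothetical.

```python
import hashlib
import hmac
import json

# Assumption: stands in for a signing key held securely inside the camera.
DEVICE_KEY = b"hypothetical-camera-secret"

def sign_capture(image_bytes, metadata):
    """Bind the image hash and capture metadata together under one signature."""
    manifest = {
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "metadata": metadata,
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()
    return manifest

def verify(image_bytes, manifest):
    """True only if the signature is valid and the image matches its recorded hash."""
    claimed = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest.get("signature", "")):
        return False
    return hashlib.sha256(image_bytes).hexdigest() == claimed.get("sha256")

original = b"raw sensor bytes"                      # placeholder image data
manifest = sign_capture(original, {"device": "example-cam"})

untouched_ok = verify(original, manifest)           # image and manifest unchanged
edited_ok = verify(original + b" edited", manifest) # pixels altered after capture
forged = dict(manifest, metadata={"device": "other-cam"})
forged_ok = verify(original, forged)                # metadata altered after signing
```

Only the untouched pair verifies. Changing either the image bytes or the signed metadata causes verification to fail, which mirrors how an unsigned edit breaks the provenance chain.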

Legal and Regulatory Responses

Legislation has struggled to keep pace. The European Union's AI Act, enacted in 2024, requires that AI-generated content be labeled as such. Several U.S. states have passed laws specifically targeting deepfake pornography and election interference. China's Deep Synthesis Regulations, effective January 2023, require watermarking and user consent for synthetic media.

Enforcement is the bottleneck. Deepfakes can be generated anywhere, distributed through encrypted channels, and consumed globally before any authority becomes aware of their existence. Legal frameworks address the problem after harm has occurred. Prevention requires technological and institutional solutions that do not yet exist at scale.

The technology itself is neutral. The same systems that create malicious deepfakes also enable legitimate applications in film production, accessibility (voice synthesis for people who have lost speech), language dubbing, and historical reconstruction. Drawing the line between beneficial and harmful use remains one of the defining regulatory challenges of the current decade.
