What Is DNA and How It Stores the Instructions for Life
DNA is the molecule of heredity, encoding the instructions for every protein in every living organism. Understanding its structure reveals how biological information is stored, copied, and read.
The Molecule of Heredity
DNA (deoxyribonucleic acid) is the molecule that carries the genetic instructions for the development, functioning, growth, and reproduction of all known living organisms and many viruses. It is one of the most remarkable molecules in nature — encoding billions of years of evolutionary history, carrying information from parent to offspring with extraordinary fidelity, and directing the construction of every protein in every cell.
The double helix structure of DNA, proposed in 1953 by James Watson and Francis Crick with the aid of X-ray diffraction data from Rosalind Franklin and Maurice Wilkins, is among the most consequential scientific achievements of the twentieth century. It immediately suggested how genetic information could be stored, copied, and transmitted, launching the modern era of molecular biology and genetics.
The Structure of DNA
DNA is a polymer — a long chain of repeating units called nucleotides. Each nucleotide consists of three components: a phosphate group, a deoxyribose sugar, and one of four nitrogen-containing bases — adenine (A), thymine (T), guanine (G), and cytosine (C).
The DNA double helix consists of two antiparallel strands wound around each other. The strands are held together by hydrogen bonds between complementary base pairs: A always pairs with T (forming two hydrogen bonds), and G always pairs with C (forming three hydrogen bonds). This strict complementarity, known as Watson-Crick base pairing, explains Chargaff's rules (the observation that the amount of A in DNA equals the amount of T, and the amount of G equals the amount of C). It is also the key to understanding how DNA is copied and how information is encoded.
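The pairing rule can be sketched in a few lines of Python. This is a minimal illustration, not a bioinformatics tool: the function names and the example sequence are hypothetical, and the hydrogen-bond counts are the ones given above.

```python
# Watson-Crick partners and hydrogen bonds per base pair, as described in the text.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'->3'.

    The partner runs antiparallel, so each base is complemented
    and the order is reversed.
    """
    return "".join(PAIR[base] for base in reversed(strand))


def hydrogen_bonds(strand: str) -> int:
    """Count the hydrogen bonds holding this strand to its partner."""
    return sum(H_BONDS[base] for base in strand)


top = "ATGCGT"  # hypothetical example sequence, written 5'->3'
bottom = reverse_complement(top)
print(bottom)               # ACGCAT
print(hydrogen_bonds(top))  # 15
```

Because every A on one strand faces a T on the other, counting bases across both strands always gives equal numbers of A and T, and of G and C.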
The double helix is approximately 2 nanometers wide, but in human cells it is packed remarkably tightly. If all the DNA in a single human cell were stretched out, it would be roughly 2 meters long — yet it fits inside a cell nucleus about 6 micrometers in diameter through progressive levels of coiling and compaction around protein spools called histones.
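The 2-meter figure can be checked with simple arithmetic. The sketch below assumes a rise of about 0.34 nanometers per base pair (the usual value for the common B form of DNA) and two copies of a roughly 3.2-billion-base-pair genome per cell; both numbers are approximations introduced here for illustration.

```python
RISE_PER_BP_NM = 0.34        # assumed rise per base pair in B-form DNA
HAPLOID_BP = 3.2e9           # approximate size of one copy of the human genome
DIPLOID_BP = 2 * HAPLOID_BP  # most body cells carry two copies
NUCLEUS_DIAMETER_UM = 6.0    # nucleus diameter quoted in the text

# Convert nanometers to meters, then compare with the nucleus diameter.
length_m = DIPLOID_BP * RISE_PER_BP_NM * 1e-9
ratio = length_m / (NUCLEUS_DIAMETER_UM * 1e-6)

print(f"Stretched length: {length_m:.2f} m")
print(f"Length relative to nucleus diameter: {ratio:,.0f}x")
```

Under these assumptions the DNA comes out a little over 2 meters long, several hundred thousand times the diameter of the nucleus that holds it.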
How DNA Stores Information
Genetic information is encoded in the sequence of bases along a DNA strand. The four bases act like an alphabet with four letters; genes are words written in this alphabet. A typical human gene spans thousands to tens of thousands of base pairs, and the largest exceed two million. The human genome contains approximately 3.2 billion base pairs per haploid set of 23 chromosomes; most body cells carry two sets, arranged as 23 pairs of chromosomes. The genome encodes roughly 20,000 protein-coding genes, though only about 1.5 percent of it directly encodes proteins. Some of the remainder regulates when and where genes are expressed; much of it consists of introns and repetitive sequences whose function, if any, is still debated.
The genetic code specifies how the sequence of bases translates into the sequence of amino acids in a protein. The code is read in triplets called codons — each set of three consecutive bases specifies one of 20 amino acids (or a start/stop signal). With four bases and three-letter codons, there are 64 possible codons, more than enough to encode 20 amino acids — making the code redundant (multiple codons specify the same amino acid).
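The counting argument can be made concrete by enumerating every codon. The sketch below builds the standard genetic code from its conventional compact form (one amino-acid letter per codon, with bases ordered T, C, A, G, and "*" marking a stop). The variable names are illustrative choices.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" is a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# product() varies the last position fastest: TTT, TTC, TTA, TTG, TCT, ...
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(codons, AMINO_ACIDS))

print(len(codons))  # 64

# How many codons map to each amino acid (or to "stop")?
degeneracy = Counter(CODON_TABLE.values())
print(len(degeneracy) - 1)  # 20 amino acids, not counting "stop"
print(degeneracy["L"])      # 6 codons for leucine
print(degeneracy["W"])      # 1 codon for tryptophan
print(degeneracy["*"])      # 3 stop codons
```

The redundancy is uneven: leucine has six codons while tryptophan has only one, and three of the 64 codons signal "stop" rather than an amino acid.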
DNA Replication: Copying the Blueprint
Before a cell divides, it must duplicate its entire genome so both daughter cells receive a complete copy. DNA replication exploits the complementary structure of the double helix: each strand serves as a template for a new complementary strand.
The process begins at specific sequences called origins of replication. The enzyme helicase unwinds and separates the two strands, creating a replication fork. DNA polymerase then synthesizes new complementary strands by adding nucleotides one at a time, always working in the 5' to 3' direction and reading the template in the opposite direction. The leading strand is synthesized continuously; the lagging strand is synthesized in short fragments (Okazaki fragments) that are later joined by DNA ligase.
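The difference between continuous and fragmented synthesis can be sketched as follows. This is a deliberately simplified model: it ignores RNA primers, uses a hypothetical ten-base template, and uses a fragment size of four bases, whereas real Okazaki fragments are far longer. All function names are illustrative.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def copy_5_to_3(template: str) -> str:
    """Return the new strand (5'->3') for a template written 5'->3'.

    The template is read 3'->5', so it is traversed in reverse.
    """
    return "".join(PAIR[base] for base in reversed(template))


def lagging_fragments(template: str, size: int) -> list[str]:
    """Copy the template in stretches, in the order the fork exposes them.

    The lagging-strand template is exposed from its 5' end, so each
    stretch must be copied separately, and each fragment is made 5'->3'.
    """
    return [
        copy_5_to_3(template[i:i + size])
        for i in range(0, len(template), size)
    ]


def ligate(fragments: list[str]) -> str:
    """Join fragments into one strand written 5'->3'.

    The most recently made fragment sits at the 5' end of the new
    strand, so the synthesis order is reversed before joining.
    """
    return "".join(reversed(fragments))


template = "ATGCGTTAGC"  # hypothetical lagging-strand template, 5'->3'
fragments = lagging_fragments(template, size=4)
print(fragments)                                   # ['GCAT', 'TAAC', 'GC']
print(ligate(fragments))                           # GCTAACGCAT
print(ligate(fragments) == copy_5_to_3(template))  # True
```

The final check shows that fragmented synthesis followed by ligation yields the same strand as a single continuous copy of the template.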
DNA replication achieves extraordinary accuracy — roughly one error per billion base pairs copied — thanks to proofreading by DNA polymerase and additional mismatch repair systems that scan newly replicated DNA and correct errors.
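To put that error rate in perspective, here is a back-of-the-envelope estimate. It assumes the approximate rate quoted above and a genome of about 3.2 billion base pairs per copy; both are round numbers, so the result is an order-of-magnitude figure only.

```python
ERROR_RATE = 1e-9  # errors per base pair copied, as quoted above
GENOME_BP = 3.2e9  # approximate human genome size, one copy

# Expected number of errors each time one copy of the genome is replicated.
expected_errors = ERROR_RATE * GENOME_BP
print(f"Expected errors per genome copy: {expected_errors:.1f}")  # 3.2
```

Even at this level of fidelity, each round of replication is expected to introduce a handful of new changes.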
Transcription and Translation: Reading the Instructions
The information stored in DNA is accessed through a two-step process: transcription (DNA to RNA) followed by translation (RNA to protein) — the central dogma of molecular biology.
In transcription, the enzyme RNA polymerase binds to a promoter sequence upstream of a gene and synthesizes a complementary messenger RNA (mRNA) strand. In eukaryotes, the raw mRNA transcript is processed — non-coding introns are spliced out, a protective cap is added to the 5' end, and a poly-A tail is added to the 3' end — before the mature mRNA is exported from the nucleus.
In translation, ribosomes bind to the mRNA and read the codons three bases at a time. Transfer RNA (tRNA) molecules, each carrying a specific amino acid and possessing an anticodon complementary to a specific codon, deliver amino acids to the ribosome in the correct order. The ribosome catalyzes peptide bond formation between adjacent amino acids, building the protein chain codon by codon until a stop codon signals termination.
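The two steps can be sketched together in Python. This is a simplified model under stated assumptions: it skips splicing, capping, and the poly-A tail, it starts translation at the first AUG it finds, and the template sequence and function names are hypothetical. The codon table is the standard genetic code in RNA letters.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" is a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS)
)

# RNA uses uracil (U) where DNA uses thymine (T).
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Return the mRNA (5'->3') for a DNA template strand written 5'->3'."""
    return "".join(RNA_PAIR[base] for base in reversed(template))


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


template = "CCTTATTTGGCCATGC"  # hypothetical template strand, 5'->3'
mrna = transcribe(template)
print(mrna)             # GCAUGGCCAAAUAAGG
print(translate(mrna))  # MAK
```

Here the codons AUG, GCC, and AAA are read as methionine, alanine, and lysine, and the UAA codon ends translation.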
Mutations and Genetic Variation
Despite high-fidelity replication, DNA sequences do change — through mutations caused by copying errors, chemical damage, or radiation. Point mutations change a single base pair; insertions and deletions add or remove bases; larger-scale chromosomal rearrangements can duplicate, delete, or invert entire chromosomal segments.
The consequences of mutation range from negligible (synonymous mutations that change a codon but not the encoded amino acid) to devastating (frameshift mutations that disrupt the entire reading frame downstream). Mutations in somatic cells can contribute to cancer; mutations in germline cells (eggs and sperm) are inherited by offspring and are the raw material of evolution. Human genetic variation — including the single-nucleotide polymorphisms (SNPs) used in genome-wide association studies to identify disease risk factors — reflects the accumulated mutations in the human lineage over thousands of generations.
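The difference between these outcomes can be shown with a small classifier. The sketch below is illustrative only: the function names and sequences are hypothetical, it handles single-base substitutions within one codon, and it demonstrates a frameshift by deleting one base from a short example gene.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" is a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS)
)


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change at `position` (0, 1, or 2) of a codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"  # same amino acid
    if after == "*":
        return "nonsense"    # premature stop codon
    if before == "*":
        return "stop-loss"   # stop codon now encodes an amino acid
    return "missense"        # different amino acid


def translate_frame(dna: str) -> str:
    """Translate complete codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(classify_substitution("GAA", 2, "G"))  # synonymous
print(classify_substitution("GAG", 1, "T"))  # missense
print(classify_substitution("GAG", 0, "T"))  # nonsense

gene = "ATGGCCAAATGA"            # hypothetical coding sequence
deleted = gene[:3] + gene[4:]    # remove one base after the start codon
print(translate_frame(gene))     # MAK
print(translate_frame(deleted))  # MPN
```

The single-base deletion shifts every downstream codon: the amino acids after the start codon change, and the original stop codon is no longer read in frame.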