What Is DNA: Structure, Function, and How It Encodes Life

DNA is the molecule of heredity and instruction that directs the development, function, and reproduction of all known life. Learn how the double helix is structured, how it encodes genetic information, and how that information is read and expressed.

The InfoNexus Editorial Team | May 15, 2026 | 11 min read

The Molecule of Life

Every living organism, from the simplest bacterium to the most complex mammal, carries within its cells a molecule that encodes everything needed to build and operate a living system. This molecule — deoxyribonucleic acid, or DNA — is one of the most remarkable substances in the known universe. It is simultaneously a physical structure, an information storage medium, and a blueprint for molecular machinery. It has been copied, generation after generation, for roughly four billion years, accumulating the changes that have produced every living species on Earth.

Understanding DNA is foundational to biology, medicine, evolutionary science, and biotechnology. The discovery of its structure in 1953 by James Watson and Francis Crick — building on X-ray crystallography data from Rosalind Franklin and Maurice Wilkins — is one of the pivotal moments in scientific history. The double helix model immediately suggested how genetic information might be stored and copied, and it launched the molecular biology revolution that transformed our understanding of life at the most fundamental level.

The Structure of DNA: The Double Helix

DNA is a polymer, a long chain molecule built from repeating subunits called nucleotides. Each nucleotide consists of three components: a five-carbon sugar called deoxyribose, a phosphate group, and one of four nitrogen-containing bases: adenine (A), thymine (T), guanine (G), or cytosine (C). Neighboring nucleotides are joined by covalent phosphodiester bonds, in which a phosphate group links the 3' carbon of one sugar to the 5' carbon of the next. The result is a long sugar-phosphate backbone with the bases extending from it like teeth from a comb.

In the double helix, two nucleotide chains run antiparallel to each other — one runs 5' to 3', the other 3' to 5', referring to the carbon numbering on the deoxyribose sugar. The two strands are held together by hydrogen bonds between complementary base pairs: adenine always pairs with thymine (connected by two hydrogen bonds), and guanine always pairs with cytosine (three hydrogen bonds). This base complementarity — A with T, G with C — is the chemical basis of genetic information copying and transmission. The paired strands twist around a central axis, forming the iconic right-handed double helix with roughly ten base pairs per turn.
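To make the pairing rules concrete, here is a minimal Python sketch (the function and table names are our own, chosen for illustration). It derives the partner strand of a sequence by complementing each base and then reversing the result, because the two strands are antiparallel and sequences are conventionally written 5' to 3'. It also totals the hydrogen bonds in the resulting duplex using the counts above: two per A-T pair, three per G-C pair.

```python
# Base-pairing rules: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds contributed by each base pair: 2 for A-T, 3 for G-C.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5' to 3'.

    Complementing gives the partner base by base; reversing accounts for the
    antiparallel orientation, so the result also reads 5' to 3'.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def hydrogen_bonds(strand: str) -> int:
    """Count the hydrogen bonds in a duplex formed by `strand` and its partner."""
    return sum(H_BONDS[base] for base in strand.upper())


if __name__ == "__main__":
    top = "ATGCGT"
    print(reverse_complement(top))  # ACGCAT
    print(hydrogen_bonds(top))      # 15  (2+2+3+3+3+2)
```

For example, the strand 5'-ATGCGT-3' pairs with 5'-ACGCAT-3', and the duplex is held together by 15 hydrogen bonds. Applying `reverse_complement` twice returns the original strand, mirroring the fact that each strand fully specifies the other.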

The Genetic Code: From DNA Sequence to Protein

The information in DNA is stored in the linear sequence of its four bases along the strand. Just as a finite alphabet can encode unlimited text, four DNA bases can, in various combinations, encode the information needed to build the thousands of different proteins that carry out cellular functions. The path from DNA sequence to functional protein involves two major steps: transcription and translation.
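A back-of-the-envelope sketch of that capacity, in Python: with four bases, a sequence of length n can take 4^n distinct forms, and each position carries log2(4) = 2 bits. The same arithmetic shows why a code built on two-base words could not work: 4^2 = 16 is fewer than the twenty amino acids, while 4^3 = 64 is enough, which matches the three-base codons used in translation. The helper names here are illustrative.

```python
import math

BASES = "ACGT"


def sequence_count(length: int, alphabet_size: int = len(BASES)) -> int:
    """Number of distinct sequences of the given length over the alphabet."""
    return alphabet_size ** length


def bits_per_base(alphabet_size: int = len(BASES)) -> float:
    """Information capacity of one position, in bits (log2 of the alphabet size)."""
    return math.log2(alphabet_size)


if __name__ == "__main__":
    print(bits_per_base())  # 2.0
    for k in (1, 2, 3):
        # Word length k gives 4**k possible words; only k = 3 reaches at least 20.
        print(k, sequence_count(k), sequence_count(k) >= 20)
    print(sequence_count(10))  # 1048576
```

Even a stretch of just ten bases has over a million possible sequences, which is why a four-letter alphabet is no real limitation.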

In transcription, an enzyme called RNA polymerase reads a DNA template strand and synthesizes a complementary single-stranded RNA molecule called messenger RNA (mRNA). The mRNA carries the same sequence information as the DNA coding strand (with uracil replacing thymine). In translation, the mRNA is read by the ribosome — a large molecular machine assembled from proteins and ribosomal RNA — which decodes the mRNA sequence in triplets called codons. Each of the 64 possible three-base codons specifies one of twenty amino acids (or a stop signal), according to the genetic code. Transfer RNA (tRNA) molecules bring the appropriate amino acid to the ribosome as each codon is read, and the amino acids are linked in sequence to form a polypeptide chain, which folds into a functional protein.
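The two steps can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the template strand is written 3' to 5' so that each base lines up with the mRNA base it produces, translation starts at the first AUG and stops at the first stop codon, and the RNA processing that eukaryotic mRNA undergoes before translation is ignored. The codon table is the standard genetic code; the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code. Codons are enumerated with each position cycling
# through U, C, A, G (first position slowest); "*" marks a stop codon.
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)
}

# Template-strand DNA base -> mRNA base (complementary, with U in place of T).
TRANSCRIBE = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Build mRNA (5' to 3') from a template strand written 3' to 5'."""
    return "".join(TRANSCRIBE[base] for base in template_3to5.upper())


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon; return one-letter amino acids."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    # Step through complete codons only (i + 3 never exceeds the length).
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("TACAAACCGATT")
    print(mrna)             # AUGUUUGGCUAA
    print(translate(mrna))  # MFG
```

Transcribing the template 3'-TACAAACCGATT-5' yields the mRNA AUGUUUGGCUAA, which translates to methionine, phenylalanine, and glycine (MFG) before the UAA stop codon ends the chain.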

Genes, the Genome, and Non-Coding DNA

A gene is traditionally defined as a segment of DNA that encodes a functional product, typically a protein, though many genes encode functional RNA molecules instead. The human genome contains roughly 20,000 protein-coding genes, distributed across 23 pairs of chromosomes. Collectively, these genes encode the proteins that carry out virtually every biological function: enzymes that catalyze chemical reactions, structural proteins that build cells and tissues, signaling proteins that coordinate cellular communication, and regulatory proteins that control gene expression.

Remarkably, protein-coding sequences account for only about 1.5% of the human genome. The remaining 98.5% was once dismissed as "junk DNA," but research has since shown that a significant portion of it plays regulatory and structural roles, although how much of the non-coding genome is truly functional is still debated. Large-scale genomic projects such as ENCODE have shown that a substantial fraction of non-coding DNA is transcribed and that it contains binding sites for transcription factors, regulatory elements such as enhancers and silencers, and sequences encoding functional non-coding RNAs. The genome is far more functionally complex than a simple catalog of protein-coding sequences would suggest.

DNA Replication: Copying the Blueprint

Before a cell divides, its entire DNA content must be accurately copied so that each daughter cell receives a complete genome. DNA replication is an extraordinarily precise molecular process that begins at specific sequences called origins of replication and proceeds bidirectionally along each chromosome. The double helix is unwound and separated by enzymes called helicases, and each parental strand serves as a template for synthesis of a new complementary strand by the enzyme DNA polymerase.
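Below is a toy Python model of that copying scheme, often called semiconservative replication: each daughter duplex keeps one parental strand and gains one newly synthesized strand. It is a sketch with simplifying assumptions: strand orientation, origins of replication, and enzyme mechanics are ignored, and both strands are written so that paired bases share a position. The class and function names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


@dataclass(frozen=True)
class Strand:
    seq: str   # bases, aligned position by position with the partner strand
    new: bool  # True if this strand was synthesized in the latest round


@dataclass(frozen=True)
class Duplex:
    a: Strand
    b: Strand


def synthesize_partner(template: Strand) -> Strand:
    """Polymerase step: build a new strand complementary to the template, base by base."""
    return Strand("".join(COMPLEMENT[base] for base in template.seq), new=True)


def replicate(parent: Duplex) -> tuple[Duplex, Duplex]:
    """Separate the parental strands and pair each with a freshly synthesized partner."""
    old_a = Strand(parent.a.seq, new=False)
    old_b = Strand(parent.b.seq, new=False)
    return (
        Duplex(old_a, synthesize_partner(old_a)),
        Duplex(old_b, synthesize_partner(old_b)),
    )
```

Replicating a duplex whose strands are ATGC and TACG yields two daughters with the same base pairs as the parent, each containing exactly one strand marked `new=False`.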

DNA polymerase adds nucleotides to the growing strand one at a time, matching each template base with its complement: A opposite T, G opposite C. The overall error rate of replication is extraordinarily low, approximately one mistake per billion base pairs. This fidelity comes from the polymerase's own proofreading activity, which detects and removes misincorporated bases as synthesis proceeds, backed up by mismatch repair systems that correct errors the polymerase leaves behind. A diploid human cell contains roughly 6 billion base pairs (two copies of a genome of about 3 billion base pairs), so each division copies this entire sequence with remarkable fidelity; at one error per billion base pairs, only about six errors survive per division. Errors that escape correction become mutations, permanent changes in DNA sequence that may be neutral, beneficial, or harmful, and that are the raw material of evolution.
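The error figures above can be turned into a quick estimate. Multiplying the per-base error rate by the number of base pairs copied gives the expected number of uncorrected errors per division. If errors are assumed to occur independently (a simplifying assumption), that count is approximately Poisson-distributed, so the chance of a completely error-free division is e raised to minus the mean. The rate and base-pair count are taken from the text; the function names are our own.

```python
import math

ERROR_RATE = 1e-9   # uncorrected errors per base pair copied (figure from the text)
DIPLOID_BP = 6e9    # base pairs in a diploid human cell (figure from the text)


def expected_errors(rate: float = ERROR_RATE, length: float = DIPLOID_BP) -> float:
    """Mean number of uncorrected errors in one round of replication."""
    return rate * length


def prob_error_free(rate: float = ERROR_RATE, length: float = DIPLOID_BP) -> float:
    """Poisson approximation: P(zero errors) = exp(-mean), assuming independent errors."""
    return math.exp(-expected_errors(rate, length))


if __name__ == "__main__":
    print(round(expected_errors(), 6))  # 6.0
    print(round(prob_error_free(), 4))  # 0.0025
```

With a mean of six, the probability of a perfectly clean copy is only about 0.25%, so in this model nearly every division leaves a few new changes behind.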

DNA Damage and Repair

DNA is continually subject to damage from both internal and external sources. Ultraviolet radiation from sunlight creates thymine dimers — covalent bonds between adjacent thymine bases that distort the helix and block replication. Ionizing radiation from X-rays and cosmic rays causes double-strand breaks that can rearrange or delete entire chromosomal segments. Reactive oxygen species produced by normal metabolism oxidize DNA bases. Chemical mutagens in food, tobacco smoke, and the environment alkylate bases or intercalate between them. Even spontaneous chemical reactions — hydrolysis and deamination — occur tens of thousands of times per cell per day.
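As a simplified illustration of where UV damage can occur, the Python function below scans one strand for adjacent thymines, the sites where thymine dimers form. With `thymine_only=False` it also reports other adjacent pyrimidine pairs (C and T in any combination), since UV light can link those as well. It only marks susceptible positions on the strand it is given; it does not model whether a particular site is actually damaged. The function name and parameter are hypothetical.

```python
from __future__ import annotations

import re


def dimer_prone_sites(strand: str, thymine_only: bool = True) -> list[int]:
    """Return 0-based positions i where bases i and i+1 could form a UV-induced dimer.

    thymine_only=True looks for T-T pairs (thymine dimers); False accepts any pair
    of adjacent pyrimidines (C or T). The lookahead (?=...) makes the regex report
    overlapping pairs, so "TTT" yields two sites.
    """
    pattern = r"(?=TT)" if thymine_only else r"(?=[CT][CT])"
    return [match.start() for match in re.finditer(pattern, strand.upper())]


if __name__ == "__main__":
    print(dimer_prone_sites("GATTTACA"))                    # [2, 3]
    print(dimer_prone_sites("ACCTGA"))                      # []
    print(dimer_prone_sites("ACCTGA", thymine_only=False))  # [1, 2]
```

For GATTTACA, the scan reports positions 2 and 3, because the run TTT contains two overlapping T-T pairs.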

Cells have evolved a comprehensive array of DNA repair pathways to address these threats. Base excision repair removes and replaces individual damaged bases. Nucleotide excision repair cuts out longer damaged segments, including thymine dimers. Homologous recombination and non-homologous end joining repair double-strand breaks. When damage is too severe to repair, cells undergo programmed cell death (apoptosis) to prevent propagation of a corrupted genome. Failures in these repair pathways contribute to the accumulation of mutations that drive cancer — inherited defects in repair genes like BRCA1 and BRCA2 dramatically increase lifetime cancer risk.

DNA in Medicine and Biotechnology

The ability to read, copy, and edit DNA has transformed medicine and biotechnology. DNA sequencing technologies, including next-generation sequencing platforms that can read entire genomes quickly and at low cost, have enabled personalized medicine, cancer genomics, pathogen identification, and population-scale genetic epidemiology. The polymerase chain reaction (PCR) amplifies specific DNA sequences from tiny samples, enabling forensic identification, prenatal testing, and rapid diagnostic testing for infectious diseases. The COVID-19 pandemic demonstrated this at global scale, with reverse-transcription PCR (RT-PCR) used to detect the RNA genome of SARS-CoV-2.
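PCR's power comes from repeated doubling: in each cycle, every copy of the target sequence can serve as a template for a new one. The Python sketch below models that growth. Its `efficiency` parameter is an assumption of this model (1.0 means perfect doubling) meant to reflect that real reactions fall short of ideal; it is held constant, so the plateau that real reactions reach in late cycles as reagents run out is not captured. The function names are illustrative.

```python
def pcr_copies(initial: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds, assuming each round multiplies the count by (1 + efficiency).

    efficiency=1.0 is perfect doubling; it is held constant across cycles, so the
    late-cycle plateau of real reactions is not modeled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial * (1.0 + efficiency) ** cycles


def cycles_to_reach(target: float, initial: float = 1.0, efficiency: float = 1.0) -> int:
    """Smallest number of cycles after which the copy count reaches `target`."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 0 and at most 1")
    if initial <= 0:
        raise ValueError("initial must be positive")
    cycles, copies = 0, initial
    while copies < target:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


if __name__ == "__main__":
    print(pcr_copies(1, 30))     # 1073741824.0
    print(cycles_to_reach(1e9))  # 30
```

Under ideal doubling, a single starting molecule becomes 2^30, or about 1.07 billion, copies after 30 cycles, which is why PCR can work from such tiny samples.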

Recombinant DNA technology allowed scientists to insert genes from one organism into another, enabling the production of human insulin, growth hormone, and other therapeutic proteins in bacteria. CRISPR-Cas9 gene editing, adapted from a bacterial immune system, now lets researchers edit targeted DNA sequences in living cells far more easily and precisely than earlier methods, opening the door to treatments for genetic diseases and new approaches to agriculture and materials science. DNA's role in biotechnology is only deepening: researchers are exploring DNA as an information storage medium, with theoretical densities that dwarf conventional digital storage, and as a programmable nanomaterial for building molecular machines. The molecule that encodes life may also transform the way we build, store, and process information in the twenty-first century.
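To illustrate the storage idea, the Python sketch below uses the simplest possible mapping, two bits per base (A=00, C=01, G=10, T=11), to convert bytes to a DNA sequence and back. This mapping is a hypothetical example chosen for clarity, not any published scheme; practical DNA storage systems use more elaborate codes that, for example, avoid long runs of a single base and add error correction.

```python
# Hypothetical mapping: two bits per base, chosen only for illustration.
BITS_TO_BASE = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(out)


def decode(dna: str) -> bytes:
    """Inverse of encode: every group of four bases becomes one byte."""
    if len(dna) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    seq = encode(b"Hi")
    print(seq)          # CAGACGGC
    print(decode(seq))  # b'Hi'
```

Encoding the two bytes of "Hi" produces the eight-base sequence CAGACGGC, and decoding it recovers the original bytes.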
