Syntax: How Human Languages Build Sentences
How syntax works: constituency tests, phrase structure rules, X-bar theory, dependency grammar, head-directionality, SOV vs SVO word order, movement transformations, and the recursion debate.
The Sentence as Architecture
The sentence "Colorless green ideas sleep furiously" is grammatical but semantically absurd. Noam Chomsky introduced it in 1957 precisely to demonstrate that grammaticality is independent of meaning: a native speaker of English can identify the sentence as syntactically well-formed without knowing what it means or whether it could be true. That intuition—rapid, automatic, and shared across speakers—is the explanandum of syntax: how do humans compute, almost instantly, whether an arbitrary sequence of words constitutes a well-formed sentence in their language?
Syntax is the branch of linguistics concerned with the rules and principles governing sentence structure. It investigates how words combine into phrases, how phrases combine into clauses, and how clauses nest within one another. Different theoretical frameworks offer competing answers to these questions, but they broadly agree on what needs explaining: natural language sentences exhibit hierarchical structure, and that structure is mentally real.
Constituency and Phrase Structure
The most fundamental syntactic unit above the word is the constituent: a group of words that functions as a single unit in the sentence. Constituents are not arbitrary groupings; they are revealed by diagnostic tests:
- Substitution: A constituent can be replaced by a single pronoun or pro-form. "The large dog" can be replaced by "it," showing it is a constituent.
- Movement: A constituent can be moved as a unit to another position. "The new book, I gave to her" (topicalization of "the new book") shows that the phrase is a constituent.
- Question test: Constituents can answer "what?" questions. "What did she eat? [A sandwich]."
- Coordination: Constituents can be conjoined with "and." "[The cat] and [the dog] slept" shows both noun phrases are parallel constituents.
Phrase structure rules formalize constituent structure. In their classic form (rewrite rules), they specify how a category can be expanded: S → NP VP; NP → Det N; VP → V NP. These rules generate a set of well-formed phrase structures (syntactic trees) and exclude ill-formed sequences. The system is powerful, but rewrite rules alone do not capture every observed constraint, such as the relation between a displaced phrase and the position where it is interpreted, and this gap motivated more sophisticated frameworks.
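The three rewrite rules above can be run as a tiny grammar. The following is a minimal sketch, not a standard parsing algorithm from the literature: the lexicon and the names `RULES`, `LEXICON`, `parse`, and `is_sentence` are assumptions introduced here for illustration.

```python
# Toy phrase structure grammar built from the three rules in the text.
# The lexicon is a hypothetical addition so the rules have words to work on.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {
    "Det": {"the", "a"},
    "N": {"dog", "cat", "sandwich"},
    "V": {"chased", "ate"},
}

def parse(category, words, start=0):
    """Yield (tree, end) for each way `category` can span words[start:end]."""
    if category in LEXICON:
        if start < len(words) and words[start] in LEXICON[category]:
            yield (category, words[start]), start + 1
        return
    for expansion in RULES.get(category, []):
        yield from _expand(category, expansion, words, start, [])

def _expand(category, remaining, words, pos, children):
    """Match the daughters in `remaining` one after another from `pos`."""
    if not remaining:
        yield (category, *children), pos
        return
    for child, nxt in parse(remaining[0], words, pos):
        yield from _expand(category, remaining[1:], words, nxt, children + [child])

def is_sentence(text):
    """True if the whole word string can be derived from S."""
    words = text.lower().split()
    return any(end == len(words) for _, end in parse("S", words))
```

With this toy grammar, `is_sentence("the dog chased a cat")` is true, while `is_sentence("dog the chased cat a")` is false. Because the only VP rule is VP → V NP, a string like "the dog chased" is also rejected, which shows how a rule set excludes sequences it does not license.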
X-Bar Theory: A Universal Template
X-bar theory, introduced in the 1970s and built into Chomsky's Government and Binding framework, proposes that all phrases in all languages share a universal three-level template. Every phrase has a head (the category-defining element, written X), an intermediate projection (X-bar, written X'), and a maximal projection (XP). The complement combines with the head X to form X'; the specifier combines with X' to form XP, which in English places it at the left edge of the phrase.
The theory makes a bold cross-linguistic prediction: whether a language is English (head-initial) or Japanese (head-final), the same hierarchical levels are present. The lexical category filling the head position determines the category of the entire phrase: a noun head (N) projects NP; a verb head (V) projects VP; a preposition head (P) projects PP; a complementizer head (C) projects CP (the clausal level). The symmetry across phrase types—the parallelism between NP structure and VP structure—is a major empirical achievement of X-bar theory.
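The projection scheme can be sketched as a small data structure. This is an illustrative assumption, not a formalization taken from the X-bar literature: the class name `Phrase` and its fields are hypothetical, and the bracketing is printed in head-initial (English-style) order only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Phrase:
    """One X-bar phrase: a head X with optional complement and specifier."""
    category: str                           # "N", "V", "P", "C", ...
    head: str                               # the head word
    complement: Optional["Phrase"] = None   # sister of X inside X'
    specifier: Optional["Phrase"] = None    # sister of X' inside XP

    def label(self):
        """The maximal projection is named after the head's category."""
        return self.category + "P"

    def bracket(self):
        """Labelled bracketing with all three levels: X, X', XP."""
        x_bar = f"[{self.category}' [{self.category} {self.head}]"
        if self.complement:
            x_bar += " " + self.complement.bracket()
        x_bar += "]"
        spec = self.specifier.bracket() + " " if self.specifier else ""
        return f"[{self.label()} {spec}{x_bar}]"
```

For example, `Phrase("P", "on", complement=Phrase("N", "tables")).bracket()` gives `[PP [P' [P on] [NP [N' [N tables]]]]]`. The same three levels appear whichever category fills the head slot, which is the parallelism the theory claims.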
Word Order and Head-Directionality
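Human languages exhibit a striking statistical preference for two word orders: Subject-Object-Verb (SOV) and Subject-Verb-Object (SVO). Together they account for roughly 85 to 90% of languages in typological surveys such as the World Atlas of Language Structures (WALS). The exact percentages depend on the sample and on how languages without a single dominant order are counted, so the figures below are approximate.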
| Word Order | Approximate % of Languages | Example Languages |
|---|---|---|
| SOV | ~45% | Japanese, Korean, Turkish, Hindi, Tibetan |
| SVO | ~42% | English, Mandarin, French, Swahili, Russian |
| VSO | ~9% | Welsh, Classical Arabic, Tagalog |
| VOS, OVS, OSV | ~4% | Malagasy (VOS), Hixkaryana (OVS) |
The head-directionality parameter captures a related generalization. In head-initial languages (English, French), the head precedes its complement: [V NP] (verb before object), [P NP] (preposition before object). In head-final languages (Japanese, Korean), the head follows: [NP V] (object before verb), [NP P] (postposition after object). The parameter has broad consequences: SOV languages tend to be head-final throughout their phrase structure, while SVO languages tend to be head-initial in their VP but show more variation in other phrase types.
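One way to see what the parameter does is to keep the hierarchy fixed and vary only the linear order. The sketch below assumes a deliberately simplified representation (nested head/complement pairs); the function name `linearize` and the example structure are hypothetical.

```python
def linearize(node, head_initial=True):
    """Flatten a (head, complement) structure into a list of words.

    `node` is either a word (str) or a pair (head, complement), where
    each member may itself be a pair. Only the order is parameterized;
    the hierarchy is identical under both settings.
    """
    if isinstance(node, str):
        return [node]
    head, complement = node
    h = linearize(head, head_initial)
    c = linearize(complement, head_initial)
    return h + c if head_initial else c + h

# Hypothetical structure: V takes a PP complement, and P takes an NP.
vp = ("went", ("to", "Tokyo"))
```

Here `linearize(vp)` returns `["went", "to", "Tokyo"]`, the English order, while `linearize(vp, head_initial=False)` returns `["Tokyo", "to", "went"]`, which mirrors the Japanese pattern in "Tōkyō ni itta" (literally "Tokyo to went"). Real languages are less uniform than a single switch suggests, as the variation among SVO languages noted above indicates.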
Movement Transformations
One of the most influential claims of generative syntax is that surface sentences are not their underlying structure. Movement transformations—rules that displace constituents from their base position to a surface position—explain a wide range of phenomena.
English wh-movement illustrates the claim. In "What did she buy?" the wh-word what appears at the beginning of the sentence, but semantically it is the object of buy. Generative syntax posits that what originates in the object position and moves to the specifier of CP, the clause-initial position, leaving behind a silent trace (or, in Minimalist terms, an unpronounced copy). Evidence for movement comes from island constraints: extraction out of certain domains (relative clauses, embedded questions) is blocked, producing ungrammaticality that surface phrase structure alone cannot explain.
- Yes/no questions in English involve subject-auxiliary inversion: "She can swim" → "Can she swim?" In standard generative analyses, the auxiliary moves from the inflectional head (I or T) to the complementizer position C, above the subject.
- Passive constructions involve movement of the underlying object to subject position: in "The cake was eaten," the cake occupies subject position despite being the semantic object of eat.
- Cross-linguistic variation in movement is constrained by principles such as the Empty Category Principle (ECP) and Subjacency in Government and Binding theory, and by Phase Theory in Minimalism.
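The wh-movement analysis described above can be sketched as an operation on a list of words. This is a heavily simplified illustration, not a model of any actual generative formalism: the function names `wh_move` and `pronounce`, the flat word list, and the use of the string "t" for the trace are all assumptions, and the inserted auxiliary is only a rough stand-in for English do-support and inversion.

```python
def wh_move(clause, wh_index, auxiliary="did"):
    """Front the wh-word at `wh_index`, leaving a trace 't' behind.

    `clause` lists the words in base order, e.g. ["she", "buy", "what"].
    The auxiliary is placed right after the fronted wh-word.
    """
    wh_word = clause[wh_index]
    remainder = clause[:wh_index] + ["t"] + clause[wh_index + 1:]
    return [wh_word, auxiliary] + remainder

def pronounce(words):
    """Drop the silent trace to obtain the surface string."""
    return " ".join(w for w in words if w != "t")
```

Calling `wh_move(["she", "buy", "what"], 2)` returns `["what", "did", "she", "buy", "t"]`: the trace marks the object position where what is interpreted, and `pronounce` on that result gives "what did she buy". The sketch has no notion of islands, so it would happily "move" a word out of any position, which is exactly the overgeneration that island constraints are meant to rule out.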
Dependency Grammar: An Alternative Framework
Dependency grammar, associated with Lucien Tesnière's 1959 Éléments de syntaxe structurale, represents sentence structure as a set of binary relations between words rather than as phrase constituency. Each word other than the root (a dependent) connects to exactly one governing word (its head) by a directed dependency arc. The resulting structure is a tree, typically with the main verb at the root.
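A dependency analysis of this kind is commonly stored as one head index per word. The sketch below uses that encoding to check the tree conditions just described; the function name `is_dependency_tree`, the use of -1 for the root, and the example analysis are assumptions made for illustration.

```python
def is_dependency_tree(heads):
    """Check that `heads` encodes a well-formed dependency tree.

    heads[i] is the index of the head of word i, or -1 for the root.
    Each word stores a single head, so the 'exactly one head' condition
    holds by construction; this checks for a unique root and that every
    word reaches it without cycles or out-of-range indices.
    """
    roots = [i for i, h in enumerate(heads) if h == -1]
    if len(roots) != 1:
        return False
    for start in range(len(heads)):
        node, seen = start, set()
        while heads[node] != -1:
            if node in seen or not 0 <= heads[node] < len(heads):
                return False
            seen.add(node)
            node = heads[node]
    return True

words = ["the", "dog", "chased", "a", "cat"]
# the -> dog, dog -> chased, chased = root, a -> cat, cat -> chased
heads = [1, 2, -1, 4, 2]
```

For this example `is_dependency_tree(heads)` is true, with the verb "chased" as root. A head list such as `[1, 0, -1]`, in which two words depend on each other, is rejected.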
Dependency grammar has gained practical importance in computational linguistics because dependency parses are simpler to annotate and compute for many purposes than constituency trees. Universal Dependencies (UD), a cross-lingual annotation scheme used in NLP research, applies dependency relations to over 100 languages using a shared set of relation labels. The framework also handles free word-order languages (Latin, Russian, Turkish) more naturally than phrase-structure approaches.
Universal Grammar and Recursion
The most contentious claim in modern syntax is Chomsky's Universal Grammar hypothesis: that humans are born with a language-specific faculty containing principles and parameters that constrain the space of possible human languages. The hypothesis predicts that all languages share deep structural properties invisible at the surface.
In 2002, Hauser, Chomsky, and Fitch proposed that the narrow language faculty—what distinguishes human language from animal communication—is recursion: the capacity to embed phrases within phrases without theoretical limit. "The cat that the dog that the man that... chased bit scratched ran away" is structurally valid, however difficult to process. Recursion enables the infinite productivity of language from finite means.
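The idea of embedding without a fixed limit can be illustrated with a recursive function that builds center-embedded noun phrases. This is a toy sketch under stated assumptions: the function name `center_embed` and the way nouns are paired with verbs are hypothetical choices, not part of any published analysis.

```python
def center_embed(nouns, verbs):
    """Build a center-embedded noun phrase recursively.

    nouns[0] is the outermost noun; each further noun is the subject of
    a relative clause modifying the noun before it. verbs[i] is the verb
    whose subject is nouns[i + 1] and whose object is nouns[i].
    """
    if len(nouns) == 1:
        return nouns[0]
    inner = center_embed(nouns[1:], verbs[1:])
    return f"{nouns[0]} that {inner} {verbs[0]}"
```

`center_embed(["the cat", "the dog", "the man"], ["bit", "chased"])` returns "the cat that the dog that the man chased bit"; adding "ran away" completes the sentence. Nothing in the function limits the depth of embedding, so a finite definition yields an unbounded set of phrases, even though human listeners struggle beyond two or three levels.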
Linguist Daniel Everett challenged this claim in 2005 by arguing that Pirahã, an Amazonian language, lacks recursion: it has no embedding, no relative clauses, and no subordinate clauses. If true, this would challenge the claim that recursion is a universal property of human languages. The empirical claims remain disputed, with Everett's characterization of Pirahã grammar contested by other researchers who have worked with the language. The debate reflects deeper theoretical commitments about whether syntax is shaped primarily by a domain-specific language faculty or by general cognitive and processing principles.