The History of AI: From Turing's Test to ChatGPT (Part 2)

AI Fundamentals Series · Part 2 of 10 — Previous: Part 1: What Is AI? — Next: Part 3: How Computers Learn

Why History Matters for Understanding AI Today

In Part 1, we defined artificial intelligence and surveyed its role in everyday life. Before we dive into how AI works, it is worth stepping back and asking: how did we get here? The history of AI is not a straight line of progress. It is a story of soaring ambitions, humbling failures, patient rebuilding, and then — in the last decade — a sudden, almost shocking acceleration that has left researchers, policymakers, and the public scrambling to keep up.

Understanding this arc will help you appreciate why modern AI is built the way it is, why certain ideas that were dismissed for decades suddenly became mainstream, and why the field has produced both genuine breakthroughs and cautionary tales about hype. History is the best antidote to both excessive fear and excessive enthusiasm about AI.

The Founding Era: 1940s – 1956

Alan Turing and the Question That Started It All

In 1950, the British mathematician and wartime codebreaker Alan Turing published a paper in the journal Mind titled Computing Machinery and Intelligence. It opened with a deceptively simple question: “Can machines think?” Turing did not try to answer it directly — he recognized that “thinking” was too vague and philosophically contested a concept to be useful. Instead, he proposed a practical, operational test that he called the Imitation Game, now known as the Turing Test.

In the Turing Test, a human evaluator holds a text conversation with both a human and a machine, without knowing which is which. If the evaluator cannot reliably distinguish the machine from the human, the machine can be said to have demonstrated intelligent behavior. Turing predicted that by the year 2000, machines would be able to fool an average evaluator about 30% of the time in a five-minute conversation.

Turing's framing was revolutionary because it shifted the question from philosophy to engineering. Instead of debating whether a machine could truly think, researchers could focus on building machines that behaved intelligently. This pragmatic orientation — define intelligence by behavior, not by essence — still shapes AI research today, though the debate about what it all means has never gone away.

The Birth of the Field: Dartmouth 1956

The field of artificial intelligence was formally inaugurated at a summer workshop at Dartmouth College in 1956, organized by John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon. McCarthy coined the term “artificial intelligence” specifically for this event, reportedly because he wanted a name that was distinct from earlier work on “cybernetics” and “automata theory.”

The workshop proposal declared: “We propose that a 2-month, 10-man study of artificial intelligence be carried out during the summer of 1956... The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”

The attendees were optimistic to the point of boldness — some believed a handful of researchers, working intensively over a single summer, might solve the core problems of AI. That optimism would take decades to partially justify and is still not fully vindicated. But the workshop launched the field, established its vocabulary, and identified its core research questions.

The Age of Symbolic AI: 1956 – 1974

Early AI research was dominated by what is now called symbolic AI or Good Old-Fashioned AI (GOFAI). The core idea was to represent knowledge as explicit symbols and rules, then manipulate those symbols with logical operations — much as humans reason consciously when they follow step-by-step instructions.

Programs from this era achieved impressive results on narrow, well-defined problems:

Logic Theorist (1956) — created by Allen Newell and Herbert Simon, it proved 38 of the 52 theorems in Whitehead and Russell's Principia Mathematica
General Problem Solver (1957) — also by Newell and Simon, designed as a universal problem-solving program modeled on human reasoning strategies
ELIZA (1966) — created by MIT's Joseph Weizenbaum, a chatbot that simulated a psychotherapist by pattern-matching user input to scripted responses. Weizenbaum was disturbed to find that many users became emotionally attached to the program, believing it genuinely understood them — a phenomenon now called the ELIZA effect.
SHRDLU (1970) — a program that could have natural conversations about a simulated world of colored blocks, following complex multi-step instructions

The mood was euphoric. In 1965, Herbert Simon predicted that “machines will be capable, within twenty years, of doing any work a man can do.” In 1967, Marvin Minsky wrote that “within a generation... the problem of creating artificial intelligence will be substantially solved.” These predictions would prove spectacularly wrong — but they had consequences, because governments funded research based on them.

The First AI Winter: 1974 – 1980

By the mid-1970s, the gap between promise and performance had become impossible to ignore. Symbolic AI struggled badly with anything that required common sense, handling ambiguity, or operating in the complex, noisy real world. Translating between languages turned out to be far harder than anticipated because language meaning is deeply contextual. Early machine translation systems produced output that was often laughably wrong.

The fundamental problem was that human knowledge is not organized as neat, explicit rules. Most of what we know is tacit — embedded in experience, embodied in physical intuition, expressed in the ability to recognize patterns that we cannot articulate. Trying to encode all of this in if-then rules proved both practically infeasible and theoretically misguided.

Funding agencies commissioned critical assessments. The U.S. DARPA cut AI funding significantly. The U.K.'s Lighthill Report (1973), commissioned by the British Science Research Council, was particularly damning — it concluded that AI had failed to live up to its ambitious promises and recommended cutting most funding to the field. This period is known as the First AI Winter.

Expert Systems and the Second Wave: 1980 – 1987

AI revived in the 1980s with a more modest and practical approach: expert systems. Rather than trying to build general intelligence, researchers focused on encoding the specialized knowledge of domain experts into rule-based systems. The insight was: if a doctor can articulate the diagnostic rules they use, why not encode those rules directly in software?

The results were commercially significant and genuinely useful:

MYCIN (Stanford) diagnosed bacterial infections and recommended antibiotic treatments with accuracy comparable to human specialists, though it was never deployed in clinical practice due to legal and practical concerns
XCON/R1 (Digital Equipment Corporation) configured minicomputer systems, correctly processing 95-98% of orders and saving the company an estimated $40 million per year
PROSPECTOR assisted geologists in mineral exploration, correctly identifying a deposit of molybdenum that human geologists had missed

A dedicated AI hardware industry emerged, producing “Lisp machines” — specialized workstations optimized for symbolic computation in the Lisp programming language. Investment poured in from venture capital and corporate R&D budgets. By the late 1980s, the expert systems industry had grown to over a billion dollars per year in the United States alone, with Japan's government launching an ambitious ten-year “Fifth Generation Computer” program to build the next generation of AI hardware.

The Second AI Winter: 1987 – 1993

The expert systems boom ended abruptly. Several problems converged to deflate it:

The knowledge acquisition bottleneck: encoding expert knowledge was slow, expensive, and brittle. Experts often had difficulty articulating everything they knew, and rules that worked in one context frequently failed in adjacent ones. Building a new expert system for each domain required enormous manual effort.
Maintenance nightmares: as rule sets grew into tens of thousands of conditions, small changes could break distant parts of the system in unpredictable ways. The software became increasingly difficult to maintain and extend.
Hardware economics: general-purpose desktop computers (IBM PCs, Apple Macintoshes) became powerful enough and cheap enough to run most practical business software, undermining the commercial rationale for expensive specialized Lisp machines.
Failed promises: many expert systems that were supposed to scale up to handle complex real-world situations revealed unexpected limitations when pushed beyond their narrow training domains.

The Japanese Fifth Generation project quietly failed to meet its ambitious goals. DARPA cut AI funding again. A Second AI Winter settled over the field, lasting into the early 1990s.

The Quiet Revolution: Machine Learning Rises (1990s – 2011)

Even during the winters, a smaller community of researchers had been pursuing an alternative approach: instead of encoding rules manually, let machines learn rules from data. This is the core idea of machine learning, which we explore thoroughly in Part 3. During the 1990s and 2000s, this approach began delivering practical results:

1997 — IBM's Deep Blue defeated world chess champion Garry Kasparov in a six-game match, the first time a computer beat a world champion under standard tournament conditions. Deep Blue used a combination of custom hardware, minimax search, and extensive human-crafted evaluation functions — not machine learning, but a milestone nonetheless.
1998 — Yann LeCun at Bell Labs demonstrated LeNet, a neural network that could reliably read handwritten zip codes on envelopes with commercial accuracy. The U.S. Postal Service eventually used a version of this system to automatically sort about 10% of U.S. mail.
2002 — iRobot's Roomba brought a simple but real autonomous robot into millions of homes, proving that AI-powered products could reach mass consumers.
2006 — Geoffrey Hinton and colleagues published work showing that deep neural networks (those with many layers) could be trained effectively if weights were initialized properly and pre-training was used — reigniting serious academic interest in neural network research that had been largely dormant since the late 1980s.
2011 — IBM's Watson defeated Jeopardy! champions Ken Jennings and Brad Rutter in a televised contest, demonstrating sophisticated natural language understanding and information retrieval far beyond what previous systems had achieved.

The Deep Learning Revolution: 2012 – 2017

The watershed moment came in September 2012. A deep neural network called AlexNet, built by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton at the University of Toronto, entered the ImageNet Large Scale Visual Recognition Challenge. This annual competition required systems to classify one million images from the internet into one thousand categories.

AlexNet did not merely win. It shattered the previous record by an enormous margin, achieving a top-5 error rate of 15.3% compared to the next best system's 26.2%. The AI community recognized collectively that something fundamental had shifted. Three ingredients had converged: a large dataset (ImageNet, with 14 million labeled images), powerful graphics processors (NVIDIA GPUs that could train the network in days rather than months), and architectural insights (particularly the use of the ReLU activation function and dropout regularization).

The years that followed brought an avalanche of results that would have seemed impossible to researchers from the expert systems era:

2014 — Ian Goodfellow introduced Generative Adversarial Networks (GANs), enabling AI to generate convincing synthetic images for the first time
2015 — Google's deep neural network achieved superhuman performance on the ImageNet classification task, surpassing human error rates
2016 — Google DeepMind's AlphaGo defeated world Go champion Lee Sedol 4-1, a task many experts had believed would require decades more progress
2017 — Google researchers introduced the Transformer architecture in the landmark paper “Attention Is All You Need,” which would become the foundation of modern language AI

The Large Language Model Era: 2018 – Present

The Transformer architecture enabled a new class of models trained on vast quantities of text with a simple self-supervised objective: predict the next word. OpenAI's GPT-1 (2018) proved the concept. GPT-2 (2019) was so capable that OpenAI initially refused to release the full model, fearing it could be used to generate disinformation at scale. GPT-3 (2020), with 175 billion parameters, demonstrated that scaling up this approach produced qualitatively new capabilities. Google released BERT (2018), which dramatically improved search quality for billions of users overnight.

The public inflection point arrived on November 30, 2022, when OpenAI launched ChatGPT, built on the GPT-3.5 model and fine-tuned with human feedback to be a conversational assistant. It reached one million users in five days and one hundred million users in two months — the fastest consumer technology adoption in history, surpassing the iPhone and TikTok. For the first time, most people could interact directly and fluently with an AI system that felt qualitatively different from everything before it.

The years since have seen an explosion of competitive products: Google Gemini, Anthropic Claude, Meta Llama (open-source), Mistral, and dozens more. Multimodal models that understand both text and images became standard. AI-generated images, audio, and video reached commercial maturity. The question shifted from “can AI do this?” to “what should AI be allowed to do, and who should decide?”

Key Takeaways

AI was formally born at Dartmouth in 1956, but its intellectual roots trace to Alan Turing's 1950 paper.
Early symbolic AI achieved narrow successes but could not scale to the ambiguity and complexity of the real world.
Two “AI winters” (mid-1970s and late 1980s) followed periods of over-hyped expectations and under-delivered results.
Machine learning grew quietly through the 1990s and 2000s, producing practical applications without generating much public notice.
AlexNet's 2012 breakthrough launched the deep learning era and initiated the current period of rapid capability growth.
The 2017 Transformer architecture enabled the large language models that power modern AI assistants.
ChatGPT's November 2022 launch marked AI's transition from specialist tool to mass consumer product.

Now that you know when and why each era of AI arose, Part 3 will explain how the fundamental shift from explicit rules to learning from data actually works — and why that shift changed everything about what AI can and cannot do.