The History of AI
From Turing's dream to agentic intelligence — eight decades of machines learning to think
Artificial intelligence is not a recent invention. It is the product of eight decades of ambition, breakthrough, disappointment, and reinvention. Understanding where AI came from is essential to understanding where it is going.
This post traces the full arc — from the earliest theoretical foundations to the agentic systems reshaping how we work today.
AI at a Glance
The Foundations (1940s–1950s)
The story of AI begins not with a computer, but with a question: can machines think?
In 1943, Warren McCulloch and Walter Pitts published a mathematical model of artificial neurons — the first formal description of how simple computational units could, in theory, perform logical reasoning. It was abstract, but it planted a seed.
Then came Alan Turing. In his landmark 1950 paper Computing Machinery and Intelligence, Turing proposed the Imitation Game — now known as the Turing Test — as a practical way to evaluate whether a machine could exhibit intelligent behaviour indistinguishable from a human. He did not ask whether machines could think. He asked whether the distinction mattered.
In 1956, John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon organised the Dartmouth Conference — the event that officially named the field "artificial intelligence" and set its agenda. The proposal was breathtaking in its optimism: they believed that a summer's worth of work by a small team could make significant progress on making machines intelligent.
They were wrong about the timeline. They were right about the importance.
The Founding Era
McCulloch-Pitts Neuron
First mathematical model of an artificial neuron
Turing's Paper
"Computing Machinery and Intelligence" proposes the Imitation Game
Dartmouth Conference
The field of AI is officially named and launched
Perceptron
Frank Rosenblatt builds the first neural network hardware
LISP Created
John McCarthy invents LISP, the language of early AI
The Golden Age and the First Winter (1960s–1970s)
The decade following Dartmouth was electric with optimism. Early AI programs could solve algebra problems, prove geometric theorems, and play checkers. Herbert Simon predicted that within ten years, a computer would be chess champion and would discover a significant mathematical theorem.
Key achievements of this era:
- ELIZA (1966) — Joseph Weizenbaum's chatbot simulated a psychotherapist using simple pattern matching. Users formed emotional attachments to it, revealing how readily humans anthropomorphise machines.
- SHRDLU (1970) — Terry Winograd's program could understand and act on natural language commands within a simple block world.
- Expert Systems — Programs like DENDRAL (chemistry) and MYCIN (medical diagnosis) encoded human expert knowledge into rule-based systems.
But the optimism outpaced the reality. The problems that seemed close to solved turned out to be far harder than anyone expected.
The winter was not a failure of intelligence. It was a failure of expectations. The foundational work of this era — search algorithms, knowledge representation, natural language processing — would prove essential decades later.
Expert Systems and the Second Winter (1980s–early 1990s)
The 1980s brought a commercial AI boom driven by expert systems — programs that encoded domain knowledge as if-then rules and made decisions like human specialists.
Companies spent billions on expert system technology. Japan launched the ambitious Fifth Generation Computer Project, aiming to build machines capable of reasoning and natural language understanding. The field was hot again.
Landmark Expert Systems
MYCIN
Diagnosed bacterial infections and recommended antibiotics with accuracy rivalling human doctors
DENDRAL
Identified molecular structures from mass spectrometry data — the first expert system
XCON/R1
Configured VAX computer orders for DEC, saving the company $40M annually
But expert systems had a fatal flaw: they were brittle. They could only handle situations explicitly covered by their rules. They could not learn, generalise, or adapt. Maintaining the rule bases became prohibitively expensive as they grew.
By the late 1980s, the hype collapsed again. Japan's Fifth Generation project was quietly shelved. Corporate AI labs closed. The second AI winter set in, lasting through the early 1990s.
The Quiet Revolution: Machine Learning (1990s–2000s)
While the public face of AI went dark, something profound was happening beneath the surface. Researchers shifted from trying to program intelligence to trying to learn it from data.
Key developments of this era:
- Support Vector Machines (1995) — powerful classifiers that found optimal decision boundaries in high-dimensional data
- Random Forests (2001) — ensemble methods that combined many weak learners into strong ones
- IBM Deep Blue (1997) — defeated world chess champion Garry Kasparov, proving that brute-force search combined with expert evaluation could conquer a domain once considered a hallmark of human intelligence
- Statistical NLP — language processing moved from hand-crafted grammars to probabilistic models trained on text corpora
The internet was also beginning to generate vast amounts of data — exactly what machine learning algorithms needed to thrive. The fuel was accumulating. The engines were being built. The explosion was coming.
Deep Learning Changes Everything (2010s)
The 2010s were the decade AI went from an academic curiosity to a force that reshaped industries. The catalyst was deep learning — neural networks with many layers, trained on massive datasets using powerful GPUs.
The Deep Learning Decade
AlexNet
Crushed the ImageNet competition, proving deep learning for computer vision
GANs
Ian Goodfellow invents generative adversarial networks, enabling AI-generated images
AlphaGo
DeepMind's system defeats world Go champion Lee Sedol — a game thought decades away from AI mastery
Transformer
Google publishes "Attention Is All You Need" — the architecture that will power the LLM revolution
BERT & GPT
Pre-trained language models show that scale and self-supervised learning unlock language understanding
The Transformer architecture (2017) deserves special attention. By replacing sequential processing with self-attention mechanisms, Transformers could process entire sequences in parallel and capture long-range dependencies in text. This was the architectural breakthrough that made modern large language models possible.
The Large Language Model Era (2020–2024)
If deep learning was the earthquake, large language models were the tsunami.
GPT-3 (2020) shocked the world by demonstrating that scaling up Transformers to 175 billion parameters produced a system that could write essays, translate languages, answer questions, and generate code — all from a single model trained on text from the internet.
What followed was an arms race unlike anything the field had seen:
The LLM Arms Race
GPT-3
175B parameters — few-shot learning stuns researchers and the public
ChatGPT
OpenAI's conversational interface reaches 100M users in two months
GPT-4
Multimodal model passes professional exams and codes complex systems
Claude 2
Anthropic releases Claude 2 with 100K context and Constitutional AI
Claude 3.5 Sonnet
Sets new benchmarks for coding and reasoning at lower cost
Open Source Surge
Llama 3, Mistral, and others democratise access to frontier-class models
The Scale of the LLM Revolution
The LLM era also brought a critical philosophical shift. Previous AI systems were narrow — they did one thing well. LLMs were general-purpose. A single model could write poetry, debug code, summarise legal documents, and tutor students in calculus. This generality was new, and it changed the conversation about AI from "Can it do X?" to "What can it not do?"
The Agentic Era (2025–Present)
The most recent chapter in AI history is unfolding right now. It began when engineers asked a deceptively simple question: what if we gave language models tools?
An LLM on its own can reason and generate text. But wrap it in a harness — an orchestration layer that connects it to file systems, web browsers, APIs, databases, and code execution environments — and it becomes an agent: a system that can act on the world, not just talk about it.
What Makes the Agentic Era Different
Autonomous Action
Agents don't just respond — they plan, execute multi-step tasks, and adapt based on results
Tool Use
Agents read files, write code, search the web, call APIs, and manage infrastructure
Persistent Memory
Modern harnesses give agents memory that persists across sessions, enabling long-term collaboration
Key milestones of the agentic era:
- Claude Code (2025) — Anthropic's CLI harness that lets Claude autonomously read, edit, and manage entire codebases from the terminal
- Model Context Protocol (2025) — an open standard for connecting models to tools, data sources, and services
- Claude Opus 4.6 (2026) — 1 million token context window with advanced reasoning and tool use capabilities
- Multi-agent systems — specialised agents collaborating on complex tasks, debating outcomes, and checking each other's work
Comparing the Eras
AI Through the Ages
| Feature | Era | Core Approach | Strength | Limitation |
|---|---|---|---|---|
| 1950s–1970s | Symbolic AI & Search | Logical reasoning, theorem proving | Could not handle ambiguity or learn from data | |
| 1980s | Expert Systems | Domain-specific decision making | Brittle, expensive to maintain, could not generalise | |
| 1990s–2000s | Machine Learning | Pattern recognition from data | Required hand-crafted features, narrow applications | |
| 2010s | Deep Learning | Learned features automatically at scale | Required massive labelled datasets, single-task | |
| 2020–2024 | Large Language Models | General-purpose reasoning and generation | Stateless, no tool use, hallucination-prone | |
| 2025–Present | Agentic AI | Autonomous action with tools and memory | Safety, alignment, and governance still maturing |
The Lessons of History
Seventy-six years of AI history teach us several enduring lessons:
1. Hype cycles are inevitable. Every era of AI has been marked by wild optimism followed by painful correction. The researchers who survived the winters were the ones who kept working when the funding dried up.
2. Scale matters, but so does architecture. Neural networks existed for decades before they worked. What changed was not just more data and compute, but better architectures (CNNs, LSTMs, Transformers) that could exploit that scale.
3. The hard problems are always harder than expected. Common sense reasoning, robust language understanding, and general intelligence have been "ten years away" since 1956. Humility about timelines is a hard-won lesson.
4. Practical value drives sustained progress. Expert systems boomed because businesses could use them. Deep learning took off because it solved real problems in vision and speech. The agentic era is accelerating because agents deliver measurable productivity gains.
5. Safety and ethics are not optional. Every era that ignored the societal implications of its technology eventually faced a reckoning. The current focus on alignment, safety, and responsible deployment is not a distraction — it is a lesson learned from history.
What Comes Next
The Horizon
Global AI Infrastructure
Federated agent networks connected through open protocols like MCP, enabling planetary-scale collaboration
Solved Alignment
AI systems that reliably do what humans intend, even in novel situations, with transparent reasoning
Artificial General Intelligence
Systems that match or exceed human cognitive abilities across all domains — the original dream of Dartmouth
The history of AI is a story of people refusing to give up on an idea that was always ahead of its time — until it was not. Each generation of researchers inherited the failures of the last and turned them into foundations.
We stand at the most capable point in that history. The models are powerful. The tools are connected. The agents are acting. What we build from here will determine whether AI's next chapter is its most consequential — or just another cycle.
76 Years in 60 Seconds
The Neuron
McCulloch & Pitts model the first artificial neuron
The Question
Turing asks "Can machines think?"
The Name
Dartmouth Conference coins "artificial intelligence"
The Chatbot
ELIZA simulates conversation
The First Winter
Funding collapses, pessimism reigns
Expert Systems Boom
Commercial AI takes off with rule-based systems
The Second Winter
Expert systems collapse under their own weight
Deep Blue
IBM defeats Kasparov at chess
AlexNet
Deep learning conquers computer vision
Transformers
"Attention Is All You Need" changes everything
GPT-3
Large language models arrive
ChatGPT
AI goes mainstream — 100M users in 2 months
Agentic AI
Models gain tools, memory, and autonomy
The Present
1M-token agents writing code, managing systems, and debating outcomes
The history of AI is not a straight line. It is a spiral — each revolution wider, higher, and more consequential than the last. We are living in the widest turn yet.