The History of AI

May 14, 2026

The History of AI

From Turing's dream to agentic intelligence — eight decades of machines learning to think

Artificial intelligence is not a recent invention. It is the product of eight decades of ambition, breakthrough, disappointment, and reinvention. Understanding where AI came from is essential to understanding where it is going.

This post traces the full arc — from the earliest theoretical foundations to the agentic systems reshaping how we work today.

AI at a Glance

1950

Year the field was born

Years of research

$300B+

Annual AI investment (2026)

3.5B

People using AI tools daily

The Foundations (1940s–1950s)

Alan Turing era computing

The story of AI begins not with a computer, but with a question: can machines think?

In 1943, Warren McCulloch and Walter Pitts published a mathematical model of artificial neurons — the first formal description of how simple computational units could, in theory, perform logical reasoning. It was abstract, but it planted a seed.

Then came Alan Turing. In his landmark 1950 paper Computing Machinery and Intelligence, Turing proposed the Imitation Game — now known as the Turing Test — as a practical way to evaluate whether a machine could exhibit intelligent behaviour indistinguishable from a human. He did not ask whether machines could think. He asked whether the distinction mattered.

The Turing Test

Turing proposed that if a human evaluator could not reliably distinguish between a machine and a human in a text-based conversation, the machine should be considered intelligent. This simple framing shaped AI research for decades.

In 1956, John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon organised the Dartmouth Conference — the event that officially named the field "artificial intelligence" and set its agenda. The proposal was breathtaking in its optimism: they believed that a summer's worth of work by a small team could make significant progress on making machines intelligent.

They were wrong about the timeline. They were right about the importance.

The Founding Era

1943

McCulloch-Pitts Neuron

First mathematical model of an artificial neuron

1950

Turing's Paper

"Computing Machinery and Intelligence" proposes the Imitation Game

1956

Dartmouth Conference

The field of AI is officially named and launched

1957

Perceptron

Frank Rosenblatt builds the first neural network hardware

1958

LISP Created

John McCarthy invents LISP, the language of early AI

The Golden Age and the First Winter (1960s–1970s)

The decade following Dartmouth was electric with optimism. Early AI programs could solve algebra problems, prove geometric theorems, and play checkers. Herbert Simon predicted that within ten years, a computer would be chess champion and would discover a significant mathematical theorem.

Key achievements of this era:

ELIZA (1966) — Joseph Weizenbaum's chatbot simulated a psychotherapist using simple pattern matching. Users formed emotional attachments to it, revealing how readily humans anthropomorphise machines.
SHRDLU (1970) — Terry Winograd's program could understand and act on natural language commands within a simple block world.
Expert Systems — Programs like DENDRAL (chemistry) and MYCIN (medical diagnosis) encoded human expert knowledge into rule-based systems.

But the optimism outpaced the reality. The problems that seemed close to solved turned out to be far harder than anyone expected.

The First AI Winter (1974–1980)

Governments and institutions that had funded AI research grew disillusioned with the slow progress. The Lighthill Report in the UK was devastating, concluding that AI had failed to deliver on its grand promises. Funding dried up. Researchers left the field. The first AI winter had arrived.

The winter was not a failure of intelligence. It was a failure of expectations. The foundational work of this era — search algorithms, knowledge representation, natural language processing — would prove essential decades later.

Expert Systems and the Second Winter (1980s–early 1990s)

Early computing era

The 1980s brought a commercial AI boom driven by expert systems — programs that encoded domain knowledge as if-then rules and made decisions like human specialists.

Companies spent billions on expert system technology. Japan launched the ambitious Fifth Generation Computer Project, aiming to build machines capable of reasoning and natural language understanding. The field was hot again.

Landmark Expert Systems

MYCIN

Diagnosed bacterial infections and recommended antibiotics with accuracy rivalling human doctors

DENDRAL

Identified molecular structures from mass spectrometry data — the first expert system

XCON/R1

Configured VAX computer orders for DEC, saving the company $40M annually

But expert systems had a fatal flaw: they were brittle. They could only handle situations explicitly covered by their rules. They could not learn, generalise, or adapt. Maintaining the rule bases became prohibitively expensive as they grew.

By the late 1980s, the hype collapsed again. Japan's Fifth Generation project was quietly shelved. Corporate AI labs closed. The second AI winter set in, lasting through the early 1990s.

The Quiet Revolution: Machine Learning (1990s–2000s)

While the public face of AI went dark, something profound was happening beneath the surface. Researchers shifted from trying to program intelligence to trying to learn it from data.

The Paradigm Shift

The fundamental question changed from "How do we encode human knowledge into rules?" to "How do we build systems that discover patterns on their own?" This shift from knowledge engineering to machine learning would eventually transform the entire field.

Key developments of this era:

Support Vector Machines (1995) — powerful classifiers that found optimal decision boundaries in high-dimensional data
Random Forests (2001) — ensemble methods that combined many weak learners into strong ones
IBM Deep Blue (1997) — defeated world chess champion Garry Kasparov, proving that brute-force search combined with expert evaluation could conquer a domain once considered a hallmark of human intelligence
Statistical NLP — language processing moved from hand-crafted grammars to probabilistic models trained on text corpora

The internet was also beginning to generate vast amounts of data — exactly what machine learning algorithms needed to thrive. The fuel was accumulating. The engines were being built. The explosion was coming.

Deep Learning Changes Everything (2010s)

Neural network visualisation

The 2010s were the decade AI went from an academic curiosity to a force that reshaped industries. The catalyst was deep learning — neural networks with many layers, trained on massive datasets using powerful GPUs.

The Deep Learning Decade

2012

AlexNet

Crushed the ImageNet competition, proving deep learning for computer vision

2014

GANs

Ian Goodfellow invents generative adversarial networks, enabling AI-generated images

2016

AlphaGo

DeepMind's system defeats world Go champion Lee Sedol — a game thought decades away from AI mastery

2017

Transformer

Google publishes "Attention Is All You Need" — the architecture that will power the LLM revolution

2018

BERT & GPT

Pre-trained language models show that scale and self-supervised learning unlock language understanding

The Transformer architecture (2017) deserves special attention. By replacing sequential processing with self-attention mechanisms, Transformers could process entire sequences in parallel and capture long-range dependencies in text. This was the architectural breakthrough that made modern large language models possible.

Why Deep Learning Worked This Time

Three things converged: massive datasets (the internet), massive compute (GPUs and cloud), and algorithmic improvements (dropout, batch normalisation, residual connections). The same neural network ideas that failed in the 1960s succeeded in the 2010s because the infrastructure finally caught up with the theory.

The Large Language Model Era (2020–2024)

If deep learning was the earthquake, large language models were the tsunami.

GPT-3 (2020) shocked the world by demonstrating that scaling up Transformers to 175 billion parameters produced a system that could write essays, translate languages, answer questions, and generate code — all from a single model trained on text from the internet.

What followed was an arms race unlike anything the field had seen:

The LLM Arms Race

2020

GPT-3

175B parameters — few-shot learning stuns researchers and the public

2022

ChatGPT

OpenAI's conversational interface reaches 100M users in two months

2023

GPT-4

Multimodal model passes professional exams and codes complex systems

2023

Claude 2

Anthropic releases Claude 2 with 100K context and Constitutional AI

2024

Claude 3.5 Sonnet

Sets new benchmarks for coding and reasoning at lower cost

2024

Open Source Surge

Llama 3, Mistral, and others democratise access to frontier-class models

The Scale of the LLM Revolution

100M

ChatGPT users in 2 months

1T+

Parameters in frontier models

200K

Claude 3.5 context window (tokens)

$100B+

Annual investment in LLM infrastructure

The LLM era also brought a critical philosophical shift. Previous AI systems were narrow — they did one thing well. LLMs were general-purpose. A single model could write poetry, debug code, summarise legal documents, and tutor students in calculus. This generality was new, and it changed the conversation about AI from "Can it do X?" to "What can it not do?"

The Agentic Era (2025–Present)

Agentic AI systems

The most recent chapter in AI history is unfolding right now. It began when engineers asked a deceptively simple question: what if we gave language models tools?

An LLM on its own can reason and generate text. But wrap it in a harness — an orchestration layer that connects it to file systems, web browsers, APIs, databases, and code execution environments — and it becomes an agent: a system that can act on the world, not just talk about it.

What Makes the Agentic Era Different

Autonomous Action

Agents don't just respond — they plan, execute multi-step tasks, and adapt based on results

Tool Use

Agents read files, write code, search the web, call APIs, and manage infrastructure

Persistent Memory

Modern harnesses give agents memory that persists across sessions, enabling long-term collaboration

Key milestones of the agentic era:

Claude Code (2025) — Anthropic's CLI harness that lets Claude autonomously read, edit, and manage entire codebases from the terminal
Model Context Protocol (2025) — an open standard for connecting models to tools, data sources, and services
Claude Opus 4.6 (2026) — 1 million token context window with advanced reasoning and tool use capabilities
Multi-agent systems — specialised agents collaborating on complex tasks, debating outcomes, and checking each other's work

From Intelligence to Agency

The agentic era represents the most significant shift since the invention of the Transformer. For the first time, AI systems are not just answering questions — they are completing tasks, making decisions, and operating with increasing autonomy. The model provides the intelligence. The harness provides the agency.

Comparing the Eras

AI Through the Ages

Feature	Era	Core Approach	Strength
1950s–1970s	Symbolic AI & Search	Logical reasoning, theorem proving	Could not handle ambiguity or learn from data
1980s	Expert Systems	Domain-specific decision making	Brittle, expensive to maintain, could not generalise
1990s–2000s	Machine Learning	Pattern recognition from data	Required hand-crafted features, narrow applications
2010s	Deep Learning	Learned features automatically at scale	Required massive labelled datasets, single-task
2020–2024	Large Language Models	General-purpose reasoning and generation	Stateless, no tool use, hallucination-prone
2025–Present	Agentic AI	Autonomous action with tools and memory	Safety, alignment, and governance still maturing

The Lessons of History

Reflection and learning

Seventy-six years of AI history teach us several enduring lessons:

1. Hype cycles are inevitable. Every era of AI has been marked by wild optimism followed by painful correction. The researchers who survived the winters were the ones who kept working when the funding dried up.

2. Scale matters, but so does architecture. Neural networks existed for decades before they worked. What changed was not just more data and compute, but better architectures (CNNs, LSTMs, Transformers) that could exploit that scale.

3. The hard problems are always harder than expected. Common sense reasoning, robust language understanding, and general intelligence have been "ten years away" since 1956. Humility about timelines is a hard-won lesson.

4. Practical value drives sustained progress. Expert systems boomed because businesses could use them. Deep learning took off because it solved real problems in vision and speech. The agentic era is accelerating because agents deliver measurable productivity gains.

5. Safety and ethics are not optional. Every era that ignored the societal implications of its technology eventually faced a reckoning. The current focus on alignment, safety, and responsible deployment is not a distraction — it is a lesson learned from history.

The Pattern to Watch

Every previous AI paradigm eventually hit a wall — a class of problems it fundamentally could not solve. Symbolic AI could not learn. Expert systems could not generalise. Narrow deep learning could not reason. The question for the agentic era is: what is its wall, and are we building toward it or around it?

What Comes Next

The Horizon

Global AI Infrastructure

Federated agent networks connected through open protocols like MCP, enabling planetary-scale collaboration

Solved Alignment

AI systems that reliably do what humans intend, even in novel situations, with transparent reasoning

Artificial General Intelligence

Systems that match or exceed human cognitive abilities across all domains — the original dream of Dartmouth

The history of AI is a story of people refusing to give up on an idea that was always ahead of its time — until it was not. Each generation of researchers inherited the failures of the last and turned them into foundations.

We stand at the most capable point in that history. The models are powerful. The tools are connected. The agents are acting. What we build from here will determine whether AI's next chapter is its most consequential — or just another cycle.

76 Years in 60 Seconds

1943

The Neuron

McCulloch & Pitts model the first artificial neuron

1950

The Question

Turing asks "Can machines think?"

1956

The Name

Dartmouth Conference coins "artificial intelligence"

1966

The Chatbot

ELIZA simulates conversation

1974

The First Winter

Funding collapses, pessimism reigns

1980

Expert Systems Boom

Commercial AI takes off with rule-based systems

1987

The Second Winter

Expert systems collapse under their own weight

1997

Deep Blue

IBM defeats Kasparov at chess

2012

AlexNet

Deep learning conquers computer vision

2017

Transformers

"Attention Is All You Need" changes everything

2020

GPT-3

Large language models arrive

2022

ChatGPT

AI goes mainstream — 100M users in 2 months

2025

Agentic AI

Models gain tools, memory, and autonomy

2026

The Present

1M-token agents writing code, managing systems, and debating outcomes

The history of AI is not a straight line. It is a spiral — each revolution wider, higher, and more consequential than the last. We are living in the widest turn yet.