The Rise of Agent Harnesses
How a thin orchestration layer became the most consequential piece of AI infrastructure
For years, all the attention went to the models. Bigger parameters. Better benchmarks. Faster inference. But quietly, a different kind of innovation was compounding — not inside the model, but around it.
The agent harness — the orchestration layer that connects a language model to tools, memory, safety controls, and the real world — has gone from an afterthought to the defining technology of the agentic era. This is the story of how that happened.
The Harness Revolution in Numbers
Phase 1: The Prompt Wrapper Era (2022–2023)
The first "harnesses" were barely harnesses at all. They were prompt wrappers — thin scripts that formatted a user's input, sent it to an API, and printed the response.
The pattern was simple:
- Prepend a system prompt with instructions
- Append the user's message
- Call the API
- Display the result
Tools like LangChain and early AutoGPT experiments showed that you could chain multiple API calls together, giving the model a rudimentary loop: think, act, observe, repeat. But these systems were fragile. They hallucinated tool calls, got stuck in infinite loops, and had no meaningful safety controls.
Despite the rough edges, the prompt wrapper era proved something crucial: models wanted to use tools. Given the right framing, language models would naturally attempt to call functions, read files, and interact with external systems. The instinct for agency was latent in the models. It just needed the right infrastructure to express it.
Phase 2: The Framework Explosion (2023–2024)
The realisation that models could use tools triggered an explosion of harness frameworks, each taking a different approach to orchestration.
The Framework Wave
LangChain
Pioneered chain-based orchestration with composable components
AutoGPT / BabyAGI
Demonstrated autonomous agents with goal-driven loops
CrewAI
Introduced role-based multi-agent collaboration
OpenAI Assistants API
First-party tool use and file handling from a model provider
Claude Tool Use
Anthropic ships native tool calling with structured outputs
Cursor / Windsurf
IDE-integrated harnesses bring agentic AI into the editor
This era was characterised by experimentation and fragmentation. Every framework had its own way of defining tools, managing context, and handling errors. There was no standard. If you built tools for one framework, they did not work with another.
What the Frameworks Got Right
Tool Abstraction
Defined clean interfaces between models and external capabilities
Retry Logic
Handled API failures, rate limits, and malformed model outputs gracefully
Structured Output
Forced models to return typed, parseable responses instead of free-form text
What They Got Wrong
Vendor Lock-In
Most frameworks were tightly coupled to a single model provider
Over-Abstraction
Some frameworks had so many layers that debugging became nearly impossible
Safety Afterthought
Permissions and sandboxing were bolted on late, if at all
The framework explosion was messy but necessary. It mapped the problem space, identified the patterns that worked, and made clear what a mature harness would eventually need to include.
Phase 3: The Integrated Harness (2024–2025)
By late 2024, the field began consolidating around a more mature vision of what a harness should be. Instead of loose chains of API calls, the new generation of harnesses were integrated platforms with first-class support for every aspect of agent operation.
The defining characteristics of this phase:
Models Started Shipping Their Own Harnesses
Anthropic launched Claude Code — a CLI-based harness that let Claude directly read, edit, search, and manage codebases from the terminal. This was significant because the harness was designed by the same team that built the model, enabling deep integration between reasoning and tool use.
OpenAI followed with the Codex CLI. Google integrated agent capabilities into Gemini. The model providers recognised that the harness was not someone else's problem — it was core to the product.
Safety Became Architectural
The early harnesses treated safety as an afterthought. The integrated harnesses made it foundational:
- Permission tiers that control which tools run automatically and which require human approval
- Sandboxed execution environments that contain the blast radius of mistakes
- Audit trails that log every tool call, every model decision, and every file modification
Memory Went Persistent
The integrated harnesses introduced persistent memory systems that survived across sessions. Agents could remember user preferences, project context, past feedback, and accumulated knowledge. This transformed agents from amnesiac tools into long-term collaborators.
Phase 4: The Protocol Era (2025–Present)
The most recent phase addresses the fragmentation problem that plagued earlier eras. The question shifted from "How do we build a harness?" to "How do we make all harnesses interoperable?"
The Protocol Revolution
MCP Announced
Anthropic introduces the Model Context Protocol as an open standard
MCP Adoption
Major tool providers begin publishing MCP-compatible interfaces
MCP Ecosystem
Thousands of MCP servers available — databases, APIs, dev tools, cloud services
Universal Tool Layer
Any MCP-compatible agent can discover and use any MCP-compatible tool
The Model Context Protocol (MCP) is to agent harnesses what HTTP was to web browsers: a universal standard that decouples the client from the server. Before MCP, every harness had to build custom integrations for every tool. After MCP, a tool built once works with every harness.
Before and After MCP
| Feature | Aspect | Pre-MCP World | Post-MCP World |
|---|---|---|---|
| Tool Integration | Custom code per harness per tool | Build once, works everywhere | |
| Discovery | Manual configuration and documentation | Automatic capability discovery | |
| Provider Lock-In | Tools tied to specific frameworks | Provider-agnostic tools | |
| Ecosystem Growth | Linear — each integration is bespoke | Exponential — tools compose freely | |
| Upgrade Path | Rewrite integrations when tools change | Protocol handles versioning |
MCP did not just make harnesses more convenient. It changed the economics of the ecosystem. Tool builders now had a single target to build for. Harness developers could focus on orchestration rather than integration. Users got access to an ever-growing catalogue of capabilities without waiting for their specific harness to add support.
The Anatomy of a Modern Harness (2026)
Today's state-of-the-art harnesses have converged on a layered architecture with distinct responsibilities at each level.
The Five Layers
Model Layer
Manages model selection, prompt formatting, and response parsing across multiple providers
Tool Layer
MCP-based tool discovery, invocation, and result handling with structured I/O
Memory Layer
Persistent, categorised, queryable memory that accumulates knowledge across sessions
Safety Layer
Permission tiers, sandboxing, audit trails, and policy-as-code enforcement
Orchestration Layer
The conversation loop — prompt, respond, execute, observe, repeat until done
What Modern Harnesses Handle Daily
Why the Harness Matters More Than the Model
This is the counterintuitive insight that the rise of agent harnesses has revealed: the harness is more important than the model it wraps.
Consider the evidence:
- The same model performs dramatically differently in different harnesses. Claude in a well-designed coding harness is vastly more effective than Claude in a bare API call.
- Users choose tools based on the harness experience (Cursor, Claude Code, Windsurf), not on which model happens to power them.
- The harness determines what the agent can actually do. A brilliant model with no tools is just a chatbot. A good model with great tools is a collaborator.
This does not diminish the importance of model quality. But it reframes the competitive landscape. The race is no longer just about who has the best model. It is about who has the best harness.
The Multi-Agent Frontier
The latest evolution in harness design is the shift from single-agent to multi-agent orchestration. Modern harnesses can spawn, coordinate, and manage multiple specialised agents working on different aspects of a task.
Multi-Agent Patterns
Fork and Join
Parent agent spawns specialists for subtasks, then synthesises their results
Debate
Multiple agents argue different positions, with a judge agent synthesising the verdict
Review Chain
One agent does the work, another reviews it, a third validates the review
Multi-agent harnesses introduce new challenges that single-agent systems never faced:
- Context sharing — how much context does each sub-agent need? Too little and it lacks understanding. Too much and it wastes tokens.
- Conflict resolution — when agents disagree, who wins? The harness needs arbitration logic.
- Resource management — multiple agents running simultaneously multiply compute costs and API calls.
- Observability — tracing a single agent's reasoning is hard enough. Tracing a network of interacting agents requires sophisticated tooling.
What Comes Next
The rise of agent harnesses is far from over. Several trends are shaping the next phase:
The Road Ahead
Federated Harnesses
Agents from different organisations discover and collaborate through shared protocols
Self-Improving Harnesses
Harnesses that optimise their own prompts, tool selection, and orchestration strategies based on outcomes
Ambient Agents
Harnesses embedded in operating systems, always running, always available — not apps you open but infrastructure you rely on
The Super Harness
A planetary-scale orchestration layer connecting every agent, tool, and data source into unified infrastructure
Three Bets for the Future
Harnesses Become Platforms
The most successful harnesses will evolve into platforms with ecosystems of plugins, extensions, and community-built tools
Safety Becomes Competitive Advantage
As agents gain autonomy, the harnesses with the best safety records will win user trust — and market share
Human-Agent Teams Become Default
The future of work is not AI replacing humans or humans directing AI — it is integrated teams where the harness mediates the collaboration
The Lesson of the Rise
The rise of agent harnesses teaches a lesson that recurs throughout the history of technology: the infrastructure layer eventually matters more than the innovation it supports.
The internet mattered more than any individual website. Cloud computing mattered more than any individual application. And the agent harness — the layer that turns raw intelligence into practical agency — may ultimately matter more than any individual model.
The models will keep improving. That is almost guaranteed. But the harness is what determines whether that intelligence is channelled into useful work or wasted in idle conversation. It is what makes the difference between an AI that can think and an AI that can do.
They built the models. Then they built the harnesses. And that is when everything changed.