In the evolving landscape of artificial intelligence, the transition from single-agent models to multi-agent systems marks a significant leap. Anthropic’s latest Research feature, powered by a multi-agent architecture, showcases how orchestrated AI agents can collaborate to tackle complex, dynamic tasks like research—something that’s notoriously hard to hardcode or linearize.
In this blog post, we’ll unpack how Anthropic designed, tested, and scaled this system, and what the rest of us can learn from their journey.
Traditional AI models often struggle with open-ended tasks that require adaptive, iterative exploration—like research. Anthropic tackled this by enabling Claude to manage a group of autonomous agents that:
- Break a complex query into smaller parts.
- Explore multiple search paths in parallel.
- Combine findings into coherent, cited responses.
This is inspired by how humans work in teams: distributed intelligence, specialization, and dynamic strategy adjustment.
The architecture centers on a lead agent (the LeadResearcher) that:
- Plans a strategy based on the user’s query.
- Spawns subagents, each with a targeted task.
- Aggregates and synthesizes results for the final output.
- Uses a citation agent to ensure traceability and source attribution.
This mirrors a research lab model: a principal investigator delegates to team members, then compiles a report.
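To make the orchestrator-worker pattern concrete, here is a minimal sketch using the public Anthropic Python SDK as a stand-in for whatever internal tooling powers the Research feature. The `lead_researcher`, `run_subagent`, and `call_claude` names, the model alias, and the prompts are illustrative assumptions, not Anthropic's actual implementation.

```python
import asyncio

from anthropic import AsyncAnthropic  # public Anthropic Python SDK (pip install anthropic)

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

async def call_claude(prompt: str) -> str:
    """One model call; a stand-in for the feature's internal tooling."""
    msg = await client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumed model alias
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

async def run_subagent(subtask: str) -> str:
    # Each subagent gets one narrow, targeted task (and, in the real system, its own tools).
    return await call_claude(f"Research this subtopic and report your findings:\n{subtask}")

async def lead_researcher(query: str) -> str:
    # 1. Plan: the lead agent decomposes the user's query into subtasks.
    plan = await call_claude(f"Break this research question into 3-5 subtasks, one per line:\n{query}")
    subtasks = [line.strip("- ").strip() for line in plan.splitlines() if line.strip()]

    # 2. Spawn: one subagent per subtask, all running concurrently.
    findings = await asyncio.gather(*(run_subagent(t) for t in subtasks))

    # 3. Synthesize: the lead agent merges findings into a single draft.
    draft = await call_claude(
        "Synthesize these research findings into one coherent answer:\n\n" + "\n\n".join(findings)
    )

    # 4. Attribute: a final citation pass maps claims back to their sources.
    return await call_claude(f"Add source citations for every claim in this draft:\n\n{draft}")

if __name__ == "__main__":
    print(asyncio.run(lead_researcher("How are companies using multi-agent AI for market research?")))
```

The useful property here is that planning, delegation, synthesis, and citation are separate model calls with separate prompts, so each stage can be tuned (and evaluated) independently.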
Prompt Engineering Is Everything
- Agents were initially overzealous or duplicated tasks.
- Engineers created prompt-based heuristics to guide delegation, effort scaling, and tool choice (an illustrative example follows this list).
- Claude models even helped debug and optimize their own prompts.
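As an illustration of what such a heuristic can look like in practice, here is a hypothetical delegation prompt; the wording and thresholds below are examples in the spirit of that guidance, not Anthropic's actual prompt.

```python
# A hypothetical delegation prompt embedding effort-scaling heuristics of the kind
# described above; the wording and thresholds are illustrative, not Anthropic's actual prompt.
DELEGATION_GUIDANCE = """You are the lead researcher. Before spawning subagents:
- Scale effort to query complexity: simple fact-finding usually needs one subagent
  with a handful of tool calls; direct comparisons a few subagents; only broad,
  open-ended surveys justify many.
- Give each subagent ONE clearly bounded objective, the output format you expect,
  and which tools to prefer (internal search vs. web search).
- Do not spawn two subagents with overlapping objectives; merge them instead.
- Stop delegating once an extra subagent would only duplicate existing coverage.
"""
```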
Parallelization = Power
- Subagents run in parallel, each using multiple tools simultaneously (see the sketch after this list).
- This massively reduced research time (by up to 90%) and expanded the scope of each session.
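The parallelism operates at two levels: the lead agent fans out subagents concurrently (as in the earlier orchestrator sketch), and each subagent can also fire several independent tool calls at once. A sketch of that second level, with made-up tool wrappers standing in for the real search tools:

```python
import asyncio

# Hypothetical tool wrappers; the real system's tools and their names will differ.
async def web_search(query: str) -> str:
    return f"web results for: {query}"

async def internal_search(query: str) -> str:
    return f"internal results for: {query}"

async def gather_evidence(subtask: str) -> list[str]:
    # Fire independent tool calls concurrently instead of sequentially, so the
    # step's wall-clock time is roughly the slowest call, not the sum of all calls.
    results = await asyncio.gather(
        web_search(subtask),
        internal_search(subtask),
        web_search(f"{subtask} criticisms and limitations"),
    )
    return list(results)

if __name__ == "__main__":
    print(asyncio.run(gather_evidence("multi-agent research systems")))
```

Because the calls are independent, an evidence-gathering step finishes roughly when its slowest tool call does, which is where the large time savings come from.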
Token Budget Matters
- Multi-agent systems can use 15x more tokens than typical chats.
- So they’re best used for high-value queries where depth and breadth justify the cost (a back-of-the-envelope example follows).
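One rough way to reason about that trade-off; only the 15x multiplier comes from the figure above, while the token count and price are placeholder assumptions, not Anthropic pricing.

```python
# Rough cost gate using the ~15x multiplier quoted above; the token count and
# price are placeholder assumptions, not Anthropic pricing.
TYPICAL_CHAT_TOKENS = 4_000        # assumed tokens for an ordinary single-agent chat
MULTI_AGENT_MULTIPLIER = 15        # from the figure above
PRICE_PER_1K_TOKENS_USD = 0.01     # placeholder blended input/output price

def multi_agent_is_worth_it(expected_value_usd: float) -> bool:
    estimated_cost = TYPICAL_CHAT_TOKENS * MULTI_AGENT_MULTIPLIER / 1000 * PRICE_PER_1K_TOKENS_USD
    # At these placeholder numbers a run costs about $0.60, so only queries whose
    # answer is worth clearly more than that should take the multi-agent path.
    return expected_value_usd > estimated_cost
```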
Unlike single-agent systems, there’s no one "correct" path in multi-agent workflows. Anthropic used:
- LLM-as-judge evaluations for outputs (sketched after this list).
- Rubrics for factual accuracy, citations, and source quality.
- Small-sample evals early on to rapidly iterate.
- Human testers to catch hallucinations, biases, and weak sources.
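A minimal sketch of an LLM-as-judge grader along these lines, again using the public Anthropic SDK; the rubric wording, JSON output format, and model alias are assumptions for illustration.

```python
import json

from anthropic import Anthropic  # public Anthropic Python SDK

client = Anthropic()

# Hypothetical rubric; the real criteria are only described at a high level above.
RUBRIC = ["factual_accuracy", "citation_accuracy", "source_quality", "completeness"]

def judge(question: str, answer: str) -> dict:
    """Grade one agent answer against the rubric and return per-criterion scores."""
    prompt = (
        "You are grading a research answer. Score each criterion from 0.0 to 1.0 and "
        'return ONLY a JSON object, e.g. {"factual_accuracy": 0.8, ...}.\n'
        f"Criteria: {', '.join(RUBRIC)}\n\n"
        f"Question:\n{question}\n\nAnswer:\n{answer}"
    )
    msg = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumed model alias
        max_tokens=300,
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(msg.content[0].text)
```

Running a cheap grader like this over a small sample of outputs is what makes rapid early iteration possible before investing in large-scale evals.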
Going from prototype to product wasn’t just about better prompts. Anthropic had to:
- Persist agent state across tool failures (see the sketch below).
- Debug dynamic behaviors using full traceability.
- Deploy updates safely using rainbow deployments (gradual rollouts that keep old and new versions running side by side, so in-flight agents aren’t disrupted).
Synchronous subagent execution remains a bottleneck, but asynchronous architectures are in development for greater parallelism.
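A simplified sketch of the state-persistence and retry ideas, with a hypothetical on-disk checkpoint store; a production system would use durable storage and richer error handling.

```python
import asyncio
import json
import pathlib

STATE_DIR = pathlib.Path("agent_state")  # hypothetical on-disk checkpoint location
STATE_DIR.mkdir(exist_ok=True)

def save_checkpoint(agent_id: str, state: dict) -> None:
    # Persist the agent's working state so a crash or tool failure resumes the run
    # from where it stopped instead of restarting an expensive research session.
    (STATE_DIR / f"{agent_id}.json").write_text(json.dumps(state))

def load_checkpoint(agent_id: str) -> dict | None:
    path = STATE_DIR / f"{agent_id}.json"
    return json.loads(path.read_text()) if path.exists() else None

async def call_tool_with_retry(tool, *args, attempts: int = 3):
    # Retry transient tool failures with exponential backoff rather than letting
    # one flaky call take down the whole subagent.
    for attempt in range(attempts):
        try:
            return await tool(*args)
        except Exception:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(2 ** attempt)
```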
Claude’s multi-agent system is now helping users:
- Discover business opportunities.
- Navigate complex decisions (in areas like healthcare and law).
- Accelerate technical and academic research.
According to usage data, top use cases include:
- Building software systems (10%)
- Professional content development (8%)
- Business strategy (8%)
- Academic support (7%)
Multi-agent systems are more than just multiple copies of an LLM—they’re carefully coordinated ecosystems that require:
- Clear roles.
- Smart prompts.
- Thoughtful tool integration.
- Continuous evaluation.
Anthropic’s work sets a precedent for how to responsibly scale LLM capabilities by emulating collaborative human processes.