Most generalist multi-agent stacks still look like a relay race: a central planner prompts specialist workers, who pass back long context blobs for the planner to stitch together. It works—until you downsize the planner or hit token limits. Anemoi proposes a different wiring: keep a light planner, but let agents communicate directly over an Agent-to-Agent (A2A) MCP server so everyone can see progress, flag bottlenecks, and propose fixes in real time.
What’s actually new
Anemoi replaces unidirectional prompt passing with a threaded A2A server (built on the Model Context Protocol) that exposes primitives like `list_agents`, `create_thread`, `send_message`, and `wait_for_mentions`. Any agent can join a thread, address peers, and update plans mid-flight, reducing redundant context stuffing and information loss.
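To make the semantics concrete, here is a minimal in-memory sketch of those four primitives. The primitive names come from the paper; everything else (the class, signatures, and message format) is assumed for illustration and is not Anemoi's actual MCP server.

```python
from collections import defaultdict, deque

class A2AServer:
    """In-memory stand-in for the four A2A primitives exposed over MCP (illustrative only)."""

    def __init__(self):
        self.agents = set()
        self.threads = {}                    # thread_id -> ordered list of messages
        self.mailboxes = defaultdict(deque)  # agent name -> mentions waiting to be read

    def register(self, name):
        self.agents.add(name)

    def list_agents(self):
        # Discovery: any agent can see which peers are available.
        return sorted(self.agents)

    def create_thread(self, creator, participants):
        # A thread is a shared conversation context that all participants can read.
        thread_id = len(self.threads)
        self.threads[thread_id] = []
        return thread_id

    def send_message(self, thread_id, sender, text, mentions=()):
        msg = {"from": sender, "text": text, "mentions": list(mentions)}
        self.threads[thread_id].append(msg)   # visible to the whole thread
        for peer in mentions:                 # also routed to explicitly addressed peers
            self.mailboxes[peer].append((thread_id, msg))

    def wait_for_mentions(self, agent):
        # Blocking in a real server; here we simply drain whatever is queued.
        drained = list(self.mailboxes[agent])
        self.mailboxes[agent].clear()
        return drained
```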
The cast of agents (and why it matters)
- Planner: drafts the initial plan and spins up a thread.
- Critique: continuously audits intermediate results.
- Answer-Finder: compiles the final submission.
- Workers: Web, Document Processing, and Reasoning & Coding, mirroring OWL's tool set for a fair head-to-head. All are MCP-enabled so they can monitor progress and coordinate directly.
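A hypothetical role registry makes the division of labor concrete. The role names mirror the paper's cast; the tool lists and one-line briefs are placeholders, not Anemoi's actual configuration.

```python
# Hypothetical role registry: names follow the paper, tools and briefs are placeholders.
AGENT_ROLES = {
    "planner":       {"tools": [],                         "brief": "Draft the initial plan and open the task thread."},
    "critique":      {"tools": [],                         "brief": "Audit intermediate results and label them accept/uncertain."},
    "answer_finder": {"tools": [],                         "brief": "Compile the final submission from accepted results."},
    "web":           {"tools": ["search", "browse"],       "brief": "Retrieve and read web pages."},
    "document":      {"tools": ["read_pdf", "transcribe"], "brief": "Parse documents, spreadsheets, and media files."},
    "reason_code":   {"tools": ["python_executor"],        "brief": "Reasoning and code execution."},
}
```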
This design reduces reliance on one overpowered planner, supports adaptive plan updates, and cuts token overhead from repeated context injection.
Numbers that move the needle (GAIA validation)
| Framework | Planner / Workers | Avg. Acc. |
|---|---|---|
| OWL-rep (pass@3) | GPT-4.1-mini / GPT-4o | 43.64% |
| OWL (paper, pass@3) | GPT-4o-mini / GPT-4o | 47.27% |
| Anemoi (pass@3) | GPT-4.1-mini / GPT-4o | 52.73% |
With a small planner (GPT-4.1-mini), Anemoi tops a strong open-source baseline by +9.09 points under identical tools and models—and is competitive with several proprietary systems that rely on larger planners.
How the A2A workflow runs
1. Discover agents.
2. Create a thread with the relevant participants.
3. Workers execute subtasks; Critique labels outputs accept/uncertain while any agent can contribute revisions.
4. Consensus vote before finalization.
5. Answer-Finder submits.

All of this happens via MCP messaging in a single conversation context; a minimal end-to-end sketch follows.
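Continuing the in-memory `A2AServer` sketch from earlier, the loop below walks through those five steps end to end. The worker outputs, the accept/uncertain labels, and the consensus rule are stand-ins chosen for illustration, not Anemoi's actual prompts or logic.

```python
server = A2AServer()
for name in ("planner", "web", "document", "reason_code", "critique", "answer_finder"):
    server.register(name)

# 1) Discover agents and 2) create a thread with the relevant participants.
participants = server.list_agents()
tid = server.create_thread("planner", participants)
server.send_message(tid, "planner",
                    "Plan: (a) find the source page, (b) extract the table, (c) compute the answer",
                    mentions=["web", "document", "reason_code"])

# 3) Workers execute subtasks and post results; Critique labels each one.
labels = {}
for worker, output in [("web", "found candidate URL"),
                       ("document", "extracted the relevant table"),
                       ("reason_code", "computed value = 42")]:
    server.send_message(tid, worker, output, mentions=["critique"])
    labels[worker] = "accept"  # Critique could instead reply "uncertain" and request a revision

# 4) Consensus check before finalization, then 5) Answer-Finder submits.
if all(v == "accept" for v in labels.values()):
    server.send_message(tid, "answer_finder", "FINAL ANSWER: 42", mentions=["planner"])
    print(server.threads[tid][-1]["text"])
```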
Where it wins—and where it trips
- Wins: Of the tasks Anemoi solved that OWL missed, 52% were due to collaborative refinement enabled by A2A; another 8% came from reduced context redundancy.
- Failures: Remaining errors skew to LLM/tool limits (≈46%/21%), incorrect plans (≈12%), and some communication latency (≈10%), notably when the web agent is busy and can't respond to peers.
Why this matters
If your agent system juggles web search, file I/O, and coding, direct inter-agent communication can deliver better results without upgrading to an expensive planner. Anemoi shows a practical blueprint: keep the planner lightweight, move coordination into an A2A layer, and let specialists negotiate in-thread instead of bloating prompts.
Paper link: arXiv 2508.17068 (PDF)