Most "multi-agent" systems I've seen in production are just a chain of prompts wearing a trench coat. Real orchestration is harder, messier, and more interesting than the demos suggest.
I run a 9-agent autonomous marketing system. It researches markets, builds content strategy, writes posts, validates quality, routes to me for approval, publishes across channels, and measures what worked. I didn't build it as an experiment. I built it because a real estate analytics product needed a full content operation and I didn't have the team or the timeline to do it with humans alone.
This post is about the architecture, the patterns that survived production, and what broke along the way.
Why one agent isn't enough
A single LLM call can do a lot. It can write a blog post, generate a social caption, summarize research. The problem shows up when you need all of those things to happen in sequence, with different contexts and quality gates between each step.
Think about what a human marketing team actually does. A researcher pulls industry data. A strategist decides what to say and where. A writer turns strategy into content. An editor reviews it. A manager approves it. Someone publishes it. Someone checks whether it worked.
You can try cramming all of that into one giant prompt. I did. The output is mediocre across the board because you're asking one context window to be seven different people. The research is shallow because the model is already thinking about the writing. The strategy is vague because it's trying to be a plan and a draft at the same time.
Splitting these into separate agents with separate contexts and separate instructions made each step dramatically better.
The system: 9 agents, one pipeline
I built this on OpenClaw, a framework that gives you the scaffolding for agent coordination without reinventing message passing and task management from scratch. Here's the command center showing the live system:
The nine agents form a complete marketing team, each with a distinct role:
Personal Assistant acts as the liaison between me and the marketing team. He handles coordination that doesn't fit neatly into the content flow — relaying my priorities, answering questions from other agents, and keeping me informed without flooding me with every detail.
Marketing Director / Strategist leads the entire team, running on Opus 4.6. She owns the content calendar, analyzes what the researcher produces, and creates tasks for the blog writer, social post generator, or email sender depending on what content needs to be produced. Critically, she also measures content performance across platforms before making decisions about what to assign next. That feedback loop — publish, measure, adjust — is what makes this a strategy system rather than just a content factory.
Researcher monitors competitor activity, market trends, and keyword opportunities every 6 hours via MCP connections to GA4 and web search. Everything the researcher produces feeds into the marketing director's decision-making.
Blog Writer handles long-form content for the website — articles, guides, and SEO-driven posts.
Social Post Generator creates platform-native content for Instagram, X, LinkedIn, Facebook, and other social channels.
Email Sender writes email copy and sends campaigns to users.
Validator runs QA on everything before it reaches me. It checks for legal compliance, factual accuracy, and tone. If content fails validation, it gets sent back to the appropriate agent — not through the whole pipeline again, just back to whoever produced it. If it passes, the validator routes it for human approval.
Publisher handles the actual distribution after approval, pushing content to the appropriate channel — website, social platforms, or email.
Social Sharer cross-posts published content across all social platforms.
All agents share a task board. When the researcher finishes a briefing, it creates a task for the marketing director. The director analyzes it and assigns work to the right content agent. The task board is the coordination mechanism, not a central orchestrator dictating every handoff.
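To make the handoff concrete, here's a minimal sketch of a shared task board with claim semantics. This is an illustrative in-memory model, not OpenClaw's actual API; the `Task` fields and the `post`/`claim` names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional
import itertools

_ids = itertools.count(1)

@dataclass
class Task:
    role: str                      # which agent role may claim this task
    payload: dict                  # briefing, draft, feedback, etc.
    parent: Optional[int] = None   # the task that spawned this one
    status: str = "open"           # open -> claimed -> done
    id: int = field(default_factory=lambda: next(_ids))

class TaskBoard:
    def __init__(self) -> None:
        self.tasks: list[Task] = []

    def post(self, role: str, payload: dict, parent: Optional[int] = None) -> Task:
        task = Task(role=role, payload=payload, parent=parent)
        self.tasks.append(task)
        return task

    def claim(self, role: str) -> Optional[Task]:
        # Each agent polls for open work matching its role.
        for task in self.tasks:
            if task.status == "open" and task.role == role:
                task.status = "claimed"
                return task
        return None

# The researcher finishes a briefing and posts a task for the director;
# the director claims it and posts downstream work for a content agent.
board = TaskBoard()
board.post("director", {"type": "briefing", "topic": "Q3 rental trends"})
task = board.claim("director")
board.post("blog_writer", {"type": "article", "brief": task.payload}, parent=task.id)
```

The `parent` link is what keeps lineage traceable when a piece of content bounces between agents.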
Here's the pipeline flow:
The approval flow has two layers. First, the validator checks every piece of content against rules — legal, factual, tone. If something fails, it goes back to the agent that produced it with specific feedback. If it passes validation, it comes to me on Telegram via OpenClaw's Lobster protocol. I get a formatted message with approve/revise/reject buttons. I act as a secondary validator — a human sanity check before anything goes live. If I revise, my notes go back to the content agent as a new task. Only after both the validator and I approve does the publisher take over.
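The two-layer gate can be sketched as a routing function: automated rules first, then the human decision. The rule names and field names below are illustrative stand-ins, not the real validator's checks.

```python
# Hypothetical two-layer approval gate. Rules are placeholders for the
# real legal/factual/tone checks.
RULES = {
    "legal": lambda c: "guaranteed returns" not in c["body"].lower(),
    "factual": lambda c: all(claim in c["sources"] for claim in c["claims"]),
    "tone": lambda c: not c["body"].isupper(),
}

def validate(content: dict) -> list[str]:
    """Return the names of the rules that failed."""
    return [name for name, rule in RULES.items() if not rule(content)]

def route(content: dict, human_decision: str) -> str:
    failures = validate(content)
    if failures:
        # Back to the producing agent with specific feedback,
        # not through the whole pipeline again.
        return f"revise:{content['author']}:{','.join(failures)}"
    if human_decision == "approve":
        return "publish"
    if human_decision == "revise":
        return f"revise:{content['author']}:human_notes"
    return "rejected"

draft = {
    "author": "blog_writer",
    "body": "Median rents rose 4% year over year.",
    "claims": ["rents +4%"],
    "sources": ["rents +4%"],
}
decision = route(draft, "approve")
```

Note that the human decision is only consulted once validation passes, which is the order the pipeline enforces.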
The marketing director closes the loop by measuring performance after content goes live. She pulls data from GA4, Google Search Console, and social platforms to see what's working and what isn't. That analysis directly informs the next round of task assignments. Here's what that looks like in the command center:
Page views, social impressions, search rankings, click-through rates — all of it feeds back into the director's next set of decisions. Content that performs well signals the team to produce more in that vein. Content that underperforms gets analyzed for why, and the strategy adjusts.
Each agent can also message other agents directly for edge cases. If the validator finds a statistic that doesn't match the source data, it messages the researcher to verify rather than kicking the whole piece back to step one.
The patterns that matter
After months of running this, a few patterns have proven themselves.
Task boards over central orchestrators
The first version had a central orchestrator that called each agent in order. It was brittle. If the validator requested a revision, the orchestrator had to handle the entire state machine of "go back to step 3, but keep the research from step 1, and don't re-run strategy."
Switching to a shared task board changed the dynamic. Each agent watches for tasks it can handle, picks them up, does its work, and creates downstream tasks. The orchestration emerges from agent behavior rather than being dictated by a controller. This maps more naturally to how human teams actually work.
Agent-to-agent messaging for edge cases
The happy path is linear. But real work has branches. The validator might find a statistic that doesn't match the source data. Rather than failing the whole piece and sending it through the full pipeline again, it messages the researcher directly: "Can you verify this claim?"
The researcher responds asynchronously while the validator continues reviewing other aspects. Small thing, but it's the difference between a system that handles reality and one that only works in demos.
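A side channel like this can be sketched with two inboxes: the validator fires off the question, keeps working, and collects the answer when it lands. The queue-based plumbing and the claim check are stand-ins for the real messaging layer.

```python
import queue
import threading

# Hypothetical direct agent-to-agent side channel: the validator asks the
# researcher to verify one claim while it keeps reviewing everything else.
inbox = {"researcher": queue.Queue(), "validator": queue.Queue()}

def researcher():
    msg = inbox["researcher"].get()            # blocks until a question arrives
    verified = msg["claim"] == "rents +4%"     # stand-in for a real source lookup
    inbox["validator"].put({"claim": msg["claim"], "verified": verified})

t = threading.Thread(target=researcher)
t.start()

# Fire off the question, keep working, then collect the answer.
inbox["researcher"].put({"claim": "rents +4%"})
checked_sections = ["tone", "legal"]           # review continues meanwhile
reply = inbox["validator"].get()
t.join()
```

The point is that the question and the answer are decoupled in time, so one unresolved claim never stalls the rest of the review.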
Human-in-the-loop as secondary validation
I route approval decisions to Telegram because that's where I already am. The validator handles the first pass — legal, factual, tone. I handle the second. Between the two layers, very little slips through.
You need a real human checkpoint in most agent pipelines today. Not because AI can't produce good work, but because publishing bad content has consequences and a quick human review costs almost nothing relative to a mistake.
The approval gate also functions as a training signal. When I revise content, the revision notes accumulate and the system gets better at anticipating what I'll flag. Not through fine-tuning, but through growing context in the briefing documents.
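The "training signal without fine-tuning" idea reduces to something like this: revision notes accumulate in a document that gets prepended to future prompts. The storage format and helper names are assumptions for illustration.

```python
# Sketch: revision notes accumulate and are injected into future briefings,
# so the system improves through growing context, not fine-tuning.
briefing_notes: list[str] = []

def record_revision(note: str) -> None:
    if note not in briefing_notes:   # don't repeat a lesson already learned
        briefing_notes.append(note)

def build_prompt(task: str) -> str:
    lessons = "\n".join(f"- {n}" for n in briefing_notes)
    return f"Past revision notes to respect:\n{lessons}\n\nTask: {task}"

record_revision("Never promise specific ROI figures.")
record_revision("Keep LinkedIn posts under 150 words.")
prompt = build_prompt("Draft a LinkedIn post on Q3 rental trends.")
```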
Performance-driven feedback loops
The marketing director doesn't just assign tasks and move on. She measures outcomes. After content is published and has had time to accumulate data, she pulls performance metrics and factors them into the next round of strategy. This is the part most agent systems skip — they produce output but never learn from results.
ACP delegation for heavy lifting
Some tasks are complex enough that an agent spawns a Claude Code session via ACP (Agent Communication Protocol) to handle the substantive work. The research agent delegates deep analysis to Claude Code, which can read files, run searches, and produce structured outputs. The agent handles coordination. Claude Code handles the thinking.
This separation keeps the agent layer lightweight and fast. It manages flow, not computation. When something needs serious analysis or long-form writing, it hands off to a full coding session with the tools and context window to do it properly.
MCP as connective tissue
None of this works without MCP (Model Context Protocol). It's the plumbing that connects agents to external systems.
The researcher uses MCP to pull from GA4. The publisher uses MCP to push through Postiz for social distribution. Content agents use Nano Banana via MCP for image generation. And the marketing director reads performance analytics through the same GA4 MCP connection.
What makes MCP valuable isn't any single integration. It's the consistent interface for connecting agents to tools. Adding a new data source or distribution channel means configuring a new MCP server, not rewriting agent logic. Wiring in Napkin AI for visual summaries took about 20 minutes.
The alternative is custom API integrations for every tool. Each with its own auth, its own error handling, its own data format. MCP standardizes that. Boring infrastructure work, but boring infrastructure work is what makes systems maintainable at scale.
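For a sense of what "configuring a new MCP server, not rewriting agent logic" looks like: MCP clients conventionally declare servers in a JSON config of roughly this shape. The server names, package names, and env keys below are placeholders, and the exact keys your runtime expects may differ.

```json
{
  "mcpServers": {
    "ga4": {
      "command": "npx",
      "args": ["-y", "example-ga4-mcp-server"],
      "env": { "GA4_PROPERTY_ID": "<your-property-id>" }
    },
    "postiz": {
      "command": "npx",
      "args": ["-y", "example-postiz-mcp-server"],
      "env": { "POSTIZ_API_KEY": "<secret>" }
    }
  }
}
```

Adding a channel is one more entry in this map; no agent prompt or pipeline code changes.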
What breaks
I want to be honest about what doesn't work well yet.
Error propagation is the biggest headache. When research produces a bad briefing, every downstream agent produces bad output. The validator catches some of it, not all. I'm still working on better validation at each handoff point.
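One direction for "better validation at each handoff point" is a minimal contract per handoff: each agent's output must carry certain fields before a downstream task is created, so a bad briefing fails fast instead of poisoning everything after it. The field names here are illustrative.

```python
# Per-handoff contracts: a payload missing required fields is rejected
# at the handoff instead of propagating downstream. Illustrative fields.
HANDOFF_CONTRACTS = {
    "briefing": {"topic", "sources", "key_findings"},
    "article": {"title", "body", "claims"},
}

def check_handoff(kind: str, payload: dict) -> list[str]:
    """Return the required fields the payload is missing."""
    return sorted(HANDOFF_CONTRACTS[kind] - payload.keys())

bad_briefing = {"topic": "rental trends"}   # no sources, no findings
missing = check_handoff("briefing", bad_briefing)
```

Structural checks like this won't catch a briefing that's wrong but well-formed; they do catch the empty or truncated ones cheaply.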
Cost adds up faster than you'd expect. Nine agents making multiple LLM calls, some spawning Claude Code sessions. A full content cycle isn't cheap. I've had to be deliberate about which agents get Opus and which can work with Sonnet or Haiku for simpler tasks.
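The model-tiering decision can be as simple as a routing map: default to the cheapest model, escalate only where judgment quality pays for itself. The role-to-model assignments below are illustrative, not the system's actual configuration.

```python
# Illustrative cost-tiering map from agent role to model tier.
MODEL_TIERS = {
    "director": "opus",       # strategy and measurement need the strongest model
    "researcher": "sonnet",
    "blog_writer": "sonnet",
    "validator": "sonnet",
    "social_post_generator": "haiku",
    "publisher": "haiku",     # mostly mechanical: format and push
}

def model_for(role: str) -> str:
    # Default cheap; anything unlisted gets the smallest model.
    return MODEL_TIERS.get(role, "haiku")
```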
Debugging is a slog. When the final output is wrong, figuring out where in the pipeline it went wrong means reading through task board history, agent-to-agent messages, and intermediate outputs. I built logging for it, but it's still more detective work than I'd like.
Timing across cron cycles trips you up. If an agent is waiting for a human approval that takes three hours, the next cycle has to pick up where it left off rather than starting fresh. Stateful orchestration across time boundaries sounds simple. It isn't.
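The resume-don't-restart requirement means pipeline state has to outlive a single cycle. A file-backed sketch of the idea, with the storage format and field names as assumptions:

```python
import json
import os
import tempfile

# Sketch of resumable orchestration across cron cycles: pending work is
# persisted, and each cycle resumes in-flight tasks instead of starting fresh.
STATE_FILE = os.path.join(tempfile.gettempdir(), "pipeline_state.json")

def save_state(tasks: list[dict]) -> None:
    with open(STATE_FILE, "w") as f:
        json.dump(tasks, f)

def next_cycle() -> list[dict]:
    """Load prior state; anything still awaiting approval is resumed."""
    try:
        with open(STATE_FILE) as f:
            tasks = json.load(f)
    except FileNotFoundError:
        tasks = []
    return [t for t in tasks if t["status"] == "awaiting_approval"]

save_state([
    {"id": 1, "status": "published"},
    {"id": 2, "status": "awaiting_approval"},   # human hasn't responded yet
])
resumed = next_cycle()
```

The hard parts the sketch hides are exactly the ones the paragraph above names: concurrent cycles touching the same file, and deciding when a stalled approval should expire rather than resume.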
What I'd do differently
I'd spend more time on the task board schema upfront. How tasks are described, what metadata they carry, and how agents query for work turned out to be the most consequential design decision. Getting it wrong meant agents picked up tasks they couldn't handle or missed ones they should have grabbed.
I'd invest earlier in observability. Being able to trace a piece of content from initial research through to published post and performance data, with clear links between each step, would have saved weeks of debugging.
And I'd start with three agents, not nine. The full pipeline is the right end state, but building incrementally and proving each link works before adding the next would have been faster than trying to coordinate the whole fleet from day one.
The enterprise angle
Multi-agent orchestration isn't about replacing people. This 9-agent system does the work that would take a small marketing team weeks. But I still review everything. I still make strategic decisions. I still catch things the agents miss.
What it does is let one architect operate at the scale of a team. For enterprise organizations, that means you can prototype content operations, automate repetitive workflows, and test market approaches at a speed that traditional hiring and outsourcing can't match. The patterns here apply beyond marketing. Any workflow with distinct roles, quality gates, and measurable outcomes is a candidate for this kind of architecture.
If you're thinking about building something similar, start with a real workflow you understand deeply. Don't start with agents and go looking for problems. Start with the problem and figure out which agents you need to solve it.


