When I split my single-agent setup into a team — a Lead, a Builder, and a Validator — the first reaction from people was about cost. Three context windows instead of one. That's 3x the tokens, right?

It's not. In practice, the team setup costs about the same as, or less than, the single-agent approach. Here's why.

Where single-agent tokens actually go

In my single-agent CMS migration setup, one Opus model was doing everything: writing AEM components, holding project context, reasoning about architecture, and producing output I'd then review. When I found a discrepancy — and I found a lot of them — the correction cycle looked like this:

I describe the problem. Claude apologizes and reasons about what went wrong. Claude rewrites the component. I check again. Maybe it's fixed, maybe we go another round.

Each round burns Opus-priced tokens on implementation work. But the hidden cost is worse: the correction conversation stays in context. The next component Claude builds is now competing for attention with the debugging session from the last one. Context gets noisier, output quality drops, more corrections follow. It compounds.

The redo cycle isn't just expensive in tokens — it's expensive in context pollution.

The model-per-role split

The agent team uses different models for different jobs:

The Lead runs on Opus. It coordinates work, maintains project files, triages findings, makes architectural calls. This is reasoning-heavy work where model quality matters. But the Lead never writes code — so it's not burning Opus tokens on boilerplate HTL templates or Maven configuration.

The Builder runs on Sonnet. It writes all the code. Sonnet handles implementation work well and costs a fraction of Opus per token. The Builder gets atomic, well-scoped tasks from the Lead: "implement this specific component per the mapping spec." Focused input, focused output. It doesn't need Opus-level reasoning because the hard thinking already happened at the Lead level.

The Validator runs on Sonnet. It compares output against the original site. This is pattern-matching and structured comparison — checking whether what was built matches what should have been built. Again, Sonnet handles this well.

The expensive model does the thinking. The cheaper models do the work.
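The role-to-model assignment above can be sketched as a simple routing table. This is an illustrative sketch, not a real framework API; the role names and model labels are placeholders:

```python
# Minimal sketch of model-per-role routing. Role names and model labels
# are illustrative assumptions, not real API identifiers.
ROLE_MODELS = {
    "lead": "opus",        # coordination and architectural reasoning
    "builder": "sonnet",   # all implementation work
    "validator": "sonnet", # structured comparison against the source
}

def model_for(role: str) -> str:
    """Look up which model a given role runs on."""
    return ROLE_MODELS[role]
```

The point of making this a table rather than a hardcoded choice is that the economics can change: if a cheaper model starts handling validation well enough, only one line moves.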

Early catches kill redo cycles

Single agent redo cycles vs. agent team early catches

The bigger cost saving isn't the model split — it's what the Validator does to the feedback loop.

In the single-agent setup, I was the validator. I'd review output hours or days after it was built. By then, Claude had moved on to other components. Fixing an issue meant context-switching back, re-loading the relevant decisions, and often reworking dependencies that had been built on top of the flawed component.

With the agent team, the Validator checks output immediately after the Builder finishes. Findings go straight to the Builder while the component is still fresh in its context window. The fix is a targeted adjustment, not a rework.
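That tight loop can be sketched in a few lines. The `build`, `check`, and `fix` callables here are hypothetical stand-ins for the actual Builder and Validator calls:

```python
def build_and_validate(task, build, check, fix, max_rounds=3):
    """Build a component, then validate immediately, so any fix is a
    targeted adjustment while the work is still in context."""
    output = build(task)
    for _ in range(max_rounds):
        findings = check(output)            # Validator: compare against source
        if not findings:
            return output                   # clean: hand back to the Lead
        output = fix(output, findings)      # Builder: targeted fix, not rework
    return output

# Toy usage: one finding on the first pass, clean on the second.
result = build_and_validate(
    "hero-banner",
    build=lambda t: {"task": t, "fixed": False},
    check=lambda out: [] if out["fixed"] else ["wrong image path"],
    fix=lambda out, findings: {**out, "fixed": True},
)
```

The structural property that matters is that `fix` runs immediately after `check`, inside the same loop, rather than hours later in a human review pass.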

It's the same principle as catching bugs early in a development pipeline. A validation finding at the component level costs a few hundred Sonnet tokens to fix. A "this whole page looks wrong" from the stakeholder three components later means unwinding work across multiple files, re-reading decisions, and reconstructing context — all on Opus in the single-agent case.

The math, roughly

I haven't done rigorous cost accounting, but the directional picture is clear:

Single agent: All tokens are Opus-priced. Redo cycles add 20-40% overhead on implementation tokens. Context pollution causes quality degradation that triggers more redos. Human review is serialized, so problems accumulate before being caught.

Agent team: Lead tokens are Opus-priced but low volume — coordination is concise. Builder and Validator tokens are Sonnet-priced and high volume — but Sonnet is roughly 5x cheaper per token than Opus at list prices. Redo cycles are shorter and caught earlier. Context stays clean per agent, so quality holds.

The total token count is probably higher with three agents. The total cost is probably lower, because the expensive tokens shifted from implementation (wasteful) to coordination (high-leverage), and the cheap tokens handle the volume.
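Under assumed numbers — the token volumes, redo overheads, and per-token prices below are all placeholders, not measurements — the directional comparison works out like this:

```python
# All figures are illustrative assumptions, not measured costs.
OPUS_PRICE = 15.0    # $ per million tokens (assumed list price)
SONNET_PRICE = 3.0   # $ per million tokens (assumed list price)

impl_tokens = 10.0        # millions of implementation tokens
redo_overhead = 0.30      # single-agent redo cycles (~30%)
lead_tokens = 1.0         # Lead coordination volume (low, Opus-priced)
validator_tokens = 3.0    # Validator comparison volume (Sonnet-priced)

single_agent_cost = impl_tokens * (1 + redo_overhead) * OPUS_PRICE
team_cost = (lead_tokens * OPUS_PRICE
             + (impl_tokens * 1.2 + validator_tokens) * SONNET_PRICE)
             # Builder redo cycles assumed shorter (~20%)

single_tokens = impl_tokens * (1 + redo_overhead)
team_tokens = lead_tokens + impl_tokens * 1.2 + validator_tokens
# More total tokens on the team side, but a much smaller bill.
```

With these placeholder numbers the team processes more tokens overall yet costs roughly a third as much, which matches the directional claim: the expensive tokens shrink to coordination volume while the cheap tokens absorb the bulk.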

The broader pattern

This isn't specific to CMS migrations. Any AI workflow where a single agent cycles between building and self-correcting has the same dynamic. The model is burning premium tokens on work that a cheaper model could handle, and the correction cycles are degrading the very context that's supposed to keep quality high.

Splitting into roles with model-per-role assignment is essentially the same insight as microservices over monoliths, applied to AI token economics: use the right tool for each job instead of using the most expensive tool for every job.

The practical threshold is simpler than it sounds. If you're running a single agent and spending significant time reviewing and correcting its output, you're paying a premium — in tokens, in context quality, and in your own time — for work that a cheaper model in a validator role could handle.