Separation of concerns in AI agent teams: lessons from multi-disciplinary delivery (and where they break)
We've been solving the "too complex for one person" problem for decades in program delivery. Now some team members are language models — and the coordination overhead is measured in tokens and meetings.
We've been here before.
Not with AI agents specifically, but with the underlying problem: work that's too complex for a single person or discipline to carry end-to-end. When digital programs grew beyond what one PM or tech lead could reasonably own, we invented cross-functional squads, domain teams, and program structures. Roles, responsibilities, and Conway's Law quietly shaped how work flowed.
Now we're doing it again. Some "team members" are language models, some are humans, and the coordination overhead is measured in tokens and meetings.
As someone running multi-disciplinary programs for enterprise clients — and more recently experimenting hands-on with agent teams in my own side projects — I find the parallels to modern delivery org design stronger than the pure software analogy. But the differences are where it gets interesting.
Separation of concerns as role and decision design
In software, separation of concerns answers "who is responsible for what?" at the code level. In programs, it's the same question at the level of roles, decisions, and outcomes.
On any serious initiative, you don't want your UX designer carrying data migration risk, your solution architect quietly becoming the product owner, or your QA function owning scope decisions because they're the only ones who see the whole thing at the end.
We fix that with domains, clear mandates, and operating models. The same thinking applies — powerfully — to AI agent teams.
In my own experiments, this clicked once I stopped treating the agent setup as an abstract architecture problem and started using it for my personal projects: things like CMS migrations and general development workflows.
At first, I did what most people do: I built a single, over-qualified "orchestrator" agent and asked it to do everything end-to-end. It could technically handle discovery, transformation, and validation, but debugging it was painful and failures were opaque.
When I split that work into a small team of agents, things got much clearer.

A Discovery agent that only scans and inventories content — it's the research/BA function: what do we actually have? How messy is it?

A Transformation agent that only reshapes content into the target structure or pattern — it plays the solution/design role: how should this look on the new platform?

A Validation agent that only checks against rules, guidelines, and constraints — it's your QA/governance function: does this meet standards?
Each agent has a clear mandate (what decisions it's allowed to make), a bounded context (what it needs to know, and what it explicitly doesn't), and an interface contract (what it consumes and produces).
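As a minimal sketch of what those mandates and interface contracts can look like in code — all class and field names here are illustrative, not a real framework — each agent exposes a typed output that the next agent consumes:

```python
# Hypothetical three-agent split with explicit interface contracts.
# The dataclasses ARE the contract: what each role consumes and produces.
from dataclasses import dataclass, field

@dataclass
class InventoryItem:            # Discovery's output contract
    source_id: str
    content_type: str
    issues: list[str] = field(default_factory=list)

@dataclass
class TransformedItem:          # Transformation's output contract
    source_id: str
    target_schema: str
    body: dict

@dataclass
class ValidationResult:         # Validation's output contract
    source_id: str
    passed: bool
    violations: list[str]

class DiscoveryAgent:
    """Mandate: inventory content. Knows the source system, nothing else."""
    def run(self, raw_records: list[dict]) -> list[InventoryItem]:
        return [InventoryItem(r["id"], r.get("type", "unknown")) for r in raw_records]

class ValidationAgent:
    """Mandate: check rules. Knows the rulebook, not the source system."""
    def __init__(self, rules: dict):
        self.rules = rules  # rule name -> check function
    def run(self, item: TransformedItem) -> ValidationResult:
        violations = [name for name, check in self.rules.items() if not check(item)]
        return ValidationResult(item.source_id, not violations, violations)
```

The point is that the contracts stay stable even if the implementation behind any one agent changes from a script to a model call.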
This matches how we design high-performing multi-disciplinary teams: small, autonomous groups with end-to-end responsibility for a slice of the program, but clear role boundaries inside that group.
The agent-team equivalent of a "two-pizza team" is a three-to-five agent squad, embedded into a human team, with clearly scoped responsibilities and a thin orchestration layer above it.
The benefits are the same as they've always been in program design.

Debuggability — when something goes wrong, you know where to look. If migrated content violates brand rules, you start with the Validation agent and its human overseer, not with an opaque 40,000-token context dump from a single mega-agent.

Replaceability — you can change how Transformation works without touching Discovery or Validation. The role and interfaces stay stable; the implementation evolves.

Parallelism — different concerns can move in parallel. Discovery can scan ahead while Transformation and Validation work through earlier batches.

Specialization — a focused Validation agent with the right guardrails and data is more reliable than a generalist "do everything" agent trying to remember brand rules, schemas, and copy tone at the same time. That's why we hire specialists in the first place.
So far, this is familiar: treat agents as team members with clear roles. Where it gets tricky is everything that normally lives in the white space between roles.
Context is not free (why "context as code" matters)
In human teams, context is cheap.
Program managers, tech leads, and designers trade context constantly: side conversations, Slack threads, tribal knowledge. People remember the last steering decision, the stakeholder who changed their mind, the hidden constraint that never made it into the Confluence page.
Agent teams don't get any of that for free.

Every piece of context a downstream agent needs has to be explicitly passed to it or made retrievable. There is no ambient knowledge, no implicit institutional memory. If your Discovery agent produces an inventory that isn't structured the way the Transformation agent needs it, you've created a silent failure that only surfaces as "odd content" or rework much later.
This is where Context as Code becomes critical. Treat organizational knowledge — business rules, editorial standards, design systems, migration logic, schema conventions — as first-class, versioned artifacts. Store them in structured, machine-readable forms (rules, schemas, config, knowledge bases), not just in PowerPoints and prompts. Make them accessible through stable interfaces that both humans and agents use.
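One minimal sketch of what that looks like in practice — the rule IDs, patterns, and thresholds below are made-up examples, not a real rulebook — is editorial rules stored as versioned, machine-readable data that both a human CLI and an agent can call:

```python
# Hypothetical "context as code": brand rules as a versioned data structure
# instead of prose in a slide deck. Rules and version are illustrative.
import re

BRAND_RULES = {
    "version": "2024.3",
    "rules": [
        {"id": "no-legacy-name", "pattern": r"\bOldBrand\b",
         "message": "Use the current brand name."},
        {"id": "max-title-length", "max_title_chars": 70,
         "message": "Titles must fit the new template."},
    ],
}

def check(title: str, body: str, rules: dict = BRAND_RULES) -> list[str]:
    """Return violated rule IDs; humans and agents call the same function."""
    violations = []
    for rule in rules["rules"]:
        if "pattern" in rule and re.search(rule["pattern"], body):
            violations.append(rule["id"])
        if "max_title_chars" in rule and len(title) > rule["max_title_chars"]:
            violations.append(rule["id"])
    return violations
```

Because the rules are data with a version field, a change to brand guidance becomes a reviewable diff rather than a rumor.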
In a human-only program, you can get away with "ask Maria, she's been here for years" as your context strategy. In a mixed human/agent team, that path doesn't scale. The program manager's traditional role as "glue" shifts from holding it in their head to making that glue explicit and operational.
Context as Code is, in practice, the institutional memory of your program. It's what lets you plug different agents — or humans — into a workflow without losing the plot every time.
Orchestration as governance, not just routing
In classic distributed systems, we built patterns — message queues, sagas, compensating transactions — because coordination is genuinely hard. In programs, we do the same thing with ceremonies and governance: SteerCos and release boards, Definition of Ready/Done, RACI matrices and approval workflows.
Agent orchestration sits at that intersection. It's not just a routing problem ("send this task to that agent"). It's a governance problem: who checks that work is good enough to move on, and based on what criteria?
The orchestration role in an agent team is essentially a Program Orchestrator. It decides which agent or human should handle which part of the work, ensures each handoff comes with the context needed to succeed, evaluates outputs against quality thresholds before moving work downstream, and escalates ambiguous or risky cases to humans.
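A minimal orchestration-as-governance loop might look like the sketch below — the stage names, the evaluate() callback, and the 0.8 threshold are all illustrative assumptions, not a prescribed pattern:

```python
# Hypothetical Program Orchestrator: route work through stages and gate
# each handoff on a quality score, escalating to a human below threshold.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Task:
    payload: Any
    trace: list = field(default_factory=list)  # who did what, with what score

def orchestrate(task: Task,
                agents: dict[str, Callable],
                evaluate: Callable[[str, Any], float],
                threshold: float = 0.8) -> dict:
    """Move work through the pipeline; each handoff must clear the bar."""
    for stage in ("discover", "transform", "validate"):
        output = agents[stage](task.payload)
        score = evaluate(stage, output)            # quality check IN the flow
        task.trace.append({"stage": stage, "score": score})
        if score < threshold:
            return {"status": "escalated", "stage": stage, "trace": task.trace}
        task.payload = output                      # handoff to the next role
    return {"status": "done", "result": task.payload, "trace": task.trace}
```

Note that the trace is part of the contract: every decision is attributable to a stage and a score, which is what makes the system auditable later.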
That last point matters more than most diagrams suggest. In a typical system, a broken component fails loudly. In an agent system, failure often looks like confident, plausible output that's subtly wrong — at scale.
So orchestration has to embed quality checks and evaluation into the flow, not bolt them on at the end. Use "critic" or "reviewer" patterns where necessary — sometimes agent-driven, sometimes human-driven. Provide traceability: if a content piece is wrong, you can see which agent made which decision with which context.
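The critic/reviewer pattern itself can be sketched as a thin wrapper — the verdict shape and retry count here are assumptions for illustration; the reviewer callback could just as well be a human in a queue:

```python
# Hypothetical critic/reviewer wrapper: no output moves downstream
# until a second role approves it, or the work escalates.
from typing import Any, Callable

def with_reviewer(producer: Callable,
                  reviewer: Callable,
                  max_attempts: int = 3) -> Callable:
    """Wrap a producer so every draft passes review or escalates."""
    def run(task: Any) -> dict:
        for attempt in range(1, max_attempts + 1):
            draft = producer(task)
            verdict = reviewer(task, draft)   # e.g. {"approved": bool}
            if verdict.get("approved"):
                return {"result": draft, "attempts": attempt}
        return {"result": None, "escalate": True, "attempts": max_attempts}
    return run
```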
From a program management perspective, this is just an extension of the usual job: design the operating model so work moves smoothly, quality issues surface early, and responsibility is clear. The difference is that some of the "people" in your RACI are now agents.
Autonomy, alignment, and the "brilliant generalist" trap
In human teams, autonomy and alignment are a constant balancing act. You give a team enough autonomy to move fast, but enough alignment to stay on strategy. You expect people to exercise judgment in edge cases, guided by shared values and context. A lot of program leadership is about tuning that balance over time.
Agents are more literal.

Give an agent too narrow a scope and it will optimize its local goal in ways that violate the spirit of the broader program. Give it too broad a scope and you're back to a single, opaque "everything agent" — the same anti-pattern as dumping all decisions on one overqualified human.
Modern models are "brilliant generalists." They can, in principle, do a bit of everything: discovery, drafting, transforming, validating. That creates a constant pressure to consolidate. "Why not just have one strong agent handle the whole pipeline?" In org design, we've seen this movie before. Teams that rely on one heroic lead PM or architect to "just take care of it all" move quickly at first and become a bottleneck later. The same happens with single-agent systems: they're easy to start and hard to evolve.
The core design question becomes: for each agent, what decisions do we want it to own, what decisions can it recommend, and what decisions must be escalated to humans? That's not a prompt-engineering problem. It's a product/program governance problem.
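One way to make that governance concrete is to write decision rights down as data rather than prose — the roles and decision tags below are hypothetical examples, not a standard taxonomy:

```python
# Hypothetical decision-rights table: owns / recommends / escalates,
# per role. Roles and tags are illustrative.
DECISION_RIGHTS = {
    "transformation_agent": {
        "owns":       {"field_mapping", "formatting"},
        "recommends": {"new_content_pattern"},
        "escalates":  {"legal_copy", "pricing"},
    },
    "validation_agent": {
        "owns":       {"rule_check"},
        "recommends": {"rule_change"},
        "escalates":  {"ambiguous_brand_usage"},
    },
}

def decision_mode(role: str, decision: str) -> str:
    """Return 'owns', 'recommends', or 'escalates' for a given decision."""
    rights = DECISION_RIGHTS.get(role, {})
    for mode in ("owns", "recommends", "escalates"):
        if decision in rights.get(mode, set()):
            return mode
    return "escalates"  # default-safe: anything unlisted goes to a human
```

The default-to-escalate branch is the important design choice: an agent's autonomy is an allowlist, not an assumption.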
You might be drifting into an "agent monolith" if most work routes through one central agent "because it knows the most"; if you can't describe what each agent or human role owns in one clear sentence; if every quality issue leads to "make the model smarter" instead of "fix the role or context design"; or if your main lever is increasing the context window, not improving the operating model.
The fix looks familiar: clarify roles, narrow scopes, push shared knowledge into Context as Code, and deliberately decide where humans stay in the loop.
The structural hurdles worth taking seriously
Even with clean roles and good governance, there are properties of agent teams that don't map cleanly to traditional program management.
Non-determinism. The same input can produce different outputs on different runs. You don't get step-by-step reproducibility in the way you do with a human following a checklist. That pushes you toward statistical validation and acceptance ranges rather than exact, deterministic checks.
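A statistical acceptance gate can be sketched in a few lines — the sample size and pass-rate threshold below are illustrative assumptions you'd tune per program:

```python
# Hypothetical statistical acceptance: sample a batch of outputs and
# require a pass rate, instead of expecting byte-identical runs.
import random

def acceptance_check(items, validate,
                     sample_size: int = 50,
                     min_pass_rate: float = 0.95,
                     seed: int = 0) -> dict:
    """Sample outputs and check the pass rate against a threshold."""
    rng = random.Random(seed)  # seeded so the audit itself is repeatable
    sample = rng.sample(items, min(sample_size, len(items)))
    passed = sum(1 for item in sample if validate(item))
    rate = passed / len(sample)
    return {"accepted": rate >= min_pass_rate, "pass_rate": rate, "n": len(sample)}
```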
Hidden failure modes. Humans are usually aware when they're confused and can ask for help. Agents fail silently and confidently. You need deliberate mechanisms — spot checks, evaluation runs, human review on critical paths — to detect this.
Latency and cost as architectural constraints. Each additional agent in a pipeline adds latency and token cost. The cleanest decomposition from a separation-of-concerns perspective might be economically or operationally unviable at scale. Program managers will feel this in their budgets and SLAs.
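A back-of-envelope model makes the constraint tangible — every number below (tokens per agent, price, latency) is a made-up illustrative figure, not a real vendor rate:

```python
# Hypothetical cost/latency model for a sequential agent pipeline.
# All defaults are illustrative assumptions, not real pricing.
def pipeline_footprint(n_agents: int,
                       tokens_per_agent: int = 8_000,
                       usd_per_1k_tokens: float = 0.01,
                       seconds_per_agent: float = 4.0,
                       items: int = 10_000) -> dict:
    """Estimate token cost and per-item latency as agents are added."""
    tokens_per_item = n_agents * tokens_per_agent
    return {
        "usd_total": items * tokens_per_item / 1_000 * usd_per_1k_tokens,
        "seconds_per_item": n_agents * seconds_per_agent,
    }
```

Under these toy numbers, going from one agent to three triples both the token bill and the per-item latency — which is exactly the trade-off a clean decomposition has to earn back in quality and debuggability.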
Organizational readiness. Most governance, sign-off, and risk structures assume human accountability. A "team" that's part human, part agent raises questions: who signs off? How much autonomy do we give an agent? When something goes wrong, is it a tooling issue, a process issue, or an accountability issue?
These are not arguments against agent teams. They're constraints program and product leaders need to design with, not discover late.
What this means for program and product leaders
If you're responsible for outcomes in a world where some team members are agents, software engineering analogies are a helpful starting point — but they're not enough. The real leverage is in treating this as an operating model and governance problem.
Define a mixed role taxonomy. Start with roles, not models. For each major concern in your program — discovery, design, migration, validation, reporting — decide which parts are agent-first, which are human-first, and which are shared. Write role definitions that include both humans and agents where appropriate. Make decision rights explicit: what each role can decide alone, where it needs a counterpart, and when it escalates.
Treat Context as Code as a core asset. Stop thinking of knowledge bases and rule stores as "supporting documentation." Put business rules, content guidelines, schemas, and process constraints into structured, versioned repositories. Make sure both humans and agents rely on the same sources of truth. Plan for ownership: who maintains this over time, and how change control works.
Design orchestration as governance. Don't reduce orchestration to technical routing. Define how work flows between human and agent roles, including review and escalation paths. Embed evaluation into the flow: what gets auto-approved, what gets sampled, what always gets human review. Align ceremonies — stand-ups, demos, steering — with this reality: agents produce artifacts, humans review and steer.
Use separation of concerns to protect autonomy and accountability. Avoid both extremes: one overloaded "super agent" and a chaotic swarm of agents with overlapping scopes. Keep agent scopes tight and aligned with recognizable roles. Give each agent clear success criteria tied to program outcomes, not just local metrics. Be deliberate about where humans stay in the loop and why.
The field will keep moving fast at the tooling level. New orchestrators, new patterns, new models. The underlying design problems, however — how to decompose complex work, how to coordinate independent roles, how to build systems that remain debuggable and governable at scale — are very old.
We already know a lot about how to solve those problems in multi-disciplinary programs. The opportunity is to apply that institutional knowledge deliberately to mixed human/agent teams, before we end up with agent monoliths and governance gaps we'll spend the next decade trying to untangle.