Bigger context, worse agents
I've been experimenting with feeding agents larger volumes of context. The pattern is steady: the bigger I make it, the worse it looks. Meanwhile, the rest of the field is racing the other way.
I've been experimenting with larger volumes of context — bigger memory stores, more documents, broader graphs, more sources stitched into a single agent's working memory. The bet was the usual one: more context, better answers.
What I'm finding is the opposite. The bigger I make it, the worse it looks.
Not always, and not catastrophically. But the pattern is steady enough that I've stopped ignoring it. Agents pulling from larger context surfaces miss things that smaller, focused setups catch. They mix up which constraints belong to which step. They draw confident connections between things that happen to sit close in the graph but don't actually relate. They give more authoritative-sounding answers, and more wrong ones.
Meanwhile, the rest of the field is moving the other way. Companies are commissioning enterprise context graphs that try to encode every entity, relationship, and interaction in the business. Vendors are pitching "complete context" platforms as the foundation everyone needs. The whole industry is racing toward bigger.
Most of them are likely about to discover what I keep discovering: at scale, context becomes a liability, not an asset.

What context rot actually feels like
I've written about context rot before at the individual-session level: the way conversations degrade as the window fills, the way models forget the early constraints when the back half of the window gets noisy. That was already a real effect in twenty-message conversations. It compounds badly when the underlying context source is also large.
The mechanics aren't mysterious. The window may be a million tokens. The model's effective attention isn't. Once you cross a certain density of partially-relevant information, every additional piece dilutes the signal of what actually matters for the current step. The graph that was supposed to give the agent richer reasoning surfaces ends up giving it more ways to be wrong.
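A toy way to see the dilution, offered as a sketch rather than a claim about how any particular model allocates attention: assume a single softmax over one relevant item and a pile of partially relevant ones. The scores (2.0 and 1.0) and the function name `relevant_weight` are made up for illustration.

```python
import math

def relevant_weight(n_distractors, relevant_score=2.0, distractor_score=1.0):
    """Softmax weight on the single relevant item when it competes
    with n_distractors partially relevant items (toy assumption)."""
    relevant = math.exp(relevant_score)
    noise = n_distractors * math.exp(distractor_score)
    return relevant / (relevant + noise)

# The relevant item's score never changes; only the crowd around it grows.
for n in (0, 10, 100, 1000):
    print(f"{n:>5} distractors -> weight on the relevant item: {relevant_weight(n):.4f}")
```

Nothing about the relevant item gets worse in this sketch. Its share shrinks purely because more weakly related material is competing for the same fixed budget.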
The "lost in the middle" papers were the early evidence of this. What I'm watching now is the production version: large retrieval surfaces, large graphs, large memory stores, all producing the same failure mode at a different scale. The agent doesn't come back and say it couldn't find the right thing. It confidently picks up the wrong-but-adjacent thing and proceeds.
To be fair, the strongest version of the opposing case isn't "stuff everything into the window." It's structured retrieval over a large store, returning targeted subgraphs to the agent. That's a real architecture, and a more interesting one. But it relocates the problem rather than solving it. The variable becomes retrieval quality, and most production retrievers over large heterogeneous stores are bad at exactly the kind of cross-domain queries the graph was supposed to enable in the first place. The graph doesn't fix the retrieval problem. It moves it.
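If retrieval quality is the variable, it helps to measure it directly. The sketch below computes precision and recall over the top k results for one query. The document names and the hand-labeled `relevant` set are hypothetical, and a real evaluation would need many labeled queries, especially the cross-domain ones.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Precision and recall over the top-k retrieved items.

    retrieved: ranked list of item ids returned by the retriever
    relevant:  set of item ids a human judged relevant to the query
    """
    top_k = retrieved[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical cross-domain query: "why was this customer's refund delayed?"
retrieved = ["billing_policy", "billing_faq", "refund_policy", "hr_handbook", "refund_log"]
relevant = {"refund_policy", "refund_log", "chargeback_rules"}

precision, recall = precision_recall_at_k(retrieved, relevant, k=3)
print(f"precision@3={precision:.2f} recall@3={recall:.2f}")
```

In this made-up case two of the three items handed to the agent are adjacent but wrong, which is exactly the material it tends to pick up and run with.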
Why the field keeps going wider
The instinct to go bigger is rational on paper. If your agent doesn't have the answer, give it more places to look. If it gets one query wrong, encode more relationships so it gets the next one right. Every individual addition feels like progress.
The trap is that progress on the demo isn't progress on the workflow. A wider context surface looks impressive in a single-query setup where you can pre-curate the question to fit what the graph happens to be good at. It looks much worse over a real day of work, where queries are messy and the agent has to traverse parts of the graph no one pre-curated for.
There's also a procurement gravity to it. "Complete enterprise context" is easier to put on a slide than "deliberately scoped context for specific workflows." One sounds like vision, the other like accounting. The slide wins the meeting, even though the accounting is what actually makes the system work.
Custodians, not collectors
The shift I keep coming back to — and the one I think the field is about to be forced into — is from collecting context to curating it. From volume to relevance. From substrate to scaffolding.
I framed this as custodianship of context in an earlier post, but I was thinking about it mostly as an organizational role. The more I work with larger graphs, the more I think custodianship is also the daily craft. Not a job title, but a posture: actively deciding what context an agent gets, in what order, at what scope, for what step. Pruning aggressively. Treating every additional source as a cost, not a gift.
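One way to make that posture concrete is to attach a scope, an order, and a cost to every source, then assemble context per step under a hard budget. This is a minimal sketch under my own assumptions, not a description of any particular framework: `Source`, `select_context`, the step names, and the token counts are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Source:
    name: str
    steps: frozenset   # workflow steps this source is scoped to
    priority: int      # lower values go into the context first
    tokens: int        # approximate cost of including it

def select_context(sources, step, budget):
    """Return the in-scope sources for `step`, in priority order,
    skipping any source that would push the total past `budget`."""
    eligible = sorted(
        (s for s in sources if step in s.steps),
        key=lambda s: s.priority,
    )
    chosen, used = [], 0
    for source in eligible:
        if used + source.tokens > budget:
            continue  # every source is a cost; this one does not fit
        chosen.append(source)
        used += source.tokens
    return chosen

SOURCES = [
    Source("refund_policy", frozenset({"draft_refund"}), 1, 400),
    Source("customer_thread", frozenset({"draft_refund", "triage"}), 2, 900),
    Source("full_crm_export", frozenset({"triage"}), 3, 20000),
    Source("style_guide", frozenset({"draft_refund"}), 3, 300),
]

picked = select_context(SOURCES, step="draft_refund", budget=1500)
print([s.name for s in picked])
```

The interesting part isn't the loop. It's that someone has to fill in `steps` and `priority` by hand, and that a source with no declared scope never reaches the agent at all.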
The agents I trust most right now aren't the ones with the biggest memory. They're the ones with the most disciplined memory — small, scoped, current, owned by someone who knows what's in there and why. The ones with sprawling context surfaces are louder, more impressive in a demo, and quietly less reliable when it matters.
Where this lands
If I'm reading the pattern correctly, the competitive advantage in agentic AI doesn't go to whoever builds the biggest graph. It goes to whoever gets best at deciding what to leave out.
That's an uncomfortable answer for the part of the industry currently betting on completeness as a moat. The bet assumes that more context will eventually translate into better behavior once retrieval and reasoning catch up. My experience so far points the other way: the gap between window capacity and useful attention isn't closing as fast as the window sizes would suggest. Models are getting better at sounding coherent across more text without necessarily integrating it.
The custodian's job isn't to feed the agent everything it might plausibly need. It's to feed it the smallest set of things that lets it do the next step well — and to keep doing that, deliberately, as the work evolves.
That's not a sexy pitch deck. It's what's been working.