When Anthropic shipped the memory tool for Claude — first as a beta in September 2025, then as auto-memory in Claude Code in February 2026, and finally free for all users in March — I had a strange moment of recognition. The system they'd built was a file-based context layer where agents write structured knowledge to a /memories directory, tag it with frontmatter, and reload selectively at session start. That's the same architecture I'd been running by hand with five markdown files and a starter prompt.

I wrote about this overlap back in February, arguing that my manual system — CLAUDE.md, PROGRESS.md, DECISIONS.md, TECHNICAL.md, HANDOVER.md — maps onto the hierarchical memory patterns described in agent research. Long-term knowledge, episodic memory, working memory. Same concepts, different packaging. What I hadn't done was sit down and actually compare the two approaches in detail.

So here's the comparison. Not "which is better" — that depends on what you're doing — but where each approach has structural advantages, and what the convergence tells us about the problem itself.

The architecture side by side

My system uses five files with distinct roles and update cadences. CLAUDE.md is the project constitution — rarely changes, loaded every session, defines identity and constraints. DECISIONS.md records architectural choices with rationale. PROGRESS.md tracks what's done and what's next. TECHNICAL.md captures implementation details. HANDOVER.md is working memory — overwritten at the end of each session, consumed at the start of the next.

Each file has explicit read/write rules. CLAUDE.md gets read first, always. HANDOVER.md gets written last, always. The others get updated when their content changes. The AI knows which file to consult for which type of question.

Claude's built-in memory uses a MEMORY.md index file that points to individual memory files, each tagged with frontmatter: name, description, type. The types — user, feedback, project, reference — map to different categories of persistent knowledge. Claude reads MEMORY.md at session start, follows links to relevant memories, and writes new ones when it learns something worth keeping.
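To make the shape concrete, here's an illustrative sketch of what that index and a linked memory file might look like. The frontmatter field names (name, description, type) come from the description above; the index layout and file names are assumptions, not Anthropic's exact format:

```markdown
<!-- MEMORY.md: index read at session start (layout is hypothetical) -->
- [deploy-preferences](memories/deploy-preferences.md) — release workflow preferences
- [api-design](memories/api-design.md) — project API conventions

<!-- memories/deploy-preferences.md -->
---
name: deploy-preferences
description: How the user likes releases handled
type: user
---
Prefers squash merges and tagged releases.
```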

The structural parallel is obvious. Both systems separate memory into types. Both use an entry point file that gets loaded every session. Both rely on the model to decide what's worth writing and what's worth retrieving. The difference is in who drives the process and how much structure they impose.

Where the built-in memory wins

Automation is the obvious advantage. In my system, I have to explicitly prompt the AI to update files — or build that prompting into a starter template. If I forget to ask for a handover note at the end of a session, the context is gone. If I don't re-anchor on DECISIONS.md before a big architectural choice, the AI might contradict a logged decision.

Claude's auto-memory handles this without prompting. It notices when you've shared something worth remembering — your role, a preference, a project detail — and writes it to a file. It prunes outdated memories. It decides what's relevant to load for a given conversation. The cognitive overhead drops to near zero.

The type system is also cleaner in some ways. My five files evolved organically — TECHNICAL.md and DECISIONS.md overlap more than I'd like. Claude's four types (user, feedback, project, reference) are crisper categories with clear guidance on what goes where and how each should be structured. The frontmatter schema means memories are machine-parseable, not just human-readable prose.
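The machine-parseable point is easy to demonstrate. A minimal sketch, assuming memories use simple `key: value` frontmatter between `---` delimiters (the real schema may be richer):

```python
# Split a memory file into (frontmatter dict, body text).
# Assumes flat "key: value" frontmatter between "---" fences;
# the actual schema Claude uses may differ.

def parse_memory(text: str) -> tuple[dict, str]:
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: whole file is body
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, ""  # unterminated frontmatter: no body

sample = """---
name: deploy-preferences
description: How the user likes releases handled
type: user
---
Prefers squash merges and tagged releases.
"""

meta, body = parse_memory(sample)
print(meta["type"])  # user
```

Once memories parse into structured records like this, filtering by type, pruning by age, or building an index becomes a few lines of code rather than prose archaeology.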

And for teams, the built-in system scales better. My approach requires every team member to understand and follow the same file conventions. The built-in memory just works — no onboarding, no discipline required.

Where manual files still win

The biggest advantage of my system is separation of concerns. I have five files because different types of context have fundamentally different lifespans and access patterns. CLAUDE.md is constitutional — it defines what the project is and what constraints apply. It changes maybe once a month. HANDOVER.md is ephemeral — it gets rewritten every session. Mixing those into a flat directory of memory files loses something important: the distinction between "what this project is" and "what I was doing five minutes ago."

In practice, this means my system degrades more gracefully under pressure. When a session gets long and context is tight, the AI knows to prioritize CLAUDE.md over PROGRESS.md because the file hierarchy encodes importance. A flat memory directory with equal-weight files doesn't have that built-in triage.
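That triage can be sketched in code. The file names are from this post; the priority ordering after CLAUDE.md, the token budget, and the four-characters-per-token estimate are illustrative assumptions:

```python
# Load context files in constitutional-to-ephemeral order, stopping
# when a token budget runs out. Because priority is explicit, under
# pressure CLAUDE.md survives and lower-priority files get dropped.

# Most constitutional first (ordering below CLAUDE.md is an assumption).
PRIORITY = ["CLAUDE.md", "DECISIONS.md", "HANDOVER.md", "TECHNICAL.md", "PROGRESS.md"]

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic: ~4 chars per token

def load_context(files: dict[str, str], budget: int) -> list[str]:
    """Return names of the files that fit, highest priority first."""
    loaded, used = [], 0
    for name in PRIORITY:
        if name not in files:
            continue
        cost = estimate_tokens(files[name])
        if used + cost > budget:
            break  # hierarchy encodes importance: drop the rest
        loaded.append(name)
        used += cost
    return loaded

files = {
    "CLAUDE.md": "x" * 400,     # ~100 tokens
    "DECISIONS.md": "x" * 800,  # ~200 tokens
    "PROGRESS.md": "x" * 2000,  # ~500 tokens
}
print(load_context(files, budget=350))  # ['CLAUDE.md', 'DECISIONS.md']
```

A flat directory of equal-weight memory files has no PRIORITY list; the decision about what to drop under pressure happens inside the model, invisibly.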

Visibility is the second advantage. I can open DECISIONS.md in any text editor and see exactly what the AI should know about my architectural choices. If the AI contradicts a logged decision, I know immediately whether the file was loaded or whether the model just didn't attend to it. That debuggability matters enormously when things go wrong.

The built-in memory is more opaque. You can inspect memory files, but the system decides what's relevant to load for a given session. If Claude forgets something that's in a memory file, you can't tell whether the file wasn't loaded, was loaded but not attended to, or was loaded and outweighed by something else. There's simply less surface available for debugging.

Cross-model portability is the third advantage. My .md files work with any model that can read files — Claude, Gemini, GPT, a local model. They're plain text with no vendor lock-in. Claude's memory tool is Claude-specific. If I switch to a different model for a task, my file-based context comes with me. The built-in memory stays behind.

And then there's the granularity of control. In my system, I decide exactly when context gets written, what format it takes, and what gets loaded when. I can have the AI generate a detailed handover for a complex session and a brief one for a simple session. I can restructure DECISIONS.md when it gets unwieldy. I can delete a section I know is outdated without worrying about cascading effects. The manual approach is more work, but the work is precisely the kind of context curation that produces better results.

Where they overlap — and what that means

The fact that both systems converged on the same basic architecture — typed files, an index, explicit read/write protocols, persistence outside the context window — says something about the problem itself. There aren't many ways to solve "the model forgets things between sessions." You externalize state. You structure it. You reload strategically. Whether you do that with hand-maintained markdown or an automated tool, the physics of the problem pushes you toward the same shape.

What's interesting is that Anthropic's engineering blog on effective context management describes a pattern called "structured note-taking" — agents maintain memory files during sessions and reload them on resumption. That's precisely what my starter prompt does. The terminology is different, the implementation is more polished, but the principle is one I was applying months before I read their documentation.

I don't say this to claim priority. The agent memory research that informed Anthropic's design predates my blog by years. The point is that practitioners doing serious work with AI tools will independently discover these patterns because the failure modes force you toward them. Context rots. Sessions end. Decisions get lost. You start writing things down. You structure what you write. You build protocols around when to read and when to write. And eventually you look up and realize you've built a memory architecture.

When to use which

For most people starting out with AI-assisted work, the built-in memory is the right choice. It works out of the box, requires no discipline, and handles the common cases well. If your needs are "remember my preferences and project context across sessions," the product feature does that.

For practitioners running complex, multi-session projects with strong architectural constraints — the kind of work where a forgotten decision in session twelve causes a cascade of errors in session fifteen — the manual approach is worth the overhead. The explicit structure, the debuggability, the separation of constitutional context from ephemeral state — these pay dividends that scale with project complexity.

The best answer, of course, is both. I use CLAUDE.md as a project constitution that I maintain by hand, and I let the auto-memory handle the accumulation of smaller insights across sessions. The manual files set the floor — the AI always knows the project's identity, constraints, and architecture. The automatic memory fills in the gaps — preferences I didn't think to document, patterns Claude noticed in my workflow, feedback I gave that should carry forward.

That layered approach — manual for the important stuff, automatic for everything else — is probably where most serious practitioners will end up. It's the same principle that runs through the context management series I've been writing all along: the AI is a brilliant collaborator with a specific kind of amnesia, and the best results come from working with that limitation deliberately rather than hoping the tooling will handle it all.

The tooling has gotten remarkably good. But "remarkably good" and "good enough to stop thinking about it" are still different things.