Most of the failure modes I see in AI-assisted coding are familiar by now. The agent jumps to implementation without mapping what it's about to break. It writes code without writing tests. It starts each session with no memory of the last one. And when something goes wrong in review, there's no evidence trail — you have to re-investigate everything yourself.

The instinct, once you've been burned, is to wrap process around the agent. Phases. State machines. Per-phase output documents. Generators. A dozen or thirty slash commands. I've watched this happen in my own setup more than once: a clean prompt becomes a checklist becomes a workflow becomes a small private framework. The discipline is real. The ceremony eventually eats the productivity.

This post is the smallest version of that discipline I've found that still works for solo product development. I'm calling it Cairn — a stack of stones that marks a trail without dictating your route.

What thin discipline has to solve

Strip away the bureaucracy and there are five problems worth solving on every non-trivial change:

  • Discovery before code. Map blast radius. Check existing tests. Read whatever institutional knowledge already exists about the area. The agent can't know what it doesn't know — and neither can you, until something breaks in review.
  • Memory across sessions. Every prompt starts from zero unless you persist what you've learned. Quirks of a module, APIs that time out, regressions that keep coming back — all of that lives in your head if it lives anywhere.
  • Evidence over assertion. "Verified", "works", "looks good" are claims, not proofs. The cheapest way to keep an AI honest is to require it to show the command and the output before stating a conclusion.
  • A stop condition. Without one, the agent grinds. Two attempts on the same approach is enough; a third is usually a hunch dressed as effort.
  • A trace someone can read. Per-phase documents are paperwork; nobody reads them. The git log and the diff are the only audit trail with a real audience.

These five are inexpensive to enforce. Most of them cost a sentence in your project's CLAUDE.md. The expensive frameworks I've seen built around them charge a process tax on every change, regardless of whether the change is a typo or a feature. For solo work the math doesn't justify it.
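To make "a sentence each" concrete, here's roughly how the five rules might read as CLAUDE.md directives (the wording below is illustrative, not Cairn's exact text):

```markdown
- Before editing, grep for callers and read the existing tests covering the area.
- Persist anything durable you learn to the project's context files; conversational memory does not count.
- Never claim "verified" or "works" without showing the command and its output.
- Stop after two failed attempts on one approach and report what you tried.
- Keep the audit trail in commits and diffs, not in separate process documents.
```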

Cairn is the workflow layer, not the context layer

The first version of this framework I wrote tried to own everything — workflow protocol, knowledge store, the whole context substrate. That was a mistake. I already had a context substrate: the five-file architecture from my context-aware starter prompt (CLAUDE.md, PROGRESS.md, DECISIONS.md, TECHNICAL.md, HANDOVER.md). That system already solves what the agent knows. What I needed was a thin layer telling the agent how to use it on every workflow.

So Cairn became exactly that. It assumes the substrate exists (run the starter prompt to bootstrap if it doesn't) and bolts a workflow protocol on top:

| File | Purpose | Cadence |
| --- | --- | --- |
| CLAUDE.md | Project constitution — stakeholder, objectives, standards | Rarely (after discovery) |
| PROGRESS.md | Living build log — done / in progress / next / blocked | Every completed task |
| DECISIONS.md | ADR — every constraining technical choice with rationale | Every major decision |
| TECHNICAL.md | Implementation details — stack, contracts, data models | As architecture evolves |
| HANDOVER.md | Session transition snapshot — overwritten each session | End of every session |
| KNOWLEDGE.md | Runtime quirks, fragile modules, observed gotchas | Via /learn with evidence |

The first five files are owned by the starter-prompt protocol. The sixth, KNOWLEDGE.md, is Cairn's addition — backward-looking gotchas ("this was a problem; here's what fixed it") that don't belong in DECISIONS.md (forward-looking) or TECHNICAL.md (specs). Each Cairn command knows which files to touch, when to read, when to write.

What Cairn ships

A CLAUDE.md section you merge into your project's existing one, a KNOWLEDGE.md template, and four slash commands:

cairn/
  README.md
  CLAUDE.md           ← merge into your project's CLAUDE.md
  KNOWLEDGE.md        ← drop into your repo root
  commands/
    scout.md          ← drop into <repo>/.claude/commands/
    recall.md
    learn.md
    handover.md

The commands:

  • /scout — do the work, with discovery proportional to risk. The agent classifies the task into one of three tiers and declares the tier in its first response. Reads CLAUDE.md + PROGRESS.md + HANDOVER.md at session start. After every task, appends one line to PROGRESS.md. When a constraining decision is made mid-flight, logs it to DECISIONS.md immediately.
  • /recall — surface relevant slices from across the context layer. Reads KNOWLEDGE.md, relevant DECISIONS.md entries, relevant TECHNICAL.md sections, and the active HANDOVER.md. Cite-not-restate; flag stale entries.
  • /learn — append a durable insight, routed to the right destination file. Refuses entries without evidence (commit SHA, PR, test output). Routes runtime quirks to KNOWLEDGE.md, architectural decisions to DECISIONS.md, implementation specs to TECHNICAL.md. Splits and cross-references when an insight spans more than one.
  • /handover — write HANDOVER.md at session end. No exceptions. Active task, what landed, decisions made, constraints carrying forward, concrete next steps, open questions. Overwrites — it's a snapshot, not a log.
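As a concrete sketch, a snapshot written by /handover might look like this (the six fields are the ones listed above; the project details are invented):

```markdown
## Handover — 2026-05-02

- **Active task:** T3 — add rate limiting to the import worker
- **Landed:** token-bucket middleware plus unit tests (logged in PROGRESS.md)
- **Decisions:** per-customer buckets over a global one (see latest DECISIONS.md entry)
- **Constraints carrying forward:** the import job must stay idempotent
- **Next steps:** wire the limiter into the nightly cron path, then load-test
- **Open questions:** does staging enforce the same FK deferral as production?
```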

That's the whole framework. The discipline is in the merged CLAUDE.md; the commands are the four moments where the protocol fires.

The tier rubric

The tiers are the trick that lets one command cover work that many frameworks split into bug-fix, feature-enhance, and feature-build:

  • T1 (trivial) — typo, rename, comment, format change. No discovery. Just edit.
  • T2 (contained) — fix or change inside one module, where blast radius is obvious from the file. Grep callers, check existing tests, plan in one sentence, edit, run tests.
  • T3 (substantial) — non-trivial bug, new feature, or multi-module change. Run /recall across the context layer, blast-radius map, plan as 3–5 bullets via TaskCreate, re-anchor in DECISIONS.md before the build, implement, red-green tests for new behavior, regression check, /learn if anything durable surfaced.

The agent declares the tier and you can override with --tier N. Most days you'll trust the agent's call. When you don't, the override is one flag.

The point of tiering is to avoid the most common process failure mode: enforcing the same machinery on a typo and a multi-module refactor. A small change should incur small overhead.

The session protocol

Cairn's CLAUDE.md enforces five rules that sit above the per-task work:

  1. Session start reads three files. First response in any conversation reads CLAUDE.md + PROGRESS.md + HANDOVER.md before anything else. Conversational memory does not carry project state forward.
  2. Write as you go. PROGRESS.md after every completed task. DECISIONS.md the moment a constraining decision is made — five-line entries, never deleted, only superseded.
  3. 10–15 exchange checkpoint. Silent self-check: am I still aligned with CLAUDE.md and DECISIONS.md? If unsure, re-read the relevant sections before continuing.
  4. Re-anchor before significant implementation. Re-read the DECISIONS.md entries that constrain the area you're about to touch. Never assume prior context is still active.
  5. Session end writes a handover. /handover runs unconditionally. A clean re-entry next time is worth the two minutes.

These are the directives I've already proven out across long-running projects via the starter prompt. Cairn just makes them load-bearing on every workflow rather than aspirational at the project level.
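For illustration, a five-line DECISIONS.md entry under rule 2 might look like this (the fields follow the decision / alternatives / rationale / date / supersedes shape; the content is invented):

```markdown
- **decision:** store import batches in Postgres, not S3
- **alternatives:** S3 + manifest file; SQLite sidecar
- **rationale:** batches are queried relationally by the retry path
- **date:** 2026-04-12
- **supersedes:** none
```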

How `/learn` routes

| Insight type | Destination | Format |
| --- | --- | --- |
| Runtime quirk, fragile module, observed gotcha | KNOWLEDGE.md | what / why / evidence / scope / revisit |
| Architectural decision, framework choice | DECISIONS.md | decision / alternatives / rationale / date / supersedes |
| Implementation spec (contract, schema, model) | TECHNICAL.md | update the relevant section in place |
| Build progress | PROGRESS.md (auto, by /scout) | terse single-line per task |

The routing matters because the four destinations have different lifecycles. KNOWLEDGE.md entries get pruned quarterly when their revisit: date hits. DECISIONS.md entries are immutable — supersede with a new entry, don't edit the old one. TECHNICAL.md mutates in place with the architecture. PROGRESS.md is append-only and gets compacted into release notes when a milestone ships. One file per cadence, not one file for everything.

Why `KNOWLEDGE.md` is the runtime-quirks layer

The single biggest design choice is that KNOWLEDGE.md does not duplicate the other context files. It's the backward-looking layer — observed gotchas, fragile modules, things that bit me. Forward-looking decisions go in DECISIONS.md. Implementation contracts go in TECHNICAL.md. Mixing them is what makes persistent AI memory poison itself: when "this approach worked" gets written next to "we will adopt approach X", the agent can't tell observation from intent.

Each KNOWLEDGE.md entry has five fields and a short curation rule set:

```markdown
### 2026-04-29 — Postgres FK on orders.customer_id is deferrable

- **what:** the foreign key is DEFERRABLE INITIALLY IMMEDIATE
- **why:** the nightly import job inserts orders before customers,
  then commits, deferring the constraint
- **evidence:** PR #142, see migration 0089_orders_fk
- **scope:** db schema, import worker
- **revisit:** 2026-10-29
```

  • Evidence is required. An entry without a commit SHA, PR, or test output is not knowledge — delete on sight.
  • Quarterly prune. Walk the file. Validate every entry past its revisit: date.
  • Contradictions surface, never overwrite. A new entry replacing an old one is itself a learning.
  • No prophecy. Record what was a problem, not what you think might become one.

What Cairn deliberately does not do

It does not maintain a workflow state file across sessions. It does not generate per-phase output documents. It does not commit, push, or open PRs unless you ask. It does not run a generator that surveys your repo and produces tailored workflow files. It does not add a fifth, sixth, or thirtieth slash command for status, retry, unblock, guide, or classification.

If you find yourself wanting any of these, you've hit a complexity threshold where Cairn is the wrong tool. That's fine — heavier frameworks exist for a reason. They're built for regulated work where audit trails have legal weight, for multi-team standardization across many engineers, for long unattended runs where the agent drives itself for thirty minutes without supervision, and for cross-workflow state that must survive crashes. None of those describe my solo product work. They might describe yours.

Ceremony is not trust

The pitch for heavy SDLC frameworks is that they make AI-assisted development trustworthy. That framing has never sat right with me, because trust in software doesn't come from process documents. It comes from working code, passing tests, code review, and observability. Process helps when those four are weak — but it isn't a substitute, and adding process to a context that already has them just raises the cost without raising the benefit.

For solo work I have my own observability — I review my own diffs. I have my own code review when I want one — a sub-agent that reads the change. My tests are the gate. What I actually needed from a framework was discipline I could carry across sessions — the parts that fade when the context window churns. Discovery before code. Knowledge that compounds. Stop at two attempts. Show your work. Hand over cleanly.

That's Cairn. Four commands. One workflow layer over a context substrate I already trusted. Install into any Claude Code project with one line:

```bash
curl -fsSL https://raw.githubusercontent.com/andreas0480/cairn/main/install.sh | bash
```

The installer is idempotent — drops the four commands into .claude/commands/, creates KNOWLEDGE.md if missing (never overwrites your entries), and appends or updates the Cairn section in your existing CLAUDE.md between marker comments. Re-run any time to pick up updates.
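Schematically, the managed region in CLAUDE.md looks something like this (the marker strings below are placeholders, not necessarily the installer's actual ones):

```markdown
<!-- your existing CLAUDE.md content stays untouched -->

<!-- CAIRN:BEGIN (placeholder marker name) -->
Cairn's workflow rules live here and are replaced wholesale on each re-run.
<!-- CAIRN:END (placeholder marker name) -->
```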

Repo: github.com/andreas0480/cairn — MIT-licensed.