AI Agent Amnesia: Every New Session Starts From Zero

> Claude Code / Codex / Cursor starts as a blank slate every session. It's not the model's fault — you haven't given it a memory system. A three-layer architecture + lifecycle hooks to make agents remember.

I've been using AI coding assistants for almost a year. At first I was amazed. Gradually, I got frustrated.

Not because they can't write code — but because every new session, they're like a new hire on day one:

  • Don't know the project structure → grep for half an hour
  • Don't know the tech choices → generated code doesn't match existing style
  • Don't know what was changed last time → redo work, re-introduce fixed bugs
  • Don't know cross-project dependencies → change backend API, frontend doesn't sync, CI breaks

Until I realized: the problem isn't that AI isn't smart enough. It's that I hadn't given it a context engineering system.

Why Can't Agents Remember?

Two 2026 studies provided the answer.

ETH Zurich's ICSE 2026 paper found: Giving an agent a long, generic AGENTS.md actually decreased task success rates by 2-3% while increasing costs by 20%+. Too much "nice to know" info pushed out the "need to use" info — the model couldn't prioritize.

GitHub's analysis of 2,500+ repositories found: Most AGENTS.md failures aren't technical limitations — they're vagueness.

// ❌ Anti-pattern: prose paragraphs
"We value code quality and follow TDD principles.
Please ensure all changes are properly tested."

// ✅ Correct: command-first
## Commands
# Test
uv run pytest tests/ -v --cov --cov-fail-under=80

# Format
uv run ruff format . && uv run ruff check --fix

The first example gets ignored. "Value code quality" is a human value, not a machine instruction. The second is exactly what an agent needs — a concrete shell command.

Three-Layer Memory Architecture

Based on these findings, I built a three-layer workspace memory architecture — from passive to active, each layer solving a different aspect of agent amnesia.

Layer 1: Static Context Files
  AGENTS.md + HANDOVER.md + ADR
  → Read by agent, tells it about the project

Layer 2: Knowledge Graph Engine
  CodeGraph / GitNexus (MCP)
  → Lets the agent discover code dependencies on its own

Layer 3: Lifecycle Hooks
  SessionStart → PreToolUse → SessionEnd
  → Fires automatically, no "remembering" needed

Layer 1: Static Files

Three files per project:

| File | Purpose | Size Limit |
|------|---------|------------|
| AGENTS.md | Tech stack + commands + constraints | ≤150 lines |
| HANDOVER.md | Session log + changelog | Auto-archive at 80 lines |
| docs/decisions/ADR-YYYYMMDD | Architecture decision records | One decision per file |

Multi-project workspaces get an index layer:

workspace/
├── AGENTS.md          ← Project map: what projects exist, dependencies
└── shared/            ← Cross-project docs
    ├── api-contracts.md
    └── architecture-overview.md

service-a/
├── AGENTS.md          ← Tech guide
├── HANDOVER.md        ← Session log (cross-session memory)
└── docs/decisions/    ← ADR decisions

The golden rule for AGENTS.md: Commands first, no prose, numbered priority constraints.

// ❌ Bad (agent ignores)
"We prefer async programming patterns with proper error handling."

// ✅ Good (agent acts on)
## Constraints (by priority)
1. All API keys from .env, never hardcode
2. DB migrations must be additive only
3. Test coverage ≥ 80%
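
To make the golden rule checkable rather than aspirational, here is a minimal sketch of a validator. It assumes the conventions shown in this post (a `## Commands` heading, a `## Constraints` heading, numbered constraints, and the ≤150-line limit); the function name `check_agents_md` is hypothetical and not part of any existing tool.

```python
import re

MAX_LINES = 150  # the size limit this post uses for AGENTS.md


def check_agents_md(text: str) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    lines = text.splitlines()
    if len(lines) > MAX_LINES:
        problems.append(f"too long: {len(lines)} lines (limit {MAX_LINES})")

    # Only second-level headings count as sections in this sketch.
    headings = [ln.strip().lower() for ln in lines if ln.startswith("## ")]
    if not any(h.startswith("## commands") for h in headings):
        problems.append("missing '## Commands' section")
    if not any(h.startswith("## constraints") for h in headings):
        problems.append("missing '## Constraints' section")
    # Simplification: looks for a numbered item anywhere in the file,
    # not only inside the Constraints section.
    elif not any(re.match(r"\d+\.\s+\S", ln) for ln in lines):
        problems.append("constraints are not a numbered list")
    return problems
```

A check like this could run in a pre-commit hook, so that a prose-only AGENTS.md fails fast instead of being quietly ignored by the agent.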

HANDOVER.md is the agent's cross-session memory:

# HANDOVER
 
## Current Goal
Implement user registration API
 
## Changelog
| Date | Type | Scope | Description |
|------|------|-------|-------------|
| 2026-06-25 | Added | auth | Email verification |
 
## Completed
- [x] 2026-06-25 User registration API
 
## In Progress
- [ ] 2026-06-26 OAuth login — token refresh pending
 
## Key Decisions
| Date | Decision | Reason |
|------|----------|--------|
| 2026-06-25 | FastAPI over Flask | Native async + auto OpenAPI |
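
The 80-line auto-archive can be sketched as a pure function. This is one possible policy, not necessarily the one the reference scripts use: it assumes session entries are appended at the bottom under a `## Session End:` heading, and it moves the oldest entries out until the file fits. The name `split_for_archive` is hypothetical.

```python
LIMIT = 80                       # auto-archive threshold from this post
ENTRY_PREFIX = "## Session End:"  # assumed heading for appended entries


def split_for_archive(text: str, limit: int = LIMIT) -> tuple[str, str]:
    """Return (kept, archived). Oldest session entries are archived first."""
    lines = text.splitlines()
    starts = [i for i, ln in enumerate(lines) if ln.startswith(ENTRY_PREFIX)]
    if len(lines) <= limit or not starts:
        return text, ""  # nothing to do

    head = lines[: starts[0]]  # goal, changelog, decisions: always kept
    ends = starts[1:] + [len(lines)]
    entries = [lines[s:e] for s, e in zip(starts, ends)]

    archived: list[str] = []
    # Always keep the newest entry, even if the file is still over the limit.
    while len(entries) > 1 and len(head) + sum(map(len, entries)) > limit:
        archived.extend(entries.pop(0))

    kept = head + [ln for entry in entries for ln in entry]
    return "\n".join(kept) + "\n", "\n".join(archived) + ("\n" if archived else "")
```

The caller would write `kept` back to HANDOVER.md and append `archived` to an archive file of its choosing.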

Layer 2: Knowledge Graph

AGENTS.md covers "project background," but for "this function change affects 47 callers," static files aren't enough.

CodeGraph (47.4k ★, MIT) and GitNexus (42k ★) are two breakout open-source projects from 2026. They parse your entire codebase with tree-sitter, pre-index imports, calls, and class hierarchies into a local database, and expose it to agents via MCP.

Benchmarks:

  • CodeGraph: 58-70% fewer agent tool calls
  • GitNexus: 88% fewer tool calls in a 17-agent production audit

Agent receives task: modify handleLogin function
  → queries CodeGraph: "who calls handleLogin?"
  → finds 3 routes + 1 middleware depend on it
  → plans modification order → changes + runs tests
  → passes first time, nothing broken

Layer 3: Lifecycle Hooks (Most Critical)

The first two layers tell the agent "what to do" — and the agent might or might not comply. Hooks are deterministic: they always execute.

Claude Code exposes several hook events, including SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, PreCompact, and SessionEnd. Three of them carry most of the weight here:

SessionStart hook — Runs automatically when a session boots.

  • Reads HANDOVER.md and displays where we left off
  • Checks environment: AGENTS.md exists? CLAUDE.md symlink intact?
  • Warns if HANDOVER.md hasn't been updated in 14+ days
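
A minimal sketch of those three SessionStart checks, assuming the hook is a Python script run from the project root. The function name `session_start_report` and the message wording are hypothetical; how the output reaches the agent depends on how the hook is wired up.

```python
from datetime import datetime, timedelta
from pathlib import Path

STALE_AFTER = timedelta(days=14)  # staleness threshold from the list above


def session_start_report(project: Path, now: datetime) -> list[str]:
    """Collect warnings, then the handover text, for display at session start."""
    messages = []
    if not (project / "AGENTS.md").is_file():
        messages.append("WARN: AGENTS.md is missing")

    # exists() follows symlinks, so a dangling link is a symlink that "doesn't exist".
    claude = project / "CLAUDE.md"
    if claude.is_symlink() and not claude.exists():
        messages.append("WARN: CLAUDE.md symlink is broken")

    handover = project / "HANDOVER.md"
    if not handover.is_file():
        messages.append("WARN: HANDOVER.md is missing")
        return messages

    age = now - datetime.fromtimestamp(handover.stat().st_mtime)
    if age >= STALE_AFTER:
        messages.append(f"WARN: HANDOVER.md not updated for {age.days} days")
    messages.append(handover.read_text(encoding="utf-8"))
    return messages
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside, keeps the staleness rule easy to test.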

PreToolUse hook — Fires before file writes.

  • Detects multi-file edits → reminds to query CodeGraph for blast radius
  • Detects .env file access → prevents secret leaks
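
The `.env` guard can be sketched as follows. This assumes the hook receives a JSON payload on stdin with a `tool_input.file_path` field, and that exiting with code 2 blocks the tool call and returns stderr to the agent; check the current Claude Code hooks documentation before relying on either detail. The helper names are hypothetical.

```python
import json
from pathlib import PurePath


def is_secret_path(path: str) -> bool:
    """Conservative match: .env and any .env.* variant (including .env.example)."""
    name = PurePath(path).name
    return name == ".env" or name.startswith(".env.")


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message). 2 = block, 0 = allow (assumed convention)."""
    path = payload.get("tool_input", {}).get("file_path", "")
    if path and is_secret_path(path):
        return 2, f"Blocked: {path} may contain secrets"
    return 0, ""


def run(stdin, stderr) -> int:
    """Read the hook payload from stdin and report any block reason on stderr."""
    code, message = decide(json.load(stdin))
    if message:
        print(message, file=stderr)
    return code

# Hook entry point would be: sys.exit(run(sys.stdin, sys.stderr))
```

Keeping the decision in a pure function means the rule can be unit-tested without starting an agent session.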

SessionEnd hook — Fires when the session ends.

  • Extracts change summary from git diff
  • Appends changelog to HANDOVER.md
  • Records branch name, file list, change stats

This means: the agent doesn't need to "remember" to update the handover — the SessionEnd hook does it automatically.

# Auto-appended by SessionEnd hook
 
## Session End: 2026-06-26 15:30
- Branch: feat/user-auth
- Uncommitted files: 5
- Changes:
   src/api/auth.py     | 45 +++++++++++++++++++
   tests/test_auth.py  | 78 +++++++++++++++++++++++++++++++++++++
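
An entry like the one above can be assembled from three standard git commands. The sketch below is an assumption about how such a hook might work, not the reference implementation; `format_entry` and `session_end_entry` are hypothetical names.

```python
import subprocess
from datetime import datetime


def git(*args: str) -> str:
    """Run a git command in the current directory and return its stdout."""
    result = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    return result.stdout


def format_entry(when: datetime, branch: str, status: str, diffstat: str) -> str:
    """Build the markdown entry from raw git output (pure, easy to test)."""
    files = [ln for ln in status.splitlines() if ln.strip()]
    lines = [
        f"## Session End: {when:%Y-%m-%d %H:%M}",
        f"- Branch: {branch.strip()}",
        f"- Uncommitted files: {len(files)}",
        "- Changes:",
    ]
    # Keep per-file diffstat rows; drop the "N files changed" summary line.
    lines += [f"  {ln.rstrip()}" for ln in diffstat.splitlines() if "|" in ln]
    return "\n".join(lines) + "\n"


def session_end_entry(now: datetime) -> str:
    """Collect branch, working-tree status, and diffstat from the live repo."""
    return format_entry(
        now,
        git("rev-parse", "--abbrev-ref", "HEAD"),
        git("status", "--porcelain"),
        git("diff", "--stat", "HEAD"),
    )
```

The hook script would then append `session_end_entry(datetime.now())` to HANDOVER.md.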

Anti-rot Measures

AGENTS.md files rot. The codebase evolves — directories get renamed, scripts change, dependencies get swapped — but AGENTS.md stays the same.

agents-lint is a lightweight tool that detects AGENTS.md rot:

  • Validates every referenced path still exists (directory renamed? file moved?)
  • Checks npm scripts are still valid (npm run test but package.json removed it?)
  • Detects outdated framework patterns (AGENTS.md says Angular @NgModule but project is on standalone?)
  • Cross-file consistency check (AGENTS.md says yarn, CLAUDE.md says npm — conflict)
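
To show what the first check involves, here is a toy version of path validation. It is not agents-lint's implementation; it assumes paths appear in backticks and uses a rough heuristic (contains a `/` or has a file extension) to tell paths from other inline code. The name `missing_paths` is hypothetical.

```python
import re
from pathlib import Path


def missing_paths(agents_md: str, root: Path) -> list[str]:
    """Return backtick-quoted paths in agents_md that do not exist under root."""
    # Single-token inline code only, so commands like `npm run test` are skipped.
    candidates = re.findall(r"`([^`\s]+)`", agents_md)
    # Heuristic: looks like a path if it has a slash or a file extension.
    paths = [c for c in candidates if "/" in c or Path(c).suffix]
    return sorted({p for p in paths if not (root / p).exists()})
```

The heuristic will produce false positives (a backticked URL or version number, for instance), which is one reason to prefer a dedicated tool for real use.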

We run it as a weekly CI check. Code changes but AGENTS.md doesn't, and without automated detection the file can silently become useless within a couple of months.

The Combined Effect

Initializing a workspace takes one command:

$ pnpm create agent-workspace ./project api-gateway user-service --hooks --ci

Your daily workflow becomes:

| When | What happens |
|------|--------------|
| New session starts | Auto-loads HANDOVER.md, shows where we left off |
| Before writing code | PreToolUse checks blast radius |
| After writing code | PostToolUse logs changes |
| Session ends | SessionEnd auto-writes changelog to HANDOVER.md |
| git commit | pre-commit hook validates AGENTS.md |
| Every Monday | GitHub Actions runs agents-lint for stale detection |

The agent is no longer "a blank slate every time" — it's working with yesterday's memory.

Get Started

The full reference architecture is open-source:

Includes:

  • Workspace + per-project AGENTS.md templates
  • HANDOVER.md session log (80-line auto-archive)
  • ADR template (date-based IDs, no multi-project conflicts)
  • 5 Claude Code lifecycle hook scripts
  • agents-lint integration + GitHub Actions CI
  • validate.sh + pre-commit hook
  • CodeGraph / GitNexus / Repomix integration guides

AI agent capability is no longer the bottleneck. The bottleneck is context engineering.
Instead of waiting for bigger models, manage the context you already have.