Giving Hermes Agent an 'External Enhancement System': Deep Transformation of the Memory System and Skill System

A complete practice record of AI Agent memory management + Skill self-evolution + automated maintenance. Without modifying Hermes core code, using external plugins to make the Agent smarter with use.

Giving Hermes Agent an 'External Enhancement System': Deep Transformation of the Memory System and Skill System

A complete practice record of AI Agent memory management + Skill self-evolution + automated maintenance. Without modifying Hermes core code, using external plugins to make the Agent smarter with use.

TL;DR

Dimension	Before	After
Cross-session memory	Each new session starts from scratch	5-layer memory + auto-maintenance
Knowledge accumulation	Experience dies with the session	Reflection → solutions/ → auto-promotion
Skill management	60 skills flat, no quality control	Hub layering + quality gates + anti-bloat
Token consumption	Large skills fully injected	On-demand loading, 87→56 enabled
Automation	Manual maintenance	Session-level/daily/weekly cron full coverage

Hermes Agent natively has three layers of memory: MEMORY.md (system prompt injection), memory tool (memories/MEMORY.md), and session_search (SQLite conversation history). It looks sufficient on paper, but in practice there are three fatal issues:

Problem	Manifestation
Cross-session amnesia	New session starts, Agent doesn't remember yesterday's debugged bugs or decisions made
Knowledge doesn't accumulate	Pitfalls get stepped in again; solutions disappear when the session ends
Manual maintenance	MEMORY.md fills up and needs manual cleanup; outdated info never expires automatically

1.2 Our Transformation: 5-Layer Memory Architecture

┌─────────────────────────────────────────────────────────────┐
│  Layer 1: System Prompt MEMORY/USER                          │
│  Always loaded, 2200 char limit, stores most critical facts  │
│  Files: ~/.hermes/MEMORY.md + ~/.hermes/memories/MEMORY.md   │
├─────────────────────────────────────────────────────────────┤
│  Layer 2: fact_store (Holographic Structured Memory)         │
│  Vector embeddings + entity relationships + trust scores     │
│  Supports semantic search, entity probing, compositional reasoning │
├─────────────────────────────────────────────────────────────┤
│  Layer 3: session_search (FTS5 Conversation History)         │
│  Full-text search of past conversations, supports            │
│  discovery/scroll/read/browse four modes                     │
├─────────────────────────────────────────────────────────────┤
│  Layer 4: compound-system (Reflection + Knowledge Base)      │
│  Auto-reflection after tasks, writes to solutions/,          │
│  with track/level three-tier management                      │
├─────────────────────────────────────────────────────────────┤
│  Layer 5: Skill System (Reusable Skills)                     │
│  Auto-extracted from success patterns, loaded on demand,     │
│  anti-bloat                                                  │
└─────────────────────────────────────────────────────────────┘

1.3 compound-system: Teaching the Agent to Reflect

This is the core of the entire memory system. Each time a non-trivial task is completed, the Agent automatically executes a reflection workflow:

Task Complete
    ↓
compound.sh reflect → Determine if reflection is needed
    ↓ Needs reflection
reflect.sh → Call LLM analysis, output structured JSON
    ↓
write-solution.sh → Write to solutions/ directory
    ↓
Next time a similar issue arises → compound.sh search → Hit historical solution

Key Design Decisions:

Decision	Choice	Reason
Reflection trigger	Auto-determine	Not every task deserves reflection, avoid noise
Storage format	Markdown files	Human-readable, git-friendly, easy to search
Knowledge grading	working → session → longterm	Three-tier promotion, avoid info explosion
Decay mechanism	Auto-archive after 30 days	Unused knowledge cools down automatically
Search method	FTS5 full-text search	Runs locally, no external dependencies

Actual Results:

$ bash compound.sh search "ruff format"
[INFO] Found 3 solution(s) for: ruff format

[1] "ruff format debt causing CI failure"
    File: solutions/bug/ticketpilot-ruff-format-debt-2026-06-21.md
    Track: bug, Level: session

[2] "Must check ruff after sub-agent module split"
    File: solutions/session/ruff-subagent-lesson.md
    Track: session, Level: session

1.4 reflect.sh: LLM-Driven Structured Reflection

Initially reflect.sh only returned raw JSON, and the LLM often just echoed the input without doing any analysis. After multiple iterations:

Iteration	Problem	Fix
v1	"Return JSON only" prompt too simple	Switch to structured prompt, explicitly require root_cause/solution/lessons/patterns
v2	curl bash quote nesting explosion	Switch to Python urllib
v3	log_info polluting stdout	Redirect to stderr
v4	write-solution.sh dropping fields	Extend template to support lessons/patterns
v5	Confusion between compound.sh reflect vs reflect.sh	Document the distinction: former decides if reflection is needed, latter actually calls LLM

Final pipeline:

# One-command reflection + archiving
bash ~/.hermes/skills/compound-system/scripts/reflect.sh \
    "Fixed QQ Bot heartbeat loop" "success" "low" "" \
  | bash ~/.hermes/skills/compound-system/scripts/write-solution.sh

1.5 Memory Maintenance Automation

Manual maintenance is not sustainable. We built an automated maintenance chain:

Trigger	Script	Action
Session end	session-end.sh	Compress MEMORY.md + archive old solutions + reflect
Daily at 2 AM	daily-maintenance.sh	Decay check + dedup check + sync check
Weekly at 3 AM	weekly-maintenance.sh	Elimination check + merge check + size check

Decay Mechanism Principle:

# memory-decay.py core logic
# Trust score decays exponentially, 30-day half-life
new_trust = old_trust * (0.5 ** (days_elapsed / 30))
# Below threshold → auto-archive to .archive/

Dedup Mechanism:

# memory-dedup.py core logic
# FTS5 full-text search + SequenceMatcher similarity
# Similarity > 0.85 → merge, retain the newer version

1.6 The MEMORY.md Three-File Trap

This was the trickiest discovery during debugging — there are three different MEMORY.md files in the system:

File	Purpose	Format	Character Limit
`~/.hermes/MEMORY.md`	System prompt injection	Structured markdown	No hard limit
`~/.hermes/memories/MEMORY.md`	memory tool operations	§ separator format	2,200
`~/.hermes/memory/MEMORY.md`	Legacy, deprecated	—	—

The memory tool's add/replace/remove operates on memories/MEMORY.md, not the one injected into the system prompt. When the tool reports "at X/2200 chars," you need to manually edit memories/MEMORY.md to clean up.

1.7 Production Data

Metric	Before	After
Cross-session knowledge retention	0%	~90% (compound-system)
Repeated pitfall rate	High	Low (search hits historical solutions)
MEMORY.md maintenance	Manual	Automatic (cron + decay)
New session cold start time	5-10 minutes	1-2 minutes

Part 2: Skill System

2.1 Problem: Skill Bloat

Hermes Agent's skill mechanism essentially injects SKILL.md content into the system prompt. A 300-line skill gets fully injected into every conversation, whether you use it or not.

Problem	Impact
Large number of skills	60+ skills, each with descriptions and instructions
Lark series dominates	24 Lark skills, never used in QQ chat
No quality control	Some skills exceed 500 lines, cramming in every detail
No elimination mechanism	Outdated skills permanently consume tokens

2.2 Transformation 1: Hub + Focused Skill Layering

Core principle: One large skill is worse than multiple small skills.

Dimension	One Large Skill	Multiple Small Skills
Token consumption	High (always loads everything)	Low (on-demand loading)
Attention	"Lost in the Middle" effect	Each one is concise
Maintenance	Changing one thing affects everything	Independent updates
Reusability	Hard to reuse	Composable

The transformed architecture:

Hub Skill (Index layer, <100 lines)
  context-engineering-hub
  Quick reference + on-demand loading of sub-skills
      ↓ On-demand loading
Focused Skills (Function layer, each <200 lines)
  project-context        File structure, templates
  skill-evolver          Generate skills from success patterns
  context-validation     Verify improvement effectiveness
  self-evolution-system  Complete closed-loop architecture
      ↓
Tool Skills (Support layer)
  memory-orchestrator    Unified memory orchestration
  resilient-web-search   Search API fallback

Real Case: context-engineering Split

Before Split	After Split
1 skill, 683 lines	4 skills: hub(95 lines) + project-context(148 lines) + token-compression(120 lines) + session-handoff(89 lines)
Always fully loaded	Load corresponding sub-skill on demand
Information overload	Each is concise and focused

2.3 Transformation 2: Quality Gates

Every skill creation/update must pass quality gates:

Check	Threshold	Action on Failure
Line count	≤200 lines	Compress or split
Three-part trap	At least 1	Add one
Edge-case three-part	Required	Add one
Code example	At least 1	Add one
Pointer reference	Don't embed full documentation	Change to pointer

Three-Part Trap Example:

- **psycopg_pool transaction rollback**: `with` block auto-rollbacks on exit → use `autocommit=True` for write operations
- **ruff format debt**: CI runs `ruff format --check` → run `ruff format .` before committing

2.4 Transformation 3: Anti-Bloat Mechanisms

Mechanism	Rule	Action
Decay	Not used for 30 days	Archive to `.archive/`
Merge	2+ similar skills	Merge into one
Cap	Exceeds 200 lines	Compress
Eliminate	Not referenced for 6 months	Delete

2.5 Transformation 4: Smart Skill Injection

Hermes natively loads descriptions of all enabled skills every conversation. Our optimization:

# config.yaml
skills:
  disabled:
    # Lark series (25 skills) — not needed for QQ chat
    - lark-approval
    - lark-apps
    - lark-attendance
    # ... 25 total
    
    # Other unnecessary skills
    - yuanbao
    - honcho
    - hermes-memory-setup
    # ... 6 total

Metric	Before Optimization	After Optimization
Enabled skills	87	56 (-31)
System prompt injection	~4600 lines skill descriptions	~2800 lines
Estimated tokens/request	—	-2000~3000

2.6 resilient-web-search: Search Fault Tolerance

Tavily API frequently returns 432 (quota exhausted). We built a fallback chain:

web_search (Tavily)
    ↓ On failure
web_search_plus (auto routing)
    ↓ On failure
web_search_plus (explicit provider polling)
    ↓ On failure
Return error + list of attempted providers

This skill is referenced by all search-related cron jobs, ensuring that even if one API goes down, we don't come back empty-handed.

2.7 Skill Optimization Production Data

Metric	Before Optimization	After Optimization
Total skills	60 (4629 lines)	56 (~2200 lines effective injection)
Max single skill	648 lines (lark-mail)	≤200 lines
On-demand loading	None	hub + focused
Anti-bloat	None	Decay + merge + elimination

Part 3: Automation System

3.1 Cron Job Schedule (UTC+8)

Job	Schedule	Description
GitHub Hot Daily	Daily 09:00	Search trending, organize in Chinese
compound-system-refresh	Daily 03:00	Refresh solutions index
Unified Memory Maintenance	Daily 04:00	MEMORY.md + fact_store check
AgentMemory Maintenance	Daily 19:00	MCP server health check
Hermes Config Backup	Daily 06:00	config.yaml backup
skill-index-rebuild	Sunday 09:00	Rebuild skill index

3.2 Session-Level Automation

Session Start
    ↓
fact_store(action='search') → Look up existing knowledge
    ↓
Start Task
    ↓
Task Complete
    ↓
compound.sh reflect → Reflection
    ↓
session-end.sh → Compress + archive + update stats

3.3 Decoupling Principle

None of the transformations modify Hermes core code. This is the most important design decision:

Component	Location	Relationship with Hermes
compound-system	`~/.hermes/skills/compound-system/`	Independent skill, no core modification
memory-orchestrator	`~/.hermes/skills/memory-orchestrator/`	Independent skill
context-engineering-hub	`~/.hermes/skills/context-engineering-hub/`	Independent skill
resilient-web-search	`~/.hermes/skills/resilient-web-search/`	Independent skill
Maintenance scripts	`~/.hermes/scripts/`	Independent scripts
Cron jobs	Hermes cron system	Using native scheduling capabilities

Why decouple?

No conflicts when Hermes updates
Can be tested and iterated independently
Can be shared with other users
Low maintenance cost — no need to track upstream changes

Part 4: Design Philosophy

4.1 From Research to Practice

All our decisions are backed by research:

Research	Finding	Our Application
ETH Zurich "Evaluating AGENTS.md"	Auto-generation reduces success rate by 3%	Manually write skills, don't auto-generate
GitHub 2500+ repo analysis	Three-part edge cases most effective	Skills must have always/ask/never
"Lost in the Middle"	Middle info in long contexts gets ignored	Small skills, on-demand loading
Longbench	Moderate compression improves output quality	Structured format over paragraphs
Mem0 2026 Agent Memory	Multi-layer memory architecture is best	5-layer memory system

4.2 Core Insights

Reducing tokens ≠ reducing understanding: Structured compression can achieve both simultaneously
Progressive disclosure > full loading: Only load detailed content when needed
Closed-loop validation is key: Each improvement must be validated; don't optimize blindly
Automation is a necessity: Manual maintenance is not sustainable
Decoupling is a principle: Don't depend on the internal implementation of any specific tool

4.3 Directions Still in Iteration

Direction	Status	Priority
SkillOpt integration (auto-training skills)	POC complete, 0.825→0.85 target	High
ONNX semantic embeddings (fact_store vector search)	Planned	Medium
Cross-project skill reuse	Concept proof	Medium
LLM-driven reflect.sh integration	Implemented	✅

Summary

The core idea behind adding an external enhancement system to Hermes Agent: Don't modify the core; use plugin mechanisms to extend capabilities.

Memory system prevents the Agent from forgetting — pitfalls stepped in are remembered, decisions made are retained
Skill system organizes knowledge — on-demand loading, quality control, automatic elimination
Automation keeps the system running — session-level/daily/weekly three-tier maintenance, no manual intervention needed

All code is under ~/.hermes/skills/, purely external, never touching the Hermes core.

Written on 2026-06-21, based on practice with Hermes Agent + compound-system + memory-orchestrator

Giving Hermes Agent an 'External Enhancement System': Deep Transformation of the Memory System and Skill System

Giving Hermes Agent an 'External Enhancement System': Deep Transformation of the Memory System and Skill System

TL;DR

Part 1: Memory System

1.1 Problem: The Agent's "Goldfish Memory"

1.2 Our Transformation: 5-Layer Memory Architecture

1.3 compound-system: Teaching the Agent to Reflect

1.4 reflect.sh: LLM-Driven Structured Reflection

1.5 Memory Maintenance Automation

1.6 The MEMORY.md Three-File Trap

1.7 Production Data

Part 2: Skill System

2.1 Problem: Skill Bloat

2.2 Transformation 1: Hub + Focused Skill Layering

2.3 Transformation 2: Quality Gates

2.4 Transformation 3: Anti-Bloat Mechanisms

2.5 Transformation 4: Smart Skill Injection

2.6 resilient-web-search: Search Fault Tolerance

2.7 Skill Optimization Production Data

Part 3: Automation System

3.1 Cron Job Schedule (UTC+8)

3.2 Session-Level Automation

3.3 Decoupling Principle

Part 4: Design Philosophy

4.1 From Research to Practice

4.2 Core Insights

4.3 Directions Still in Iteration

Summary