Command Palette

Search for a command to run...

0

Giving Hermes Agent an 'External Enhancement System': Deep Transformation of the Memory System and Skill System

A complete practice record of AI Agent memory management + Skill self-evolution + automated maintenance. Without modifying Hermes core code, using external plugins to make the Agent smarter with use.

Giving Hermes Agent an 'External Enhancement System': Deep Transformation of the Memory System and Skill System

A complete practice record of AI Agent memory management + Skill self-evolution + automated maintenance. Without modifying Hermes core code, using external plugins to make the Agent smarter with use.


TL;DR

DimensionBeforeAfter
Cross-session memoryEach new session starts from scratch5-layer memory + auto-maintenance
Knowledge accumulationExperience dies with the sessionReflection → solutions/ → auto-promotion
Skill management60 skills flat, no quality controlHub layering + quality gates + anti-bloat
Token consumptionLarge skills fully injectedOn-demand loading, 87→56 enabled
AutomationManual maintenanceSession-level/daily/weekly cron full coverage

Part 1: Memory System

1.1 Problem: The Agent's "Goldfish Memory"

Hermes Agent natively has three layers of memory: MEMORY.md (system prompt injection), memory tool (memories/MEMORY.md), and session_search (SQLite conversation history). It looks sufficient on paper, but in practice there are three fatal issues:

ProblemManifestation
Cross-session amnesiaNew session starts, Agent doesn't remember yesterday's debugged bugs or decisions made
Knowledge doesn't accumulatePitfalls get stepped in again; solutions disappear when the session ends
Manual maintenanceMEMORY.md fills up and needs manual cleanup; outdated info never expires automatically

1.2 Our Transformation: 5-Layer Memory Architecture

┌─────────────────────────────────────────────────────────────┐
│  Layer 1: System Prompt MEMORY/USER                          │
│  Always loaded, 2200 char limit, stores most critical facts  │
│  Files: ~/.hermes/MEMORY.md + ~/.hermes/memories/MEMORY.md   │
├─────────────────────────────────────────────────────────────┤
│  Layer 2: fact_store (Holographic Structured Memory)         │
│  Vector embeddings + entity relationships + trust scores     │
│  Supports semantic search, entity probing, compositional reasoning │
├─────────────────────────────────────────────────────────────┤
│  Layer 3: session_search (FTS5 Conversation History)         │
│  Full-text search of past conversations, supports            │
│  discovery/scroll/read/browse four modes                     │
├─────────────────────────────────────────────────────────────┤
│  Layer 4: compound-system (Reflection + Knowledge Base)      │
│  Auto-reflection after tasks, writes to solutions/,          │
│  with track/level three-tier management                      │
├─────────────────────────────────────────────────────────────┤
│  Layer 5: Skill System (Reusable Skills)                     │
│  Auto-extracted from success patterns, loaded on demand,     │
│  anti-bloat                                                  │
└─────────────────────────────────────────────────────────────┘

1.3 compound-system: Teaching the Agent to Reflect

This is the core of the entire memory system. Each time a non-trivial task is completed, the Agent automatically executes a reflection workflow:

Task Complete
    ↓
compound.sh reflect → Determine if reflection is needed
    ↓ Needs reflection
reflect.sh → Call LLM analysis, output structured JSON
    ↓
write-solution.sh → Write to solutions/ directory
    ↓
Next time a similar issue arises → compound.sh search → Hit historical solution

Key Design Decisions:

DecisionChoiceReason
Reflection triggerAuto-determineNot every task deserves reflection, avoid noise
Storage formatMarkdown filesHuman-readable, git-friendly, easy to search
Knowledge gradingworking → session → longtermThree-tier promotion, avoid info explosion
Decay mechanismAuto-archive after 30 daysUnused knowledge cools down automatically
Search methodFTS5 full-text searchRuns locally, no external dependencies

Actual Results:

$ bash compound.sh search "ruff format"
[INFO] Found 3 solution(s) for: ruff format

[1] "ruff format debt causing CI failure"
    File: solutions/bug/ticketpilot-ruff-format-debt-2026-06-21.md
    Track: bug, Level: session

[2] "Must check ruff after sub-agent module split"
    File: solutions/session/ruff-subagent-lesson.md
    Track: session, Level: session

1.4 reflect.sh: LLM-Driven Structured Reflection

Initially reflect.sh only returned raw JSON, and the LLM often just echoed the input without doing any analysis. After multiple iterations:

IterationProblemFix
v1"Return JSON only" prompt too simpleSwitch to structured prompt, explicitly require root_cause/solution/lessons/patterns
v2curl bash quote nesting explosionSwitch to Python urllib
v3log_info polluting stdoutRedirect to stderr
v4write-solution.sh dropping fieldsExtend template to support lessons/patterns
v5Confusion between compound.sh reflect vs reflect.shDocument the distinction: former decides if reflection is needed, latter actually calls LLM

Final pipeline:

# One-command reflection + archiving
bash ~/.hermes/skills/compound-system/scripts/reflect.sh \
    "Fixed QQ Bot heartbeat loop" "success" "low" "" \
  | bash ~/.hermes/skills/compound-system/scripts/write-solution.sh

1.5 Memory Maintenance Automation

Manual maintenance is not sustainable. We built an automated maintenance chain:

TriggerScriptAction
Session endsession-end.shCompress MEMORY.md + archive old solutions + reflect
Daily at 2 AMdaily-maintenance.shDecay check + dedup check + sync check
Weekly at 3 AMweekly-maintenance.shElimination check + merge check + size check

Decay Mechanism Principle:

# memory-decay.py core logic
# Trust score decays exponentially, 30-day half-life
new_trust = old_trust * (0.5 ** (days_elapsed / 30))
# Below threshold → auto-archive to .archive/

Dedup Mechanism:

# memory-dedup.py core logic
# FTS5 full-text search + SequenceMatcher similarity
# Similarity > 0.85 → merge, retain the newer version

1.6 The MEMORY.md Three-File Trap

This was the trickiest discovery during debugging — there are three different MEMORY.md files in the system:

FilePurposeFormatCharacter Limit
~/.hermes/MEMORY.mdSystem prompt injectionStructured markdownNo hard limit
~/.hermes/memories/MEMORY.mdmemory tool operations§ separator format2,200
~/.hermes/memory/MEMORY.mdLegacy, deprecated

The memory tool's add/replace/remove operates on memories/MEMORY.md, not the one injected into the system prompt. When the tool reports "at X/2200 chars," you need to manually edit memories/MEMORY.md to clean up.

1.7 Production Data

MetricBeforeAfter
Cross-session knowledge retention0%~90% (compound-system)
Repeated pitfall rateHighLow (search hits historical solutions)
MEMORY.md maintenanceManualAutomatic (cron + decay)
New session cold start time5-10 minutes1-2 minutes

Part 2: Skill System

2.1 Problem: Skill Bloat

Hermes Agent's skill mechanism essentially injects SKILL.md content into the system prompt. A 300-line skill gets fully injected into every conversation, whether you use it or not.

ProblemImpact
Large number of skills60+ skills, each with descriptions and instructions
Lark series dominates24 Lark skills, never used in QQ chat
No quality controlSome skills exceed 500 lines, cramming in every detail
No elimination mechanismOutdated skills permanently consume tokens

2.2 Transformation 1: Hub + Focused Skill Layering

Core principle: One large skill is worse than multiple small skills.

DimensionOne Large SkillMultiple Small Skills
Token consumptionHigh (always loads everything)Low (on-demand loading)
Attention"Lost in the Middle" effectEach one is concise
MaintenanceChanging one thing affects everythingIndependent updates
ReusabilityHard to reuseComposable

The transformed architecture:

Hub Skill (Index layer, <100 lines)
  context-engineering-hub
  Quick reference + on-demand loading of sub-skills
      ↓ On-demand loading
Focused Skills (Function layer, each <200 lines)
  project-context        File structure, templates
  skill-evolver          Generate skills from success patterns
  context-validation     Verify improvement effectiveness
  self-evolution-system  Complete closed-loop architecture
      ↓
Tool Skills (Support layer)
  memory-orchestrator    Unified memory orchestration
  resilient-web-search   Search API fallback

Real Case: context-engineering Split

Before SplitAfter Split
1 skill, 683 lines4 skills: hub(95 lines) + project-context(148 lines) + token-compression(120 lines) + session-handoff(89 lines)
Always fully loadedLoad corresponding sub-skill on demand
Information overloadEach is concise and focused

2.3 Transformation 2: Quality Gates

Every skill creation/update must pass quality gates:

CheckThresholdAction on Failure
Line count≤200 linesCompress or split
Three-part trapAt least 1Add one
Edge-case three-partRequiredAdd one
Code exampleAt least 1Add one
Pointer referenceDon't embed full documentationChange to pointer

Three-Part Trap Example:

- **psycopg_pool transaction rollback**: `with` block auto-rollbacks on exit → use `autocommit=True` for write operations
- **ruff format debt**: CI runs `ruff format --check` → run `ruff format .` before committing

2.4 Transformation 3: Anti-Bloat Mechanisms

MechanismRuleAction
DecayNot used for 30 daysArchive to .archive/
Merge2+ similar skillsMerge into one
CapExceeds 200 linesCompress
EliminateNot referenced for 6 monthsDelete

2.5 Transformation 4: Smart Skill Injection

Hermes natively loads descriptions of all enabled skills every conversation. Our optimization:

# config.yaml
skills:
  disabled:
    # Lark series (25 skills) — not needed for QQ chat
    - lark-approval
    - lark-apps
    - lark-attendance
    # ... 25 total
    
    # Other unnecessary skills
    - yuanbao
    - honcho
    - hermes-memory-setup
    # ... 6 total
MetricBefore OptimizationAfter Optimization
Enabled skills8756 (-31)
System prompt injection~4600 lines skill descriptions~2800 lines
Estimated tokens/request-2000~3000

2.6 resilient-web-search: Search Fault Tolerance

Tavily API frequently returns 432 (quota exhausted). We built a fallback chain:

web_search (Tavily)
    ↓ On failure
web_search_plus (auto routing)
    ↓ On failure
web_search_plus (explicit provider polling)
    ↓ On failure
Return error + list of attempted providers

This skill is referenced by all search-related cron jobs, ensuring that even if one API goes down, we don't come back empty-handed.

2.7 Skill Optimization Production Data

MetricBefore OptimizationAfter Optimization
Total skills60 (4629 lines)56 (~2200 lines effective injection)
Max single skill648 lines (lark-mail)≤200 lines
On-demand loadingNonehub + focused
Anti-bloatNoneDecay + merge + elimination

Part 3: Automation System

3.1 Cron Job Schedule (UTC+8)

JobScheduleDescription
GitHub Hot DailyDaily 09:00Search trending, organize in Chinese
compound-system-refreshDaily 03:00Refresh solutions index
Unified Memory MaintenanceDaily 04:00MEMORY.md + fact_store check
AgentMemory MaintenanceDaily 19:00MCP server health check
Hermes Config BackupDaily 06:00config.yaml backup
skill-index-rebuildSunday 09:00Rebuild skill index

3.2 Session-Level Automation

Session Start
    ↓
fact_store(action='search') → Look up existing knowledge
    ↓
Start Task
    ↓
Task Complete
    ↓
compound.sh reflect → Reflection
    ↓
session-end.sh → Compress + archive + update stats

3.3 Decoupling Principle

None of the transformations modify Hermes core code. This is the most important design decision:

ComponentLocationRelationship with Hermes
compound-system~/.hermes/skills/compound-system/Independent skill, no core modification
memory-orchestrator~/.hermes/skills/memory-orchestrator/Independent skill
context-engineering-hub~/.hermes/skills/context-engineering-hub/Independent skill
resilient-web-search~/.hermes/skills/resilient-web-search/Independent skill
Maintenance scripts~/.hermes/scripts/Independent scripts
Cron jobsHermes cron systemUsing native scheduling capabilities

Why decouple?

  1. No conflicts when Hermes updates
  2. Can be tested and iterated independently
  3. Can be shared with other users
  4. Low maintenance cost — no need to track upstream changes

Part 4: Design Philosophy

4.1 From Research to Practice

All our decisions are backed by research:

ResearchFindingOur Application
ETH Zurich "Evaluating AGENTS.md"Auto-generation reduces success rate by 3%Manually write skills, don't auto-generate
GitHub 2500+ repo analysisThree-part edge cases most effectiveSkills must have always/ask/never
"Lost in the Middle"Middle info in long contexts gets ignoredSmall skills, on-demand loading
LongbenchModerate compression improves output qualityStructured format over paragraphs
Mem0 2026 Agent MemoryMulti-layer memory architecture is best5-layer memory system

4.2 Core Insights

  1. Reducing tokens ≠ reducing understanding: Structured compression can achieve both simultaneously
  2. Progressive disclosure > full loading: Only load detailed content when needed
  3. Closed-loop validation is key: Each improvement must be validated; don't optimize blindly
  4. Automation is a necessity: Manual maintenance is not sustainable
  5. Decoupling is a principle: Don't depend on the internal implementation of any specific tool

4.3 Directions Still in Iteration

DirectionStatusPriority
SkillOpt integration (auto-training skills)POC complete, 0.825→0.85 targetHigh
ONNX semantic embeddings (fact_store vector search)PlannedMedium
Cross-project skill reuseConcept proofMedium
LLM-driven reflect.sh integrationImplemented

Summary

The core idea behind adding an external enhancement system to Hermes Agent: Don't modify the core; use plugin mechanisms to extend capabilities.

  • Memory system prevents the Agent from forgetting — pitfalls stepped in are remembered, decisions made are retained
  • Skill system organizes knowledge — on-demand loading, quality control, automatic elimination
  • Automation keeps the system running — session-level/daily/weekly three-tier maintenance, no manual intervention needed

All code is under ~/.hermes/skills/, purely external, never touching the Hermes core.


Written on 2026-06-21, based on practice with Hermes Agent + compound-system + memory-orchestrator