Table of Contents
- Context and Memory Flow Analysis
- Research Findings
- 1. Context Compaction Patterns (Claude Code, Codex CLI, OpenCode)
- 2. JetBrains Research (NeurIPS 2025)
- 3. MemGPT Architecture
- Mapping Research to Our System
- Implemented Solution: Two-Phase Compaction
- Design Decisions
- 1. Preserve Window
- 2. Compaction Prompt (Customizable)
- 3. Storage Format
- 4. Observation Masking (Optional Enhancement)
- Implementation Files
- Design Questions (Resolved)
- Architecture (Reference)
- Related Files Reference
Context and Memory Flow Analysis
Issue #10: Implement Conversation History Trimming
Created: 2025-12-06 | Status: ✅ Implemented
Note: This document preserves the original research and design analysis that informed the context compaction implementation. For current implementation details, see Data-Flow-01-Context-Compaction.
Research Findings
1. Context Compaction Patterns (Claude Code, Codex CLI, OpenCode)
Source: Context Compaction Research Gist
| Tool | Manual Trigger | Auto Trigger | Method |
|---|---|---|---|
| Claude Code | /compact | ~95% capacity | LLM summarization |
| Codex CLI | /compact | Token threshold | Dedicated prompt |
| OpenCode | /compact | Token threshold | Prune + summarize |
Best Practices:
- Set auto-trigger at 85-90% (not 95%) to avoid edge cases
- Prune tool outputs before summarization
- Warn users about accuracy degradation after multiple compactions
- Allow disabling auto-compaction
- Support custom summarization prompts
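A minimal configuration sketch that mirrors these practices; all names here (CompactionSettings, auto_compact_threshold, etc.) are illustrative and do not correspond to any of the tools above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CompactionSettings:
    """Illustrative knobs mirroring the best practices above; names are hypothetical."""
    auto_compact_threshold: float = 0.85   # trigger at 85-90%, not 95%
    prune_tool_outputs: bool = True        # prune tool outputs before summarizing
    warn_after_compactions: int = 2        # warn users once accuracy may degrade
    auto_compact_enabled: bool = True      # allow disabling auto-compaction
    custom_prompt: Optional[str] = None    # support a custom summarization prompt
```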
Effective Compaction Prompt Should Include:
- Completed work
- Current state
- In-progress tasks
- Next steps
- Constraints
- Critical context
2. JetBrains Research (NeurIPS 2025)
Source: Efficient Context Management
Two Approaches:
| Approach | What It Does | Pros | Cons |
|---|---|---|---|
| LLM Summarization | Compress entire history | Infinite scaling | Expensive, loses termination signals |
| Observation Masking | Hide old observations with placeholders | Efficient, preserves reasoning | Limited compression |
Key Finding: Observation masking often outperforms summarization in efficiency and reliability.
Hybrid Approach: Combine masking with occasional summarization for best results.
3. MemGPT Architecture
Source: MemGPT Paper (arXiv:2310.08560)
┌─────────────────────────────────────────────────────────────────┐
│ MAIN CONTEXT (RAM) │
├─────────────────────────────────────────────────────────────────┤
│ System Instructions (fixed) │
│ Working Context (key facts, preferences - read/write) │
│ FIFO Message Queue (conversation history) │
└─────────────────────────────────────────────────────────────────┘
│
overflow triggers
▼
┌─────────────────────────────────────────────────────────────────┐
│ Queue Manager: Detects overflow → Eviction + Summarization │
└─────────────────────────────────────────────────────────────────┘
│
evicted messages
▼
┌─────────────────────────────────────────────────────────────────┐
│ EXTERNAL CONTEXT (Disk) │
├─────────────────────────────────────────────────────────────────┤
│ Archival storage (searchable) │
│ Evicted message summaries │
└─────────────────────────────────────────────────────────────────┘
Key Insight: MemGPT separates "working context" (explicitly managed facts) from "message queue" (FIFO history).
Mapping Research to Our System
| Research Concept | Our Equivalent | Status |
|---|---|---|
| Main context | LLM prompt (built in build_llm_messages) | ✅ Implemented |
| Working context | character.db.entity_profiles + journal | ✅ Implemented |
| Message queue | script.db.conversation_history | ✅ Managed (compaction) |
| External context | Mem0 semantic memory | ✅ Implemented |
| Queue manager | sleep/compaction.py | ✅ Implemented |
| Eviction + summarization | compact_conversation_history() | ✅ Implemented |
| Observation masking | — | ❌ Not implemented |
Implemented Solution: Two-Phase Compaction
Phase 1: Sleep-Based Compaction (Primary Path)
During run_sleep_tick(), after memory consolidation:
1. Calculate conversation_history token count
2. If tokens > SLEEP_COMPACT_THRESHOLD (e.g., 50% of max_context_tokens):
a. Identify messages outside "preserve window" (oldest messages)
b. Generate summary using LLM with compaction prompt
c. Store summary as journal entry (type="context_synthesis")
d. Replace old messages with single [CONTEXT SUMMARY] message
e. Delete original messages from history
Why sleep is ideal:
- LLM is already "offline" - no mid-conversation disruption
- Memory consolidation just ran - facts are safe in Mem0
- Natural checkpoint for state compression
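A minimal sketch of the sleep-phase flow above. The helpers count_tokens, generate_context_summary, and add_journal_entry are assumptions standing in for the real code in sleep/compaction.py and llm_interaction.py; signatures and thresholds are illustrative:

```python
def compact_during_sleep(history, max_context_tokens, threshold=0.5, preserve=20):
    """Sleep-phase compaction sketch: summarize old messages, keep a preserve window."""
    total = sum(count_tokens(m["content"]) for m in history)       # step 1: token count
    if total <= threshold * max_context_tokens or len(history) <= preserve:
        return history                                              # step 2 gate
    old, recent = history[:-preserve], history[-preserve:]          # 2a: outside preserve window
    summary = generate_context_summary(old)                         # 2b: LLM compaction prompt
    add_journal_entry(summary, entry_type="context_synthesis")      # 2c: journal the summary
    summary_msg = {"role": "system",
                   "content": f"[CONTEXT SUMMARY]\n{summary}"}      # 2d: single summary message
    return [summary_msg] + recent                                   # 2e: originals dropped
```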
Phase 2: Emergency Compaction (Fallback)
At start of each tick, before build_llm_messages():
1. Check token pressure
2. If tokens > EMERGENCY_THRESHOLD (e.g., 80% of max_context_tokens):
a. Log warning about forced compaction
b. Run compaction immediately (same as sleep path)
c. Continue with tick
Why this is a fallback:
- Context pressure should rarely reach this level if sleep compaction works
- Forces compaction even during active conversations
- May lose some facts that haven't been journaled yet
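A hedged sketch of the emergency check at tick start, reusing compact_during_sleep and count_tokens from the sketches above; the constant name and logging calls are assumptions, not the actual assistant_script.py code:

```python
import logging

logger = logging.getLogger(__name__)
EMERGENCY_THRESHOLD = 0.80  # fraction of max_context_tokens

def maybe_emergency_compact(history, max_context_tokens):
    """Run at tick start, before build_llm_messages(); forces compaction under pressure."""
    tokens = sum(count_tokens(m["content"]) for m in history)
    if tokens > EMERGENCY_THRESHOLD * max_context_tokens:
        logger.warning("Emergency compaction: %d tokens exceeds %.0f%% of budget",
                       tokens, EMERGENCY_THRESHOLD * 100)
        history = compact_during_sleep(history, max_context_tokens)  # same compaction path
    return history
```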
Design Decisions
1. Preserve Window
Keep the last N messages (or N tokens) intact to maintain conversation coherence.
- Suggested: Last 20 messages OR 20% of max_context_tokens, whichever is larger
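For example, with a 128k-token budget the window is whichever is larger: the last 20 messages or roughly the last 25.6k tokens of history. A sketch of the cutoff computation, assuming the hypothetical count_tokens helper:

```python
def preserve_cutoff(history, max_context_tokens, min_messages=20, token_fraction=0.20):
    """Index before which messages are eligible for compaction (larger window wins)."""
    budget = token_fraction * max_context_tokens
    kept_tokens, idx = 0, len(history)
    # Walk backwards until the 20%-of-budget token window is filled.
    while idx > 0 and kept_tokens < budget:
        idx -= 1
        kept_tokens += count_tokens(history[idx]["content"])
    # Keep whichever window is larger: the last min_messages or the token-based window.
    return max(0, min(idx, len(history) - min_messages))
```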
2. Compaction Prompt (Customizable)
Summarize the following conversation history. Focus on:
- Key facts learned about entities, locations, and events
- Decisions made and actions taken
- Current state and ongoing tasks
- Important context for future interactions
Do NOT include:
- Routine greetings or small talk
- Step-by-step tool execution details
- Redundant information
Format as a concise narrative that preserves essential context.
3. Storage Format
Compacted context stored as:
{
"role": "system",
"content": "[CONTEXT SUMMARY]\n<summary text>",
"metadata": {
"type": "compaction",
"compacted_count": 45, # messages summarized
"compacted_at": "2025-12-06T12:00:00Z",
"original_token_count": 15000,
"summary_token_count": 800,
}
}
4. Observation Masking (Optional Enhancement)
For tool outputs specifically, consider masking instead of summarizing:
{
"role": "tool",
"content": "[TOOL OUTPUT ARCHIVED - search history for details]",
"original_tool": "search_memory",
"archived_at": "2025-12-06T12:00:00Z",
}
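A hedged sketch of how such masking could be applied in place of summarization; the function, window size, and field names are illustrative only, since masking is not implemented in the current system:

```python
from datetime import datetime, timezone

def mask_old_tool_outputs(history, keep_recent=10):
    """Replace tool outputs outside the recent window with archive placeholders."""
    cutoff = len(history) - keep_recent
    masked = []
    for i, msg in enumerate(history):
        if i < cutoff and msg.get("role") == "tool":
            masked.append({
                "role": "tool",
                "content": "[TOOL OUTPUT ARCHIVED - search history for details]",
                "original_tool": msg.get("name", "unknown"),
                "archived_at": datetime.now(timezone.utc).isoformat(),
            })
        else:
            masked.append(msg)
    return masked
```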
Implementation Files
| File | Purpose |
|---|---|
| sleep/compaction.py | compact_conversation_history() function |
| sleep/__init__.py | Orchestrates compaction in run_sleep_tick() |
| assistant_script.py | Emergency compaction check, config attributes |
| llm_interaction.py | generate_context_summary() helper |
| tests/test_context_compaction.py | Compaction unit tests |
| tests/test_pre_compaction_extraction.py | Pre-compaction journaling tests |
Design Questions (Resolved)
- Token counting cost: Count on-demand with tiered fallback (toksum → tiktoken → heuristic)
- Summary model: Uses the same model as the main LLM for consistency
- Compaction history: Summaries stored in the journal as synthesis entries
- User visibility: Compaction metadata stored in journal entries, visible via ai history
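A sketch of the tiered token-counting fallback noted above, and a possible shape for the count_tokens helper assumed in the earlier sketches. The exact toksum call is an assumption (adapt to the installed version); the tiktoken call is standard; the final heuristic of roughly four characters per token is a common rule of thumb:

```python
def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Tiered token counting: toksum, then tiktoken, then a character heuristic."""
    try:
        import toksum  # exact toksum API is an assumption
        return toksum.count_tokens(text, model)
    except Exception:
        pass
    try:
        import tiktoken
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except Exception:
        pass
    return max(1, len(text) // 4)  # heuristic: ~4 characters per token
```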
Architecture (Reference)
Data Stores (Three Tiers)
┌─────────────────────────────────────────────────────────────────────────┐
│ PERSISTENT STORAGE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────┐ ┌──────────────────────────────────┐ │
│ │ script.db.conversation_ │ │ character.db.journal │ │
│ │ history │ │ │ │
│ │ │ │ entries: [{id, content, │ │
│ │ [{role, content}, ...] │ │ importance, timestamp, │ │
│ │ │ │ source_type, tags...}] │ │
│ │ ✅ Managed via │ │ │ │
│ │ compaction at 70%/80% │ │ consolidated_entry_ids: [] │ │
│ │ token thresholds │ │ │ │
│ └──────────────────────────┘ └──────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Message Flow: Write Path
User Event / Tool Result / LLM Response
│
▼
┌─────────────────────────────────┐
│ script.db.conversation_history │ ← Messages appended here
│ .append({role, content}) │
└─────────────────────────────────┘
│
│ Compaction triggers:
│ - Sleep phase: 70% threshold
│ - Emergency: 80% threshold
▼
[Controlled Growth]
Related Files Reference
| File | Purpose |
|---|---|
| assistant_script.py | History init, compaction config |
| llm_interaction.py | Token-budgeted history loading, summary generation |
| tool_execution.py | History append on tool calls |
| sleep/__init__.py | Sleep tick orchestration |
| sleep/compaction.py | compact_conversation_history() |
| sleep/consolidation.py | Journal → Mem0 consolidation |
| tools/journal.py | AddJournalEntryTool |
| importance_scoring.py | Heuristic and LLM scoring |
Last updated: 2025-12-09