
Context and Memory Flow Analysis

Issue #10: Implement Conversation History Trimming
Created: 2025-12-06 | Status: Implemented

Note: This document preserves the original research and design analysis that informed the context compaction implementation. For current implementation details, see Data-Flow-01-Context-Compaction.


Research Findings

1. Context Compaction Patterns (Claude Code, Codex CLI, OpenCode)

Source: Context Compaction Research Gist

| Tool | Manual Trigger | Auto Trigger | Method |
|---|---|---|---|
| Claude Code | /compact | ~95% capacity | LLM summarization |
| Codex CLI | /compact | Token threshold | Dedicated prompt |
| OpenCode | /compact | Token threshold | Prune + summarize |

Best Practices:

  • Set auto-trigger at 85-90% (not 95%) to avoid edge cases (see the sketch after this list)
  • Prune tool outputs before summarization
  • Warn users about accuracy degradation after multiple compactions
  • Allow disabling auto-compaction
  • Support custom summarization prompts
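
A minimal sketch of the first, fourth, and fifth practices above, assuming hypothetical count_history_tokens() and compact_history() helpers rather than any particular tool's API:

import logging

AUTO_COMPACT_RATIO = 0.85   # trigger below the hard limit, per the first bullet

def maybe_auto_compact(history, max_context_tokens, *, count_history_tokens,
                       compact_history, auto_compact_enabled=True, prior_compactions=0):
    """Compact the history once it crosses the auto-trigger ratio."""
    if not auto_compact_enabled:                           # allow disabling auto-compaction
        return history
    if count_history_tokens(history) < AUTO_COMPACT_RATIO * max_context_tokens:
        return history
    if prior_compactions >= 1:                             # warn about accuracy degradation
        logging.warning("History already compacted %d time(s); summaries may drift",
                        prior_compactions)
    return compact_history(history)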

Effective Compaction Prompt Should Include:

  • Completed work
  • Current state
  • In-progress tasks
  • Next steps
  • Constraints
  • Critical context

2. JetBrains Research (NeurIPS 2025)

Source: Efficient Context Management

Two Approaches:

| Approach | What It Does | Pros | Cons |
|---|---|---|---|
| LLM Summarization | Compress entire history | Infinite scaling | Expensive, loses termination signals |
| Observation Masking | Hide old observations with placeholders | Efficient, preserves reasoning | Limited compression |

Key Finding: Observation masking often outperforms summarization in efficiency and reliability.

Hybrid Approach: Combine masking with occasional summarization for best results.
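
An illustrative sketch of observation masking (not taken from the study's code): older tool observations are swapped for short placeholders while reasoning steps and recent turns stay intact.

MASK_PLACEHOLDER = "[OBSERVATION MASKED - see archival storage]"

def mask_old_observations(history, keep_recent=10):
    """Mask tool observations outside the most recent keep_recent messages."""
    cutoff = max(len(history) - keep_recent, 0)
    return [
        {**msg, "content": MASK_PLACEHOLDER}
        if i < cutoff and msg.get("role") == "tool"
        else msg
        for i, msg in enumerate(history)
    ]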


3. MemGPT Architecture

Source: MemGPT Paper (arXiv:2310.08560)

┌─────────────────────────────────────────────────────────────────┐
│                      MAIN CONTEXT (RAM)                          │
├─────────────────────────────────────────────────────────────────┤
│  System Instructions (fixed)                                    │
│  Working Context (key facts, preferences - read/write)          │
│  FIFO Message Queue (conversation history)                      │
└─────────────────────────────────────────────────────────────────┘
                              │
                    overflow triggers
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  Queue Manager: Detects overflow → Eviction + Summarization     │
└─────────────────────────────────────────────────────────────────┘
                              │
                    evicted messages
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   EXTERNAL CONTEXT (Disk)                        │
├─────────────────────────────────────────────────────────────────┤
│  Archival storage (searchable)                                  │
│  Evicted message summaries                                      │
└─────────────────────────────────────────────────────────────────┘

Key Insight: MemGPT separates "working context" (explicitly managed facts) from "message queue" (FIFO history).
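
A rough sketch of that separation, with illustrative names (not taken from the MemGPT codebase): the working context is a small read/write store of facts, while the message queue is plain FIFO history that the queue manager can evict from.

from collections import deque
from dataclasses import dataclass, field

@dataclass
class MainContext:
    system_instructions: str                              # fixed
    working_context: dict = field(default_factory=dict)   # key facts, read/write
    message_queue: deque = field(default_factory=deque)   # FIFO conversation history

    def evict_oldest(self, n):
        """Pop the n oldest messages so they can be summarized and archived."""
        n = min(n, len(self.message_queue))
        return [self.message_queue.popleft() for _ in range(n)]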


Mapping Research to Our System

| Research Concept | Our Equivalent | Status |
|---|---|---|
| Main context | LLM prompt (built in build_llm_messages) | Implemented |
| Working context | character.db.entity_profiles + journal | Implemented |
| Message queue | script.db.conversation_history | Managed (compaction) |
| External context | Mem0 semantic memory | Implemented |
| Queue manager | sleep/compaction.py | Implemented |
| Eviction + summarization | compact_conversation_history() | Implemented |
| Observation masking | (none) | Not implemented |

Implemented Solution: Two-Phase Compaction

Phase 1: Sleep-Based Compaction (Primary Path)

During run_sleep_tick(), after memory consolidation:

1. Calculate conversation_history token count
2. If tokens > SLEEP_COMPACT_THRESHOLD (e.g., 50% of max_context_tokens):
   a. Identify messages outside "preserve window" (oldest messages)
   b. Generate summary using LLM with compaction prompt
   c. Store summary as journal entry (type="context_synthesis")
   d. Replace old messages with single [CONTEXT SUMMARY] message
   e. Delete original messages from history
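
A simplified sketch of these steps; the helper names (count_history_tokens, summarize_history, add_journal_entry) are placeholders rather than the actual functions in sleep/compaction.py:

def sleep_compact(history, max_context_tokens, preserve_count, *,
                  count_history_tokens, summarize_history, add_journal_entry,
                  sleep_compact_threshold=0.5):
    """Summarize everything outside the preserve window and collapse it into one message."""
    if len(history) <= preserve_count:
        return history
    if count_history_tokens(history) <= sleep_compact_threshold * max_context_tokens:
        return history                                     # no pressure, nothing to do
    old, recent = history[:-preserve_count], history[-preserve_count:]
    summary = summarize_history(old)                       # LLM call with the compaction prompt
    add_journal_entry(summary, entry_type="context_synthesis")   # step (c)
    summary_msg = {"role": "system", "content": f"[CONTEXT SUMMARY]\n{summary}"}
    return [summary_msg] + recent                          # steps (d) and (e)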

Why sleep is ideal:

  • LLM is already "offline" - no mid-conversation disruption
  • Memory consolidation just ran - facts are safe in Mem0
  • Natural checkpoint for state compression

Phase 2: Emergency Compaction (Fallback)

At the start of each tick, before build_llm_messages():

1. Check token pressure
2. If tokens > EMERGENCY_THRESHOLD (e.g., 80% of max_context_tokens):
   a. Log warning about forced compaction
   b. Run compaction immediately (same as sleep path)
   c. Continue with tick
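
A sketch of this fallback check; compact_history stands in for the same routine used on the sleep path, and the helper names are placeholders:

import logging

EMERGENCY_THRESHOLD = 0.8   # fraction of max_context_tokens

def check_emergency_compaction(history, max_context_tokens, *,
                               count_history_tokens, compact_history):
    """Force compaction mid-conversation when the context budget is nearly exhausted."""
    used = count_history_tokens(history)
    if used > EMERGENCY_THRESHOLD * max_context_tokens:
        logging.warning("Emergency compaction: %d of %d context tokens used",
                        used, max_context_tokens)
        history = compact_history(history)   # same routine as the sleep path
    return history                           # the tick continues either way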

Why this is a fallback:

  • Context pressure should rarely reach this level if sleep compaction works
  • Forces compaction even during active conversations
  • May lose some facts that haven't been journaled yet

Design Decisions

1. Preserve Window

Keep the last N messages (or N tokens) intact to maintain conversation coherence.

  • Suggested: Last 20 messages OR 20% of max_context_tokens, whichever is larger
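
A sketch of the suggested rule, assuming a count_tokens(text) helper: the loop keeps adding recent messages until both the message-count floor and the token-fraction floor are met, which yields whichever window is larger.

def preserve_window(history, max_context_tokens, *, count_tokens,
                    min_messages=20, token_fraction=0.20):
    """Return the most recent slice of history to keep intact during compaction."""
    token_floor = token_fraction * max_context_tokens
    kept, kept_tokens = [], 0
    for msg in reversed(history):
        kept.append(msg)
        kept_tokens += count_tokens(msg["content"])
        if len(kept) >= min_messages and kept_tokens >= token_floor:
            break
    return list(reversed(kept))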

2. Compaction Prompt (Customizable)

Summarize the following conversation history. Focus on:
- Key facts learned about entities, locations, and events
- Decisions made and actions taken
- Current state and ongoing tasks
- Important context for future interactions

Do NOT include:
- Routine greetings or small talk
- Step-by-step tool execution details
- Redundant information

Format as a concise narrative that preserves essential context.

3. Storage Format

Compacted context stored as:

{
    "role": "system",
    "content": "[CONTEXT SUMMARY]\n<summary text>",
    "metadata": {
        "type": "compaction",
        "compacted_count": 45,  # messages summarized
        "compacted_at": "2025-12-06T12:00:00Z",
        "original_token_count": 15000,
        "summary_token_count": 800,
    }
}

4. Observation Masking (Optional Enhancement)

For tool outputs specifically, consider masking instead of summarizing:

{
    "role": "tool",
    "content": "[TOOL OUTPUT ARCHIVED - search history for details]",
    "original_tool": "search_memory",
    "archived_at": "2025-12-06T12:00:00Z",
}

Implementation Files

| File | Purpose |
|---|---|
| sleep/compaction.py | compact_conversation_history() function |
| sleep/__init__.py | Orchestrates compaction in run_sleep_tick() |
| assistant_script.py | Emergency compaction check, config attributes |
| llm_interaction.py | generate_context_summary() helper |
| tests/test_context_compaction.py | Compaction unit tests |
| tests/test_pre_compaction_extraction.py | Pre-compaction journaling tests |

Design Questions (Resolved)

  1. Token counting cost: Count on-demand with tiered fallback (toksum → tiktoken → heuristic); see the sketch after this list
  2. Summary model: Uses same model as main LLM for consistency
  3. Compaction history: Summaries stored in journal as synthesis entries
  4. User visibility: Compaction metadata stored in journal entries, visible via aihistory
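
A sketch of the tiered fallback from item 1; the toksum tier is only indicated by a comment since its API is not documented here, while the tiktoken tier uses that library's published calls:

def count_tokens(text):
    # Tier 1: toksum, if installed (call omitted in this sketch).
    # Tier 2: tiktoken, if installed.
    try:
        import tiktoken
        encoding = tiktoken.get_encoding("cl100k_base")
        return len(encoding.encode(text))
    except ImportError:
        pass
    # Tier 3: heuristic (~4 characters per token in typical English text).
    return max(1, len(text) // 4)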

Architecture (Reference)

Data Stores (Three Tiers)

┌─────────────────────────────────────────────────────────────────────────┐
│                         PERSISTENT STORAGE                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────────────────┐    ┌──────────────────────────────────┐   │
│  │ script.db.conversation_  │    │ character.db.journal             │   │
│  │ history                  │    │                                  │   │
│  │                          │    │  entries: [{id, content,         │   │
│  │  [{role, content}, ...]  │    │    importance, timestamp,        │   │
│  │                          │    │    source_type, tags...}]        │   │
│  │  ✅ Managed via          │    │                                  │   │
│  │  compaction at 70%/80%   │    │  consolidated_entry_ids: []      │   │
│  │  token thresholds        │    │                                  │   │
│  └──────────────────────────┘    └──────────────────────────────────┘   │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Message Flow: Write Path

User Event / Tool Result / LLM Response
              │
              ▼
┌─────────────────────────────────┐
│ script.db.conversation_history  │  ← Messages appended here
│ .append({role, content})        │
└─────────────────────────────────┘
              │
              │ Compaction triggers:
              │ - Sleep phase: 70% threshold
              │ - Emergency: 80% threshold
              ▼
         [Controlled Growth]

| File | Purpose |
|---|---|
| assistant_script.py | History init, compaction config |
| llm_interaction.py | Token-budgeted history loading, summary generation |
| tool_execution.py | History append on tool calls |
| sleep/__init__.py | Sleep tick orchestration |
| sleep/compaction.py | compact_conversation_history() |
| sleep/consolidation.py | Journal → Mem0 consolidation |
| tools/journal.py | AddJournalEntryTool |
| importance_scoring.py | Heuristic and LLM scoring |

Last updated: 2025-12-09