
Context and Memory Flow Analysis

Issue #10: Implement Conversation History Trimming
Created: 2025-12-06 | Status: Implemented

Note: This document preserves the original research and design analysis that informed the context compaction implementation. For current implementation details, see Data-Flow-01-Context-Compaction.


Research Findings

1. Context Compaction Patterns (Claude Code, Codex CLI, OpenCode)

Source: Context Compaction Research Gist

| Tool | Manual Trigger | Auto Trigger | Method |
|---|---|---|---|
| Claude Code | /compact | ~95% capacity | LLM summarization |
| Codex CLI | /compact | Token threshold | Dedicated prompt |
| OpenCode | /compact | Token threshold | Prune + summarize |

Best Practices:

  • Set auto-trigger at 85-90% (not 95%) to avoid edge cases (see the sketch after this list)
  • Prune tool outputs before summarization
  • Warn users about accuracy degradation after multiple compactions
  • Allow disabling auto-compaction
  • Support custom summarization prompts
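
A minimal sketch of the first, fourth, and fifth practices above, assuming hypothetical count_history_tokens() and compact_history() helpers rather than any particular tool's API:

import logging

AUTO_COMPACT_RATIO = 0.85   # trigger below the hard limit, per the first bullet

def maybe_auto_compact(history, max_context_tokens, *, count_history_tokens,
                       compact_history, auto_compact_enabled=True, prior_compactions=0):
    """Compact the history once it crosses the auto-trigger ratio."""
    if not auto_compact_enabled:                           # allow disabling auto-compaction
        return history
    if count_history_tokens(history) < AUTO_COMPACT_RATIO * max_context_tokens:
        return history
    if prior_compactions >= 1:                             # warn about accuracy degradation
        logging.warning("History already compacted %d time(s); summaries may drift",
                        prior_compactions)
    return compact_history(history)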

Effective Compaction Prompt Should Include:

  • Completed work
  • Current state
  • In-progress tasks
  • Next steps
  • Constraints
  • Critical context

2. JetBrains Research (NeurIPS 2025)

Source: Efficient Context Management

Two Approaches:

| Approach | What It Does | Pros | Cons |
|---|---|---|---|
| LLM Summarization | Compress entire history | Infinite scaling | Expensive, loses termination signals |
| Observation Masking | Hide old observations with placeholders | Efficient, preserves reasoning | Limited compression |

Key Finding: Observation masking often outperforms summarization in efficiency and reliability.

Hybrid Approach: Combine masking with occasional summarization for best results.
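
An illustrative sketch of observation masking (not taken from the study's code): older tool observations are swapped for short placeholders while reasoning steps and recent turns stay intact.

MASK_PLACEHOLDER = "[OBSERVATION MASKED - see archival storage]"

def mask_old_observations(history, keep_recent=10):
    """Mask tool observations outside the most recent keep_recent messages."""
    cutoff = max(len(history) - keep_recent, 0)
    return [
        {**msg, "content": MASK_PLACEHOLDER}
        if i < cutoff and msg.get("role") == "tool"
        else msg
        for i, msg in enumerate(history)
    ]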


3. MemGPT Architecture

Source: MemGPT Paper (arXiv:2310.08560)

┌─────────────────────────────────────────────────────────────────┐
│                      MAIN CONTEXT (RAM)                          │
├─────────────────────────────────────────────────────────────────┤
│  System Instructions (fixed)                                    │
│  Working Context (key facts, preferences - read/write)          │
│  FIFO Message Queue (conversation history)                      │
└─────────────────────────────────────────────────────────────────┘
                              │
                    overflow triggers
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  Queue Manager: Detects overflow → Eviction + Summarization     │
└─────────────────────────────────────────────────────────────────┘
                              │
                    evicted messages
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   EXTERNAL CONTEXT (Disk)                        │
├─────────────────────────────────────────────────────────────────┤
│  Archival storage (searchable)                                  │
│  Evicted message summaries                                      │
└─────────────────────────────────────────────────────────────────┘

Key Insight: MemGPT separates "working context" (explicitly managed facts) from "message queue" (FIFO history).
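
A rough sketch of that separation, with illustrative names (not taken from the MemGPT codebase): the working context is a small read/write store of facts, while the message queue is plain FIFO history that the queue manager can evict from.

from collections import deque
from dataclasses import dataclass, field

@dataclass
class MainContext:
    system_instructions: str                              # fixed
    working_context: dict = field(default_factory=dict)   # key facts, read/write
    message_queue: deque = field(default_factory=deque)   # FIFO conversation history

    def evict_oldest(self, n):
        """Pop the n oldest messages so they can be summarized and archived."""
        n = min(n, len(self.message_queue))
        return [self.message_queue.popleft() for _ in range(n)]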


Mapping Research to Our System

| Research Concept | Our Equivalent | Status |
|---|---|---|
| Main context | LLM prompt (built in build_llm_messages) | Implemented |
| Working context | character.db.entity_profiles + journal | Implemented |
| Message queue | script.db.conversation_history | Managed (compaction) |
| External context | Mem0 semantic memory | Implemented |
| Queue manager | sleep/compaction.py | Implemented |
| Eviction + summarization | compact_conversation_history() | Implemented |
| Observation masking | (none) | Not implemented |

Implemented Solution: Two-Phase Compaction

Phase 1: Sleep-Based Compaction (Primary Path)

During run_sleep_tick(), after memory consolidation:

1. Calculate conversation_history token count
2. If tokens > SLEEP_COMPACT_THRESHOLD (e.g., 50% of max_context_tokens):
   a. Identify messages outside "preserve window" (oldest messages)
   b. Generate summary using LLM with compaction prompt
   c. Store summary as journal entry (type="context_synthesis")
   d. Replace old messages with single [CONTEXT SUMMARY] message
   e. Delete original messages from history
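
A simplified sketch of these steps; the helper names (count_history_tokens, summarize_history, add_journal_entry) are placeholders rather than the actual functions in sleep/compaction.py:

def sleep_compact(history, max_context_tokens, preserve_count, *,
                  count_history_tokens, summarize_history, add_journal_entry,
                  sleep_compact_threshold=0.5):
    """Summarize everything outside the preserve window and collapse it into one message."""
    if len(history) <= preserve_count:
        return history
    if count_history_tokens(history) <= sleep_compact_threshold * max_context_tokens:
        return history                                     # no pressure, nothing to do
    old, recent = history[:-preserve_count], history[-preserve_count:]
    summary = summarize_history(old)                       # LLM call with the compaction prompt
    add_journal_entry(summary, entry_type="context_synthesis")   # step (c)
    summary_msg = {"role": "system", "content": f"[CONTEXT SUMMARY]\n{summary}"}
    return [summary_msg] + recent                          # steps (d) and (e)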

Why sleep is ideal:

  • LLM is already "offline" - no mid-conversation disruption
  • Memory consolidation just ran - facts are safe in Mem0
  • Natural checkpoint for state compression

Phase 2: Emergency Compaction (Fallback)

At the start of each tick, before build_llm_messages():

1. Check token pressure
2. If tokens > EMERGENCY_THRESHOLD (e.g., 80% of max_context_tokens):
   a. Log warning about forced compaction
   b. Run compaction immediately (same as sleep path)
   c. Continue with tick
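
A sketch of this fallback check; compact_history stands in for the same routine used on the sleep path, and the helper names are placeholders:

import logging

EMERGENCY_THRESHOLD = 0.8   # fraction of max_context_tokens

def check_emergency_compaction(history, max_context_tokens, *,
                               count_history_tokens, compact_history):
    """Force compaction mid-conversation when the context budget is nearly exhausted."""
    used = count_history_tokens(history)
    if used > EMERGENCY_THRESHOLD * max_context_tokens:
        logging.warning("Emergency compaction: %d of %d context tokens used",
                        used, max_context_tokens)
        history = compact_history(history)   # same routine as the sleep path
    return history                           # the tick continues either way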

Why this is a fallback:

  • Context pressure should rarely reach this level if sleep compaction works
  • Forces compaction even during active conversations
  • May lose some facts that haven't been journaled yet

Design Decisions

1. Preserve Window

Keep the last N messages (or N tokens) intact to maintain conversation coherence.

  • Suggested: Last 20 messages OR 20% of max_context_tokens, whichever is larger
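
A sketch of the suggested rule, assuming a count_tokens(text) helper: the loop keeps adding recent messages until both the message-count floor and the token-fraction floor are met, which yields whichever window is larger.

def preserve_window(history, max_context_tokens, *, count_tokens,
                    min_messages=20, token_fraction=0.20):
    """Return the most recent slice of history to keep intact during compaction."""
    token_floor = token_fraction * max_context_tokens
    kept, kept_tokens = [], 0
    for msg in reversed(history):
        kept.append(msg)
        kept_tokens += count_tokens(msg["content"])
        if len(kept) >= min_messages and kept_tokens >= token_floor:
            break
    return list(reversed(kept))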

2. Compaction Prompt (Customizable)

Summarize the following conversation history. Focus on:
- Key facts learned about entities, locations, and events
- Decisions made and actions taken
- Current state and ongoing tasks
- Important context for future interactions

Do NOT include:
- Routine greetings or small talk
- Step-by-step tool execution details
- Redundant information

Format as a concise narrative that preserves essential context.

3. Storage Format

Compacted context stored as:

{
    "role": "system",
    "content": "[CONTEXT SUMMARY]\n<summary text>",
    "metadata": {
        "type": "compaction",
        "compacted_count": 45,  # messages summarized
        "compacted_at": "2025-12-06T12:00:00Z",
        "original_token_count": 15000,
        "summary_token_count": 800,
    }
}

4. Observation Masking (Optional Enhancement)

For tool outputs specifically, consider masking instead of summarizing:

{
    "role": "tool",
    "content": "[TOOL OUTPUT ARCHIVED - search history for details]",
    "original_tool": "search_memory",
    "archived_at": "2025-12-06T12:00:00Z",
}

Implementation Files

| File | Purpose |
|---|---|
| sleep/compaction.py | compact_conversation_history() function |
| sleep/__init__.py | Orchestrates compaction in run_sleep_tick() |
| assistant_script.py | Emergency compaction check, config attributes |
| llm_interaction.py | generate_context_summary() helper |
| tests/test_context_compaction.py | Compaction unit tests |
| tests/test_pre_compaction_extraction.py | Pre-compaction journaling tests |

Design Questions (Resolved)

  1. Token counting cost: Count on-demand with tiered fallback (toksum → tiktoken → heuristic); see the sketch after this list
  2. Summary model: Uses same model as main LLM for consistency
  3. Compaction history: Summaries stored in journal as synthesis entries
  4. User visibility: Compaction metadata stored in journal entries, visible via aihistory
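
A sketch of the tiered fallback from item 1; the toksum tier is only indicated by a comment since its API is not documented here, while the tiktoken tier uses that library's published calls:

def count_tokens(text):
    # Tier 1: toksum, if installed (call omitted in this sketch).
    # Tier 2: tiktoken, if installed.
    try:
        import tiktoken
        encoding = tiktoken.get_encoding("cl100k_base")
        return len(encoding.encode(text))
    except ImportError:
        pass
    # Tier 3: heuristic (~4 characters per token in typical English text).
    return max(1, len(text) // 4)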

Architecture (Reference)

Data Stores (Three Tiers)

┌─────────────────────────────────────────────────────────────────────────┐
│                         PERSISTENT STORAGE                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────────────────┐    ┌──────────────────────────────────┐   │
│  │ script.db.conversation_  │    │ character.db.journal             │   │
│  │ history                  │    │                                  │   │
│  │                          │    │  entries: [{id, content,         │   │
│  │  [{role, content}, ...]  │    │    importance, timestamp,        │   │
│  │                          │    │    source_type, tags...}]        │   │
│  │  ✅ Managed via          │    │                                  │   │
│  │  compaction at 70%/80%   │    │  consolidated_entry_ids: []      │   │
│  │  token thresholds        │    │                                  │   │
│  └──────────────────────────┘    └──────────────────────────────────┘   │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Message Flow: Write Path

User Event / Tool Result / LLM Response
              │
              ▼
┌─────────────────────────────────┐
│ script.db.conversation_history  │  ← Messages appended here
│ .append({role, content})        │
└─────────────────────────────────┘
              │
              │ Compaction triggers:
              │ - Sleep phase: 70% threshold
              │ - Emergency: 80% threshold
              ▼
         [Controlled Growth]

| File | Purpose |
|---|---|
| assistant_script.py | History init, compaction config |
| llm_interaction.py | Token-budgeted history loading, summary generation |
| tool_execution.py | History append on tool calls |
| sleep/__init__.py | Sleep tick orchestration |
| sleep/compaction.py | compact_conversation_history() |
| sleep/consolidation.py | Journal → Mem0 consolidation |
| tools/journal.py | AddJournalEntryTool |
| importance_scoring.py | Heuristic and LLM scoring |

Last updated: 2025-12-09