Context Compaction System

Implementation Plan: Conversation History Compaction

Issue #10: Implement Conversation History Trimming
Wiki: https://forge.wordpainter.net/blightbow/evennia_ai/wiki/Context-and-Memory-Flow-Analysis


Overview

Implement a two-phase context compaction system that:

  1. Compresses old conversation history during sleep (ideal path)
  2. Provides an emergency compaction fallback when approaching context limits
  3. Preserves facts via LLM-generated summaries before trimming

Based on research from Claude Code, MemGPT, and a JetBrains NeurIPS 2025 paper.
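
Both trigger paths funnel into the same function. In outline (names match the steps below):

sleep path:      run_sleep_tick()  -> usage >= 70%? -> compact_conversation_history()
emergency path:  at_tick()         -> usage >= 80%? -> compact_conversation_history(force=True)
both:            compact_conversation_history() -> generate_context_summary() -> summary + preserved tail replace old history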


Configuration Attributes

Add to assistant_script.py in at_script_creation():

# Context compaction settings
self.db.compact_sleep_threshold = 0.7    # Trigger at 70% during sleep
self.db.compact_emergency_threshold = 0.8  # Emergency trigger at 80%
self.db.compact_preserve_window = 20       # Keep last N messages intact
self.db.compact_model = None               # None = use main LLM, or override
self.db.compact_enabled = True             # Allow disabling
self.db.last_compaction = None             # Timestamp of last compaction
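
Note that compact_sleep_threshold is intentionally lower than compact_emergency_threshold, so the sleep path normally fires before the emergency fallback. Since these are Evennia Attributes, they can be tuned at runtime. A hypothetical tweak from an Evennia shell (the script key "assistant_script" and the model id are assumptions for illustration):

from evennia import search_script

script = search_script("assistant_script")[0]   # assumption: script is registered under this key
script.db.compact_sleep_threshold = 0.5         # compact earlier during sleep
script.db.compact_model = "gpt-4o-mini"         # assumption: any cheaper summarizer model id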

Implementation Steps

Step 1: Add token counting helper

File: evennia/contrib/base_systems/ai/helpers.py

def count_conversation_tokens(history: list, model: str = "gpt-4") -> int:
    """Count total tokens in conversation history.

    Counts message content only; per-message role/formatting overhead
    is not included, so treat the result as an approximation.
    """
    from .utils.token_counter import count_tokens

    total = 0
    for msg in history:
        content = msg.get("content", "")
        if content:
            total += count_tokens(content, model=model)
    return total
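
For reference, the helper on a minimal history (exact counts depend on the tokenizer, so the value is illustrative):

history = [
    {"role": "user", "content": "Where is the blacksmith?"},
    {"role": "assistant", "content": "North of the town square."},
    {"role": "system", "content": ""},  # empty content contributes nothing
]
total = count_conversation_tokens(history, model="gpt-4")
# total = sum of content tokens only; per-message role overhead is not counted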

Step 2: Add compaction summary generator

File: evennia/contrib/base_systems/ai/llm_interaction.py

# Needed imports (if not already present in llm_interaction.py):
from twisted.internet import defer
from twisted.internet.defer import inlineCallbacks

@inlineCallbacks
def generate_context_summary(script, messages_to_compact: list):
    """
    Generate a summary of conversation messages for compaction.

    Uses script.db.compact_model if configured, else the main LLM.
    Returns the summary text, or None if the call failed.
    """
    # compact_prompt is an optional per-script override attribute;
    # it falls back to the default prompt below
    prompt = script.db.compact_prompt or DEFAULT_COMPACTION_PROMPT

    # Build the compaction request
    compaction_messages = [
        {"role": "system", "content": prompt},
        {"role": "user", "content": format_messages_for_compaction(messages_to_compact)}
    ]

    # Use compact_model if set, otherwise the main LLM
    model = script.db.compact_model or script.db.llm_model

    # Call the LLM for the summary
    response = yield call_llm_simple(script, compaction_messages, model=model)
    defer.returnValue(response.content if response else None)

Default prompt:

DEFAULT_COMPACTION_PROMPT = """Summarize this conversation history. Focus on:
- Key facts learned about entities, locations, and events
- Decisions made and actions taken
- Current state and ongoing tasks
- Important context for future interactions

Exclude routine greetings, tool execution details, and redundant info.
Format as a concise narrative (max 500 words)."""
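
Step 2 references two helpers the plan does not define: call_llm_simple (assumed to be a thin single-shot wrapper around the existing LLM client) and format_messages_for_compaction. A minimal sketch of the latter, assuming a plain role-prefixed transcript is what the summarizer should see:

def format_messages_for_compaction(messages: list) -> str:
    """Flatten conversation messages into a plain transcript for the summarizer."""
    lines = []
    for msg in messages:
        role = msg.get("role", "unknown")
        content = msg.get("content", "")
        if content:
            lines.append(f"{role}: {content}")
    return "\n".join(lines)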

Step 3: Add main compaction function

File: evennia/contrib/base_systems/ai/rag_memory.py

# Needed imports (if not already present in rag_memory.py):
from django.utils import timezone
from twisted.internet import defer
from twisted.internet.defer import inlineCallbacks
from evennia.utils import logger

@inlineCallbacks
def compact_conversation_history(script, character, force=False):
    """
    Compact conversation history by summarizing old messages.

    Args:
        script: AssistantScript instance
        character: AssistantCharacter instance
        force: If True, compact even below threshold (for emergency)

    Returns:
        dict with compaction results
    """
    if not script.db.compact_enabled and not force:
        defer.returnValue({"skipped": True, "reason": "compaction disabled"})

    history = script.db.conversation_history or []
    if not history:
        defer.returnValue({"skipped": True, "reason": "no history"})

    # Calculate current token usage
    model = script.db.llm_model or "gpt-4"
    total_tokens = count_conversation_tokens(history, model)
    max_tokens = script.db.max_context_tokens or 100000
    usage_pct = total_tokens / max_tokens

    # Pick the threshold to report against; when force=True the check below
    # is bypassed entirely (the emergency threshold itself is enforced by
    # the caller in at_tick, see Step 5)
    threshold = script.db.compact_emergency_threshold if force else script.db.compact_sleep_threshold

    if usage_pct < threshold and not force:
        defer.returnValue({
            "skipped": True,
            "reason": f"below threshold ({usage_pct:.1%} < {threshold:.0%})",
            "token_usage": usage_pct
        })

    # Calculate preserve window
    preserve_count = script.db.compact_preserve_window or 20
    if len(history) <= preserve_count:
        defer.returnValue({"skipped": True, "reason": "history within preserve window"})

    # Split history: compact vs preserve
    messages_to_compact = history[:-preserve_count]
    messages_to_preserve = history[-preserve_count:]

    # Generate summary
    summary = yield generate_context_summary(script, messages_to_compact)
    if not summary:
        defer.returnValue({"success": False, "error": "summary generation failed"})

    # Create compaction marker message
    compaction_msg = {
        "role": "system",
        "content": f"[CONTEXT SUMMARY]\n{summary}",
        "metadata": {
            "type": "compaction",
            "compacted_count": len(messages_to_compact),
            "compacted_at": timezone.now().isoformat(),
            "original_tokens": total_tokens,
        }
    }

    # Store summary in journal for archival
    journal = character.db.journal or {"entries": []}
    journal["entries"].append({
        "id": f"compact_{timezone.now().strftime('%Y%m%d_%H%M%S')}",
        "timestamp": timezone.now().isoformat(),
        "content": f"[CONTEXT SYNTHESIS]\n{summary}",
        "source_type": "compaction",
        "importance": 7,  # High importance - this is compressed context
        "tags": ["compaction", "synthesis"],
    })
    character.db.journal = journal

    # Replace history with compaction marker + preserved messages
    script.db.conversation_history = [compaction_msg] + messages_to_preserve
    script.db.last_compaction = timezone.now().isoformat()

    new_tokens = count_conversation_tokens(script.db.conversation_history, model)

    logger.log_info(
        f"Context compaction: {len(messages_to_compact)} messages → summary, "
        f"{total_tokens}{new_tokens} tokens ({(1 - new_tokens/total_tokens)*100:.0f}% reduction)"
    )

    defer.returnValue({
        "success": True,
        "compacted_messages": len(messages_to_compact),
        "preserved_messages": len(messages_to_preserve),
        "original_tokens": total_tokens,
        "new_tokens": new_tokens,
        "reduction_pct": (1 - new_tokens/total_tokens) * 100,
    })
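
After a successful run, the history collapses to the marker message plus the preserved tail. Illustratively, with 80 messages and compact_preserve_window = 20:

# Before: [msg_1, msg_2, ..., msg_80]
# After:
# [
#     {"role": "system", "content": "[CONTEXT SUMMARY]\n...", "metadata": {"type": "compaction", ...}},
#     msg_61, ..., msg_80,   # last 20 messages, untouched
# ]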

Step 4: Integrate into sleep phase

File: evennia/contrib/base_systems/ai/rag_memory.py

In run_sleep_tick(), after memory consolidation and before returning:

# Context compaction (after consolidation, facts are safe in Mem0)
compaction_result = yield compact_conversation_history(script, character)
results["context_compaction"] = compaction_result

Step 5: Add emergency compaction check

File: evennia/contrib/base_systems/ai/assistant_script.py

In at_tick(), near the start (after early exits, before main logic):

# Emergency context compaction check.
# Note: `yield` assumes at_tick is an @inlineCallbacks coroutine, and
# `character` is assumed to be resolved earlier in at_tick.
if self.db.compact_enabled:
    history = self.db.conversation_history or []
    if history:
        total_tokens = count_conversation_tokens(history, self.db.llm_model or "gpt-4")
        max_tokens = self.db.max_context_tokens or 100000
        if total_tokens / max_tokens >= self.db.compact_emergency_threshold:
            logger.log_warn(
                f"Emergency compaction triggered: {total_tokens/max_tokens:.1%} context usage"
            )
            yield compact_conversation_history(self, character, force=True)

Step 6: Add tests

File: evennia/contrib/base_systems/ai/tests/test_context_compaction.py

Test classes:

  1. TestCompactConversationHistorySkips - threshold checks, disabled, empty
  2. TestCompactConversationHistoryExecution - summary generation, message replacement
  3. TestCompactConversationHistoryJournal - synthesis entry creation
  4. TestEmergencyCompaction - force=True behavior
  5. TestSleepPhaseIntegration - called during run_sleep_tick
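
A sketch of the first class, using twisted.trial so Deferred-returning tests work with @inlineCallbacks (the MagicMock stand-in for AssistantScript is illustrative; the contrib's own test base class may be preferable):

from unittest import mock

from twisted.internet.defer import inlineCallbacks
from twisted.trial import unittest

from evennia.contrib.base_systems.ai.rag_memory import compact_conversation_history


class TestCompactConversationHistorySkips(unittest.TestCase):
    """Disabled/empty-history paths should skip without calling the LLM."""

    def _make_script(self, enabled=True, history=None):
        # Attribute-style .db storage, as on a real script
        script = mock.MagicMock()
        script.db.compact_enabled = enabled
        script.db.conversation_history = history or []
        return script

    @inlineCallbacks
    def test_skips_when_disabled(self):
        result = yield compact_conversation_history(self._make_script(enabled=False), character=None)
        self.assertTrue(result["skipped"])

    @inlineCallbacks
    def test_skips_on_empty_history(self):
        result = yield compact_conversation_history(self._make_script(history=[]), character=None)
        self.assertEqual(result["reason"], "no history")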

Files Modified

File                                Changes
assistant_script.py                 Add config attrs; emergency check in at_tick()
rag_memory.py                       Add compact_conversation_history(); call it in run_sleep_tick()
llm_interaction.py                  Add generate_context_summary() and DEFAULT_COMPACTION_PROMPT
helpers.py                          Add count_conversation_tokens()
tests/test_context_compaction.py    New test file

Verification Steps

  1. Run existing tests to ensure no regressions
  2. Run new compaction tests
  3. Manual test: Set low threshold, verify compaction triggers during sleep
  4. Manual test: Set low emergency threshold, verify forced compaction
  5. Check journal for synthesis entries after compaction
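
For step 5, the journal check can be done from a shell; the entry layout matches Step 3, while fetching `character` is game-specific:

entries = character.db.journal["entries"]
synth = [e for e in entries if "compaction" in e.get("tags", [])]
print(synth[-1]["content"][:40])  # should start with "[CONTEXT SYNTHESIS]"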

Future Enhancements (Not in Scope)

  • Observation masking for tool outputs (hybrid approach)
  • Compaction history log for debugging
  • User-visible compaction notifications
  • /compact manual trigger command