# Context Compaction System

**Implementation Plan:** Conversation History Compaction
**Issue:** #10 - Implement Conversation History Trimming
**Wiki:** https://forge.wordpainter.net/blightbow/evennia_ai/wiki/Context-and-Memory-Flow-Analysis
## Overview

Implement a two-phase context compaction system that:

- Compresses old conversation history during sleep (the ideal path)
- Provides an emergency compaction fallback when approaching context limits
- Preserves facts via LLM-generated summaries before trimming

Based on context-compaction research from Claude Code, MemGPT, and JetBrains (NeurIPS 2025).
## Configuration Attributes

Add to `assistant_script.py` in `at_script_creation()`:

```python
# Context compaction settings
self.db.compact_sleep_threshold = 0.7      # Trigger at 70% context usage during sleep
self.db.compact_emergency_threshold = 0.8  # Emergency trigger at 80%
self.db.compact_preserve_window = 20       # Keep the last N messages intact
self.db.compact_model = None               # None = use the main LLM; set a model name to override
self.db.compact_prompt = None              # Custom summary prompt; None = DEFAULT_COMPACTION_PROMPT
self.db.compact_enabled = True             # Allow disabling compaction entirely
self.db.last_compaction = None             # Timestamp of the last compaction
```
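With these defaults and the `max_context_tokens` fallback of 100000 used in Step 3, the sleep-phase pass fires once the history exceeds roughly 70,000 tokens and the emergency pass at 80,000, always keeping the 20 most recent messages verbatim.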
## Implementation Steps

### Step 1: Add a token counting helper

File: `evennia/contrib/base_systems/ai/helpers.py`

```python
def count_conversation_tokens(history: list, model: str = "gpt-4") -> int:
    """Count the total tokens in a conversation history."""
    from .utils.token_counter import count_tokens

    total = 0
    for msg in history:
        content = msg.get("content", "")
        if content:
            total += count_tokens(content, model=model)
    return total
```
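The helper delegates to `count_tokens` in `utils/token_counter.py`. If that utility does not exist yet, a minimal sketch using the third-party `tiktoken` tokenizer could look like the following (the module path and the fallback heuristic are assumptions, not part of the plan):

```python
# utils/token_counter.py - minimal sketch, assuming tiktoken is installed.
import tiktoken


def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Return the number of tokens in `text` for the given model."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model name: fall back to a rough 4-chars-per-token estimate
        return max(1, len(text) // 4)
    return len(encoding.encode(text))
```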
### Step 2: Add a compaction summary generator

File: `evennia/contrib/base_systems/ai/llm_interaction.py`

```python
from twisted.internet import defer
from twisted.internet.defer import inlineCallbacks  # if not already imported


@inlineCallbacks
def generate_context_summary(script, messages_to_compact: list):
    """
    Generate a summary of conversation messages for compaction.

    Uses `script.db.compact_model` if configured, else the main LLM.
    Returns the summary text, or None on failure.
    """
    prompt = script.db.compact_prompt or DEFAULT_COMPACTION_PROMPT

    # Build the compaction request
    compaction_messages = [
        {"role": "system", "content": prompt},
        {"role": "user", "content": format_messages_for_compaction(messages_to_compact)},
    ]

    # Use compact_model if set, otherwise the main LLM
    model = script.db.compact_model or script.db.llm_model

    # Call the LLM for a summary
    response = yield call_llm_simple(script, compaction_messages, model=model)
    defer.returnValue(response.content if response else None)
```
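`format_messages_for_compaction` and `call_llm_simple` are referenced above but not defined in this plan; `call_llm_simple` is assumed to be a thin single-request wrapper around the contrib's existing LLM call path. A minimal sketch of the formatter (name and output format are assumptions):

```python
def format_messages_for_compaction(messages: list) -> str:
    """Flatten conversation messages into a plain transcript for the summarizer."""
    lines = []
    for msg in messages:
        role = msg.get("role", "unknown")
        content = msg.get("content", "")
        if content:
            lines.append(f"{role}: {content}")
    return "\n".join(lines)
```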
The default prompt:

```python
DEFAULT_COMPACTION_PROMPT = """Summarize this conversation history. Focus on:
- Key facts learned about entities, locations, and events
- Decisions made and actions taken
- Current state and ongoing tasks
- Important context for future interactions

Exclude routine greetings, tool execution details, and redundant info.
Format as a concise narrative (max 500 words)."""
```
### Step 3: Add the main compaction function

File: `evennia/contrib/base_systems/ai/rag_memory.py`

```python
from django.utils import timezone  # if not already imported in this module

from evennia.utils import logger


@inlineCallbacks
def compact_conversation_history(script, character, force=False):
    """
    Compact conversation history by summarizing old messages.

    Args:
        script: AssistantScript instance
        character: AssistantCharacter instance
        force: If True, compact even below threshold (for emergency)

    Returns:
        dict with compaction results
    """
    if not script.db.compact_enabled and not force:
        defer.returnValue({"skipped": True, "reason": "compaction disabled"})

    history = script.db.conversation_history or []
    if not history:
        defer.returnValue({"skipped": True, "reason": "no history"})

    # Calculate current token usage
    model = script.db.llm_model or "gpt-4"
    total_tokens = count_conversation_tokens(history, model)
    max_tokens = script.db.max_context_tokens or 100000
    usage_pct = total_tokens / max_tokens

    # Pick the threshold for this context (sleep vs. emergency)
    threshold = (
        script.db.compact_emergency_threshold
        if force
        else script.db.compact_sleep_threshold
    )
    if usage_pct < threshold and not force:
        defer.returnValue({
            "skipped": True,
            "reason": f"below threshold ({usage_pct:.1%} < {threshold:.0%})",
            "token_usage": usage_pct,
        })

    # Keep the most recent messages intact
    preserve_count = script.db.compact_preserve_window or 20
    if len(history) <= preserve_count:
        defer.returnValue({"skipped": True, "reason": "history within preserve window"})

    # Split history: compact vs. preserve
    messages_to_compact = history[:-preserve_count]
    messages_to_preserve = history[-preserve_count:]

    # Generate the summary
    summary = yield generate_context_summary(script, messages_to_compact)
    if not summary:
        defer.returnValue({"success": False, "error": "summary generation failed"})

    # Create the compaction marker message
    compaction_msg = {
        "role": "system",
        "content": f"[CONTEXT SUMMARY]\n{summary}",
        "metadata": {
            "type": "compaction",
            "compacted_count": len(messages_to_compact),
            "compacted_at": timezone.now().isoformat(),
            "original_tokens": total_tokens,
        },
    }

    # Store the summary in the journal for archival
    journal = character.db.journal or {"entries": []}
    journal["entries"].append({
        "id": f"compact_{timezone.now().strftime('%Y%m%d_%H%M%S')}",
        "timestamp": timezone.now().isoformat(),
        "content": f"[CONTEXT SYNTHESIS]\n{summary}",
        "source_type": "compaction",
        "importance": 7,  # High importance - this is compressed context
        "tags": ["compaction", "synthesis"],
    })
    character.db.journal = journal

    # Replace history with the compaction marker + preserved messages
    script.db.conversation_history = [compaction_msg] + messages_to_preserve
    script.db.last_compaction = timezone.now().isoformat()

    new_tokens = count_conversation_tokens(script.db.conversation_history, model)
    logger.log_info(
        f"Context compaction: {len(messages_to_compact)} messages → summary, "
        f"{total_tokens} → {new_tokens} tokens "
        f"({(1 - new_tokens / total_tokens) * 100:.0f}% reduction)"
    )

    defer.returnValue({
        "success": True,
        "compacted_messages": len(messages_to_compact),
        "preserved_messages": len(messages_to_preserve),
        "original_tokens": total_tokens,
        "new_tokens": new_tokens,
        "reduction_pct": (1 - new_tokens / total_tokens) * 100,
    })
```
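For reference, a caller would invoke this from another `inlineCallbacks` context and branch on the result dict; a purely illustrative sketch:

```python
@inlineCallbacks
def example_compaction_caller(script, character):
    # Illustrative only - runs one compaction pass and logs the outcome.
    result = yield compact_conversation_history(script, character)
    if result.get("skipped"):
        logger.log_info(f"Compaction skipped: {result['reason']}")
    elif result.get("success"):
        logger.log_info(
            f"Compacted {result['compacted_messages']} messages "
            f"({result['reduction_pct']:.0f}% token reduction)"
        )
```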
### Step 4: Integrate into the sleep phase

File: `evennia/contrib/base_systems/ai/rag_memory.py`

In `run_sleep_tick()`, after memory consolidation and before returning:

```python
# Context compaction (after consolidation, facts are safe in Mem0)
compaction_result = yield compact_conversation_history(script, character)
results["context_compaction"] = compaction_result
```
### Step 5: Add an emergency compaction check

File: `evennia/contrib/base_systems/ai/assistant_script.py`

In `at_tick()`, near the start (after the early exits, before the main logic):

```python
# Emergency context compaction check
if self.db.compact_enabled:
    history = self.db.conversation_history or []
    if history:
        total_tokens = count_conversation_tokens(history, self.db.llm_model or "gpt-4")
        max_tokens = self.db.max_context_tokens or 100000
        if total_tokens / max_tokens >= self.db.compact_emergency_threshold:
            logger.log_warn(
                f"Emergency compaction triggered: {total_tokens / max_tokens:.1%} context usage"
            )
            yield compact_conversation_history(self, character, force=True)
```
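Note that the `yield` only works if `at_tick()` is itself an `inlineCallbacks` generator, and `character` must be resolved earlier in the method. A sketch of the assumed surrounding structure (the decorator and the character lookup are assumptions about the existing code):

```python
@inlineCallbacks
def at_tick(self):
    character = self.obj  # assumption: the script is attached to its character
    # ... early exits ...
    # <emergency compaction check from above goes here>
    # ... main tick logic ...
```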
### Step 6: Add tests

File: `evennia/contrib/base_systems/ai/tests/test_context_compaction.py`

Test classes (a sketch of the first one follows the list):

- `TestCompactConversationHistorySkips` - threshold checks, disabled, empty history
- `TestCompactConversationHistoryExecution` - summary generation, message replacement
- `TestCompactConversationHistoryJournal` - synthesis entry creation
- `TestEmergencyCompaction` - `force=True` behavior
- `TestSleepPhaseIntegration` - called during `run_sleep_tick()`
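A sketch of the first test class, using `twisted.trial` so the runner can wait on the `inlineCallbacks` deferred (the contrib's existing tests may use a different base class):

```python
from unittest.mock import MagicMock

from twisted.internet import defer
from twisted.trial import unittest

from evennia.contrib.base_systems.ai.rag_memory import compact_conversation_history


class TestCompactConversationHistorySkips(unittest.TestCase):
    @defer.inlineCallbacks
    def test_skips_when_disabled(self):
        # A mocked script with compaction disabled should short-circuit
        script, character = MagicMock(), MagicMock()
        script.db.compact_enabled = False
        result = yield compact_conversation_history(script, character)
        self.assertTrue(result["skipped"])
        self.assertEqual(result["reason"], "compaction disabled")
```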
## Files Modified

| File | Changes |
|---|---|
| `assistant_script.py` | Add config attrs, emergency check in `at_tick()` |
| `rag_memory.py` | Add `compact_conversation_history()`, call in `run_sleep_tick()` |
| `llm_interaction.py` | Add `generate_context_summary()`, `DEFAULT_COMPACTION_PROMPT` |
| `helpers.py` | Add `count_conversation_tokens()` |
| `tests/test_context_compaction.py` | New test file |
## Verification Steps

- Run the existing tests to ensure no regressions
- Run the new compaction tests
- Manual test: set a low sleep threshold, verify compaction triggers during sleep (see the example below)
- Manual test: set a low emergency threshold, verify forced compaction
- Check the journal for synthesis entries after compaction
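For the manual tests, the thresholds can be lowered in-game with Evennia's `py` command; the character name and the script lookup below are assumptions about the local setup:

```
py from evennia import search_object; search_object("Aide")[0].scripts.all()[0].db.compact_sleep_threshold = 0.05
```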
## Future Enhancements (Not in Scope)

- Observation masking for tool outputs (hybrid approach)
- Compaction history log for debugging
- User-visible compaction notifications
- `/compact` manual trigger command