# Context Compaction System

**Implementation Plan:** Conversation History Compaction
**Issue:** #10 - Implement Conversation History Trimming
**Wiki:** https://forge.wordpainter.net/blightbow/evennia_ai/wiki/Context-and-Memory-Flow-Analysis
## Overview

Implement a two-phase context compaction system that:

- Compresses old conversation history during sleep (the ideal path)
- Provides an emergency compaction fallback when approaching context limits
- Preserves facts via LLM-generated summaries before trimming

Based on context-compaction research from Claude Code, MemGPT, and JetBrains (NeurIPS 2025).
## Configuration Attributes

Add to `assistant_script.py` in `at_script_creation()`:

```python
# Context compaction settings
self.db.compact_sleep_threshold = 0.7      # Trigger at 70% context usage during sleep
self.db.compact_emergency_threshold = 0.8  # Emergency trigger at 80%
self.db.compact_preserve_window = 20       # Keep the last N messages intact
self.db.compact_model = None               # None = use the main LLM; set a model name to override
self.db.compact_prompt = None              # Custom summary prompt; None = DEFAULT_COMPACTION_PROMPT
self.db.compact_enabled = True             # Allow disabling compaction entirely
self.db.last_compaction = None             # Timestamp of the last compaction
```
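With these defaults and the `max_context_tokens` fallback of 100000 used in Step 3, the sleep-phase pass fires once the history exceeds roughly 70,000 tokens and the emergency pass at 80,000, always keeping the 20 most recent messages verbatim.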
## Implementation Steps

### Step 1: Add a token counting helper

File: `evennia/contrib/base_systems/ai/helpers.py`

```python
def count_conversation_tokens(history: list, model: str = "gpt-4") -> int:
    """Count the total tokens in a conversation history."""
    from .utils.token_counter import count_tokens

    total = 0
    for msg in history:
        content = msg.get("content", "")
        if content:
            total += count_tokens(content, model=model)
    return total
```
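The helper delegates to `count_tokens` in `utils/token_counter.py`. If that utility does not exist yet, a minimal sketch using the third-party `tiktoken` tokenizer could look like the following (the module path and the fallback heuristic are assumptions, not part of the plan):

```python
# utils/token_counter.py - minimal sketch, assuming tiktoken is installed.
import tiktoken


def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Return the number of tokens in `text` for the given model."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model name: fall back to a rough 4-chars-per-token estimate
        return max(1, len(text) // 4)
    return len(encoding.encode(text))
```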
### Step 2: Add a compaction summary generator

File: `evennia/contrib/base_systems/ai/llm_interaction.py`

```python
from twisted.internet import defer
from twisted.internet.defer import inlineCallbacks  # if not already imported


@inlineCallbacks
def generate_context_summary(script, messages_to_compact: list):
    """
    Generate a summary of conversation messages for compaction.

    Uses `script.db.compact_model` if configured, else the main LLM.
    Returns the summary text, or None on failure.
    """
    prompt = script.db.compact_prompt or DEFAULT_COMPACTION_PROMPT

    # Build the compaction request
    compaction_messages = [
        {"role": "system", "content": prompt},
        {"role": "user", "content": format_messages_for_compaction(messages_to_compact)},
    ]

    # Use compact_model if set, otherwise the main LLM
    model = script.db.compact_model or script.db.llm_model

    # Call the LLM for a summary
    response = yield call_llm_simple(script, compaction_messages, model=model)
    defer.returnValue(response.content if response else None)
```
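`format_messages_for_compaction` and `call_llm_simple` are referenced above but not defined in this plan; `call_llm_simple` is assumed to be a thin single-request wrapper around the contrib's existing LLM call path. A minimal sketch of the formatter (name and output format are assumptions):

```python
def format_messages_for_compaction(messages: list) -> str:
    """Flatten conversation messages into a plain transcript for the summarizer."""
    lines = []
    for msg in messages:
        role = msg.get("role", "unknown")
        content = msg.get("content", "")
        if content:
            lines.append(f"{role}: {content}")
    return "\n".join(lines)
```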
The default prompt:

```python
DEFAULT_COMPACTION_PROMPT = """Summarize this conversation history. Focus on:
- Key facts learned about entities, locations, and events
- Decisions made and actions taken
- Current state and ongoing tasks
- Important context for future interactions

Exclude routine greetings, tool execution details, and redundant info.
Format as a concise narrative (max 500 words)."""
```
### Step 3: Add the main compaction function

File: `evennia/contrib/base_systems/ai/rag_memory.py`

```python
from django.utils import timezone  # if not already imported in this module

from evennia.utils import logger


@inlineCallbacks
def compact_conversation_history(script, character, force=False):
    """
    Compact conversation history by summarizing old messages.

    Args:
        script: AssistantScript instance
        character: AssistantCharacter instance
        force: If True, compact even below threshold (for emergency)

    Returns:
        dict with compaction results
    """
    if not script.db.compact_enabled and not force:
        defer.returnValue({"skipped": True, "reason": "compaction disabled"})

    history = script.db.conversation_history or []
    if not history:
        defer.returnValue({"skipped": True, "reason": "no history"})

    # Calculate current token usage
    model = script.db.llm_model or "gpt-4"
    total_tokens = count_conversation_tokens(history, model)
    max_tokens = script.db.max_context_tokens or 100000
    usage_pct = total_tokens / max_tokens

    # Pick the threshold for this context (sleep vs. emergency)
    threshold = (
        script.db.compact_emergency_threshold
        if force
        else script.db.compact_sleep_threshold
    )
    if usage_pct < threshold and not force:
        defer.returnValue({
            "skipped": True,
            "reason": f"below threshold ({usage_pct:.1%} < {threshold:.0%})",
            "token_usage": usage_pct,
        })

    # Keep the most recent messages intact
    preserve_count = script.db.compact_preserve_window or 20
    if len(history) <= preserve_count:
        defer.returnValue({"skipped": True, "reason": "history within preserve window"})

    # Split history: compact vs. preserve
    messages_to_compact = history[:-preserve_count]
    messages_to_preserve = history[-preserve_count:]

    # Generate the summary
    summary = yield generate_context_summary(script, messages_to_compact)
    if not summary:
        defer.returnValue({"success": False, "error": "summary generation failed"})

    # Create the compaction marker message
    compaction_msg = {
        "role": "system",
        "content": f"[CONTEXT SUMMARY]\n{summary}",
        "metadata": {
            "type": "compaction",
            "compacted_count": len(messages_to_compact),
            "compacted_at": timezone.now().isoformat(),
            "original_tokens": total_tokens,
        },
    }

    # Store the summary in the journal for archival
    journal = character.db.journal or {"entries": []}
    journal["entries"].append({
        "id": f"compact_{timezone.now().strftime('%Y%m%d_%H%M%S')}",
        "timestamp": timezone.now().isoformat(),
        "content": f"[CONTEXT SYNTHESIS]\n{summary}",
        "source_type": "compaction",
        "importance": 7,  # High importance - this is compressed context
        "tags": ["compaction", "synthesis"],
    })
    character.db.journal = journal

    # Replace history with the compaction marker + preserved messages
    script.db.conversation_history = [compaction_msg] + messages_to_preserve
    script.db.last_compaction = timezone.now().isoformat()

    new_tokens = count_conversation_tokens(script.db.conversation_history, model)
    logger.log_info(
        f"Context compaction: {len(messages_to_compact)} messages → summary, "
        f"{total_tokens} → {new_tokens} tokens "
        f"({(1 - new_tokens / total_tokens) * 100:.0f}% reduction)"
    )

    defer.returnValue({
        "success": True,
        "compacted_messages": len(messages_to_compact),
        "preserved_messages": len(messages_to_preserve),
        "original_tokens": total_tokens,
        "new_tokens": new_tokens,
        "reduction_pct": (1 - new_tokens / total_tokens) * 100,
    })
```
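For reference, a caller would invoke this from another `inlineCallbacks` context and branch on the result dict; a purely illustrative sketch:

```python
@inlineCallbacks
def example_compaction_caller(script, character):
    # Illustrative only - runs one compaction pass and logs the outcome.
    result = yield compact_conversation_history(script, character)
    if result.get("skipped"):
        logger.log_info(f"Compaction skipped: {result['reason']}")
    elif result.get("success"):
        logger.log_info(
            f"Compacted {result['compacted_messages']} messages "
            f"({result['reduction_pct']:.0f}% token reduction)"
        )
```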
### Step 4: Integrate into the sleep phase

File: `evennia/contrib/base_systems/ai/rag_memory.py`

In `run_sleep_tick()`, after memory consolidation and before returning:

```python
# Context compaction (after consolidation, facts are safe in Mem0)
compaction_result = yield compact_conversation_history(script, character)
results["context_compaction"] = compaction_result
```
### Step 5: Add an emergency compaction check

File: `evennia/contrib/base_systems/ai/assistant_script.py`

In `at_tick()`, near the start (after the early exits, before the main logic):

```python
# Emergency context compaction check
if self.db.compact_enabled:
    history = self.db.conversation_history or []
    if history:
        total_tokens = count_conversation_tokens(history, self.db.llm_model or "gpt-4")
        max_tokens = self.db.max_context_tokens or 100000
        if total_tokens / max_tokens >= self.db.compact_emergency_threshold:
            logger.log_warn(
                f"Emergency compaction triggered: {total_tokens / max_tokens:.1%} context usage"
            )
            yield compact_conversation_history(self, character, force=True)
```
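Note that the `yield` only works if `at_tick()` is itself an `inlineCallbacks` generator, and `character` must be resolved earlier in the method. A sketch of the assumed surrounding structure (the decorator and the character lookup are assumptions about the existing code):

```python
@inlineCallbacks
def at_tick(self):
    character = self.obj  # assumption: the script is attached to its character
    # ... early exits ...
    # <emergency compaction check from above goes here>
    # ... main tick logic ...
```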
### Step 6: Add tests

File: `evennia/contrib/base_systems/ai/tests/test_context_compaction.py`

Test classes (a sketch of the first one follows the list):

- `TestCompactConversationHistorySkips` - threshold checks, disabled, empty history
- `TestCompactConversationHistoryExecution` - summary generation, message replacement
- `TestCompactConversationHistoryJournal` - synthesis entry creation
- `TestEmergencyCompaction` - `force=True` behavior
- `TestSleepPhaseIntegration` - called during `run_sleep_tick()`
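A sketch of the first test class, using `twisted.trial` so the runner can wait on the `inlineCallbacks` deferred (the contrib's existing tests may use a different base class):

```python
from unittest.mock import MagicMock

from twisted.internet import defer
from twisted.trial import unittest

from evennia.contrib.base_systems.ai.rag_memory import compact_conversation_history


class TestCompactConversationHistorySkips(unittest.TestCase):
    @defer.inlineCallbacks
    def test_skips_when_disabled(self):
        # A mocked script with compaction disabled should short-circuit
        script, character = MagicMock(), MagicMock()
        script.db.compact_enabled = False
        result = yield compact_conversation_history(script, character)
        self.assertTrue(result["skipped"])
        self.assertEqual(result["reason"], "compaction disabled")
```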
## Files Modified

| File | Changes |
|---|---|
| `assistant_script.py` | Add config attrs, emergency check in `at_tick()` |
| `rag_memory.py` | Add `compact_conversation_history()`, call in `run_sleep_tick()` |
| `llm_interaction.py` | Add `generate_context_summary()`, `DEFAULT_COMPACTION_PROMPT` |
| `helpers.py` | Add `count_conversation_tokens()` |
| `tests/test_context_compaction.py` | New test file |
## Verification Steps

- Run the existing tests to ensure no regressions
- Run the new compaction tests
- Manual test: set a low sleep threshold, verify compaction triggers during sleep (see the example below)
- Manual test: set a low emergency threshold, verify forced compaction
- Check the journal for synthesis entries after compaction
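For the manual tests, the thresholds can be lowered in-game with Evennia's `py` command; the character name and the script lookup below are assumptions about the local setup:

```
py from evennia import search_object; search_object("Aide")[0].scripts.all()[0].db.compact_sleep_threshold = 0.05
```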
## Future Enhancements (Not in Scope)

- Observation masking for tool outputs (hybrid approach)
- Compaction history log for debugging
- User-visible compaction notifications
- `/compact` manual trigger command