2 Data Flow 01 Context Compaction
blightbow edited this page 2025-12-09 02:39:18 +00:00

Data Flow 01: Context Compaction

Engineering documentation series - Data flows in the AI Assistant system


Overview

This document describes the data flows involved in context compaction, including:

  • Sleep-triggered compaction (70% threshold)
  • Emergency compaction (80% threshold)
  • Pre-compaction fact extraction
Document Description
Context and Memory Flow Analysis Original research and design
Data-Flow-02-ReAct-Loop Tick loop execution
Data-Flow-03-Memory-Consolidation Sleep phase memory operations

1. Sleep-Triggered Compaction

Occurs during the dreaming phase of sleep mode when context usage exceeds 70%.

Trigger Point

assistant_script.py::at_tick()
  └─> _run_sleep_tick()
        └─> rag_memory.py::run_sleep_tick()
              └─> compact_conversation_history(script, character)

Data Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│ SLEEP TICK START                                                            │
│ Operating mode: sleep, Phase: dreaming                                      │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ MEMORY OPERATIONS (rag_memory.py::run_sleep_tick)                           │
│ ─────────────────────────────────────────────────────────────────────────── │
│ 1. Memory link generation (consolidate_memories_to_semantic)                │
│ 2. Entity consolidation (helpers/entity_context.py::run_entity_consolidation_batch) │
│ 3. Orphaned memory link cleanup (prune_orphaned_memory_links)               │
│ 4. Episodic prune (helpers/episodic_index.py::prune_low_importance_entries) │
│ 5. Stale conversation cleanup (helpers/working_memory.py::clear_stale_conversations) │
│ 6. Pre-compaction fact extraction (run_pre_compaction_extraction)           │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ COMPACTION CHECK (compact_conversation_history)                             │
│ ─────────────────────────────────────────────────────────────────────────── │
│ Input:                                                                      │
│   script.db.conversation_history    (full message list)                     │
│   script.db.compact_sleep_threshold (0.7 default)                           │
│   script.db.compact_preserve_window (20 default)                            │
│   script.db.max_context_tokens      (100000 default)                        │
│                                                                             │
│ Check: token_count / max_tokens >= 0.7                                      │
│   └─> If below threshold: return {skipped: true}                            │
│   └─> If history <= preserve_window: return {skipped: true}                 │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼ (threshold exceeded)
┌─────────────────────────────────────────────────────────────────────────────┐
│ SPLIT HISTORY                                                               │
│ ─────────────────────────────────────────────────────────────────────────── │
│ messages_to_compact  = history[:-20]     (older messages)                   │
│ messages_to_preserve = history[-20:]     (recent messages)                  │
│                                                                             │
│ Example with 50 messages:                                                   │
│   [msg0, msg1, ... msg29] → TO COMPACT (30 messages)                        │
│   [msg30, msg31, ... msg49] → TO PRESERVE (20 messages)                     │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ LLM SUMMARY CALL (generate_context_summary)                                 │
│ ─────────────────────────────────────────────────────────────────────────── │
│ IMPORTANT: This is a SEPARATE LLM call with DIFFERENT context               │
│                                                                             │
│ Payload sent to LLM:                                                        │
│   [system: DEFAULT_COMPACTION_PROMPT]                                       │
│   [user: formatted messages_to_compact only]                                │
│                                                                             │
│ NOT the full conversation_history - only the old messages being summarized  │
│                                                                             │
│ Returns: Summary text (string)                                              │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ BUFFER REPLACEMENT                                                          │
│ ─────────────────────────────────────────────────────────────────────────── │
│ Create compaction marker:                                                   │
│   {                                                                         │
│     role: "system",                                                         │
│     content: "[CONTEXT SUMMARY]\n{summary}",                                │
│     metadata: {type: "compaction", compacted_count: 30, ...}                │
│   }                                                                         │
│                                                                             │
│ New history = [compaction_marker] + messages_to_preserve                    │
│                                                                             │
│ Before: 50 messages (~70,000 tokens)                                        │
│ After:  21 messages (~15,000 tokens) = 78% reduction                        │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ JOURNAL ARCHIVE                                                             │
│ ─────────────────────────────────────────────────────────────────────────── │
│ character.db.journal["entries"].append({                                    │
│   id: "compact_20251206_171500",                                            │
│   content: "[CONTEXT SYNTHESIS]\n{summary}",                                │
│   source_type: "compaction",                                                │
│   importance: 7,                                                            │
│   tags: ["compaction", "synthesis"]                                         │
│ })                                                                          │
│                                                                             │
│ Purpose: Archive compacted context for future retrieval                     │
└─────────────────────────────────────────────────────────────────────────────┘

2. Emergency Compaction

Occurs at the start of any tick when context usage exceeds 80%.

Trigger Point

assistant_script.py::at_tick()
  └─> Check: token_count / max_tokens >= 0.8
        └─> rag_memory.py::run_pre_compaction_extraction(max_iterations=3)
        └─> rag_memory.py::compact_conversation_history(script, character, force=True)

Data Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│ TICK START (awake mode)                                                     │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ EMERGENCY CHECK (before any other processing)                               │
│ ─────────────────────────────────────────────────────────────────────────── │
│ Conditions:                                                                 │
│   - operating_mode == "awake"                                               │
│   - compact_enabled == True                                                 │
│   - conversation_history not empty                                          │
│   - token_count / max_tokens >= compact_emergency_threshold (0.8)           │
│                                                                             │
│ If triggered:                                                               │
│   logger.log_warn("Emergency compaction triggered: 85% context usage")      │
│   yield run_pre_compaction_extraction(max_iterations=3)                     │
│   yield compact_conversation_history(script, character, force=True)         │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ PRE-COMPACTION EXTRACTION (limited iterations)                              │
│ ─────────────────────────────────────────────────────────────────────────── │
│ max_iterations=3 (vs 5 for sleep) to minimize delay                         │
│ Assistant journals critical facts while context still available             │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ COMPACTION (same flow as sleep, but force=True)                             │
│ ─────────────────────────────────────────────────────────────────────────── │
│ force=True means:                                                           │
│   - Uses emergency_threshold (0.8) not sleep_threshold (0.7)                │
│   - Skips the "below threshold" check                                       │
│   - Still respects preserve_window                                          │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ NORMAL TICK PROCESSING CONTINUES                                            │
│ ─────────────────────────────────────────────────────────────────────────── │
│ With reduced context, tick can now proceed safely                           │
└─────────────────────────────────────────────────────────────────────────────┘

3. Pre-Compaction Fact Extraction

Status: Implemented (commits df19f0ad1, 042fa758b)

Allows the assistant to journal important facts while full context is still available, before compaction discards older messages.

Key Insight: Context Timing

┌─────────────────────────────────────────────────────────────────────────────┐
│ WHY THIS WORKS                                                              │
│ ─────────────────────────────────────────────────────────────────────────── │
│                                                                             │
│ The "journal facts" LLM call and the "compaction summary" LLM call are      │
│ SEPARATE calls with DIFFERENT payloads:                                     │
│                                                                             │
│ 1. Journal facts call:                                                      │
│    - Submits: FULL conversation_history (~80% of context limit)             │
│    - Returns: Tool calls to journal facts                                   │
│                                                                             │
│ 2. Tool execution (LOCAL, no LLM call):                                     │
│    - Executes tool calls                                                    │
│    - Adds results to LOCAL buffer (now ~85%)                                │
│    - Facts saved to persistent journal                                      │
│                                                                             │
│ 3. Compaction summary call:                                                 │
│    - Submits: ONLY messages_to_compact (NOT full buffer)                    │
│    - Buffer size doesn't matter - it's a different payload                  │
│                                                                             │
│ 4. Buffer replacement:                                                      │
│    - Replaces history with summary + preserved window                       │
│    - Buffer now ~25% of limit                                               │
│                                                                             │
│ Tool calls from step 2 land in preserved window and age out naturally.      │
│ Even if buffer temporarily exceeds 100%, we never submit that buffer.       │
└─────────────────────────────────────────────────────────────────────────────┘

Implementation

┌─────────────────────────────────────────────────────────────────────────────┐
│ rag_memory.py::run_pre_compaction_extraction()                              │
│ ─────────────────────────────────────────────────────────────────────────── │
│                                                                             │
│ Entry conditions:                                                           │
│   - pre_compact_extraction_enabled == True (default)                        │
│   - conversation_history has >= 5 messages                                  │
│                                                                             │
│ Context type: CONTEXT_PRE_COMPACTION                                        │
│   - Tools: noop, add_journal_entry, update_entity_observation               │
│   - Execution mode: react_loop                                              │
│   - Max iterations: 5 (sleep) or 3 (emergency)                              │
│                                                                             │
│ Loop termination:                                                           │
│   - LLM returns noop (explicit completion)                                  │
│   - LLM returns no tool call                                                │
│   - LLM requests non-journal tool (stops to prevent scope creep)            │
│   - Max iterations reached                                                  │
│                                                                             │
│ Returns:                                                                    │
│   {                                                                         │
│     "success": True,                                                        │
│     "facts_recorded": 3,      # Number of journal entries created           │
│     "iterations": 4           # Loops executed before termination           │
│   }                                                                         │
└─────────────────────────────────────────────────────────────────────────────┘

Data Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│ COMPACTION TRIGGER (sleep or emergency)                                     │
│ Context: ~70-80% full                                                       │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ PRE-COMPACTION REACT LOOP                                                   │
│ ─────────────────────────────────────────────────────────────────────────── │
│ Set context signal: script.db.context_signals["pre_compaction"] = True      │
│                                                                             │
│ Prompt context type: "pre_compaction"                                       │
│   - Instructs assistant to review context                                   │
│   - Focus on journaling important facts                                     │
│   - Use journal tools (add_journal_entry, update_entity_observation)        │
│                                                                             │
│ Loop (max 3-5 iterations):                                                  │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │ Iteration 1                                                         │   │
│   │ LLM sees: full context + "extract important facts" prompt           │   │
│   │ LLM returns: tool_call(add_journal_entry, "Alice is allergic...")   │   │
│   │ Execute: fact saved to journal, result added to buffer              │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                         │                                                   │
│                         ▼                                                   │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │ Iteration 2                                                         │   │
│   │ LLM sees: context + previous tool result                            │   │
│   │ LLM returns: tool_call(add_journal_entry, "Quest deadline...")      │   │
│   │ Execute: fact saved to journal, result added to buffer              │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                         │                                                   │
│                         ▼                                                   │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │ Iteration 3                                                         │   │
│   │ LLM sees: context + previous tool results                           │   │
│   │ LLM returns: noop (assistant judges extraction complete)            │   │
│   │ Exit loop                                                           │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│ Clear signal: script.db.context_signals["pre_compaction"] = False           │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ COMPACTION (existing flow)                                                  │
│ ─────────────────────────────────────────────────────────────────────────── │
│ Buffer: ~85% (original + tool call results)                                 │
│ Split: messages_to_compact vs messages_to_preserve                          │
│ Tool calls from extraction are in preserved window (recent)                 │
│ Summary LLM call: only sends messages_to_compact                            │
│ Replacement: buffer drops to ~25%                                           │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ RESULT                                                                      │
│ ─────────────────────────────────────────────────────────────────────────── │
│ - Important facts preserved in journal (permanent)                          │
│ - Context summary includes synthesized information                          │
│ - Full context meaning captured before detail lost                          │
│ - Buffer ready for next tick cycle                                          │
└─────────────────────────────────────────────────────────────────────────────┘

4. Configuration Reference

Attribute Default Description
compact_enabled True Master switch for compaction
compact_sleep_threshold 0.7 Trigger during sleep at 70%
compact_emergency_threshold 0.8 Trigger emergency at 80%
compact_preserve_window 20 Messages to keep intact
compact_model None Override model for summary (None = main)
compact_prompt None Override prompt (None = default)
max_context_tokens 100000 Context limit for calculations
pre_compact_extraction_enabled True Enable pre-compaction fact extraction
pre_compact_max_iterations 5 Max extraction loop cycles
pre_compact_prompt None Custom extraction prompt (None = default)

5. Key Files

File Key Functions Purpose
assistant_script.py Config attributes Compaction settings
assistant_script.py at_tick() Emergency compaction check
rag_memory.py compact_conversation_history() Main compaction logic
rag_memory.py run_pre_compaction_extraction() Pre-compaction fact extraction
rag_memory.py run_sleep_tick() Sleep phase integration
llm_interaction.py generate_context_summary() LLM summary generation
helpers/execution.py count_conversation_tokens() Token counting
prompt_contexts.py CONTEXT_PRE_COMPACTION Pre-compaction context type

Document created: 2025-12-06 Updated: 2025-12-06 - Pre-compaction extraction implemented Updated: 2025-12-09 - Converted to semantic refs, updated helpers/ package paths Related issue: #10 (closed)