blightbow edited this page 2025-12-08 04:54:34 +00:00

Architecture: Token Management

Infrastructure - Token Counting, Budget Analysis, and Prompt Health


Overview

The token management system provides utilities for context window management:

  • Token counting - Tiered fallback: toksum → tiktoken → heuristic
  • Budget calculation - Per-component token allocation and positioning
  • Prompt health - Validation and quality checks for prompt components

1. Token Counter

Provides accurate token counting with graceful fallback (utils/token_counter.py).

Tiered Fallback

Tier  Library    Coverage                                  Accuracy
1     toksum     OpenAI, Anthropic, Google, Meta, Mistral  Native tokenizers
2     tiktoken   OpenAI models                             Accurate
3     Heuristic  All models                                ~4 chars/token estimate
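
The fallback order above can be sketched as follows. This is an illustrative simplification, not the real implementation: the `toksum.count_tokens` call signature is an assumption, and the actual module caches encoders and normalizes model names before counting.

```python
def count_tokens_sketch(text: str, model: str = "gpt-4") -> int:
    """Illustrative tiered fallback; see utils/token_counter.py for the real logic."""
    try:
        import toksum  # Tier 1: native tokenizers (call signature assumed)
        return toksum.count_tokens(text, model)
    except Exception:
        pass
    try:
        import tiktoken  # Tier 2: accurate for OpenAI models
        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(text))
    except Exception:
        pass
    # Tier 3: heuristic estimate (~4 characters per token)
    return max(1, len(text) // 4)
```

Each tier is attempted only if the previous one is unavailable or fails, so the function always returns an estimate even with no tokenizer libraries installed.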

Usage

from evennia.contrib.base_systems.ai.utils.token_counter import (
    count_tokens,
    count_tokens_batch,
)

# Single text
tokens = count_tokens("Your prompt text", model="gpt-4")

# Batch (efficient)
token_counts = count_tokens_batch(["text1", "text2", "text3"], model="gpt-4")

# OpenRouter prefixed models work automatically
tokens = count_tokens("Hello", model="openai/gpt-4")  # Normalized to gpt-4

Model Limits

from evennia.contrib.base_systems.ai.utils.token_counter import get_model_max_tokens

max_tokens = get_model_max_tokens("gpt-4")        # 128000
max_tokens = get_model_max_tokens("claude-3-opus") # 200000

Utilization Tracking

from evennia.contrib.base_systems.ai.utils.token_counter import (
    calculate_utilization,
    get_utilization_status,
)

utilization = calculate_utilization(64000, "gpt-4")  # 50.0%

status = get_utilization_status(85.0)
# {"level": "warning", "color": "warning", "message": "Approaching token limit"}

Threshold  Level     Message
< 70%      ok        Token budget healthy
70-85%     warning   Approaching token limit
85-95%     danger    Critical: Near token limit
> 95%      critical  Token limit exceeded
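
The thresholds above reduce to a simple mapping. A minimal sketch, assuming the 85% boundary falls on the warning side (consistent with the example output above) and that "color" mirrors "level" except at the top tiers; the real implementation lives in utils/token_counter.py:

```python
def utilization_status_sketch(utilization: float) -> dict:
    """Map a utilization percentage to a status dict (boundary handling assumed)."""
    if utilization < 70:
        return {"level": "ok", "color": "success", "message": "Token budget healthy"}
    if utilization <= 85:
        return {"level": "warning", "color": "warning", "message": "Approaching token limit"}
    if utilization <= 95:
        return {"level": "danger", "color": "danger", "message": "Critical: Near token limit"}
    return {"level": "critical", "color": "danger", "message": "Token limit exceeded"}
```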

2. Token Budget Calculator

Analyzes token allocation across prompt components (utils/token_budget.py).

Usage

from evennia.contrib.base_systems.ai.utils.token_budget import TokenBudgetCalculator

calculator = TokenBudgetCalculator(model="gpt-4")

components = [
    {"label": "System Prompt", "content": "You are...", "priority": "high"},
    {"label": "Project Context", "content": "Project info..."},
]

budget = calculator.calculate_budget(components)
# {
#     "components": [...],
#     "total_tokens": 3456,
#     "max_tokens": 128000,
#     "available_tokens": 124544,
#     "utilization": 2.7,
#     "model": "gpt-4",
# }

Positioning Analysis

Based on "lost in the middle" research: LLMs recall information placed at the beginning or end of the context window more reliably than information buried in the middle, so high-priority components should avoid middle positions.

# Position quality assessment
position = calculator._get_position(index=0, total_components=5)  # "beginning"
quality = calculator._assess_position_quality("middle", priority="high")  # "suboptimal"

# Analyze positioning issues
warnings = calculator.analyze_positioning(budget)
# [{"level": "warning", "message": "Context buffer in middle position (suboptimal)"...}]

Optimization Suggestions

suggestions = calculator.suggest_optimizations(budget)
# Detects: high utilization, large components, positioning issues

Component Impact

impact = calculator.calculate_component_impact(budget, "context_buffer")
# {
#     "tokens_saved": 375,
#     "new_utilization": 3.2,
#     "utilization_change": 0.5,
# }

Visualization Data

bar_data = calculator.generate_stacked_bar_data(budget)
# Returns segments for Bootstrap progress bars or chart libraries
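
A hypothetical sketch of what such segment data could look like, derived from the calculate_budget() output shown earlier. The per-component "tokens" field and the segment shape are assumptions; generate_stacked_bar_data() may return a different structure:

```python
def stacked_bar_segments(budget: dict) -> list:
    """Convert a budget dict into percentage-width bar segments (shape assumed)."""
    max_tokens = budget["max_tokens"]
    segments = []
    for component in budget["components"]:
        segments.append({
            "label": component["label"],
            "tokens": component["tokens"],
            # Segment width as a percentage of the model's context window
            "percent": round(100 * component["tokens"] / max_tokens, 2),
        })
    return segments
```

Percentage widths map directly onto CSS width values for Bootstrap progress-bar segments.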

3. Prompt Health Analyzer

Validates prompt components for quality issues (utils/prompt_health.py).

Usage

from evennia.contrib.base_systems.ai.utils.prompt_health import PromptHealthAnalyzer

analyzer = PromptHealthAnalyzer(character=char, script=script)

# Single component
health = analyzer.analyze_component({
    "label": "System Prompt",
    "content": "You are {{character_name}}...",
})
# {
#     "is_healthy": True,
#     "issues": [],
#     "warnings": 0,
#     "errors": 0,
#     "token_count": 150,
# }

# All components
report = analyzer.analyze_all_components(components)
# {
#     "overall_health": "healthy",  # or "warnings", "unhealthy", "critical"
#     "total_errors": 0,
#     "total_warnings": 0,
# }

Validation Checks

Check                       Level    Description
Empty content               warning  Component has no content
Character limit             error    > 95,000 chars
Character limit             warning  > 80,000 chars
Undefined {{variable}}      error    Variable not defined (with character context)
Unrecognized {placeholder}  warning  Unknown placeholder syntax
Unmatched braces            warning  Mismatched { and } count
Unclosed code block         warning  Odd number of ``` fences
Excessive whitespace        info     Too many blank lines
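
Two of the simpler checks can be sketched standalone. The real analyzer in utils/prompt_health.py returns structured issue dicts rather than booleans, but the underlying counting logic is this straightforward:

```python
def has_matched_braces(content: str) -> bool:
    """True when the counts of { and } agree."""
    return content.count("{") == content.count("}")

def has_closed_code_blocks(content: str) -> bool:
    """True when ``` fences occur an even number of times (all blocks closed)."""
    return content.count("```") % 2 == 0
```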

Variable Resolution

# Interpolate {{variables}} with actual values
result = analyzer.interpolate_text("Hello {{character_name}}")
# {
#     "interpolated_text": "Hello Aria",
#     "variables_found": ["character_name"],
#     "variables_resolved": {"character_name": "Aria"},
#     "undefined_variables": [],
# }

Built-in Variables

Always available:

  • character_name, character_key
  • location, location_name
  • date, time, datetime
  • project_summary, project_status, project_context, active_project

4. Integration Points

LLM Interaction

Token counting used during message building:

# In llm_interaction.py:build_llm_messages()
from .utils.token_counter import count_tokens

# History is added newest-to-oldest until the budget is exhausted
# (excerpt; message assembly omitted)
history_tokens = 0
for message in reversed(history):
    msg_tokens = count_tokens(message["content"], model=model)
    if history_tokens + msg_tokens > available_tokens:
        break
    history_tokens += msg_tokens  # message fits; reserve its tokens

API Workbench

Token budget exposed via REST API for prompt preview:

# In api/views/mixins/workbench.py
from evennia.contrib.base_systems.ai.utils.token_budget import TokenBudgetCalculator

calculator = TokenBudgetCalculator(model=model)
budget = calculator.calculate_budget(components)

Key Files

File                    Lines    Purpose
utils/token_counter.py  40-78    Lazy library loading
utils/token_counter.py  84-108   Model name normalization
utils/token_counter.py  110-148  get_encoding() cache
utils/token_counter.py  150-201  count_tokens() tiered fallback
utils/token_counter.py  203-249  count_tokens_batch()
utils/token_counter.py  274-358  Utilization helpers
utils/token_budget.py   21-100   TokenBudgetCalculator.calculate_budget()
utils/token_budget.py   101-150  Position quality assessment
utils/token_budget.py   151-254  Analysis and visualization
utils/token_budget.py   255-338  Optimization suggestions
utils/prompt_health.py  27-131   PromptHealthAnalyzer.analyze_component()
utils/prompt_health.py  228-274  analyze_all_components()
utils/prompt_health.py  451-587  Variable interpolation

See also: Architecture-Context-System | Architecture-Prompt-System | Architecture-LLM-Interaction