Architecture: Token Management
Infrastructure - Token Counting, Budget Analysis, and Prompt Health
Overview
The token management system provides three sets of utilities for managing the context window:
- Token counting - Tiered fallback: toksum → tiktoken → heuristic
- Budget calculation - Per-component token allocation and positioning
- Prompt health - Validation and quality checks for prompt components
1. Token Counter
Provides accurate token counting with graceful fallback (utils/token_counter.py).
Tiered Fallback
| Tier | Library | Coverage | Accuracy |
|---|---|---|---|
| 1 | toksum | OpenAI, Anthropic, Google, Meta, Mistral | Native tokenizers |
| 2 | tiktoken | OpenAI models | Accurate |
| 3 | Heuristic | All | ~4 chars/token estimate |
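For intuition, the tiers reduce to a try-each-in-order chain. The sketch below is illustrative, not the actual implementation (which lazy-loads libraries, caches encodings, and normalizes model names first); the `toksum` call signature in particular is an assumption:

```python
def count_tokens_sketch(text: str, model: str = "gpt-4") -> int:
    """Illustrative tiered fallback; not the real implementation."""
    try:
        # Tier 1: toksum native tokenizers (call signature assumed here)
        from toksum import count_tokens as toksum_count
        return toksum_count(text, model)
    except Exception:
        pass
    try:
        # Tier 2: tiktoken, accurate for OpenAI models
        import tiktoken
        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(text))
    except Exception:
        pass
    # Tier 3: heuristic estimate (~4 characters per token)
    return max(1, len(text) // 4)
```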
Usage
```python
from evennia.contrib.base_systems.ai.utils.token_counter import (
    count_tokens,
    count_tokens_batch,
)

# Single text
tokens = count_tokens("Your prompt text", model="gpt-4")

# Batch (efficient)
token_counts = count_tokens_batch(["text1", "text2", "text3"], model="gpt-4")

# OpenRouter-prefixed models work automatically
tokens = count_tokens("Hello", model="openai/gpt-4")  # Normalized to gpt-4
```
Model Limits
```python
from evennia.contrib.base_systems.ai.utils.token_counter import get_model_max_tokens

max_tokens = get_model_max_tokens("gpt-4")          # 128000
max_tokens = get_model_max_tokens("claude-3-opus")  # 200000
```
Utilization Tracking
```python
from evennia.contrib.base_systems.ai.utils.token_counter import (
    calculate_utilization,
    get_utilization_status,
)

utilization = calculate_utilization(64000, "gpt-4")  # 50.0%
status = get_utilization_status(85.0)
# {"level": "warning", "color": "warning", "message": "Approaching token limit"}
```
| Threshold | Level | Message |
|---|---|---|
| < 70% | ok | Token budget healthy |
| 70-85% | warning | Approaching token limit |
| 85-95% | danger | Critical: Near token limit |
| > 95% | critical | Token limit exceeded |
2. Token Budget Calculator
Analyzes token allocation across prompt components (utils/token_budget.py).
Usage
```python
from evennia.contrib.base_systems.ai.utils.token_budget import TokenBudgetCalculator

calculator = TokenBudgetCalculator(model="gpt-4")
components = [
    {"label": "System Prompt", "content": "You are...", "priority": "high"},
    {"label": "Project Context", "content": "Project info..."},
]
budget = calculator.calculate_budget(components)
# {
#     "components": [...],
#     "total_tokens": 3456,
#     "max_tokens": 128000,
#     "available_tokens": 124544,
#     "utilization": 2.7,
#     "model": "gpt-4",
# }
```
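The totals reduce to arithmetic over per-component counts. A minimal sketch of the core calculation, using the counting helpers documented above (the per-component `"tokens"` key is an assumption):

```python
from evennia.contrib.base_systems.ai.utils.token_counter import (
    count_tokens,
    get_model_max_tokens,
)

def calculate_budget_sketch(components: list, model: str = "gpt-4") -> dict:
    # Count each component, then derive totals and utilization.
    for component in components:
        component["tokens"] = count_tokens(component.get("content", ""), model=model)
    total = sum(c["tokens"] for c in components)
    max_tokens = get_model_max_tokens(model)
    return {
        "components": components,
        "total_tokens": total,
        "max_tokens": max_tokens,
        "available_tokens": max_tokens - total,
        "utilization": round(total / max_tokens * 100, 1),
        "model": model,
    }
```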
Positioning Analysis
Based on "lost in the middle" research - LLMs recall information better at beginning/end of context.
```python
# Position quality assessment
position = calculator._get_position(index=0, total_components=5)          # "beginning"
quality = calculator._assess_position_quality("middle", priority="high")  # "suboptimal"

# Analyze positioning issues
warnings = calculator.analyze_positioning(budget)
# [{"level": "warning", "message": "Context buffer in middle position (suboptimal)", ...}]
```
Optimization Suggestions
```python
suggestions = calculator.suggest_optimizations(budget)
# Detects: high utilization, large components, positioning issues
```
Component Impact
```python
impact = calculator.calculate_component_impact(budget, "context_buffer")
# {
#     "tokens_saved": 375,
#     "new_utilization": 2.4,
#     "utilization_change": -0.3,
# }
```
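The impact figures follow directly from the budget dict. A sketch of the recomputation (assumes components carry the `"tokens"` and `"label"` keys used earlier):

```python
def component_impact_sketch(budget: dict, label: str) -> dict:
    # Recompute utilization as if the named component were removed.
    saved = next(c["tokens"] for c in budget["components"] if c["label"] == label)
    new_total = budget["total_tokens"] - saved
    new_utilization = round(new_total / budget["max_tokens"] * 100, 1)
    return {
        "tokens_saved": saved,
        "new_utilization": new_utilization,
        "utilization_change": round(new_utilization - budget["utilization"], 1),
    }
```

With the example budget above (3,456 of 128,000 tokens), removing a 375-token component drops utilization from 2.7% to 2.4%.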
Visualization Data
```python
bar_data = calculator.generate_stacked_bar_data(budget)
# Returns segments for Bootstrap progress bars or chart libraries
```
3. Prompt Health Analyzer
Validates prompt components for quality issues (utils/prompt_health.py).
Usage
```python
from evennia.contrib.base_systems.ai.utils.prompt_health import PromptHealthAnalyzer

analyzer = PromptHealthAnalyzer(character=char, script=script)

# Single component
health = analyzer.analyze_component({
    "label": "System Prompt",
    "content": "You are {{character_name}}...",
})
# {
#     "is_healthy": True,
#     "issues": [],
#     "warnings": 0,
#     "errors": 0,
#     "token_count": 150,
# }

# All components
report = analyzer.analyze_all_components(components)
# {
#     "overall_health": "healthy",  # or "warnings", "unhealthy", "critical"
#     "total_errors": 0,
#     "total_warnings": 0,
# }
```
Validation Checks
| Check | Level | Description |
|---|---|---|
| Empty content | warning | Component has no content |
| Character limit | error | > 95,000 chars |
| Character limit | warning | > 80,000 chars |
| Undefined `{{variable}}` | error | Variable not defined (with character context) |
| Unrecognized `{placeholder}` | warning | Unknown placeholder syntax |
| Unmatched braces | warning | Mismatched { and } count |
| Unclosed code block | warning | Odd number of ``` |
| Excessive whitespace | info | Too many blank lines |
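Several of the structural checks reduce to simple counts. Illustrative versions of the brace and code-fence checks (the issue dict shape is an assumption):

```python
def structural_checks_sketch(content: str) -> list:
    issues = []
    # Unmatched braces: opening and closing counts disagree.
    if content.count("{") != content.count("}"):
        issues.append({"level": "warning", "message": "Mismatched { and } count"})
    # Unclosed code block: an odd number of fence markers.
    if content.count("```") % 2 != 0:
        issues.append({"level": "warning", "message": "Unclosed code block"})
    return issues
```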
Variable Resolution
```python
# Interpolate {{variables}} with actual values
result = analyzer.interpolate_text("Hello {{character_name}}")
# {
#     "interpolated_text": "Hello Aria",
#     "variables_found": ["character_name"],
#     "variables_resolved": {"character_name": "Aria"},
#     "undefined_variables": [],
# }
```
Built-in Variables
Always available:
`character_name`, `character_key`, `location`, `location_name`, `date`, `time`, `datetime`, `project_summary`, `project_status`, `project_context`, `active_project`
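These resolve through the same interpolation path, so a template built only from built-ins should report no undefined variables. For example, continuing with the analyzer from the usage example above:

```python
result = analyzer.interpolate_text("It is {{time}}; {{character_name}} is at {{location_name}}.")
assert result["undefined_variables"] == []
```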
4. Integration Points
LLM Interaction
Token counting is used during message building:
```python
# In llm_interaction.py:build_llm_messages() (excerpt, simplified)
from .utils.token_counter import count_tokens

# History is added newest-to-oldest until the budget is exhausted
history_tokens = 0
kept_messages = []
for message in reversed(history):
    msg_tokens = count_tokens(message["content"], model=model)
    if history_tokens + msg_tokens > available_tokens:
        break
    kept_messages.insert(0, message)  # restore chronological order
    history_tokens += msg_tokens
```
API Workbench
The token budget is exposed via the REST API for prompt preview:
```python
# In api/views/mixins/workbench.py
from evennia.contrib.base_systems.ai.utils.token_budget import TokenBudgetCalculator

calculator = TokenBudgetCalculator(model=model)
budget = calculator.calculate_budget(components)
```
Key Files
| File | Lines | Purpose |
|---|---|---|
| `utils/token_counter.py` | 40-78 | Lazy library loading |
| `utils/token_counter.py` | 84-108 | Model name normalization |
| `utils/token_counter.py` | 110-148 | get_encoding() cache |
| `utils/token_counter.py` | 150-201 | count_tokens() tiered fallback |
| `utils/token_counter.py` | 203-249 | count_tokens_batch() |
| `utils/token_counter.py` | 274-358 | Utilization helpers |
| `utils/token_budget.py` | 21-100 | TokenBudgetCalculator.calculate_budget() |
| `utils/token_budget.py` | 101-150 | Position quality assessment |
| `utils/token_budget.py` | 151-254 | Analysis and visualization |
| `utils/token_budget.py` | 255-338 | Optimization suggestions |
| `utils/prompt_health.py` | 27-131 | PromptHealthAnalyzer.analyze_component() |
| `utils/prompt_health.py` | 228-274 | analyze_all_components() |
| `utils/prompt_health.py` | 451-587 | Variable interpolation |
See also: Architecture-Context-System | Architecture-Prompt-System | Architecture-LLM-Interaction