Architecture: LLM Providers
Layer 2 - Unified LLM Client and Provider Abstraction
Overview
The LLM provider system provides a Twisted-native, provider-agnostic interface for language model API calls:
- UnifiedLLMClient - Single interface for all providers
- LLMProvider ABC - Provider-specific request/response formatting
- Response models - Standardized output with token usage and rate limits
- Retry logic - Exponential backoff with jitter
- Circuit breaker integration - Fault isolation (see Architecture-Resilience-System)
1. UnifiedLLMClient
The main interface for LLM API calls (llm/client.py).
Initialization
```python
from evennia.contrib.base_systems.ai.llm import UnifiedLLMClient

client = UnifiedLLMClient(
    provider="openai",           # "openai" | "anthropic" | "openrouter" | "ollama" | "local"
    auth_token="sk-...",         # API key
    model="gpt-4",               # Model name
    retry_config=RetryConfig(),  # Optional retry settings
    circuit_breaker=breaker,     # Optional CircuitBreaker
    debug=False,                 # Enable debug logging
    # Provider-specific options:
    app_name="My App",           # OpenRouter X-Title header
    site_url="https://...",      # OpenRouter HTTP-Referer
)
```
Chat Completion
```python
from twisted.internet.defer import inlineCallbacks

@inlineCallbacks
def example():
    response = yield client.chat_completion(
        messages=[
            {"role": "system", "content": "You are helpful."},
            {"role": "user", "content": "Hello!"},
        ],
        tools=[...],         # Optional tool schemas
        tool_choice="auto",  # "auto" | "none" | specific tool
        temperature=0.7,     # Provider-specific kwargs
        max_tokens=1000,
    )
    print(response.content)
    print(response.tool_calls)
    print(response.usage.total_tokens)
```
Key Properties
| Property | Type | Description |
|---|---|---|
| `provider_name` | `str` | Current provider name |
| `model` | `str` | Current model name |
| `supports_tools` | `bool` | Whether provider supports tool calling |
| `rate_limit_info` | `RateLimitInfo` | Rate limits from last request |
| `circuit_breaker` | `CircuitBreaker` | Circuit breaker if configured |
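For example, with the `client` created above:

```python
# Inspect the configured provider and the most recent rate-limit snapshot.
print(client.provider_name, client.model)   # e.g. "openai" "gpt-4"

if client.supports_tools:
    print("tool calling is available")

if client.rate_limit_info is not None:
    print("requests remaining:", client.rate_limit_info.remaining_requests)
```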
Token Counting
```python
# Uses provider-specific tokenizer (tiktoken for OpenAI, estimation for others)
tokens = client.count_tokens("Some text to count")
```
Tool Result Formatting
```python
# Format tool result message for current provider
tool_msg = client.format_tool_result(tool_call_id="call_123", result={"status": "ok"})

# Returns provider-specific format:
# OpenAI:    {"role": "tool", "tool_call_id": "...", "content": "..."}
# Anthropic: {"role": "user", "content": [{"type": "tool_result", ...}]}
```
2. LLMProvider ABC
Abstract base class for provider implementations (llm/providers/base.py).
Required Methods
| Method | Purpose |
|---|---|
| `build_headers(auth_token)` | Build Twisted-compatible request headers |
| `build_request(messages, tools, **kwargs)` | Build provider-specific request body |
| `parse_response(response_json, headers)` | Parse response into `LLMResponse` |
| `count_tokens(text)` | Count tokens using provider's tokenizer |
| `format_tool_result(tool_call_id, result)` | Format tool result message |
Provider Metadata
```python
class MyProvider(LLMProvider):
    name = "myprovider"
    supports_tools = True      # Tool calling support
    supports_streaming = True  # Streaming support
    supports_chat = True       # Chat format support
```
Optional Methods
| Method | Default Behavior |
|---|---|
| `get_api_url()` | Returns `self.api_url` |
| `parse_rate_limit_headers(headers)` | Creates `RateLimitInfo` from standard headers |
| `parse_error_response(response_json)` | Extracts error message from response |
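Putting these pieces together, a minimal custom provider might look like the sketch below. The import paths, the base-class defaults, and the `LLMResponse` construction are assumptions based on the tables above, not a verbatim excerpt from the contrib.

```python
# Hypothetical provider sketch; import paths and defaults are assumptions.
from evennia.contrib.base_systems.ai.llm.providers.base import LLMProvider
from evennia.contrib.base_systems.ai.llm.responses import LLMResponse


class EchoProvider(LLMProvider):
    name = "echo"
    supports_tools = False
    supports_streaming = False
    supports_chat = True

    api_url = "http://localhost:8080/v1/chat"  # hypothetical endpoint

    def build_headers(self, auth_token):
        # Twisted's Agent expects header values as lists of byte strings.
        return {
            b"Content-Type": [b"application/json"],
            b"Authorization": [f"Bearer {auth_token}".encode()],
        }

    def build_request(self, messages, tools=None, **kwargs):
        return {"model": kwargs.get("model", "echo-1"), "messages": messages}

    def parse_response(self, response_json, headers=None):
        # Map the provider payload onto the standardized LLMResponse.
        return LLMResponse(
            content=response_json.get("reply", ""),
            finish_reason="stop",
            provider=self.name,
        )

    def count_tokens(self, text):
        # Rough estimate: ~4 characters per token.
        return max(1, len(text) // 4)

    def format_tool_result(self, tool_call_id, result):
        return {"role": "tool", "tool_call_id": tool_call_id, "content": str(result)}
```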
3. Available Providers
Factory function get_provider() creates provider instances.
| Provider | Name | Default Model | Tool Support |
|---|---|---|---|
| OpenAI | `"openai"` | `gpt-4` | Yes |
| Anthropic | `"anthropic"` | `claude-3-5-sonnet-20241022` | Yes |
| OpenRouter | `"openrouter"` | `openai/gpt-4o-mini` | Yes |
| Ollama | `"ollama"` | `llama3.2` | Varies |
| Local | `"local"` | None | No |
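The `get_provider()` factory mentioned above can also be used directly, for example (a sketch; the exact factory signature is an assumption):

```python
# Sketch only: the factory signature shown here is an assumption.
from evennia.contrib.base_systems.ai.llm.providers import get_provider

provider = get_provider("anthropic")
print(provider.name, provider.supports_tools)  # -> "anthropic" True
```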
OpenAI Provider
Supports OpenAI API and OpenAI-compatible endpoints (Azure, etc.).
```python
# Standard OpenAI
client = UnifiedLLMClient(provider="openai", auth_token="sk-...", model="gpt-4")

# Azure OpenAI (custom endpoint)
client = UnifiedLLMClient(
    provider="openai",
    api_url="https://myinstance.openai.azure.com/...",
    auth_token="...",
    model="gpt-4",
)
```
Anthropic Provider
Native Claude API support with prompt caching.
```python
client = UnifiedLLMClient(
    provider="anthropic",
    auth_token="sk-ant-...",
    model="claude-3-5-sonnet-20241022",
)
```
OpenRouter Provider
Multi-provider gateway with attribution headers.
```python
client = UnifiedLLMClient(
    provider="openrouter",
    auth_token="sk-or-...",
    model="anthropic/claude-3.5-sonnet",
    app_name="My Evennia Game",     # X-Title header
    site_url="https://mygame.com",  # HTTP-Referer header
)
```
Local Providers
For local model servers (Ollama, text-generation-webui).
```python
# Ollama
client = UnifiedLLMClient(
    provider="ollama",
    api_url="http://localhost:11434/api/chat",
    model="llama3.2",
)

# Generic local (OpenAI-compatible)
client = UnifiedLLMClient(
    provider="local",
    api_url="http://localhost:5000/v1/chat/completions",
    model="my-model",
)
```
4. Response Models
Standardized response structures (llm/responses.py).
LLMResponse
```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class LLMResponse:
    content: str = ""                                          # Text content
    tool_calls: List[ToolCall] = field(default_factory=list)  # Parsed tool calls
    finish_reason: str = ""                                    # "stop" | "tool_calls" | "error" | ...
    usage: Optional[TokenUsage] = None                         # Token statistics
    rate_limit_info: Optional[RateLimitInfo] = None
    model: str = ""                                            # Model used
    provider: str = ""                                         # Provider name
    raw_response: Optional[Dict] = None                        # Original response

    @property
    def has_tool_calls(self) -> bool:
        return len(self.tool_calls) > 0
```
ToolCall
```python
@dataclass
class ToolCall:
    id: str                    # Tool call ID for result pairing
    name: str                  # Tool name
    arguments: Dict[str, Any]  # Parsed arguments
```
TokenUsage
```python
@dataclass
class TokenUsage:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

    # Cache statistics (provider-specific)
    cache_creation_tokens: int = 0  # Anthropic: tokens written to cache
    cache_read_tokens: int = 0      # Anthropic: tokens from cache
    cached_tokens: int = 0          # OpenAI: automatic caching

    @property
    def cache_hit_tokens(self) -> int:
        """Total tokens served from cache (works for both providers)."""
        return self.cache_read_tokens + self.cached_tokens
```
RateLimitInfo
```python
@dataclass
class RateLimitInfo:
    remaining_tokens: Optional[int] = None
    remaining_requests: Optional[int] = None
    reset_tokens: Optional[str] = None
    reset_requests: Optional[str] = None
    limit_tokens: Optional[int] = None    # OpenRouter
    limit_requests: Optional[int] = None  # OpenRouter

    def is_approaching_limit(self, token_threshold=1000, request_threshold=10) -> bool:
        """Check if approaching rate limits."""
```
5. Retry Configuration
Exponential backoff with "Full Jitter" algorithm (llm/responses.py).
```python
@dataclass
class RetryConfig:
    max_attempts: int = 3
    backoff_base: float = 1.0        # Initial delay (seconds)
    backoff_max: float = 30.0        # Maximum delay
    backoff_multiplier: float = 2.0  # Exponential factor
    jitter: bool = True              # Random jitter (prevents thundering herd)
    transient_status_codes: Set[int] = field(
        default_factory=lambda: {429, 500, 502, 503, 504}
    )
```
Delay Calculation
With jitter enabled (default):
`delay = random(0, min(backoff_max, backoff_base * 2^attempt))`
Without jitter:
`delay = min(backoff_max, backoff_base * 2^attempt)`
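A minimal sketch of that calculation (the `get_delay` name mirrors the request-flow diagram below; its actual signature in the source is an assumption):

```python
import random

def get_delay(attempt: int, cfg: RetryConfig) -> float:
    """Full-jitter exponential backoff, per the formulas above (attempt is 0-based)."""
    capped = min(cfg.backoff_max, cfg.backoff_base * cfg.backoff_multiplier ** attempt)
    return random.uniform(0, capped) if cfg.jitter else capped
```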
Retryable Errors
Only these HTTP status codes trigger retry:
- `429` - Rate limited
- `500` - Internal server error
- `502` - Bad gateway
- `503` - Service unavailable
- `504` - Gateway timeout
6. Request Flow
```
client.chat_completion(messages, tools, **kwargs)
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ UnifiedLLMClient._request_with_retry() │
├──────────────────────────────────────────────────────────────────┤
│ 1. Check circuit breaker │
│ if breaker.is_available == False: │
│ raise CircuitOpenError │
│ │
│ 2. For attempt in range(max_attempts): │
│ status, body, headers = _make_request(body) │
│ │
│ if status == 200: │
│ breaker.record_success() │
│ return (status, body, headers) │
│ │
│ if is_retryable(status): │
│ breaker.record_failure() │
│ delay = get_delay(attempt) # with jitter │
│ yield deferLater(delay) │
│ │
│ 3. All retries exhausted → return error response │
└──────────────────────────────────────────────────────────────────┘
│
▼
client._parse_response(status_code, response_bytes, headers)
│
├── Update rate_limit_info from headers
├── Log warning if approaching limits
├── Handle error responses (status != 200)
└── Parse successful response via provider.parse_response()
│
▼
LLMResponse (standardized output)
```
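When a circuit breaker is configured, callers should be prepared for the fast-fail path in step 1. A minimal sketch follows (the `CircuitOpenError` import path is an assumption; see Architecture-Resilience-System for the actual location):

```python
from twisted.internet.defer import inlineCallbacks

# Import path below is an assumption; see Architecture-Resilience-System.
from evennia.contrib.base_systems.ai.resilience import CircuitOpenError


@inlineCallbacks
def guarded_completion(client, messages):
    try:
        response = yield client.chat_completion(messages=messages)
    except CircuitOpenError:
        # Breaker is open: fail fast without touching the provider API.
        return None
    return response
```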
Key Files
| File | Lines | Purpose |
|---|---|---|
| `llm/client.py` | 1-432 | UnifiedLLMClient implementation |
| `llm/providers/base.py` | 1-174 | LLMProvider ABC |
| `llm/providers/__init__.py` | 1-70 | get_provider() factory |
| `llm/providers/openai.py` | 1-219 | OpenAI/compatible provider |
| `llm/providers/anthropic.py` | — | Anthropic Claude provider |
| `llm/providers/openrouter.py` | — | OpenRouter provider |
| `llm/providers/local.py` | — | Ollama and local server providers |
| `llm/responses.py` | 1-170 | Response dataclasses |
See also: Architecture-Resilience-System | Architecture-Core-Engine | Data-Flow-08-LLM-Provider-Interaction