Architecture Task Assessment
blightbow edited this page 2025-12-08 04:13:18 +00:00

Architecture: Task Assessment

Layer 3 - LLM-Driven Complexity Evaluation


Overview

Task assessment is the third layer of the three-layer execution pattern system. It provides runtime complexity evaluation that can only restrict (never expand) what Layers 1 and 2 allow:

Layer 1 (Static Config) ⊇ Layer 2 (Context Config) ⊇ Layer 3 (LLM Assessment)

Features:

  • LLM-driven assessment - Full complexity analysis via prompt
  • Heuristic fallback - Fast path without LLM call
  • Event classification - Categorizes pending events
  • Context signals - Token pressure, errors, goal urgency

1. Assessment Output

Both LLM and heuristic assessment return the same schema:

```python
{
    "complexity_score": int,          # 1-5 scale
    "recommended_mode": str,          # "single_action" | "react_loop" | None
    "recommended_iterations": int,    # Max iterations for react_loop
    "recommend_confirm_dangerous": bool,
    "reasoning": str,                 # Explanation
    "context_signals": dict,          # Runtime signals (heuristic only)
}
```
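For concreteness, here is a filled-in example of this schema as the heuristic path might produce it (values are illustrative, not output of the actual contrib):

```python
# Illustrative example of an assessment dict for a query-type event.
assessment = {
    "complexity_score": 2,
    "recommended_mode": "react_loop",
    "recommended_iterations": 2,
    "recommend_confirm_dangerous": False,
    "reasoning": "Query event: short multi-step lookup",
    "context_signals": {"event_class": "query", "token_pressure": "low"},
}
```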

Complexity Score Mapping

| Score | Meaning | Default Mode | Iterations |
|-------|---------|--------------|------------|
| 1-2 | Simple task | single_action | 1 |
| 3-4 | Moderate complexity | react_loop | 3-5 |
| 5 | Complex multi-step | react_loop | max |
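The mapping above can be sketched as a small helper. This is a hypothetical function for illustration only (`defaults_for_score` and `max_iterations` are not part of the contrib's API):

```python
# Hypothetical helper showing the score-to-mode mapping from the table above.
def defaults_for_score(score, max_iterations=10):
    """Map a 1-5 complexity score to (mode, iterations)."""
    if score <= 2:
        return "single_action", 1          # simple task
    if score <= 4:
        # Moderate complexity: 3 iterations at score 3, 5 at score 4
        return "react_loop", 3 if score == 3 else 5
    return "react_loop", max_iterations    # complex multi-step
```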

2. Event Classification

Events are classified to inform pattern selection (task_assessment.py).

Classification Categories

| Category | Keywords | Event Types | Pattern Impact |
|----------|----------|-------------|----------------|
| communication | say, page, whisper, reply | say, page, whisper, channel | single_action (await response) |
| building | spawn, create, build, destroy | build, spawn, create | react_loop + confirm_dangerous |
| observation | look, examine, inspect | look, examine | single_action |
| movement | go, move, walk, enter | move, travel | single_action |
| query | what, who, where, how, ? | query, help | react_loop (2 iterations) |
| goal_action | goal, task, complete | goal, task | react_loop |
| unknown | — | — | Default handling |

Classification Function

```python
from evennia.contrib.base_systems.ai.task_assessment import classify_event_content

event = {"type": "say", "message": "Hello there!"}
category = classify_event_content(event)  # → "communication"
```

For multiple events:

```python
from evennia.contrib.base_systems.ai.task_assessment import classify_events

events = [event1, event2, event3]
dominant_class, class_counts = classify_events(events)
# dominant_class = "communication"
# class_counts = {"communication": 2, "query": 1}
```

3. Context Signals

Runtime context information for pattern selection.

```python
from evennia.contrib.base_systems.ai.task_assessment import build_context_signals

signals = build_context_signals(script)
# {
#     "event_class": "communication",
#     "event_class_counts": {"communication": 2},
#     "token_pressure": "medium",
#     "token_usage_pct": 45.2,
#     "recent_errors": 0,
#     "goal_urgency": "high",
# }
```

Token Pressure Levels

| Level | Usage | Effect |
|-------|-------|--------|
| low | < 40% | Normal operation |
| medium | 40-60% | No restrictions |
| high | 60-80% | Reduced iterations |
| critical | > 80% | Forced single_action |
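The thresholds above can be expressed as a small classifier. This is a sketch; `token_pressure` here is a hypothetical helper, not the contrib's API:

```python
# Hypothetical mapping of token usage percentage to a pressure level,
# using the thresholds from the table above.
def token_pressure(usage_pct):
    if usage_pct < 40:
        return "low"        # normal operation
    if usage_pct < 60:
        return "medium"     # no restrictions yet
    if usage_pct <= 80:
        return "high"       # reduced iterations
    return "critical"       # forced single_action
```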

Error Recovery

| Consecutive Errors | Effect |
|--------------------|--------|
| 2+ | Reduced iterations |
| 3+ | Forced single_action + confirm_dangerous |

Goal Urgency

| Priority | Effect |
|----------|--------|
| critical, high | +1 iteration (if not under pressure) |
| medium, low, none | No modifier |
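Taken together, the three signal tables restrict a base recommendation roughly like this. This is an illustrative sketch of the adjustment logic, not the contrib's actual code:

```python
# Illustrative sketch: apply token pressure, error, and urgency
# adjustments from the tables above to a base (mode, iterations) pair.
def apply_signal_adjustments(mode, iterations, signals):
    pressure = signals.get("token_pressure", "low")
    errors = signals.get("recent_errors", 0)
    urgency = signals.get("goal_urgency", "none")

    if pressure == "high":
        iterations = max(1, iterations - 1)        # high pressure: reduce iterations
    if pressure == "critical" or errors >= 3:
        mode, iterations = "single_action", 1      # forced single_action
    elif errors >= 2:
        iterations = max(1, iterations - 1)        # 2+ errors: reduce iterations
    if urgency in ("critical", "high") and pressure == "low" and errors == 0:
        iterations += 1                            # urgency bonus only when not under pressure
    return mode, iterations
```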

4. LLM Assessment

Full complexity analysis via LLM prompt.

Usage

```python
from twisted.internet.defer import inlineCallbacks

from evennia.contrib.base_systems.ai.prompt_contexts import select_execution_pattern
from evennia.contrib.base_systems.ai.task_assessment import assess_task_complexity

@inlineCallbacks
def example():
    # Only runs if execution_config.task_assessment_enabled = True
    assessment = yield assess_task_complexity(script)

    if assessment:
        # Use with select_execution_pattern()
        pattern = select_execution_pattern(static, context, assessment)
```

Assessment Prompt

The LLM receives:

  • Operating mode, token usage %
  • Recent tools used
  • Conversation history size
  • Pending events (up to 5)
  • Current goals (up to 5)

And returns JSON with complexity_score (1-5) plus recommendations.
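A defensive parser for that JSON reply might look like the following. This is a sketch only; the contrib's own response parsing in task_assessment.py may differ:

```python
import json

# Sketch: extract and sanity-check the assessment JSON from an LLM reply,
# tolerating prose around the JSON object.
def parse_assessment(raw):
    try:
        # Slice from the first "{" to the last "}" to skip surrounding prose
        start, end = raw.index("{"), raw.rindex("}") + 1
        data = json.loads(raw[start:end])
    except (ValueError, json.JSONDecodeError):
        return None  # caller falls back to the heuristic path
    score = data.get("complexity_score")
    if not isinstance(score, int) or not 1 <= score <= 5:
        return None  # reject out-of-range or missing scores
    return data
```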

Enabling LLM Assessment

```python
script.db.execution_config = {
    "task_assessment_enabled": True,  # Enable Layer 3
    ...
}
```

5. Heuristic Assessment

Fast path without LLM call.

Usage

```python
from evennia.contrib.base_systems.ai.task_assessment import (
    get_quick_assessment,
    get_quick_assessment_for_script,
)

# Direct call with events/goals
assessment = get_quick_assessment(pending_events, current_goals, context_signals)

# Convenience wrapper using script
assessment = get_quick_assessment_for_script(script)
```

Heuristic Rules

| Condition | Complexity | Mode | Iterations |
|-----------|------------|------|------------|
| Communication event | 1 | single_action | 1 |
| Building event | 4 | react_loop | 3 |
| Observation event | 1 | single_action | 1 |
| Query event | 2 | react_loop | 2 |
| Autonomous (no events/goals) | 1 | single_action | 1 |
| Goal pursuit | 3+ | react_loop | goals+2 |
| Default events | 2+ | varies | events+1 |

Context signal adjustments are applied after base pattern selection.
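The base-rule table can be approximated as follows. This is an illustrative reimplementation; the real `get_quick_assessment()` in task_assessment.py is more involved and also folds in context signals afterward:

```python
# Illustrative sketch of the base-rule table above (before signal adjustments).
def base_rule(event_class, n_events, n_goals):
    """Return (complexity, mode, iterations) for a classified tick."""
    table = {
        "communication": (1, "single_action", 1),
        "building": (4, "react_loop", 3),
        "observation": (1, "single_action", 1),
        "query": (2, "react_loop", 2),
    }
    if event_class in table:
        return table[event_class]
    if n_events == 0 and n_goals == 0:
        return (1, "single_action", 1)                       # autonomous idle tick
    if n_goals > 0:
        return (max(3, n_goals), "react_loop", n_goals + 2)  # goal pursuit
    return (2, "react_loop", n_events + 1)                   # default events
```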


6. Assessment Flow

```
Tick Event
    │
    ▼
Check: task_assessment_enabled?
    │
    ├── YES ─────────────────────────────────────┐
    │                                            │
    │   assess_task_complexity(script)           │
    │       │                                    │
    │       ├── Build context summary            │
    │       ├── Format assessment prompt         │
    │       ├── Call LLM                         │
    │       ├── Parse JSON response              │
    │       └── Apply complexity heuristics      │
    │                                            │
    │   Returns: assessment dict                 │
    │                                            │
    └── NO ──────────────────────────────────────┐
                                                 │
    get_quick_assessment_for_script(script)      │
        │                                        │
        ├── build_context_signals()              │
        ├── classify_events()                    │
        ├── Apply base rules per event_class     │
        └── Apply context signal adjustments     │
                                                 │
    Returns: assessment dict                     │
                                                 │
    ▼
select_execution_pattern(static, context, assessment)
    │
    ▼
Execute with selected pattern
```
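The dispatch above reduces to a simple branch. This synchronous sketch injects the two assessment callables for clarity; the contrib itself wires this up with Twisted's `inlineCallbacks`:

```python
# Sketch of the assessment dispatch: LLM path when enabled (with the
# heuristic path as fallback), heuristic-only path otherwise.
def choose_assessment(config, llm_assess, quick_assess):
    if config.get("task_assessment_enabled"):
        assessment = llm_assess()       # full LLM path (Layer 3)
        if assessment is not None:
            return assessment
    return quick_assess()               # heuristic fast path / fallback
```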

7. Integration with Execution Patterns

Assessment feeds into pattern selection:

```python
from evennia.contrib.base_systems.ai.prompt_contexts import select_execution_pattern

# Layer 1: Static config (from execution_config)
# Layer 2: Context config (from context type)
# Layer 3: Assessment (from LLM or heuristics)

effective_pattern = select_execution_pattern(
    static_config,
    context_config,
    assessment  # Can only restrict, never expand
)
```
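The "restrict, never expand" rule can be illustrated with a minimal merge over iteration caps. This uses a hypothetical helper for illustration; `select_execution_pattern()` is the real entry point:

```python
# Hypothetical merge: Layer 3 may lower the iteration cap or force
# single_action, but can never raise the Layer 1/2 limits.
def merge_layers(static_iters, context_iters, assessment):
    cap = min(static_iters, context_iters)        # Layers 1 and 2
    if assessment:
        rec = assessment.get("recommended_iterations") or cap
        cap = min(cap, rec)                       # restrict only
        if assessment.get("recommended_mode") == "single_action":
            cap = 1
    return cap
```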

Key Files

| File | Lines | Purpose |
|------|-------|---------|
| task_assessment.py | 44-114 | Event classification |
| task_assessment.py | 162-310 | Context signal calculation |
| task_assessment.py | 314-427 | Assessment prompt and context building |
| task_assessment.py | 429-518 | Response parsing and heuristics |
| task_assessment.py | 520-584 | assess_task_complexity() main function |
| task_assessment.py | 587-726 | get_quick_assessment() heuristic path |

See also: Architecture-Context-System | Architecture-Core-Engine | Data-Flow-02-ReAct-Loop