Architecture Task Assessment
blightbow edited this page 2025-12-08 04:13:18 +00:00

Architecture: Task Assessment

Layer 3 - LLM-Driven Complexity Evaluation


Overview

Task assessment is the third layer of the three-layer execution pattern system. It provides runtime complexity evaluation that can only restrict (never expand) what Layers 1 and 2 allow:

Layer 1 (Static Config) ⊇ Layer 2 (Context Config) ⊇ Layer 3 (LLM Assessment)

Features:

  • LLM-driven assessment - Full complexity analysis via prompt
  • Heuristic fallback - Fast path without LLM call
  • Event classification - Categorizes pending events
  • Context signals - Token pressure, errors, goal urgency

1. Assessment Output

Both LLM and heuristic assessment return the same schema:

```python
{
    "complexity_score": int,          # 1-5 scale
    "recommended_mode": str,          # "single_action" | "react_loop" | None
    "recommended_iterations": int,    # Max iterations for react_loop
    "recommend_confirm_dangerous": bool,
    "reasoning": str,                 # Explanation
    "context_signals": dict,          # Runtime signals (heuristic only)
}
```
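For concreteness, here is a filled-in example of this schema as the heuristic path might produce it (values are illustrative, not output of the actual contrib):

```python
# Illustrative example of an assessment dict for a query-type event.
assessment = {
    "complexity_score": 2,
    "recommended_mode": "react_loop",
    "recommended_iterations": 2,
    "recommend_confirm_dangerous": False,
    "reasoning": "Query event: short multi-step lookup",
    "context_signals": {"event_class": "query", "token_pressure": "low"},
}
```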

Complexity Score Mapping

| Score | Meaning | Default Mode | Iterations |
|-------|---------|--------------|------------|
| 1-2 | Simple task | single_action | 1 |
| 3-4 | Moderate complexity | react_loop | 3-5 |
| 5 | Complex multi-step | react_loop | max |
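The mapping above can be sketched as a small helper. This is a hypothetical function for illustration only (`defaults_for_score` and `max_iterations` are not part of the contrib's API):

```python
# Hypothetical helper showing the score-to-mode mapping from the table above.
def defaults_for_score(score, max_iterations=10):
    """Map a 1-5 complexity score to (mode, iterations)."""
    if score <= 2:
        return "single_action", 1          # simple task
    if score <= 4:
        # Moderate complexity: 3 iterations at score 3, 5 at score 4
        return "react_loop", 3 if score == 3 else 5
    return "react_loop", max_iterations    # complex multi-step
```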

2. Event Classification

Events are classified to inform pattern selection (task_assessment.py).

Classification Categories

| Category | Keywords | Event Types | Pattern Impact |
|----------|----------|-------------|----------------|
| communication | say, page, whisper, reply | say, page, whisper, channel | single_action (await response) |
| building | spawn, create, build, destroy | build, spawn, create | react_loop + confirm_dangerous |
| observation | look, examine, inspect | look, examine | single_action |
| movement | go, move, walk, enter | move, travel | single_action |
| query | what, who, where, how, ? | query, help | react_loop (2 iterations) |
| goal_action | goal, task, complete | goal, task | react_loop |
| unknown | — | — | Default handling |

Classification Function

```python
from evennia.contrib.base_systems.ai.task_assessment import classify_event_content

event = {"type": "say", "message": "Hello there!"}
category = classify_event_content(event)  # → "communication"
```

For multiple events:

```python
from evennia.contrib.base_systems.ai.task_assessment import classify_events

events = [event1, event2, event3]
dominant_class, class_counts = classify_events(events)
# dominant_class = "communication"
# class_counts = {"communication": 2, "query": 1}
```

3. Context Signals

Runtime context information for pattern selection.

```python
from evennia.contrib.base_systems.ai.task_assessment import build_context_signals

signals = build_context_signals(script)
# {
#     "event_class": "communication",
#     "event_class_counts": {"communication": 2},
#     "token_pressure": "medium",
#     "token_usage_pct": 45.2,
#     "recent_errors": 0,
#     "goal_urgency": "high",
# }
```

Token Pressure Levels

| Level | Usage | Effect |
|-------|-------|--------|
| low | < 40% | Normal operation |
| medium | 40-60% | No restrictions |
| high | 60-80% | Reduced iterations |
| critical | > 80% | Forced single_action |
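The thresholds above can be expressed as a small classifier. This is a sketch; `token_pressure` here is a hypothetical helper, not the contrib's API:

```python
# Hypothetical mapping of token usage percentage to a pressure level,
# using the thresholds from the table above.
def token_pressure(usage_pct):
    if usage_pct < 40:
        return "low"        # normal operation
    if usage_pct < 60:
        return "medium"     # no restrictions yet
    if usage_pct <= 80:
        return "high"       # reduced iterations
    return "critical"       # forced single_action
```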

Error Recovery

| Consecutive Errors | Effect |
|--------------------|--------|
| 2+ | Reduced iterations |
| 3+ | Forced single_action + confirm_dangerous |

Goal Urgency

| Priority | Effect |
|----------|--------|
| critical, high | +1 iteration (if not under pressure) |
| medium, low, none | No modifier |
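Taken together, the three signal tables restrict a base recommendation roughly like this. This is an illustrative sketch of the adjustment logic, not the contrib's actual code:

```python
# Illustrative sketch: apply token pressure, error, and urgency
# adjustments from the tables above to a base (mode, iterations) pair.
def apply_signal_adjustments(mode, iterations, signals):
    pressure = signals.get("token_pressure", "low")
    errors = signals.get("recent_errors", 0)
    urgency = signals.get("goal_urgency", "none")

    if pressure == "high":
        iterations = max(1, iterations - 1)        # high pressure: reduce iterations
    if pressure == "critical" or errors >= 3:
        mode, iterations = "single_action", 1      # forced single_action
    elif errors >= 2:
        iterations = max(1, iterations - 1)        # 2+ errors: reduce iterations
    if urgency in ("critical", "high") and pressure == "low" and errors == 0:
        iterations += 1                            # urgency bonus only when not under pressure
    return mode, iterations
```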

4. LLM Assessment

Full complexity analysis via LLM prompt.

Usage

```python
from twisted.internet.defer import inlineCallbacks

from evennia.contrib.base_systems.ai.prompt_contexts import select_execution_pattern
from evennia.contrib.base_systems.ai.task_assessment import assess_task_complexity

@inlineCallbacks
def example():
    # Only runs if execution_config.task_assessment_enabled = True
    assessment = yield assess_task_complexity(script)

    if assessment:
        # Use with select_execution_pattern()
        pattern = select_execution_pattern(static, context, assessment)
```

Assessment Prompt

The LLM receives:

  • Operating mode, token usage %
  • Recent tools used
  • Conversation history size
  • Pending events (up to 5)
  • Current goals (up to 5)

And returns JSON with complexity_score (1-5) plus recommendations.
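A defensive parser for that JSON reply might look like the following. This is a sketch only; the contrib's own response parsing in task_assessment.py may differ:

```python
import json

# Sketch: extract and sanity-check the assessment JSON from an LLM reply,
# tolerating prose around the JSON object.
def parse_assessment(raw):
    try:
        # Slice from the first "{" to the last "}" to skip surrounding prose
        start, end = raw.index("{"), raw.rindex("}") + 1
        data = json.loads(raw[start:end])
    except (ValueError, json.JSONDecodeError):
        return None  # caller falls back to the heuristic path
    score = data.get("complexity_score")
    if not isinstance(score, int) or not 1 <= score <= 5:
        return None  # reject out-of-range or missing scores
    return data
```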

Enabling LLM Assessment

```python
script.db.execution_config = {
    "task_assessment_enabled": True,  # Enable Layer 3
    ...
}
```

5. Heuristic Assessment

Fast path without LLM call.

Usage

```python
from evennia.contrib.base_systems.ai.task_assessment import (
    get_quick_assessment,
    get_quick_assessment_for_script,
)

# Direct call with events/goals
assessment = get_quick_assessment(pending_events, current_goals, context_signals)

# Convenience wrapper using script
assessment = get_quick_assessment_for_script(script)
```

Heuristic Rules

| Condition | Complexity | Mode | Iterations |
|-----------|------------|------|------------|
| Communication event | 1 | single_action | 1 |
| Building event | 4 | react_loop | 3 |
| Observation event | 1 | single_action | 1 |
| Query event | 2 | react_loop | 2 |
| Autonomous (no events/goals) | 1 | single_action | 1 |
| Goal pursuit | 3+ | react_loop | goals+2 |
| Default events | 2+ | varies | events+1 |

Context signal adjustments are applied after base pattern selection.
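The base-rule table can be approximated as follows. This is an illustrative reimplementation; the real `get_quick_assessment()` in task_assessment.py is more involved and also folds in context signals afterward:

```python
# Illustrative sketch of the base-rule table above (before signal adjustments).
def base_rule(event_class, n_events, n_goals):
    """Return (complexity, mode, iterations) for a classified tick."""
    table = {
        "communication": (1, "single_action", 1),
        "building": (4, "react_loop", 3),
        "observation": (1, "single_action", 1),
        "query": (2, "react_loop", 2),
    }
    if event_class in table:
        return table[event_class]
    if n_events == 0 and n_goals == 0:
        return (1, "single_action", 1)                       # autonomous idle tick
    if n_goals > 0:
        return (max(3, n_goals), "react_loop", n_goals + 2)  # goal pursuit
    return (2, "react_loop", n_events + 1)                   # default events
```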


6. Assessment Flow

```
Tick Event
    │
    ▼
Check: task_assessment_enabled?
    │
    ├── YES ─────────────────────────────────────┐
    │                                            │
    │   assess_task_complexity(script)           │
    │       │                                    │
    │       ├── Build context summary            │
    │       ├── Format assessment prompt         │
    │       ├── Call LLM                         │
    │       ├── Parse JSON response              │
    │       └── Apply complexity heuristics      │
    │                                            │
    │   Returns: assessment dict                 │
    │                                            │
    └── NO ──────────────────────────────────────┐
                                                 │
    get_quick_assessment_for_script(script)      │
        │                                        │
        ├── build_context_signals()              │
        ├── classify_events()                    │
        ├── Apply base rules per event_class     │
        └── Apply context signal adjustments     │
                                                 │
    Returns: assessment dict                     │
                                                 │
    ▼
select_execution_pattern(static, context, assessment)
    │
    ▼
Execute with selected pattern
```
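The dispatch above reduces to a simple branch. This synchronous sketch injects the two assessment callables for clarity; the contrib itself wires this up with Twisted's `inlineCallbacks`:

```python
# Sketch of the assessment dispatch: LLM path when enabled (with the
# heuristic path as fallback), heuristic-only path otherwise.
def choose_assessment(config, llm_assess, quick_assess):
    if config.get("task_assessment_enabled"):
        assessment = llm_assess()       # full LLM path (Layer 3)
        if assessment is not None:
            return assessment
    return quick_assess()               # heuristic fast path / fallback
```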

7. Integration with Execution Patterns

Assessment feeds into pattern selection:

```python
from evennia.contrib.base_systems.ai.prompt_contexts import select_execution_pattern

# Layer 1: Static config (from execution_config)
# Layer 2: Context config (from context type)
# Layer 3: Assessment (from LLM or heuristics)

effective_pattern = select_execution_pattern(
    static_config,
    context_config,
    assessment  # Can only restrict, never expand
)
```
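The "restrict, never expand" rule can be illustrated with a minimal merge over iteration caps. This uses a hypothetical helper for illustration; `select_execution_pattern()` is the real entry point:

```python
# Hypothetical merge: Layer 3 may lower the iteration cap or force
# single_action, but can never raise the Layer 1/2 limits.
def merge_layers(static_iters, context_iters, assessment):
    cap = min(static_iters, context_iters)        # Layers 1 and 2
    if assessment:
        rec = assessment.get("recommended_iterations") or cap
        cap = min(cap, rec)                       # restrict only
        if assessment.get("recommended_mode") == "single_action":
            cap = 1
    return cap
```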

Key Files

| File | Lines | Purpose |
|------|-------|---------|
| task_assessment.py | 44-114 | Event classification |
| task_assessment.py | 162-310 | Context signal calculation |
| task_assessment.py | 314-427 | Assessment prompt and context building |
| task_assessment.py | 429-518 | Response parsing and heuristics |
| task_assessment.py | 520-584 | assess_task_complexity() main function |
| task_assessment.py | 587-726 | get_quick_assessment() heuristic path |

See also: Architecture-Context-System | Architecture-Core-Engine | Data-Flow-02-ReAct-Loop