[P0] Fix Tool Timeout Race Condition #6

Closed
opened 2025-12-05 13:48:52 +00:00 by blightbow · 1 comment
Owner

Problem

In tool_execution.py:199-220, if tool.execute() takes longer than the timeout, both callbacks fire. This can cause:

  • Double-callback errors
  • Inconsistent state

Suggested Fix

Use DeferredList with proper cancellation instead of dual callbacks.

Priority

P0 — Immediate (Blocks Production)

Source

Architecture Audit 2025-12-03, Section 1: Async/Concurrency Issues

## Problem In `tool_execution.py:199-220`, if `tool.execute()` takes longer than the timeout, both callbacks fire. This can cause: - Double-callback errors - Inconsistent state ## Suggested Fix Use `DeferredList` with proper cancellation instead of dual callbacks. ## Priority **P0 — Immediate (Blocks Production)** ## Source Architecture Audit 2025-12-03, Section 1: Async/Concurrency Issues
Author
Owner

Resolution

Replaced the buggy dual-callback timeout pattern with Twisted's built-in addTimeout().

The Bug

The original code (lines 199-220):

  1. Created a Deferred d
  2. Called tool.execute() synchronously and immediately fired d.callback(result)
  3. Set up reactor.callLater() to call d.callback() on timeout
  4. Waited with yield d

Problems:

  • For @inlineCallbacks tools (memory, delegation), result was a Deferred object, not the actual result
  • d.callback(deferred_object) fired d with wrong value
  • result.get("success") would fail since result was a Deferred
  • Potential AlreadyCalledError if somehow both paths fired

The Fix

tool_execution.py (lines 200-215):

# Execute tool with proper async handling and timeout
# maybeDeferred handles both sync and @inlineCallbacks tools
tool_d = defer.maybeDeferred(tool.execute, character, **filtered_params)

# Add timeout with proper cancellation
def on_timeout_cancel(failure, timeout_value):
    return {"error": f"Tool execution timed out after {timeout_value}s", "success": False}

tool_d.addTimeout(timeout, reactor, onTimeoutCancel=on_timeout_cancel)

try:
    result = yield tool_d
except Exception as e:
    error_msg = sanitize_error_for_response(e, script, f"Tool {tool_name}")
    result = {"error": error_msg, "success": False}

Benefits:

  • maybeDeferred() handles both sync and async tools correctly
  • addTimeout() uses Twisted's built-in cancellation (no race condition)
  • Single code path for all tool types
  • Proper exception handling

Tests Added

test_tool_execution.py (9 tests):

  • TestMaybeDeferredPattern - sync/async/exception handling
  • TestAddTimeoutPattern - fast completion, timeout, async tool pattern
  • TestNoDoubleCallback - verify no AlreadyCalledError
  • TestToolExecutionSourceVerification - verify fix in source code
## Resolution Replaced the buggy dual-callback timeout pattern with Twisted's built-in `addTimeout()`. ### The Bug The original code (lines 199-220): 1. Created a Deferred `d` 2. Called `tool.execute()` synchronously and immediately fired `d.callback(result)` 3. Set up `reactor.callLater()` to call `d.callback()` on timeout 4. Waited with `yield d` **Problems:** - For `@inlineCallbacks` tools (memory, delegation), `result` was a Deferred object, not the actual result - `d.callback(deferred_object)` fired `d` with wrong value - `result.get("success")` would fail since result was a Deferred - Potential `AlreadyCalledError` if somehow both paths fired ### The Fix **tool_execution.py** (lines 200-215): ```python # Execute tool with proper async handling and timeout # maybeDeferred handles both sync and @inlineCallbacks tools tool_d = defer.maybeDeferred(tool.execute, character, **filtered_params) # Add timeout with proper cancellation def on_timeout_cancel(failure, timeout_value): return {"error": f"Tool execution timed out after {timeout_value}s", "success": False} tool_d.addTimeout(timeout, reactor, onTimeoutCancel=on_timeout_cancel) try: result = yield tool_d except Exception as e: error_msg = sanitize_error_for_response(e, script, f"Tool {tool_name}") result = {"error": error_msg, "success": False} ``` **Benefits:** - `maybeDeferred()` handles both sync and async tools correctly - `addTimeout()` uses Twisted's built-in cancellation (no race condition) - Single code path for all tool types - Proper exception handling ### Tests Added **test_tool_execution.py** (9 tests): - `TestMaybeDeferredPattern` - sync/async/exception handling - `TestAddTimeoutPattern` - fast completion, timeout, async tool pattern - `TestNoDoubleCallback` - verify no `AlreadyCalledError` - `TestToolExecutionSourceVerification` - verify fix in source code
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
blightbow/evennia_ai#6
No description provided.