[P2] Improve Error Categorization #15

Closed
opened 2025-12-05 13:49:34 +00:00 by blightbow · 1 comment
Owner

Problem

Error handling doesn't distinguish between error types:

  • Auth errors (should fail fast)
  • Transient errors (should retry)
  • Business logic errors (should report)

Suggested Fix

Implement error classification to enable appropriate handling strategies for each error type.

Priority

P2 — Medium Priority

Source

Architecture Audit 2025-12-03, Priority Recommendations

Author
Owner

Implementation complete in commit 5ea5bf953.

Changes:

  • Created errors.py with ErrorCategory enum (AUTH, TRANSIENT, BUSINESS_LOGIC, UNKNOWN)
  • Added classification functions: classify_http_error(), classify_exception(), should_retry(), should_trip_circuit_breaker()
  • Updated llm/client.py to use error classification for retry and circuit breaker decisions
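For readers without access to the commit, here is a minimal sketch of what such an errors.py could look like. The enum members and function names come from the list above; the signatures, the bodies, and the handling of UNKNOWN (no retry, no breaker trip) are assumptions rather than the committed code. The status-code and exception mapping follows the key behavior changes described in this comment, and anything not named there falls through to UNKNOWN.

```python
from enum import Enum, auto


class ErrorCategory(Enum):
    AUTH = auto()
    TRANSIENT = auto()
    BUSINESS_LOGIC = auto()
    UNKNOWN = auto()


def classify_http_error(status_code: int) -> ErrorCategory:
    """Map an HTTP status code to a category (signature is an assumption)."""
    if status_code in (401, 403):
        return ErrorCategory.AUTH
    if status_code == 429 or 500 <= status_code <= 599:
        return ErrorCategory.TRANSIENT
    if status_code in (400, 404):
        return ErrorCategory.BUSINESS_LOGIC
    # Assumption: codes the issue does not mention are left unclassified.
    return ErrorCategory.UNKNOWN


def classify_exception(exc: BaseException) -> ErrorCategory:
    """Map a Python exception to a category (signature is an assumption)."""
    if isinstance(exc, TimeoutError):
        return ErrorCategory.TRANSIENT
    if isinstance(exc, ValueError):
        return ErrorCategory.BUSINESS_LOGIC
    return ErrorCategory.UNKNOWN


def should_retry(category: ErrorCategory) -> bool:
    """Only transient errors are retried; UNKNOWN is assumed non-retryable."""
    return category is ErrorCategory.TRANSIENT


def should_trip_circuit_breaker(category: ErrorCategory) -> bool:
    """Only transient errors count against the circuit breaker."""
    return category is ErrorCategory.TRANSIENT


category = classify_http_error(503)
print(category.name, should_retry(category))  # TRANSIENT True
```

Keeping should_retry() and should_trip_circuit_breaker() as separate functions, even though they agree in this sketch, leaves room for the two policies to diverge later.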

Key behavior changes:

  • Auth errors (401, 403): Fail fast, no retries, don't trip circuit breaker
  • Transient errors (429, 5xx, timeouts): Retry with backoff, trip circuit breaker
  • Business logic errors (400, 404, ValueError, etc.): Fail fast, don't trip circuit breaker
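The behavior listed above can be sketched as a retry wrapper. This is an illustration under stated assumptions, not the code in llm/client.py: HttpError, CircuitBreaker, classify(), and call_with_policy() are hypothetical names, the breaker counts each failed attempt, and the backoff is a plain exponential schedule with no jitter.

```python
import time
from enum import Enum, auto


class ErrorCategory(Enum):
    AUTH = auto()
    TRANSIENT = auto()
    BUSINESS_LOGIC = auto()
    UNKNOWN = auto()


class HttpError(Exception):
    """Hypothetical exception carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def classify(exc: Exception) -> ErrorCategory:
    """Condensed, hypothetical classifier following the list above."""
    if isinstance(exc, HttpError):
        if exc.status in (401, 403):
            return ErrorCategory.AUTH
        if exc.status == 429 or 500 <= exc.status <= 599:
            return ErrorCategory.TRANSIENT
        if exc.status in (400, 404):
            return ErrorCategory.BUSINESS_LOGIC
        return ErrorCategory.UNKNOWN
    if isinstance(exc, TimeoutError):
        return ErrorCategory.TRANSIENT
    if isinstance(exc, ValueError):
        return ErrorCategory.BUSINESS_LOGIC
    return ErrorCategory.UNKNOWN


class CircuitBreaker:
    """Hypothetical breaker: open once `threshold` failures are recorded."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def record_failure(self) -> None:
        self.failures += 1

    def record_success(self) -> None:
        self.failures = 0


def call_with_policy(func, breaker, max_attempts=3, base_delay=0.5,
                     sleep=time.sleep):
    """Call func(), retrying only transient errors with exponential backoff."""
    if breaker.is_open:
        raise RuntimeError("circuit breaker is open")
    for attempt in range(1, max_attempts + 1):
        try:
            result = func()
        except Exception as exc:
            if classify(exc) is not ErrorCategory.TRANSIENT:
                raise  # auth, business logic, unknown: fail fast, breaker untouched
            breaker.record_failure()
            if attempt == max_attempts or breaker.is_open:
                raise  # retries exhausted or breaker just opened
            sleep(base_delay * 2 ** (attempt - 1))
        else:
            breaker.record_success()
            return result
```

With this shape, a call that raises HttpError(401) propagates after a single attempt and leaves breaker.failures at 0, while repeated HttpError(503) responses are retried and eventually open the breaker.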

Impact: Auth failures no longer trip the circuit breaker, so they no longer make the service appear unavailable when it is not.

Tests: 30 tests added in test_error_classification.py covering classification, retry decisions, and LLM client integration.

Reference
blightbow/evennia_ai#15