
Architecture: RAG Implementation

Infrastructure - Vector Search with Qdrant and Pluggable Embeddings


Overview

The RAG (Retrieval-Augmented Generation) system provides semantic search over assistant knowledge:

  • Qdrant vector database - Stores embeddings for semantic similarity search
  • Pluggable embedding providers - FastEmbed, OpenAI, or Ollama
  • Three collections - Journal, session memory, projects
  • Resilience - Retry with exponential backoff, circuit breaker integration

This expands on the high-level overview in Architecture-Memory-and-Sleep.


1. Embedding Providers

Three pluggable backends avoid a hard dependency on the heavy torch/sentence-transformers stack (rag/embeddings.py).

Provider Comparison

Provider    Install Size      Token Required          Batch Support
FastEmbed   ~100MB (ONNX)     No                      Yes
OpenAI      ~1MB (requests)   Cloud: Yes, Local: No   Yes
Ollama      ~1MB (requests)   No                      No (sequential)

EmbeddingProvider ABC

All providers implement the same interface:

from abc import ABC
from typing import List

class EmbeddingProvider(ABC):
    @property
    def dimensions(self) -> int: ...       # Size of the vectors this model produces
    @property
    def model_name(self) -> str: ...       # Name of the underlying embedding model
    def embed(self, text: str) -> List[float]: ...                     # Single text -> vector
    def embed_batch(self, texts: List[str]) -> List[List[float]]: ...  # Many texts -> vectors

Auto-Detection

The get_embedding_provider() factory maps the configured LLM provider to a matching embedding provider:

LLM Provider   Embedding Provider   Rationale
openai         openai               Same API, reuses token
anthropic      fastembed            Anthropic has no embedding API
openrouter     fastembed            OpenRouter only proxies chat
ollama         ollama               Same server, /api/embeddings
local          openai               Local servers are OpenAI-compatible

from evennia.contrib.base_systems.ai.rag.embeddings import get_embedding_provider

# Auto-detect from LLM provider
provider = get_embedding_provider(llm_provider="anthropic")  # -> FastEmbed

# Explicit with custom URL
provider = get_embedding_provider(
    embedding_provider="openai",
    embedding_url="http://my-server/v1/embeddings",
    embedding_token="optional-token",
)

Model Dimensions

Provider    Default Model            Dimensions
FastEmbed   all-MiniLM-L6-v2         384
OpenAI      text-embedding-3-small   1536
Ollama      nomic-embed-text         768
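
Because each backend produces vectors of a different size, the Qdrant collections must be created with a matching dimension, and switching embedding providers generally requires re-indexing (see rebuild_from_data() below). A minimal sanity check, using only the interface documented above:

# Sketch only: verify that the provider's declared dimensions match its output.
provider = get_embedding_provider(llm_provider="ollama")  # nomic-embed-text, 768 dims
vector = provider.embed("sample text")
assert len(vector) == provider.dimensions  # 768 for Ollama's default model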

2. QdrantRAGClient

Main client class for vector operations (rag/qdrant_client.py).

Initialization

from evennia.contrib.base_systems.ai.rag import QdrantRAGClient
from evennia.contrib.base_systems.ai.rag.embeddings import get_embedding_provider

provider = get_embedding_provider(llm_provider="openai", llm_token="sk-...")

client = QdrantRAGClient(
    host="localhost",           # Hostname or IP
    port=6333,                  # HTTP API port
    grpc_port=6334,             # Optional gRPC port (better performance)
    api_key=None,               # Optional authentication
    use_tls=False,              # HTTPS/TLS
    timeout=10,                 # Connection timeout
    embedding_provider=provider, # Required
    retry_config=RetryConfig(), # Optional retry settings
    circuit_breaker=breaker,    # Optional circuit breaker
)

Deployment Support

Works with multiple Qdrant deployments:

  • Docker (local development)
  • Kubernetes (production)
  • Qdrant Cloud (managed service)
  • Direct binary (bare metal)
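
All four use the same constructor. As a hedged sketch, a Qdrant Cloud connection (hostname and key below are placeholders) simply reuses the TLS and API-key options shown above:

# Sketch only: managed Qdrant Cloud connection; host and api_key are placeholders.
client = QdrantRAGClient(
    host="xyz-example.cloud.qdrant.io",  # hypothetical cluster hostname
    port=6333,
    use_tls=True,                        # Qdrant Cloud uses HTTPS
    api_key="qdrant-cloud-api-key",      # placeholder credential
    embedding_provider=provider,
)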

3. Collections

Three collections are auto-created on initialization:

Collection       Purpose                    Payload Fields
journal          Personal journal entries   content, tags, related_projects, timestamp
session_memory   Facts and patterns         type, content
projects         Project contexts           project_key, summary, full_context
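
A minimal sketch of what this auto-creation amounts to, written against the qdrant-client library directly (QdrantRAGClient handles this internally; the cosine distance metric is an assumption):

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# Sketch only: create each missing collection with the embedding model's vector size.
raw = QdrantClient(host="localhost", port=6333)
existing = {c.name for c in raw.get_collections().collections}
for name in ("journal", "session_memory", "projects"):
    if name not in existing:
        raw.create_collection(
            collection_name=name,
            vectors_config=VectorParams(
                size=provider.dimensions,  # must match the embedding provider
                distance=Distance.COSINE,  # assumed similarity metric
            ),
        )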

4. Journal Operations

add_journal_entry()

client.add_journal_entry(
    entry_id=42,
    content="Today I learned about the market district...",
    tags=["exploration", "learning"],
    related_projects=["city_knowledge"],
    timestamp="2025-12-06T10:00:00Z",
)

search_journal()

results = client.search_journal(
    query="market district merchants",
    tags=["exploration"],      # Optional tag filter
    days_back=7,               # Optional time filter
    limit=5,
)
# Returns: [{"id", "score", "content", "tags", "timestamp"}, ...]
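
Results are ordered by similarity score. A hedged sketch of folding the top hits into prompt context (the 0.5 cutoff is an arbitrary example threshold):

# Sketch only: build a context block from the retrieved entries.
context = "\n".join(
    f"- {r['content']} (tags: {', '.join(r['tags'])})"
    for r in results
    if r["score"] >= 0.5  # arbitrary example cutoff
)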

delete_journal_entry()

client.delete_journal_entry(entry_id=42)

5. Session Memory Operations

add_memory()

client.add_memory(
    memory_id=1,
    memory_type="fact",        # "fact" or "pattern"
    content="The blacksmith opens at dawn",
)

search_memory()

results = client.search_memory(
    query="blacksmith schedule",
    memory_type="fact",        # Optional type filter
    limit=5,
)
# Returns: [{"id", "score", "type", "content"}, ...]

6. Project Operations

add_project_context()

client.add_project_context(
    project_key="tavern_renovation",
    summary="Renovating the old tavern",
    full_context="Detailed context for embedding...",
)

search_projects()

results = client.search_projects(
    query="building renovation",
    limit=3,
)
# Returns: [{"project_key", "score", "summary"}, ...]

7. Resilience

Retry with Backoff

All operations use exponential backoff with jitter:

from evennia.contrib.base_systems.ai.llm.responses import RetryConfig

retry_config = RetryConfig(
    max_attempts=3,
    backoff_base=0.5,
    backoff_max=5.0,
)
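
A hedged sketch of the delay schedule these settings imply, assuming RetryConfig exposes its settings as attributes: exponential growth from backoff_base, capped at backoff_max, multiplied by random jitter (the exact formula in _with_retry() may differ):

import random

# Sketch only: approximate delays between attempts for the config above.
def backoff_delays(cfg):
    for attempt in range(cfg.max_attempts - 1):  # no sleep after the final attempt
        delay = min(cfg.backoff_max, cfg.backoff_base * (2 ** attempt))
        yield delay * random.uniform(0.5, 1.5)   # jitter factor is an assumption

list(backoff_delays(retry_config))  # roughly [0.5, 1.0] seconds before jitter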

Circuit Breaker Integration

Read operations return empty results when circuit is open:

# Graceful degradation - returns [] instead of raising
results = client.search_journal("query")  # [] if circuit open

Write operations raise CircuitOpenError when circuit is open.
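
A hedged sketch of guarding a write; the import path for CircuitOpenError below is an assumption:

# Sketch only: the CircuitOpenError import path is assumed, not confirmed.
from evennia.contrib.base_systems.ai.llm.responses import CircuitOpenError

try:
    client.add_memory(memory_id=2, memory_type="fact", content="The gate closes at dusk")
except CircuitOpenError:
    # Qdrant is unavailable; queue the write for later or drop it and move on.
    pass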

Health Check

health = client.health_check()
# {
#     "healthy": True,
#     "error": None,
#     "info": {
#         "host": "localhost",
#         "port": 6333,
#         "tls": False,
#         "collections": 3,
#         "circuit_breaker": {...}
#     }
# }

8. Utility Methods

get_collection_info()

info = client.get_collection_info()
# {"journal": 150, "session_memory": 45, "projects": 3}

rebuild_from_data()

Re-index all embeddings from existing data:

stats = client.rebuild_from_data(
    journal_entries=[...],
    session_memory={"facts": [...], "patterns": [...]},
    projects={...},
)
# {"journal": 150, "session_memory": 45, "projects": 3, "errors": 0}

Key Files

File                   Lines     Purpose
rag/__init__.py        1-53      Package exports, usage examples
rag/embeddings.py      97-143    EmbeddingProvider ABC
rag/embeddings.py      146-204   FastEmbedProvider
rag/embeddings.py      206-310   OpenAIEmbeddingProvider
rag/embeddings.py      312-389   OllamaEmbeddingProvider
rag/embeddings.py      411-532   get_embedding_provider() factory
rag/qdrant_client.py   62-184    QdrantRAGClient.__init__()
rag/qdrant_client.py   185-250   Retry wrapper _with_retry()
rag/qdrant_client.py   280-393   Journal operations
rag/qdrant_client.py   394-476   Session memory operations
rag/qdrant_client.py   477-552   Project operations
rag/qdrant_client.py   555-682   Utility methods

See also: Architecture-Memory-and-Sleep | Architecture-Resilience-System | Architecture-LLM-Providers