Architecture: RAG Implementation
Infrastructure - Vector Search with Qdrant and Pluggable Embeddings
Overview
The RAG (Retrieval-Augmented Generation) system provides semantic search over assistant knowledge:
- Qdrant vector database - Stores embeddings for semantic similarity search
- Pluggable embedding providers - FastEmbed, OpenAI, or Ollama
- Three collections - Journal, session memory, projects
- Resilience - Retry with exponential backoff, circuit breaker integration
This expands on the high-level overview in Architecture-Memory-and-Sleep.
1. Embedding Providers
Three pluggable backends eliminate the heavy torch/sentence-transformers dependency (rag/embeddings.py).
Provider Comparison
| Provider | Install Size | Token Required | Batch Support |
|---|---|---|---|
| FastEmbed | ~100MB (ONNX) | No | Yes |
| OpenAI | ~1MB (requests) | Cloud: Yes, Local: No | Yes |
| Ollama | ~1MB (requests) | No | No (sequential) |
EmbeddingProvider ABC
All providers implement the same interface:
```python
class EmbeddingProvider(ABC):
    @property
    def dimensions(self) -> int: ...

    @property
    def model_name(self) -> str: ...

    def embed(self, text: str) -> List[float]: ...

    def embed_batch(self, texts: List[str]) -> List[List[float]]: ...
```
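To illustrate the shape of the interface, here is a toy provider that returns deterministic pseudo-vectors derived from a hash. This is a hypothetical sketch for testing and experimentation only: the vectors carry no semantic meaning, and the class name is not part of the package.

```python
import hashlib
from typing import List


class HashEmbeddingProvider:
    """Toy provider matching the EmbeddingProvider interface.

    Produces deterministic pseudo-vectors from a SHA-256 digest;
    useful as a stand-in for tests, useless for real similarity search.
    """

    @property
    def dimensions(self) -> int:
        return 8

    @property
    def model_name(self) -> str:
        return "hash-toy-8d"

    def embed(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map the first `dimensions` bytes into [0, 1).
        return [b / 256.0 for b in digest[: self.dimensions]]

    def embed_batch(self, texts: List[str]) -> List[List[float]]:
        return [self.embed(t) for t in texts]
```

Any object with these four members can be passed wherever the real providers are accepted, which makes unit testing possible without network access or model downloads.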
Auto-Detection
The factory function maps the configured LLM provider to a matching embedding provider:
| LLM Provider | Embedding Provider | Rationale |
|---|---|---|
| openai | openai | Same API, reuses token |
| anthropic | fastembed | Anthropic has no embedding API |
| openrouter | fastembed | OpenRouter only proxies chat |
| ollama | ollama | Same server, /api/embeddings |
| local | openai | Local servers are OpenAI-compatible |
```python
from evennia.contrib.base_systems.ai.rag.embeddings import get_embedding_provider

# Auto-detect from LLM provider
provider = get_embedding_provider(llm_provider="anthropic")  # -> FastEmbed

# Explicit with custom URL
provider = get_embedding_provider(
    embedding_provider="openai",
    embedding_url="http://my-server/v1/embeddings",
    embedding_token="optional-token",
)
```
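The auto-detection table above amounts to a simple lookup. A sketch of that logic (the fallback choice for unknown providers is an assumption, not confirmed behavior of the factory):

```python
# Hypothetical sketch of the LLM-to-embedding mapping described above.
LLM_TO_EMBEDDING = {
    "openai": "openai",        # same API, reuses token
    "anthropic": "fastembed",  # Anthropic has no embedding API
    "openrouter": "fastembed", # OpenRouter only proxies chat
    "ollama": "ollama",        # same server, /api/embeddings
    "local": "openai",         # local servers are OpenAI-compatible
}


def detect_embedding_provider(llm_provider: str) -> str:
    # Assumed fallback: the dependency-light local option.
    return LLM_TO_EMBEDDING.get(llm_provider, "fastembed")
```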
Model Dimensions
| Provider | Default Model | Dimensions |
|---|---|---|
| FastEmbed | all-MiniLM-L6-v2 | 384 |
| OpenAI | text-embedding-3-small | 1536 |
| Ollama | nomic-embed-text | 768 |
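Because the three defaults produce vectors of different sizes, a collection created with one provider cannot be searched with another. A hypothetical guard for this general constraint (the helper name and error message are illustrative, not part of the package):

```python
def check_dimensions(provider, collection_dim: int) -> None:
    """Fail fast if a provider's vector size doesn't match the collection.

    Comparing vectors of different dimensionality is meaningless, so
    switching providers generally requires re-indexing the collection.
    """
    if provider.dimensions != collection_dim:
        raise ValueError(
            f"{provider.model_name} produces {provider.dimensions}-d vectors, "
            f"but the collection stores {collection_dim}-d vectors"
        )
```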
2. QdrantRAGClient
Main client class for vector operations (rag/qdrant_client.py).
Initialization
```python
from evennia.contrib.base_systems.ai.rag import QdrantRAGClient
from evennia.contrib.base_systems.ai.rag.embeddings import get_embedding_provider

provider = get_embedding_provider(llm_provider="openai", llm_token="sk-...")

client = QdrantRAGClient(
    host="localhost",             # Hostname or IP
    port=6333,                    # HTTP API port
    grpc_port=6334,               # Optional gRPC port (better performance)
    api_key=None,                 # Optional authentication
    use_tls=False,                # HTTPS/TLS
    timeout=10,                   # Connection timeout
    embedding_provider=provider,  # Required
    retry_config=RetryConfig(),   # Optional retry settings
    circuit_breaker=breaker,      # Optional circuit breaker
)
```
Deployment Support
Works with multiple Qdrant deployments:
- Docker (local development)
- Kubernetes (production)
- Qdrant Cloud (managed service)
- Direct binary (bare metal)
3. Collections
Three collections are auto-created on initialization:
| Collection | Purpose | Payload Fields |
|---|---|---|
| journal | Personal journal entries | content, tags, related_projects, timestamp |
| session_memory | Facts and patterns | type, content |
| projects | Project contexts | project_key, summary, full_context |
4. Journal Operations
add_journal_entry()
```python
client.add_journal_entry(
    entry_id=42,
    content="Today I learned about the market district...",
    tags=["exploration", "learning"],
    related_projects=["city_knowledge"],
    timestamp="2025-12-06T10:00:00Z",
)
```
search_journal()
```python
results = client.search_journal(
    query="market district merchants",
    tags=["exploration"],  # Optional tag filter
    days_back=7,           # Optional time filter
    limit=5,
)
# Returns: [{"id", "score", "content", "tags", "timestamp"}, ...]
```
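Since each hit carries a similarity score, callers often want to drop weak matches before handing results to the LLM. A hypothetical post-processing helper (the 0.3 threshold is an arbitrary example, not a value used by the package):

```python
def top_relevant(results, min_score: float = 0.3):
    """Keep only hits at or above a similarity threshold, highest first."""
    return sorted(
        (r for r in results if r["score"] >= min_score),
        key=lambda r: r["score"],
        reverse=True,
    )
```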
delete_journal_entry()
```python
client.delete_journal_entry(entry_id=42)
```
5. Session Memory Operations
add_memory()
```python
client.add_memory(
    memory_id=1,
    memory_type="fact",  # "fact" or "pattern"
    content="The blacksmith opens at dawn",
)
```
search_memory()
```python
results = client.search_memory(
    query="blacksmith schedule",
    memory_type="fact",  # Optional type filter
    limit=5,
)
# Returns: [{"id", "score", "type", "content"}, ...]
```
6. Project Operations
add_project_context()
```python
client.add_project_context(
    project_key="tavern_renovation",
    summary="Renovating the old tavern",
    full_context="Detailed context for embedding...",
)
```
search_projects()
```python
results = client.search_projects(
    query="building renovation",
    limit=3,
)
# Returns: [{"project_key", "score", "summary"}, ...]
```
7. Resilience
Retry with Backoff
All operations use exponential backoff with jitter:
```python
from evennia.contrib.base_systems.ai.llm.responses import RetryConfig

retry_config = RetryConfig(
    max_attempts=3,
    backoff_base=0.5,
    backoff_max=5.0,
)
```
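The exact backoff schedule lives inside the retry wrapper, but the general scheme implied by these settings can be sketched as follows. The jitter factor here is an assumption for illustration; the actual implementation may jitter differently.

```python
import random


def backoff_delays(max_attempts: int = 3, backoff_base: float = 0.5,
                   backoff_max: float = 5.0) -> list:
    """Sketch of exponential backoff with jitter.

    Delay before retry N is base * 2**N, capped at backoff_max,
    then scaled by a random jitter factor to avoid thundering herds.
    """
    delays = []
    for attempt in range(max_attempts - 1):  # no delay after the final attempt
        delay = min(backoff_base * (2 ** attempt), backoff_max)
        delays.append(delay * random.uniform(0.5, 1.0))  # assumed jitter range
    return delays
```

With the defaults above, the undithered delays would be 0.5s and 1.0s between the three attempts, never exceeding the 5.0s cap.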
Circuit Breaker Integration
Read operations return empty results when the circuit is open:

```python
# Graceful degradation - returns [] instead of raising
results = client.search_journal("query")  # [] if circuit open
```

Write operations raise CircuitOpenError when the circuit is open.
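The asymmetry is deliberate: a missed read degrades quality, while a silently dropped write loses data. The policy can be sketched as follows (the helper names are hypothetical; only the CircuitOpenError name comes from the source):

```python
class CircuitOpenError(RuntimeError):
    """Raised for writes attempted while the circuit is open."""


def guarded_search(circuit_open: bool, do_search):
    # Reads degrade gracefully: empty results instead of an exception,
    # so the assistant keeps functioning without retrieval context.
    if circuit_open:
        return []
    return do_search()


def guarded_write(circuit_open: bool, do_write):
    # Writes fail loudly so callers know the data was not persisted
    # and can queue it for later instead of losing it.
    if circuit_open:
        raise CircuitOpenError("Qdrant circuit is open")
    return do_write()
```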
Health Check
```python
health = client.health_check()
# {
#     "healthy": True,
#     "error": None,
#     "info": {
#         "host": "localhost",
#         "port": 6333,
#         "tls": False,
#         "collections": 3,
#         "circuit_breaker": {...},
#     },
# }
```
8. Utility Methods
get_collection_info()
```python
info = client.get_collection_info()
# {"journal": 150, "session_memory": 45, "projects": 3}
```
rebuild_from_data()
Re-index all embeddings from existing data:
```python
stats = client.rebuild_from_data(
    journal_entries=[...],
    session_memory={"facts": [...], "patterns": [...]},
    projects={...},
)
# {"journal": 150, "session_memory": 45, "projects": 3, "errors": 0}
```
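Conceptually, a rebuild walks all three data sources, re-embeds each item, and tallies per-collection counts plus failures. A simplified sketch of that control flow (index_fn is a hypothetical stand-in for the embed-and-upsert step; the real method's internals are not shown in the source):

```python
def rebuild_stats(journal_entries, session_memory, projects, index_fn):
    """Sketch of a rebuild loop: re-index everything, counting errors."""
    stats = {"journal": 0, "session_memory": 0, "projects": 0, "errors": 0}
    for entry in journal_entries:
        try:
            index_fn("journal", entry)
            stats["journal"] += 1
        except Exception:
            stats["errors"] += 1
    for kind in ("facts", "patterns"):
        for item in session_memory.get(kind, []):
            try:
                index_fn("session_memory", item)
                stats["session_memory"] += 1
            except Exception:
                stats["errors"] += 1
    for key, context in projects.items():
        try:
            index_fn("projects", (key, context))
            stats["projects"] += 1
        except Exception:
            stats["errors"] += 1
    return stats
```

Counting errors instead of aborting lets a rebuild over hundreds of entries survive a few malformed records and report them at the end.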
Key Files
| File | Lines | Purpose |
|---|---|---|
| rag/__init__.py | 1-53 | Package exports, usage examples |
| rag/embeddings.py | 97-143 | EmbeddingProvider ABC |
| rag/embeddings.py | 146-204 | FastEmbedProvider |
| rag/embeddings.py | 206-310 | OpenAIEmbeddingProvider |
| rag/embeddings.py | 312-389 | OllamaEmbeddingProvider |
| rag/embeddings.py | 411-532 | get_embedding_provider() factory |
| rag/qdrant_client.py | 62-184 | QdrantRAGClient.__init__() |
| rag/qdrant_client.py | 185-250 | Retry wrapper _with_retry() |
| rag/qdrant_client.py | 280-393 | Journal operations |
| rag/qdrant_client.py | 394-476 | Session memory operations |
| rag/qdrant_client.py | 477-552 | Project operations |
| rag/qdrant_client.py | 555-682 | Utility methods |
See also: Architecture-Memory-and-Sleep | Architecture-Resilience-System | Architecture-LLM-Providers