Architecture: Persona Protection
Infrastructure - Preventing Perspective Bleed and Self-Reference Contamination
Overview
Persona protection prevents the assistant from storing facts about itself, which would otherwise contaminate long-term memory and degrade the persona over time. It is implemented as four defense layers at different points in the data pipeline.
Problem: When an LLM processes its own outputs or reflects on its actions, it naturally generates self-referential content ("I helped the player", "I should remember to..."). If stored in memory, this content:
- Pollutes semantic retrieval with irrelevant self-references
- Causes "perspective bleed" where the assistant confuses its identity with stored facts
- Degrades persona coherence over extended operation
Solution: Filter self-referential content at every point where facts are extracted or synthesized.
Research Foundations
| Source | Contribution | Implementation |
|---|---|---|
| MIRIX (arXiv:2507.07957) | Persona isolation in multi-agent memory | Extraction prompt constraints |
| O-MEM (arXiv:2511.13593) | Persona Attributes (Pa) vs Persona Facts (Pf) | Entity profile structure |
| Mem0 Custom Extraction | Configurable fact extraction prompts | ASSISTANT_FACT_EXTRACTION_PROMPT |
Defense Layer 1: Memory Extraction
Location: rag_memory.py:21-55
When journal entries are consolidated into Mem0 semantic memory, a custom extraction prompt restricts what is stored to objective facts about the world, users, and events.
```python
ASSISTANT_FACT_EXTRACTION_PROMPT = """
Extract only objective facts about the world, users, and events.
Use third-person, factual language focused on WORLD STATE, not assistant behavior.
NEVER extract:
- Assistant actions ("The assistant helped", "I did X")
- Assistant personality or identity information
- Opinions about the assistant's role or behavior
ALWAYS extract:
- World state facts ("The tavern is in the north wing")
- User/player information ("Player Alice prefers formal address")
- NPC behavior ("The guard is suspicious of strangers")
- Event outcomes ("Sword prices are 50 gold at the market")
"""
```
Examples:
| Input | Output |
|---|---|
| "I talked to the merchant about sword prices." | ["Sword prices discussed with merchant in market district"] |
| "The player asked me to fetch water from the well." | ["Player requested water from the town well"] |
| "I was really helpful today!" | [] (filtered) |
| "The assistant should be more careful." | [] (filtered) |
This prompt is passed to Mem0 as its custom fact extraction prompt. Filtering therefore happens at extraction time, before any fact is written to memory.
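Below is a minimal sketch of wiring the prompt into Mem0. It makes two assumptions:

- Mem0 accepts the prompt through a `custom_fact_extraction_prompt` configuration key.
- Mem0's extraction step parses an output object shaped like `{"facts": [...]}`.

Verify both against your installed Mem0 version. The helper names `render_prompt` and `build_mem0_config` are hypothetical, and the prompt text is abbreviated. The few-shot pairs are taken from the examples table above.

```python
import json

# Abbreviated stand-in for ASSISTANT_FACT_EXTRACTION_PROMPT (full text in rag_memory.py).
BASE_PROMPT = (
    "Extract only objective facts about the world, users, and events.\n"
    "Use third-person, factual language focused on WORLD STATE, not assistant behavior."
)

# Few-shot pairs from the examples table: input text -> facts to keep.
FEW_SHOT_EXAMPLES = [
    ("I talked to the merchant about sword prices.",
     ["Sword prices discussed with merchant in market district"]),
    ("The player asked me to fetch water from the well.",
     ["Player requested water from the town well"]),
    ("I was really helpful today!", []),
    ("The assistant should be more careful.", []),
]


def render_prompt(base_prompt: str, examples: list[tuple[str, list[str]]]) -> str:
    """Append few-shot examples, each output rendered as a JSON object with a "facts" list."""
    lines = [base_prompt.strip(), "", "Examples:"]
    for text, facts in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {json.dumps({'facts': facts})}")
    return "\n".join(lines)


def build_mem0_config(base_prompt: str, examples: list[tuple[str, list[str]]]) -> dict:
    """Return only the prompt-related part of a Mem0 config (LLM and vector store omitted)."""
    return {"custom_fact_extraction_prompt": render_prompt(base_prompt, examples)}


config = build_mem0_config(BASE_PROMPT, FEW_SHOT_EXAMPLES)
print(config["custom_fact_extraction_prompt"].splitlines()[-1])  # Output: {"facts": []}
```

Next, merge in the real LLM and vector-store settings and pass the dict to `Memory.from_config(config)`. The examples with empty outputs stay in the prompt on purpose: they show the model that self-referential input should yield no facts at all.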
Defense Layer 2: Reflection Filtering
Location: generative_reflection.py:83-115
When the Generative Agents reflection pipeline generates insights, a regex post-processing filter removes self-referential content that slipped past the generation prompt.
Regex Patterns
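```python
SELF_REFERENCE_PATTERNS = [
    # Starts with "I" + one of these verbs (case-sensitive, start of string only)
    r"^I\s+(should|am|tend|need|must|will|have|was|can|could)",
    # Mentions the assistant or its behavior/identity
    r"(?i)(the assistant|my behavior|my approach|myself|my identity)",
    # First-person discovery ("I learned", "I noticed", ...)
    r"(?i)(i learned|i realized|i discovered|i noticed|i found)",
    # First-person intentions, anywhere in the text
    r"(?i)(i will|i need to|i should|i must)",
]
```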
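Running the patterns over sample sentences shows which ones fire. The sketch below copies the list above and uses hypothetical inputs.

It also exposes a side effect of the unanchored patterns. "i will" matches inside a name ending in "i", so "Ravi will arrive at dawn" is flagged as self-referential. `BOUNDED_INTENT` is a hypothetical word-boundary variant of the last pattern, not what the current code uses.

```python
import re

# Copied from SELF_REFERENCE_PATTERNS above.
SELF_REFERENCE_PATTERNS = [
    r"^I\s+(should|am|tend|need|must|will|have|was|can|could)",
    r"(?i)(the assistant|my behavior|my approach|myself|my identity)",
    r"(?i)(i learned|i realized|i discovered|i noticed|i found)",
    r"(?i)(i will|i need to|i should|i must)",
]

# Hypothetical tightening of the last pattern (index 3): \b stops matches inside words like "Ravi".
BOUNDED_INTENT = r"(?i)\b(i will|i need to|i should|i must)\b"


def matching_patterns(text: str) -> list[int]:
    """Return the indices of the patterns that match anywhere in `text`."""
    return [i for i, p in enumerate(SELF_REFERENCE_PATTERNS) if re.search(p, text)]


for sample in [
    "I should track player moods",      # genuine self-reference
    "Players prefer morning sessions",  # world fact
    "Ravi will arrive at dawn",         # world fact, but "i will" matches inside "Ravi"
]:
    print(f"{sample!r}: {matching_patterns(sample)}")

print(bool(re.search(BOUNDED_INTENT, "Ravi will arrive at dawn")))  # False
print(bool(re.search(BOUNDED_INTENT, "I will check the gate")))     # True
```

Adding `\b` keeps "I will ..." flagged while letting names through. The discovery pattern (index 2) has the same issue: "Naomi noticed" contains "i noticed".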
Filter Function
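```python
import re
from typing import Any, Dict, List

# `logger` is the module-level project logger (defined elsewhere; `log_info` is not
# part of the standard `logging` API).


def filter_self_referential_insights(insights: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Filter insights that reference the assistant itself."""
    filtered = []
    for insight in insights:
        content = insight.get("content", "")
        # Drop the insight if any self-reference pattern matches anywhere in its text.
        is_self_referential = any(
            re.search(pattern, content) for pattern in SELF_REFERENCE_PATTERNS
        )
        if not is_self_referential:
            filtered.append(insight)
        else:
            logger.log_info(f"Filtered self-referential insight: {content[:50]}...")
    return filtered
```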
Applied at: generative_reflection.py:452-453, before insights are stored to the journal.
Defense Layer 3: Prompt Constraints
Location: generative_reflection.py:58-76
The insight generation prompt explicitly instructs the LLM to avoid self-reference:
```python
INSIGHT_GENERATION_PROMPT = """
...
IMPORTANT CONSTRAINTS:
- Focus on WORLD STATE and PATTERNS, not assistant behavior
- Do NOT generate insights about your own identity, capabilities, or behavior
- Do NOT use first-person ("I should...", "I noticed...")
- Frame insights as objective observations about the world
...
"""
```
This reduces self-referential content at generation time, leaving less for the post-processing regex filter (Layer 2) to catch.
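Layers 3 and 2 are designed to work together:

- The constraints lower how often self-reference is generated.
- The regex filter removes whatever still gets through.

The sketch below chains the two with a stub model that ignores the constraints. It assumes the model returns a JSON array of insight strings. `generate_insights` and `stub_llm` are hypothetical names, and the real parsing in generative_reflection.py may differ.

```python
import json
import re
from typing import Callable

# Abbreviated constraint block from INSIGHT_GENERATION_PROMPT.
CONSTRAINTS = (
    "IMPORTANT CONSTRAINTS:\n"
    "- Focus on WORLD STATE and PATTERNS, not assistant behavior\n"
    '- Do NOT use first-person ("I should...", "I noticed...")'
)

# Same list as SELF_REFERENCE_PATTERNS in Layer 2.
SELF_REFERENCE_PATTERNS = [
    r"^I\s+(should|am|tend|need|must|will|have|was|can|could)",
    r"(?i)(the assistant|my behavior|my approach|myself|my identity)",
    r"(?i)(i learned|i realized|i discovered|i noticed|i found)",
    r"(?i)(i will|i need to|i should|i must)",
]


def generate_insights(observations: list[str], llm: Callable[[str], str]) -> list[dict]:
    # Layer 3: the constraints travel with every generation request.
    prompt = "Observations:\n" + "\n".join(f"- {o}" for o in observations)
    raw = llm(prompt + "\n\n" + CONSTRAINTS)
    insights = [{"content": text} for text in json.loads(raw)]
    # Layer 2: drop anything the constraints failed to prevent.
    return [
        ins for ins in insights
        if not any(re.search(p, ins["content"]) for p in SELF_REFERENCE_PATTERNS)
    ]


prompts_seen: list[str] = []


def stub_llm(prompt: str) -> str:
    """Stand-in model that records its prompt and ignores the constraints."""
    prompts_seen.append(prompt)
    return json.dumps(["The tavern is busiest at night", "I should greet players sooner"])


print(generate_insights(["Players gather at the tavern after sunset"], stub_llm))
# [{'content': 'The tavern is busiest at night'}]
```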
Defense Layer 4: Entity Consolidation
Location: helpers.py:1677-1702, 1873-1899
When entity observations (Pf) are consolidated into attributes (Pa) during the dreaming phase, similar protections apply.
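The Pf/Pa split from O-MEM can be pictured as two fields on one record:

- a growing list of raw observations (Pf);
- a compact attribute map built from them (Pa).

The sketch below is hypothetical: the class, field, and function names are illustrative, not the project's data model. It shows the filter sitting between the consolidation step and the stored attributes.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EntityProfile:
    """Hypothetical per-entity record; names are illustrative."""
    name: str
    observations: list[str] = field(default_factory=list)     # Pf: raw observations
    attributes: dict[str, str] = field(default_factory=dict)  # Pa: consolidated attributes


def consolidate_profile(
    profile: EntityProfile,
    summarize: Callable[[list[str]], dict[str, str]],
    attribute_filter: Callable[[dict[str, str]], dict[str, str]],
) -> EntityProfile:
    """Fold Pf into Pa, filtering proposed attributes before they are stored."""
    proposed = summarize(profile.observations)  # stands in for the LLM call with ENTITY_CONSOLIDATION_PROMPT
    profile.attributes.update(attribute_filter(proposed))
    return profile


gate_guard = EntityProfile(
    "north gate guard",
    observations=["Questioned two travelers at length", "Refused entry to a hooded stranger"],
)
consolidate_profile(
    gate_guard,
    # Canned "LLM" output, including one self-referential attribute.
    summarize=lambda obs: {"temperament": "Suspicious of strangers", "rapport": "Trusts the assistant"},
    # Toy filter; the real patterns are in _filter_self_referential_attributes().
    attribute_filter=lambda attrs: {k: v for k, v in attrs.items() if "assistant" not in v.lower()},
)
print(gate_guard.attributes)  # {'temperament': 'Suspicious of strangers'}
```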
Consolidation Prompt Constraints
```python
ENTITY_CONSOLIDATION_PROMPT = """
...
IMPORTANT CONSTRAINTS:
- Extract ONLY factual patterns observable from the evidence
- Do NOT infer personality traits without clear evidence
- Do NOT include speculation or assumptions
- Frame attributes as objective observations, NOT opinions
- Keep attributes concise (2-8 words each)
...
"""
```
Attribute Filter Function
```python
import re

# `logger` is the module-level project logger (defined elsewhere).


def _filter_self_referential_attributes(attributes: dict[str, str]) -> dict[str, str]:
    """Filter attributes that reference the assistant itself."""
    SELF_PATTERNS = [
        r"(?i)\b(i|me|my|myself|assistant|ai)\b",         # whole-word pronouns, "assistant", "AI"
        r"(?i)(helped|assisted|told|showed|explained)",   # assistant-style verbs (substring match)
        r"(?i)(our relationship|our conversation)",       # first-person-plural framing
    ]
    filtered = {}
    for key, value in attributes.items():
        is_self_ref = any(re.search(pattern, value) for pattern in SELF_PATTERNS)
        if not is_self_ref:
            filtered[key] = value
        else:
            logger.log_info(f"Filtered self-referential attribute: {key}")
    return filtered
```
Applied at: helpers.py:1805-1806 during consolidate_entity_observations().
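The sketch below applies the same three patterns to a hypothetical NPC profile to show what the attribute filter keeps. `filter_attributes` is an illustrative copy that returns the dropped keys instead of logging them.

The verb pattern is an unanchored substring match. It therefore also removes legitimate NPC behavior such as "Often told stories at the tavern", so the filter gives up some genuine entity facts in exchange for stronger protection.

```python
import re

# Same patterns as SELF_PATTERNS in _filter_self_referential_attributes().
SELF_PATTERNS = [
    r"(?i)\b(i|me|my|myself|assistant|ai)\b",
    r"(?i)(helped|assisted|told|showed|explained)",
    r"(?i)(our relationship|our conversation)",
]


def filter_attributes(attributes: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Split attributes into (kept, dropped_keys) using SELF_PATTERNS."""
    kept: dict[str, str] = {}
    dropped: list[str] = []
    for key, value in attributes.items():
        if any(re.search(p, value) for p in SELF_PATTERNS):
            dropped.append(key)
        else:
            kept[key] = value
    return kept, dropped


guard = {
    "temperament": "Suspicious of strangers",
    "rapport": "Trusts the assistant",            # self-reference: dropped
    "habit": "Often told stories at the tavern",  # NPC fact, but "told" matches
    "post": "Guards the north gate",
}
kept, dropped = filter_attributes(guard)
print(kept)     # {'temperament': 'Suspicious of strangers', 'post': 'Guards the north gate'}
print(dropped)  # ['rapport', 'habit']
```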
Defense Pipeline Summary
```text
┌─────────────────────────────────────────────────────────┐
│               INPUT: Raw Assistant Output               │
└────────────────────────────┬────────────────────────────┘
                             │
             ┌───────────────┴───────────────┐
             ▼                               ▼
┌─────────────────────────┐     ┌─────────────────────────┐
│  Memory Consolidation   │     │   Reflection Pipeline   │
│    (Journal → Mem0)     │     │  (Journal → Insights)   │
└────────────┬────────────┘     └────────────┬────────────┘
             │                               │
             ▼                               ▼
┌─────────────────────────┐     ┌─────────────────────────┐
│     ASSISTANT_FACT_     │     │   INSIGHT_GENERATION_   │
│    EXTRACTION_PROMPT    │     │  PROMPT (constraints)   │
│ (pre-extraction filter) │     │ (pre-generation filter) │
└────────────┬────────────┘     └────────────┬────────────┘
             │                               │
             ▼                               ▼
┌─────────────────────────┐     ┌─────────────────────────┐
│    Mem0 stores only     │     │ filter_self_referential │
│    world-state facts    │     │   _insights() (regex)   │
└─────────────────────────┘     └────────────┬────────────┘
                                             │
                                             ▼
                                ┌─────────────────────────┐
                                │   Journal stores only   │
                                │ world-focused insights  │
                                └─────────────────────────┘
```
Entity Consolidation Pipeline
```text
┌─────────────────────────────────────────────────────────┐
│             INPUT: Entity Observations (Pf)             │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│               ENTITY_CONSOLIDATION_PROMPT               │
│              (objective observations only)              │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│          _filter_self_referential_attributes()          │
│         (removes I/me/my/assistant references)          │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│              Entity Attributes (Pa) stored              │
│              (contains only entity facts)               │
└─────────────────────────────────────────────────────────┘
```
Key Files
| File | Lines | Purpose |
|---|---|---|
| rag_memory.py | 21-55 | ASSISTANT_FACT_EXTRACTION_PROMPT |
| generative_reflection.py | 58-76 | INSIGHT_GENERATION_PROMPT with constraints |
| generative_reflection.py | 83-88 | SELF_REFERENCE_PATTERNS regex list |
| generative_reflection.py | 91-115 | filter_self_referential_insights() |
| generative_reflection.py | 452-453 | Filter application in pipeline |
| helpers.py | 1677-1702 | ENTITY_CONSOLIDATION_PROMPT |
| helpers.py | 1873-1899 | _filter_self_referential_attributes() |
| helpers.py | 1805-1806 | Filter application in consolidation |
Testing Persona Protection
Test coverage for persona protection is in:
- tests/test_generative_reflection.py: insight filtering tests
- tests/test_entity_consolidation.py: attribute filtering tests
Example test pattern:
```python
def test_filter_self_referential_insights(self):
    """Self-referential insights should be filtered."""
    insights = [
        {"content": "Players prefer morning sessions"},
        {"content": "I should be more helpful"},  # Should be filtered
        {"content": "The tavern is busiest at night"},
    ]
    filtered = filter_self_referential_insights(insights)
    self.assertEqual(len(filtered), 2)
    self.assertNotIn("I should", str(filtered))
```
See also: Architecture-Memory-and-Sleep | Architecture-Generative-Reflection | Research-Foundations