Blightbow edited this page 2025-12-09 04:12:38 -05:00

Architecture: Persona Protection

Infrastructure - Preventing Perspective Bleed and Self-Reference Contamination


Overview

Persona protection prevents the assistant from storing facts about itself. Such facts would contaminate long-term memory and degrade the persona over time. Protection is implemented as four defense layers, each at a different point in the data pipeline.

Problem: When an LLM processes its own outputs or reflects on its actions, it naturally generates self-referential content ("I helped the player", "I should remember to..."). If stored in memory, this content:

  • Pollutes semantic retrieval with irrelevant self-references
  • Causes "perspective bleed" where the assistant confuses its identity with stored facts
  • Degrades persona coherence over extended operation

Solution: Filter self-referential content at every point where facts are extracted or synthesized.


Research Foundations

| Source | Contribution | Implementation |
|---|---|---|
| MIRIX (arXiv:2507.07957) | Persona isolation in multi-agent memory | Extraction prompt constraints |
| O-MEM (arXiv:2511.13593) | Persona Attributes (Pa) vs Persona Facts (Pf) | Entity profile structure |
| Mem0 Custom Extraction | Configurable fact extraction prompts | ASSISTANT_FACT_EXTRACTION_PROMPT |

Defense Layer 1: Memory Extraction

Location: rag_memory.py:21-55

When journal entries are consolidated into Mem0 semantic memory, a custom extraction prompt ensures only world-state facts are stored.

ASSISTANT_FACT_EXTRACTION_PROMPT = """
Extract only objective facts about the world, users, and events.
Use third-person, factual language focused on WORLD STATE, not assistant behavior.

NEVER extract:
- Assistant actions ("The assistant helped", "I did X")
- Assistant personality or identity information
- Opinions about the assistant's role or behavior

ALWAYS extract:
- World state facts ("The tavern is in the north wing")
- User/player information ("Player Alice prefers formal address")
- NPC behavior ("The guard is suspicious of strangers")
- Event outcomes ("Sword prices are 50 gold at the market")
"""

Examples:

| Input | Output |
|---|---|
| "I talked to the merchant about sword prices." | ["Sword prices discussed with merchant in market district"] |
| "The player asked me to fetch water from the well." | ["Player requested water from the town well"] |
| "I was really helpful today!" | [] (filtered) |
| "The assistant should be more careful." | [] (filtered) |

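This prompt is passed to Mem0's custom fact extraction feature, so filtering happens at the source: self-referential statements are either rephrased as world-state facts or dropped before anything is written to semantic memory.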
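
As a rough sketch of how such a prompt might be wired in: Mem0's open-source configuration accepts a `custom_fact_extraction_prompt` key. The helper name `build_mem0_config` and the shortened prompt text below are illustrative assumptions, not code from rag_memory.py, and the key name should be checked against the installed Mem0 version.

```python
# Hypothetical wiring sketch; build_mem0_config is not part of rag_memory.py.
from typing import Any, Dict

# Shortened stand-in for the full prompt shown above.
ASSISTANT_FACT_EXTRACTION_PROMPT = """
Extract only objective facts about the world, users, and events.
Use third-person, factual language focused on WORLD STATE, not assistant behavior.
"""


def build_mem0_config(extraction_prompt: str) -> Dict[str, Any]:
    """Return a Mem0 config dict that overrides the default extraction prompt."""
    return {
        # Assumed key name, taken from Mem0's custom fact extraction docs.
        "custom_fact_extraction_prompt": extraction_prompt,
    }


config = build_mem0_config(ASSISTANT_FACT_EXTRACTION_PROMPT)

# With Mem0 installed and an LLM provider configured, the dict would be
# handed to the library roughly like this (left commented out so the
# sketch runs without network access or API keys):
#
#   from mem0 import Memory
#   memory = Memory.from_config(config)
```

Keeping the prompt in a module-level constant, as the page does, means the same text can be reviewed and tested independently of the Mem0 client.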


Defense Layer 2: Reflection Filtering

Location: generative_reflection.py:83-115

When the Generative Agents reflection pipeline generates insights, a post-processing filter drops any insight that matches a known self-reference pattern. It acts as a backstop for content that slips past the prompt constraints (Defense Layer 3).

Regex Patterns

SELF_REFERENCE_PATTERNS = [
    r"^I\s+(should|am|tend|need|must|will|have|was|can|could)",
    r"(?i)(the assistant|my behavior|my approach|myself|my identity)",
    r"(?i)(i learned|i realized|i discovered|i noticed|i found)",
    r"(?i)(i will|i need to|i should|i must)",
]
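
To make the pattern behavior concrete, the sketch below runs the list against a few sample strings. The helper `is_self_referential` and the `BOUNDED_PATTERN` variant are illustrative assumptions, not part of generative_reflection.py. It also shows one edge case worth knowing: the unanchored patterns have no word boundary, so a name ending in "i" followed by one of the listed verbs is matched as well.

```python
import re

# Copied from the list above.
SELF_REFERENCE_PATTERNS = [
    r"^I\s+(should|am|tend|need|must|will|have|was|can|could)",
    r"(?i)(the assistant|my behavior|my approach|myself|my identity)",
    r"(?i)(i learned|i realized|i discovered|i noticed|i found)",
    r"(?i)(i will|i need to|i should|i must)",
]


def is_self_referential(text: str) -> bool:
    """Hypothetical helper: True if any pattern matches the text."""
    return any(re.search(p, text) for p in SELF_REFERENCE_PATTERNS)


# Hypothetical stricter variant of the third pattern: \b requires the
# leading "i" to start a word, so "Naomi found" no longer matches.
BOUNDED_PATTERN = r"(?i)\b(i learned|i realized|i discovered|i noticed|i found)\b"

print(is_self_referential("I should be more helpful"))        # True
print(is_self_referential("The tavern is busiest at night"))  # False
print(is_self_referential("Naomi found a hidden door"))       # True (false positive)
print(bool(re.search(BOUNDED_PATTERN, "Naomi found a hidden door")))  # False
print(bool(re.search(BOUNDED_PATTERN, "Later I found the key")))      # True
```

Whether the stricter variant is worth adopting depends on how often entity names in the world end in "i"; the trade-off is a slightly longer pattern for fewer dropped world-state insights.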

Filter Function

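import re
from typing import Any, Dict, List

# `logger` is the project's logging helper; its import is not shown in this excerpt.

def filter_self_referential_insights(insights: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Filter insights that reference the assistant itself."""
    filtered = []
    for insight in insights:
        # Insights without a "content" key are treated as empty and kept.
        content = insight.get("content", "")
        # An insight is self-referential if any pattern matches anywhere in it.
        is_self_referential = any(
            re.search(pattern, content) for pattern in SELF_REFERENCE_PATTERNS
        )
        if not is_self_referential:
            filtered.append(insight)
        else:
            # Log only the first 50 characters to keep log lines short.
            logger.log_info(f"Filtered self-referential insight: {content[:50]}...")
    return filtered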

Applied at: generative_reflection.py:452-453, before insights are stored in the journal.


Defense Layer 3: Prompt Constraints

Location: generative_reflection.py:58-76

The insight generation prompt explicitly instructs the LLM to avoid self-reference:

INSIGHT_GENERATION_PROMPT = """
...
IMPORTANT CONSTRAINTS:
- Focus on WORLD STATE and PATTERNS, not assistant behavior
- Do NOT generate insights about your own identity, capabilities, or behavior
- Do NOT use first-person ("I should...", "I noticed...")
- Frame insights as objective observations about the world
...
"""

This reduces how often self-referential content is generated in the first place, so the regex filter in Defense Layer 2 serves as a backstop rather than the primary defense.
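
One way to picture how the prompt constraint and the regex filter combine is sketched below. `generate_insights`, `fake_llm`, and the shortened prompt and pattern list are hypothetical stand-ins for illustration; the real pipeline in generative_reflection.py may be structured differently.

```python
import re
from typing import Any, Callable, Dict, List

# Shortened stand-ins for the prompt and pattern list shown on this page.
INSIGHT_GENERATION_PROMPT = (
    "Focus on WORLD STATE and PATTERNS, not assistant behavior.\n"
    "Do NOT use first-person.\n"
)
SELF_REFERENCE_PATTERNS = [
    r"^I\s+(should|am|tend|need|must|will|have|was|can|could)",
    r"(?i)(the assistant|my behavior|my approach|myself|my identity)",
]


def generate_insights(
    llm: Callable[[str], List[str]], journal_text: str
) -> List[Dict[str, Any]]:
    """Hypothetical pipeline: constrained prompt first, regex backstop second."""
    # Step 1: the prompt asks the model not to produce self-reference.
    raw = llm(INSIGHT_GENERATION_PROMPT + journal_text)
    # Step 2: anything that still matches a pattern is dropped before storage.
    return [
        {"content": text}
        for text in raw
        if not any(re.search(p, text) for p in SELF_REFERENCE_PATTERNS)
    ]


def fake_llm(prompt: str) -> List[str]:
    """Stub model that ignores the constraint once, to exercise the backstop."""
    return ["The market closes at dusk", "I should greet players sooner"]


insights = generate_insights(fake_llm, "Journal: visited the market at dusk.")
print(insights)  # [{'content': 'The market closes at dusk'}]
```

The stub deliberately returns one first-person insight to show that the second layer still catches output the prompt failed to prevent.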


Defense Layer 4: Entity Consolidation

Location: helpers.py:1677-1702, 1873-1899

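When entity observations (persona facts, Pf) are consolidated into entity attributes (persona attributes, Pa) during the dreaming phase, the same two-step protection applies: a constrained prompt followed by a regex filter.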

Consolidation Prompt Constraints

ENTITY_CONSOLIDATION_PROMPT = """
...
IMPORTANT CONSTRAINTS:
- Extract ONLY factual patterns observable from the evidence
- Do NOT infer personality traits without clear evidence
- Do NOT include speculation or assumptions
- Frame attributes as objective observations, NOT opinions
- Keep attributes concise (2-8 words each)
...
"""

Attribute Filter Function

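import re

# `logger` is the project's logging helper; its import is not shown in this excerpt.

def _filter_self_referential_attributes(attributes: dict[str, str]) -> dict[str, str]:
    """Filter attributes that reference the assistant itself."""
    SELF_PATTERNS = [
        r"(?i)\b(i|me|my|myself|assistant|ai)\b",        # first-person or assistant words
        r"(?i)(helped|assisted|told|showed|explained)",  # verbs typical of assistant actions
        r"(?i)(our relationship|our conversation)",      # shared-history phrasing
    ]

    filtered = {}
    for key, value in attributes.items():
        # Only the attribute value is checked; the key is never matched.
        is_self_ref = any(re.search(pattern, value) for pattern in SELF_PATTERNS)
        if not is_self_ref:
            filtered[key] = value
        else:
            # Log the key only, not the filtered value.
            logger.log_info(f"Filtered self-referential attribute: {key}")
    return filtered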

Applied at: helpers.py:1805-1806 during consolidate_entity_observations().
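
A small run of the attribute filter on sample data illustrates what it keeps and drops. The sample attributes and the name `filter_attributes` are invented for illustration, and the standard `logging` module stands in for the project's `logger`, whose import is not shown on this page. Note that the second pattern matches verbs such as "helped" regardless of who did the helping, so some attributes that never mention the assistant are dropped too.

```python
import logging
import re
from typing import Dict

log = logging.getLogger(__name__)  # stand-in for the project logger

# Copied from the function above.
SELF_PATTERNS = [
    r"(?i)\b(i|me|my|myself|assistant|ai)\b",
    r"(?i)(helped|assisted|told|showed|explained)",
    r"(?i)(our relationship|our conversation)",
]


def filter_attributes(attributes: Dict[str, str]) -> Dict[str, str]:
    """Illustrative copy of the attribute filter using stdlib logging."""
    kept = {}
    for key, value in attributes.items():
        if any(re.search(p, value) for p in SELF_PATTERNS):
            log.info("Filtered self-referential attribute: %s", key)
        else:
            kept[key] = value
    return kept


sample = {
    "address_style": "prefers formal address",
    "role": "town guard captain",        # "captain" contains "ai", but not as a word
    "rapport": "trusts the assistant",   # dropped: matches "assistant"
    "habit": "often helped new players", # dropped: matches "helped"
    "gate": "maintains the main gate",
}

print(filter_attributes(sample))
# {'address_style': 'prefers formal address', 'role': 'town guard captain',
#  'gate': 'maintains the main gate'}
```

The `\b` anchors in the first pattern are what keep words like "captain" and "maintains" from being treated as references to "ai".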


Defense Pipeline Summary

┌─────────────────────────────────────────────────────────────────┐
│                    INPUT: Raw Assistant Output                   │
└─────────────────────────────┬───────────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              ▼                               ▼
┌─────────────────────────┐     ┌─────────────────────────┐
│  Memory Consolidation   │     │   Reflection Pipeline   │
│   (Journal → Mem0)      │     │   (Journal → Insights)  │
└────────────┬────────────┘     └────────────┬────────────┘
             │                               │
             ▼                               ▼
┌─────────────────────────┐     ┌─────────────────────────┐
│ ASSISTANT_FACT_         │     │ INSIGHT_GENERATION_     │
│ EXTRACTION_PROMPT       │     │ PROMPT (constraints)    │
│ (pre-extraction filter) │     │ (pre-generation filter) │
└────────────┬────────────┘     └────────────┬────────────┘
             │                               │
             ▼                               ▼
┌─────────────────────────┐     ┌─────────────────────────┐
│ Mem0 stores only        │     │ filter_self_referential │
│ world-state facts       │     │ _insights() (regex)     │
└─────────────────────────┘     └────────────┬────────────┘
                                             │
                                             ▼
                                ┌─────────────────────────┐
                                │ Journal stores only     │
                                │ world-focused insights  │
                                └─────────────────────────┘

Entity Consolidation Pipeline

┌─────────────────────────────────────────────────────────────────┐
│              INPUT: Entity Observations (Pf)                     │
└─────────────────────────────┬───────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                 ENTITY_CONSOLIDATION_PROMPT                      │
│                 (objective observations only)                    │
└─────────────────────────────┬───────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│              _filter_self_referential_attributes()               │
│              (removes I/me/my/assistant references)              │
└─────────────────────────────┬───────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│              Entity Attributes (Pa) stored                       │
│              (contains only entity facts)                        │
└─────────────────────────────────────────────────────────────────┘

Key Files

| File | Lines | Purpose |
|---|---|---|
| rag_memory.py | 21-55 | ASSISTANT_FACT_EXTRACTION_PROMPT |
| generative_reflection.py | 58-76 | INSIGHT_GENERATION_PROMPT with constraints |
| generative_reflection.py | 83-88 | SELF_REFERENCE_PATTERNS regex list |
| generative_reflection.py | 91-115 | filter_self_referential_insights() |
| generative_reflection.py | 452-453 | Filter application in pipeline |
| helpers.py | 1677-1702 | ENTITY_CONSOLIDATION_PROMPT |
| helpers.py | 1873-1899 | _filter_self_referential_attributes() |
| helpers.py | 1805-1806 | Filter application in consolidation |

Testing Persona Protection

Test coverage for persona protection is in:

  • tests/test_generative_reflection.py - Insight filtering tests
  • tests/test_entity_consolidation.py - Attribute filtering tests

Example test pattern:

def test_filter_self_referential_insights(self):
    """Self-referential insights should be filtered."""
    insights = [
        {"content": "Players prefer morning sessions"},
        {"content": "I should be more helpful"},  # Should be filtered
        {"content": "The tavern is busiest at night"},
    ]
    filtered = filter_self_referential_insights(insights)
    self.assertEqual(len(filtered), 2)
    self.assertNotIn("I should", str(filtered))

See also: Architecture-Memory-and-Sleep | Architecture-Generative-Reflection | Research-Foundations