Memory Graph System

The Memory Graph is Soul Kernel’s core system for storing, retrieving, and synchronizing memories across devices.

Overview

Each Soul maintains a personal memory graph that:
  • Stores experiences as interconnected nodes
  • Enables semantic search and retrieval
  • Syncs across devices while preserving privacy
  • Maintains temporal and causal relationships

Architecture

Storage Layers

The memory graph uses a hybrid storage approach implemented in the storage crate:
  1. SQLite - Primary storage for memory events and metadata
    • WAL mode enabled for better concurrency
    • Automatic schema migrations
    • Indexes on timestamp and author fields
  2. Qdrant - Vector store for semantic search (currently mocked; see the similarity sketch after this list)
    • HNSW index for fast similarity search
    • Cosine similarity for relevance scoring
    • Filterable by event type, author, and time range
  3. Event Log - Append-only history for sync
    • Immutable event sourcing pattern
    • Supports CRDT-style reconciliation
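
Because the Qdrant adapter is currently mocked, relevance scoring in this layer can be pictured as plain cosine similarity over the stored embeddings. The helper below is a hypothetical illustration of that scoring step, not the adapter's actual code:
/// Hypothetical scoring helper: cosine similarity between a query vector
/// and a stored embedding (1.0 = same direction, 0.0 = orthogonal).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must share a dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}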

Memory Types

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum MemoryEventType {
    Observation,    // Things the Soul perceives
    Interaction,    // Conversations and exchanges  
    System,         // System events and state changes
}

Implementation

Storage Crate Structure

The memory persistence layer is implemented in kernel/storage/:
storage/
├── src/
│   ├── lib.rs              # Public API
│   ├── error.rs            # Error types
│   ├── models.rs           # Data models
│   ├── memory_store.rs     # Trait definition
│   ├── sqlite_adapter.rs   # SQLite implementation
│   ├── qdrant_adapter.rs   # Vector store (mock)
│   └── hybrid_store.rs     # Combined storage
├── tests/
│   └── integration_test.rs # Integration tests
└── benches/
    └── memory_bench.rs     # Performance benchmarks

Memory Event Structure

pub struct MemoryEvent {
    pub id: Uuid,
    pub timestamp: DateTime<Utc>,
    pub author: String,
    pub event_type: MemoryEventType,
    pub content: String,
    pub embedding: Vec<f32>,
    pub metadata: serde_json::Value,
}
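
For illustration, an event can also be built field by field, which makes the metadata payload explicit. The values below are hypothetical; real code would normally go through the MemoryEvent::new constructor shown in the Usage Examples:
use chrono::Utc;
use uuid::Uuid;

// Illustrative literal construction with an arbitrary JSON metadata payload.
let event = MemoryEvent {
    id: Uuid::new_v4(),
    timestamp: Utc::now(),
    author: "device_1".to_string(),
    event_type: MemoryEventType::Observation,
    content: "Saw a calendar reminder".to_string(),
    embedding: vec![0.0; 384],                      // placeholder embedding
    metadata: serde_json::json!({ "source": "calendar" }),
};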

Storage API

The MemoryStore trait provides the core interface:
#[async_trait]
pub trait MemoryStore: Send + Sync {
    /// Insert a new memory event
    async fn insert_event(&self, event: &MemoryEvent) -> Result<Uuid>;
    
    /// Query memory events by vector similarity
    async fn query_embeddings(&self, query: &MemoryQuery) -> Result<QueryResult>;
    
    /// Get a specific memory event by ID
    async fn get_event(&self, id: &Uuid) -> Result<Option<MemoryEvent>>;
    
    /// Get all events since timestamp (for sync)
    async fn get_events_since(&self, timestamp: i64, limit: usize) -> Result<Vec<MemoryEvent>>;
    
    /// Run database migrations
    async fn migrate(&self) -> Result<()>;
    
    /// Compact storage (maintenance)
    async fn compact(&self) -> Result<()>;
}

Key Features

1. Semantic Search

Vector similarity search with configurable parameters:
let query = MemoryQuery {
    embedding: vec![0.9, 0.1, 0.0, 0.0],  // Query vector
    top_k: 5,                              // Return top 5 results
    score_threshold: Some(0.5),            // Minimum similarity
    filter: Some(MemoryFilter {
        event_types: Some(vec![MemoryEventType::Observation]),
        authors: Some(vec!["device_1".to_string()]),
        after: Some(start_time),
        before: Some(end_time),
    }),
};

let results = store.query_embeddings(&query).await?;

2. Temporal Awareness

Chronological queries for sync and history:
// Get events since a specific timestamp
let events = store.get_events_since(
    last_sync_timestamp,
    1000  // limit
).await?;

3. Performance

Benchmarked performance metrics:
  • Single insert: ~65 µs
  • Query top-k from 1000 events: ~2.1 ms
  • Bulk insert of 10k events: ~92 ms
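
The benchmarks behind these numbers live in benches/memory_bench.rs; a rough local spot check of insert latency (a hypothetical snippet, not the benchmark itself) looks like:
use std::time::Instant;

// Hypothetical spot check: average insert latency over a batch of pre-built
// events (`store` and `events` are assumed to be in scope).
let started = Instant::now();
for event in &events {
    store.insert_event(event).await?;
}
println!("average insert: {:?}", started.elapsed() / events.len() as u32);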

4. Thread Safety

All storage operations are thread-safe: the SQLite connection is shared behind Arc<Mutex<Connection>>, so concurrent tasks serialize their access to the database.
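
A minimal sketch of that wrapper, assuming the rusqlite crate (names are illustrative, not the adapter's actual layout):
use std::sync::{Arc, Mutex};
use rusqlite::Connection;

// Illustrative only: cloning the Arc hands each task a handle to the same
// connection, and the Mutex serializes access to it.
struct SqliteAdapter {
    conn: Arc<Mutex<Connection>>,
}

impl SqliteAdapter {
    fn open(path: &str) -> rusqlite::Result<Self> {
        let conn = Connection::open(path)?;
        Ok(Self { conn: Arc::new(Mutex::new(conn)) })
    }
}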

Usage Examples

Basic Memory Storage

use storage::{HybridMemoryStore, MemoryStore, MemoryEvent, MemoryEventType};

// Initialize storage
let store = HybridMemoryStore::new("soul_memories.db", None).await?;
store.migrate().await?;

// Create and store a memory
let memory = MemoryEvent::new(
    "iphone_device_1".to_string(),
    MemoryEventType::Observation,
    "User mentioned their birthday is June 15th".to_string(),
    embedding_vector,  // Generated by LLM
);

let id = store.insert_event(&memory).await?;

Complex Queries

// Find memories about birthdays from user interactions
let query = MemoryQuery {
    embedding: generate_embedding("birthday celebrations"),
    top_k: 10,
    score_threshold: Some(0.7),
    filter: Some(MemoryFilter {
        event_types: Some(vec![MemoryEventType::Interaction]),
        authors: None,
        after: Some(Utc::now() - Duration::days(30)),
        before: None,
    }),
};

let results = store.query_embeddings(&query).await?;
for scored_event in results.events {
    println!("Score: {:.3} - {}", 
        scored_event.score, 
        scored_event.event.content
    );
}

Memory Sync

// Get events for synchronization
let last_sync = Utc::now() - Duration::hours(1);
let new_events = store.get_events_since(
    last_sync.timestamp(),
    100
).await?;

// Process events for sync...
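
The CRDT-style reconciliation mentioned under Storage Layers can be approximated as an idempotent union keyed by event ID, since events are immutable once written. A simplified, hypothetical merge (not the kernel's actual sync code):
use std::collections::HashMap;
use uuid::Uuid;

/// Hypothetical merge: union two batches of immutable events by ID.
/// The union is idempotent and order-independent, so replicas that exchange
/// events in any order converge on the same set.
fn merge_events(local: Vec<MemoryEvent>, remote: Vec<MemoryEvent>) -> Vec<MemoryEvent> {
    let mut by_id: HashMap<Uuid, MemoryEvent> = HashMap::new();
    for event in local.into_iter().chain(remote) {
        by_id.entry(event.id).or_insert(event);
    }
    let mut merged: Vec<_> = by_id.into_values().collect();
    merged.sort_by_key(|e| e.timestamp); // chronological order for replay
    merged
}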

Database Schema

The SQLite schema for memory storage:
CREATE TABLE memory_events (
    id TEXT PRIMARY KEY,
    timestamp INTEGER NOT NULL,
    author TEXT NOT NULL,
    event_type TEXT NOT NULL,
    content TEXT NOT NULL,
    embedding BLOB NOT NULL,  -- Serialized f32 array
    metadata TEXT NOT NULL    -- JSON
);

CREATE INDEX idx_memory_timestamp ON memory_events(timestamp DESC);
CREATE INDEX idx_memory_author ON memory_events(author);
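
The embedding column holds the raw f32 values, four little-endian bytes per element. A hypothetical round-trip helper for that encoding (the crate's actual layout may differ):
/// Hypothetical BLOB encoding: 4 little-endian bytes per f32, in order.
fn embedding_to_blob(embedding: &[f32]) -> Vec<u8> {
    embedding.iter().flat_map(|v| v.to_le_bytes()).collect()
}

fn blob_to_embedding(blob: &[u8]) -> Vec<f32> {
    blob.chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect()
}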

Privacy & Security

  • Memories stored locally by default
  • Optional encryption at rest (future enhancement)
  • User controls sync preferences per memory type
  • GDPR-compliant deletion via event ID

Best Practices

  1. Generate Quality Embeddings - Use appropriate models for your domain
  2. Index Strategically - Add indexes for common query patterns
  3. Batch Operations - Use bulk inserts for better performance
  4. Monitor Storage Size - Implement retention policies
  5. Test Sync Logic - Ensure CRDT convergence properties

Performance Optimization

  • WAL Mode: Better concurrent read/write performance
  • Connection Pooling: Reuse database connections
  • Embedding Dimensions: Balance accuracy vs storage (384 dims default)
  • Compaction: Run compact() periodically to optimize storage
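
A sketch of how the SQLite side of these settings might be wired up, assuming rusqlite (the real adapter and its compact() implementation may differ):
use rusqlite::Connection;

// Illustrative only: enable WAL once at startup, then reclaim space during
// periodic maintenance (roughly what compact() is expected to cover).
fn tune_connection(conn: &Connection) -> rusqlite::Result<()> {
    // journal_mode reports the resulting mode as a row, so read it back.
    let mode: String =
        conn.query_row("PRAGMA journal_mode = WAL;", [], |row| row.get(0))?;
    debug_assert_eq!(mode.to_lowercase(), "wal");
    conn.execute_batch("PRAGMA synchronous = NORMAL;")?;
    Ok(())
}

fn compact(conn: &Connection) -> rusqlite::Result<()> {
    conn.execute_batch("VACUUM;")?;
    Ok(())
}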

Change Log

  • 2025-06-13: Updated with actual storage implementation details
  • 2025-06-13: Added performance benchmarks and code examples
  • 2025-06-12: Initial memory graph architecture documentation