Embeddings System

The Embeddings system provides vector generation capabilities for Soul Kernel’s semantic search and memory indexing.

Overview

The embeddings system enables:
  • Semantic understanding of text content
  • Vector similarity search in the memory graph
  • Multiple embedding providers (Mock, OpenAI, Local)
  • Efficient caching to reduce API costs
  • Rust 1.79 compatibility

Architecture

Provider Architecture

The embeddings system uses a trait-based design implemented in the embeddings crate:
use async_trait::async_trait;

#[async_trait]
pub trait EmbeddingService: Send + Sync {
    /// Generate embedding vector for the given text
    async fn generate_embedding(&self, text: &str) -> Result<Vec<f32>>;
    
    /// Generate embeddings for multiple texts (batch operation)
    async fn generate_embeddings(&self, texts: &[String]) -> Result<Vec<Vec<f32>>>;
    
    /// Get the dimension of embeddings produced by this service
    fn dimension(&self) -> usize;
    
    /// Get the model name/identifier
    fn model_name(&self) -> &str;
}

Supported Providers

  1. Mock Provider - For testing and development
    • 384-dimensional vectors
    • Deterministic output based on text hash (see the sketch after this list)
    • ~50μs generation time
  2. OpenAI Provider - Production-quality embeddings
    • Supports all OpenAI embedding models
    • 1536-dim (text-embedding-3-small) or 3072-dim (text-embedding-3-large)
    • Batch processing optimization
    • ~310ms single, ~40ms per embedding in batch
  3. Local Provider - Planned for offline use
    • Candle-based local models
    • Privacy-preserving
    • No API costs
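
For illustration, deterministic mock output can be produced along these lines. This is a sketch only; the hashing scheme is assumed here and is not the actual mock.rs code:
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Sketch: derive a repeatable 384-dimensional vector from a text hash.
fn mock_embedding(text: &str) -> Vec<f32> {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    // `| 1` avoids the all-zero fixed point of the xorshift below
    let mut seed = hasher.finish() | 1;
    (0..384)
        .map(|_| {
            // xorshift64 expands one hash into 384 pseudo-random components
            seed ^= seed << 13;
            seed ^= seed >> 7;
            seed ^= seed << 17;
            (seed as f32 / u64::MAX as f32) * 2.0 - 1.0
        })
        .collect()
}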

Implementation

Crate Structure

embeddings/
├── src/
│   ├── lib.rs              # Public API
│   ├── error.rs            # Error types
│   ├── service.rs          # Core trait
│   ├── config.rs           # Configuration
│   ├── cache.rs            # LRU caching
│   └── providers/
│       ├── mod.rs          # Provider exports
│       ├── mock.rs         # Mock implementation
│       └── openai.rs       # OpenAI implementation
├── tests/
│   └── integration_test.rs # Integration tests
└── Cargo.toml              # Dependencies

Configuration

use std::path::PathBuf;

pub struct EmbeddingConfig {
    pub provider: EmbeddingProvider,
    pub model: String,
    pub dimension: usize,
    pub cache_size: usize,
}

pub enum EmbeddingProvider {
    Mock,
    OpenAI { api_key: String },
    Local { model_path: PathBuf },
}
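
For illustration, a configuration can be assembled directly from these types; the field values below are examples only, taken from the sections that follow:
let config = EmbeddingConfig {
    provider: EmbeddingProvider::OpenAI {
        api_key: std::env::var("OPENAI_API_KEY")?,
    },
    model: "text-embedding-3-small".to_string(),
    dimension: 1536,
    cache_size: 1000,
};

The EmbeddingConfig::default() and EmbeddingConfig::openai(...) helpers shown under Key Features below build these values for you.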

Caching Layer

LRU cache implementation for cost optimization:
pub struct EmbeddingCache {
    cache: Arc<Mutex<LruCache<String, Vec<f32>>>>,
}

impl EmbeddingCache {
    pub fn new(capacity: usize) -> Self;
    pub fn get(&self, text: &str) -> Option<Vec<f32>>;
    pub fn put(&self, text: &str, embedding: Vec<f32>);
}
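
A possible implementation of these methods on top of the lru crate (a sketch; the actual cache.rs may differ):
use std::num::NonZeroUsize;
use std::sync::{Arc, Mutex};
use lru::LruCache;

impl EmbeddingCache {
    pub fn new(capacity: usize) -> Self {
        let cap = NonZeroUsize::new(capacity.max(1)).expect("non-zero capacity");
        Self { cache: Arc::new(Mutex::new(LruCache::new(cap))) }
    }

    pub fn get(&self, text: &str) -> Option<Vec<f32>> {
        // A hit promotes the entry to most-recently-used
        self.cache.lock().unwrap().get(text).cloned()
    }

    pub fn put(&self, text: &str, embedding: Vec<f32>) {
        self.cache.lock().unwrap().put(text.to_string(), embedding);
    }
}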

Key Features

1. Provider Abstraction

Easy switching between providers:
// Mock provider for testing
let config = EmbeddingConfig::default();

// OpenAI provider for production
let config = EmbeddingConfig::openai(api_key);

// Create service from config
let service = create_embedding_service(&config).await?;

2. Batch Processing

Efficient batch operations with OpenAI:
let texts = vec![
    "First document".to_string(),
    "Second document".to_string(),
    // ... up to 100 texts
];

let embeddings = service.generate_embeddings(&texts).await?;

3. Error Handling

Comprehensive error types:
pub enum EmbeddingError {
    ModelNotFound(String),
    ApiError(String),
    RateLimit,
    InvalidInput(String),
    NotSupported(String),
}
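
Callers can branch on these variants; the sketch below assumes the crate's Result alias carries EmbeddingError:
match service.generate_embedding(text).await {
    Ok(embedding) => println!("got {} dimensions", embedding.len()),
    Err(EmbeddingError::RateLimit) => {
        // Back off and retry later (see Best Practices below)
    }
    Err(other) => eprintln!("embedding failed: {:?}", other),
}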

4. Conditional Compilation

Feature flags for optional dependencies:
[features]
default = []
openai = ["ureq", "url"]
local = ["candle", "candle-nn", "candle-transformers"]
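
Inside providers/mod.rs the optional backends can then be gated on these features, roughly as follows (a sketch; the local module is still planned):
// providers/mod.rs (sketch)
pub mod mock;

#[cfg(feature = "openai")]
pub mod openai;

#[cfg(feature = "local")]
pub mod local;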

Usage Examples

Basic Embedding Generation

use embeddings::{create_embedding_service, EmbeddingConfig};

// Setup with OpenAI
let config = EmbeddingConfig::openai(std::env::var("OPENAI_API_KEY")?);
let service = create_embedding_service(&config).await?;

// Generate embedding
let text = "The quick brown fox jumps over the lazy dog";
let embedding = service.generate_embedding(text).await?;
println!("Generated {}-dimensional embedding", embedding.len());

With Caching

use embeddings::{EmbeddingCache, create_embedding_service};

let service = create_embedding_service(&config).await?; // `config` as in the previous example
let cache = EmbeddingCache::new(1000);
let text = "The quick brown fox jumps over the lazy dog";

// Check cache first
let embedding = if let Some(cached) = cache.get(text) {
    cached
} else {
    let embedding = service.generate_embedding(text).await?;
    cache.put(text, embedding.clone());
    embedding
};

Integration with Storage

use storage::{MemoryEvent, MemoryMetadata};
use embeddings::create_embedding_service;

// Build the embedding service (`config` as in the earlier examples)
let service = create_embedding_service(&config).await?;

// Generate embedding for memory
let content = "User's birthday is June 15th";
let embedding = service.generate_embedding(content).await?;

// Create memory event with embedding
let event = MemoryEvent::new(
    content.to_string(),
    embedding,
    MemoryMetadata {
        source: "conversation".to_string(),
        tags: vec!["birthday".to_string()],
        confidence: 0.9,
        ..Default::default()
    },
);

// Store in database
store.insert(&event).await?;

OpenAI Setup

Environment Configuration

  1. Create openai.env file:
cp openai.env.example openai.env
  2. Add your API key:
OPENAI_API_KEY=sk-proj-...
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_CACHE_SIZE=1000
  3. Run with feature flag:
cargo run --features openai

Supported Models

  • text-embedding-3-small - 1536 dimensions, fastest
  • text-embedding-3-large - 3072 dimensions, highest quality
  • text-embedding-ada-002 - 1536 dimensions, legacy
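
A hypothetical helper mapping these model names to their dimensions (values from the list above):
/// Default embedding dimension for each supported OpenAI model.
fn model_dimension(model: &str) -> Option<usize> {
    match model {
        "text-embedding-3-small" | "text-embedding-ada-002" => Some(1536),
        "text-embedding-3-large" => Some(3072),
        _ => None,
    }
}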

Performance Metrics

Generation Speed

  • Mock: ~50μs per embedding
  • OpenAI Single: ~310ms per embedding
  • OpenAI Batch: ~40ms per embedding

Semantic Quality

  • Mock similarity for related texts: ~0.05
  • OpenAI similarity for related texts: ~0.55
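
These figures presumably compare embeddings with cosine similarity; a small helper for reproducing such measurements:
/// Cosine similarity between two equal-length vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}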

Memory Usage

  • 384-dim mock: ~1.5KB per embedding
  • 1536-dim OpenAI: ~6KB per embedding
  • 3072-dim OpenAI: ~12KB per embedding
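
These sizes follow directly from 4 bytes per f32 component:
// Rough per-embedding footprint: dimension × size_of::<f32>()
let bytes = 1536 * std::mem::size_of::<f32>(); // 6144 bytes ≈ 6 KB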

Best Practices

  1. Use Caching - Avoid redundant API calls
  2. Batch When Possible - Up to 100 texts per request
  3. Handle Rate Limits - Implement exponential backoff (see the sketch after this list)
  4. Choose Right Model - Balance quality vs cost
  5. Normalize Inputs - Clean text before embedding
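
A minimal retry sketch for rate limits, assuming a tokio runtime and that the crate's Result alias carries EmbeddingError:
use std::time::Duration;

/// Sketch: retry with exponential backoff on rate-limit errors.
async fn embed_with_backoff(service: &dyn EmbeddingService, text: &str) -> Result<Vec<f32>> {
    let mut delay = Duration::from_millis(250);
    for _ in 0..5 {
        match service.generate_embedding(text).await {
            Err(EmbeddingError::RateLimit) => {
                tokio::time::sleep(delay).await;
                delay *= 2; // double the wait after each rate-limited attempt
            }
            result => return result,
        }
    }
    // Final attempt after exhausting the retry budget
    service.generate_embedding(text).await
}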

Security Considerations

  • API keys stored in environment variables
  • Never commit openai.env to version control
  • Use mock provider for public demos
  • Consider local models for sensitive data

Future Enhancements

  • Local model support with Candle
  • Embedding versioning and migration
  • Query expansion techniques
  • Multi-modal embeddings
  • Custom fine-tuned models

Change Log

  • 2025-06-13: Initial embeddings system documentation
  • 2025-06-13: Added OpenAI provider implementation details
  • 2025-06-13: Documented caching and performance metrics