Embeddings System

The Embeddings system provides vector generation capabilities for Soul Kernel’s semantic search and memory indexing.

Overview

The embeddings system enables:
  • Semantic understanding of text content
  • Vector similarity search in memory graph
  • Multiple embedding providers (Mock, OpenAI, Local)
  • Efficient caching to reduce API costs
  • Rust 1.79 compatibility

Architecture

Provider Architecture

The embeddings system uses a trait-based design implemented in the embeddings crate:
#[async_trait]
pub trait EmbeddingService: Send + Sync {
    /// Generate embedding vector for the given text
    async fn generate_embedding(&self, text: &str) -> Result<Vec<f32>>;
    
    /// Generate embeddings for multiple texts (batch operation)
    async fn generate_embeddings(&self, texts: &[String]) -> Result<Vec<Vec<f32>>>;
    
    /// Get the dimension of embeddings produced by this service
    fn dimension(&self) -> usize;
    
    /// Get the model name/identifier
    fn model_name(&self) -> &str;
}
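
To illustrate the abstraction, here is a hypothetical helper (index_text is not part of the crate) that works with any provider behind a trait object:

use std::sync::Arc;

// Hypothetical helper: accepts any provider behind a trait object.
async fn index_text(service: Arc<dyn EmbeddingService>, text: &str) -> Result<Vec<f32>> {
    let embedding = service.generate_embedding(text).await?;
    // Every provider reports its own dimension, so callers can sanity-check.
    debug_assert_eq!(embedding.len(), service.dimension());
    Ok(embedding)
}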

Supported Providers

  1. Mock Provider - For testing and development
    • 384-dimensional vectors
    • Deterministic output based on text hash (see the sketch after this list)
    • ~50μs generation time
  2. OpenAI Provider - Production-quality embeddings
    • Supports all OpenAI embedding models
    • 1536-dim (text-embedding-3-small) or 3072-dim (text-embedding-3-large)
    • Batch processing optimization
    • ~310ms single, ~40ms per embedding in batch
  3. Local Provider - Planned for offline use
    • Candle-based local models
    • Privacy-preserving
    • No API costs
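
For illustration, one way the mock provider could derive deterministic vectors from a text hash (a sketch under that assumption; the actual implementation may differ):

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sketch: hash the text once, then expand the hash into a fixed-size
// vector with a xorshift PRNG so identical inputs yield identical vectors.
fn mock_embedding(text: &str, dimension: usize) -> Vec<f32> {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    let mut seed = hasher.finish().max(1); // avoid the xorshift fixed point at zero
    (0..dimension)
        .map(|_| {
            seed ^= seed << 13;
            seed ^= seed >> 7;
            seed ^= seed << 17;
            (seed as f32 / u64::MAX as f32) * 2.0 - 1.0 // scale into [-1, 1]
        })
        .collect()
}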

Implementation

Crate Structure

embeddings/
├── src/
│   ├── lib.rs              # Public API
│   ├── error.rs            # Error types
│   ├── service.rs          # Core trait
│   ├── config.rs           # Configuration
│   ├── cache.rs            # LRU caching
│   └── providers/
│       ├── mod.rs          # Provider exports
│       ├── mock.rs         # Mock implementation
│       └── openai.rs       # OpenAI implementation
├── tests/
│   └── integration_test.rs # Integration tests
└── Cargo.toml              # Dependencies

Configuration

pub struct EmbeddingConfig {
    pub provider: EmbeddingProvider,
    pub model: String,
    pub dimension: usize,
    pub cache_size: usize,
}

pub enum EmbeddingProvider {
    Mock,
    OpenAI { api_key: String },
    Local { model_path: PathBuf },
}
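
The examples below call EmbeddingConfig::default() and EmbeddingConfig::openai(). One plausible shape for those constructors, with assumed (unconfirmed) defaults:

impl Default for EmbeddingConfig {
    fn default() -> Self {
        Self {
            provider: EmbeddingProvider::Mock,
            model: "mock".to_string(),
            dimension: 384,   // matches the mock provider's documented size
            cache_size: 1000, // assumed default
        }
    }
}

impl EmbeddingConfig {
    pub fn openai(api_key: String) -> Self {
        Self {
            provider: EmbeddingProvider::OpenAI { api_key },
            model: "text-embedding-3-small".to_string(),
            dimension: 1536,
            cache_size: 1000,
        }
    }
}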

Caching Layer

LRU cache implementation for cost optimization:
pub struct EmbeddingCache {
    cache: Arc<Mutex<LruCache<String, Vec<f32>>>>,
}

impl EmbeddingCache {
    pub fn new(capacity: usize) -> Self;
    pub fn get(&self, text: &str) -> Option<Vec<f32>>;
    pub fn put(&self, text: &str, embedding: Vec<f32>);
}
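
A minimal sketch of how these signatures could be backed by the lru crate (the real implementation may differ):

use std::num::NonZeroUsize;
use std::sync::{Arc, Mutex};
use lru::LruCache;

impl EmbeddingCache {
    pub fn new(capacity: usize) -> Self {
        // LruCache requires a non-zero capacity.
        let capacity = NonZeroUsize::new(capacity.max(1)).unwrap();
        Self { cache: Arc::new(Mutex::new(LruCache::new(capacity))) }
    }

    pub fn get(&self, text: &str) -> Option<Vec<f32>> {
        // get() promotes the entry to most-recently-used, hence the lock.
        self.cache.lock().unwrap().get(text).cloned()
    }

    pub fn put(&self, text: &str, embedding: Vec<f32>) {
        self.cache.lock().unwrap().put(text.to_string(), embedding);
    }
}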

Key Features

1. Provider Abstraction

Easy switching between providers:
// Mock provider for testing
let config = EmbeddingConfig::default();

// OpenAI provider for production
let config = EmbeddingConfig::openai(api_key);

// Create service from config
let service = create_embedding_service(&config).await?;

2. Batch Processing

Efficient batch operations with OpenAI:
let texts = vec![
    "First document".to_string(),
    "Second document".to_string(),
    // ... up to 100 texts
];

let embeddings = service.generate_embeddings(&texts).await?;
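
If a workload exceeds the per-request limit, chunking keeps each call within it (the 100-text cap mirrors the comment above and is an assumption about the API limit):

let mut embeddings = Vec::with_capacity(texts.len());
for chunk in texts.chunks(100) {
    embeddings.extend(service.generate_embeddings(chunk).await?);
}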

3. Error Handling

Comprehensive error types:
pub enum EmbeddingError {
    ModelNotFound(String),
    ApiError(String),
    RateLimit,
    InvalidInput(String),
    NotSupported(String),
}
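
A sketch of how a caller might dispatch on these variants, assuming the crate's Result uses EmbeddingError as its error type:

match service.generate_embedding(text).await {
    Ok(_embedding) => {
        // Use the vector: store it, search with it, etc.
    }
    Err(EmbeddingError::RateLimit) => {
        // Back off and retry; see the backoff sketch under Best Practices.
    }
    Err(EmbeddingError::InvalidInput(msg)) => {
        eprintln!("skipping unembeddable input: {msg}");
    }
    Err(other) => return Err(other.into()),
}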

4. Conditional Compilation

Feature flags for optional dependencies:
[features]
default = []
openai = ["ureq", "url"]
local = ["candle", "candle-nn", "candle-transformers"]
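
Inside the crate, these features would typically gate the provider modules along these lines (a sketch, not the actual source):

// providers/mod.rs (sketch)
pub mod mock;   // always available

#[cfg(feature = "openai")]
pub mod openai; // compiled only with --features openai

#[cfg(feature = "local")]
pub mod local;  // planned; gated behind the local feature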

Usage Examples

Basic Embedding Generation

use embeddings::{create_embedding_service, EmbeddingConfig};

// Setup with OpenAI
let config = EmbeddingConfig::openai(std::env::var("OPENAI_API_KEY")?);
let service = create_embedding_service(&config).await?;

// Generate embedding
let text = "The quick brown fox jumps over the lazy dog";
let embedding = service.generate_embedding(text).await?;
println!("Generated {}-dimensional embedding", embedding.len());

With Caching

use embeddings::{EmbeddingCache, create_embedding_service};

// `config` and `text` are as defined in the previous example
let service = create_embedding_service(&config).await?;
let cache = EmbeddingCache::new(1000);

// Check cache first
let embedding = if let Some(cached) = cache.get(text) {
    cached
} else {
    let embedding = service.generate_embedding(text).await?;
    cache.put(text, embedding.clone());
    embedding
};

Integration with Storage

use storage::{MemoryEvent, MemoryMetadata};
use embeddings::create_embedding_service;

// Generate embedding for memory (`service` created as in the examples above)
let content = "User's birthday is June 15th";
let embedding = service.generate_embedding(content).await?;

// Create memory event with embedding
let event = MemoryEvent::new(
    content.to_string(),
    embedding,
    MemoryMetadata {
        source: "conversation".to_string(),
        tags: vec!["birthday".to_string()],
        confidence: 0.9,
        ..Default::default()
    },
);

// Store in database
store.insert(&event).await?;

OpenAI Setup

Environment Configuration

  1. Create openai.env file:
cp openai.env.example openai.env
  2. Add your API key:
OPENAI_API_KEY=sk-proj-...
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_CACHE_SIZE=1000
  3. Run with feature flag:
cargo run --features openai

Supported Models

  • text-embedding-3-small - 1536 dimensions, fastest
  • text-embedding-3-large - 3072 dimensions, highest quality
  • text-embedding-ada-002 - 1536 dimensions, legacy

Performance Metrics

Generation Speed

  • Mock: ~50μs per embedding
  • OpenAI Single: ~310ms per embedding
  • OpenAI Batch: ~40ms per embedding (in batches)

Semantic Quality

  • Mock similarity for related texts: ~0.05
  • OpenAI similarity for related texts: ~0.55
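
These figures are presumably cosine similarities; for reference, a standard implementation over two vectors:

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}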

Memory Usage

At 4 bytes per f32 component:
  • 384-dim mock: ~1.5KB per embedding
  • 1536-dim OpenAI: ~6KB per embedding
  • 3072-dim OpenAI: ~12KB per embedding

Best Practices

  1. Use Caching - Avoid redundant API calls
  2. Batch When Possible - Up to 100 texts per request
  3. Handle Rate Limits - Implement exponential backoff (see the sketch after this list)
  4. Choose Right Model - Balance quality vs cost
  5. Normalize Inputs - Clean text before embedding
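
A minimal exponential-backoff sketch for point 3, assuming rate limits surface as EmbeddingError::RateLimit (tokio's sleep is used for illustration):

use std::time::Duration;

async fn embed_with_backoff(
    service: &dyn EmbeddingService,
    text: &str,
    max_retries: u32,
) -> Result<Vec<f32>> {
    let mut delay = Duration::from_millis(500);
    for _ in 0..max_retries {
        match service.generate_embedding(text).await {
            Err(EmbeddingError::RateLimit) => {
                tokio::time::sleep(delay).await;
                delay *= 2; // double the wait after each rate-limited attempt
            }
            other => return other,
        }
    }
    Err(EmbeddingError::RateLimit)
}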

Security Considerations

  • API keys stored in environment variables
  • Never commit openai.env to version control
  • Use mock provider for public demos
  • Consider local models for sensitive data

Future Enhancements

  • Local model support with Candle
  • Embedding versioning and migration
  • Query expansion techniques
  • Multi-modal embeddings
  • Custom fine-tuned models

Change Log

  • 2025-06-13: Initial embeddings system documentation
  • 2025-06-13: Added OpenAI provider implementation details
  • 2025-06-13: Documented caching and performance metrics