# Embeddings System
The Embeddings system provides vector generation capabilities for Soul Kernel’s semantic search and memory indexing.

## Overview

The embeddings system enables:

- Semantic understanding of text content
- Vector similarity search in memory graph
- Multiple embedding providers (Mock, OpenAI, Local)
- Efficient caching to reduce API costs
- Rust 1.79 compatibility
## Architecture

### Provider Architecture
The embeddings system uses a trait-based design implemented in the `embeddings` crate.
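A minimal sketch of what that trait might look like, simplified to a synchronous form (the real crate is likely async); the trait and method names here are assumptions:

```rust
/// Placeholder error for this sketch; see Error Handling below for a fuller one.
#[derive(Debug)]
pub struct EmbeddingError(pub String);

pub trait EmbeddingProvider {
    /// Human-readable provider name, e.g. "mock" or "openai".
    fn name(&self) -> &str;

    /// Output vector dimensionality (384 for mock, 1536/3072 for OpenAI).
    fn dimensions(&self) -> usize;

    /// Generate an embedding for a single text.
    fn embed(&self, text: &str) -> Result<Vec<f32>, EmbeddingError>;

    /// Batch generation; the default falls back to one call per text, and
    /// providers such as OpenAI can override it with a single batched request.
    fn embed_batch(&self, texts: &[String]) -> Result<Vec<Vec<f32>>, EmbeddingError> {
        texts.iter().map(|t| self.embed(t)).collect()
    }
}
```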
### Supported Providers

1. **Mock Provider** - For testing and development
   - 384-dimensional vectors
   - Deterministic output based on text hash
   - ~50μs generation time

2. **OpenAI Provider** - Production-quality embeddings
   - Supports all OpenAI embedding models
   - 1536-dim (`text-embedding-3-small`) or 3072-dim (`text-embedding-3-large`)
   - Batch processing optimization
   - ~310ms single, ~40ms per embedding in batch

3. **Local Provider** - Planned for offline use
   - Candle-based local models
   - Privacy-preserving
   - No API costs
## Implementation

### Crate Structure
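One plausible layout for the crate (module and file names here are assumptions, not confirmed structure):

```text
embeddings/
├── Cargo.toml        # declares optional features such as "openai"
├── src/
│   ├── lib.rs        # EmbeddingProvider trait and re-exports
│   ├── mock.rs       # deterministic mock provider
│   ├── openai.rs     # OpenAI client, behind the "openai" feature
│   ├── cache.rs      # LRU caching layer
│   └── error.rs      # EmbeddingError
└── examples/         # runnable demos
```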
### Configuration
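A sketch of what the configuration might look like; field names and defaults are illustrative:

```rust
/// Hypothetical configuration; field names and defaults are illustrative.
pub struct EmbeddingConfig {
    /// Which backend to use: "mock", "openai", or (planned) "local".
    pub provider: String,
    /// Model identifier, e.g. "text-embedding-3-small".
    pub model: String,
    /// Maximum texts per batch request (this document recommends up to 100).
    pub batch_size: usize,
    /// Maximum entries held in the LRU cache.
    pub cache_capacity: usize,
}

impl Default for EmbeddingConfig {
    fn default() -> Self {
        Self {
            provider: "mock".into(),
            model: "text-embedding-3-small".into(),
            batch_size: 100,
            cache_capacity: 10_000,
        }
    }
}
```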
### Caching Layer

LRU cache implementation for cost optimization.
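A minimal standard-library sketch of such a cache; the type and method names are illustrative, not the crate's actual API:

```rust
use std::collections::{HashMap, VecDeque};

/// Least-recently-used cache keyed by the input text.
pub struct EmbeddingCache {
    capacity: usize,
    map: HashMap<String, Vec<f32>>,
    order: VecDeque<String>, // front = least recently used
}

impl EmbeddingCache {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Return a cached embedding and mark the key most recently used.
    pub fn get(&mut self, text: &str) -> Option<Vec<f32>> {
        let hit = self.map.get(text).cloned();
        if hit.is_some() {
            self.order.retain(|k| k != text);
            self.order.push_back(text.to_owned());
        }
        hit
    }

    /// Insert an embedding, evicting the least recently used entry if full.
    pub fn put(&mut self, text: String, embedding: Vec<f32>) {
        if self.map.len() >= self.capacity && !self.map.contains_key(&text) {
            if let Some(evicted) = self.order.pop_front() {
                self.map.remove(&evicted);
            }
        }
        self.order.retain(|k| k != &text);
        self.order.push_back(text.clone());
        self.map.insert(text, embedding);
    }
}
```

A production version would likely key on a hash of the text plus the model name, so that switching models never returns stale vectors.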
## Key Features

### 1. Provider Abstraction
Easy switching between providers:
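A sketch of runtime selection, reusing the hypothetical `EmbeddingConfig`, `MockProvider`, and `OpenAiProvider` types from the sketches in this document:

```rust
/// Choose a backend at runtime; unknown names fall back to the mock provider.
fn make_provider(config: &EmbeddingConfig) -> Box<dyn EmbeddingProvider> {
    match config.provider.as_str() {
        // Only compiled in when the "openai" feature is enabled.
        #[cfg(feature = "openai")]
        "openai" => Box::new(OpenAiProvider::new(&config.model)),
        _ => Box::new(MockProvider::new(384)),
    }
}
```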
### 2. Batch Processing

Efficient batch operations with OpenAI:
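A sketch of chunked submission on top of the trait above; sending chunks rather than single requests is what drops the cost from ~310ms to ~40ms per embedding (see Performance Metrics below):

```rust
/// Embed many texts by splitting them into provider-sized batches.
fn embed_all(
    provider: &dyn EmbeddingProvider,
    texts: &[String],
    batch_size: usize,
) -> Result<Vec<Vec<f32>>, EmbeddingError> {
    let mut out = Vec::with_capacity(texts.len());
    for chunk in texts.chunks(batch_size) {
        // One request per chunk amortizes per-call HTTP overhead.
        out.extend(provider.embed_batch(chunk)?);
    }
    Ok(out)
}
```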
### 3. Error Handling

Comprehensive error types:
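An illustrative error enum; the variant names are assumptions rather than the crate's real API:

```rust
use std::fmt;

#[derive(Debug)]
pub enum EmbeddingError {
    /// The provider returned a failure or was unreachable.
    Api { status: u16, message: String },
    /// The request exceeded the provider's rate limits.
    RateLimited { retry_after_secs: Option<u64> },
    /// The input was empty or too long for the model.
    InvalidInput(String),
    /// Configuration problems, e.g. a missing API key.
    Config(String),
}

impl fmt::Display for EmbeddingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Api { status, message } => write!(f, "API error {status}: {message}"),
            Self::RateLimited { retry_after_secs } => {
                write!(f, "rate limited (retry after {retry_after_secs:?} s)")
            }
            Self::InvalidInput(msg) => write!(f, "invalid input: {msg}"),
            Self::Config(msg) => write!(f, "configuration error: {msg}"),
        }
    }
}

impl std::error::Error for EmbeddingError {}
```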
### 4. Conditional Compilation

Feature flags for optional dependencies:
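A sketch of how the gating might look in `lib.rs`, keeping the OpenAI client and its HTTP dependencies out of default builds:

```rust
// Compiled only with `--features openai`; the default build stays lightweight
// and works fully offline with the mock provider.
#[cfg(feature = "openai")]
pub mod openai;

// Always available, for tests, demos, and development.
pub mod mock;

#[cfg(feature = "local")]
pub mod local; // planned Candle-based provider
```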
## Usage Examples

### Basic Embedding Generation
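A self-contained sketch using a stand-in mock provider (hash-seeded and deterministic), so the example runs without an API key; the real `MockProvider` API may differ:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct MockProvider { dims: usize }

impl MockProvider {
    fn new(dims: usize) -> Self { Self { dims } }

    /// Deterministic pseudo-embedding derived from the text's hash.
    fn embed(&self, text: &str) -> Vec<f32> {
        let mut hasher = DefaultHasher::new();
        text.hash(&mut hasher);
        let mut seed = hasher.finish() | 1; // avoid the all-zero state
        (0..self.dims)
            .map(|_| {
                // xorshift64: cheap deterministic sequence per input text
                seed ^= seed << 13;
                seed ^= seed >> 7;
                seed ^= seed << 17;
                (seed as f32 / u64::MAX as f32) * 2.0 - 1.0
            })
            .collect()
    }
}

fn main() {
    let provider = MockProvider::new(384);
    let a = provider.embed("soul kernel remembers");
    let b = provider.embed("soul kernel remembers");
    assert_eq!(a.len(), 384);
    assert_eq!(a, b); // same text, same vector: output is deterministic
}
```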
### With Caching
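A cache-first wrapper, reusing the `EmbeddingProvider` trait and `EmbeddingCache` sketches above:

```rust
/// Look up the cache before calling the provider.
fn embed_cached(
    provider: &dyn EmbeddingProvider,
    cache: &mut EmbeddingCache,
    text: &str,
) -> Result<Vec<f32>, EmbeddingError> {
    // Hit: identical text was embedded before, so no API call is made.
    if let Some(hit) = cache.get(text) {
        return Ok(hit);
    }
    // Miss: generate once, then store for subsequent lookups.
    let embedding = provider.embed(text)?;
    cache.put(text.to_owned(), embedding.clone());
    Ok(embedding)
}
```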
### Integration with Storage
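A hypothetical glue function; the `MemoryGraph` type and its `attach_embedding` method are assumptions standing in for the real storage API (see Memory Graph):

```rust
/// Embed a node's content and store the vector alongside the node.
fn index_memory(
    graph: &mut MemoryGraph,
    provider: &dyn EmbeddingProvider,
    node_id: u64,
    content: &str,
) -> Result<(), EmbeddingError> {
    let vector = provider.embed(content)?;
    // Persist the vector next to the node so similarity search can use it.
    graph.attach_embedding(node_id, vector);
    Ok(())
}
```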
## OpenAI Setup

### Environment Configuration
1. Create an `openai.env` file
2. Add your API key
3. Run with the feature flag (see the sketch after this list)
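Putting the three steps together (the file name and feature flag follow this document; `OPENAI_API_KEY` is the conventional variable name, but treat it as an assumption here):

```bash
# Create openai.env with your key (never commit this file).
echo 'OPENAI_API_KEY=sk-...' > openai.env

# Export the key for the current shell, then build with the OpenAI provider.
set -a; source openai.env; set +a
cargo run --features openai
```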
### Supported Models

- `text-embedding-3-small` - 1536 dimensions, fastest
- `text-embedding-3-large` - 3072 dimensions, highest quality
- `text-embedding-ada-002` - 1536 dimensions, legacy
## Performance Metrics

### Generation Speed
- Mock: ~50μs per embedding
- OpenAI Single: ~310ms per embedding
- OpenAI Batch: ~40ms per embedding (in batches)
### Semantic Quality
- Mock similarity for related texts: ~0.05 (hash-derived vectors carry no semantic signal)
- OpenAI similarity for related texts: ~0.55
### Memory Usage

Sizes below assume 4-byte `f32` components:
- 384-dim mock: ~1.5KB per embedding
- 1536-dim OpenAI: ~6KB per embedding
- 3072-dim OpenAI: ~12KB per embedding
## Best Practices
- **Use Caching** - Avoid redundant API calls
- **Batch When Possible** - Up to 100 texts per request
- **Handle Rate Limits** - Implement exponential backoff, as sketched after this list
- **Choose the Right Model** - Balance quality vs. cost
- **Normalize Inputs** - Clean text before embedding
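A sketch of the exponential backoff mentioned above, reusing the hypothetical `EmbeddingProvider` trait and `EmbeddingError` enum:

```rust
use std::thread;
use std::time::Duration;

/// Retry rate-limited calls with doubling delays. Production code would
/// add jitter, honor `retry_after_secs`, and cap the total wait.
fn embed_with_backoff(
    provider: &dyn EmbeddingProvider,
    text: &str,
    max_retries: u32,
) -> Result<Vec<f32>, EmbeddingError> {
    let mut delay = Duration::from_millis(500);
    for attempt in 0..=max_retries {
        match provider.embed(text) {
            Err(EmbeddingError::RateLimited { .. }) if attempt < max_retries => {
                thread::sleep(delay);
                delay *= 2; // 500 ms, 1 s, 2 s, ...
            }
            other => return other,
        }
    }
    unreachable!("the final attempt always returns above")
}
```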
## Security Considerations
- API keys stored in environment variables
- Never commit `openai.env` to version control
- Use the mock provider for public demos
- Consider local models for sensitive data
## Future Enhancements
- Local model support with Candle
- Embedding versioning and migration
- Query expansion techniques
- Multi-modal embeddings
- Custom fine-tuned models
## Next Steps
- See Memory Graph for storage integration
- Read API Reference for detailed docs
- Try Embedding Search Tutorial
- Explore `examples/` for working code
## Change Log
- 2025-06-13: Initial embeddings system documentation
- 2025-06-13: Added OpenAI provider implementation details
- 2025-06-13: Documented caching and performance metrics