Overview
The EmbeddingGenerationService handles asynchronous embedding generation for memories in elizaOS. It processes embeddings through a priority queue so the main runtime is never blocked, and it powers semantic search over stored memories.
Key Features
- Asynchronous processing: Non-blocking queue-based embedding generation
- Priority management: High, normal, and low priority queues
- Intent extraction: Generates semantic intent for better embeddings
- Context enrichment: Augments short messages with conversation context
- Retry logic: Automatic retry with configurable attempts
- Batch processing: Processes multiple embeddings in parallel
- Graceful degradation: Automatically disables if no embedding model is available
Service Lifecycle
Starting the Service
- Checks for TEXT_EMBEDDING model availability
- Registers event handlers
- Starts the processing loop
- Begins processing queued embeddings
Stopping the Service
- Stops the processing interval
- Processes remaining high-priority items
- Logs remaining queue size
Queue Management
Priority Levels
High Priority
Processed immediately. Used for critical messages that need semantic search right away.
Use cases:
- Direct mentions
- Commands
- User queries requiring context
Normal Priority
Standard priority for most messages. Processed in FIFO order after high-priority items.
Use cases:
- Regular conversation messages
- Background updates
Low Priority
Processed last. Used for non-urgent embeddings.
Use cases:
- Historical data backfill
- System-generated messages
- Archived content
Queue Operations
Queue Embedding Generation
Monitor Queue Status
Clear Queue
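The three operations above can be sketched as a minimal queue wrapper. The method names (`queueEmbedding`, `getQueueStats`, `clearQueue`) and the stats shape are assumptions for illustration, not the exact elizaOS API.

```typescript
// Minimal sketch of the three queue operations: enqueue, inspect, clear.
type Priority = "high" | "normal" | "low";

interface QueueItem {
  memoryId: string;
  priority: Priority;
  queuedAt: number;
}

class EmbeddingQueue {
  private items: QueueItem[] = [];

  // Queue a memory for embedding generation at a given priority.
  queueEmbedding(memoryId: string, priority: Priority = "normal"): void {
    this.items.push({ memoryId, priority, queuedAt: Date.now() });
  }

  // Report how many items are waiting at each priority level.
  getQueueStats(): Record<Priority, number> {
    const stats: Record<Priority, number> = { high: 0, normal: 0, low: 0 };
    for (const item of this.items) stats[item.priority]++;
    return stats;
  }

  // Drop all pending items, returning how many were discarded.
  clearQueue(): number {
    const dropped = this.items.length;
    this.items = [];
    return dropped;
  }
}
```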
Configuration
Queue Settings
Retry Configuration
Token Limits
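The queue, retry, and token settings can be pictured as one configuration object. Aside from `maxQueueSize` and `maxInputTokens`, which this page mentions, the field names and default values below are assumptions for illustration.

```typescript
// A sketch of the service's tunables, grouped by the three areas above.
interface EmbeddingServiceConfig {
  // Queue settings
  maxQueueSize: number;         // evict beyond this many pending items
  batchSize: number;            // embeddings processed per batch
  processingIntervalMs: number; // how often the loop drains the queue
  // Retry configuration
  maxRetries: number;           // attempts before EMBEDDING_GENERATION_FAILED
  retryDelayMs: number;         // wait between attempts
  // Token limits
  maxInputTokens: number;       // inputs are truncated to this budget
}

const defaultConfig: EmbeddingServiceConfig = {
  maxQueueSize: 1000,
  batchSize: 10,
  processingIntervalMs: 1000,
  maxRetries: 3,
  retryDelayMs: 500,
  maxInputTokens: 8192,
};
```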
Intent Generation
The service automatically generates semantic intent for messages to improve embedding quality.
How It Works
- Length Check: Only generates intent for messages > 20 characters
- Intent Extraction: Uses TEXT_SMALL model to extract core meaning
- Embedding Source: Uses intent instead of raw text for embedding
- Metadata Storage: Stores intent in memory.metadata.intent
Example
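A sketch of the steps above, assuming the 20-character threshold from this page; the model call is mocked here, whereas the real service would use its registered TEXT_SMALL model, and the helper name is hypothetical.

```typescript
// Sketch: messages over 20 characters are summarized by a small text
// model, and the summary (intent) is embedded instead of the raw text.
type SmallModel = (prompt: string) => Promise<string>;

async function resolveEmbeddingSource(
  text: string,
  model: SmallModel,
): Promise<{ source: string; intent?: string }> {
  if (text.length <= 20) return { source: text }; // too short for intent
  try {
    const intent = await model(
      `Extract the core intent of this message in one sentence:\n${text}`,
    );
    // The intent would be stored in memory.metadata.intent and embedded.
    return { source: intent, intent };
  } catch {
    return { source: text }; // fall back to the original text on failure
  }
}
```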
Benefits
- Better semantic search: Intent captures meaning vs. literal words
- Improved retrieval: More relevant results in RAG
- Context preservation: Core meaning extracted from verbose messages
Context Enrichment
Short messages (< 100 tokens) are enriched with recent conversation context.
Why Context Matters
Short messages like “yes”, “ok”, or “do it” lack semantic meaning on their own. Context enrichment provides conversation history for better embeddings.
How It Works
Example
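A sketch of the enrichment step under two assumptions: tokens are approximated as characters / 4 rather than by the real tokenizer, and recent messages are passed in directly rather than fetched from the memory store.

```typescript
// Sketch: a message under the ~100-token threshold is prefixed with
// recent room messages before embedding, so "yes" / "ok" carry context.
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4); // rough chars-per-token estimate
}

function enrichWithContext(
  message: string,
  recentMessages: string[],
  tokenThreshold = 100,
): string {
  if (approxTokens(message) >= tokenThreshold) return message;
  if (recentMessages.length === 0) return message; // no history: fall back
  // Prepend the last few messages as context.
  const context = recentMessages.slice(-3).join("\n");
  return `Context:\n${context}\nMessage: ${message}`;
}
```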
Text Preparation
Embedding text is cleaned and prepared to maximize semantic quality.
Preparation Steps
- Strip Formatting: Remove names, timestamps, entity IDs, markdown
- Context Enrichment: Add conversation context for short messages
- Truncation: Trim to model’s max token limit
- Validation: Ensure non-empty text
Stripping Function
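A sketch of the stripping step; the exact patterns below are assumptions, but they cover the categories listed above (names, timestamps, entity IDs, markdown).

```typescript
// Sketch: remove speaker labels, timestamps, UUID-style entity IDs, and
// markdown so only semantic content reaches the embedding model.
function stripFormatting(text: string): string {
  return text
    // timestamps like [12:34] or [2024-01-05 12:34]
    .replace(/\[[0-9:\- ]+\]\s*/g, "")
    // UUID-style entity IDs
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, "")
    // leading "Name:" speaker labels
    .replace(/^[A-Za-z0-9_]+:\s*/, "")
    // markdown emphasis, code ticks, and headers
    .replace(/[*_`#]+/g, "")
    .trim();
}
```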
Token-Based Truncation
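The truncation step can be sketched with a chars-per-token estimate; a production implementation would use the embedding model's actual tokenizer rather than this 4-characters-per-token approximation.

```typescript
// Sketch: trim input to the maxInputTokens budget, backing off to the
// last word boundary so we never cut mid-word.
function truncateToTokens(text: string, maxInputTokens: number): string {
  const charsPerToken = 4; // rough average for English text
  const maxChars = maxInputTokens * charsPerToken;
  if (text.length <= maxChars) return text;
  const cut = text.slice(0, maxChars);
  const lastSpace = cut.lastIndexOf(" ");
  return lastSpace > 0 ? cut.slice(0, lastSpace) : cut;
}
```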
Events
The service emits and listens to several events:
EMBEDDING_GENERATION_REQUESTED
Triggered when a new embedding is queued.
EMBEDDING_GENERATION_COMPLETED
Emitted when embedding generation succeeds.
EMBEDDING_GENERATION_FAILED
Emitted when embedding generation fails after max retries.
Monitoring and Logging
The service logs detailed information for observability:
Log Events
Runtime Logs
The service also creates runtime logs for tracking.
Error Handling
Retry Logic
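The retry behavior can be sketched as a generic wrapper; the function name and signature are assumptions, but the shape matches the configurable attempts and delay described above.

```typescript
// Sketch: attempt an embedding call up to maxRetries times with a fixed
// delay between attempts; once exhausted, the failure would surface as
// EMBEDDING_GENERATION_FAILED.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  retryDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
      }
    }
  }
  throw lastError; // caller would emit EMBEDDING_GENERATION_FAILED
}
```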
Common Failures
Model Unavailable
No TEXT_EMBEDDING model is registered with the runtime.
Solution: Register an embedding model; the service disables itself automatically until one is available.
Rate Limiting
The embedding API may rate limit requests.
Solution: Reduce batch size or increase processing interval.
Token Limit Exceeded
Input text exceeds the model’s token limit.
Solution: Automatic truncation handles this, but ensure maxInputTokens is set correctly.
Empty Content
The memory has no text content to embed.
Solution: The service skips these automatically. Ensure messages have content.text.
Queue Optimization
Making Room
When the queue reaches capacity, the service removes items strategically:
- Low priority, oldest
- Low priority, newer
- Normal priority, oldest
- Normal priority, newer
- High priority (rarely removed)
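The eviction order above can be expressed as a sort over priority rank and age; the item shape and function name are assumptions for illustration.

```typescript
// Sketch: pick the eviction victim when the queue is full. Low-priority
// items go first (oldest first), then normal; high-priority items are
// only reached as a last resort.
interface Queued {
  id: string;
  priority: "high" | "normal" | "low";
  queuedAt: number;
}

function pickEviction(items: Queued[]): Queued | undefined {
  const rank = { low: 0, normal: 1, high: 2 } as const;
  return [...items].sort(
    (a, b) => rank[a.priority] - rank[b.priority] || a.queuedAt - b.queuedAt,
  )[0];
}
```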
Batch Processing
Embeddings are processed in parallel batches.
Advanced Usage
Custom Embedding Models
Monitoring Queue Health
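A sketch of a health check over the queue stats, using this page's suggested alert threshold of 500 pending items; the stats shape is an assumption mirroring per-priority counts.

```typescript
// Sketch: summarize queue stats and flag a backlog above the threshold.
interface QueueStats {
  high: number;
  normal: number;
  low: number;
}

function checkQueueHealth(stats: QueueStats, alertThreshold = 500): string {
  const total = stats.high + stats.normal + stats.low;
  if (total > alertThreshold) {
    return `ALERT: embedding queue backlog at ${total} items`;
  }
  return `ok: ${total} items queued`;
}
```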
Batch Backfill
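A sketch of a historical backfill loop, assuming a simplified memory shape and an enqueue callback; memories without embeddings are queued at low priority so they never crowd out live messages.

```typescript
// Sketch: queue every memory that lacks an embedding, at low priority.
interface MemoryLike {
  id: string;
  embedding?: number[];
}

function backfillEmbeddings(
  memories: MemoryLike[],
  enqueue: (id: string, priority: "low") => void,
): number {
  let queued = 0;
  for (const memory of memories) {
    if (!memory.embedding) {
      enqueue(memory.id, "low"); // low priority: non-urgent backfill
      queued++;
    }
  }
  return queued;
}
```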
Event Listeners
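The event names come from this page; the payload shapes and the use of a bare Node EventEmitter below are assumptions standing in for the runtime's event bus.

```typescript
import { EventEmitter } from "node:events";

// Sketch: subscribe to the service's lifecycle events.
const emitter = new EventEmitter();

emitter.on("EMBEDDING_GENERATION_COMPLETED", (payload: { memoryId: string }) => {
  console.log(`embedding ready for ${payload.memoryId}`);
});

emitter.on(
  "EMBEDDING_GENERATION_FAILED",
  (payload: { memoryId: string; error: string }) => {
    console.error(`embedding failed for ${payload.memoryId}: ${payload.error}`);
  },
);
```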
Best Practices
Priority Assignment
- Use high priority for user-facing messages that need immediate semantic search
- Use normal priority for most messages
- Use low priority for bulk operations and historical data
Queue Management
- Monitor queue size regularly
- Increase batch size for bulk operations
- Adjust processing interval based on API rate limits
- Set maxQueueSize based on available memory
Performance
- Enable intent generation for better semantic quality
- Use context enrichment for short messages
- Process embeddings asynchronously (don’t await)
- Batch operations when possible
Monitoring
- Watch for EMBEDDING_GENERATION_FAILED events
- Alert on queue size > 500
- Track average processing duration
- Monitor retry rates
Troubleshooting
Service Not Processing
- Check if TEXT_EMBEDDING model is registered
- Verify service is not disabled
- Check queue stats - may be empty
- Review logs for errors
High Queue Size
- Increase batch size
- Decrease processing interval
- Add more workers (scale horizontally)
- Optimize embedding model performance
Intent Generation Failures
- Verify TEXT_SMALL model is available
- Check message length (>20 chars required)
- Review logs for generation errors
- Falls back to original text automatically
Context Enrichment Issues
- Ensure room has message history
- Check memory access permissions
- Verify getMemories() implementation
- Falls back to the original message if enrichment fails