How It Works
The RAG system uses hybrid search combining:
- Vector similarity (ChromaDB with cosine distance)
- Keyword search (SQLite FTS5 full-text search)
- Weighted merging (70% vector + 30% keyword by default)
Setup
Before using the learning feature, you need to set up Ollama with the embedding model.
Install Ollama
Download and install Ollama from ollama.ai:
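On Linux, the install script is one option (the exact command below is assumed from Ollama's own instructions; macOS and Windows builds can be downloaded directly from ollama.ai):

```shell
# Install Ollama via the official script (Linux)
curl -fsSL https://ollama.ai/install.sh | sh

# Confirm the binary is on your PATH
ollama --version
```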
Pull the embedding model
Download the nomic-embed-text model (768 dimensions):
This model is specifically designed for embeddings and is required for RAG to work.
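Fetch the model with:

```shell
# Pull the embedding model used by the RAG system
ollama pull nomic-embed-text
```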
Learning New Knowledge
Teach Asta new information using natural language commands.
Learn from Text
When you ask Asta to learn about a topic, it will:
- Research the topic (web search, documentation, etc.)
- Extract and process the content
- Split text into 500-character chunks
- Generate embeddings using Ollama
- Store vectors in ChromaDB and keywords in SQLite FTS5
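The steps above can be sketched as follows. All names here are illustrative (the real implementation is in backend/app/rag/service.py), and a stand-in embedder replaces the Ollama call so the flow runs on its own:

```python
from typing import Callable, List

CHUNK_SIZE = 500  # characters, per the chunking step above

def chunk_text(text: str, size: int = CHUNK_SIZE) -> List[str]:
    # Simple fixed-size character chunking
    return [text[i:i + size] for i in range(0, len(text), size)]

def learn(topic: str, text: str, embed: Callable[[str], List[float]]) -> List[dict]:
    records = []
    for i, chunk in enumerate(chunk_text(text)):
        records.append({
            "id": f"{topic.lower()}-{i}",  # shared document ID for both stores
            "topic": topic.lower(),        # topic names are normalized to lowercase
            "text": chunk,                 # goes to SQLite FTS5 for keyword search
            "embedding": embed(chunk),     # goes to ChromaDB for vector search
        })
    return records

# A fake embedder lets the pipeline run without Ollama:
fake_embed = lambda s: [float(len(s))]
rows = learn("Python", "x" * 1200, fake_embed)
```

With 1,200 characters of input, this yields three chunks: two full 500-character chunks and a final 200-character remainder.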
Learn from Documents
You can also provide direct content to learn:
Learn from URLs
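Learn requests are phrased in natural language, so the wording below is illustrative rather than a fixed syntax (the URL is a hypothetical example):

```
learn about FastAPI from https://fastapi.tiangolo.com/
```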
Querying Learned Knowledge
Once Asta has learned something, it automatically retrieves relevant knowledge when you ask related questions.
Automatic Context Injection
The RAG system runs on every message and injects relevant context:
Topic-Specific Queries
You can query specific topics you’ve taught:
Hybrid Search Example
When you ask a question:
- Vector search: Embeds your question and finds semantically similar chunks
- Keyword search: Searches for exact term matches using FTS5
- Merge & rank: Combines results with weighted scoring
- Return top K: Returns the 5 most relevant chunks (configurable)
Managing Learned Topics
List Topics
See everything Asta has learned. Each topic is listed with:
- Topic name
- Number of chunks stored
- Total content size
Delete Topics
Remove learned knowledge:
Update Topics
Replace existing knowledge:
Storage & Configuration
The RAG system stores data in two locations:
ChromaDB (Vector Store)
- 768-dimensional embeddings (nomic-embed-text)
- Document chunks (up to 500 characters each)
- Metadata (topic name)
- Collection: asta_rag
SQLite FTS5 (Keyword Index)
- Document ID
- Topic name
- Full chunk text for keyword search
- FTS5 index for fast text matching
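A minimal sketch of such an index, using Python's bundled SQLite. The schema here is hypothetical (the real table definition lives in the backend), but it shows how an FTS5 MATCH query does fast keyword lookup:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# doc_id is stored but excluded from full-text matching
con.execute(
    "CREATE VIRTUAL TABLE chunks USING fts5(doc_id UNINDEXED, topic, text)"
)
con.execute(
    "INSERT INTO chunks VALUES "
    "('python-0', 'python', 'async/await lets coroutines yield control')"
)
con.execute(
    "INSERT INTO chunks VALUES "
    "('graphql-0', 'graphql', 'a schema defines types and resolvers')"
)

# MATCH uses the FTS5 index rather than a full table scan
hits = con.execute(
    "SELECT doc_id FROM chunks WHERE chunks MATCH ?", ("coroutines",)
).fetchall()
```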
Implementation Details
The RAG service is implemented in backend/app/rag/service.py.
Embedding Process
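A hedged sketch of the embedding call, assuming Ollama's standard /api/embeddings endpoint; the function and payload-builder names are illustrative, not the backend's actual code:

```python
import json
import os
import urllib.request

OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")

def build_payload(text: str) -> dict:
    # Request body for Ollama's embeddings endpoint
    return {"model": "nomic-embed-text", "prompt": text}

def embed(text: str) -> list:
    req = urllib.request.Request(
        f"{OLLAMA_BASE_URL}/api/embeddings",
        data=json.dumps(build_payload(text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # nomic-embed-text returns a 768-dimensional vector
        return json.load(resp)["embedding"]
```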
Hybrid Search Algorithm
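A minimal sketch of the merge-and-rank step, assuming both result sets carry scores already normalized to [0, 1]; names and signatures are illustrative, and the defaults mirror the 70/30 split described above:

```python
def hybrid_merge(vector_hits: dict, keyword_hits: dict,
                 vector_weight: float = 0.7, keyword_weight: float = 0.3,
                 top_k: int = 5) -> list:
    # Union of document IDs found by either search
    ids = set(vector_hits) | set(keyword_hits)
    # Weighted sum; a missing score counts as 0.0
    scored = {
        doc_id: vector_weight * vector_hits.get(doc_id, 0.0)
                + keyword_weight * keyword_hits.get(doc_id, 0.0)
        for doc_id in ids
    }
    # Rank by combined score and keep the top K
    return sorted(scored, key=scored.get, reverse=True)[:top_k]

ranked = hybrid_merge(
    vector_hits={"a": 0.9, "b": 0.4},
    keyword_hits={"b": 1.0, "c": 0.8},
)
```

Here "a" wins on vector similarity alone (0.63), "b" combines both signals (0.58), and "c" trails on keyword score only (0.24).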
RAG Status & Diagnostics
Check if the RAG system is properly configured:
Advanced Configuration
Chunking Strategy
Text is split into 500-character chunks by RAGService.add() in backend/app/rag/service.py.
Search Weights
Adjust the hybrid search weights:
Collection Settings
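A hypothetical sketch of how the asta_rag collection might be configured with cosine distance via the chromadb client (the actual setup lives in the backend, and the storage path is illustrative):

```python
import chromadb

# Persistent on-disk store for the vector index
client = chromadb.PersistentClient(path="backend/chroma_db")

collection = client.get_or_create_collection(
    name="asta_rag",                     # collection name from the storage section
    metadata={"hnsw:space": "cosine"},   # cosine distance, matching the docs
)
```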
Best Practices
Learn focused topics
Create specific topics rather than generic ones:
- ✅ learn about Python async/await
- ✅ learn about our GraphQL API schema
- ❌ learn about programming (too broad)
Use descriptive topic names
Topic names are case-insensitive and normalized:
- Python → python
- Machine Learning → machine learning
- Use consistent naming for related topics
Provide context-rich content
Better source material leads to better retrieval:
- Include examples and explanations
- Add relevant keywords naturally
- Structure content clearly
Troubleshooting
Ollama connection issues
Error: Cannot reach Ollama at http://localhost:11434
Solutions:
- Check if Ollama is running: ollama serve
- Verify the port with: curl http://localhost:11434/api/tags
- Set the correct URL: export OLLAMA_BASE_URL="http://your-host:11434"
Model not found
Error: Model not found. Run: ollama pull nomic-embed-text
Solution: pull the embedding model with ollama pull nomic-embed-text, then retry the command.
ChromaDB errors
Error: RAG store failed: ...
Solutions:
- Check disk space for ChromaDB directory
- Verify write permissions: ls -la ~/workspace/source/backend/chroma_db/
- Delete and reinitialize: rm -rf ~/workspace/source/backend/chroma_db/
Poor search results
Issues:
- Irrelevant chunks returned
- Missing obvious matches
Solutions:
- Add more context when learning (longer, detailed content)
- Use more specific queries
- Adjust search weights (increase keyword_weight for exact matches)
- Verify topic names match (case-insensitive)
Performance Considerations
- Embedding time: ~100-500ms per chunk (depends on Ollama performance)
- Query time: ~50-200ms for hybrid search (5 results)
- Storage: ~3KB per 500-character chunk (embeddings + metadata)
- Memory: ChromaDB loads indices into memory for fast querying
For large knowledge bases (>10,000 chunks), consider running Ollama on a dedicated GPU server and setting
OLLAMA_BASE_URL to the remote endpoint.