Overview
Khoj uses an embedding model to understand documents. Multilingual embedding models improve search quality for documents that are not in English. This affects both the search and chat-with-docs experiences across Khoj.
Setup
To improve search and chat quality for non-English documents, you can use a multilingual model. For example, paraphrase-multilingual-MiniLM-L12-v2 supports 50+ languages and offers decent search quality and speed on a consumer machine.
Configure Search Model
Open the search config on your server’s admin settings page. Either create a new search model, if none exists, or update the existing one. For example:
- Set the bi_encoder field to sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- Set the cross_encoder field to mixedbread-ai/mxbai-rerank-xsmall-v1
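To make the roles of the two fields concrete, here is a toy sketch of two-stage retrieval: a bi-encoder style step shortlists documents by vector similarity, and a cross-encoder style step reorders only that shortlist. The vectors, the rerank scores, and the function names are made up for illustration; this is not Khoj's implementation and it does not load the real models.

```python
import math

# Field values from the example above, collected in one place.
search_model_config = {
    "bi_encoder": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "cross_encoder": "mixedbread-ai/mxbai-rerank-xsmall-v1",
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_search(query_vec, doc_vecs, rerank_score, top_k=2):
    """Shortlist by embedding similarity, then reorder the shortlist.

    rerank_score is a hypothetical callable mapping a document index to a
    relevance score; it stands in for the cross-encoder.
    """
    # Stage 1 (bi_encoder role): cheap similarity over all documents.
    by_similarity = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    shortlist = by_similarity[:top_k]
    # Stage 2 (cross_encoder role): costlier scoring of the shortlist only.
    return sorted(shortlist, key=rerank_score, reverse=True)


# Toy data: three documents with 2-dimensional "embeddings".
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
toy_rerank_scores = {0: 0.2, 1: 0.8, 2: 0.9}

ranking = two_stage_search([1.0, 0.0], doc_vecs, toy_rerank_scores.get, top_k=2)
print(ranking)
```

In this toy run, document 2 never reaches the reranker because the first stage did not shortlist it. That is why the choice of bi_encoder matters most for non-English documents.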
Advanced Configuration
Query Prefix for Modern Models
Modern search/embedding models like mixedbread-ai/mxbai-embed-large-v1 expect a prefix to the query (or docs) string to improve encoding.
Set the bi_encoder_query_encode_config field of your embedding model to {"prompt": "<prefix-prompt>"} to improve the search quality of these models.
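As a minimal sketch, the value for this field can be generated as JSON text. The prompt string below is the query prompt given on the mixedbread-ai/mxbai-embed-large-v1 model card; the helper names are hypothetical. The apply_prompt function only illustrates the effect of a prompt (it is prepended to the query text before encoding) and does not call any model.

```python
import json

# Query prompt as documented on the mixedbread-ai/mxbai-embed-large-v1 model card.
MXBAI_QUERY_PROMPT = "Represent this sentence for searching relevant passages: "


def build_query_encode_config(prompt: str) -> str:
    """Return JSON text suitable for the bi_encoder_query_encode_config field."""
    # ensure_ascii=False keeps any non-English prompt text readable.
    return json.dumps({"prompt": prompt}, ensure_ascii=False)


def apply_prompt(prompt: str, query: str) -> str:
    """Illustrate what the encoder sees: the prompt followed by the query."""
    return prompt + query


config_json = build_query_encode_config(MXBAI_QUERY_PROMPT)
prefixed_query = apply_prompt(MXBAI_QUERY_PROMPT, "¿Dónde guardé mis notas de viaje?")

print(config_json)
print(prefixed_query)
```

Paste the printed JSON into the field on the admin page. Other models may expect a different prefix or none at all, so check the model's own documentation before reusing this string.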
You can pass any valid JSON object that the SentenceTransformer encode function accepts.
Recommended Models
Balanced Performance: paraphrase-multilingual-MiniLM-L12-v2
Languages: 50+ including Arabic, Chinese, Dutch, English, French, German, Italian, Korean, Polish, Portuguese, Russian, Spanish, Turkish
Pros:
- Good balance of speed and quality
- Works well on consumer hardware
- Wide language support
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
High Quality: mixedbread-ai/mxbai-embed-large-v1
Languages: Multilingual support with state-of-the-art performance
Pros:
- Excellent search quality
- Modern architecture
- Supports query prefixes
mixedbread-ai/mxbai-embed-large-v1
Reranker: mxbai-rerank-xsmall-v1
Purpose: Rerank search results for improved relevance
Pros:
- Small and fast
- Improves final result quality
- Works across languages
mixedbread-ai/mxbai-rerank-xsmall-v1
Testing Your Configuration
After setting up multilingual support:
- Index some documents in your target language
- Try searching for content in that language
- Start a chat and ask questions about your documents
- Verify results are relevant and accurate
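If you prefer to script the search check, the sketch below builds a search request URL for a query in your target language. The /api/search path, the q and n parameter names, and the http://localhost:42110 address are assumptions about a default self-hosted setup; confirm them against your server's API documentation. A real request may also need authentication, which this sketch leaves out.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, n: int = 5) -> str:
    """Build a search URL; the path and parameter names are assumed, not verified."""
    # urlencode percent-encodes non-ASCII characters as UTF-8.
    params = urlencode({"q": query, "n": n})
    return f"{base_url.rstrip('/')}/api/search?{params}"


# Hypothetical local server address and a French test query.
url = build_search_url("http://localhost:42110", "recette de pain", n=3)
print(url)
```

Open the printed URL in a browser where you are signed in to your server, or pass it to an HTTP client of your choice, then compare the returned entries with the documents you indexed.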
