Multilingual Support

Overview

Khoj uses an embedding model to understand documents. Multilingual embedding models improve search quality for non-English documents, which affects both the search and chat-with-docs experiences across Khoj.

Setup

To improve search and chat quality for non-English documents, you can use a multilingual model.

For example, paraphrase-multilingual-MiniLM-L12-v2 supports 50+ languages and offers decent search quality and speed on a consumer machine.

Configure Search Model

Open the search config on your server’s admin settings page. Either create a new search model, if none exists, or update the existing one. For example:

Set the bi_encoder field to sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
Set the cross_encoder field to mixedbread-ai/mxbai-rerank-xsmall-v1
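
To make the roles of the two fields concrete, here is a minimal sketch of two-stage retrieval: the bi-encoder embeds the query and documents for a first pass, and the cross-encoder rescores the top candidates. This is an illustration under assumptions, not Khoj's actual implementation. The helper names (cosine, two_stage_search, load_models) are hypothetical; only SentenceTransformer.encode and CrossEncoder.predict come from the sentence-transformers library.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_search(query, docs, bi_encoder, cross_encoder, top_k=3):
    """Hypothetical helper: retrieve with the bi-encoder, then rerank with the cross-encoder."""
    # Stage 1: embed the query and documents, keep the top_k most similar documents.
    query_vec = bi_encoder.encode([query])[0]
    doc_vecs = bi_encoder.encode(docs)
    order = sorted(range(len(docs)), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)
    candidates = [docs[i] for i in order[:top_k]]

    # Stage 2: score each (query, document) pair jointly and sort by that score.
    scores = cross_encoder.predict([(query, doc) for doc in candidates])
    reranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in reranked]


def load_models():
    """Load the models named above. Requires sentence-transformers and a model download."""
    from sentence_transformers import CrossEncoder, SentenceTransformer

    bi_encoder = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
    cross_encoder = CrossEncoder("mixedbread-ai/mxbai-rerank-xsmall-v1")
    return bi_encoder, cross_encoder
```

Calling two_stage_search(query, docs, *load_models()) runs the same flow with the real models. The bi-encoder is cheap enough to run over every document, while the cross-encoder is slower and is applied only to the shortlist.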

Regenerate Content Index

This step is very important, as you’ll need to re-encode all your content with the new model.

Regenerate your content index from all the relevant clients.
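
The reason for this step is that vectors produced by different embedding models are not comparable, so entries encoded with the old model cannot be searched with queries encoded by the new one. The sketch below illustrates the idea with a hypothetical record layout (the "text", "model", and "vector" keys are assumptions for illustration, not Khoj's storage format).

```python
def stale_entries(entries, current_model):
    """Return the entries whose vectors were produced by a model other than current_model."""
    return [entry for entry in entries if entry["model"] != current_model]


# Hypothetical index contents after switching the bi_encoder field.
NEW_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
index = [
    {"text": "Bonjour le monde", "model": "previous-model", "vector": [0.1, 0.2]},
    {"text": "Hallo Welt", "model": NEW_MODEL, "vector": [0.3, 0.4]},
]

# Everything returned here would need to be re-encoded before search works as expected.
to_reencode = stale_entries(index, NEW_MODEL)
```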

Advanced Configuration

Query Prefix for Modern Models

Modern search/embedding models like mixedbread-ai/mxbai-embed-large-v1 expect a prefix to the query (or docs) string to improve encoding.

Update the bi_encoder_query_encode_config field of your embedding model with {"prompt": "<prefix-prompt>"} to improve the search quality of these models.

{
  "prompt": "Represent this query for searching documents"
}

You can pass any valid JSON object whose keys are keyword arguments accepted by the SentenceTransformer encode function.
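
As a rough sketch of what this config does, the snippet below parses the JSON into keyword arguments and shows the text the model ends up encoding. It assumes the config is forwarded to encode as keyword arguments; the helper names are hypothetical. In sentence-transformers, the prompt argument of encode is prepended to the input text, which text_seen_by_model mirrors without loading a model.

```python
import json


def encode_kwargs_from_config(config_json):
    """Parse the JSON stored in bi_encoder_query_encode_config into keyword arguments."""
    kwargs = json.loads(config_json) if config_json else {}
    if not isinstance(kwargs, dict):
        raise ValueError("encode config must be a JSON object")
    return kwargs


def text_seen_by_model(query, encode_kwargs):
    """Mirror the prompt behavior of encode: the query is appended to the prompt."""
    return encode_kwargs.get("prompt", "") + query


def encode_query(model, query, encode_kwargs):
    """Encode one query with a loaded SentenceTransformer, forwarding the config as kwargs."""
    return model.encode([query], **encode_kwargs)[0]


config = '{"prompt": "Represent this query for searching documents: "}'
kwargs = encode_kwargs_from_config(config)
preview = text_seen_by_model("wo ist der bahnhof", kwargs)
```

Because the query is appended to the prompt as-is, any separator such as a trailing colon and space has to be part of the prompt string itself. Check the model card for the exact prefix a given model expects.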

Recommended Models

Balanced Performance: paraphrase-multilingual-MiniLM-L12-v2

Languages: 50+ including Arabic, Chinese, Dutch, English, French, German, Italian, Korean, Polish, Portuguese, Russian, Spanish, Turkish

Pros:

Good balance of speed and quality
Works well on consumer hardware
Wide language support

Model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

High Quality: mixedbread-ai/mxbai-embed-large-v1

Languages: Multilingual support with state-of-the-art performance

Pros:

Excellent search quality
Modern architecture
Supports query prefixes

Note: Requires more compute resources

Model: mixedbread-ai/mxbai-embed-large-v1

Reranker: mxbai-rerank-xsmall-v1

Purpose: Rerank search results for improved relevance

Pros:

Small and fast
Improves final result quality
Works across languages

Model: mixedbread-ai/mxbai-rerank-xsmall-v1

Testing Your Configuration

After setting up multilingual support:

Index some documents in your target language
Try searching for content in that language
Start a chat and ask questions about your documents
Verify results are relevant and accurate
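
The checks above can be made repeatable with a small smoke test. The sketch below is one possible approach, not part of Khoj: hit_rate is a hypothetical helper, and search stands for any callable you supply that takes a query string and returns a list of result texts from your Khoj instance.

```python
def hit_rate(search, cases, top_k=5):
    """Fraction of (query, expected_snippet) cases found in the top_k search results."""
    if not cases:
        return 0.0
    hits = 0
    for query, expected in cases:
        results = search(query)[:top_k]
        # Count the case as a hit if any returned text contains the expected snippet.
        if any(expected.lower() in text.lower() for text in results):
            hits += 1
    return hits / len(cases)
```

Running the same set of cases before and after switching models gives a simple way to compare configurations in your target language.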

If search quality is not satisfactory, try a different embedding model or adjust the query prefix configuration for modern models.
