Supported File Formats
Khoj can process and index the following file types:
PDF Documents
Portable Document Format files (.pdf)
Word Documents
Microsoft Word files (.docx)
Markdown
Markdown files (.md)
Org Mode
Emacs Org mode files (.org)
Plain Text
Text files (.txt) and code files
HTML/XML
Web pages (.html, .htm, .xml)
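As a quick pre-upload check, the extensions listed above can be collected into a set and compared against a file's suffix. This is only a minimal sketch: the helper name `is_listed_format` is hypothetical and not part of Khoj, and because "code files" are not enumerated above, their extensions are left out rather than guessed.

```python
from pathlib import Path

# Extensions copied from the list above. "Code files" are not enumerated in
# this document, so they are intentionally not included here.
LISTED_EXTENSIONS = {".pdf", ".docx", ".md", ".org", ".txt", ".html", ".htm", ".xml"}


def is_listed_format(path: str) -> bool:
    """Return True if the file's extension appears in the list above.

    Hypothetical helper for illustration; Khoj's own format detection
    may work differently.
    """
    return Path(path).suffix.lower() in LISTED_EXTENSIONS
```

The comparison is case-insensitive, so `notes.MD` and `notes.md` are treated the same way.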
Uploading Files
There are several ways to share your documents with Khoj:
Web Interface
The easiest way to get started is by uploading files directly through the web UI.
Navigate to the Search Page
Go to app.khoj.dev/search
Drag and Drop or Select Files
You can either drag files directly into the upload area or click to browse your files
The web interface is perfect for one-off documents you need to interact with quickly.
Desktop App
For continuous syncing of local documents, the desktop app provides the most seamless experience.
Desktop App
Set up automatic document syncing with the Khoj desktop application
The desktop app is ideal if you have many documents on your computer or need them to stay in sync automatically.
Editor Integrations
If you use Obsidian or Emacs for note-taking, you can configure automatic syncing:
Obsidian Plugin
Sync your Obsidian vault with Khoj
Emacs Integration
Integrate Khoj with your Emacs workflow
How Files Are Processed
When you upload files to Khoj, they go through several processing steps:
- Text Extraction: Content is extracted from the file format (e.g., text from PDFs, content from Word documents)
- Chunking: Large documents are split into smaller, manageable chunks (typically ~256 tokens) for better search results
- Embedding: Each chunk is converted into a vector embedding using AI models
- Indexing: The embeddings are stored in the search index, making your content searchable
Khoj automatically handles documents with special characters and multiple languages. The chunking process preserves context by maintaining headings and document structure.
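The chunking step can be pictured with a small sketch. It assumes whitespace-separated words stand in for tokens and that each chunk keeps the heading of the section it came from. Khoj's actual tokenizer and chunking logic are not described here, so the function name `chunk_section` and its parameters are illustrative only.

```python
def chunk_section(heading: str, body: str, max_tokens: int = 256) -> list[str]:
    """Split one section into chunks of at most `max_tokens` words.

    Assumption: a "token" is a whitespace-separated word. Real tokenizers
    count differently, so chunk boundaries in Khoj will not match exactly.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")

    words = body.split()
    chunks = []
    for start in range(0, len(words), max_tokens):
        piece = " ".join(words[start:start + max_tokens])
        # Repeat the heading on every chunk so each one keeps its context.
        chunks.append(f"{heading}\n{piece}" if heading else piece)
    return chunks
```

Under these assumptions, a 600-word section with the default limit yields three chunks of 256, 256, and 88 words, each prefixed with the section heading.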
File Limits and Performance
Best Practices
- Organize your documents: Use clear filenames and folder structures for easier reference
- Keep files updated: Re-upload or sync files when you make significant changes
- Use appropriate formats: PDFs and Word documents work well for formatted content, while Markdown and plain text are great for notes
- Avoid duplicate uploads: The desktop app and editor integrations prevent duplicates automatically
Searching Your Files
Once indexed, you can:
- Search: Find specific information across all your documents
- Chat: Ask questions about your documents and get contextual answers
- Get references: Khoj provides file paths and line numbers for source attribution
Search Documentation
Learn more about searching your indexed content
Privacy and Security
Your documents are processed and stored securely. When using Khoj Cloud, your data is encrypted in transit and at rest. For maximum privacy, you can also self-host Khoj.
Self-Hosting
Host Khoj on your own infrastructure for complete data control
Troubleshooting
Files not showing up in search
- Wait a few minutes for processing to complete
- Check that the file format is supported
- Verify the file isn’t empty or corrupted
- Try re-uploading the file
Slow processing for large files
- Large PDFs with many pages may take several minutes to process
- Consider splitting very large documents into smaller files
- The first sync with many files will take longer; subsequent syncs are faster
Character encoding issues
- Khoj handles UTF-8 encoded files by default
- If you see garbled text, ensure your files are saved with UTF-8 encoding
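If a file shows garbled text, one way to check and fix its encoding before uploading is sketched below. The helper names are hypothetical, and the source encoding (`latin-1` by default here) is an assumption that must match how the file was actually saved.

```python
from pathlib import Path


def is_utf8(path: str) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def reencode_to_utf8(src: str, dst: str, source_encoding: str = "latin-1") -> None:
    """Read `src` using `source_encoding` and write it to `dst` as UTF-8.

    The default source encoding is only a guess; pass the encoding the
    file was really saved with, or the output will still be garbled.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
```

Writing to a separate destination path keeps the original file untouched in case the guessed source encoding turns out to be wrong.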
