Files

Khoj supports indexing a wide variety of file formats, making it easy to search and chat with your documents. You can upload files directly through the web interface, use the desktop app for automatic syncing, or integrate with your favorite note-taking tools.

Supported File Formats

Khoj can process and index the following file types:

PDF Documents

Portable Document Format files (.pdf)

Word Documents

Microsoft Word files (.docx)

Markdown

Markdown files (.md)

Org Mode

Emacs Org mode files (.org)

Plain Text

Text files (.txt) and code files

HTML/XML

Web pages (.html, .htm, .xml)
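As a rough pre-upload check, the extensions listed above can be collected into a set and compared against a file's suffix. This is a minimal sketch, not part of Khoj: the names `LISTED_EXTENSIONS` and `is_listed_format` are hypothetical, and because this page does not enumerate which code file extensions are accepted, those are left out.

```python
from pathlib import Path

# Extensions named on this page. "Code files" are not enumerated in the docs,
# so they are deliberately omitted from this illustrative set.
LISTED_EXTENSIONS = {".pdf", ".docx", ".md", ".org", ".txt", ".html", ".htm", ".xml"}


def is_listed_format(path: str) -> bool:
    """Return True if the file's extension appears in the list above (case-insensitive)."""
    return Path(path).suffix.lower() in LISTED_EXTENSIONS
```

A file that fails this check may still be accepted if it is a code file, so treat a negative result as a prompt to double-check rather than a definitive answer.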

Uploading Files

There are several ways to share your documents with Khoj:

Web Interface

The easiest way to get started is by uploading files directly through the web UI.

Navigate to the Search Page: Go to app.khoj.dev/search
Click “Add Documents”: Look for the “Add Documents” button in the interface
Drag and Drop or Select Files: You can either drag files directly into the upload area or click to browse your files
Wait for Processing: Khoj will automatically process and index your documents, making them searchable

The web interface is perfect for one-off documents you need to interact with quickly.

Desktop App

For continuous syncing of local documents, the desktop app provides the most seamless experience.

Desktop App

Set up automatic document syncing with the Khoj desktop application

The desktop app is ideal if you have many documents on your computer or need them to stay in sync automatically.

Editor Integrations

If you use Obsidian or Emacs for note-taking, you can configure automatic syncing:

Obsidian Plugin

Sync your Obsidian vault with Khoj

Emacs Integration

Integrate Khoj with your Emacs workflow

How Files Are Processed

When you upload files to Khoj, they go through several processing steps:

Text Extraction: Content is extracted from the file format (e.g., text from PDFs, content from Word documents)
Chunking: Large documents are split into smaller chunks (about 256 tokens each) for better search results
Embedding: Each chunk is converted into a vector embedding using AI models
Indexing: The embeddings are stored in the search index, making your content searchable

Khoj automatically handles documents with special characters and multiple languages. The chunking process preserves context by maintaining headings and document structure.
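The chunking step can be illustrated with a short sketch. This is not Khoj's implementation: the function name `chunk_text` is hypothetical, whitespace-separated words stand in for model tokens, and the heading is simply repeated at the top of each chunk to show one way context could be preserved. Embedding and indexing are left out because they depend on the model and search backend in use.

```python
from typing import List

MAX_TOKENS = 256  # chunk size mentioned above; words approximate tokens here


def chunk_text(heading: str, body: str, max_tokens: int = MAX_TOKENS) -> List[str]:
    """Split body into chunks of at most max_tokens words, each prefixed by its heading."""
    words = body.split()
    chunks = []
    for start in range(0, len(words), max_tokens):
        piece = " ".join(words[start:start + max_tokens])
        # Repeat the heading so every chunk carries its section context.
        chunks.append(f"{heading}\n{piece}" if heading else piece)
    return chunks
```

For example, a 600-word section would yield three chunks under these assumptions: two of 256 words and one of 88, each starting with the section heading.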

File Limits and Performance

Words longer than 500 characters are automatically removed during processing to maintain quality. This typically affects only corrupted or malformed files.
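A filter of this kind might look like the sketch below. It assumes words are separated by whitespace, and the name `drop_long_words` is hypothetical rather than taken from Khoj's code.

```python
MAX_WORD_LENGTH = 500  # limit stated above


def drop_long_words(text: str, max_length: int = MAX_WORD_LENGTH) -> str:
    """Remove whitespace-separated words longer than max_length characters.

    Note: runs of whitespace are collapsed to single spaces as a side effect.
    """
    return " ".join(word for word in text.split() if len(word) <= max_length)
```

Ordinary prose passes through unchanged apart from whitespace normalization; only unbroken runs such as embedded binary data or very long encoded strings are removed.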

Best Practices

Organize your documents: Use clear filenames and folder structures for easier reference
Keep files updated: Re-upload or sync files when you make significant changes
Use appropriate formats: PDFs and Word documents work well for formatted content, while Markdown and plain text are great for notes
Avoid duplicate uploads: The desktop app and editor integrations handle this automatically

Searching Your Files

Once indexed, you can:

Search: Find specific information across all your documents
Chat: Ask questions about your documents and get contextual answers
Get references: Khoj provides file paths and line numbers for source attribution

Search Documentation

Learn more about searching your indexed content

Privacy and Security

Your documents are processed and stored securely. When using Khoj Cloud, your data is encrypted in transit and at rest. For maximum privacy, you can also self-host Khoj.

Self-Hosting

Host Khoj on your own infrastructure for complete data control

Troubleshooting

Files not showing up in search

Wait a few minutes for processing to complete
Check that the file format is supported
Verify the file isn’t empty or corrupted
Try re-uploading the file

Slow processing for large files

Large PDFs with many pages may take several minutes to process
Consider splitting very large documents into smaller files
The first sync with many files will take longer; subsequent syncs are faster

Character encoding issues

Khoj handles UTF-8 encoded files by default
If you see garbled text, ensure your files are saved with UTF-8 encoding
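If you are unsure how a file is encoded, a small script can check whether it decodes as UTF-8 and re-save it if not. This is a minimal sketch using only the Python standard library; the helper names are hypothetical, and you need to know (or guess) the file's original encoding, such as `latin-1`, to convert it correctly.

```python
from pathlib import Path


def is_utf8(path: str) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def resave_as_utf8(path: str, source_encoding: str) -> None:
    """Rewrite the file in place as UTF-8, decoding it with source_encoding."""
    text = Path(path).read_bytes().decode(source_encoding)
    Path(path).write_bytes(text.encode("utf-8"))
```

Because `resave_as_utf8` overwrites the file, keep a backup copy before running it, and re-upload or re-sync the file afterward.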
