Privacy is a cornerstone principle of Khoj. Whether you use Khoj Cloud or self-host, we’re committed to keeping your personal data secure and under your control.

Privacy by Design

Khoj is built with privacy as a core feature, not an afterthought:

Always Self-Hostable

Run Khoj entirely on your own hardware with zero data leaving your network.

Open Source

All code is public and auditable on GitHub.

No Data Selling

We will never sell your data or use it to train models.

Minimal Collection

We collect only what’s necessary to provide the service.

Self-Hosted Privacy

Self-hosting gives you complete control and maximum privacy:

Complete Data Ownership

1. Your Hardware, Your Data

When you self-host Khoj, all your data stays on your machine:
  • Documents are stored in your PostgreSQL database
  • Embeddings generated locally never leave your device
  • Chat history remains on your server
  • No external servers see your queries or files
2. Offline Operation

Khoj can run completely offline:
docker-compose.yml
environment:
  # Use local models only
  - OPENAI_BASE_URL=http://localhost:11434/v1/
  - KHOJ_DEFAULT_CHAT_MODEL=qwen3
With models served locally through a runner such as Ollama, Khoj needs no internet connection once the models are downloaded.
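Before relying on offline operation, it can help to confirm that the configured endpoint really is local. The following is a minimal sketch, not part of Khoj: the helper name `is_local_endpoint` is hypothetical, and it only inspects the URL string (it does not resolve DNS names or contact the server).

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(base_url: str) -> bool:
    """Return True if base_url points at localhost, loopback, or a private IP."""
    host = urlparse(base_url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Any other DNS name is treated as non-local; this sketch does not resolve it.
        return False
    return ip.is_loopback or ip.is_private


if __name__ == "__main__":
    # The value shown in the docker-compose.yml example above.
    print(is_local_endpoint("http://localhost:11434/v1/"))
```

A check like this could run in a deployment script so that a hosted API URL is caught before any query is sent.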
3. No Telemetry Required

Disable anonymous usage telemetry completely:
docker-compose.yml
environment:
  - KHOJ_TELEMETRY_DISABLE=True
Telemetry helps us prioritize features, but it’s completely optional and anonymous.

What Gets Sent Where?

Even when self-hosting with online AI models, here’s what happens:
Data Type | Stored Locally | Sent to AI Provider | Sent to Khoj
--- | --- | --- | ---
Your documents | ✅ | |
Embeddings | ✅ | |
Chat messages | ✅ | ✅ (Only relevant context) |
Usage stats | | | ⚠️ (Anonymous, if enabled)
When using commercial AI models (OpenAI, Anthropic, Google), relevant portions of your indexed data may be sent as context with your queries. This is necessary for the AI to answer questions about your documents.
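To make "relevant portions" concrete, here is a toy illustration of context selection. It is not Khoj's retrieval code: the function names, the two-dimensional vectors, and the `top_k` parameter are all assumptions chosen for the example. The point it shows is that only the highest-ranked chunks are placed in the outgoing request, while the rest of the index stays local.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_context(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    top_k: int = 2,
) -> List[str]:
    """Return the text of the top_k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_request(
    query: str,
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    top_k: int = 2,
) -> Dict[str, object]:
    """Assemble the payload that would leave the machine: query plus selected context."""
    return {"query": query, "context": select_context(query_vec, chunks, top_k)}
```

In this sketch, lowering `top_k` reduces how much document text accompanies each query, at the cost of giving the model less to work with.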

Secure Remote Access

Access your self-hosted Khoj securely:

Khoj Cloud Privacy

Not everyone can self-host, so we’ve built Khoj Cloud with strong privacy protections:

Data Handling

Your Documents & Embeddings
  • Stored in encrypted PostgreSQL database on AWS
  • Sharded by unique user ID (isolated per user)
  • Embeddings generated by open-source models on our private Hugging Face endpoints
  • Raw text stored to improve syncing and provide chat context
Your Account Information
With Google SSO:
  • Name
  • Email address
  • Profile photo URL
We do NOT access:
  • Your Gmail
  • Google Drive
  • Any other Google services
Chat History
  • Stored encrypted in our database
  • Used only to provide conversation continuity
  • Never used for model training
  • IP addresses (anonymized in logs)
  • Detailed usage patterns (only aggregate metrics)
  • Your API keys for third-party services
  • Billing information (handled by Stripe)
When you use Khoj Cloud, your queries may be sent to:
Service | When | What’s Sent | Purpose
--- | --- | --- | ---
OpenAI | Using GPT models | Query + relevant document context | Generate AI responses
Anthropic | Using Claude models | Query + relevant document context | Generate AI responses
Google | Using Gemini models | Query + relevant document context | Generate AI responses
Hugging Face | Always | Document text only | Generate embeddings (on private endpoint)
Serper/Exa | Using /online | Search query only | Web search results
You control which AI models you use via Settings. Choose based on your privacy preferences.

Infrastructure Security

1. Encryption at Rest

All data in our PostgreSQL database is encrypted using AWS RDS encryption:
  • AES-256 encryption
  • Automatic encrypted backups
  • Encrypted storage volumes
2. Encryption in Transit

All communication uses TLS 1.3:
  • HTTPS for web traffic
  • Encrypted API calls
  • Secure WebSocket connections
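A client can check the negotiated protocol for itself. The sketch below uses Python's standard `ssl` module to build a context that refuses anything older than TLS 1.3; the function names are hypothetical, and the host to test is left as a parameter because the right hostname depends on your deployment.

```python
import socket
import ssl
from typing import Optional


def tls13_only_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.3."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Connect to host and return the negotiated protocol name reported by the socket."""
    context = tls13_only_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

If the server cannot negotiate TLS 1.3, `wrap_socket` raises `ssl.SSLError` instead of silently falling back to an older version.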
3. Isolated Embeddings

Document embeddings are generated on stateless Hugging Face endpoints:
  • No persistent memory
  • Private dedicated endpoints
  • Hosted on AWS within our infrastructure
4. Access Controls

  • Multi-factor authentication available
  • Role-based access (for team plans)
  • Session management with secure cookies

Privacy Controls

You control your data:

Delete Your Data

Go to Settings → Danger Zone → Delete Account. Immediately removes all your data from our systems.

Export Your Data

Go to Settings → Data Export. Download all your documents and chat history.

Revoke Integrations

Disconnect Notion, GitHub, or other integrations anytime. Stops data syncing immediately.

Choose AI Models

Select which AI providers you’re comfortable with. Different providers have different privacy policies.

Telemetry & Analytics

Khoj collects minimal, anonymous usage data to improve the product:

What We Collect

{
  "event": "chat_message_sent",
  "timestamp": "2024-03-05T10:30:00Z",
  "khoj_version": "1.0.0",
  "server_id": "anonymous-hash-xyz123"
}

What We DON’T Collect

  • ❌ IP addresses
  • ❌ Query contents
  • ❌ Document contents or names
  • ❌ Chat message contents
  • ❌ File paths or folder structure
  • ❌ Any personally identifiable information
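If you capture outgoing telemetry (for example through a logging proxy), a small allowlist check can flag anything beyond the fields shown in the sample event. This is a sketch under one loud assumption: `ALLOWED_FIELDS` is taken only from the example payload above, and real events may legitimately carry other fields. The helper name is hypothetical.

```python
from __future__ import annotations

import json

# Assumption: these are the only expected keys, copied from the sample event above.
ALLOWED_FIELDS = {"event", "timestamp", "khoj_version", "server_id"}


def unexpected_fields(payload: str) -> set[str]:
    """Return the top-level keys in a JSON event that are not on the allowlist."""
    event = json.loads(payload)
    return set(event) - ALLOWED_FIELDS


if __name__ == "__main__":
    sample = json.dumps(
        {
            "event": "chat_message_sent",
            "timestamp": "2024-03-05T10:30:00Z",
            "khoj_version": "1.0.0",
            "server_id": "anonymous-hash-xyz123",
        }
    )
    print(unexpected_fields(sample))
```

An empty result means the event contains nothing outside the allowlist; any returned key is worth inspecting by hand.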

How Telemetry is Used

  1. Feature Prioritization: Understanding which features are most used
  2. Performance Monitoring: Detecting performance issues across versions
  3. Error Tracking: Identifying bugs to fix
All telemetry is sent to PostHog, an open-source analytics platform. View our telemetry code:

Disable Telemetry

Add to your environment configuration:
docker-compose.yml
environment:
  - KHOJ_TELEMETRY_DISABLE=True
Or for pip installation:
export KHOJ_TELEMETRY_DISABLE=True
khoj
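A deployment script can verify that the variable is set before starting the server. The sketch below is an illustration only: the helper name is hypothetical, and the case-insensitive comparison against "true" is this example's own rule, which may not match how Khoj parses the value internally.

```python
import os
from typing import Mapping, Optional


def telemetry_disabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if KHOJ_TELEMETRY_DISABLE is set to 'true' (case-insensitive)."""
    env = os.environ if env is None else env
    return env.get("KHOJ_TELEMETRY_DISABLE", "").strip().lower() == "true"


if __name__ == "__main__":
    if not telemetry_disabled():
        raise SystemExit("KHOJ_TELEMETRY_DISABLE is not set to True; refusing to start.")
    print("Telemetry is disabled for this environment.")
```

Passing a mapping instead of reading `os.environ` keeps the check easy to test.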

AI Model Privacy Comparison

Different AI providers have different privacy policies:
Provider | Data Retention | Training on Your Data | Privacy Policy
--- | --- | --- | ---
OpenAI | 30 days | ❌ (API zero retention) | Link
Anthropic | No retention | ❌ (Explicit opt-out) | Link
Google Gemini | No retention | ❌ (Enterprise API) | Link
Local (Ollama) | Never leaves device | N/A |
For maximum privacy, use local models with Ollama or LM Studio.

Security Best Practices

Follow these guidelines when using Khoj:

For Self-Hosting

1. Strong Admin Credentials

Set secure passwords:
docker-compose.yml
environment:
  - [email protected]
  - KHOJ_ADMIN_PASSWORD=use-a-long-random-password-here
  - KHOJ_DJANGO_SECRET_KEY=generate-a-unique-secret-key
Use a password manager to generate strong passwords.
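If a password manager is not at hand, Python's standard `secrets` module can produce suitable values. This is a minimal sketch: the function name and the byte lengths are choices made for the example, not Khoj requirements. The variable names are the ones shown in the configuration above.

```python
import secrets
from typing import Dict


def generate_credentials(password_bytes: int = 32, key_bytes: int = 50) -> Dict[str, str]:
    """Generate random, URL-safe values for the admin password and Django secret key."""
    return {
        "KHOJ_ADMIN_PASSWORD": secrets.token_urlsafe(password_bytes),
        "KHOJ_DJANGO_SECRET_KEY": secrets.token_urlsafe(key_bytes),
    }


if __name__ == "__main__":
    # Print lines in the same form as the docker-compose.yml environment list.
    for name, value in generate_credentials().items():
        print(f"  - {name}={value}")
```

`token_urlsafe` emits only letters, digits, `-`, and `_`, so the values need no quoting in the environment list. Treat the printed output as a secret and avoid leaving it in shell history or logs.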
2. Regular Updates

Keep Khoj updated:
# Docker
docker-compose pull && docker-compose up

# Pip
pip install --upgrade khoj
3. Network Security

  • Use Tailscale or VPN for remote access
  • Don’t expose ports directly to the internet
  • Use HTTPS with valid certificates
  • Keep firewall rules restrictive
4. Backup Your Data

Regularly backup your PostgreSQL database:
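# Logical dump of the database from the running container
# (-T disables TTY allocation so the redirected output stays clean)
docker-compose exec -T database pg_dump -U postgres postgres > khoj_backup.sql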
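For regular backups, a timestamped filename avoids overwriting the previous dump. The sketch below wraps the same `pg_dump` invocation; the function names and the filename pattern are assumptions for illustration, and the service, user, and database names are the ones used in the command above.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def backup_command(service: str = "database", user: str = "postgres", db: str = "postgres") -> List[str]:
    """Build the docker-compose command that dumps the database to stdout."""
    return ["docker-compose", "exec", "-T", service, "pg_dump", "-U", user, db]


def backup_filename(now: Optional[datetime] = None) -> str:
    """Return a UTC-timestamped filename such as khoj_backup_20240305T103000Z.sql."""
    now = now or datetime.now(timezone.utc)
    return f"khoj_backup_{now:%Y%m%dT%H%M%SZ}.sql"


def run_backup(directory: str = ".") -> Path:
    """Run pg_dump and write its output to a new timestamped file in directory."""
    target = Path(directory) / backup_filename()
    with target.open("wb") as out:
        # check=True raises CalledProcessError if the dump fails.
        subprocess.run(backup_command(), stdout=out, check=True)
    return target


if __name__ == "__main__":
    print(f"Wrote {run_backup()}")
```

If the dump fails partway, a partial file is left behind, so a scheduled job should delete or flag the file when `run_backup` raises.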

For Cloud Users

1. Enable MFA

Add two-factor authentication to your account (coming soon).
2. Review Integrations

Periodically check Settings → Integrations and remove unused connections.
3. Monitor Activity

Check your recent conversations and uploaded documents regularly.

Compliance & Certifications

Khoj is working toward SOC 2 Type II compliance. Contact [email protected] for enterprise security requirements.

Current Status

Standard | Status | Notes
--- | --- | ---
GDPR | ✅ Compliant | Right to deletion, export, and access
SOC 2 | 🔄 In Progress | Expected Q3 2026
HIPAA | ❌ Not compliant | Not recommended for healthcare PHI

Privacy Policy & Terms

Read our full legal documents:

Data Breaches & Incident Response

In the unlikely event of a security incident:
  1. We’ll notify affected users within 72 hours
  2. Provide details on what data was affected
  3. Outline remediation steps taken
  4. Publish a post-mortem on our blog

Your Data is Yours

Core Principle

We exist to serve you, not to monetize your data.
Khoj is a sustainable, open-source alternative to closed-source corporate AI. We make money through subscriptions and enterprise licenses, not by selling user data.

FAQ

On Khoj Cloud: Employees can only access your data for support purposes with your explicit permission or in response to legal requirements.
Self-Hosted: Only you have access to your data.
No. We never use your documents, chats, or queries to train any AI models. This applies to both cloud and self-hosted deployments.
All your data is permanently deleted within 30 days:
  • Documents and embeddings
  • Chat history
  • Account information
  • Integration credentials
Backups are purged after 30 days.
We comply with valid legal requests. For cloud users, we’ll notify you unless legally prohibited. Self-hosted users control their own data and legal obligations.
Not currently for cloud. E2E encryption would prevent server-side search and embeddings generation. Self-hosting provides equivalent privacy by keeping all data local.

Contact & Concerns

Privacy questions or concerns?
Interested in helping us build more privacy features? Join our Discord or contribute on GitHub.
