Privacy by Design
Khoj is built with privacy as a core feature, not an afterthought:
Always Self-Hostable
Run Khoj entirely on your own hardware with zero data leaving your network.
Open Source
All code is public and auditable on GitHub.
No Data Selling
We will never sell your data or use it to train models.
Minimal Collection
We collect only what’s necessary to provide the service.
Self-Hosted Privacy
Self-hosting gives you complete control and maximum privacy:
Complete Data Ownership
Your Hardware, Your Data
When you self-host Khoj, all your data stays on your machine:
- Documents are stored in your PostgreSQL database
- Embeddings are generated locally and never leave your device
- Chat history remains on your server
- No external servers see your queries or files
Offline Operation
Khoj can run completely offline:
With local models served by Ollama, you never need an internet connection.
docker-compose.yml
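A minimal sketch of what an offline setup could look like, assuming Ollama is already running on the Docker host and serving its OpenAI-compatible API on the default port 11434. The service name `server`, the variable names `OPENAI_BASE_URL` and `KHOJ_DEFAULT_CHAT_MODEL`, and the model name are assumptions, not confirmed settings; check them against the `docker-compose.yml` shipped with your Khoj version.

```yaml
services:
  server:
    environment:
      # Assumed variable: route chat requests to the local Ollama server
      # instead of a hosted AI provider.
      - OPENAI_BASE_URL=http://host.docker.internal:11434/v1/
      # Assumed variable: a model that was already pulled with `ollama pull`.
      - KHOJ_DEFAULT_CHAT_MODEL=llama3.1
    extra_hosts:
      # Lets the container reach the host on Linux; Docker Desktop
      # resolves host.docker.internal without this entry.
      - "host.docker.internal:host-gateway"
```

With a setup along these lines, leaving the hosted-provider API keys unset means there is no configured external provider for chat requests to reach.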
What Gets Sent Where?
Even when self-hosting with online AI models, here’s what happens:
| Data Type | Stored Locally | Sent to AI Provider | Sent to Khoj |
|---|---|---|---|
| Your documents | ✅ | ❌ | ❌ |
| Embeddings | ✅ | ❌ | ❌ |
| Chat messages | ✅ | ✅ (Only relevant context) | ❌ |
| Usage stats | ✅ | ❌ | ⚠️ (Anonymous, if enabled) |
Secure Remote Access
Access your self-hosted Khoj securely:
- Tailscale (Recommended)
- Reverse Proxy
- Local Network Only
Use Tailscale for encrypted, private network access:
- End-to-end encrypted via WireGuard
- No ports exposed to the internet
- Access from any device on your Tailscale network
Khoj Cloud Privacy
Not everyone can self-host, so we’ve built Khoj Cloud with strong privacy protections:
Data Handling
What We Store
Your Documents & Embeddings
- Stored in encrypted PostgreSQL database on AWS
- Sharded by unique user ID (isolated per user)
- Embeddings generated by open-source models on our private Hugging Face endpoints
- Raw text stored to improve syncing and provide chat context
Account Information
- Name
- Email address
- Profile photo URL
We do not have access to:
- Your Gmail
- Google Drive
- Any other Google services
Chat History
- Stored encrypted in our database
- Used only to provide conversation continuity
- Never used for model training
What We Don't Store
- IP addresses (anonymized in logs)
- Detailed usage patterns (only aggregate metrics)
- Your API keys for third-party services
- Billing information (handled by Stripe)
Third-Party Services
When you use Khoj Cloud, your queries may be sent to:
| Service | When | What’s Sent | Purpose |
|---|---|---|---|
| OpenAI | Using GPT models | Query + relevant document context | Generate AI responses |
| Anthropic | Using Claude models | Query + relevant document context | Generate AI responses |
| Google | Using Gemini models | Query + relevant document context | Generate AI responses |
| Hugging Face | Always | Document text only | Generate embeddings (on private endpoint) |
| Serper/Exa | Using /online | Search query only | Web search results |
You control which AI models you use via Settings. Choose based on your privacy preferences.
Infrastructure Security
Encryption at Rest
All data in our PostgreSQL database is encrypted using AWS RDS encryption:
- AES-256 encryption
- Automatic encrypted backups
- Encrypted storage volumes
Encryption in Transit
All communication uses TLS 1.3:
- HTTPS for web traffic
- Encrypted API calls
- Secure WebSocket connections
Isolated Embeddings
Document embeddings are generated on stateless Hugging Face endpoints:
- No persistent memory
- Private dedicated endpoints
- Hosted on AWS within our infrastructure
Privacy Controls
You control your data:
Delete Your Data
Go to Settings → Danger Zone → Delete Account
Immediately removes all your data from our systems.
Export Your Data
Go to Settings → Data Export
Download all your documents and chat history.
Revoke Integrations
Disconnect Notion, GitHub, or other integrations anytime.
Stops data syncing immediately.
Choose AI Models
Select which AI providers you’re comfortable with.
Different providers have different privacy policies.
Telemetry & Analytics
Khoj collects minimal, anonymous usage data to improve the product:
What We Collect
What We DON’T Collect
- ❌ IP addresses
- ❌ Query contents
- ❌ Document contents or names
- ❌ Chat message contents
- ❌ File paths or folder structure
- ❌ Any personally identifiable information
How Telemetry is Used
- Feature Prioritization: Understanding which features are most used
- Performance Monitoring: Detecting performance issues across versions
- Error Tracking: Identifying bugs to fix
All telemetry is sent to PostHog, an open-source analytics platform. The telemetry code is public and can be reviewed in the Khoj repository on GitHub.
Disable Telemetry
- Self-Hosted
- Cloud
Add to your environment configuration:
Or for pip installation:
docker-compose.yml
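As a hedged sketch of the opt-out for a Docker-based install: the variable name `KHOJ_TELEMETRY_DISABLE` and the service name `server` are assumptions here, so confirm both against the `docker-compose.yml` for your Khoj version before relying on them.

```yaml
services:
  server:
    environment:
      # Assumed variable name: opts this instance out of anonymous telemetry.
      - KHOJ_TELEMETRY_DISABLE=True
```

For a pip installation, the same variable would presumably be set in the shell environment before starting Khoj.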
AI Model Privacy Comparison
Different AI providers have different privacy policies:
Security Best Practices
Follow these guidelines when using Khoj:
For Self-Hosting
Strong Admin Credentials
Set secure passwords:
Use a password manager to generate strong passwords.
docker-compose.yml
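One way to keep generated passwords out of the compose file itself is to read them from a local `.env` file through Docker Compose variable interpolation. This is a sketch under assumptions: the service name `server` and the variable names `KHOJ_ADMIN_EMAIL`, `KHOJ_ADMIN_PASSWORD`, and `KHOJ_DJANGO_SECRET_KEY` should be checked against the `docker-compose.yml` for your Khoj version.

```yaml
services:
  server:
    environment:
      # ${VAR:?message} makes `docker compose` abort with the message if VAR
      # is unset or empty, so the stack never starts with a blank password.
      # The variable names below are assumptions.
      - "KHOJ_ADMIN_EMAIL=${KHOJ_ADMIN_EMAIL:?set KHOJ_ADMIN_EMAIL in .env}"
      - "KHOJ_ADMIN_PASSWORD=${KHOJ_ADMIN_PASSWORD:?set KHOJ_ADMIN_PASSWORD in .env}"
      - "KHOJ_DJANGO_SECRET_KEY=${KHOJ_DJANGO_SECRET_KEY:?set KHOJ_DJANGO_SECRET_KEY in .env}"
```

The `.env` file sits next to `docker-compose.yml`, holds the values produced by the password manager, and should be excluded from version control.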
Network Security
- Use Tailscale or VPN for remote access
- Don’t expose ports directly to the internet
- Use HTTPS with valid certificates
- Keep firewall rules restrictive
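The advice against exposing ports can be sketched in the compose file as a loopback-only port binding. The service name `server` and port `42110` are assumptions; adjust them to match your deployment.

```yaml
services:
  server:
    ports:
      # Bind to the loopback interface only. The shorter "42110:42110"
      # form would listen on every network interface of the host.
      - "127.0.0.1:42110:42110"
```

With this binding, the web interface is reachable only from the host itself, so remote access has to go through something running on that host, such as a VPN client or a reverse proxy that terminates HTTPS.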
For Cloud Users
Compliance & Certifications
Khoj is working toward SOC 2 Type II compliance. Contact [email protected] for enterprise security requirements.
Current Status
| Standard | Status | Notes |
|---|---|---|
| GDPR | ✅ Compliant | Right to deletion, export, and access |
| SOC 2 | 🔄 In Progress | Expected Q3 2026 |
| HIPAA | ❌ Not compliant | Not recommended for healthcare PHI |
Privacy Policy & Terms
Read our full legal documents:
Data Breaches & Incident Response
In the unlikely event of a security incident, we will:
- Notify affected users within 72 hours
- Provide details on what data was affected
- Outline remediation steps taken
- Publish a post-mortem on our blog
Your Data is Yours
Core Principle
We exist to serve you, not to monetize your data.
Khoj is a sustainable, open-source alternative to closed-source corporate AI. We make money through subscriptions and enterprise licenses, not by selling user data.
FAQ
Can Khoj employees see my data?
On Khoj Cloud: Employees can only access your data for support purposes with your explicit permission or in response to legal requirements.
Self-Hosted: Only you have access to your data.
Is my data used to train AI models?
No. We never use your documents, chats, or queries to train any AI models. This applies to both cloud and self-hosted deployments.
What happens if I delete my account?
All your data is permanently deleted within 30 days:
- Documents and embeddings
- Chat history
- Account information
- Integration credentials
Can law enforcement access my data?
We comply with valid legal requests. For cloud users, we’ll notify you unless legally prohibited. Self-hosted users control their own data and legal obligations.
Is end-to-end encryption possible?
Not currently for Khoj Cloud. End-to-end encryption would prevent server-side search and embedding generation. Self-hosting provides equivalent privacy by keeping all data local.
Contact & Concerns
Privacy questions or concerns?
- Email: [email protected]
- Security Issues: [email protected] (PGP key available)
- General: [email protected]
