How It Works
Khoj intelligently decides when to search online:
Intent Detection
Khoj analyzes your query to determine if it requires current information:
- News and current events
- Recent product releases or updates
- Real-time data (weather, stock prices, scores)
- URLs or specific web content
- Questions beyond its training data cutoff
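To make the categories above concrete, here is a toy Python sketch that flags queries likely to need fresh information. It is a keyword heuristic written for illustration only, not Khoj's actual intent detection; the function name `needs_online_search` and the hint patterns are hypothetical.

```python
import re

# Hypothetical hints, loosely mirroring the categories listed above.
HINT_PATTERN = re.compile(
    r"\b(news|today|latest|recent|released?|weather|stock prices?|scores?)\b"
)
URL_PATTERN = re.compile(r"https?://\S+")


def needs_online_search(query: str) -> bool:
    """Return True if the query looks like it needs current information."""
    text = query.lower()
    # A pasted URL is treated as a request for specific web content.
    if URL_PATTERN.search(text):
        return True
    # Otherwise look for whole-word freshness hints.
    return HINT_PATTERN.search(text) is not None
```

A real system would likely rely on a language model rather than keywords, since a query such as "Who won the election?" needs current data without containing any of these hints.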
Web Search
Khoj searches the internet using configured search providers, gathering:
- Search results from multiple sources
- Webpage content and summaries
- Answer boxes and knowledge graphs
- Social media posts (if configured)
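The following Python sketch shows roughly how a client could query a SearXNG instance and keep the top results. It assumes the instance exposes its JSON output format (this may need to be enabled in the SearXNG settings); the base URL and the helper names are hypothetical, and Khoj's own integration may work differently.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str) -> str:
    """Build a SearXNG search URL that requests JSON output."""
    params = urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def extract_results(payload: dict, limit: int = 5) -> list[dict]:
    """Keep only the title, URL, and snippet of the top results."""
    return [
        {
            "title": item.get("title", ""),
            "url": item.get("url", ""),
            "snippet": item.get("content", ""),
        }
        for item in payload.get("results", [])[:limit]
    ]


def search(base_url: str, query: str) -> list[dict]:
    """Run the query against a live instance (requires network access)."""
    with urlopen(build_search_url(base_url, query), timeout=10) as response:
        return extract_results(json.load(response))
```

For example, `search("http://localhost:8080", "khoj ai")` would return a list of dictionaries, provided a SearXNG instance is listening at that (assumed) address.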
Content Retrieval
Relevant webpages are fetched and processed:
- Full text extraction
- Cleaned and formatted for readability
- Images and media identified
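The extraction steps above can be sketched with Python's standard library alone. This is a minimal illustration, not Khoj's actual pipeline; the `TextExtractor` class and `extract_text` function are hypothetical names. It drops script and style content, collects visible text, and records image sources.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text and image URLs, skipping script/style blocks."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []
        self.images: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Only keep text that is outside skipped blocks.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> tuple[str, list[str]]:
    """Return (cleaned text, image URLs) for an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks), parser.images
```

Pages that build their content with JavaScript would yield little text with this approach, which is one reason dedicated reader services exist.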
When to Use Online Search
- Automatic: Khoj searches online on its own for queries about current events, recent information, real-time data, and web content.
- Explicit: Use the /online command to force a web search.
Example Queries
News & Current Events
Khoj searches recent news and provides multiple sources for balanced coverage
Product Research
Local Information
How-To & Tutorials
Article Summaries
Khoj reads the full webpage and provides structured summaries
Comparative Research
Citations and Sources
Khoj always shows where information came from:
- Inline citations: [1], [2], [3] markers in the response
- Source list: Full URLs and titles at the end
- Clickable links: Navigate to original sources
- Multiple sources: Balanced perspective from different publishers
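As an illustration of the citation format described above, this Python sketch numbers sources in order of first use and appends a source list. The function name `render_answer` and its input shape are hypothetical; it shows the general idea rather than how Khoj builds its responses.

```python
def render_answer(claims: list[tuple[str, str]], titles: dict[str, str]) -> str:
    """Attach [n] markers to each claim and append a numbered source list.

    `claims` holds (sentence, source_url) pairs; `titles` maps URLs to titles.
    """
    numbers: dict[str, int] = {}
    body = []
    for sentence, url in claims:
        # Reuse the existing number when a source is cited more than once.
        number = numbers.setdefault(url, len(numbers) + 1)
        body.append(f"{sentence} [{number}]")
    sources = [
        f"[{number}] {titles.get(url, url)} - {url}"
        for url, number in numbers.items()
    ]
    return " ".join(body) + "\n\nSources:\n" + "\n".join(sources)
```

Reusing one number per URL keeps the source list short when several sentences draw on the same page.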
Combined Search: Web + Your Documents
Khoj can blend online research with your personal knowledge.
Self-Hosting Configuration
Set up online search for your self-hosted Khoj instance:
Search Providers
- SearXNG (Docker)
- Serper.dev
- Firecrawl
- Exa
Included in docker-compose.yml by default
No configuration needed! SearXNG runs automatically when you start Khoj with the bundled docker-compose.yml.
Benefits:
- Completely self-hosted and private
- Aggregates results from multiple search engines
- No API keys required
- Free and open source
SearXNG is the default option and works out of the box with Docker
Webpage Reading
Configure how Khoj reads and extracts webpage content:
- Default (Requests)
- Firecrawl
- Olostep
- Exa
Built-in, no setup required
Khoj uses Python’s requests library to fetch webpages.
- Works immediately
- No API keys needed
- Basic content extraction
- May struggle with JavaScript-heavy sites
You can use different providers for search and webpage reading. For example:
- Search with Serper.dev
- Read webpages with Firecrawl
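When mixing providers, it can help to check which ones your environment would enable. The Python sketch below does this by looking for API keys. The environment variable names are assumptions and should be confirmed against your Khoj configuration; the fallback to SearXNG and Requests mirrors the defaults described above, and the helper names are hypothetical.

```python
import os
from typing import Mapping

# Assumed variable names; verify them against your Khoj setup.
SEARCH_PROVIDERS = {
    "Serper.dev": "SERPER_DEV_API_KEY",
    "Firecrawl": "FIRECRAWL_API_KEY",
    "Exa": "EXA_API_KEY",
}
READER_PROVIDERS = {
    "Firecrawl": "FIRECRAWL_API_KEY",
    "Olostep": "OLOSTEP_API_KEY",
    "Exa": "EXA_API_KEY",
}


def configured(providers: Mapping[str, str], env: Mapping[str, str]) -> list[str]:
    """List providers whose API key variable is set to a non-empty value."""
    return [name for name, var in providers.items() if env.get(var, "").strip()]


def describe_setup(env: Mapping[str, str] = os.environ) -> dict[str, list[str]]:
    """Report available search and reader providers, falling back to defaults."""
    return {
        "search": configured(SEARCH_PROVIDERS, env) or ["SearXNG (default)"],
        "reader": configured(READER_PROVIDERS, env) or ["Requests (default)"],
    }
```

Calling `describe_setup()` with no arguments inspects the current process environment; passing a dictionary makes the check easy to try without touching real settings.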
Best Practices
Be Specific
“Latest iPhone reviews” is better than “phone reviews”
Include Time Context
“2024 tax deadlines” instead of just “tax deadlines”
Ask for Structure
“List pros and cons” or “summarize in bullet points”
Verify Sources
Always click through to cited sources for important information
Limitations
Paywall Content
Khoj cannot access content behind paywalls or login-required sites.
Workaround: Manually copy the content and paste it into chat.
Real-Time Data
Search results may be a few minutes old, not instantaneous.
Not suitable for: Live sports scores, stock tickers, breaking news
Geographic Bias
Results may favor certain regions depending on the search provider.
Tip: Include the location in your query: “news in Japan” vs just “news”
Rate Limits
API-based search providers have usage limits.
Self-hosting: Monitor your API usage and costs
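One way to stay under a provider's limit on a self-hosted setup is to count calls in a sliding time window before sending each request. The Python sketch below is a generic illustration; the `UsageWindow` class is hypothetical, is not part of Khoj, and the limits you pass in should come from your provider's plan.

```python
import time
from collections import deque
from typing import Callable


class UsageWindow:
    """Allow at most `max_calls` within any `period_seconds` window."""

    def __init__(
        self,
        max_calls: int,
        period_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_calls = max_calls
        self._period = period_seconds
        self._clock = clock
        self._calls: deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True, or return False if over the limit."""
        now = self._clock()
        # Forget calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self._period:
            self._calls.popleft()
        if len(self._calls) >= self._max_calls:
            return False
        self._calls.append(now)
        return True
```

A caller would check `try_acquire()` before each search request and skip or delay the request when it returns False.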
Privacy Considerations
When using online search:
- Your queries are sent to the search provider
- Visited URLs may be logged by the search service
- Use SearXNG (self-hosted) for maximum privacy
- Khoj does not store or log your search queries
Troubleshooting
Online search not working
Cloud users:
- Should work automatically - report if broken
Self-hosted users:
- Check Docker logs: docker-compose logs searxng
- Verify environment variables are set correctly
- Ensure API keys are valid
- Restart Khoj after configuration changes
Poor quality results
Try:
- Make the query more specific
- Include time context (“2024”, “recent”, “latest”)
- Use /online to force web search
- Switch to a different search provider
Cannot read specific webpage
Possible causes:
- Paywall or login required
- JavaScript-heavy site
- Bot protection
Solutions:
- Try Olostep for complex sites
- Manually copy the content and paste it into chat
- Use reader mode in your browser first
Slow response times
Reasons:
- Fetching multiple webpages
- Slow search API
- Complex webpage rendering
Solutions:
- Use Serper.dev for faster search
- Be more specific to reduce scope
- Consider upgrading self-hosted hardware
Advanced: Research Mode
For comprehensive research, use the /research command:
- Performs multiple searches
- Reads more sources
- Provides deeper analysis
- Takes longer but is more thorough
- Better citations and cross-referencing
Next Steps
Automate Research
Schedule recurring online research tasks
Chat Features
Learn other slash commands and capabilities
Code Execution
Combine web data with code analysis
