Skip to main content

System Overview

Khoj is a full-stack AI application with a Python backend, Next.js frontend, and multiple client interfaces. It uses PostgreSQL with pgvector for semantic search and supports various AI models through a plugin architecture. Khoj Architecture

High-Level Architecture

1

Client Layer

Multiple client interfaces communicate with the Khoj server:
  • Web App (Next.js)
  • Desktop App (Electron)
  • Obsidian Plugin
  • Emacs Package
  • Android App
  • API Clients
2

API Layer

FastAPI-based REST API handles:
  • Authentication & Authorization
  • Chat & Search Endpoints
  • Content Indexing
  • Agent Management
  • Automation Triggers
3

Processing Layer

Core business logic for:
  • Document Processing
  • Embedding Generation
  • Conversation Management
  • Tool Execution
  • Research Mode
4

Storage Layer

  • PostgreSQL with pgvector for embeddings
  • Django ORM for database management
  • File system for static assets
5

External Services

  • LLM APIs (OpenAI, Anthropic, Google, etc.)
  • Speech Recognition (Whisper)
  • Image Generation
  • Web Search

Backend Architecture

Directory Structure

src/khoj/
├── app/                    # Django application
│   ├── settings.py        # Django settings
│   ├── urls.py            # URL routing
│   └── wsgi.py            # WSGI application
├── database/              # Database layer
│   ├── models/           # Django models
│   ├── adapters/         # Database adapters
│   └── migrations/       # Schema migrations
├── interface/             # Client interfaces
│   └── built/            # Compiled frontend assets
├── processor/             # Core processing logic
│   ├── content/          # Content processors
│   ├── conversation/     # Chat & conversation
│   ├── embeddings.py     # Embedding generation
│   ├── tools/            # Agent tools
│   └── speech/           # Speech processing
├── routers/               # FastAPI routers
│   ├── api.py            # Main API endpoints
│   ├── api_chat.py       # Chat endpoints
│   ├── api_agents.py     # Agent management
│   ├── api_content.py    # Content indexing
│   ├── auth.py           # Authentication
│   └── helpers.py        # Shared utilities
├── search_filter/         # Search filters
├── search_type/           # Search implementations
├── utils/                 # Utility functions
│   ├── helpers.py        # Helper functions
│   ├── initialization.py # App initialization
│   └── state.py          # Application state
├── configure.py           # Configuration logic
└── main.py               # Application entry point

Core Components

The entry point that:
  • Initializes Django
  • Sets up FastAPI application
  • Configures CORS middleware
  • Runs database migrations
  • Collects static files
  • Starts the uvicorn server
# Initialize Django
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "khoj.app.settings")
django.setup()

# Initialize FastAPI
app = FastAPI()

# Configure routes and middleware
configure_routes(app)
configure_middleware(app)
FastAPI routers organized by domain:
  • api.py: Core API (health, config, sync)
  • api_chat.py: Chat & conversation (70KB+, main chat logic)
  • api_agents.py: Agent creation and management
  • api_content.py: Content upload and indexing
  • api_automation.py: Scheduled automations
  • api_subscription.py: User subscriptions
  • auth.py: OAuth, login, registration
  • helpers.py: Shared router utilities (130KB+)
Example endpoint structure:
@api_chat.post("/chat")
async def chat(
    request: Request,
    q: str,
    conversation_id: str = None,
    user: KhojUser = Depends(get_current_user)
):
    # Chat logic
Contains core processing modules:Content Processors (processor/content/):
  • PDF, Markdown, Org-mode, Word, Notion
  • Image, plaintext, Github, Docx
Conversation (processor/conversation/):
  • LLM integrations (OpenAI, Anthropic, Google, Offline)
  • Prompt management
  • Context building
  • Tool calling
Tools (processor/tools/):
  • Web search
  • Code execution
  • Document retrieval
  • Custom agent tools
Embeddings (processor/embeddings.py):
  • Text embedding generation
  • Batch processing
  • Model management
Django-based database management:Models (database/models/):
  • User, Subscription
  • Conversation, ChatModel
  • Agent, Entry
  • SearchModel, Automation
  • ProcessLock
Adapters (database/adapters/):
  • High-level database operations
  • Business logic for data access
  • Caching and optimization
Migrations (database/migrations/):
  • Schema version control
  • Database evolution
Shared utility modules:
  • helpers.py: Common helper functions (47KB)
  • initialization.py: App startup logic
  • state.py: Application state management
  • cli.py: Command-line interface
  • constants.py: Application constants

Frontend Architecture

Web Application

src/interface/web/
├── app/                   # Next.js App Router
│   ├── (app)/            # Main application routes
│   ├── api/              # API routes
│   ├── layout.tsx        # Root layout
│   └── page.tsx          # Home page
├── components/            # React components
│   ├── ui/               # UI primitives (Radix)
│   ├── chat/             # Chat components
│   ├── agents/           # Agent management
│   └── settings/         # Settings UI
├── lib/                   # Utilities
│   ├── utils.ts          # Helper functions
│   └── api.ts            # API client
├── public/                # Static assets
├── styles/                # Global styles
└── package.json           # Dependencies

Key Frontend Technologies

Next.js 14

  • App Router for file-based routing
  • Server-side rendering (SSR)
  • API routes for backend proxying
  • Static export for production

TypeScript

  • Type-safe React components
  • API client with full typing
  • Enhanced IDE support

Tailwind CSS

  • Utility-first styling
  • Responsive design
  • Custom design system

Radix UI

  • Accessible primitives
  • Dialog, Dropdown, Tooltip
  • Form components

State Management

  • SWR: Data fetching and caching
  • React Context: Global state (user, settings)
  • URL State: Search params for routing
  • WebSockets: Real-time chat streaming

Client Architecture

Technology: Electron + ReactLocation: src/interface/desktop/Features:
  • Tray icon integration
  • System notifications
  • Auto-start on login
  • Local file indexing
  • Embedded web view
Structure:
desktop/
├── main.js          # Electron main process
├── preload.js       # Preload script
├── renderer/        # React renderer
└── package.json

Data Flow

Search & Retrieval

1

Query Ingestion

User submits search query through client → API endpoint
2

Embedding Generation

Query text converted to vector embedding using Sentence Transformers
3

Vector Search

pgvector performs cosine similarity search in PostgreSQL
4

Re-ranking

Top results re-ranked using cross-encoder model for accuracy
5

Filter Application

Apply date, file, and content filters to results
6

Response

Ranked results returned to client with metadata

Chat Conversation

1

Message Ingestion

User message received at /api/chat endpoint
2

Context Retrieval

Relevant documents retrieved using semantic search
3

Tool Planning

Agent determines which tools to use (search, calculate, etc.)
4

Tool Execution

Tools executed in sequence, results collected
5

Prompt Construction

System prompt + conversation history + context + tool results
6

LLM Generation

Prompt sent to configured LLM (OpenAI, Anthropic, etc.)
7

Streaming Response

Response streamed back to client via WebSocket/SSE
8

History Storage

Conversation saved to database for future context

Content Indexing

1

Content Upload

Files/URLs uploaded through /api/content
2

Format Detection

File type detected using mime type and magika
3

Content Extraction

Appropriate processor extracts text (PDF, Markdown, etc.)
4

Chunking

Text split into semantic chunks using LangChain
5

Embedding Generation

Each chunk converted to vector embedding
6

Storage

Embeddings and metadata stored in PostgreSQL with pgvector
7

Indexing Complete

Content now searchable and available for chat context

Database Schema

Key Models

class KhojUser(AbstractUser):
    uuid = models.UUIDField(default=uuid.uuid4)
    email = models.EmailField(unique=True)
    phone_number = models.CharField()
    is_active = models.BooleanField(default=True)
    subscription = models.ForeignKey(Subscription)
class Conversation(models.Model):
    user = models.ForeignKey(KhojUser)
    conversation_log = models.JSONField(default=dict)
    agent = models.ForeignKey(Agent, null=True)
    title = models.CharField()
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
class Entry(models.Model):
    user = models.ForeignKey(KhojUser)
    embeddings = VectorField(dimensions=384)  # pgvector
    raw = models.TextField()
    compiled = models.TextField()
    heading = models.CharField()
    file_source = models.CharField()
    file_type = models.CharField()
    created_at = models.DateTimeField(auto_now_add=True)
class Agent(models.Model):
    creator = models.ForeignKey(KhojUser)
    name = models.CharField()
    personality = models.TextField()
    chat_model = models.ForeignKey(ChatModelOptions)
    tools = models.JSONField(default=list)
    files = models.ManyToManyField(Entry)
    public = models.BooleanField(default=False)

Testing Architecture

Test Organization

tests/
├── conftest.py              # Pytest fixtures
├── helpers.py               # Test utilities
├── test_client.py           # API endpoint tests
├── test_agents.py           # Agent functionality
├── test_conversation_utils.py
├── test_date_filter.py
├── test_file_filter.py
├── test_grep_files.py
├── data/                    # Test data
└── evals/                   # Evaluation tests

Testing Stack

pytest

  • Unit and integration tests
  • Fixtures for database setup
  • Parallel test execution

pytest-django

  • Django test database
  • Model factories
  • Transaction management

pytest-asyncio

  • Async endpoint testing
  • WebSocket tests

factory-boy

  • Test data generation
  • Model factories

CI/CD Pipeline

GitHub Actions Workflows

Runs on every PR and push to master:
  • Matrix testing on Python 3.10, 3.11, 3.12
  • PostgreSQL service with pgvector
  • Full test suite execution
  • Type checking with mypy

Performance Considerations

Embedding Caching

Embeddings cached to avoid recomputation. Incremental updates only process changed content.

Connection Pooling

PostgreSQL connection pooling via Django to handle concurrent requests efficiently.

Async Processing

FastAPI async endpoints allow non-blocking I/O for external API calls.

Vector Indexing

pgvector indexes optimize similarity search performance for large datasets.
For detailed performance metrics, see the Performance Guide.

Security Architecture

1

Authentication

  • OAuth 2.0 (Google, GitHub)
  • JWT-based sessions
  • Magic link login
2

Authorization

  • User-scoped data access
  • Agent permission model
  • Public vs private agents
3

Data Protection

  • HTTPS-only in production
  • Environment variable secrets
  • Secure password hashing
4

Input Validation

  • Pydantic models for API validation
  • File upload size limits
  • Content sanitization

Extension Points

Adding New Content Types

  1. Create processor in processor/content/
  2. Implement ContentProcessor interface
  3. Register in content type registry
  4. Add tests in tests/

Adding New LLM Providers

  1. Create provider class in processor/conversation/
  2. Implement LLMProvider interface
  3. Add configuration in settings
  4. Update model selection UI

Adding New Tools

  1. Create tool in processor/tools/
  2. Define tool schema
  3. Register in agent toolset
  4. Add to tool selection UI

Visual References

Codebase Visualization

Interactive visualization of the entire codebase structure

Architecture Diagram

Refer to “ for the complete system diagram

Additional Resources

Development Setup

Set up your local environment

Performance Guide

Optimization best practices

API Documentation

Complete API reference

GitHub Repository

View the source code

Build docs developers (and LLMs) love