Quark is a Retrieval-Augmented Generation (RAG) system that lets you ingest PDFs and other documents, then ask questions about them in natural language. It processes both text and images, stores context across sessions using a dual-layer memory system, and cites the source page for every answer.

Quick start

Set up Quark and chat with your first document in minutes.

Architecture

Understand the ingestion pipeline, memory system, and vector search.

CLI usage

Learn all CLI commands for ingesting documents and querying them locally.

API reference

Integrate Quark into your application via the REST API.

How it works

1

Ingest your documents

Point Quark at a PDF. It extracts text and images using Unstructured.io and pdfplumber, then embeds everything into a Qdrant vector database.
2

Ask a question

Type your question in the CLI or send it to the /chat/completions endpoint. Quark retrieves the most relevant chunks and re-ranks them for precision.
3

Get a grounded answer

The LLM responds using only your document content, citing sources by page number. Short-term context is kept in Redis; long-term memory is compressed into Mem0.
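The three steps above come together in a single request to the `/chat/completions` endpoint. A minimal client sketch follows; the base URL, port, and exact request body shape (an OpenAI-style `messages` array plus a `session_id`) are assumptions for illustration — check the API reference for the actual schema:

```typescript
// Hypothetical request shape for Quark's /chat/completions endpoint.
// The session_id field and messages array are assumptions, not the
// documented schema.
interface ChatRequest {
  session_id: string;
  messages: { role: "user" | "assistant"; content: string }[];
}

function buildChatRequest(sessionId: string, question: string): ChatRequest {
  return {
    session_id: sessionId,
    messages: [{ role: "user", content: question }],
  };
}

async function ask(question: string): Promise<string> {
  // Assumes a local Quark server on port 3000.
  const res = await fetch("http://localhost:3000/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest("demo-session", question)),
  });
  const data = await res.json();
  // The answer is grounded in your documents and cites page numbers.
  return data.choices?.[0]?.message?.content ?? "";
}
```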

Key features

Multimodal ingestion

Processes text, tables, and images from PDFs. Visual elements are described by a vision LLM and made searchable.

Dual-layer memory

Redis handles short-term session context. Mem0 stores long-term user preferences and history across sessions.
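To make the split concrete, here is an in-memory sketch of the two layers. In Quark proper, the short-term layer is Redis (per-session, expiring keys) and the long-term layer is Mem0 (compressed facts that survive across sessions); the class and method names below are illustrative, not Quark's API:

```typescript
// Illustrative dual-layer memory. Short-term entries expire like Redis
// keys with a TTL; long-term facts persist across sessions, like Mem0.
class DualMemory {
  private shortTerm = new Map<string, { text: string; expiresAt: number }[]>();
  private longTerm: string[] = [];

  // Session turns expire after a TTL (default 30 minutes here).
  remember(sessionId: string, text: string, ttlMs = 30 * 60 * 1000): void {
    const turns = this.shortTerm.get(sessionId) ?? [];
    turns.push({ text, expiresAt: Date.now() + ttlMs });
    this.shortTerm.set(sessionId, turns);
  }

  // Durable facts (preferences, history) outlive any one session.
  compressToLongTerm(fact: string): void {
    this.longTerm.push(fact);
  }

  // Context for a prompt: long-term facts plus unexpired session turns.
  context(sessionId: string): string[] {
    const now = Date.now();
    const recent = (this.shortTerm.get(sessionId) ?? [])
      .filter((t) => t.expiresAt > now)
      .map((t) => t.text);
    return [...this.longTerm, ...recent];
  }
}
```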

Re-ranked retrieval

VoyageAI embeddings combined with a re-ranking pass deliver highly relevant context to the LLM.
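The two stages can be sketched as follows. Stage one is a recall-oriented cosine-similarity search over embeddings (in Quark, VoyageAI vectors stored in Qdrant); stage two re-ranks the shortlist for precision. The re-ranker here is a pluggable score function standing in for a real re-ranking model:

```typescript
// Two-stage retrieval sketch: broad vector search, then re-ranking.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Chunk { text: string; embedding: number[]; page: number }

function retrieve(
  query: number[],
  chunks: Chunk[],
  rerank: (c: Chunk) => number, // stand-in for a reranker model's score
  firstPassK = 20,
  finalK = 5,
): Chunk[] {
  // Stage 1: cheap, recall-oriented vector search over all chunks.
  const candidates = chunks
    .map((c) => ({ c, sim: cosine(query, c.embedding) }))
    .sort((x, y) => y.sim - x.sim)
    .slice(0, firstPassK);
  // Stage 2: precision-oriented re-ranking of the shortlist.
  return candidates
    .map(({ c }) => ({ c, score: rerank(c) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, finalK)
    .map(({ c }) => c);
}
```

The design point is that the expensive re-ranking pass only ever sees the small first-pass shortlist, which is why the combination stays fast while improving relevance.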

Grounded responses

The LLM is instructed to answer only from your document content and to cite every source, so responses stay grounded in what you ingested rather than hallucinated.

CLI interface

A full-featured terminal UI with session management, ingest tracking, and chat history — no browser needed.

REST API

An Elysia-powered HTTP server exposes ingestion, retrieval, and session management endpoints for programmatic access.
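As a sketch of what the chat endpoint's handler might look like, here is a framework-agnostic version. The `/chat/completions` path comes from this page; the request/response shapes and the `makeChatHandler` factory are assumptions for illustration. In the real server this would be mounted on an Elysia route, e.g. `new Elysia().post("/chat/completions", ({ body }) => handler(body))`:

```typescript
// Hypothetical handler for a Quark-style chat endpoint. Shapes are
// illustrative, not the documented API.
interface ChatBody { session_id: string; question: string }
interface ChatReply { answer: string; sources: { page: number }[] }

// Stand-in for the retrieval + LLM pipeline described above.
type Pipeline = (question: string) => { answer: string; pages: number[] };

function makeChatHandler(pipeline: Pipeline) {
  return (body: ChatBody): ChatReply => {
    const { answer, pages } = pipeline(body.question);
    // Every answer carries its source pages, per Quark's grounding rule.
    return { answer, sources: pages.map((page) => ({ page })) };
  };
}
```

Keeping the handler a pure function of its body makes it trivial to unit-test without standing up the HTTP server.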
