Quark is a Retrieval-Augmented Generation (RAG) system built for deep document analysis and persistent context awareness. You point it at a PDF, it ingests the text and images, and you chat with the content directly. Every answer is grounded in your documents and cites its source by page, which sharply reduces hallucinations.
All services that Quark depends on are free to use and do not require a credit card for registration.

What Quark does

Quark solves a fundamental problem with general-purpose LLMs: they make things up. When you ask a question about a document, a standard chatbot may confidently answer with information that was never in the file. Quark prevents this by constraining the LLM to only what it retrieved from your ingested documents, and by citing the source for every claim. Beyond retrieval accuracy, Quark is aware of context across a session (Redis short-term memory) and across sessions (Mem0 long-term memory), so it remembers your preferences and prior conversations without you repeating yourself.

Key capabilities

Multimodal document ingestion

Parses PDFs for both text and images. Text is partitioned by Unstructured.io; images are extracted by pdfplumber. A custom sync layer aligns both modalities before embedding.
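The sync layer is easiest to picture as a join on page numbers. The sketch below is an illustrative assumption about how such an alignment could work (the function and field names are hypothetical, not Quark's actual API): text chunks and extracted images each carry a page number, and images are attached to every chunk on their page before embedding.

```typescript
// Hypothetical sketch of a metadata-level sync step: text chunks (as from
// Unstructured.io) and image references (as from pdfplumber) are joined
// on their shared page number.
interface TextChunk { page: number; text: string; }
interface ImageRef { page: number; path: string; }
interface SyncedChunk { page: number; text: string; images: string[]; }

function syncByPage(chunks: TextChunk[], images: ImageRef[]): SyncedChunk[] {
  // Group image paths by page for quick lookup.
  const byPage = new Map<number, string[]>();
  for (const img of images) {
    const list = byPage.get(img.page) ?? [];
    list.push(img.path);
    byPage.set(img.page, list);
  }
  // Attach every image on a chunk's page to that chunk.
  return chunks.map((c) => ({
    page: c.page,
    text: c.text,
    images: byPage.get(c.page) ?? [],
  }));
}
```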

Dual-layer memory

Short-term memory (STM) is handled by Redis for rapid in-session context. Long-term memory (LTM) is powered by Mem0 to persist user history and preferences across sessions.
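The division of labor between the two layers can be sketched with plain in-memory maps standing in for Redis and Mem0. Everything here is illustrative: the real services are networked, and the "I prefer" heuristic is a stand-in for Mem0's actual fact extraction.

```typescript
// Plain Maps stand in for Redis (STM, keyed by session) and Mem0
// (LTM, keyed by user). Interfaces are assumptions for illustration only.
const stm = new Map<string, string[]>(); // sessionId -> recent turns
const ltm = new Map<string, string[]>(); // userId -> persistent facts

function remember(sessionId: string, userId: string, turn: string): void {
  // Short-term: keep only the most recent turns of this session.
  const turns = stm.get(sessionId) ?? [];
  turns.push(turn);
  stm.set(sessionId, turns.slice(-10));
  // Long-term: persist stated preferences across sessions
  // (a toy stand-in for real memory extraction).
  if (turn.toLowerCase().includes("i prefer")) {
    const facts = ltm.get(userId) ?? [];
    facts.push(turn);
    ltm.set(userId, facts);
  }
}
```

The key design point is the different keying: STM is scoped to a session and bounded, while LTM is scoped to a user and survives across sessions.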

Vector search with Qdrant

Document chunks are embedded with VoyageAI and stored in Qdrant. At query time, the most relevant vectors are retrieved and passed to the LLM as context.
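The retrieval step boils down to scoring stored vectors against the query vector and keeping the top k. Qdrant does this server-side with an index rather than a linear scan; the sketch below only shows the underlying idea, using cosine similarity.

```typescript
// Cosine similarity between two equal-length dense vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every stored vector and keep the k best matches.
function topK(query: number[], store: { id: string; vec: number[] }[], k: number) {
  return store
    .map((p) => ({ id: p.id, score: cosine(query, p.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```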

CLI interface

A full-featured terminal UI for ingesting documents, managing sessions, and chatting — no browser required.

REST API

An Elysia-powered HTTP server exposes ingestion, retrieval, and session management endpoints for programmatic access.
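Programmatic access looks like ordinary HTTP calls. The endpoint path and payload shape below are illustrative assumptions, not the documented API surface; check the server's route definitions for the real contract.

```typescript
// Hypothetical client helper: builds a fetch-ready request for a query
// endpoint. The "/query" path and body fields are assumptions.
const BASE_URL = "http://localhost:3000";

function buildQueryRequest(sessionId: string, question: string) {
  return {
    url: `${BASE_URL}/query`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ sessionId, question }),
    },
  };
}

// Usage (requires a running server):
// const { url, init } = buildQueryRequest("s1", "What does page 4 say?");
// const res = await fetch(url, init);
```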

Local chat history

All conversation logs are stored in a local SQLite database. Your data stays on your system.
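A plausible shape for the local history store is a single messages table keyed by session. This schema is illustrative only; Quark's actual table layout may differ.

```sql
-- Illustrative schema, not Quark's actual layout.
CREATE TABLE IF NOT EXISTS messages (
  id         INTEGER PRIMARY KEY AUTOINCREMENT,
  session_id TEXT NOT NULL,
  role       TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
  content    TEXT NOT NULL,
  created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
```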

Services Quark relies on

Quark is assembled from several best-in-class services. You will need an account for each one to run the full system.
Service                                    Role
Groq (or any OpenAI-compatible provider)   LLM inference
VoyageAI                                   Text embeddings
Unstructured.io                            Document partitioning (text + tables)
Qdrant                                     Vector database
Mem0                                       Long-term memory
Upstash Redis                              Short-term session memory
ElasticLake                                Object storage
Supabase                                   Relational database

Architecture at a glance

Quark follows a Modular RAG pattern. Ingestion of images and text is decoupled and re-synced at the metadata level, preserving more of the document's context than a text-only pipeline would. The dual-memory layer mirrors human cognition by separating immediate recall (Redis) from historical knowledge (Mem0). When you ingest a document, Quark runs it through a multi-stage pipeline:
  1. Parse — Unstructured.io splits text; pdfplumber extracts images.
  2. Sync — A custom sync layer aligns text and image chunks by position.
  3. Embed — VoyageAI converts each chunk into a dense vector.
  4. Store — Vectors are upserted into Qdrant; objects go to ElasticLake.
When you ask a question, Quark:
  1. Embeds your query with VoyageAI.
  2. Retrieves the top matching chunks from Qdrant.
  3. Hydrates the prompt with STM context from Redis and LTM context from Mem0.
  4. Sends the grounded prompt to the LLM and streams the response.
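The four query-time steps can be sketched end to end, with stub functions standing in for VoyageAI, Qdrant, Redis, and Mem0. All names and the prompt layout here are illustrative assumptions, not Quark's actual implementation.

```typescript
// Steps 1-3 of the query flow: embed, retrieve, hydrate. Step 4 (sending
// to the LLM and streaming) is left as a comment since it needs a provider.
type Embed = (text: string) => number[];
type Retrieve = (vec: number[]) => string[];

function buildGroundedPrompt(
  question: string,
  embed: Embed,
  retrieve: Retrieve,
  stmTurns: string[], // recent turns pulled from Redis
  ltmFacts: string[], // persistent facts pulled from Mem0
): string {
  const chunks = retrieve(embed(question)); // steps 1-2
  return [                                  // step 3: hydrate the prompt
    "Answer ONLY from the context below. Cite pages.",
    `Long-term memory:\n${ltmFacts.join("\n")}`,
    `Session so far:\n${stmTurns.join("\n")}`,
    `Context:\n${chunks.join("\n---\n")}`,
    `Question: ${question}`,
  ].join("\n\n");
  // Step 4 would send this prompt to the LLM and stream the response.
}
```

Constraining the LLM to this assembled context, rather than its open-ended parametric knowledge, is what keeps answers grounded in the ingested documents.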
