Quark is a Retrieval-Augmented Generation (RAG) system built for deep document analysis and persistent context awareness. You point it at a PDF, it ingests the text and images, and you chat with the content directly. Every answer is grounded in your documents and cites its source by page, which sharply reduces hallucinations.
All services that Quark depends on are free to use and do not require a credit card for registration.

What Quark does

Quark solves a fundamental problem with general-purpose LLMs: they make things up. When you ask a question about a document, a standard chatbot may confidently answer with information that was never in the file. Quark prevents this by constraining the LLM to only what it retrieved from your ingested documents, and by citing the source for every claim. Beyond retrieval accuracy, Quark is aware of context across a session (Redis short-term memory) and across sessions (Mem0 long-term memory), so it remembers your preferences and prior conversations without you repeating yourself.

Key capabilities

Multimodal document ingestion

Parses PDFs for both text and images. Text is partitioned by Unstructured.io; images are extracted by pdfplumber. A custom sync layer aligns both modalities before embedding.
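The sync layer is easiest to picture as a join on page numbers. The sketch below is an illustrative assumption about how such an alignment could work (the function and field names are hypothetical, not Quark's actual API): text chunks and extracted images each carry a page number, and images are attached to every chunk on their page before embedding.

```typescript
// Hypothetical sketch of a metadata-level sync step: text chunks (as from
// Unstructured.io) and image references (as from pdfplumber) are joined
// on their shared page number.
interface TextChunk { page: number; text: string; }
interface ImageRef { page: number; path: string; }
interface SyncedChunk { page: number; text: string; images: string[]; }

function syncByPage(chunks: TextChunk[], images: ImageRef[]): SyncedChunk[] {
  // Group image paths by page for quick lookup.
  const byPage = new Map<number, string[]>();
  for (const img of images) {
    const list = byPage.get(img.page) ?? [];
    list.push(img.path);
    byPage.set(img.page, list);
  }
  // Attach every image on a chunk's page to that chunk.
  return chunks.map((c) => ({
    page: c.page,
    text: c.text,
    images: byPage.get(c.page) ?? [],
  }));
}
```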

Dual-layer memory

Short-term memory (STM) is handled by Redis for rapid in-session context. Long-term memory (LTM) is powered by Mem0 to persist user history and preferences across sessions.
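The division of labor between the two layers can be sketched with plain in-memory maps standing in for Redis and Mem0. Everything here is illustrative: the real services are networked, and the "I prefer" heuristic is a stand-in for Mem0's actual fact extraction.

```typescript
// Plain Maps stand in for Redis (STM, keyed by session) and Mem0
// (LTM, keyed by user). Interfaces are assumptions for illustration only.
const stm = new Map<string, string[]>(); // sessionId -> recent turns
const ltm = new Map<string, string[]>(); // userId -> persistent facts

function remember(sessionId: string, userId: string, turn: string): void {
  // Short-term: keep only the most recent turns of this session.
  const turns = stm.get(sessionId) ?? [];
  turns.push(turn);
  stm.set(sessionId, turns.slice(-10));
  // Long-term: persist stated preferences across sessions
  // (a toy stand-in for real memory extraction).
  if (turn.toLowerCase().includes("i prefer")) {
    const facts = ltm.get(userId) ?? [];
    facts.push(turn);
    ltm.set(userId, facts);
  }
}
```

The key design point is the different keying: STM is scoped to a session and bounded, while LTM is scoped to a user and survives across sessions.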

Vector search with Qdrant

Document chunks are embedded with VoyageAI and stored in Qdrant. At query time, the most relevant vectors are retrieved and passed to the LLM as context.
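The retrieval step boils down to scoring stored vectors against the query vector and keeping the top k. Qdrant does this server-side with an index rather than a linear scan; the sketch below only shows the underlying idea, using cosine similarity.

```typescript
// Cosine similarity between two equal-length dense vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every stored vector and keep the k best matches.
function topK(query: number[], store: { id: string; vec: number[] }[], k: number) {
  return store
    .map((p) => ({ id: p.id, score: cosine(query, p.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```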

CLI interface

A full-featured terminal UI for ingesting documents, managing sessions, and chatting — no browser required.

REST API

An Elysia-powered HTTP server exposes ingestion, retrieval, and session management endpoints for programmatic access.
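Programmatic access looks like ordinary HTTP calls. The endpoint path and payload shape below are illustrative assumptions, not the documented API surface; check the server's route definitions for the real contract.

```typescript
// Hypothetical client helper: builds a fetch-ready request for a query
// endpoint. The "/query" path and body fields are assumptions.
const BASE_URL = "http://localhost:3000";

function buildQueryRequest(sessionId: string, question: string) {
  return {
    url: `${BASE_URL}/query`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ sessionId, question }),
    },
  };
}

// Usage (requires a running server):
// const { url, init } = buildQueryRequest("s1", "What does page 4 say?");
// const res = await fetch(url, init);
```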

Local chat history

All conversation logs are stored in a local SQLite database. Your data stays on your system.
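A plausible shape for the local history store is a single messages table keyed by session. This schema is illustrative only; Quark's actual table layout may differ.

```sql
-- Illustrative schema, not Quark's actual layout.
CREATE TABLE IF NOT EXISTS messages (
  id         INTEGER PRIMARY KEY AUTOINCREMENT,
  session_id TEXT NOT NULL,
  role       TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
  content    TEXT NOT NULL,
  created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
```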

Services Quark relies on

Quark is assembled from several best-in-class services. You will need an account for each one to run the full system.
Service                                    Role
Groq (or any OpenAI-compatible provider)   LLM inference
VoyageAI                                   Text embeddings
Unstructured.io                            Document partitioning (text + tables)
Qdrant                                     Vector database
Mem0                                       Long-term memory
Upstash Redis                              Short-term session memory
ElasticLake                                Object storage
Supabase                                   Relational database

Architecture at a glance

Quark follows a Modular RAG pattern. Ingestion of images and text is decoupled and re-synced at the metadata level, preserving more of the document's context than a text-only pipeline would. The dual-memory layer mirrors human cognition by separating immediate recall (Redis) from historical knowledge (Mem0). When you ingest a document, Quark runs it through a multi-stage pipeline:
  1. Parse — Unstructured.io splits text; pdfplumber extracts images.
  2. Sync — A custom sync layer aligns text and image chunks by position.
  3. Embed — VoyageAI converts each chunk into a dense vector.
  4. Store — Vectors are upserted into Qdrant; objects go to ElasticLake.
When you ask a question, Quark:
  1. Embeds your query with VoyageAI.
  2. Retrieves the top matching chunks from Qdrant.
  3. Hydrates the prompt with STM context from Redis and LTM context from Mem0.
  4. Sends the grounded prompt to the LLM and streams the response.
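The four query-time steps can be sketched end to end, with stub functions standing in for VoyageAI, Qdrant, Redis, and Mem0. All names and the prompt layout here are illustrative assumptions, not Quark's actual implementation.

```typescript
// Steps 1-3 of the query flow: embed, retrieve, hydrate. Step 4 (sending
// to the LLM and streaming) is left as a comment since it needs a provider.
type Embed = (text: string) => number[];
type Retrieve = (vec: number[]) => string[];

function buildGroundedPrompt(
  question: string,
  embed: Embed,
  retrieve: Retrieve,
  stmTurns: string[], // recent turns pulled from Redis
  ltmFacts: string[], // persistent facts pulled from Mem0
): string {
  const chunks = retrieve(embed(question)); // steps 1-2
  return [                                  // step 3: hydrate the prompt
    "Answer ONLY from the context below. Cite pages.",
    `Long-term memory:\n${ltmFacts.join("\n")}`,
    `Session so far:\n${stmTurns.join("\n")}`,
    `Context:\n${chunks.join("\n---\n")}`,
    `Question: ${question}`,
  ].join("\n\n");
  // Step 4 would send this prompt to the LLM and stream the response.
}
```

Constraining the LLM to this assembled context, rather than its open-ended parametric knowledge, is what keeps answers grounded in the ingested documents.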
