Quark is a high-performance RAG (Retrieval-Augmented Generation) system built around a modular pipeline with a clean separation of concerns: document ingestion runs as a distinct, asynchronous process, separate from answering questions about those documents. This separation lets each stage be optimized independently and makes the system easy to extend. Data moves through Quark along two main paths: the ingestion path and the retrieval path.

Ingestion path

When you upload a document, Quark runs a multi-stage pipeline to extract, enrich, and store all of its content — including images and tables — as searchable vector embeddings.
1. Text partitioning

Unstructured.io splits the document into structured text elements using its HiRes strategy. Chunks are capped at 1,500 characters and split by title boundaries.
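The cap-and-split behavior can be sketched as follows. This is an illustration of the described chunking rules, not Quark's actual code; the `Element` type and `chunkByTitle` function are hypothetical names.

```typescript
// Illustrative sketch: start a new chunk at each title boundary and
// whenever the 1,500-character cap would be exceeded.
type Element = { type: "Title" | "NarrativeText"; text: string };

const MAX_CHARS = 1500; // Quark's chunk cap

function chunkByTitle(elements: Element[]): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const el of elements) {
    const atTitle = el.type === "Title" && current.length > 0;
    const overCap = current.length + el.text.length + 1 > MAX_CHARS;
    if (atTitle || overCap) {
      chunks.push(current.trim());
      current = "";
    }
    current += el.text + "\n";
  }
  if (current.trim().length > 0) chunks.push(current.trim());
  return chunks;
}
```

Title-bounded chunks keep each section's text together, which improves retrieval precision compared with fixed-size windows that cut across headings.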
2. Image extraction

In parallel, a pdfplumber Python worker (vision-worker.py) scans each page and returns a map of base64-encoded images keyed by page number.
3. Sync layer

The visionMaker function merges the text elements and the extracted images at the metadata level, matching images to their source pages. This preserves contextual integrity: every image stays anchored to the text that surrounds it in the original document.
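The page-level merge can be sketched like this. The types and function name are illustrative, not the actual `visionMaker` signature; the only assumption from the docs is that images are keyed by page number and text elements carry page metadata.

```typescript
// Sketch of the metadata-level sync step: attach each page's extracted
// images to the text elements that came from the same page.
type TextElement = { text: string; page: number };
type EnrichedElement = TextElement & { images: string[] };

function syncImagesToText(
  elements: TextElement[],
  imagesByPage: Record<number, string[]> // base64 strings keyed by page
): EnrichedElement[] {
  return elements.map((el) => ({
    ...el,
    images: imagesByPage[el.page] ?? [], // pages without images get an empty list
  }));
}
```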
4. Vision analysis

A vision LLM describes each image or table. Diagrams receive the “expert academic illustrator” prompt; tables receive the “professional data analyst” prompt.
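The prompt routing reduces to a simple dispatch on element type. The prompt strings below are paraphrased placeholders; only the two roles come from the docs.

```typescript
// Illustrative prompt selection for the vision step: tables and
// diagrams get different system prompts.
type VisualKind = "diagram" | "table";

function visionPrompt(kind: VisualKind): string {
  return kind === "table"
    ? "You are a professional data analyst. Describe this table in detail..."
    : "You are an expert academic illustrator. Describe this diagram in detail...";
}
```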
5. Embedding and storage

Text and visual descriptions are embedded in batches of 12 using VoyageAI and upserted into a Qdrant collection alongside structured metadata tags.
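The batching step is straightforward to illustrate. `batchOf` is a hypothetical helper, not Quark's actual code; only the batch size of 12 comes from the docs.

```typescript
// Sketch of the batching used before embedding: items are sent to the
// embedding API 12 at a time.
const EMBED_BATCH_SIZE = 12;

function batchOf<T>(items: T[], size: number = EMBED_BATCH_SIZE): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Batching amortizes per-request overhead and keeps each embedding call within provider payload limits.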

Retrieval path

When you ask a question, Quark assembles context from three sources — vector search, short-term memory, and long-term memory — before calling the LLM.
1. Query embedding

Your question is embedded using VoyageAI voyage-4-large with the Query input type.
2. Vector search

Qdrant returns up to 15 candidate chunks using cosine similarity over 1024-dimensional vectors.
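For reference, this is what "cosine similarity" computes. Qdrant does this server-side over the stored 1024-dimensional vectors; the function below is purely for illustration.

```typescript
// Cosine similarity: the dot product of two vectors divided by the
// product of their magnitudes. Ranges from -1 to 1; higher is closer.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```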
3. Re-ranking

VoyageAI re-ranks the candidates by relevance score. If the top result scores below 0.2, Quark returns a “no relevant notes found” message rather than hallucinating an answer.
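The relevance gate can be sketched as follows. The `Ranked` type and function name are illustrative; only the 0.2 threshold and the fallback behavior come from the docs.

```typescript
// Sketch of the post-re-ranking gate: if even the best candidate scores
// below 0.2, skip answer generation entirely.
type Ranked = { text: string; relevanceScore: number };

const MIN_RELEVANCE = 0.2;

function gateResults(ranked: Ranked[]): Ranked[] | null {
  // ranked is assumed sorted best-first by the re-ranker
  if (ranked.length === 0 || ranked[0].relevanceScore < MIN_RELEVANCE) {
    return null; // caller responds with "no relevant notes found"
  }
  return ranked;
}
```

Failing closed here trades a little recall for a large reduction in hallucinated answers when the corpus simply does not cover the question.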
4. Dual-memory context assembly

Short-term memory (Redis) and long-term memory (Mem0) are fetched and prepended to the vector search results to form the full context window.
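The assembly order can be sketched like this. The section labels and function name are illustrative; only the sources (Redis STM, Mem0 LTM, vector results) and the prepend order come from the docs.

```typescript
// Sketch of dual-memory context assembly: short-term and long-term
// memory are prepended to the re-ranked retrieval results.
function assembleContext(
  stm: string[],    // recent conversation turns from Redis
  ltm: string[],    // persistent user facts from Mem0
  chunks: string[]  // re-ranked vector search results
): string {
  return [
    "Conversation history:",
    ...stm,
    "Long-term memory:",
    ...ltm,
    "Retrieved notes:",
    ...chunks,
  ].join("\n");
}
```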
5. LLM response

The assembled context and your question are sent to the LLM with a tutor-style prompt scoped to your institution.

Interfaces

Quark exposes two interfaces:
  • CLI — run npm run cli to start an interactive terminal session and chat with your documents immediately.
  • REST API — a programmatic interface for integrating Quark into other applications.

Key design decision

By decoupling the ingestion of images and text and re-syncing them at the metadata level, Quark maintains higher contextual integrity than standard text-only pipelines.
Standard RAG pipelines discard images or treat them as opaque blobs. Quark instead extracts images independently with pdfplumber and then re-merges them with their corresponding text blocks via page-level metadata alignment. This means visual content — diagrams, charts, tables — is embedded alongside the text it annotates, rather than in isolation or not at all.

Explore the architecture

Ingestion pipeline

Step-by-step walkthrough of how documents are partitioned, enriched with vision analysis, and stored in Qdrant.

Memory system

How Redis STM and Mem0 LTM work together to give Quark persistent, session-aware context.

Vector search and re-ranking

Two-stage retrieval: Qdrant similarity search followed by VoyageAI re-ranking.
