Ingestion path
When you upload a document, Quark runs a multi-stage pipeline to extract, enrich, and store all of its content — including images and tables — as searchable vector embeddings.

Text partitioning
Unstructured.io splits the document into structured text elements using its HiRes strategy. Chunks are capped at 1,500 characters and split at title boundaries.

Image extraction
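The partitioning behavior can be illustrated with a minimal sketch. This is not Unstructured.io's implementation — the element tuples and function name are hypothetical — but it shows the two rules the text describes: start a new chunk at every title, and never exceed the 1,500-character cap.

```python
MAX_CHARS = 1500  # cap from the pipeline description

def chunk_by_title(elements, max_chars=MAX_CHARS):
    """Group (type, text) elements into chunks, starting a new chunk at
    each Title element and whenever the cap would be exceeded."""
    chunks, current = [], ""
    for kind, text in elements:
        if kind == "Title" and current:
            chunks.append(current)
            current = ""
        if current and len(current) + len(text) + 1 > max_chars:
            chunks.append(current)
            current = ""
        current = f"{current}\n{text}".strip()
    if current:
        chunks.append(current)
    return chunks

elements = [
    ("Title", "Introduction"),
    ("NarrativeText", "Quark ingests documents in stages."),
    ("Title", "Methods"),
    ("NarrativeText", "Each page is scanned for images."),
]
```

Title boundaries keep each chunk topically coherent, which improves embedding quality compared with fixed-size windows that cut across sections.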
In parallel, a pdfplumber Python worker (vision-worker.py) scans each page and returns a map of base64-encoded images keyed by page number.

Sync layer
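The worker's output format can be sketched as follows. The pdfplumber extraction itself is elided here (real image bytes would come from the PDF pages); the function name is illustrative, but the shape — a JSON-safe map of base64 strings keyed by page number — matches the description above.

```python
import base64
import json

def encode_image_map(raw_pages):
    """Turn {page_number: [image_bytes, ...]} into a JSON-safe map of
    base64 strings keyed by page number, as the worker returns."""
    return {
        str(page): [base64.b64encode(img).decode("ascii") for img in imgs]
        for page, imgs in raw_pages.items()
    }

# Mock two extracted images on page 3; in the real worker these bytes
# come from pdfplumber's per-page image extraction.
payload = encode_image_map({3: [b"\x89PNG-one", b"\x89PNG-two"]})
print(json.dumps(payload)[:60])
```

Base64 keeps the payload safe to pass between the Python worker and the Node process over a text channel such as stdout.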
The visionMaker function merges the text elements and the extracted images at the metadata level, matching images to their source pages. This preserves contextual integrity: every image stays anchored to the text that surrounds it in the original document.

Vision analysis
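A minimal sketch of this page-level merge, assuming hypothetical element shapes (Quark's actual visionMaker is a Node function; the dict layout here is illustrative):

```python
def vision_merge(text_elements, images_by_page):
    """Attach each page's extracted images to the text elements from the
    same page, so images stay anchored to their surrounding text."""
    merged = []
    for el in text_elements:
        page = el["metadata"]["page_number"]
        merged.append({
            **el,
            "metadata": {**el["metadata"],
                         "images": images_by_page.get(page, [])},
        })
    return merged

elements = [
    {"text": "Figure 1 shows the pipeline.", "metadata": {"page_number": 1}},
    {"text": "No visuals on this page.", "metadata": {"page_number": 2}},
]
merged = vision_merge(elements, {1: ["<base64 image>"]})
```

Because the join key is just the page number, the merge works regardless of how many images a page contains, and text on image-free pages passes through unchanged.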
A vision LLM describes each image or table. Diagrams receive the “expert academic illustrator” prompt; tables receive the “professional data analyst” prompt.
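The prompt selection could be as simple as the routing below. The two persona phrases are quoted from the text; how Quark classifies an element as a diagram versus a table is an assumption, as are the names here.

```python
# Hypothetical prompt router keyed on element type.
PROMPTS = {
    "diagram": "You are an expert academic illustrator. Describe this diagram...",
    "table": "You are a professional data analyst. Describe this table...",
}

def prompt_for(element_type):
    """Pick the vision-LLM system prompt for an extracted element,
    defaulting to the illustrator persona for non-table visuals."""
    return PROMPTS.get(element_type, PROMPTS["diagram"])
```

Persona-specific prompts matter here because the generated description is what gets embedded: a data-analyst framing surfaces the numbers in a table, while an illustrator framing captures a diagram's structure.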
Retrieval path
When you ask a question, Quark assembles context from three sources — vector search, short-term memory, and long-term memory — before calling the LLM.

Vector search
Qdrant returns up to 15 candidate chunks using cosine similarity over 1024-dimensional vectors.
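The ranking Qdrant performs can be shown with a pure-Python sketch. Real queries run over 1024-dimensional vectors via the Qdrant client; the tiny vectors and chunk shapes below are for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, chunks, k=15):
    """Return up to k chunks ranked by cosine similarity to the query,
    mirroring the candidate-retrieval step described above."""
    return sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]),
                  reverse=True)[:k]

chunks = [
    {"id": "a", "vector": [1.0, 0.0]},
    {"id": "b", "vector": [0.0, 1.0]},
    {"id": "c", "vector": [0.7, 0.7]},
]
results = top_k([1.0, 0.0], chunks, k=2)
```

Fetching 15 candidates rather than one leaves room for the re-ranking stage to reorder by true relevance.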
Re-ranking
VoyageAI re-ranks the candidates by relevance score. If the top result scores below 0.2, Quark returns a “no relevant notes found” message rather than hallucinating an answer.
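The threshold guard might look like the sketch below. The 0.2 floor is from the text; the input shape (a descending list of score/chunk pairs, as a re-ranker such as VoyageAI would return) and the function name are assumptions.

```python
RELEVANCE_FLOOR = 0.2  # threshold from the text

def answer_context(reranked):
    """Given (score, chunk) pairs sorted by descending relevance, return
    the chunks to answer from, or None when even the best candidate is
    below the floor (the caller then replies 'no relevant notes found')."""
    if not reranked or reranked[0][0] < RELEVANCE_FLOOR:
        return None
    return [chunk for score, chunk in reranked]
```

Refusing to answer on low scores trades a little recall for a large drop in hallucination risk: the model never sees context it would otherwise be forced to stretch.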
Dual-memory context assembly
Short-term memory (Redis) and long-term memory (Mem0) are fetched and prepended to the vector search results to form the full context window.
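A minimal sketch of the assembly step, assuming the memories and chunks arrive as plain strings; the section labels and their relative ordering are assumptions, though the text does say both memories are prepended to the search results.

```python
def assemble_context(stm, ltm, chunks):
    """Prepend long-term (Mem0) and short-term (Redis) memory to the
    retrieved chunks to form the full context window."""
    parts = []
    if ltm:
        parts.append("Long-term memory:\n" + "\n".join(ltm))
    if stm:
        parts.append("Recent conversation:\n" + "\n".join(stm))
    parts.append("Retrieved notes:\n" + "\n".join(chunks))
    return "\n\n".join(parts)

ctx = assemble_context(
    stm=["User asked about ingestion."],
    ltm=["User prefers concise answers."],
    chunks=["Chunk about Qdrant."],
)
```

Keeping the three sources in distinct labeled sections lets the LLM weigh durable preferences, recent dialogue, and document evidence separately.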
Interfaces
Quark exposes two interfaces:

- CLI — run npm run cli to start an interactive terminal session and chat with your documents immediately.
- REST API — a programmatic interface for integrating Quark into other applications.
Key design decision
By decoupling the ingestion of images and text and re-syncing them at the metadata level, Quark maintains higher contextual integrity than standard text-only pipelines.

Standard RAG pipelines discard images or treat them as opaque blobs. Quark instead extracts images independently with pdfplumber and then re-merges them with their corresponding text blocks via page-level metadata alignment. This means visual content — diagrams, charts, tables — is embedded alongside the text it annotates, rather than in isolation or not at all.
Explore the architecture
Ingestion pipeline
Step-by-step walkthrough of how documents are partitioned, enriched with vision analysis, and stored in Qdrant.
Memory system
How Redis STM and Mem0 LTM work together to give Quark persistent, session-aware context.
Vector search and re-ranking
Two-stage retrieval: Qdrant similarity search followed by VoyageAI re-ranking.