Entry point
`tags` scopes every chunk produced from this document so you can later filter searches to a specific institution, course, or study mode.
Tags type
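A minimal sketch of what the `Tags` type might look like, with field names inferred from the scoping dimensions mentioned above (institution, course, study mode) — the actual field names are assumptions, as is the `applyTags` helper:

```typescript
// Hypothetical Tags shape; field names inferred from the description above.
interface Tags {
  institution?: string;
  course?: string;
  studyMode?: string;
}

interface Chunk {
  text: string;
  metadata: Tags & { page_number?: number };
}

// Tags are copied into every chunk's metadata so later searches
// can filter on them.
function applyTags(chunk: Chunk, tags: Tags): Chunk {
  return { ...chunk, metadata: { ...chunk.metadata, ...tags } };
}
```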
IngestionResult type
`visualChunks` counts the elements that received a `visual_description` in their metadata — a useful signal for verifying that image extraction and vision analysis ran correctly.
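A sketch of how `visualChunks` could be derived; only the `visualChunks` semantics come from the text, and the other `IngestionResult` fields and the `summarize` helper are illustrative assumptions:

```typescript
interface EnrichedElement {
  text: string;
  metadata: { visual_description?: string };
}

// Hypothetical result shape; visualChunks counts elements whose
// metadata gained a visual_description during vision analysis.
interface IngestionResult {
  totalChunks: number;
  visualChunks: number;
}

function summarize(elements: EnrichedElement[]): IngestionResult {
  return {
    totalChunks: elements.length,
    visualChunks: elements.filter(
      (e) => e.metadata.visual_description !== undefined,
    ).length,
  };
}
```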
Pipeline stages
partitionDocument — text extraction
Calls Unstructured.io to split the document into structured elements. The call runs in parallel with image extraction to reduce overall ingestion time.

Key parameters passed to Unstructured.io:
| Parameter | Value | Effect |
|---|---|---|
| `strategy` | `HiRes` | Highest-fidelity layout analysis |
| `chunkingStrategy` | `by_title` | Splits on heading boundaries |
| `maxCharacters` | `1500` | Hard cap per chunk (`MAXCHAR`) |
| `extractImageBlockTypes` | `["Image", "Table", "Figure", "Graphic"]` | Captures visual elements inline |
| `pdfInferTableStructure` | `true` | Reconstructs table HTML |
| `splitPdfConcurrencyLevel` | `15` | Parallel PDF page processing |
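The table above can be collected into a single options object. This is a sketch, assuming the camelCase names map directly onto the client call; the `"hi_res"` wire value for the `HiRes` strategy is an assumption about how the SDK serializes it:

```typescript
const MAXCHAR = 1500; // hard cap per chunk

// Sketch of the parameters passed to Unstructured.io (values from the
// table above); the string form of `strategy` is assumed.
const partitionParameters = {
  strategy: "hi_res",           // highest-fidelity layout analysis
  chunkingStrategy: "by_title", // split on heading boundaries
  maxCharacters: MAXCHAR,
  extractImageBlockTypes: ["Image", "Table", "Figure", "Graphic"],
  pdfInferTableStructure: true, // reconstruct table HTML
  splitPdfConcurrencyLevel: 15, // parallel PDF page processing
};
```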
getLocalImages — image extraction
Runs in parallel with `partitionDocument`. Spawns a pdfplumber Python worker via `vision-bridge.ts` that scans every page and returns a map of base64-encoded images. The return type is a dictionary where each key is a page number and each value is an ordered list of base64 image strings found on that page. The Python process runs inside the project's venv, so it is isolated from the Node.js runtime.
visionMaker — sync layer
Merges the text elements from Unstructured.io and the images from pdfplumber at the metadata level. The merge strategy:
- For each `CompositeElement` (a chunked text block), if `localImages` has images for that page, the first available image is attached to the element's `metadata.image_base64` field.
- Any images that remain after the merge (pages where Unstructured.io produced no text block, or pages with more images than text blocks) are appended as new `Image` elements with a synthetic `element_id`.
- All elements are sorted by `page_number` before being returned.
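The merge strategy above can be sketched as follows. The element shape is simplified, and the synthetic `element_id` naming scheme is an assumption:

```typescript
type LocalImages = Record<number, string[]>; // page number → base64 images

interface Element {
  type: string;
  element_id: string;
  text: string;
  metadata: { page_number: number; image_base64?: string };
}

// Sketch of the visionMaker merge: attach first image per page to a
// CompositeElement, emit leftovers as synthetic Image elements, sort.
function mergeImages(elements: Element[], localImages: LocalImages): Element[] {
  // Work on a copy so the caller's map is not mutated.
  const remaining: LocalImages = Object.fromEntries(
    Object.entries(localImages).map(([p, imgs]) => [p, [...imgs]] as [string, string[]]),
  );

  const merged = elements.map((el) => {
    const page = el.metadata.page_number;
    const imgs = remaining[page];
    if (el.type === "CompositeElement" && imgs && imgs.length > 0) {
      // First available image for this page goes into metadata.image_base64.
      return { ...el, metadata: { ...el.metadata, image_base64: imgs.shift() } };
    }
    return el;
  });

  // Leftover images become new Image elements with a synthetic id (naming assumed).
  for (const [page, imgs] of Object.entries(remaining)) {
    imgs.forEach((img, i) => {
      merged.push({
        type: "Image",
        element_id: `local-image-${page}-${i}`,
        text: "",
        metadata: { page_number: Number(page), image_base64: img },
      });
    });
  }

  return merged.sort((a, b) => a.metadata.page_number - b.metadata.page_number);
}
```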
describeVisualElements — vision analysis
Iterates over every element and, for those with an attached image or HTML table, calls a vision LLM to produce a rich text description. Concurrency is limited to 3 parallel calls (`pLimit(3)`) to respect API rate limits. Two prompt types are used, selected by element type:
- Diagram prompt (Image / Figure)
- Table prompt

For HTML tables (those with a `text_as_html` field), Quark converts the HTML to Markdown using `marked` instead of calling the vision LLM, since the structured data is already machine-readable.

After analysis, the description is appended to the element's `text` field as `[Visual Analysis]: <description>` and stored in `metadata.visual_description` (truncated to 500 characters). The raw base64 is cleared from the stored payload.
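The post-analysis bookkeeping — append the description to `text`, store a truncated copy in `metadata.visual_description`, drop the raw base64 — can be sketched like this; the vision LLM call itself is elided (its result is passed in as `description`):

```typescript
interface VisualElement {
  text: string;
  metadata: { image_base64?: string; visual_description?: string };
}

// Sketch of the enrichment step described above.
function attachDescription(el: VisualElement, description: string): VisualElement {
  const metadata = { ...el.metadata };
  delete metadata.image_base64; // raw base64 is cleared from the stored payload
  metadata.visual_description = description.slice(0, 500); // truncated to 500 chars
  return {
    ...el,
    text: `${el.text}\n[Visual Analysis]: ${description}`,
    metadata,
  };
}
```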
processMetadata — embedding and storage
Generates embeddings for all enriched elements and upserts them into Qdrant in batches. Batching behaviour:
- Elements are processed in batches of 12 (`BATCHSIZE = 12`).
- Between batches, the pipeline sleeps for 21 seconds to respect VoyageAI's rate limits.
- Each batch is embedded with `EmbedRequestInputType.Document` and upserted to Qdrant in a single `wait: true` call.
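The batching behaviour can be sketched as follows; `embedAndUpsert` is a stand-in for the VoyageAI embed call plus the Qdrant `wait: true` upsert, and the no-sleep-after-the-last-batch detail is an assumption:

```typescript
const BATCHSIZE = 12;
const SLEEP_MS = 21_000; // 21 s pause between batches for VoyageAI rate limits

// Split elements into batches of BATCHSIZE.
function toBatches<T>(items: T[], size: number = BATCHSIZE): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Hypothetical driver loop: embed + upsert each batch, pausing between batches.
async function processInBatches<T>(
  items: T[],
  embedAndUpsert: (batch: T[]) => Promise<void>,
): Promise<void> {
  const batches = toBatches(items);
  for (let i = 0; i < batches.length; i++) {
    await embedAndUpsert(batches[i]);
    if (i < batches.length - 1) await sleep(SLEEP_MS);
  }
}
```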
Constants reference
| Constant | Value | Effect |
|---|---|---|
| `MAXCHAR` | 1500 | Maximum characters per text chunk |
| `BATCHSIZE` | 12 | Chunks embedded and upserted per batch |
| `limit` | `pLimit(3)` | Max concurrent vision LLM calls |
| `TTL_SECONDS` | 1800 | Session TTL in Redis (see memory system) |
The pipeline runs `partitionDocument` and `getLocalImages` in parallel using `Promise.all`. On large PDFs this can cut ingestion time significantly compared to running them sequentially.
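A minimal sketch of the fan-out; the two stage functions are stand-in stubs so the example is self-contained, and their return types are assumptions consistent with the stage descriptions above:

```typescript
// Stand-in stage functions (real ones call Unstructured.io and pdfplumber).
async function partitionDocument(path: string): Promise<string[]> {
  return [`elements of ${path}`];
}
async function getLocalImages(path: string): Promise<Record<number, string[]>> {
  return { 1: ["<base64>"] };
}

// Both stages start immediately; total wall time is max(), not sum().
async function ingest(filePath: string) {
  const [elements, localImages] = await Promise.all([
    partitionDocument(filePath), // text extraction
    getLocalImages(filePath),    // image extraction
  ]);
  return { elements, localImages };
}
```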