Retrieval in Quark is a two-stage process. The first stage casts a wide net using approximate nearest-neighbour search. The second stage re-scores and re-orders those candidates using a cross-encoder re-ranker, which understands the relationship between the query and each chunk rather than comparing vector representations independently.
## Stage 1 — vector similarity search
```typescript
export const getRelevantContext = async (
  filters: Tags,
  collectionName: string,
  query: string,
  queryVector: number[],
  limit: number,
) => { /* ... */ };
```
### Query embedding
Before searching, the user’s question is embedded using VoyageAI with the Query input type. This is distinct from the Document input type used at ingestion time — VoyageAI optimizes the representation differently depending on whether the text is a query or a passage.
```typescript
const queryVector = (await generateEmbedding(
  retrival.message,
  EmbedRequestInputType.Query,
)) as number[];
```
### Collection configuration
The Qdrant collection is created with 1024-dimensional cosine similarity vectors:
```typescript
await vector().createCollection(collectionName, {
  vectors: {
    size: 1024,
    distance: "Cosine",
  },
  optimizers_config: {
    default_segment_number: 2,
  },
});
```
### Candidate retrieval
Qdrant returns up to VECTOR_LIMIT (15) candidates with their full payloads:
```typescript
const results = await vector().search(collectionName, {
  vector: queryVector,
  limit: limit, // VECTOR_LIMIT = 15
  with_payload: true,
});
```
Each result is mapped to a normalized structure before re-ranking:
```typescript
{
  text: string,      // chunk text (including any visual analysis)
  score: number,     // raw cosine similarity score
  page: number,      // source page number
  isVisual: boolean, // true for Image and Table chunks
  imageUrl: string | null,
}
```
## Stage 2 — re-ranking
Re-ranking significantly improves precision compared to returning raw cosine similarity results. The cross-encoder model evaluates each (query, chunk) pair in context, whereas the bi-encoder similarity search compares independent vector representations. The result is a more accurate relevance ordering, especially for nuanced academic questions.
The candidate texts are passed to the VoyageAI re-ranker:
```typescript
export const reRank = async (query: string, initalResult: string[]) => {
  const rerank = await embedding().rerank({
    query: query,
    documents: initalResult,
    model: "rerank-2",
    topK: 5,
  });
  return rerank;
};
```
The re-ranker returns up to 5 results, each with a relevanceScore. The original document objects are reconstructed from the re-ranked indices so that page, isVisual, and imageUrl are preserved:
```typescript
const optimizedRes = reRankedRes.data.map((item: RerankResponseDataItem) => {
  const orgDoc = doc[item.index as number];
  return {
    ...orgDoc,
    score: item.relevanceScore,
  };
});
```
### Similarity threshold
After re-ranking, Quark checks whether the top result meets the minimum quality bar before calling the LLM:
```typescript
if (
  topCandidates.length === 0 ||
  (topCandidates[0].score ?? 0) < SIMILARITY_THRESHOLD
) {
  return {
    answer: "I could not find any relevant notes for your question.",
    sources: [],
  };
}
```
SIMILARITY_THRESHOLD is set to 0.2. If no candidate meets this threshold, Quark returns a “no relevant notes” response rather than passing low-quality context to the LLM. This prevents hallucinated answers when the document set does not contain relevant information.
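The same guard can be factored into a small predicate. This is a sketch; `passesThreshold` is a hypothetical helper name, not part of the Quark source:

```typescript
interface Candidate {
  text: string;
  score?: number;
}

const SIMILARITY_THRESHOLD = 0.2;

// True when at least one candidate exists and the best re-ranked score
// clears the threshold; otherwise the LLM call is skipped entirely.
const passesThreshold = (topCandidates: Candidate[]): boolean =>
  topCandidates.length > 0 &&
  (topCandidates[0].score ?? 0) >= SIMILARITY_THRESHOLD;
```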
## Vector payload schema
Every point stored in Qdrant carries a structured payload that enables both retrieval and display:
| Field | Type | Description |
|---|---|---|
| text | string | Chunk text, including any [Visual Analysis] annotation |
| page_number | number | Source page in the original document |
| isVisual | boolean | true for Image and Table element types |
| imageUrl | string \| null | Reserved for future S3 image URL storage |
| institution | string | From ingestion Tags |
| mode | string | From ingestion Tags |
| courseName | string | From ingestion Tags |
| chunkIndex | number | Position of the chunk within the document |
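Expressed as a TypeScript type, the payload might look like the following. This is a sketch derived from the table above; the interface name and example values are illustrative:

```typescript
interface ChunkPayload {
  text: string;            // chunk text, including any [Visual Analysis] annotation
  page_number: number;     // source page in the original document
  isVisual: boolean;       // true for Image and Table element types
  imageUrl: string | null; // reserved for future S3 image URL storage
  institution: string;     // from ingestion Tags
  mode: string;            // from ingestion Tags
  courseName: string;      // from ingestion Tags
  chunkIndex: number;      // position of the chunk within the document
}

// Illustrative payload conforming to the schema
const example: ChunkPayload = {
  text: "Photosynthesis converts light energy into chemical energy.",
  page_number: 12,
  isVisual: false,
  imageUrl: null,
  institution: "demo-university",
  mode: "revision",
  courseName: "BIO-101",
  chunkIndex: 7,
};
```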
## Filtering searches
The filters parameter (a Tags object with institution and mode) is passed to getRelevantContext and can be used to scope a search to a subset of the collection — for example, to retrieve only chunks from a specific institution or study mode. This allows a single Qdrant collection to serve multiple tenants or course contexts without cross-contamination.
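A sketch of how the Tags could be translated into a payload filter. The `buildTagFilter` helper is hypothetical; the `must`/`match` structure shown is Qdrant's standard filter format, and the field names come from the payload schema above:

```typescript
interface Tags {
  institution: string;
  mode: string;
}

// Hypothetical helper: build a Qdrant filter restricting results to
// points whose payload matches the given institution and mode.
const buildTagFilter = (filters: Tags) => ({
  must: [
    { key: "institution", match: { value: filters.institution } },
    { key: "mode", match: { value: filters.mode } },
  ],
});

// It would then be passed alongside the query vector, e.g.:
// await vector().search(collectionName, {
//   vector: queryVector,
//   limit,
//   with_payload: true,
//   filter: buildTagFilter(filters),
// });
```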
## Constants reference
| Constant | Value | Effect |
|---|---|---|
| VECTOR_LIMIT | 15 | Candidates fetched from Qdrant before re-ranking |
| SIMILARITY_THRESHOLD | 0.2 | Minimum re-ranked score required to generate a response |
| mem0Limit | 5 | Max long-term memory results fetched per query |
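Declared together, these constants might look like the following (a sketch; only the names and values come from the table above):

```typescript
// Retrieval tuning constants
export const VECTOR_LIMIT = 15;          // candidates fetched from Qdrant before re-ranking
export const SIMILARITY_THRESHOLD = 0.2; // minimum re-ranked score required to generate a response
export const mem0Limit = 5;              // max long-term memory results fetched per query
```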