Retrieval in Quark is a two-stage process. The first stage casts a wide net using approximate nearest-neighbour search. The second stage re-scores and re-orders those candidates using a cross-encoder re-ranker, which understands the relationship between the query and each chunk rather than comparing vector representations independently.
export const getRelevantContext = async (
  filters: Tags,
  collectionName: string,
  query: string,
  queryVector: number[],
  limit: number,
)
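The two stages can be sketched end to end. This is a hedged sketch, not the actual implementation: the search and rerank parameters below are hypothetical stand-ins for the Qdrant search and VoyageAI re-rank calls shown in the following sections.

```typescript
// Illustrative two-stage retrieval flow. `search` and `rerank` are
// injected stand-ins for the real Qdrant and VoyageAI calls.
type Candidate = { text: string; score: number };

async function twoStageRetrieve(
  query: string,
  search: (q: string, limit: number) => Promise<Candidate[]>,
  rerank: (
    q: string,
    docs: string[],
    topK: number,
  ) => Promise<{ index: number; relevanceScore: number }[]>,
): Promise<Candidate[]> {
  // Stage 1: cast a wide net with approximate nearest-neighbour search
  const candidates = await search(query, 15); // VECTOR_LIMIT
  // Stage 2: cross-encoder scores each (query, chunk) pair in context
  const ranked = await rerank(query, candidates.map((c) => c.text), 5);
  // Reconstruct the original candidates in re-ranked order,
  // replacing the cosine score with the relevance score
  return ranked.map((r) => ({ ...candidates[r.index], score: r.relevanceScore }));
}
```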

Query embedding

Before searching, the user’s question is embedded using VoyageAI with the Query input type. This is distinct from the Document input type used at ingestion time — VoyageAI optimizes the representation differently depending on whether the text is a query or a passage.
const queryVector = (await generateEmbedding(
  retrival.message,
  EmbedRequestInputType.Query,
)) as number[];

Collection configuration

The Qdrant collection is created with 1024-dimensional cosine similarity vectors:
await vector().createCollection(collectionName, {
  vectors: {
    size: 1024,
    distance: "Cosine",
  },
  optimizers_config: {
    default_segment_number: 2,
  },
});

Candidate retrieval

Qdrant returns up to VECTOR_LIMIT (15) candidates with their full payloads:
const results = await vector().search(collectionName, {
  vector: queryVector,
  limit: limit,       // VECTOR_LIMIT = 15
  with_payload: true,
});
Each result is mapped to a normalized structure before re-ranking:
{
  text: string,        // chunk text (including any visual analysis)
  score: number,       // raw cosine similarity score
  page: number,        // source page number
  isVisual: boolean,   // true for Image and Table chunks
  imageUrl: string | null,
}
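The mapping step can be sketched as a small pure function. This is illustrative rather than the codebase's actual helper; the input field names (score, payload.text, payload.page_number, and so on) follow the Qdrant payload schema described later in this page.

```typescript
// Illustrative mapping from a raw Qdrant hit to the normalized
// structure above. Field names follow the vector payload schema.
type QdrantHit = {
  score: number;
  payload: {
    text: string;
    page_number: number;
    isVisual: boolean;
    imageUrl: string | null;
  };
};

const normalizeResult = (hit: QdrantHit) => ({
  text: hit.payload.text,          // chunk text, incl. visual analysis
  score: hit.score,                // raw cosine similarity score
  page: hit.payload.page_number,   // source page number
  isVisual: hit.payload.isVisual,  // true for Image and Table chunks
  imageUrl: hit.payload.imageUrl,
});
```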

Stage 2 — re-ranking

Re-ranking significantly improves precision compared to returning raw cosine similarity results. The cross-encoder model evaluates each (query, chunk) pair in context, whereas the bi-encoder similarity search compares independent vector representations. The result is a more accurate relevance ordering, especially for nuanced academic questions.
The candidate texts are passed to the VoyageAI re-ranker:
export const reRank = async (query: string, initalResult: string[]) => {
  const rerank = await embedding().rerank({
    query: query,
    documents: initalResult,
    model: "rerank-2",
    topK: 5,
  });
  return rerank;
}
The re-ranker returns up to 5 results, each with a relevanceScore. The original document objects are reconstructed from the re-ranked indices so that page, isVisual, and imageUrl are preserved:
const optimizedRes = reRankedRes.data.map((item: RerankResponseDataItem) => {
  const orgDoc = doc[item.index as number];
  return {
    ...orgDoc,
    score: item.relevanceScore,
  };
});

Similarity threshold

After re-ranking, Quark checks whether the top result meets the minimum quality bar before calling the LLM:
if (
  topCandidates.length === 0 ||
  (topCandidates[0].score ?? 0) < SIMILARITY_THRESHOLD
) {
  return {
    answer: "I could not find any relevant notes for your question.",
    sources: [],
  };
}
SIMILARITY_THRESHOLD is set to 0.2. If no candidate meets this threshold, Quark returns a “no relevant notes” response rather than passing low-quality context to the LLM. This prevents hallucinated answers when the document set does not contain relevant information.

Vector payload schema

Every point stored in Qdrant carries a structured payload that enables both retrieval and display:
| Field | Type | Description |
| --- | --- | --- |
| text | string | Chunk text, including any [Visual Analysis] annotation |
| page_number | number | Source page in the original document |
| isVisual | boolean | true for Image and Table element types |
| imageUrl | string \| null | Reserved for future S3 image URL storage |
| institution | string | From ingestion Tags |
| mode | string | From ingestion Tags |
| courseName | string | From ingestion Tags |
| chunkIndex | number | Position of the chunk within the document |
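The schema can be written out as a TypeScript type. This is a sketch transcribed from the table above; the actual codebase may declare or name the type differently.

```typescript
// Illustrative type for the payload stored with each Qdrant point.
interface ChunkPayload {
  text: string;             // chunk text, incl. [Visual Analysis] annotation
  page_number: number;      // source page in the original document
  isVisual: boolean;        // true for Image and Table element types
  imageUrl: string | null;  // reserved for future S3 image URL storage
  institution: string;      // from ingestion Tags
  mode: string;             // from ingestion Tags
  courseName: string;       // from ingestion Tags
  chunkIndex: number;       // position of the chunk within the document
}
```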

Filtering searches

The filters parameter (a Tags object with institution and mode) is passed to getRelevantContext and can be used to scope a search to a subset of the collection — for example, to retrieve only chunks from a specific institution or study mode. This allows a single Qdrant collection to serve multiple tenants or course contexts without cross-contamination.
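A scoped search might build a Qdrant filter from the Tags object like this. The buildFilter helper is hypothetical, but the must / key / match shape is standard Qdrant filter syntax for exact-match payload conditions.

```typescript
// Hedged sketch: turning the ingestion Tags into a Qdrant filter.
// `buildFilter` is illustrative; must/key/match is Qdrant's own
// filter syntax for exact payload matches.
type Tags = { institution: string; mode: string };

const buildFilter = (filters: Tags) => ({
  must: [
    { key: "institution", match: { value: filters.institution } },
    { key: "mode", match: { value: filters.mode } },
  ],
});

// The filter would then be passed alongside the query vector, e.g.:
// await vector().search(collectionName, {
//   vector: queryVector,
//   limit,
//   with_payload: true,
//   filter: buildFilter(filters),
// });
```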

Constants reference

| Constant | Value | Effect |
| --- | --- | --- |
| VECTOR_LIMIT | 15 | Candidates fetched from Qdrant before re-ranking |
| SIMILARITY_THRESHOLD | 0.2 | Minimum re-ranked score required to generate a response |
| mem0Limit | 5 | Max long-term memory results fetched per query |
