Retrieval in Quark is a two-stage process. The first stage casts a wide net using approximate nearest-neighbour search. The second stage re-scores and re-orders those candidates using a cross-encoder re-ranker, which understands the relationship between the query and each chunk rather than comparing vector representations independently.
## Stage 1 — vector similarity search
```typescript
export const getRelevantContext = async (
  filters: Tags,
  collectionName: string,
  query: string,
  queryVector: number[],
  limit: number,
) => { /* ... */ };
```
### Query embedding
Before searching, the user’s question is embedded using VoyageAI with the Query input type. This is distinct from the Document input type used at ingestion time — VoyageAI optimizes the representation differently depending on whether the text is a query or a passage.
```typescript
const queryVector = (await generateEmbedding(
  retrival.message,
  EmbedRequestInputType.Query,
)) as number[];
```
### Collection configuration
The Qdrant collection is created with 1024-dimensional cosine similarity vectors:
```typescript
await vector().createCollection(collectionName, {
  vectors: {
    size: 1024,
    distance: "Cosine",
  },
  optimizers_config: {
    default_segment_number: 2,
  },
});
```
### Candidate retrieval
Qdrant returns up to VECTOR_LIMIT (15) candidates with their full payloads:
```typescript
const results = await vector().search(collectionName, {
  vector: queryVector,
  limit: limit, // VECTOR_LIMIT = 15
  with_payload: true,
});
```
Each result is mapped to a normalized structure before re-ranking:
```typescript
{
  text: string,      // chunk text (including any visual analysis)
  score: number,     // raw cosine similarity score
  page: number,      // source page number
  isVisual: boolean, // true for Image and Table chunks
  imageUrl: string | null,
}
```
## Stage 2 — re-ranking
Re-ranking significantly improves precision compared to returning raw cosine similarity results. The cross-encoder model evaluates each (query, chunk) pair in context, whereas the bi-encoder similarity search compares independent vector representations. The result is a more accurate relevance ordering, especially for nuanced academic questions.
The candidate texts are passed to the VoyageAI re-ranker:
```typescript
export const reRank = async (query: string, initalResult: string[]) => {
  const rerank = await embedding().rerank({
    query: query,
    documents: initalResult,
    model: "rerank-2",
    topK: 5,
  });
  return rerank;
};
```
The re-ranker returns up to 5 results, each with a relevanceScore. The original document objects are reconstructed from the re-ranked indices so that page, isVisual, and imageUrl are preserved:
```typescript
const optimizedRes = reRankedRes.data.map((item: RerankResponseDataItem) => {
  const orgDoc = doc[item.index as number];
  return {
    ...orgDoc,
    score: item.relevanceScore,
  };
});
```
### Similarity threshold
After re-ranking, Quark checks whether the top result meets the minimum quality bar before calling the LLM:
```typescript
if (
  topCandidates.length === 0 ||
  (topCandidates[0].score ?? 0) < SIMILARITY_THRESHOLD
) {
  return {
    answer: "I could not find any relevant notes for your question.",
    sources: [],
  };
}
```
SIMILARITY_THRESHOLD is set to 0.2. If no candidate meets this threshold, Quark returns a “no relevant notes” response rather than passing low-quality context to the LLM. This prevents hallucinated answers when the document set does not contain relevant information.
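The same guard can be factored into a small predicate. This is a sketch; `passesThreshold` is a hypothetical helper name, not part of the Quark source:

```typescript
interface Candidate {
  text: string;
  score?: number;
}

const SIMILARITY_THRESHOLD = 0.2;

// True when at least one candidate exists and the best re-ranked score
// clears the threshold; otherwise the LLM call is skipped entirely.
const passesThreshold = (topCandidates: Candidate[]): boolean =>
  topCandidates.length > 0 &&
  (topCandidates[0].score ?? 0) >= SIMILARITY_THRESHOLD;
```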
## Vector payload schema
Every point stored in Qdrant carries a structured payload that enables both retrieval and display:
| Field | Type | Description |
|---|---|---|
| text | string | Chunk text, including any [Visual Analysis] annotation |
| page_number | number | Source page in the original document |
| isVisual | boolean | true for Image and Table element types |
| imageUrl | string \| null | Reserved for future S3 image URL storage |
| institution | string | From ingestion Tags |
| mode | string | From ingestion Tags |
| courseName | string | From ingestion Tags |
| chunkIndex | number | Position of the chunk within the document |
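Expressed as a TypeScript type, the payload might look like the following. This is a sketch derived from the table above; the interface name and example values are illustrative:

```typescript
interface ChunkPayload {
  text: string;            // chunk text, including any [Visual Analysis] annotation
  page_number: number;     // source page in the original document
  isVisual: boolean;       // true for Image and Table element types
  imageUrl: string | null; // reserved for future S3 image URL storage
  institution: string;     // from ingestion Tags
  mode: string;            // from ingestion Tags
  courseName: string;      // from ingestion Tags
  chunkIndex: number;      // position of the chunk within the document
}

// Illustrative payload conforming to the schema
const example: ChunkPayload = {
  text: "Photosynthesis converts light energy into chemical energy.",
  page_number: 12,
  isVisual: false,
  imageUrl: null,
  institution: "demo-university",
  mode: "revision",
  courseName: "BIO-101",
  chunkIndex: 7,
};
```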
## Filtering searches
The filters parameter (a Tags object with institution and mode) is passed to getRelevantContext and can be used to scope a search to a subset of the collection — for example, to retrieve only chunks from a specific institution or study mode. This allows a single Qdrant collection to serve multiple tenants or course contexts without cross-contamination.
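A sketch of how the Tags could be translated into a payload filter. The `buildTagFilter` helper is hypothetical; the `must`/`match` structure shown is Qdrant's standard filter format, and the field names come from the payload schema above:

```typescript
interface Tags {
  institution: string;
  mode: string;
}

// Hypothetical helper: build a Qdrant filter restricting results to
// points whose payload matches the given institution and mode.
const buildTagFilter = (filters: Tags) => ({
  must: [
    { key: "institution", match: { value: filters.institution } },
    { key: "mode", match: { value: filters.mode } },
  ],
});

// It would then be passed alongside the query vector, e.g.:
// await vector().search(collectionName, {
//   vector: queryVector,
//   limit,
//   with_payload: true,
//   filter: buildTagFilter(filters),
// });
```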
## Constants reference
| Constant | Value | Effect |
|---|---|---|
| VECTOR_LIMIT | 15 | Candidates fetched from Qdrant before re-ranking |
| SIMILARITY_THRESHOLD | 0.2 | Minimum re-ranked score required to generate a response |
| mem0Limit | 5 | Max long-term memory results fetched per query |
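Declared together, these constants might look like the following (a sketch; only the names and values come from the table above):

```typescript
// Retrieval tuning constants
export const VECTOR_LIMIT = 15;          // candidates fetched from Qdrant before re-ranking
export const SIMILARITY_THRESHOLD = 0.2; // minimum re-ranked score required to generate a response
export const mem0Limit = 5;              // max long-term memory results fetched per query
```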