The problem HyDE solves
Standard RAG embeds the raw query and looks for the nearest document vectors. This works when the query and the document are phrased similarly. It fails when they’re not. Consider a creator asking: “what content works best for Tuesday posts?” The relevant indexed documents might describe post performance data, engagement metrics by day, or content scheduling analysis — none of which use the phrase “works best for Tuesday posts” directly. The semantic distance between the query embedding and the document embeddings is large enough that straightforward vector search returns poor results.
This is a widespread RAG failure mode. Queries are short and conversational. Documents are dense and informational. They embed differently even when they’re about the same thing.
How HyDE works
Instead of embedding the raw query, HyDE generates a hypothetical ideal answer and embeds that:

Receive the creator query
The agent receives the raw query:
"what content works best for Tuesday posts?"Generate a hypothetical ideal document
The agent generates a short hypothetical answer that looks like a real retrieved document — something like:
“Tuesday posts perform best with mid-length video content (30–60s). Fitness and lifestyle topics see 2.1x average engagement versus Sunday. Peak engagement window is 6–9 PM local time. Carousel posts outperform single images by 34% on Tuesdays based on creator analytics.”

This hypothetical document doesn’t need to be factually accurate — it just needs to read the way a real answer would.
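The generation step amounts to prompting the model for a passage in the target register. A minimal sketch, assuming a simple prompt template (the function name and wording here are illustrative, not the project’s actual prompt):

```python
def build_hyde_prompt(query: str) -> str:
    """Ask the LLM for a passage phrased like an indexed document.

    Hypothetical template: the real pipeline may word this differently.
    """
    return (
        "Write a short passage that plausibly answers the question below, "
        "phrased like an analytics document. Factual accuracy is not required; "
        "only the register and vocabulary of a real answer matter.\n\n"
        f"Question: {query}\n"
        "Passage:"
    )
```

The local model’s completion to this prompt becomes the hypothetical document that is embedded in the next step.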
Embed the hypothetical document
The embedding model encodes the hypothetical document rather than the original query. The resulting vector sits much closer in embedding space to actual indexed documents about content performance and scheduling patterns.
Retrieve against the real index
The hypothetical document embedding is used to query the vector store. Because the hypothetical and the real documents share structural and semantic similarity, retrieval precision improves substantially.
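The closeness claim can be illustrated with a toy bag-of-words “embedding” standing in for the real dense encoder: because the hypothetical answer shares far more vocabulary with the indexed document than the raw query does, its vector lands closer. A sketch (the texts and the `embed`/`cosine` helpers are illustrative only):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a real dense model is used in practice.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc = "tuesday content performance fitness engagement peak video carousel scheduling"
query = "what content works best for tuesday posts"
hypothetical = ("tuesday posts perform best with video content fitness "
                "engagement peak carousel scheduling")

# The hypothetical answer sits closer to the indexed document than the raw query:
# cosine(embed(hypothetical), embed(doc)) > cosine(embed(query), embed(doc))
```

Even in this crude model the gap is large; with a real dense encoder the effect is driven by semantics rather than literal word overlap, but the direction is the same.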
Pass to RRF fusion
The dense retrieval hits from HyDE are passed to the RRF fusion stage where they are merged with BM25 sparse results.
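RRF itself is simple: each ranked list contributes 1/(k + rank) per document, and documents are re-sorted by the summed score. A sketch of the fusion step (k = 60 is the constant commonly used in the literature; whether this pipeline uses that value is an assumption):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc_a", "doc_b", "doc_c"]   # from the HyDE embedding
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # from BM25
fused = rrf_fuse([dense_hits, sparse_hits])
# Documents ranked highly in both lists (doc_b, doc_a) rise to the top.
```

Because RRF uses only ranks, it needs no score normalization between the dense and sparse retrievers, which is why it is a common choice for hybrid retrieval.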
Why this matters for creator-specific questions
Creator queries are often highly domain-specific. They use platform slang, reference specific content formats, or ask about patterns that only exist in the creator economy context. The indexed documents — analytics reports, platform policy summaries, content performance data — are written in a different register. HyDE neutralizes this by translating the query into the same register as the answer before embedding. The agent acts as its own query rewriter.

Example walkthrough
| Stage | Content |
|---|---|
| Original query | "what content works best for Tuesday posts?" |
| Hypothetical document | "Tuesday content performance: fitness (2.1x), lifestyle (1.8x), peak 6-9 PM. Carousels +34% vs single image. 30-60s video optimal. Avoid text-heavy posts." |
| Embedding target | The hypothetical document (not the original query) |
| Retrieved documents | Actual analytics summaries, scheduling guides, and content performance records that structurally match the hypothetical |
Implementation location
HyDE is implemented as part of the retrieval pipeline in memory/retrieval/. The hypothetical document generation uses the local Ollama LLM (the same model as the active agent), so no external API calls are required.
HyDE runs entirely on-device via Ollama. No query text or hypothetical document content leaves the server. This preserves the sovereign AI principle: all inference is local, no data reaches external APIs.