Semantic search converts text into high-dimensional vectors using transformer embedding models, then finds documents with similar vector representations. This approach understands synonyms, paraphrases, and conceptual similarity—queries like “car” will match documents about “automobile” even without exact keyword overlap.

How it works

Semantic search uses dense vector embeddings to find documents with similar meaning to the query, enabling conceptual matching beyond keyword overlap.

Search process

  1. Query embedding - Convert query text to dense vector using embedding model
  2. Vector search - Find nearest neighbors in vector database index
  3. Result retrieval - Return top-k most similar documents
  4. Optional RAG - Generate answer using retrieved documents as context
Traditional keyword search fails when:
  • Query uses synonyms (“automobile” vs “car”)
  • Documents use different terminology for same concepts
  • User doesn’t know exact terms used in documents
Semantic search handles these by encoding meaning, not just keywords.
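
The mechanics can be shown in a few lines. The sketch below is illustrative only (not this pipeline's implementation) and assumes the sentence-transformers package is installed: one model embeds a tiny in-memory corpus and a query, cosine similarity ranks the corpus, and a query about a "car" surfaces the "automobile" document despite zero keyword overlap.

from sentence_transformers import SentenceTransformer, util

# Same model for documents and query (see "Embedding consistency" below)
model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "The automobile industry is shifting toward electric vehicles.",
    "Photosynthesis converts sunlight into chemical energy.",
]
doc_embeddings = model.encode(documents, convert_to_tensor=True)

# 1. Embed the query  2. Nearest-neighbor search  3. Return the best match
query_embedding = model.encode("car emissions", convert_to_tensor=True)
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
print(documents[int(scores.argmax())])  # the "automobile" sentence wins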

Key features

  • Supports any SentenceTransformers model
  • Optional semantic diversification removes near-duplicate results (see the sketch after this list)
  • Integrates with Groq or OpenAI for RAG answer generation
  • Metadata filters narrow results by category, date, source, or custom fields
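
One common way such post-retrieval diversification can work is to drop any result whose embedding is too similar to an already accepted one. The sketch below is a generic illustration of that idea, not this pipeline's actual implementation; it assumes unit-normalized numpy vectors ordered by relevance.

import numpy as np

def diversify(embeddings: list[np.ndarray], threshold: float = 0.95) -> list[int]:
    """Return indices of results to keep, skipping near-duplicates.

    Generic sketch: embeddings are unit-normalized vectors ordered by
    relevance; a result is kept only if its cosine similarity to every
    already-kept result stays below the threshold.
    """
    kept: list[int] = []
    for i, emb in enumerate(embeddings):
        if all(float(np.dot(emb, embeddings[j])) < threshold for j in kept):
            kept.append(i)
    return kept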

Implementation

from vectordb.haystack.semantic_search import PineconeSemanticSearchPipeline

# Load index, embedder, and optional RAG settings from a YAML config
pipeline = PineconeSemanticSearchPipeline(
    "src/vectordb/haystack/semantic_search/configs/pinecone/arc.yaml"
)

# Embed the query and retrieve the 5 most similar documents
result = pipeline.search("What is photosynthesis?", top_k=5)

for doc in result["documents"]:
    print(doc.content)

Configuration

Required settings

  • pinecone.api_key (string, required) - Pinecone API authentication key
  • pinecone.index_name (string, required) - Target index name for search

Optional settings

  • pinecone.namespace (string) - Namespace for logically partitioning documents within the index
  • embedder.model_name (string, default: "all-MiniLM-L6-v2") - Embedding model; must match the model used at indexing time
  • rag (object) - Optional LLM configuration for answer generation

Example configuration

pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "semantic-search"
  namespace: "production"
  metric: "cosine"

embeddings:
  model: "sentence-transformers/all-MiniLM-L6-v2"

rag:
  enabled: true
  generator_model: "gpt-4o-mini"

Search parameters

  • query (string, required) - Search query text to embed and match against documents
  • top_k (integer, default: 10) - Number of results to return
  • filters (dict) - Metadata filters for pre-filtering (e.g., {"category": "tech"})
  • namespace (string) - Isolated document collection within the index
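
Reusing the pipeline from the Implementation section, a call exercising these parameters might look like the sketch below; note that passing namespace as a keyword argument to search() is an assumption based on the parameter list above, not something shown in the other examples.

# Illustrative call; the namespace keyword argument is assumed, not confirmed
results = pipeline.search(
    query="solid-state batteries",
    top_k=3,
    filters={"category": "tech"},
    namespace="production",
)
for doc in results["documents"]:
    print(doc.content)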

Embedding consistency

Critical: queries and documents must be embedded with the SAME model. Mixing models (e.g., indexing with MiniLM, querying with OpenAI embeddings) produces nonsensical results because the vectors live in different embedding spaces.
Always ensure:
  • Query embedder matches document embedder
  • Same model version is used
  • Consistent preprocessing (normalization, truncation)
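
The failure mode is easy to demonstrate: different models usually emit vectors of different dimensionality, and even when the dimensions happen to match, the geometries differ, so cross-model similarities are meaningless. A minimal illustration, assuming sentence-transformers is installed (the second model name is just an example):

from sentence_transformers import SentenceTransformer

minilm = SentenceTransformer("all-MiniLM-L6-v2")   # 384-dimensional vectors
mpnet = SentenceTransformer("all-mpnet-base-v2")   # 768-dimensional vectors

doc_vec = minilm.encode("Photosynthesis converts sunlight into chemical energy.")
query_vec = mpnet.encode("How do plants make energy?")

# The vectors are not comparable across models; here they even differ in size
print(doc_vec.shape, query_vec.shape)  # (384,) (768,)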

RAG integration

If an LLM is configured, the pipeline can generate answers using the retrieved documents as context, combining retrieval accuracy with generation fluency for question-answering applications. The example below also passes a metadata filter and prints the generated answer when one is returned.

# Retrieve with a metadata pre-filter; the RAG answer is included when enabled
results = pipeline.search(
    query="renewable energy technologies",
    top_k=5,
    filters={"category": "science"},
)

print(f"Query: {results['query']}")
for doc in results["documents"]:
    print(f"Score: {doc.score:.3f} - {doc.content[:80]}...")
if "answer" in results:
    print(f"RAG Answer: {results['answer']}")

Database support

Semantic search is available across all supported vector databases:
  • Pinecone - Managed vector database with namespaces
  • Weaviate - Open-source vector search with collections
  • Qdrant - High-performance search with payload filtering
  • Milvus - Scalable vector database with partition-key isolation
  • Chroma - Lightweight vector store for local development

Related

  • Hybrid search - Combine dense and sparse retrieval with fusion
  • Reranking - Cross-encoder second-stage scoring
  • Diversity filtering - Post-retrieval redundancy reduction
  • MMR - Maximal marginal relevance for diversity
