Sparse search uses SPLADE models to create sparse vectors that emphasize specific terms, similar to traditional BM25 but with learned term importance. This approach excels when exact terminology matters: legal documents, product SKUs, or technical specifications.

How it works

Sparse search creates term-focused embeddings where non-zero dimensions correspond to important keywords, enabling precise lexical matching.

Search process

  1. Sparse embedding - Convert query text to sparse vector using SPLADE or BM25
  2. Keyword matching - Match based on term importance and frequency
  3. Result ranking - Rank documents by sparse similarity scores
  4. Optional RAG - Generate answer using retrieved documents
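
A minimal sketch of this flow, using a toy term-frequency encoder as a stand-in for a real SPLADE model or BM25 weighting:
# Toy end-to-end sketch of the four steps above.
from collections import Counter

def encode_sparse(text: str) -> dict[str, float]:
    # 1. Sparse embedding: raw term counts stand in for learned weights
    return dict(Counter(text.lower().split()))

def search(query: str, index: dict[str, dict[str, float]], top_k: int = 10):
    q = encode_sparse(query)
    # 2.-3. Match on shared terms and rank by dot product
    scored = sorted(
        ((doc_id, sum(w * vec.get(t, 0.0) for t, w in q.items()))
         for doc_id, vec in index.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return scored[:top_k]  # 4. Pass the top documents to an LLM for RAG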

SPLADE vs BM25

SPLADE (Sparse Lexical and Expansion Model)
  • Neural sparse encoder with learned term importance
  • Expands query with related terms
  • Better generalization than traditional BM25
  • Requires model inference
BM25 (Best Matching 25)
  • Classic TF-IDF based ranking function
  • Fast, no model required
  • Purely statistical term weighting
  • Native support in Weaviate
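
For reference, here is a self-contained Okapi BM25 scorer over a tokenized toy corpus, using the standard k1 and b parameters. This is the purely statistical weighting that needs no model inference:
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    # corpus is a list of tokenized documents; doc_terms is one of them
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        df = sum(1 for d in corpus if term in d)  # document frequency
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        f = tf[term]  # term frequency in this document
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score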

Key features

  • Supports SPLADE-based or BM25-style sparse encoders
  • Weaviate uses native BM25 without external embeddings
  • Works alongside dense search or as standalone retrieval method
  • Excels at exact terminology and keyword precision

Implementation

from vectordb.langchain.sparse_indexing import PineconeSparseSearchPipeline

pipeline = PineconeSparseSearchPipeline("config.yaml")
results = pipeline.search(
    query="HIPAA compliance requirements",
    top_k=10,
    filters={"category": "legal"},
)

for doc in results["documents"]:
    print(doc.page_content[:100])

Configuration

Required settings

  • pinecone.api_key (string, required) - Pinecone API authentication key
  • pinecone.index_name (string, required) - Target index name for sparse search

Optional settings

  • pinecone.namespace (string) - Namespace within the index
  • pinecone.dimension (integer, default: 384) - Dimension for the placeholder dense vector (sparse search only)
  • sparse.model (string) - Sparse encoder model (e.g., SPLADE)

Example configuration

pinecone:
  api_key: "${PINECONE_API_KEY}"
  index_name: "sparse-search"
  namespace: "default"
  dimension: 384

sparse:
  enabled: true
  model: "splade-cocondenser-ensembledistil"

Search parameters

  • query (string, required) - Search query text to encode with the sparse embedder
  • top_k (integer, default: 10) - Number of results to return
  • filters (dict) - Optional metadata filters for pre-filtering

When to use sparse search

Sparse search is ideal when:

Exact terminology matters
  • Legal documents with specific clauses
  • Medical records with precise diagnoses
  • Technical documentation with exact API names
  • Product catalogs with SKUs or model numbers
Query contains specific identifiers
  • Document IDs or reference numbers
  • Proper nouns and acronyms
  • Version numbers or dates
  • Industry-specific jargon

Example scenarios

# Legal document search
results = pipeline.search(
    query="GDPR Article 17 right to erasure",
    top_k=5,
    filters={"jurisdiction": "EU"},
)
# Sparse search excels at matching exact article numbers

# Product SKU lookup
results = pipeline.search(
    query="laptop model XPS-9520",
    top_k=3,
)
# Exact match on model number "XPS-9520"

# Technical API search
results = pipeline.search(
    query="boto3 s3.upload_file method",
    top_k=5,
)
# Precise matching on "boto3" and "upload_file"

Sparse vector representation

Sparse embeddings are dictionaries mapping token indices to weights:
# Example sparse embedding
{
  "indices": [42, 156, 892, 1024, 2048],
  "values": [0.95, 0.78, 0.62, 0.43, 0.31]
}
Only non-zero dimensions are stored, making sparse vectors memory-efficient despite high dimensionality (often 30k+ dimensions).
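
Scoring two vectors in this format reduces to a dot product over the intersection of their index lists:
# Only dimensions present in both vectors contribute to the score.
def sparse_dot(a: dict, b: dict) -> float:
    b_weights = dict(zip(b["indices"], b["values"]))
    return sum(value * b_weights[index]
               for index, value in zip(a["indices"], a["values"])
               if index in b_weights)

query = {"indices": [42, 156], "values": [0.9, 0.5]}
doc = {"indices": [42, 156, 892, 1024, 2048], "values": [0.95, 0.78, 0.62, 0.43, 0.31]}
print(sparse_dot(query, doc))  # 0.9 * 0.95 + 0.5 * 0.78 = 1.245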

Database-specific implementations

Pinecone

Uses sparse_values field alongside placeholder dense vectors. Requires explicit sparse embedding generation.
documents = pipeline.db.query_with_sparse(
    vector=[0.0] * dimension,  # Placeholder dense
    sparse_vector=query_embedding,
    top_k=top_k,
    filter=filters,
    namespace=namespace,
)

Weaviate

Native BM25 support without external sparse embeddings. Uses built-in keyword search:
results = collection.query.bm25(
    query="search terms",
    limit=top_k,
    filters=filters,  # weaviate-client v4 takes a Filter object via `filters`, not `where`
)

Qdrant

Supports sparse vectors with payload-based filtering and optimized indexing.
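
A sketch with the qdrant-client Python package; the collection name and the named sparse vector "text" are assumptions:
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")
results = client.search(
    collection_name="docs",  # assumed collection name
    query_vector=models.NamedSparseVector(
        name="text",  # assumed sparse vector field
        vector=models.SparseVector(indices=[42, 156], values=[0.9, 0.5]),
    ),
    query_filter=models.Filter(  # payload-based pre-filtering
        must=[models.FieldCondition(key="category", match=models.MatchValue(value="legal"))]
    ),
    limit=10,
)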

Milvus

Sparse vector fields with partition-key isolation for multi-tenant scenarios.
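
With pymilvus (sparse vector fields require 2.4+), a search might look like this; the collection and field names are assumptions:
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
results = client.search(
    collection_name="docs",  # assumed collection
    data=[{42: 0.9, 156: 0.5}],  # sparse query vector as {index: weight}
    anns_field="sparse_vector",  # assumed sparse field name
    limit=10,
    search_params={"metric_type": "IP"},  # inner product for sparse vectors
)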

Chroma

Flexible sparse search with document and collection scoping.

For best results, use sparse search as part of a hybrid retrieval strategy: combine it with dense semantic search to get both conceptual understanding and keyword precision. See Hybrid search for fusion strategies.

Performance considerations

  • Sparse search is typically faster than dense search (fewer dimensions to compare)
  • BM25 requires no model inference, just term statistics
  • SPLADE models add inference latency but provide better generalization
  • Memory footprint is low due to sparse representation

Evaluation metrics

Sparse search performance metrics:
  • Precision@k - Fraction of retrieved docs that are relevant
  • Recall@k - Fraction of relevant docs that are retrieved
  • MRR - Mean reciprocal rank of first relevant result
  • NDCG@k - Normalized discounted cumulative gain
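
A small sketch computing these metrics from a ranked result list and a set of known-relevant document IDs (binary relevance):
import math

def metrics_at_k(ranked_ids, relevant_ids, k=10):
    hits = [doc_id in relevant_ids for doc_id in ranked_ids[:k]]
    precision = sum(hits) / k
    recall = sum(hits) / len(relevant_ids)
    # MRR: reciprocal rank of the first relevant result (0 if none in top k)
    mrr = next((1 / (i + 1) for i, hit in enumerate(hits) if hit), 0.0)
    # NDCG with binary gains
    dcg = sum(hit / math.log2(i + 2) for i, hit in enumerate(hits))
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant_ids), k)))
    ndcg = dcg / ideal if ideal else 0.0
    return {"precision@k": precision, "recall@k": recall, "mrr": mrr, "ndcg@k": ndcg}

print(metrics_at_k(["d3", "d1", "d9"], {"d1", "d2"}, k=3))
# {'precision@k': 0.333..., 'recall@k': 0.5, 'mrr': 0.5, 'ndcg@k': 0.386...}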

Related pages

  • Hybrid search - Combine dense and sparse retrieval
  • Semantic search - Dense vector similarity search
  • Reranking - Improve results with cross-encoders
  • Metadata filtering - Structured filtering on document fields
