Parent document retrieval solves the chunk-size trade-off: small chunks match queries precisely but lack the surrounding context needed for good answers. This technique indexes small chunks for accurate retrieval, then returns their larger parent documents for generation.

The chunk size dilemma

When indexing documents, you face a trade-off:

Small chunks (128-256 tokens)

Pros:
  • Precise semantic matching
  • Low false positive rate
  • Better retrieval accuracy
Cons:
  • Missing surrounding context
  • Incomplete information
  • Poor for generation

Large chunks (512-1024 tokens)

Pros:
  • Rich context for generation
  • Complete information
  • Better for Q&A
Cons:
  • Noisy retrieval
  • Higher false positives
  • Worse precision
Parent document retrieval gives you the best of both worlds.

How it works

  1. Split documents into parent and child chunks
    • Parents: Large chunks (512-1024 tokens) with full context
    • Children: Small chunks (128-256 tokens) for precise matching
  2. Index only child chunks
    • Store child embeddings in vector database
    • Each child has metadata linking to its parent ID
  3. Store parent documents separately
    • ParentDocumentStore maintains parent text in memory
    • Maps child IDs to parent IDs and full parent text
  4. During search
    • Retrieve top-k child chunks from vector DB
    • Map child IDs to parent IDs
    • Return unique parent documents (deduplicated)
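
The four steps above can be sketched end to end in plain Python. Everything here is illustrative: a naive fixed-size character splitter stands in for the real token-aware chunkers, and none of these names are the library's API.

```python
import uuid

def split(text, size, overlap):
    # Toy fixed-size splitter; the real pipeline uses token-aware chunkers
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

document = "word " * 400  # stand-in for a loaded document

parents = {}          # parent_id -> parent text (step 3: stored separately)
child_to_parent = {}  # child_id  -> parent_id   (the linking metadata)
child_index = []      # what would be embedded into the vector DB (step 2)

# Step 1: split into parents, then split each parent into children
for parent_text in split(document, size=1000, overlap=100):
    parent_id = str(uuid.uuid4())
    parents[parent_id] = parent_text
    for child_text in split(parent_text, size=200, overlap=20):
        child_id = str(uuid.uuid4())
        child_to_parent[child_id] = parent_id
        child_index.append({"id": child_id, "text": child_text})

# Step 4: pretend the first five children are top-k hits, return unique parents
hit_ids = [c["id"] for c in child_index[:5]]
unique_parents = list(dict.fromkeys(child_to_parent[c] for c in hit_ids))
print(f"{len(hit_ids)} child hits -> {len(unique_parents)} unique parent(s)")
```

Note how several neighboring child hits collapse into a single parent, which is exactly why search over-fetches children (see below).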

Basic usage

from vectordb.langchain.parent_document_retrieval.indexing import (
    PineconeParentDocumentRetrievalIndexingPipeline
)

pipeline = PineconeParentDocumentRetrievalIndexingPipeline(
    "configs/pinecone_parent_doc.yaml"
)

# Load and index documents
pipeline.load_dataset()
pipeline.index_documents()

# ParentDocumentStore is automatically saved to disk

Configuration

pinecone:
  api_key: ${PINECONE_API_KEY}
  index_name: parent-child-index
  namespace: default
  dimension: 384

embedding:
  provider: sentence_transformers
  model: all-MiniLM-L6-v2

parent_document:
  parent_chunk_size: 1000  # Tokens per parent document
  parent_overlap: 100      # Overlap between parents
  child_chunk_size: 200    # Tokens per child chunk
  child_overlap: 20        # Overlap between children

parent_store:
  cache_dir: ./cache       # Where to save parent store
  store_path: ./cache/parent_store.pkl  # For loading in search

llm:
  provider: groq
  model: llama-3.3-70b-versatile
  api_key: ${GROQ_API_KEY}
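
The `${PINECONE_API_KEY}`-style placeholders imply environment-variable substitution when the config is loaded. The library's loader isn't shown here; a minimal stand-in could look like this (the `expand_env` helper is purely illustrative):

```python
import os
import re

def expand_env(value):
    # Replace ${VAR} placeholders with environment values, recursing into dicts
    if isinstance(value, dict):
        return {k: expand_env(v) for k, v in value.items()}
    if isinstance(value, str):
        return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)
    return value

os.environ["PINECONE_API_KEY"] = "pc-test-key"  # for demonstration only
config = {"pinecone": {"api_key": "${PINECONE_API_KEY}", "dimension": 384}}
print(expand_env(config))
```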

ParentDocumentStore

The parent store maintains chunk-to-parent mappings in memory:

from vectordb.langchain.parent_document_retrieval.parent_store import (
    ParentDocumentStore
)

# Initialize with persistence
store = ParentDocumentStore(cache_dir="./cache")

# Add a parent document
store.add_parent(
    parent_id="parent_1",
    parent_doc={
        "text": "Complete document text with full context...",
        "metadata": {"source": "article.txt", "author": "Jane"},
        "source_index": 0
    }
)

# Map child chunks to this parent
for i in range(5):
    store.add_chunk_mapping(
        chunk_id=f"chunk_{i}",
        parent_id="parent_1"
    )

# Retrieve parent from any child
parent = store.get_parent("chunk_2")
print(parent["text"])  # Full parent document

# Batch retrieval with deduplication
chunk_ids = ["chunk_1", "chunk_2", "chunk_3"]  # May share parents
parents = store.get_parents_for_chunks(chunk_ids)
print(f"Retrieved {len(parents)} unique parents")

# Save to disk for later use
store.save("parent_store.pkl")

# Load from disk during search
loaded_store = ParentDocumentStore.load("./cache/parent_store.pkl")

Indexing pipeline internals

Here’s how the LangChain indexing pipeline works:

from langchain.text_splitter import RecursiveCharacterTextSplitter
import uuid

class ParentDocumentIndexingPipeline:
    def __init__(self, config):
        # Initialize parent and child text splitters
        self.parent_splitter = RecursiveCharacterTextSplitter(
            chunk_size=config["parent_chunk_size"],
            chunk_overlap=config["parent_overlap"]
        )
        self.child_splitter = RecursiveCharacterTextSplitter(
            chunk_size=config["child_chunk_size"],
            chunk_overlap=config["child_overlap"]
        )
        
        # Initialize parent store
        cache_dir = config["parent_store"]["cache_dir"]
        self.parent_store = ParentDocumentStore(cache_dir=cache_dir)
    
    def index_documents(self, documents):
        all_child_chunks = []
        
        for doc_idx, document in enumerate(documents):
            # Split into parent chunks
            parent_chunks = self.parent_splitter.split_text(document.content)
            
            for parent_chunk in parent_chunks:
                # Generate unique parent ID
                parent_id = str(uuid.uuid4())
                
                # Store parent in ParentDocumentStore
                self.parent_store.add_parent(
                    parent_id=parent_id,
                    parent_doc={
                        "text": parent_chunk,
                        "metadata": document.meta,
                        "source_index": doc_idx
                    }
                )
                
                # Split parent into child chunks
                child_chunks = self.child_splitter.split_text(parent_chunk)
                
                for child_chunk in child_chunks:
                    # Generate unique child ID
                    child_id = str(uuid.uuid4())
                    
                    # Map child to parent
                    self.parent_store.add_chunk_mapping(child_id, parent_id)
                    
                    # Prepare child for indexing
                    all_child_chunks.append({
                        "id": child_id,
                        "content": child_chunk,
                        "metadata": {"parent_id": parent_id}
                    })
        
        # Embed and index only child chunks (embedders expect raw text)
        child_texts = [chunk["content"] for chunk in all_child_chunks]
        embeddings = self.embedder.embed_documents(child_texts)
        self.vector_db.index(all_child_chunks, embeddings)
        
        # Save parent store to disk
        self.parent_store.save("parent_store.pkl")

Search pipeline internals

How search retrieves children but returns parents:

class ParentDocumentSearchPipeline:
    def __init__(self, config):
        # Load parent store from disk
        store_path = config["parent_store"]["store_path"]
        self.parent_store = ParentDocumentStore.load(store_path)
    
    def search(self, query, top_k=10):
        # Embed query
        query_embedding = self.embedder.embed_query(query)
        
        # Search for child chunks (retrieve 2x for deduplication)
        child_documents = self.vector_db.query(
            query_embedding=query_embedding,
            top_k=top_k * 2
        )
        
        # Extract child IDs from results
        chunk_ids = [
            doc.id if hasattr(doc, "id") else doc.metadata["id"]
            for doc in child_documents
        ]
        
        # Map child IDs to parent documents (deduplicated)
        parent_documents = self.parent_store.get_parents_for_chunks(
            chunk_ids
        )
        
        # Limit to requested top_k
        parent_documents = parent_documents[:top_k]
        
        return {"parent_documents": parent_documents, "query": query}

Why over-fetch children?

The search pipeline retrieves top_k * 2 children because:
  • Multiple children may belong to the same parent
  • After deduplication, you might have fewer than top_k unique parents
  • Over-fetching ensures you have enough unique parents
Example:
# Retrieve 10 children, might get:
children = [
    {"id": "c1", "parent_id": "p1"},
    {"id": "c2", "parent_id": "p1"},  # Same parent as c1
    {"id": "c3", "parent_id": "p2"},
    {"id": "c4", "parent_id": "p1"},  # Same parent again
    {"id": "c5", "parent_id": "p3"},
    # ...
]

# After deduplication: only 3 unique parents (p1, p2, p3)
# Over-fetching compensates for this
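
The deduplication itself reduces to an order-preserving "first hit wins" pass, which in Python is one line:

```python
children = [
    {"id": "c1", "parent_id": "p1"},
    {"id": "c2", "parent_id": "p1"},
    {"id": "c3", "parent_id": "p2"},
    {"id": "c4", "parent_id": "p1"},
    {"id": "c5", "parent_id": "p3"},
]

# dict.fromkeys preserves insertion order, so parents keep their best-hit ranking
unique_parents = list(dict.fromkeys(c["parent_id"] for c in children))
print(unique_parents)  # ['p1', 'p2', 'p3']
```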

Chunk size recommendations

parent_chunk_size: 1000
parent_overlap: 100
child_chunk_size: 200
child_overlap: 20
Works well for articles, documentation, and general content.

Trade-offs

Storage (~1.5-2x standard indexing):
  • Child chunks are indexed in the vector DB (normal storage)
  • Parent documents are stored in the ParentDocumentStore (in-memory pickle file)
  • Mitigation: compress the parent store or move it to Redis/a database

Memory:
  • The ParentDocumentStore loads into memory during search
  • For 1M parents with 1KB of text each: ~1GB RAM
  • Mitigation: use a database-backed parent store for production

Deduplication:
  • The pipeline must track which parents have already been returned
  • Over-fetching is required to ensure enough unique parents
  • Benefit: handled automatically by ParentDocumentStore

Production considerations

1. Persist parent store

Save ParentDocumentStore to disk after indexing:
store.save("parent_store.pkl")
Load during search:
store = ParentDocumentStore.load("./cache/parent_store.pkl")
2. Scale parent storage

For large datasets, use a database instead of pickle:
# Store parents in PostgreSQL or Redis
# Implement custom ParentDocumentStore with DB backend
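
Any key-value or relational store works for this. As one illustration (not part of the library), a SQLite-backed variant keeps parents on disk instead of in RAM; `SqliteParentStore` and its schema are entirely hypothetical:

```python
import json
import sqlite3

class SqliteParentStore:
    # Hypothetical DB-backed store: same idea as the pickle store, but on disk
    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.executescript(
            "CREATE TABLE IF NOT EXISTS parents (id TEXT PRIMARY KEY, doc TEXT);"
            "CREATE TABLE IF NOT EXISTS chunks (id TEXT PRIMARY KEY, parent_id TEXT);"
        )

    def add_parent(self, parent_id, parent_doc):
        self.conn.execute(
            "INSERT OR REPLACE INTO parents VALUES (?, ?)",
            (parent_id, json.dumps(parent_doc)),
        )

    def add_chunk_mapping(self, chunk_id, parent_id):
        self.conn.execute(
            "INSERT OR REPLACE INTO chunks VALUES (?, ?)", (chunk_id, parent_id)
        )

    def get_parents_for_chunks(self, chunk_ids):
        # Resolve chunk -> parent, deduplicating while preserving hit order
        seen = {}
        for chunk_id in chunk_ids:
            row = self.conn.execute(
                "SELECT p.id, p.doc FROM chunks c "
                "JOIN parents p ON p.id = c.parent_id WHERE c.id = ?",
                (chunk_id,),
            ).fetchone()
            if row and row[0] not in seen:
                seen[row[0]] = json.loads(row[1])
        return list(seen.values())
```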
3. Monitor deduplication rate

Track how many children map to unique parents:
children_retrieved = 20
unique_parents = 8
dedup_rate = unique_parents / children_retrieved  # 0.4

# If dedup_rate < 0.5, increase over-fetch multiplier
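
Acting on that metric can be as simple as scaling the multiplier by the observed rate. The helper below is a sketch, not a library function:

```python
def overfetch_multiplier(dedup_rate, base=2.0, cap=5.0):
    # Fetch roughly 1/dedup_rate children per desired parent, bounded by base and cap
    if dedup_rate <= 0:
        return cap
    return min(max(base, 1.0 / dedup_rate), cap)

print(overfetch_multiplier(0.4))  # 0.4 dedup rate -> fetch 2.5x children
```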
4. Combine with RAG

Use parent documents for answer generation:
parent_docs = results["parent_documents"]
parent_texts = [p["text"] for p in parent_docs]
answer = llm.generate(query, context=parent_texts)
