## The chunk size dilemma

When indexing documents, you face a trade-off.

### Small chunks (128-256 tokens)
Pros:

- Precise semantic matching
- Low false positive rate
- Better retrieval accuracy

Cons:

- Missing surrounding context
- Incomplete information
- Poor for generation
### Large chunks (512-1024 tokens)

Pros:

- Rich context for generation
- Complete information
- Better for Q&A

Cons:

- Noisy retrieval
- Higher false positives
- Worse precision
## How it works
1. Split documents into parent and child chunks
   - Parents: large chunks (512-1024 tokens) with full context
   - Children: small chunks (128-256 tokens) for precise matching

2. Index only child chunks
   - Store child embeddings in the vector database
   - Each child carries metadata linking it to its parent ID

3. Store parent documents separately
   - The ParentDocumentStore maintains parent text in memory
   - It maps child IDs to parent IDs and full parent text

4. During search
   - Retrieve the top-k child chunks from the vector DB
   - Map child IDs to parent IDs
   - Return unique parent documents (deduplicated)
## Basic usage
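A minimal, self-contained sketch of the pattern. All names here (`ParentDocumentRetriever`, `split_tokens`, the word-overlap scorer) are illustrative stand-ins, not the library's real API; a real setup would use a tokenizer, an embedding model, and a vector database.

```python
from collections import Counter

def split_tokens(text, size):
    """Greedy whitespace splitter standing in for a real token-based splitter."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

class ParentDocumentRetriever:
    """Indexes small child chunks, returns their large parent chunks."""

    def __init__(self, parent_size=1024, child_size=256):
        self.parent_size, self.child_size = parent_size, child_size
        self.children = {}         # child_id -> child text (what gets embedded)
        self.child_to_parent = {}  # child_id -> parent_id
        self.parents = {}          # parent_id -> full parent text

    def index(self, document):
        for p_id, parent in enumerate(split_tokens(document, self.parent_size)):
            self.parents[p_id] = parent
            for child in split_tokens(parent, self.child_size):
                c_id = len(self.children)
                self.children[c_id] = child
                self.child_to_parent[c_id] = p_id

    def _score(self, query, text):
        # Word-overlap score as a stand-in for vector similarity.
        q, t = Counter(query.lower().split()), Counter(text.lower().split())
        return sum((q & t).values())

    def search(self, query, top_k=5):
        ranked = sorted(self.children,
                        key=lambda c: -self._score(query, self.children[c]))
        seen, results = set(), []
        for c_id in ranked[: top_k * 2]:          # over-fetch children
            p_id = self.child_to_parent[c_id]
            if p_id not in seen:                  # deduplicate parents
                seen.add(p_id)
                results.append(self.parents[p_id])
        return results[:top_k]

retriever = ParentDocumentRetriever(parent_size=4, child_size=2)
retriever.index("alpha beta gamma delta epsilon zeta eta theta")
print(retriever.search("zeta", top_k=1))  # the parent containing the matching child
```

Matching happens against the small child chunks, but the caller only ever sees full parent chunks.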
## Configuration
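A hypothetical configuration object capturing the knobs discussed on this page; the field names are illustrative, and the defaults follow the size ranges given above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetrieverConfig:
    parent_chunk_size: int = 1024  # tokens per parent chunk (512-1024 typical)
    child_chunk_size: int = 256    # tokens per child chunk (128-256 typical)
    top_k: int = 5                 # unique parent documents to return
    overfetch_factor: int = 2      # fetch top_k * overfetch_factor children

    def __post_init__(self):
        # Children must be strictly smaller than parents for the scheme to help.
        if self.child_chunk_size >= self.parent_chunk_size:
            raise ValueError("child_chunk_size must be smaller than parent_chunk_size")
```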
## ParentDocumentStore
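A minimal dict-backed sketch of such a store (a hypothetical implementation, including the pickle persistence this page mentions; the real class may differ):

```python
import pickle

class ParentDocumentStore:
    """In-memory store mapping child chunk IDs to full parent documents."""

    def __init__(self):
        self._child_to_parent = {}  # child_id -> parent_id
        self._parents = {}          # parent_id -> full parent text

    def add_parent(self, parent_id, text, child_ids):
        self._parents[parent_id] = text
        for child_id in child_ids:
            self._child_to_parent[child_id] = parent_id

    def get_parent(self, child_id):
        """Resolve a retrieved child chunk to its full parent text."""
        return self._parents[self._child_to_parent[child_id]]

    def save(self, path):
        # Persist both mappings to a pickle file on disk.
        with open(path, "wb") as f:
            pickle.dump((self._child_to_parent, self._parents), f)

    def load(self, path):
        with open(path, "rb") as f:
            self._child_to_parent, self._parents = pickle.load(f)
```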
The parent store maintains chunk-to-parent mappings in memory.

## Indexing pipeline internals
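A condensed sketch of the indexing flow, under the same assumptions as above (whitespace splitting stands in for token-based splitting, and helper names are hypothetical):

```python
def split_tokens(text, size):
    """Greedy whitespace splitter standing in for a real token-based splitter."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def index_document(doc_id, text, parent_size=1024, child_size=256):
    """Return (parent_store, child_records); only child_records get embedded."""
    parent_store = {}   # parent_id -> parent text (kept out of the vector DB)
    child_records = []  # what actually goes into the vector index
    for i, parent in enumerate(split_tokens(text, parent_size)):
        parent_id = f"{doc_id}:parent:{i}"
        parent_store[parent_id] = parent
        for j, child in enumerate(split_tokens(parent, child_size)):
            child_records.append({
                "id": f"{parent_id}:child:{j}",
                "text": child,                         # this text is embedded
                "metadata": {"parent_id": parent_id},  # link back to the parent
            })
    return parent_store, child_records
```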
The LangChain indexing pipeline splits each document into parent chunks, splits each parent into child chunks, records the child-to-parent mapping, and embeds and indexes only the children.

## Search pipeline internals
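A sketch of the search side. `vector_search` here is a stand-in callable for the vector database query, and `parent_store` is any child-ID-to-parent mapping; both names are illustrative.

```python
def search_parents(query, vector_search, parent_store, top_k=5):
    """Retrieve children, map them to parents, deduplicate, return parents."""
    hits = vector_search(query, k=top_k * 2)  # over-fetch children
    seen, parents = set(), []
    for hit in hits:
        parent_id = hit["metadata"]["parent_id"]
        if parent_id not in seen:             # each parent returned once
            seen.add(parent_id)
            parents.append(parent_store[parent_id])
        if len(parents) == top_k:             # enough unique parents
            break
    return parents
```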
The search pipeline retrieves child chunks from the vector database but returns their parent documents.

## Why over-fetch children?
The search pipeline retrieves `top_k * 2` children because:
- Multiple children may belong to the same parent
- After deduplication, you might have fewer than top_k unique parents
- Over-fetching ensures you have enough unique parents
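A toy illustration of the three points above, with hypothetical child and parent IDs:

```python
# Three top-ranked children all belong to the same parent.
child_to_parent = {"c1": "p1", "c2": "p1", "c3": "p1", "c4": "p2"}
ranked_children = ["c1", "c2", "c3", "c4"]  # best-first vector hits

def unique_parents(children):
    """Map children to parents, keeping first-seen order and dropping repeats."""
    seen = []
    for c in children:
        p = child_to_parent[c]
        if p not in seen:
            seen.append(p)
    return seen

top_k = 2
print(unique_parents(ranked_children[:top_k]))       # ['p1'] -- too few parents
print(unique_parents(ranked_children[:top_k * 2]))   # ['p1', 'p2'] -- enough
```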
## Chunk size recommendations
- General purpose
- Technical docs
- Short-form content
- Long-form content
## Trade-offs

### Storage overhead
- Child chunks are indexed in vector DB (normal storage)
- Parent documents stored in ParentDocumentStore (in-memory pickle file)
- Total storage: ~1.5-2x standard indexing
- Mitigation: Parent store can be compressed or moved to Redis/database
### Memory usage
- ParentDocumentStore loads into memory during search
- For 1M parents with 1KB text each: ~1GB RAM
- Mitigation: Use database-backed parent store for production
### Deduplication complexity
- Need to track which parents have been returned
- Over-fetching required to ensure enough unique parents
- Benefit: Handled automatically by ParentDocumentStore
## Production considerations
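The mitigations above suggest moving the parent store out of process memory for production. A sketch of a database-backed store, using SQLite from the standard library as a stand-in (Redis or any key-value store would follow the same shape; the class and column names are illustrative):

```python
import sqlite3

class SQLiteParentStore:
    """Database-backed parent store: parents live on disk, not in RAM."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS parents (id TEXT PRIMARY KEY, text TEXT)")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS children (id TEXT PRIMARY KEY, parent_id TEXT)")

    def add_parent(self, parent_id, text, child_ids):
        self.db.execute("INSERT OR REPLACE INTO parents VALUES (?, ?)",
                        (parent_id, text))
        self.db.executemany("INSERT OR REPLACE INTO children VALUES (?, ?)",
                            [(c, parent_id) for c in child_ids])
        self.db.commit()

    def get_parent(self, child_id):
        """Resolve a retrieved child ID to its parent text with a single join."""
        row = self.db.execute(
            "SELECT p.text FROM parents p "
            "JOIN children c ON c.parent_id = p.id WHERE c.id = ?",
            (child_id,)).fetchone()
        return row[0] if row else None
```

With this shape, memory usage stays flat regardless of corpus size, at the cost of one lookup per returned parent.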
## See also
- Contextual compression - Further reduce parent document tokens
- Query enhancement - Improve child chunk retrieval
- Chunking strategies - Optimize chunk sizes for your content