Overview

Retrievers return Document objects given a text query. They provide a standard interface for retrieving relevant documents from various sources.

BaseRetriever

Abstract base class for document retrieval systems.
Source: langchain_core.retrievers:55
Inherits: RunnableSerializable[str, list[Document]]

Type Aliases

RetrieverInput = str
RetrieverOutput = list[Document]

Properties

tags
list[str] | None
default:"None"
Optional tags associated with the retriever for callbacks and tracing
metadata
dict[str, Any] | None
default:"None"
Optional metadata associated with the retriever for callbacks and tracing

Core Methods

invoke

def invoke(
    self,
    input: str,
    config: RunnableConfig | None = None,
    **kwargs: Any
) -> list[Document]
Retrieve documents relevant to a query.
input
str
required
The query string
config
RunnableConfig | None
Configuration for callbacks, tags, metadata
**kwargs
Any
Additional arguments passed to the retriever
return
list[Document]
List of relevant documents

ainvoke

async def ainvoke(
    self,
    input: str,
    config: RunnableConfig | None = None,
    **kwargs: Any
) -> list[Document]
Async version of invoke.

batch

def batch(
    self,
    inputs: list[str],
    config: RunnableConfig | list[RunnableConfig] | None = None,
    *,
    return_exceptions: bool = False,
    **kwargs: Any
) -> list[list[Document]]
Batch retrieve documents for multiple queries.
inputs
list[str]
required
List of query strings
return
list[list[Document]]
List of document lists, one per query

abatch

async def abatch(
    self,
    inputs: list[str],
    config: RunnableConfig | list[RunnableConfig] | None = None,
    *,
    return_exceptions: bool = False,
    **kwargs: Any
) -> list[list[Document]]
Async batch retrieval.
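
The batch contract (one result list per query, in input order, with return_exceptions controlling whether a failure is raised or returned in place) can be sketched in plain Python. This is an illustration only, not LangChain's implementation: batch_retrieve and toy_retrieve are hypothetical names, and strings stand in for Document objects.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Union


def batch_retrieve(
    retrieve: Callable[[str], list[str]],
    queries: list[str],
    *,
    return_exceptions: bool = False,
) -> list[Union[list[str], Exception]]:
    """Run `retrieve` once per query and return results in input order."""

    def call(query: str) -> Union[list[str], Exception]:
        try:
            return retrieve(query)
        except Exception as exc:
            if return_exceptions:
                return exc  # keep the failure in this query's slot
            raise

    if not queries:
        return []
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(call, queries))


def toy_retrieve(query: str) -> list[str]:
    """Hypothetical retriever: fails on an empty query."""
    if not query:
        raise ValueError("empty query")
    return [f"doc about {query}"]


results = batch_retrieve(toy_retrieve, ["cats", "", "dogs"], return_exceptions=True)
```

With return_exceptions=True the middle slot holds the ValueError and the other two queries still return their documents; with the default False, the same call raises.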

Implementation Methods

When subclassing BaseRetriever, implement the required method below and, optionally, its async counterpart:

_get_relevant_documents (Required)

def _get_relevant_documents(
    self,
    query: str,
    *,
    run_manager: CallbackManagerForRetrieverRun | None = None
) -> list[Document]
Retrieve documents relevant to a query. Must be implemented by subclasses.
query
str
required
The query string
run_manager
CallbackManagerForRetrieverRun | None
Callback manager for the retriever run
return
list[Document]
List of relevant documents

_aget_relevant_documents (Optional)

async def _aget_relevant_documents(
    self,
    query: str,
    *,
    run_manager: AsyncCallbackManagerForRetrieverRun | None = None
) -> list[Document]
Async version of _get_relevant_documents. Override for native async support.
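
When only the sync method exists, one common way to offer an async path is to run the blocking call in a thread pool so the event loop stays free. The sketch below shows that fallback pattern in plain Python; SyncOnlyRetriever is a hypothetical stand-in (not a BaseRetriever subclass), and strings stand in for Document objects. A native override would replace the executor call with real async I/O.

```python
import asyncio


class SyncOnlyRetriever:
    """Hypothetical stand-in for a retriever with only a blocking search."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = docs

    def _get_relevant_documents(self, query: str) -> list[str]:
        # Blocking substring search standing in for a real lookup.
        return [d for d in self.docs if query.lower() in d.lower()]

    async def _aget_relevant_documents(self, query: str) -> list[str]:
        # Fallback: delegate the blocking call to the default thread pool.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self._get_relevant_documents, query)


async def main() -> list[list[str]]:
    retriever = SyncOnlyRetriever(
        ["LangChain retrievers", "Vector stores", "Async retrievers"]
    )
    # Both queries run concurrently; gather keeps the argument order.
    return await asyncio.gather(
        retriever._aget_relevant_documents("retrievers"),
        retriever._aget_relevant_documents("vector"),
    )


async_results = asyncio.run(main())
```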

Example Implementation

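from langchain_core.callbacks import (
    AsyncCallbackManagerForRetrieverRun,
    CallbackManagerForRetrieverRun,
)
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever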

class SimpleRetriever(BaseRetriever):
    """Retriever that returns the first k documents from a list."""
    
    docs: list[Document]
    k: int = 5
    
    def _get_relevant_documents(
        self,
        query: str,
        *,
        run_manager: CallbackManagerForRetrieverRun | None = None
    ) -> list[Document]:
        """Return the first k documents."""
        return self.docs[:self.k]
    
    async def _aget_relevant_documents(
        self,
        query: str,
        *,
        run_manager: AsyncCallbackManagerForRetrieverRun | None = None
    ) -> list[Document]:
        """Async version."""
        return self.docs[:self.k]

# Usage
retriever = SimpleRetriever(
    docs=[Document(page_content="doc1"), Document(page_content="doc2")],
    k=1
)
results = retriever.invoke("query")

VectorStoreRetriever

Retriever that uses a vector store for similarity search.
Source: langchain_core.vectorstores.base
Inherits: BaseRetriever

Properties

vectorstore
VectorStore
required
The vector store to retrieve from
search_type
Literal['similarity', 'mmr', 'similarity_score_threshold']
default:"'similarity'"
Type of search to perform:
  • 'similarity': Standard similarity search
  • 'mmr': Maximum marginal relevance (diverse results)
  • 'similarity_score_threshold': Filter by minimum similarity score
search_kwargs
dict[str, Any]
default:"{}"
Additional keyword arguments for search. Common keys:
  • k: Number of documents to retrieve
  • score_threshold: Minimum similarity score (for 'similarity_score_threshold')
  • fetch_k: Number of documents to fetch before MMR (for 'mmr')
  • lambda_mult: Diversity parameter for MMR (for 'mmr')
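
To make fetch_k and lambda_mult concrete, here is a minimal plain-Python sketch of maximal marginal relevance: fetch the fetch_k most similar candidates, then greedily pick k of them, trading relevance to the query against similarity to documents already chosen. It assumes cosine similarity and the convention that lambda_mult=1.0 ranks purely by relevance while lower values favor diversity. cosine and mmr_select are hypothetical helpers, not part of any vector store's API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query_vec: list[float],
    doc_vecs: list[list[float]],
    *,
    k: int = 4,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
) -> list[int]:
    """Return the indices of k documents chosen by greedy MMR."""
    # Step 1: keep only the fetch_k candidates most similar to the query.
    by_similarity = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_similarity[:fetch_k]

    # Step 2: greedily pick the candidate with the best relevance/redundancy trade-off.
    selected: list[int] = []
    while candidates and len(selected) < k:

        def mmr_score(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.8, -0.6], [0.0, 1.0]]  # docs 0 and 1 are near-duplicates

diverse = mmr_select(query, docs, k=2, fetch_k=3, lambda_mult=0.5)
relevance_only = mmr_select(query, docs, k=2, fetch_k=3, lambda_mult=1.0)
```

In this toy data, lambda_mult=0.5 skips the near-duplicate and picks document 2 second, while lambda_mult=1.0 returns the two near-duplicates.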

Example

# Assumes `vectorstore` is an existing, populated VectorStore instance

# From a vector store
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}
)

docs = retriever.invoke("What is LangChain?")

# MMR for diverse results
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.5}
)

# With score threshold
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.8}
)

MultiQueryRetriever

Generates multiple perspectives on a query, retrieves documents for each, and combines the results.
Source: langchain.retrievers.multi_query
Inherits: BaseRetriever

Properties

retriever
BaseRetriever
required
The base retriever to use for each generated query
llm_chain
Runnable
required
LLM chain to generate alternative queries
parser_key
str
default:"'lines'"
Key to extract queries from LLM output
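
The overall flow (generate query variants, retrieve for each, merge without duplicates) can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: fake_llm_queries stands in for llm_chain with a line-splitting parser, INDEX and toy_retrieve stand in for the base retriever, and strings stand in for Document objects.

```python
from typing import Callable


def multi_query_retrieve(
    question: str,
    generate_queries: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
) -> list[str]:
    """Retrieve for every generated query and merge results, dropping duplicates."""
    seen: set[str] = set()
    merged: list[str] = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:  # keep the first occurrence only
                seen.add(doc)
                merged.append(doc)
    return merged


def fake_llm_queries(question: str) -> list[str]:
    """Hypothetical LLM output: one query per line, parsed by splitting lines."""
    llm_output = f"{question}\nhow do retrievers work\nretriever interface"
    return [line.strip() for line in llm_output.splitlines() if line.strip()]


# Hypothetical base retriever backed by a lookup table.
INDEX = {
    "what is a retriever": ["doc-a", "doc-b"],
    "how do retrievers work": ["doc-b", "doc-c"],
    "retriever interface": ["doc-a", "doc-d"],
}


def toy_retrieve(query: str) -> list[str]:
    return INDEX.get(query, [])


merged = multi_query_retrieve("what is a retriever", fake_llm_queries, toy_retrieve)
```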

ContextualCompressionRetriever

Compress retrieved documents using a document compressor.
Source: langchain.retrievers.contextual_compression
Inherits: BaseRetriever

Properties

base_retriever
BaseRetriever
required
The base retriever to get initial documents
base_compressor
BaseDocumentCompressor
required
Compressor to filter or compress retrieved documents
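
The two-stage shape (retrieve first, then filter or shorten each document against the query) can be sketched in plain Python. The names below are hypothetical and strings stand in for Document objects; the toy compressor keeps only sentences that share a word with the query, which is just one possible compression strategy.

```python
from typing import Callable


def compress_retrieve(
    query: str,
    base_retrieve: Callable[[str], list[str]],
    compress: Callable[[list[str], str], list[str]],
) -> list[str]:
    """Fetch documents with the base retriever, then pass them through the compressor."""
    docs = base_retrieve(query)
    return compress(docs, query) if docs else []


def sentence_filter(docs: list[str], query: str) -> list[str]:
    """Toy compressor: keep sentences sharing a word with the query; drop empty docs."""
    terms = set(query.lower().split())
    compressed: list[str] = []
    for doc in docs:
        kept = [
            sentence.strip()
            for sentence in doc.split(".")
            if terms & set(sentence.lower().split())
        ]
        if kept:
            compressed.append(". ".join(kept) + ".")
    return compressed


def toy_base_retrieve(query: str) -> list[str]:
    """Hypothetical base retriever that returns a fixed candidate set."""
    return [
        "Retrievers return documents. The weather is nice.",
        "Bananas are yellow.",
    ]


compressed_docs = compress_retrieve("retrievers", toy_base_retrieve, sentence_filter)
```

Here the first document is trimmed to its relevant sentence and the second is dropped entirely.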

ParentDocumentRetriever

Searches over small chunks for more precise matching, but returns the full parent documents for context.
Source: langchain.retrievers.parent_document_retriever
Inherits: BaseRetriever

Properties

vectorstore
VectorStore
required
Vector store containing child chunks
docstore
BaseStore
required
Store containing parent documents
child_splitter
TextSplitter
required
Splitter to create child chunks
parent_splitter
TextSplitter | None
default:"None"
Optional splitter to create parent documents
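
The relationship between the two stores can be sketched in plain Python: child chunks are indexed together with their parent's ID, a search matches chunks, and the parents are then fetched from the docstore. Everything here is a hypothetical stand-in: ToyParentDocumentIndex is not the library class, split_words replaces a TextSplitter, and substring matching replaces vector search.

```python
def split_words(text: str, words_per_chunk: int) -> list[str]:
    """Toy child splitter: fixed-size groups of whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i : i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


class ToyParentDocumentIndex:
    """Search small chunks, return whole parent documents."""

    def __init__(self, words_per_chunk: int = 4) -> None:
        self.words_per_chunk = words_per_chunk
        self.docstore: dict[str, str] = {}  # parent_id -> full parent text
        self.child_index: list[tuple[str, str]] = []  # (child chunk, parent_id)

    def add_documents(self, docs: dict[str, str]) -> None:
        for parent_id, text in docs.items():
            self.docstore[parent_id] = text
            for chunk in split_words(text, self.words_per_chunk):
                self.child_index.append((chunk, parent_id))

    def retrieve(self, query: str) -> list[str]:
        # Match against child chunks, collecting each parent ID once.
        parent_ids: list[str] = []
        for chunk, parent_id in self.child_index:
            if query.lower() in chunk.lower() and parent_id not in parent_ids:
                parent_ids.append(parent_id)
        # Return the full parent documents, not the matched chunks.
        return [self.docstore[parent_id] for parent_id in parent_ids]


index = ToyParentDocumentIndex(words_per_chunk=4)
index.add_documents(
    {
        "p1": "Retrievers return documents for a query. They wrap many sources.",
        "p2": "Vector stores index embeddings for similarity search.",
    }
)
parents = index.retrieve("embeddings")
```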

EnsembleRetriever

Combine multiple retrievers using weighted reciprocal rank fusion.
Source: langchain.retrievers.ensemble
Inherits: BaseRetriever

Properties

retrievers
list[BaseRetriever]
required
List of retrievers to combine
weights
list[float]
required
Weights for each retriever, one per entry in retrievers (typically chosen to sum to 1.0)
c
int
default:"60"
Constant for reciprocal rank fusion
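
Weighted reciprocal rank fusion can be sketched in a few lines of plain Python: each retriever contributes weight / (rank + c) for every document it returns, with ranks starting at 1, and documents are ordered by their summed score. weighted_rrf is a hypothetical helper for illustration, and strings stand in for Document objects.

```python
from collections import defaultdict


def weighted_rrf(
    ranked_lists: list[list[str]],
    weights: list[float],
    c: int = 60,
) -> list[str]:
    """Fuse several ranked lists into one ordering using weighted RRF."""
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")

    scores: dict[str, float] = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            # A larger c flattens the gap between top and lower ranks.
            scores[doc] += weight / (rank + c)

    # Highest fused score first.
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)


keyword_results = ["a", "b", "c"]
vector_results = ["c", "a", "d"]
fused = weighted_rrf([keyword_results, vector_results], weights=[0.5, 0.5])
```

Documents that appear in both lists ("a" and "c") outrank those found by only one retriever, even when the single-source document was ranked higher in its own list.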

RetrieverLike

Type alias for objects that can act as retrievers.
Source: langchain_core.retrievers
RetrieverLike = Runnable[str, list[Document]]
Any Runnable that takes a string and returns documents can be used as a retriever.

LangSmithRetrieverParams

Parameters for LangSmith tracing of retrievers.
Source: langchain_core.retrievers:39
class LangSmithRetrieverParams(TypedDict, total=False):
    ls_retriever_name: str
    ls_vector_store_provider: str | None
    ls_embedding_provider: str | None
    ls_embedding_model: str | None
ls_retriever_name
str
Name of the retriever for tracing
ls_vector_store_provider
str | None
Vector store provider name
ls_embedding_provider
str | None
Embedding provider name
ls_embedding_model
str | None
Embedding model name
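
Because the TypedDict is declared with total=False, every key is optional. A short standalone sketch of what that allows is shown below; the class is re-declared locally (using Optional for compatibility with older Python versions) so the snippet runs without LangChain installed, and the values are hypothetical.

```python
from typing import Optional, TypedDict


class LangSmithRetrieverParams(TypedDict, total=False):
    """Local copy of the definition above, for illustration only."""

    ls_retriever_name: str
    ls_vector_store_provider: Optional[str]
    ls_embedding_provider: Optional[str]
    ls_embedding_model: Optional[str]


# total=False means a dict with only some of the keys is still valid.
params: LangSmithRetrieverParams = {"ls_retriever_name": "simple"}

# Further keys can be filled in when they are known (hypothetical value).
params["ls_embedding_provider"] = "example-provider"
```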