DataloaderCatalog
Factory class for creating dataset loaders by name.
Methods
create
Create a dataset loader instance.
Dataset type name. Options: “triviaqa”, “arc”, “popqa”, “factscore”, “earnings_calls”
Dataset split to load (e.g., “train”, “test”, “validation”)
Optional limit on record count to load
Optional HuggingFace dataset ID override
Configured dataset loader instance ready to load data
supported_datasets
Return the supported dataset identifiers.
Tuple of supported dataset type names
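The factory behaviour described above can be illustrated with a small, self-contained sketch. It does not import the library; `SketchCatalog` and `SketchLoader` are stand-ins, and the parameter names `dataset_type`, `split`, `limit`, and `dataset_id` are assumptions, since this reference lists only the parameter descriptions. Raising `ValueError` for an unknown name is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SketchLoader:
    """Stand-in for a configured loader (hypothetical, not the library's class)."""
    dataset_type: str
    split: str
    limit: Optional[int] = None
    dataset_id: Optional[str] = None


class SketchCatalog:
    """Illustrative name-based factory; the real DataloaderCatalog may differ."""

    # Keys are the documented options; every key maps to the same stand-in here,
    # whereas a real catalog would map each name to its own loader class.
    _REGISTRY = {
        "triviaqa": SketchLoader,
        "arc": SketchLoader,
        "popqa": SketchLoader,
        "factscore": SketchLoader,
        "earnings_calls": SketchLoader,
    }

    @classmethod
    def supported_datasets(cls) -> Tuple[str, ...]:
        # Dicts preserve insertion order, so the tuple follows the registry order.
        return tuple(cls._REGISTRY)

    @classmethod
    def create(cls, dataset_type, split, limit=None, dataset_id=None):
        if dataset_type not in cls._REGISTRY:
            raise ValueError(
                f"Unsupported dataset {dataset_type!r}; "
                f"expected one of {cls.supported_datasets()}"
            )
        loader_cls = cls._REGISTRY[dataset_type]
        return loader_cls(dataset_type, split, limit, dataset_id)


loader = SketchCatalog.create("arc", split="validation", limit=100)
print(loader)
print(SketchCatalog.supported_datasets())
```

Keeping the registry in one place means `supported_datasets` and `create` cannot drift apart: a name is creatable exactly when it is listed.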
LoadedDataset
Wrapper for normalized dataset records with conversion methods.
Constructor
Identifier of the dataset (e.g., “triviaqa”, “arc”)
Normalized dataset records
Methods
records
Return normalized dataset records.
List of normalized dataset records
to_dict_items
Convert normalized records to dictionary items.
List of dictionary representations of records
to_haystack
Convert records to Haystack documents.
List of Haystack Document objects ready for indexing
to_langchain
Convert records to LangChain documents.
List of LangChain Document objects ready for indexing
evaluation_queries
Extract evaluation queries from records.
Optional limit applied after deduplication
List of evaluation queries with ground truth answers for retrieval testing
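The note that the limit is applied after deduplication matters when several records share a question. The following sketch illustrates that ordering with stand-in classes. The record fields (`text`, `question`, `answer`), the output keys (`query`, `answer`), and the choice of the question text as the deduplication key are all assumptions, because this reference does not specify them. The Haystack and LangChain conversions are omitted to keep the example dependency-free.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SketchRecord:
    """Hypothetical normalized record; the real field names are not documented here."""
    text: str
    question: str
    answer: str


class SketchLoadedDataset:
    """Illustrative wrapper; not the library's LoadedDataset."""

    def __init__(self, dataset_type, records):
        self.dataset_type = dataset_type
        self._records = list(records)

    def records(self):
        # Return a copy so callers cannot mutate the wrapper's internal list.
        return list(self._records)

    def to_dict_items(self):
        return [asdict(record) for record in self._records]

    def evaluation_queries(self, limit=None):
        seen = set()
        queries = []
        for record in self._records:
            if record.question in seen:
                continue  # skip repeated questions (assumed deduplication key)
            seen.add(record.question)
            queries.append({"query": record.question, "answer": record.answer})
        # The limit is applied only after duplicates have been removed.
        return queries if limit is None else queries[:limit]


ds = SketchLoadedDataset("triviaqa", [
    SketchRecord("Paris is the capital of France.", "What is the capital of France?", "Paris"),
    SketchRecord("France's capital city is Paris.", "What is the capital of France?", "Paris"),
    SketchRecord("Mars is the fourth planet from the Sun.", "Which planet is fourth from the Sun?", "Mars"),
])
print(ds.evaluation_queries(limit=2))
```

With `limit=2` the sketch returns two distinct queries. Truncating to two records before deduplicating would have produced only one, because the first two records share a question.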
BaseDatasetLoader
Base class for dataset loaders. All specific dataset loaders inherit from this class.
Supported Datasets
- TriviaQALoader: TriviaQA question-answering dataset
- ARCLoader: AI2 Reasoning Challenge (ARC) dataset
- PopQALoader: PopQA popularity-based question-answering dataset
- FactScoreLoader: FactScore factual consistency dataset
- EarningsCallsLoader: Earnings calls transcripts dataset
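As a rough picture of how a shared base class and dataset-specific subclasses might fit together, here is a minimal sketch. The method names `load`, `_iter_raw`, and `_normalize`, the constructor arguments, and the row fields `q` and `a` are hypothetical, since this reference does not describe the base class interface. The subclass reads from an in-memory list instead of downloading a dataset.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Optional


class SketchBaseLoader(ABC):
    """Illustrative base class; not the library's BaseDatasetLoader."""

    def __init__(self, split: str, limit: Optional[int] = None):
        self.split = split
        self.limit = limit

    @abstractmethod
    def _iter_raw(self) -> Iterable[Dict]:
        """Yield raw rows for the configured split."""

    @abstractmethod
    def _normalize(self, row: Dict) -> Dict:
        """Map one raw row onto the shared record shape."""

    def load(self) -> List[Dict]:
        # Shared logic lives in the base class: iterate, normalize, stop at the limit.
        records = []
        for row in self._iter_raw():
            if self.limit is not None and len(records) >= self.limit:
                break
            records.append(self._normalize(row))
        return records


class SketchArcLoader(SketchBaseLoader):
    """Hypothetical subclass backed by in-memory rows with made-up field names."""

    _ROWS = [
        {"q": "Which gas do plants absorb?", "a": "Carbon dioxide"},
        {"q": "What force pulls objects toward Earth?", "a": "Gravity"},
    ]

    def _iter_raw(self):
        return iter(self._ROWS)

    def _normalize(self, row):
        return {"question": row["q"], "answer": row["a"]}


print(SketchArcLoader(split="test", limit=1).load())
```

In this arrangement each subclass supplies only what is dataset-specific (where the rows come from and how they are mapped), while limit handling stays in one place.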