What are Document Loaders?
Document Loaders extract content from different file types and sources and convert it into Document objects that contain:
- pageContent: The extracted text content
- metadata: Information about the source (filename, page number, URL, etc.)
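As a minimal sketch of that shape, the TypeScript below models a document with the two fields listed above. The interface name `LoadedDocument` and the sample values are illustrative assumptions, not Flowise's actual type definitions.

```typescript
// Hypothetical shape of a loaded document, based on the two fields described above.
interface LoadedDocument {
  pageContent: string;               // the extracted text
  metadata: Record<string, unknown>; // source details such as filename or page number
}

// Illustrative values only: one page extracted from a PDF.
const doc: LoadedDocument = {
  pageContent: "Quarterly revenue grew in all regions.",
  metadata: { source: "report.pdf", page: 3 },
};

console.log(doc.metadata.source, doc.pageContent.length);
```

Keeping metadata as an open key-value record lets each loader attach whatever source details it has without changing the document shape.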
Available Document Loaders
Flowise supports document loaders in the following categories:
File-Based Loaders
- PDF File: Load and extract text from PDF documents
- Text File: Load text-based files including .txt, .md, .html, and code files
- File Loader: Generic loader supporting multiple file types (PDF, DOCX, CSV, JSON, etc.)
- CSV: Load and parse CSV files into documents
Web-Based Loaders
- Cheerio Web Scraper: Scrape and extract content from web pages using CSS selectors
- Puppeteer Web Scraper: Advanced web scraping with JavaScript execution support
- Playwright: Cross-browser web scraping with full page rendering
- FireCrawl: Modern web scraping and crawling service
Cloud Storage Loaders
- Google Drive: Load files from Google Drive
- S3 File: Load files from Amazon S3 buckets
- S3 Directory: Load multiple files from S3 directories
Productivity Tool Loaders
- Notion: Load pages, databases, and folders from Notion
- Confluence: Load content from Atlassian Confluence
- Jira: Load issues and project data from Jira
- Airtable: Load records from Airtable bases
Microsoft Office Loaders
- Word: Load .doc and .docx files
- Excel: Load spreadsheet data
- PowerPoint: Load presentation content
Common Configuration Options
Most document loaders share these configuration options:
- An optional text splitter to chunk documents into smaller pieces for processing
- Additional metadata to attach to all extracted documents
- A comma-separated list of metadata keys to exclude from the output. Use * to omit all default metadata keys except those specified in Additional Metadata. Example: key1, key2, key3.nestedKey1
Output Options
Document loaders typically provide two output types:
- Document Output: Returns an array of document objects, each with pageContent and metadata. Use this when you need to preserve document structure and metadata.
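The metadata carried by the Document output is shaped by the additional-metadata and omit-keys options described under Common Configuration Options. The sketch below is one possible reading of those options, not Flowise's implementation: the names `applyMetadataOptions` and `removeKey` are hypothetical, and it assumes that additional metadata is merged first and omitted keys are removed afterwards.

```typescript
interface LoadedDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Remove a possibly nested key such as "key3.nestedKey1" from an object, in place.
function removeKey(target: Record<string, unknown>, path: string): void {
  const parts = path.split(".");
  const last = parts.pop();
  if (last === undefined) return;
  let current = target;
  for (const part of parts) {
    const next = current[part];
    if (typeof next !== "object" || next === null) return; // path does not exist
    current = next as Record<string, unknown>;
  }
  delete current[last];
}

// Hypothetical helper: attach additional metadata, then drop the listed keys.
function applyMetadataOptions(
  docs: LoadedDocument[],
  additional: Record<string, unknown>,
  omitKeys: string,
): LoadedDocument[] {
  const keys = omitKeys.split(",").map((k) => k.trim()).filter((k) => k.length > 0);
  return docs.map((doc) => {
    if (keys.includes("*")) {
      // "*" drops every default key, so only the additional metadata survives.
      return { ...doc, metadata: { ...additional } };
    }
    // Deep copy via JSON (fine for plain data) so the input documents are not mutated.
    const merged: Record<string, unknown> = JSON.parse(
      JSON.stringify({ ...doc.metadata, ...additional }),
    );
    for (const key of keys) removeKey(merged, key);
    return { ...doc, metadata: merged };
  });
}

const sample: LoadedDocument[] = [
  { pageContent: "Hello", metadata: { source: "a.txt", loc: { lines: 10, page: 1 } } },
];

const trimmed = applyMetadataOptions(sample, { team: "docs" }, "source, loc.lines");
console.log(trimmed[0]?.metadata); // { loc: { page: 1 }, team: "docs" }

const onlyExtra = applyMetadataOptions(sample, { team: "docs" }, "*");
console.log(onlyExtra[0]?.metadata); // { team: "docs" }
```

The dotted form `loc.lines` mirrors the `key3.nestedKey1` example: it removes one nested field and leaves the rest of the parent object in place.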
Best Practices
Performance Tips
- Use text splitters to chunk large documents for better LLM processing
- Set appropriate limits when crawling websites to avoid long processing times
- Use metadata to track document sources for better retrieval
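To illustrate the first and last tips together, here is a minimal sketch of chunking a large document while keeping its source metadata on every chunk. It uses a simple fixed-size character splitter with overlap as a stand-in for a real text splitter. The function name `splitDocument` and the `chunkStart` metadata key are assumptions made for this example.

```typescript
interface LoadedDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Split one document into fixed-size character chunks that overlap by `overlap` characters.
function splitDocument(
  doc: LoadedDocument,
  chunkSize: number,
  overlap: number,
): LoadedDocument[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("chunkSize must be positive and overlap must be smaller than chunkSize");
  }
  const chunks: LoadedDocument[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < doc.pageContent.length; start += step) {
    chunks.push({
      pageContent: doc.pageContent.slice(start, start + chunkSize),
      // Copy the source metadata so each chunk can be traced back to its origin.
      metadata: { ...doc.metadata, chunkStart: start },
    });
    if (start + chunkSize >= doc.pageContent.length) break; // this chunk reached the end
  }
  return chunks;
}

const longDoc: LoadedDocument = {
  pageContent: "abcdefghij",
  metadata: { source: "notes.txt" },
};
const pieces = splitDocument(longDoc, 4, 1);
console.log(pieces.map((p) => p.pageContent)); // [ "abcd", "defg", "ghij" ]
```

The overlap repeats a little text at each boundary, which helps when a sentence would otherwise be cut between two chunks. Production splitters usually break on separators such as paragraphs or sentences instead of raw character offsets.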
Using Document Loaders in Workflows
Document loaders are typically used in these scenarios:
- RAG (Retrieval-Augmented Generation): Load documents to create a knowledge base
- Data Ingestion: Import data from various sources into vector stores
- Content Processing: Extract and transform content for analysis
- Knowledge Base Creation: Build searchable document repositories
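The scenarios above share one flow: load documents, store them, then retrieve the relevant ones for a query. The sketch below shows that flow with a toy in-memory store that ranks documents by keyword overlap. A real vector store would compare embeddings instead. The class name `KeywordStore` and the sample documents are purely illustrative.

```typescript
interface LoadedDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Toy stand-in for a vector store: scores documents by how many query words they contain.
class KeywordStore {
  private docs: LoadedDocument[] = [];

  add(docs: LoadedDocument[]): void {
    this.docs.push(...docs);
  }

  search(query: string, topK: number): LoadedDocument[] {
    const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
    return this.docs
      .map((doc) => {
        const text = doc.pageContent.toLowerCase();
        const score = terms.filter((t) => text.includes(t)).length;
        return { doc, score };
      })
      .filter((entry) => entry.score > 0)  // drop documents with no matching words
      .sort((a, b) => b.score - a.score)   // best match first
      .slice(0, topK)
      .map((entry) => entry.doc);
  }
}

// Ingest documents as a loader might produce them, then query the store.
const store = new KeywordStore();
store.add([
  { pageContent: "Refunds are processed within five days.", metadata: { source: "policy.pdf" } },
  { pageContent: "Shipping is free for orders over fifty dollars.", metadata: { source: "faq.md" } },
]);

const hits = store.search("refunds processed", 1);
console.log(hits[0]?.metadata.source); // "policy.pdf"
```

Because each stored document keeps its metadata, the retrieved result can be traced back to the file it came from.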
Next Steps
- PDF Loader: Learn how to load PDF documents
- Web Scraper: Extract content from websites
- Vector Stores: Store and search documents