Document Store is Flowise’s centralized system for storing, processing, and managing documents used in Retrieval-Augmented Generation (RAG) applications. It provides a unified interface to load documents, split them into chunks, and prepare them for vector storage.

Overview

The Document Store feature allows you to:
  • Store and organize documents for multiple chatflows
  • Process documents with various loaders (PDF, CSV, text, web scraping, etc.)
  • Configure text splitters for optimal chunking
  • Preview and edit document chunks before upserting
  • Upsert documents to vector stores with embeddings
  • Query and test retrieval performance
  • Track upsert history and configurations

Creating a Document Store

1. From the main navigation menu, click Document Store to view all your document stores.
2. Create New Store
   Click the Add New button in the top-right corner.
3. Configure Store
   Provide the following information:
   • Name: A descriptive name for your document store
   • Description: Optional details about the store’s purpose
4. Save
   Click Add to create the document store.

    Adding Documents

    Once you’ve created a document store, you can add documents using various loaders:
    1. Select Document Loader
       Click Add Document Loader to see available loader types:
       • File Loaders: PDF, DOCX, TXT, CSV, JSON
       • Web Loaders: Web scraper, Cheerio web scraper, Playwright web scraper
       • API Loaders: Notion, Airtable, GitHub, GitBook
       • Database Loaders: Postgres, MySQL, MongoDB
    2. Configure Loader
       Each loader has specific configuration options:

       // Example: PDF loader configuration
       {
         "textSplitter": "RecursiveCharacterTextSplitter",
         "chunkSize": 1000,
         "chunkOverlap": 200,
         "metadata": {
           "source": "user_manual.pdf"
         }
       }

    3. Preview Chunks
       Before processing, you can preview how the document will be split:
       • View chunk size and count
       • See chunk content and metadata
       • Adjust splitter settings if needed
    4. Process Document
       Click Process to load and chunk the document. The chunks are stored in the document store.
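
To make the chunkSize and chunkOverlap settings concrete, here is a minimal Python sketch of fixed-size character chunking with overlap. It is a simplification for illustration only: `split_with_overlap` is a hypothetical helper, not Flowise's RecursiveCharacterTextSplitter, which also tries to break on separators such as paragraphs and sentences.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    sample = "x" * 2500  # stand-in for loaded document text
    chunks = split_with_overlap(sample, chunk_size=1000, chunk_overlap=200)
    print(len(chunks), [len(c) for c in chunks])
```

With a 1000-character chunk size and 200-character overlap, each window starts 800 characters after the previous one, so a larger overlap produces more chunks for the same document.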

    Configuring Vector Store

    To enable retrieval, you need to configure embeddings and a vector store:
    1. Select Embeddings Provider
       Choose an embeddings provider to convert text chunks into vectors:
       • OpenAI Embeddings: text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large
       • Azure OpenAI Embeddings: For Azure deployments
       • Cohere Embeddings: Multilingual support
       • HuggingFace Embeddings: Open-source models
       • Google VertexAI Embeddings: Google Cloud integration
    2. Configure Credentials
       Provide API keys or credentials for your embeddings provider.
    3. Select Vector Store
       Choose where to store your vectors:
       • Pinecone: Managed vector database
       • Qdrant: Open-source vector search
       • Weaviate: AI-native vector database
       • Chroma: Lightweight vector store
       • Supabase: Postgres with pgvector
       • Milvus: Scalable vector database
    4. Configure Record Manager (Optional)
       For advanced indexing and deduplication, configure a Record Manager. It:
       • Tracks document versions
       • Prevents duplicate embeddings
       • Enables incremental updates
       • Supports cleanup modes: incremental, full
    5. Save Configuration
       Click Save Config to store the configuration without upserting.
    6. Upsert Documents
       Click Upsert to embed and store your document chunks in the vector store.

    The upsert process can take several minutes depending on the number of chunks and the embeddings provider’s rate limits.
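
The Record Manager behavior listed above (skipping duplicate embeddings and cleaning up stale entries) can be pictured with a small hashing sketch. This is an illustrative model under assumptions, not Flowise's implementation: `chunk_key`, `plan_upsert`, and the in-memory `indexed` dict are hypothetical, the "none" mode is added for contrast, and the cleanup semantics follow the common indexing convention in which incremental cleanup removes outdated chunks only for sources seen in the current run, while full cleanup removes everything the current run did not produce.

```python
import hashlib


def chunk_key(source: str, text: str) -> str:
    """Stable identifier derived from a chunk's source and content."""
    return hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()


def plan_upsert(indexed: dict[str, str], chunks: list[tuple[str, str]], cleanup: str = "incremental"):
    """Decide which chunks to embed and which stored keys to delete.

    indexed: key -> source for chunks already stored (hypothetical record table)
    chunks:  (source, text) pairs produced by the current run
    cleanup: "incremental", "full", or "none"
    """
    current = {chunk_key(src, text): src for src, text in chunks}
    to_add = [key for key in current if key not in indexed]  # unchanged chunks are skipped

    if cleanup == "full":
        # Remove every stored chunk that the current run did not produce
        to_delete = [key for key in indexed if key not in current]
    elif cleanup == "incremental":
        # Remove outdated chunks only for sources that appear in the current run
        seen_sources = set(current.values())
        to_delete = [key for key, src in indexed.items() if src in seen_sources and key not in current]
    else:
        to_delete = []
    return to_add, to_delete


if __name__ == "__main__":
    stored = {chunk_key("a.pdf", "old text"): "a.pdf", chunk_key("b.pdf", "other doc"): "b.pdf"}
    add, delete = plan_upsert(stored, [("a.pdf", "new text")], cleanup="incremental")
    print(len(add), "to embed;", len(delete), "to delete")
```

Because keys are derived from content, re-running an upsert on unchanged documents embeds nothing, which is where the savings on embedding calls come from.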

    Managing Document Chunks

    You can view and edit individual document chunks:

    View Chunks

    From the document store detail page, click View & Edit Chunks for any loader to see all chunks:
    Loader: PDF File (user_manual.pdf)
    Chunks: 45
    Total Characters: 89,234
    Status: SYNC (synced with vector store)
    

    Edit Chunks

    Click on any chunk to edit its content or metadata:
    • Modify chunk text
    • Update metadata fields
    • Delete unwanted chunks

    Refresh Documents

    For documents that change over time, use the Refresh option to re-process and upsert all loaders.
    Deleting chunks from the document store will also remove them from the vector store if a Record Manager is configured.

    Querying the Document Store

    Test your retrieval setup with the built-in query interface:
    1. Open Query Interface
       From the document store actions menu, select Retrieval Query.
    2. Configure Query Settings
       • Query: Enter your search query
       • Top K: Number of results to return (default: 4)
       • Search Type:
         • similarity: Cosine similarity search
         • mmr: Maximum Marginal Relevance
    3. Execute Query
       Click Search to retrieve relevant chunks from your vector store.
    4. Review Results
       Each result shows:
       • Chunk content
       • Similarity score
       • Source metadata
       • Position in original document
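
The difference between the two search types can be sketched in plain Python. This is a toy illustration, not how Flowise or any particular vector store implements search: the function names and the `lambda_mult` parameter (the relevance/diversity weight) are assumptions chosen for the example, and real stores operate on high-dimensional embeddings rather than 2-D vectors.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query, docs, k=4):
    """Return indices of the k documents most similar to the query."""
    ranked = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)
    return ranked[:k]


def mmr_search(query, docs, k=4, lambda_mult=0.5):
    """Greedy Maximum Marginal Relevance: trade relevance against redundancy."""
    selected: list[int] = []
    candidates = list(range(len(docs)))

    def score(i: int) -> float:
        relevance = cosine(query, docs[i])
        # Similarity to the closest already-selected document
        redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    docs = [[1.0, 0.1], [1.0, 0.12], [1.0, -0.5]]  # the first two are near-duplicates
    print(similarity_search(query, docs, k=2))
    print(mmr_search(query, docs, k=2))
```

Similarity search returns both near-duplicate vectors, while MMR replaces the second with the less redundant third vector. That makes mmr worth trying when the top results repeat the same passage.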

    Using Document Store in Chatflows

    To use your document store in a chatflow:
    1. Add a Retriever node to your canvas
    2. Connect it to your Vector Store node
    3. In the Vector Store configuration, select your Document Store
    4. The chatflow will automatically use the embedded documents
    [User Query] → [Retriever] → [Vector Store (Document Store)] → [LLM] → [Response]
    

    Document Store API

    Flowise provides REST APIs for programmatic access to document stores.

    Upsert Documents via API

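    import json

    import requests

    STORE_ID = "your_store_id"  # fills the {storeId} path parameter
    API_URL = f"http://localhost:3000/api/v1/document-store/upsert/{STORE_ID}"
    API_KEY = "your_api_key_here"

    # Form fields sent alongside the file. Multipart form values are plain
    # strings, so the nested metadata object is serialized to JSON first.
    body_data = {
        "docId": "loader-id",
        "metadata": json.dumps({"source": "api"}),
        "replaceExisting": "true",
    }

    headers = {
        "Authorization": f"Bearer {API_KEY}"
    }

    # Upload the file; the context manager closes the handle after the request
    with open("document.pdf", "rb") as f:
        form_data = {"files": ("document.pdf", f)}
        response = requests.post(API_URL, files=form_data, data=body_data, headers=headers)

    response.raise_for_status()  # surface HTTP errors before parsing the body
    print(response.json())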
    

    List Document Stores

    curl -X GET http://localhost:3000/api/v1/document-store/store \
      -H "Authorization: Bearer <your_api_key>"
    

    Get Specific Document Store

    curl -X GET http://localhost:3000/api/v1/document-store/store/{storeId} \
      -H "Authorization: Bearer <your_api_key>"
    

    Delete Document Store

    curl -X DELETE http://localhost:3000/api/v1/document-store/store/{storeId} \
      -H "Authorization: Bearer <your_api_key>"
    
    For the complete API reference, see the Document Store API documentation.
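
The three curl calls above can also be wrapped in a small Python helper. The sketch below uses only the standard library and the endpoints shown in this section. The function names are hypothetical, and `send` assumes the endpoint returns a JSON body and that a Flowise instance is reachable at the base URL.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000/api/v1/document-store"


def build_request(method: str, path: str, api_key: str) -> urllib.request.Request:
    """Create (but do not send) an authenticated request for a Document Store endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )


def list_stores(api_key: str) -> urllib.request.Request:
    return build_request("GET", "/store", api_key)


def get_store(store_id: str, api_key: str) -> urllib.request.Request:
    return build_request("GET", f"/store/{store_id}", api_key)


def delete_store(store_id: str, api_key: str) -> urllib.request.Request:
    return build_request("DELETE", f"/store/{store_id}", api_key)


def send(request: urllib.request.Request):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the helpers easy to inspect: `send(list_stores("<your_api_key>"))` performs the call, while the builder functions can be checked without a running server.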

    Document Store Status

    Document stores have different statuses indicating their state:
    • NEW: Newly created, no documents added
    • STALE: Documents added but not yet processed
    • SYNCED: Documents processed and synced
    • UPSERTED: Documents embedded and stored in vector store
    • UPSERTING: Currently upserting to vector store
    • SYNC: Chunks are synced with vector store

    Best Practices

    Chunk Size Optimization

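    Choose chunk sizes based on your use case. The ranges below are given in tokens; character-based splitters such as RecursiveCharacterTextSplitter measure chunkSize in characters, so scale the values accordingly: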
    • Small chunks (200-500 tokens): Better for precise retrieval
    • Medium chunks (500-1000 tokens): Balanced approach
    • Large chunks (1000-2000 tokens): More context per chunk

    Metadata Strategy

    Add metadata to improve filtering and retrieval:
    {
      "source": "user_manual.pdf",
      "category": "technical_documentation",
      "version": "2.0",
      "date": "2024-03-01",
      "author": "technical_team"
    }
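
As a rough picture of how metadata like the example above supports filtering, the following Python sketch keeps only chunks whose metadata matches every requested field. It is a conceptual illustration: `matches` and `filter_chunks` are hypothetical helpers, and in practice filtering is performed by the configured vector store using its own filter syntax.

```python
def matches(metadata: dict, filters: dict) -> bool:
    """True when every filter key is present in metadata with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in filters.items())


def filter_chunks(chunks: list[dict], filters: dict) -> list[dict]:
    """Keep only chunks whose metadata satisfies all filters (exact match)."""
    return [chunk for chunk in chunks if matches(chunk.get("metadata", {}), filters)]


if __name__ == "__main__":
    chunks = [
        {"text": "Install steps", "metadata": {"category": "technical_documentation", "version": "2.0"}},
        {"text": "Old install steps", "metadata": {"category": "technical_documentation", "version": "1.0"}},
        {"text": "Pricing", "metadata": {"category": "sales"}},
    ]
    current = filter_chunks(chunks, {"category": "technical_documentation", "version": "2.0"})
    print([c["text"] for c in current])
```

Consistent keys and value formats (for example, always writing versions as strings) matter more than the number of fields, because exact-match filters silently miss chunks with inconsistent values.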
    

    Document Organization

    Create separate document stores for:
    • Different knowledge domains
    • Various access levels
    • Multiple languages
    • Distinct projects or clients

    Monitoring and Maintenance

    • Regularly review document store usage in chatflows
    • Monitor upsert history for errors
    • Update documents when source content changes
    • Clean up unused document stores

    Troubleshooting

    Chunks Not Appearing

    • Verify the document was processed successfully
    • Check that text splitter configuration is correct
    • Ensure file format is supported by the loader

    Upsert Failures

    • Verify embeddings provider credentials
    • Check vector store connection settings
    • Review API rate limits
    • Examine error logs in upsert history
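
When upserts fail because of provider rate limits, a common mitigation in custom scripts is to retry with exponential backoff. The sketch below shows the pattern in Python under stated assumptions: `RateLimitError` is a hypothetical stand-in for whatever exception your client raises on HTTP 429, and `with_backoff` is an illustrative helper rather than part of Flowise.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit (HTTP 429) error."""


def with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func, retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Delay doubles each attempt; jitter avoids synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter is injectable so the helper can be exercised without actually waiting. In a real script you would wrap the upsert request, for example `with_backoff(lambda: upload_batch(batch))`, where `upload_batch` is your own function.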

    Poor Retrieval Quality

    • Adjust chunk size and overlap
    • Try different text splitters
    • Experiment with embeddings models
    • Refine search parameters (Top K, search type)
