The /ingest endpoint processes documents and adds them to the RAG system's knowledge base, making them available for retrieval during question answering.
Endpoint
Request Body
filepath - Absolute path to the document file to ingest. Only Markdown (.md) files are currently supported.
Response
Returns a success message confirming that the document was ingested.
Example Request
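A minimal client sketch follows. It assumes the service is reachable at http://localhost:8000, that /ingest accepts a POST with a JSON body, that the body field is named filepath, and that the success message comes back as JSON. None of these details are confirmed by this page, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumptions (not confirmed by this page): the service listens on
# localhost:8000, /ingest takes a POST with a JSON body, and the body
# field is named "filepath".
INGEST_URL = "http://localhost:8000/ingest"


def build_ingest_request(filepath: str, url: str = INGEST_URL) -> urllib.request.Request:
    """Build a POST request whose JSON body carries the absolute file path."""
    body = json.dumps({"filepath": filepath}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ingest(filepath: str, url: str = INGEST_URL) -> dict:
    """Send the request and decode the (assumed JSON) success response."""
    with urllib.request.urlopen(build_ingest_request(filepath, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, ingest("/home/user/kb_docs/faq.md") would submit that file. The path is illustrative only.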
How Document Ingestion Works
- Document Loading - The system loads the document from the specified filepath using the Unstructured API
- Chunking - The document is split into smaller chunks for efficient retrieval
- Embedding - Each chunk is converted to a vector embedding
- Storage - Chunks and embeddings are stored in the ChromaDB vector database
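The four steps above can be traced end to end in a small, dependency-free sketch. Every component is a labeled stand-in rather than this system's code:

- Reading the file directly replaces the Unstructured API.
- A paragraph splitter replaces the real chunker.
- A toy hashed bag-of-words vector replaces the embedding model.
- An in-memory dictionary replaces ChromaDB.

All function and class names are hypothetical.

```python
import hashlib
import math
import re
from pathlib import Path

DIM = 64  # toy embedding width (hypothetical)


def load_document(path: str) -> str:
    """Stand-in for the Unstructured API: read the raw Markdown text."""
    return Path(path).read_text(encoding="utf-8")


def split_into_chunks(text: str) -> list[str]:
    """Stand-in chunker: one chunk per blank-line-separated paragraph."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def embed(text: str) -> list[float]:
    """Toy hashed bag-of-words embedding, L2-normalised (not a real model)."""
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryStore:
    """Stand-in for a ChromaDB collection: id -> (text, vector, metadata)."""

    def __init__(self) -> None:
        self.records: dict[str, tuple[str, list[float], dict]] = {}

    def upsert(self, chunk_id: str, text: str, vector: list[float], metadata: dict) -> None:
        self.records[chunk_id] = (text, vector, metadata)

    def query(self, question: str, k: int = 1) -> list[str]:
        """Return the k chunk texts with the highest cosine similarity."""
        q = embed(question)  # vectors are unit length, so dot product = cosine
        ranked = sorted(
            self.records.values(),
            key=lambda rec: sum(a * b for a, b in zip(q, rec[1])),
            reverse=True,
        )
        return [text for text, _, _ in ranked[:k]]


def ingest_file(path: str, store: InMemoryStore) -> int:
    """Load -> chunk -> embed -> store; returns the number of chunks written."""
    name = Path(path).name
    chunks = split_into_chunks(load_document(path))
    for i, chunk in enumerate(chunks):
        store.upsert(f"{name}-{i}", chunk, embed(chunk), {"document": name, "chunk_id": i})
    return len(chunks)
```

In this sketch, once chunks and vectors are stored, a question is embedded the same way and matched against the stored vectors. This is what makes ingested content available for retrieval.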
Configuration
Document ingestion requires the following environment variable:
- UNSTRUCTURED_API_KEY - API key for the Unstructured API
The Unstructured API is used for robust document parsing. Ensure your API key is set in the .env file.
Supported File Formats
Currently, the system supports:
- Markdown files (.md)
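You can check the configuration and format requirements above on the client before sending any request. The following hypothetical sketch looks for UNSTRUCTURED_API_KEY in the process environment or in a simple .env file, and checks for the .md suffix. The .env reader is a stand-in for whatever loader the system actually uses.

```python
import os
from pathlib import Path

REQUIRED_VARS = ("UNSTRUCTURED_API_KEY",)
SUPPORTED_SUFFIXES = (".md",)  # mirrors the Supported File Formats list


def read_dotenv(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper).

    Skips blank lines and # comments and strips surrounding quotes; unlike a
    full library such as python-dotenv, it does not handle export prefixes
    or multi-line values.
    """
    values: dict[str, str] = {}
    env_file = Path(path)
    if not env_file.is_file():
        return values
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def preflight(filepath: str, dotenv_path: str = ".env") -> list[str]:
    """Return problems that would make ingestion fail; an empty list means OK."""
    problems = []
    file_values = read_dotenv(dotenv_path)
    for name in REQUIRED_VARS:
        # Accept the key from either the process environment or the .env file.
        if not (os.environ.get(name) or file_values.get(name)):
            problems.append(f"{name} is not set in the environment or {dotenv_path}")
    if not filepath.endswith(SUPPORTED_SUFFIXES):
        problems.append(f"{filepath} is not a supported format {SUPPORTED_SUFFIXES}")
    return problems
```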
Bulk Ingestion
For ingesting multiple documents at once, use the CLI helper instead. It ingests the documents in the kb_docs/ folder.
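The CLI helper is the documented route for bulk ingestion. If you prefer to call the endpoint directly, one possible alternative is a loop over the folder. In this hypothetical sketch, send stands for any function that submits one path, such as a POST to /ingest. Only the top level of kb_docs/ is scanned, because this page does not say whether the helper recurses into subfolders.

```python
from pathlib import Path
from typing import Callable


def collect_markdown_files(folder: str = "kb_docs") -> list[str]:
    """Absolute paths of the .md files directly inside `folder`, sorted."""
    return sorted(str(p.resolve()) for p in Path(folder).glob("*.md") if p.is_file())


def ingest_folder(folder: str, send: Callable[[str], object]) -> dict[str, str]:
    """Call `send` once per Markdown file and record per-file outcomes."""
    results: dict[str, str] = {}
    for path in collect_markdown_files(folder):
        try:
            send(path)
            results[path] = "ok"
        except Exception as exc:  # record the failure and continue with the rest
            results[path] = f"error: {exc}"
    return results
```

Because a failure is recorded rather than raised, one malformed file does not stop the rest of the folder from being submitted.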
Chunking Strategy
Documents are chunked with:
- Configurable chunk size (default optimized for context windows)
- Overlap between chunks to preserve context
- Metadata preservation (document name, chunk ID)
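A sliding window is one common way to implement chunk size, overlap, and per-chunk metadata. The following sketch splits by characters and uses hypothetical defaults (chunk_size=1000, overlap=200). This page does not specify the system's actual unit (characters or tokens), its default values, or its chunk ID format.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(text: str, document_name: str,
                   chunk_size: int = 1000, overlap: int = 200) -> list[Chunk]:
    """Split text into windows of `chunk_size` chars that share `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks: list[Chunk] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(Chunk(
            text=text[start:end],
            # Metadata kept per chunk: source document and a chunk ID.
            metadata={"document": document_name,
                      "chunk_id": f"{document_name}-{len(chunks)}"},
        ))
        if end == len(text):  # this window reached the end of the document
            break
        start += step
    return chunks
```

With chunk_size=4 and overlap=1, "abcdefghij" becomes abcd, defg, ghij. Each window repeats the last character of the previous one, so text that straddles a boundary stays visible in both chunks.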
Best Practices
- File Paths - Use absolute paths to avoid ambiguity
- File Format - Ensure files are properly formatted Markdown
- Organization - Structure your knowledge base documents logically for better retrieval
- Updates - Re-ingest documents after making updates to reflect changes in the knowledge base
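Two of the practices above, using absolute paths and re-ingesting after updates, lend themselves to a small helper. This hypothetical sketch resolves paths before sending them and selects the files modified since they were last ingested. The caller must keep the last_ingested record (absolute path to Unix timestamp) itself; the endpoint is not documented to provide it.

```python
import os
from pathlib import Path


def to_absolute(filepath: str) -> str:
    """Resolve a possibly relative path (and ~) to an absolute path."""
    return str(Path(filepath).expanduser().resolve())


def changed_since_ingest(paths: list[str], last_ingested: dict[str, float]) -> list[str]:
    """Return absolute paths modified after their recorded ingest time,
    plus paths that have never been ingested."""
    stale = []
    for p in paths:
        abs_path = to_absolute(p)
        mtime = os.path.getmtime(abs_path)
        # Files missing from the record compare against -inf, so they count as new.
        if mtime > last_ingested.get(abs_path, float("-inf")):
            stale.append(abs_path)
    return stale
```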
Error Handling
Common errors:
- File not found - Verify the filepath is correct and absolute
- API key missing - Ensure UNSTRUCTURED_API_KEY is set
- Invalid format - Check that the file is a supported format
Error responses describe the problem in the detail field.
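If error responses are JSON objects that carry the detail field, a client can show that message instead of a bare status code. The exact body shape and status codes are assumptions here, and the helper names are hypothetical.

```python
import json
import urllib.error


def extract_detail(body: bytes) -> str:
    """Pull the message out of an error body shaped like {"detail": ...};
    fall back to the raw text when the body is not that shape."""
    text = body.decode("utf-8", errors="replace")
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return text
    if isinstance(payload, dict) and "detail" in payload:
        return str(payload["detail"])
    return text


def describe_http_error(err: urllib.error.HTTPError) -> str:
    """Format an HTTPError raised by urllib as 'HTTP <code>: <detail>'."""
    return f"HTTP {err.code}: {extract_detail(err.read())}"
```

In practice, you would call describe_http_error from an except urllib.error.HTTPError block wrapped around the urlopen call that submits the request.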