POST /ingest

The /ingest endpoint processes and adds documents to the RAG system’s knowledge base, making them available for retrieval during question answering.

Endpoint

POST /api/v1/ingest

Request Body

filepath (string, required) - Absolute path to the document file to ingest. Only Markdown (.md) files are currently supported.

Response

Returns a success message confirming the document was ingested.

Example Request

curl -X POST "http://localhost:8000/api/v1/ingest" \
  -H "Content-Type: application/json" \
  -d '{"filepath": "/path/to/my_doc.md"}'
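The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names build_ingest_request and ingest are hypothetical, and it assumes the success response is a JSON body, which this page does not specify.

```python
import json
import urllib.request

# URL taken from the curl example; adjust host and port for your deployment.
INGEST_URL = "http://localhost:8000/api/v1/ingest"


def build_ingest_request(filepath: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the ingest endpoint."""
    body = json.dumps({"filepath": filepath}).encode("utf-8")
    return urllib.request.Request(
        INGEST_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ingest(filepath: str, timeout: float = 300.0) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_ingest_request(filepath)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling ingest("/path/to/my_doc.md") requires a running server. The generous timeout is an illustrative choice, since the request only returns after processing finishes.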

How Document Ingestion Works

Document Loading - The system loads the document from the specified filepath using the Unstructured API
Chunking - The document is split into smaller chunks for efficient retrieval
Embedding - Each chunk is converted to a vector embedding
Storage - Chunks and embeddings are stored in the ChromaDB vector database
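The four steps above can be sketched as a small pipeline. This is an illustration of the flow only, not the project's implementation: the loader, chunker, embedder, and store are toy stand-ins passed in as arguments (no Unstructured API or ChromaDB calls are made), and every name in the block is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoredChunk:
    chunk_id: str
    text: str
    embedding: list[float]


def ingest_document(
    filepath: str,
    load: Callable[[str], str],
    chunk: Callable[[str], list[str]],
    embed: Callable[[str], list[float]],
    store: dict[str, StoredChunk],
) -> int:
    """Run load -> chunk -> embed -> store and return the number of chunks stored."""
    text = load(filepath)                    # 1. Document Loading
    pieces = chunk(text)                     # 2. Chunking
    for index, piece in enumerate(pieces):
        vector = embed(piece)                # 3. Embedding
        chunk_id = f"{filepath}#{index}"
        store[chunk_id] = StoredChunk(chunk_id, piece, vector)  # 4. Storage
    return len(pieces)


# Toy stand-ins so the sketch runs without any external service.
fake_files = {"/docs/a.md": "# Title\n\nFirst paragraph.\n\nSecond paragraph."}
store: dict[str, StoredChunk] = {}
count = ingest_document(
    "/docs/a.md",
    load=fake_files.__getitem__,
    chunk=lambda text: [p for p in text.split("\n\n") if p.strip()],
    embed=lambda piece: [float(len(piece)), float(piece.count(" "))],
    store=store,
)
```

Passing each stage as a callable keeps the order of operations visible while leaving the real parsing, embedding, and storage backends out of the example.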

Configuration

Document ingestion requires the following environment variable:

UNSTRUCTURED_API_KEY=your_api_key

The Unstructured API is used for robust document parsing. Ensure your API key is set in the .env file.
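A startup check can surface a missing key before any request is made. The sketch below is an assumption about how such a check might look; require_unstructured_api_key is a hypothetical helper, and it reads from the process environment, so the .env file must already have been loaded by whatever mechanism the application uses.

```python
import os
from typing import Mapping, Optional


def require_unstructured_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get("UNSTRUCTURED_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "UNSTRUCTURED_API_KEY is not set; add it to your .env file"
        )
    return key
```

Accepting an optional mapping makes the check easy to exercise without modifying the real environment.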

Supported File Formats

Currently, the system supports:

Markdown files (.md)

Additional formats may be supported in future releases.

Bulk Ingestion

For ingesting multiple documents at once, use the CLI helper instead:

uv run -m src.rag.ingest

This ingests all Markdown files in the kb_docs/ folder.
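To preview which files a bulk run would likely pick up, the folder can be scanned directly. This sketch is not the CLI's code: find_markdown_files is a hypothetical helper, and whether the real command searches subfolders is an assumption made here.

```python
from pathlib import Path


def find_markdown_files(folder: str = "kb_docs") -> list[Path]:
    """Return absolute paths of all .md files under folder, in sorted order."""
    root = Path(folder)
    # rglob searches subfolders too; the real CLI may only read the top level.
    return sorted(p.resolve() for p in root.rglob("*.md") if p.is_file())
```

The returned paths are absolute, so each one could also be sent individually as the filepath field of a POST /api/v1/ingest request.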

Chunking Strategy

Documents are chunked with:

Configurable chunk size (default optimized for context windows)
Overlap between chunks to preserve context
Metadata preservation (document name, chunk ID)
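The three properties above can be illustrated with a fixed-size, character-based splitter. This is a sketch of the general technique, not the system's actual chunker: the function name, the character-based measure, and the default values of 200 and 40 are all assumptions chosen for illustration.

```python
def chunk_text(
    text: str,
    doc_name: str,
    chunk_size: int = 200,  # illustrative default, not the system's value
    overlap: int = 40,      # illustrative default, not the system's value
) -> list[dict]:
    """Split text into overlapping chunks, each tagged with document name and chunk ID."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for chunk_id, start in enumerate(range(0, len(text), step)):
        chunks.append(
            {
                "document": doc_name,
                "chunk_id": chunk_id,
                "text": text[start:start + chunk_size],
            }
        )
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks
```

For example, chunk_text("abcdefghij", "d.md", chunk_size=4, overlap=2) yields the texts "abcd", "cdef", "efgh", and "ghij": each chunk repeats the last two characters of the previous one, which is how overlap preserves context across chunk boundaries.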

The endpoint returns only after processing is complete, so ingesting a large document can take a while, depending on file size and Unstructured API response time.

Best Practices

File Paths - Use absolute paths to avoid ambiguity
File Format - Ensure files are properly formatted Markdown
Organization - Structure your knowledge base documents logically for better retrieval
Updates - Re-ingest documents after making updates to reflect changes in the knowledge base
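The first two practices can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: validate_filepath is a hypothetical helper, paths are treated as POSIX-style, and the extension match is exact (lowercase .md), since this page does not say how the server treats other spellings.

```python
from pathlib import PurePosixPath

# Only Markdown is documented as supported at the moment.
SUPPORTED_SUFFIXES = {".md"}


def validate_filepath(filepath: str) -> list[str]:
    """Return a list of problems with filepath; an empty list means it looks fine."""
    problems = []
    path = PurePosixPath(filepath)
    if not path.is_absolute():
        problems.append("filepath must be absolute")
    if path.suffix not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported file format: {path.suffix or '(none)'}")
    return problems
```

This only inspects the string; it does not confirm that the file exists on the server, which still needs to be able to read the path.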

Error Handling

Common errors:

File not found - Verify the filepath is correct and absolute
API key missing - Ensure UNSTRUCTURED_API_KEY is set
Invalid format - Check that the file is a supported format

All errors return status code 500 with a descriptive message in the detail field.
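A client can turn such a response into a readable message by pulling out the detail field. This sketch assumes the error body is JSON shaped like {"detail": "..."}; describe_ingest_error is a hypothetical helper, and it falls back to the raw body when that assumption does not hold.

```python
import json


def describe_ingest_error(status: int, body: bytes) -> str:
    """Build a readable message from an error response's status code and body."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        payload = None  # body was not valid JSON
    detail = payload.get("detail") if isinstance(payload, dict) else None
    if detail is None:
        # Fall back to the raw body so no information is lost.
        detail = body.decode("utf-8", errors="replace") or "no response body"
    return f"ingest failed (HTTP {status}): {detail}"
```

With the standard library's urllib, the inputs would come from a caught urllib.error.HTTPError as err.code and err.read().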
