Upload a document file and optionally process it for embedding into a knowledge base. The file is chunked, embedded, and stored in the vector database for semantic search.
Request
- file: The document file to upload, sent as a multipart form field. Supported formats depend on your configuration (PDF, DOCX, TXT, Markdown, etc.)
- metadata: JSON string or object with additional metadata about the file. Can include custom fields for your application.
Query Parameters
- process: Whether to process the file for RAG (extract text, chunk, and embed)
- Whether to process the file asynchronously in the background
Headers
- Authorization: Bearer token for authentication
Response
- status: Whether the upload was successful
- id: Unique identifier for the uploaded file
- filename: Original filename of the uploaded file
- path: Storage path of the uploaded file
- data: File processing data
  - status: Processing status: "pending", "completed", or "failed"
  - Extracted text content (after processing)
  - Error message if processing failed
- meta: File metadata, including custom metadata provided during upload
- user_id: ID of the user who uploaded the file
- created_at: Unix timestamp when the file was uploaded
- updated_at: Unix timestamp when the file was last updated
curl -X POST "https://your-domain.com/api/v1/files/?process=true" \
-H "Authorization: Bearer YOUR_TOKEN" \
-F "file=@/path/to/document.pdf" \
-F 'metadata={"source":"user_upload","category":"documentation"}'
200 - Success (Background Processing)
200 - Success (Completed)
400 - Invalid File Type
{
  "status": true,
  "id": "file_abc123",
  "filename": "product_guide.pdf",
  "path": "files/abc123_product_guide.pdf",
  "data": {
    "status": "pending"
  },
  "meta": {
    "name": "product_guide.pdf",
    "content_type": "application/pdf",
    "size": 245678,
    "data": {
      "source": "user_upload",
      "category": "documentation"
    }
  },
  "user_id": "user_456def",
  "created_at": 1678901234,
  "updated_at": 1678901234
}
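A client can branch on the data.status field of this response to decide what to do next. A minimal sketch in Python (the helper name and return values are illustrative; the field names come from the example response above):

```python
import json

def next_step(upload_response: str) -> str:
    """Decide what to do after an upload, based on data.status.

    Field names ("data", "status") follow the example response above;
    the returned action labels are illustrative.
    """
    body = json.loads(upload_response)
    status = body.get("data", {}).get("status")
    if status == "completed":
        return "ready"          # chunks are embedded; file can be queried
    if status == "pending":
        return "poll"           # background processing still running
    return "inspect_error"      # "failed" or an unexpected value

# Example: the background-processing response shown above
resp = '{"status": true, "id": "file_abc123", "data": {"status": "pending"}}'
print(next_step(resp))  # → poll
```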
Add File to Knowledge Base
After uploading a file, add it to a knowledge base:
POST /api/v1/knowledge/{knowledge_id}/file/add
Request Body
- file_id: ID of the uploaded file to add to the knowledge base
curl -X POST "https://your-domain.com/api/v1/knowledge/kb_123/file/add" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"file_id": "file_abc123"}'
Batch Upload Files
Upload multiple files to a knowledge base at once:
POST /api/v1/knowledge/{knowledge_id}/files/batch/add
Request Body
[
  { "file_id": "file_1" },
  { "file_id": "file_2" },
  { "file_id": "file_3" }
]
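The batch body is simply a list of file_id objects, so it can be built directly from the IDs returned by earlier uploads. A small sketch of the payload construction (pure data shaping; the endpoint URL and auth header are as in the earlier curl examples):

```python
import json

def batch_add_payload(file_ids):
    """Build the request body for the batch/add endpoint from file IDs."""
    return [{"file_id": fid} for fid in file_ids]

payload = batch_add_payload(["file_1", "file_2", "file_3"])
print(json.dumps(payload))
# → [{"file_id": "file_1"}, {"file_id": "file_2"}, {"file_id": "file_3"}]
```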
Processing Pipeline
1. Upload: File is stored and assigned a unique ID
2. Extraction: Text content is extracted based on file type (PDF, DOCX, etc.)
3. Chunking: Content is split into chunks (configured via CHUNK_SIZE and CHUNK_OVERLAP)
4. Embedding: Each chunk is embedded using the configured embedding model
5. Storage: Embeddings are stored in the vector database for retrieval
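The chunking step can be pictured as a sliding window over the extracted text. A simplified character-based sketch (real splitters are usually token- or separator-aware; the parameters stand in for the configured CHUNK_SIZE and CHUNK_OVERLAP values):

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50):
    """Split text into overlapping chunks, as in the chunking step above.

    A naive character-window version; production splitters typically
    respect token counts and sentence or paragraph boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # → ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk shares its last chunk_overlap characters with the start of the next, so sentences spanning a boundary still appear intact in at least one chunk.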
Monitoring Processing Status
Check the processing status of a file:
GET /api/v1/files/{file_id}/process/status?stream=true
This returns a Server-Sent Events (SSE) stream with status updates:
data: {"status": "pending"}
data: {"status": "completed"}
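Each event in the stream is a data: line carrying a JSON payload. Consuming such a stream can be sketched as below (pure parsing over already-received lines; connect with whatever HTTP client you normally use):

```python
import json

def parse_sse_events(lines):
    """Yield the JSON payload of each 'data:' line in an SSE stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())

stream = [
    'data: {"status": "pending"}',
    '',                               # SSE events are separated by blank lines
    'data: {"status": "completed"}',
]
statuses = [event["status"] for event in parse_sse_events(stream)]
print(statuses)  # → ['pending', 'completed']
```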
Notes
- Supported file types are configurable via ALLOWED_FILE_EXTENSIONS
- Maximum file size is controlled by the FILE_MAX_SIZE setting
- Processing extracts text using various engines (PyMuPDF, Tika, Docling, etc.)
- Audio files are transcribed using the configured STT engine
- Files are automatically chunked and embedded if process=true
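A client can pre-validate filenames against the allowed extensions before uploading, mirroring the ALLOWED_FILE_EXTENSIONS check above. A hedged sketch (the allowed set here is illustrative, not the server's actual default):

```python
from pathlib import Path

# Illustrative set; the real value comes from the ALLOWED_FILE_EXTENSIONS setting.
ALLOWED_FILE_EXTENSIONS = {"pdf", "docx", "txt", "md"}

def is_allowed(filename: str) -> bool:
    """Return True if the file's extension is in the allowed set."""
    ext = Path(filename).suffix.lstrip(".").lower()
    return ext in ALLOWED_FILE_EXTENSIONS

print(is_allowed("product_guide.pdf"))  # → True
print(is_allowed("archive.zip"))        # → False
```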