POST /api/v1/files/
curl -X POST "https://your-domain.com/api/v1/files/?process=true" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@/path/to/document.pdf" \
  -F 'metadata={"source":"user_upload","category":"documentation"}'
{
  "status": true,
  "id": "file_abc123",
  "filename": "product_guide.pdf",
  "path": "files/abc123_product_guide.pdf",
  "data": {
    "status": "pending"
  },
  "meta": {
    "name": "product_guide.pdf",
    "content_type": "application/pdf",
    "size": 245678,
    "data": {
      "source": "user_upload",
      "category": "documentation"
    }
  },
  "user_id": "user_456def",
  "created_at": 1678901234,
  "updated_at": 1678901234
}
Upload a document file and optionally process it for embedding into a knowledge base. When processing is enabled, the file's text is extracted, chunked, embedded, and stored in the vector database for semantic search.
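A Python equivalent of the curl example above, using only the standard library. This is a minimal sketch: the domain, token, and metadata values are the same placeholders as in the curl call, and the multipart builder is hand-rolled for illustration rather than a production client.

```python
import json
import uuid
import urllib.request

def build_multipart(file_name, file_bytes, metadata, boundary=None):
    """Assemble a multipart/form-data body with `file` and `metadata` parts."""
    boundary = boundary or uuid.uuid4().hex
    parts = [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="metadata"',
        b"",
        json.dumps(metadata).encode(),
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"

# In-memory example bytes stand in for reading the file from disk.
body, content_type = build_multipart(
    "document.pdf",
    b"%PDF-1.4 example bytes",
    {"source": "user_upload", "category": "documentation"},
)
req = urllib.request.Request(
    "https://your-domain.com/api/v1/files/?process=true",
    data=body,
    headers={
        "Authorization": "Bearer YOUR_TOKEN",
        "Content-Type": content_type,
    },
    method="POST",
)
# urllib.request.urlopen(req) would perform the actual upload.
```

In practice a library such as `requests` handles the multipart encoding for you; the sketch just makes the wire format of the two form fields explicit.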

Request

Form Data

file
file
required
The document file to upload. Supported formats depend on your configuration (PDF, DOCX, TXT, Markdown, etc.)
metadata
string | object
JSON string or object with additional metadata about the file. Can include custom fields for your application.

Query Parameters

process
boolean
default: true
Whether to process the file for RAG (extract text, chunk, and embed)
process_in_background
boolean
default: true
Whether to process the file asynchronously in the background

Headers

Authorization
string
required
Bearer token for authentication

Response

status
boolean
Whether the upload was successful
id
string
Unique identifier for the uploaded file
filename
string
Original filename of the uploaded file
path
string
Storage path of the uploaded file
data
object
File processing data
status
string
Processing status: "pending", "completed", or "failed"
content
string
Extracted text content (after processing)
error
string
Error message if processing failed
meta
object
File metadata
name
string
Display name of the file
content_type
string
MIME type of the file
size
integer
File size in bytes
data
object
Custom metadata provided during upload
user_id
string
ID of the user who uploaded the file
created_at
integer
Unix timestamp when the file was uploaded
updated_at
integer
Unix timestamp when the file was last updated

Add File to Knowledge Base

After uploading a file, add it to a knowledge base:
POST /api/v1/knowledge/{knowledge_id}/file/add

Request Body

file_id
string
required
ID of the uploaded file to add to the knowledge base
curl -X POST "https://your-domain.com/api/v1/knowledge/kb_123/file/add" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"file_id": "file_abc123"}'

Batch Upload Files

Upload multiple files to a knowledge base at once:
POST /api/v1/knowledge/{knowledge_id}/files/batch/add

Request Body

[
  {"file_id": "file_1"},
  {"file_id": "file_2"},
  {"file_id": "file_3"}
]
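The batch body is a plain JSON array of `file_id` objects. A small sketch of building it in Python from a list of uploaded file IDs (the IDs are the placeholders from the example above):

```python
import json

file_ids = ["file_1", "file_2", "file_3"]
payload = json.dumps([{"file_id": fid} for fid in file_ids])
# POST `payload` to /api/v1/knowledge/{knowledge_id}/files/batch/add
# with Content-Type: application/json and the usual Bearer token.
```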

Processing Pipeline

  1. Upload: File is stored and assigned a unique ID
  2. Extraction: Text content is extracted based on file type (PDF, DOCX, etc.)
  3. Chunking: Content is split into chunks (configured via CHUNK_SIZE and CHUNK_OVERLAP)
  4. Embedding: Each chunk is embedded using the configured embedding model
  5. Storage: Embeddings are stored in the vector database for retrieval
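The chunking step (3) can be sketched as a fixed-size sliding window. `chunk_size` and `chunk_overlap` here stand in for the `CHUNK_SIZE` and `CHUNK_OVERLAP` settings named above; the character-based splitting is an assumption for illustration, since real extractors may split on tokens or sentence boundaries instead.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters, each overlapping
    its predecessor by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]
```

Each chunk is then passed to the embedding model (step 4) and written to the vector database (step 5). The overlap ensures a sentence straddling a chunk boundary is still retrievable from at least one chunk.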

Monitoring Processing Status

Check the processing status of a file:
GET /api/v1/files/{file_id}/process/status?stream=true
This returns a Server-Sent Events (SSE) stream with status updates:
data: {"status": "pending"}
data: {"status": "completed"}
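A minimal sketch of consuming those status updates in Python. Only the `data:` lines of the SSE format are parsed; connection handling and reconnection logic are omitted.

```python
import json

def parse_status_events(stream_text: str) -> list[dict]:
    """Extract the JSON payload from each `data:` line of an SSE body."""
    events = []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

events = parse_status_events('data: {"status": "pending"}\n\ndata: {"status": "completed"}\n')
# The final event indicates whether processing ended in "completed" or "failed".
```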

Notes

  • Supported file types are configurable via ALLOWED_FILE_EXTENSIONS
  • Maximum file size is controlled by FILE_MAX_SIZE setting
  • Processing extracts text using various engines (PyMuPDF, Tika, Docling, etc.)
  • Audio files are transcribed using the configured STT engine
  • Files are automatically chunked and embedded if process=true
