
POST /api/batch/start

Start a new batch processing job. This is an alternative to the WebSocket endpoint for scenarios where you want to start a job and poll for status updates.

Authentication

Requires a valid JWT access token in the Authorization header.

Request Body

documents (array, required): Array of documents to process
config (object, required): Tagging configuration
column_mapping (object, optional): Column mapping for custom CSV structures
job_id (string, optional): Client-generated job ID. The server generates one if not provided.

Response

job_id (string): Unique identifier for the batch job
total_documents (integer): Number of documents to process
message (string): Success message

Example Request

cURL
curl -X POST http://localhost:8000/api/batch/start \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {
        "title": "Document 1",
        "file_path": "https://example.com/doc1.pdf",
        "file_source_type": "url"
      }
    ],
    "config": {
      "api_key": "sk-xxx",
      "model_name": "openai/gpt-4o-mini",
      "num_pages": 3,
      "num_tags": 8
    }
  }'

Example Response

{
  "job_id": "batch-1704454800-abc123",
  "total_documents": 1,
  "message": "Batch processing started"
}
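
For script-based callers, the same request can be sent from JavaScript. The following is a minimal sketch, assuming a runtime with a global fetch (Node 18+ or a browser). The helper names buildStartRequest and startBatchJob are hypothetical and not part of the API; the base URL is the localhost address used in the cURL example.

```javascript
// Hypothetical helper: builds the URL and fetch options for POST /api/batch/start.
function buildStartRequest(baseUrl, token, documents, config, jobId) {
  const body = { documents, config };
  // job_id is optional; the server generates one when it is omitted.
  if (jobId !== undefined) {
    body.job_id = jobId;
  }
  return {
    url: `${baseUrl}/api/batch/start`,
    options: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}

// Hypothetical helper: sends the request and returns the parsed response body,
// which is documented to contain job_id, total_documents, and message.
async function startBatchJob(baseUrl, token, documents, config, jobId) {
  const { url, options } = buildStartRequest(baseUrl, token, documents, config, jobId);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Failed to start batch job: HTTP ${response.status}`);
  }
  return response.json();
}
```

Keeping the request construction separate from the network call makes the payload easy to inspect or log before it is sent.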

GET /api/batch/jobs/{job_id}/status

Get the current status of a batch processing job.

Authentication

Requires a valid JWT access token in the Authorization header.

Path Parameters

job_id (string, required): Job identifier

Response

job_id (string): Job identifier
status (string): Current status: pending, processing, completed, failed, cancelled, or paused
total_documents (integer): Total documents in job
processed_count (integer): Number of processed documents
failed_count (integer): Number of failed documents
progress (number): Progress as a fraction from 0.0 to 1.0 (multiply by 100 for a percentage)

Example Request

cURL
curl -X GET http://localhost:8000/api/batch/jobs/batch-1704454800-abc123/status \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
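
A client usually needs to decide from the status payload whether a job can still change and how far along it is. The sketch below shows one way to do that. The helper names are hypothetical, the sample payload is illustrative rather than captured server output, and treating completed, failed, and cancelled as the only terminal statuses is an assumption based on the documented status list.

```javascript
// Assumption: these are the statuses after which a job no longer changes.
const TERMINAL_STATUSES = new Set(['completed', 'failed', 'cancelled']);

// Returns true when the job will not make further progress.
function isTerminal(status) {
  return TERMINAL_STATUSES.has(status);
}

// Formats a status payload as a one-line summary.
// progress is a fraction (0.0 - 1.0), so it is scaled to a percentage here.
function describeStatus(job) {
  const percent = (job.progress * 100).toFixed(1);
  return `${job.job_id}: ${job.status}, ${percent}% ` +
    `(${job.processed_count}/${job.total_documents} processed, ${job.failed_count} failed)`;
}

// Illustrative payload using the documented response fields.
const sampleStatus = {
  job_id: 'batch-1704454800-abc123',
  status: 'processing',
  total_documents: 100,
  processed_count: 45,
  failed_count: 2,
  progress: 0.45,
};

console.log(describeStatus(sampleStatus));
// prints "batch-1704454800-abc123: processing, 45.0% (45/100 processed, 2 failed)"
console.log(isTerminal(sampleStatus.status)); // prints false
```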

POST /api/batch/jobs/{job_id}/cancel

Cancel a running batch processing job.

Authentication

Requires a valid JWT access token in the Authorization header.

Path Parameters

job_id (string, required): Job identifier

Response

message (string): Confirmation message
job_id (string): Job identifier

Example Request

cURL
curl -X POST http://localhost:8000/api/batch/jobs/batch-1704454800-abc123/cancel \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Cancelled jobs cannot be resumed. Documents that were already processed retain their results.

POST /api/batch/jobs/{job_id}/pause

Pause a running batch processing job. Processing will stop after the current document completes.

Authentication

Requires a valid JWT access token in the Authorization header.

Path Parameters

job_id (string, required): Job identifier

Response

message (string): Confirmation message

Example Request

cURL
curl -X POST http://localhost:8000/api/batch/jobs/batch-1704454800-abc123/pause \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Paused jobs can be resumed later. This is useful for managing API rate limits or temporarily stopping processing.

POST /api/batch/jobs/{job_id}/resume

Resume a paused batch processing job.

Authentication

Requires a valid JWT access token in the Authorization header.

Path Parameters

job_id (string, required): Job identifier

Response

message (string): Confirmation message

Example Request

cURL
curl -X POST http://localhost:8000/api/batch/jobs/batch-1704454800-abc123/resume \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
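
The pause, resume, and cancel endpoints share the same request shape, so a single helper can cover all three. The sketch below is one possible approach, assuming a runtime with a global fetch; buildControlRequest, controlJob, and pauseFor are hypothetical names, and the fixed-delay pause is only an illustration of the rate-limit use case, not a recommended back-off policy.

```javascript
const JOB_ACTIONS = ['pause', 'resume', 'cancel'];

// Hypothetical helper: builds the request for POST /api/batch/jobs/{job_id}/{action}.
function buildControlRequest(baseUrl, token, jobId, action) {
  if (!JOB_ACTIONS.includes(action)) {
    throw new Error(`Unknown job action: ${action}`);
  }
  return {
    url: `${baseUrl}/api/batch/jobs/${encodeURIComponent(jobId)}/${action}`,
    options: {
      method: 'POST',
      headers: { 'Authorization': `Bearer ${token}` },
    },
  };
}

// Hypothetical helper: performs the action and returns the parsed confirmation.
async function controlJob(baseUrl, token, jobId, action) {
  const { url, options } = buildControlRequest(baseUrl, token, jobId, action);
  const response = await fetch(url, options);
  if (response.status === 409) {
    // 409 is documented as "Job is in invalid state for this operation".
    throw new Error(`Cannot ${action} job ${jobId}: job is in an invalid state`);
  }
  if (!response.ok) {
    throw new Error(`Failed to ${action} job ${jobId}: HTTP ${response.status}`);
  }
  return response.json();
}

// Pauses a job, waits delayMs milliseconds, then resumes it.
async function pauseFor(baseUrl, token, jobId, delayMs) {
  await controlJob(baseUrl, token, jobId, 'pause');
  await new Promise((resolve) => setTimeout(resolve, delayMs));
  return controlJob(baseUrl, token, jobId, 'resume');
}
```

For example, pauseFor('http://localhost:8000', token, 'batch-1704454800-abc123', 60000) would pause the job for one minute before resuming it.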

GET /api/batch/active

Get a list of all active (running or paused) batch jobs for the current user.

Authentication

Requires a valid JWT access token in the Authorization header.

Response

jobs (array): Array of active job summaries

Example Request

cURL
curl -X GET http://localhost:8000/api/batch/active \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Example Response

{
  "jobs": [
    {
      "job_id": "batch-1704454800-abc123",
      "status": "processing",
      "total_documents": 100,
      "processed_count": 45,
      "started_at": "2024-01-15T10:30:00Z"
    }
  ]
}
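
The job summaries in this response carry counts rather than the progress field, so a client that wants a percentage has to derive it. The sketch below shows one way to do so; summarizeActiveJobs and fetchActiveJobs are hypothetical names. It assumes processed_count divided by total_documents is a reasonable progress measure, which the API does not state explicitly (for example, whether failed documents are included in processed_count).

```javascript
// Hypothetical helper: converts the /api/batch/active payload into compact summaries.
function summarizeActiveJobs(payload) {
  return payload.jobs.map((job) => {
    // Guard against division by zero for empty jobs.
    const fraction = job.total_documents > 0
      ? job.processed_count / job.total_documents
      : 0;
    return {
      jobId: job.job_id,
      status: job.status,
      percent: Math.round(fraction * 100),
    };
  });
}

// Hypothetical helper: fetches active jobs for the current user and summarizes them.
async function fetchActiveJobs(baseUrl, token) {
  const response = await fetch(`${baseUrl}/api/batch/active`, {
    headers: { 'Authorization': `Bearer ${token}` },
  });
  if (!response.ok) {
    throw new Error(`Failed to list active jobs: HTTP ${response.status}`);
  }
  return summarizeActiveJobs(await response.json());
}
```

Applied to the example response above, summarizeActiveJobs returns a single entry with status "processing" and percent 45.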

Job Control Workflow

1. Start a job
   Use POST /api/batch/start to begin processing.

2. Monitor progress
   Poll GET /api/batch/jobs/{job_id}/status for updates.

3. Control execution
   Use pause/resume/cancel as needed:
  • Pause: Temporarily stop (e.g., for rate limit management)
  • Resume: Continue from where it paused
  • Cancel: Stop permanently

4. Check active jobs
   Use GET /api/batch/active to see all active (running or paused) jobs.
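
A client can avoid some 409 Conflict responses by checking the last known status before sending a control request. The table in the sketch below is an assumption derived only from the endpoint descriptions (pause and cancel apply to a running job, resume applies to a paused job); the server remains the source of truth and may accept other transitions, such as cancelling a paused or pending job. The name canPerform is hypothetical.

```javascript
// Assumed mapping from job status to the control actions described for it.
// Deliberately conservative: statuses not listed here allow no actions.
const ALLOWED_ACTIONS = new Map([
  ['processing', ['pause', 'cancel']],
  ['paused', ['resume']],
]);

// Returns true when the action is expected to be valid for the given status.
function canPerform(status, action) {
  const actions = ALLOWED_ACTIONS.get(status) || [];
  return actions.includes(action);
}

console.log(canPerform('processing', 'pause')); // prints true
console.log(canPerform('cancelled', 'resume')); // prints false
```

A check like this only reduces avoidable requests. The status can change between the poll and the control call, so a 409 response still has to be handled.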

Polling vs WebSocket

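Polling pros:
  • Simple HTTP requests
  • Works with any HTTP client
  • No persistent connection needed
Polling cons:
  • Higher latency (bounded by the polling interval)
  • More network overhead
  • No real-time updates
Use polling when:
  • Running batch jobs from scripts
  • Integrating backend-to-backend
  • A firewall blocks WebSocket connections

// Poll every 2-5 seconds for active jobs
const pollInterval = 3000; // 3 seconds

// Statuses after which the job will not change again
const terminalStatuses = ['completed', 'failed', 'cancelled'];

// `token` is the JWT access token sent in the Authorization header
function pollJobStatus(jobId, token) {
  const interval = setInterval(async () => {
    try {
      const response = await fetch(
        `http://localhost:8000/api/batch/jobs/${jobId}/status`,
        {
          headers: {
            'Authorization': `Bearer ${token}`
          }
        }
      );
      if (!response.ok) {
        throw new Error(`Status request failed: HTTP ${response.status}`);
      }
      const status = await response.json();

      // progress is a fraction between 0.0 and 1.0
      console.log(`Progress: ${(status.progress * 100).toFixed(1)}%`);

      if (terminalStatuses.includes(status.status)) {
        clearInterval(interval);
        console.log('Job finished:', status.status);
      }
    } catch (error) {
      // Stop polling on errors so a failing request is not repeated forever
      clearInterval(interval);
      console.error('Polling stopped:', error.message);
    }
  }, pollInterval);
}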

Error Responses

Status  Description
400     Bad Request - Invalid job ID or parameters
401     Unauthorized - Invalid or missing token
404     Not Found - Job does not exist
409     Conflict - Job is in invalid state for this operation
500     Internal Server Error
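
Client code can turn these status codes into readable errors in one place. The sketch below is a minimal illustration: describeError and ensureOk are hypothetical names, the descriptions are copied from the table above, and the fallback message for unlisted codes is an assumption. The shape of the error response body is not documented here, so the sketch relies on the status code only.

```javascript
// Descriptions taken from the error table above.
const ERROR_DESCRIPTIONS = new Map([
  [400, 'Bad Request - Invalid job ID or parameters'],
  [401, 'Unauthorized - Invalid or missing token'],
  [404, 'Not Found - Job does not exist'],
  [409, 'Conflict - Job is in invalid state for this operation'],
  [500, 'Internal Server Error'],
]);

// Returns the documented description, or a generic message for other codes.
function describeError(statusCode) {
  return ERROR_DESCRIPTIONS.get(statusCode) ?? `Unexpected HTTP status ${statusCode}`;
}

// Throws a descriptive error for non-2xx responses; otherwise returns the response.
// Accepts any object with `ok` and `status`, such as a fetch Response.
function ensureOk(response) {
  if (!response.ok) {
    const error = new Error(describeError(response.status));
    error.status = response.status; // keep the code for callers that branch on it
    throw error;
  }
  return response;
}

console.log(describeError(404)); // prints "Not Found - Job does not exist"
```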
