The Meta-Data Tag Generator provides a comprehensive REST API for processing PDF documents and extracting AI-powered metadata tags. The API is built with FastAPI and supports both synchronous and real-time WebSocket operations.

Base URL

http://localhost:8000
All API endpoints are prefixed with /api except the root endpoint.

API Version

Current version: 2.0.0
You can check the API version and status at the root endpoint:
curl http://localhost:8000
Response:
{
  "message": "Document Meta-Tagging API",
  "version": "2.0.0"
}
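The same check can be scripted. The sketch below uses only the Python standard library and assumes the root endpoint returns the `message` and `version` fields shown above. The names `get_api_info` and `is_v2`, and the injectable `opener` parameter, are illustrative and not part of the API.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # base URL documented on this page


def get_api_info(base_url: str = BASE_URL, opener=urlopen) -> dict:
    """Fetch the root endpoint and return its JSON body as a dict.

    `opener` is a hypothetical convenience hook so the function can be
    exercised without a running server.
    """
    with opener(base_url + "/") as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_v2(info: dict) -> bool:
    """Return True when the reported version is in the 2.x series."""
    return str(info.get("version", "")).split(".")[0] == "2"
```

Against a running server, `is_v2(get_api_info())` gives a quick guard before calling endpoints that were added in version 2.0.0.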

API Structure

The API is organized into the following sections:

Single Document

Process individual PDF files with AI-powered tagging

Batch Processing

Process multiple documents with real-time progress updates

User Management

User registration, login, and token management

History & Jobs

View processing history, job details, and user statistics

Health & Status

System health checks and monitoring endpoints

Authentication

Most endpoints require JWT authentication. See the Authentication page for details on obtaining and using access tokens.
Public endpoints (no authentication required):
  • POST /api/auth/register - User registration
  • POST /api/auth/login - User login
  • GET /api/health - Health check
  • GET /api/status - Status check
Protected endpoints (require JWT token):
  • All /api/single/* endpoints
  • All /api/batch/* endpoints
  • GET /api/auth/me - Get current user
  • POST /api/auth/refresh - Refresh access token
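A client can encode the public/protected split so that a missing token fails early instead of producing a 401. This is a minimal sketch: `PUBLIC_ENDPOINTS` and `auth_headers` are hypothetical names, and it assumes that any endpoint not listed as public above needs a `Bearer` token.

```python
from typing import Optional

# Public endpoints as listed above: (method, path) pairs that need no token.
PUBLIC_ENDPOINTS = {
    ("POST", "/api/auth/register"),
    ("POST", "/api/auth/login"),
    ("GET", "/api/health"),
    ("GET", "/api/status"),
}


def auth_headers(method: str, path: str, token: Optional[str] = None) -> dict:
    """Return the auth headers to send for a request.

    Public endpoints get no Authorization header. Everything else is
    treated as protected (an assumption) and requires a JWT access token.
    """
    if (method.upper(), path) in PUBLIC_ENDPOINTS:
        return {}
    if not token:
        raise ValueError(f"{method} {path} requires a JWT access token")
    return {"Authorization": f"Bearer {token}"}
```

For example, `auth_headers("GET", "/api/auth/me", token)` returns the header to attach, while `auth_headers("GET", "/api/health")` returns an empty dict.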

Request Format

API requests use standard HTTP methods, plus WebSocket connections for real-time updates:
  • GET - Retrieve data
  • POST - Create or process data
  • DELETE - Remove data
  • WebSocket - Real-time bidirectional communication

Content Types

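Requests with a JSON body set the Content-Type: application/json header, as in this call to the path validation endpoint. File uploads, such as the single-document example under SDK Examples, are sent as multipart form data.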
curl -X POST http://localhost:8000/api/batch/validate-paths \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"paths": [{"path": "https://example.com/doc.pdf", "type": "url"}]}'
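For reference, the same request can be built in Python. The sketch below uses only the standard library and constructs the request without sending it. The function name `build_validate_paths_request` is an illustrative assumption; the URL, headers, and body mirror the curl call above.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"


def build_validate_paths_request(paths: list, token: str) -> Request:
    """Build a POST request with a JSON body for /api/batch/validate-paths."""
    body = json.dumps({"paths": paths}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/batch/validate-paths",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


req = build_validate_paths_request(
    [{"path": "https://example.com/doc.pdf", "type": "url"}], "YOUR_TOKEN"
)
# To send it against a running server: urllib.request.urlopen(req)
```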

Response Format

API responses are JSON-formatted. Success and error responses use the following structures:

Success Response

{
  "success": true,
  "data": {
    // Endpoint-specific data
  }
}

Error Response

{
  "detail": "Error message describing what went wrong"
}
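Client code usually wants either the payload or an exception. The helper below is a sketch under the assumption that the two shapes above are the only ones to handle. `ApiError` and `unwrap` are hypothetical names, not part of the API.

```python
class ApiError(Exception):
    """Raised when the API returns an error body of the form {"detail": ...}."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


def unwrap(status: int, body: dict):
    """Return the `data` payload of a success body, or raise ApiError.

    Bodies without a `data` key (for example the root endpoint response)
    are returned unchanged.
    """
    if status >= 400 or "detail" in body:
        raise ApiError(status, str(body.get("detail", "unknown error")))
    return body.get("data", body)
```

With the `requests` library this could be called as `unwrap(response.status_code, response.json())`.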

HTTP Status Codes

Status  Meaning                 Description
200     OK                      Request successful
400     Bad Request             Invalid request parameters
401     Unauthorized            Missing or invalid authentication token
403     Forbidden               Insufficient permissions
404     Not Found               Resource not found
500     Internal Server Error   Server-side error

Rate Limits

The API does not currently implement rate limiting at the application level. However, your OpenRouter API key will have its own rate limits based on your plan.
Best practices:
  • Use batch processing for multiple documents instead of sequential single requests
  • Implement exponential backoff for retries
  • Monitor your OpenRouter usage dashboard
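The exponential backoff recommendation can be sketched as follows. The helper name `retry_with_backoff`, its default delays, and the jitter factor are illustrative choices, not values prescribed by the API or by OpenRouter.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call `call()` until it succeeds, doubling the wait after each failure.

    The wait before retry n is min(max_delay, base_delay * 2**n) plus up to
    50% random jitter, so concurrent clients do not retry in lockstep.
    The last failure is re-raised once `max_attempts` is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

In practice `retry_on` should be narrowed to transient failures (timeouts, connection errors, rate-limit responses) so that a 400 or 401 is not retried.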

CORS Configuration

The API is configured with permissive CORS settings for development:
allow_origins=["*"]
allow_credentials=True
allow_methods=["*"]
allow_headers=["*"]
In production, configure CORS to allow only your frontend domain for security.
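One way to tighten this for production is to read the allowed origins from the environment and fail at startup if none are set. This is a sketch, not the project's actual configuration: the `ALLOWED_ORIGINS` variable and the `cors_settings` helper are assumptions, and the method and header lists are inferred from the endpoints on this page.

```python
import os


def cors_settings(env=os.environ) -> dict:
    """Build restrictive CORS keyword arguments from ALLOWED_ORIGINS.

    ALLOWED_ORIGINS (hypothetical variable) is a comma-separated list such
    as "https://app.example.com,https://admin.example.com".
    """
    raw = env.get("ALLOWED_ORIGINS", "")
    origins = [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]
    if not origins:
        raise RuntimeError("Set ALLOWED_ORIGINS to your frontend origin(s)")
    return {
        "allow_origins": origins,
        "allow_credentials": True,
        "allow_methods": ["GET", "POST", "DELETE"],
        "allow_headers": ["Authorization", "Content-Type"],
    }


# With FastAPI, these settings would typically be applied as:
#   from fastapi.middleware.cors import CORSMiddleware
#   app.add_middleware(CORSMiddleware, **cors_settings())
```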

Endpoint Categories

Single Document Processing

Process individual PDF files via upload or URL:
  • POST /api/single/process - Process single PDF
  • GET /api/single/preview - Preview PDF from URL

Batch Processing

Process multiple documents with real-time progress:
  • WebSocket /api/batch/ws/{job_id} - Real-time batch processing
  • POST /api/batch/start - Start a batch job
  • GET /api/batch/jobs/{job_id}/status - Get job status
  • POST /api/batch/jobs/{job_id}/cancel - Cancel job
  • POST /api/batch/jobs/{job_id}/pause - Pause job
  • POST /api/batch/jobs/{job_id}/resume - Resume job
  • GET /api/batch/active - List active jobs
  • POST /api/batch/validate-paths - Validate file paths
  • GET /api/batch/template - Get CSV template
  • POST /api/batch/process - Legacy batch processing

User Management

Manage user accounts and authentication:
  • POST /api/auth/register - Register new user
  • POST /api/auth/login - User login
  • POST /api/auth/refresh - Refresh access token
  • POST /api/auth/logout - Logout user
  • GET /api/auth/me - Get current user

History & Jobs

View processing history and statistics:
  • GET /api/history/jobs - List user’s jobs
  • GET /api/history/jobs/{job_id} - Get job details
  • DELETE /api/history/jobs/{job_id} - Delete job
  • GET /api/history/documents - List recent documents
  • GET /api/history/documents/{doc_id} - Get document details
  • GET /api/history/documents/search - Search documents
  • GET /api/history/stats - Get user statistics

Health & Monitoring

Check system health and status:
  • GET /api/health - Comprehensive health check
  • GET /api/status - Simple status check

WebSocket Endpoints

The API supports WebSocket connections for real-time batch processing:
ws://localhost:8000/api/batch/ws/{job_id}
WebSocket connections support query parameter or header-based authentication:
// Query parameter auth
const ws = new WebSocket('ws://localhost:8000/api/batch/ws/my-job-id?token=YOUR_JWT_TOKEN');

// Or, instead of the query parameter, send the token in an auth message once connected
const wsAlt = new WebSocket('ws://localhost:8000/api/batch/ws/my-job-id');
wsAlt.onopen = () => {
  wsAlt.send(JSON.stringify({
    type: 'auth',
    token: 'YOUR_JWT_TOKEN'
  }));
};
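When the connection URL is assembled in code, the job ID and token should be URL-encoded rather than concatenated. The following Python sketch builds the query-parameter form of the URL; `batch_ws_url` is a hypothetical helper, and the resulting string can be passed to any WebSocket client library.

```python
from urllib.parse import quote, urlencode

WS_BASE_URL = "ws://localhost:8000"


def batch_ws_url(job_id: str, token: str, base_url: str = WS_BASE_URL) -> str:
    """Return the batch WebSocket URL with the JWT passed as ?token=..."""
    path = f"/api/batch/ws/{quote(job_id, safe='')}"  # escape '/' and spaces in the ID
    return f"{base_url}{path}?{urlencode({'token': token})}"
```

Because the token ends up in the URL, it may appear in server and proxy logs; prefer `wss://` in production.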

External APIs

The system integrates with the following external services:

OpenRouter API

Endpoint: https://openrouter.ai/api/v1/chat/completions
Used for AI-powered tag generation. You must provide your own OpenRouter API key.
Get your API key at openrouter.ai/keys

OCR Engines

  • Tesseract OCR: Local CLI tool for fast text extraction (Hindi + English)
  • EasyOCR: Deep learning OCR for 80+ languages (automatic fallback)

Optional Integrations

  • AWS S3: For batch processing from S3 buckets (requires AWS credentials)
  • MinIO: Local object storage for file persistence
  • PostgreSQL: User data and document history
  • Redis: Job state persistence and pub/sub

SDK Examples

While there’s no official SDK, you can easily integrate with the API using standard HTTP clients:
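import json

import requests

BASE_URL = "http://localhost:8000"

def process_pdf(file_path: str, api_key: str, jwt_token: str) -> dict:
    """Upload a PDF to /api/single/process and return the parsed JSON response."""
    with open(file_path, 'rb') as f:
        # The PDF is sent as a multipart file upload
        files = {'pdf_file': f}
        # Processing options travel as a JSON string in the 'config' form field
        data = {
            'config': json.dumps({
                'api_key': api_key,
                'model_name': 'openai/gpt-4o-mini',
                'num_pages': 3,
                'num_tags': 8
            })
        }
        response = requests.post(
            f"{BASE_URL}/api/single/process",
            files=files,
            data=data,
            # Protected endpoint: pass the JWT access token as a Bearer token
            headers={'Authorization': f'Bearer {jwt_token}'}
        )
        return response.json()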

API Changelog

Version 2.0.0 (Current)

  • Added user authentication with JWT tokens
  • Added document history tracking
  • Added WebSocket support for real-time batch processing
  • Added path validation endpoint
  • Added Redis for job state persistence
  • Improved error handling and retry logic

Version 1.0.0

  • Initial release
  • Single document processing
  • Legacy batch processing
  • OCR support (Tesseract + EasyOCR)
  • OpenRouter integration

Next Steps

Authentication Guide

Learn how to authenticate API requests

Process a Document

Start processing PDF documents

Batch Processing

Process multiple documents at once

Job History

View processing history and statistics

User Management

Create and manage user accounts
