Validate Paths

Overview

Validates file paths before batch processing to catch errors early. Checks accessibility of URLs, S3 objects, and local files, returning detailed validation results including content type and file size. Use this endpoint before starting a batch job to ensure all file paths are valid and accessible.

Authentication

Requires authentication via Bearer token:

Authorization: Bearer your_access_token

Request Body

paths (array, required): Array of path objects to validate. Each object contains:

    path (string, required): The file path or URL to validate
    type (string, required): Path type: "url", "s3", or "local"
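When the path list comes from mixed sources, the type field can be filled in on the client before the request is built. The following is a minimal sketch, not part of the API: `build_path_entry` is a hypothetical helper, and its prefix rules (including the POSIX-style check for a leading slash) are assumptions based on the three documented types.

```python
from typing import Dict


def build_path_entry(path: str) -> Dict[str, str]:
    """Return a {"path", "type"} object for the request body (hypothetical helper)."""
    if path.startswith(("http://", "https://")):
        path_type = "url"
    elif path.startswith("s3://"):
        path_type = "s3"
    elif path.startswith("/"):
        # Assumes POSIX-style absolute paths for local files.
        path_type = "local"
    else:
        # Ambiguous input such as "bucket/key" is left to the caller to label.
        raise ValueError(f"Cannot infer path type for: {path!r}")
    return {"path": path, "type": path_type}


payload = {
    "paths": [
        build_path_entry(p)
        for p in [
            "https://example.com/document.pdf",
            "s3://my-bucket/documents/report.pdf",
            "/absolute/path/to/file.pdf",
        ]
    ]
}
```

Paths whose type cannot be inferred raise a ValueError rather than being guessed, so the caller has to set the type explicitly for them.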

Example Request

{
  "paths": [
    {
      "path": "https://example.com/document.pdf",
      "type": "url"
    },
    {
      "path": "s3://my-bucket/documents/report.pdf",
      "type": "s3"
    },
    {
      "path": "/absolute/path/to/file.pdf",
      "type": "local"
    }
  ]
}

Response

results (array): Array of validation results, one per input path. Each result contains:

    path (string): The validated path
    valid (boolean): Whether the path is valid and accessible
    error (string): Error message if validation failed (null if valid)
    content_type (string): MIME type of the file (e.g., "application/pdf"); null if it could not be determined
    size (integer): File size in bytes; null if it could not be determined

total (integer): Total number of paths validated
valid_count (integer): Number of valid paths
invalid_count (integer): Number of invalid paths

Example Response

{
  "results": [
    {
      "path": "https://example.com/document.pdf",
      "valid": true,
      "error": null,
      "content_type": "application/pdf",
      "size": 1024000
    },
    {
      "path": "s3://my-bucket/missing.pdf",
      "valid": false,
      "error": "Object not found",
      "content_type": null,
      "size": null
    },
    {
      "path": "/absolute/path/to/file.pdf",
      "valid": true,
      "error": null,
      "content_type": null,
      "size": 2048000
    }
  ],
  "total": 3,
  "valid_count": 2,
  "invalid_count": 1
}
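A client usually wants the response split into paths that can proceed and paths that need attention. This is a minimal sketch using the example response above; `split_results` is a hypothetical helper name, not something the API provides.

```python
from typing import Any, Dict, List, Tuple


def split_results(response: Dict[str, Any]) -> Tuple[List[str], Dict[str, str]]:
    """Return (valid paths, {invalid path: error message}) from a response body."""
    valid = [r["path"] for r in response["results"] if r["valid"]]
    invalid = {r["path"]: r["error"] for r in response["results"] if not r["valid"]}
    return valid, invalid


# The example response from this page, written as a Python dict.
response = {
    "results": [
        {"path": "https://example.com/document.pdf", "valid": True, "error": None,
         "content_type": "application/pdf", "size": 1024000},
        {"path": "s3://my-bucket/missing.pdf", "valid": False, "error": "Object not found",
         "content_type": None, "size": None},
        {"path": "/absolute/path/to/file.pdf", "valid": True, "error": None,
         "content_type": None, "size": 2048000},
    ],
    "total": 3,
    "valid_count": 2,
    "invalid_count": 1,
}

valid, invalid = split_results(response)
```

Because the invalid entries are keyed by path, duplicate paths in the request collapse into one entry. Keep a list instead if duplicates matter.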

Validation Methods

URL Validation

For paths with type: "url":

Checks URL format (must start with http:// or https://)
Sends HTTP HEAD request to verify accessibility
If HEAD returns 405 (Method Not Allowed), falls back to GET request
Extracts Content-Type and Content-Length headers
Timeout: 10 seconds per URL

Common URL errors:

"URL must start with http:// or https://" - Invalid URL format
"HTTP 404" - File not found
"HTTP 403" - Access forbidden
"Request timed out" - Server didn’t respond in time
"Request failed: [details]" - Network or connection error
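The URL steps above can be sketched as follows. This is an illustration under stated assumptions, not the service's actual code: `validate_url`, `http_fetch`, and the injectable `fetch` parameter are hypothetical names, and mapping every status of 400 or above to "HTTP <code>" is an assumption drawn from the error list.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Mapping, Tuple

# A fetcher takes (method, url) and returns (status code, response headers).
Fetcher = Callable[[str, str], Tuple[int, Mapping[str, str]]]


def http_fetch(method: str, url: str) -> Tuple[int, Mapping[str, str]]:
    """Default fetcher built on urllib, using the documented 10-second timeout."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.headers
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; convert back into a plain status code.
        return exc.code, exc.headers


def validate_url(url: str, fetch: Fetcher = http_fetch) -> Dict[str, object]:
    result: Dict[str, object] = {
        "path": url, "valid": False, "error": None, "content_type": None, "size": None,
    }
    if not url.startswith(("http://", "https://")):
        result["error"] = "URL must start with http:// or https://"
        return result
    try:
        status, headers = fetch("HEAD", url)
        if status == 405:
            # The server rejects HEAD, so fall back to GET.
            status, headers = fetch("GET", url)
    except OSError as exc:
        # URLError wraps the underlying cause in .reason.
        cause = getattr(exc, "reason", exc)
        if isinstance(cause, TimeoutError):
            result["error"] = "Request timed out"
        else:
            result["error"] = f"Request failed: {exc}"
        return result
    if status >= 400:
        result["error"] = f"HTTP {status}"
        return result
    length = headers.get("Content-Length")
    result["valid"] = True
    result["content_type"] = headers.get("Content-Type")
    result["size"] = int(length) if length and length.isdigit() else None
    return result
```

Passing the fetcher in as a parameter lets the fallback logic be exercised with a stub instead of a live server.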

S3 Validation

For paths with type: "s3":

Accepts both s3://bucket/key and bucket/key formats
If path starts with http, validates as URL instead
Uses AWS SDK head_object() to check existence
Extracts ContentType and ContentLength from response
Requires S3 client to be configured with AWS credentials

Common S3 errors:

"S3 client not configured" - AWS credentials not set up
"Invalid S3 path format" - Missing key component
"Object not found" - S3 key doesn’t exist
"Bucket not found" - S3 bucket doesn’t exist
"S3 error: [details]" - Permission or network error
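The two accepted S3 path formats reduce to the same bucket and key pair. This sketch shows one way to do the split; `parse_s3_path` is a hypothetical helper, and raising ValueError for a missing bucket or key is an assumption modelled on the "Invalid S3 path format" error.

```python
from typing import Tuple


def parse_s3_path(path: str) -> Tuple[str, str]:
    """Split "s3://bucket/key" or "bucket/key" into (bucket, key)."""
    stripped = path[len("s3://"):] if path.startswith("s3://") else path
    # Everything before the first slash is the bucket; the rest is the key.
    bucket, _, key = stripped.partition("/")
    if not bucket or not key:
        raise ValueError("Invalid S3 path format")
    return bucket, key


bucket, key = parse_s3_path("s3://my-bucket/documents/report.pdf")
```

If the SDK in use is boto3 (an assumption, since the page only says "AWS SDK"), the pair maps onto the Bucket and Key arguments of the client's head_object call.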

Local File Validation

For paths with type: "local":

Checks file exists in filesystem
Verifies it’s a file (not a directory)
Requires absolute path (relative paths rejected)
Extracts file size from filesystem metadata

Common local file errors:

"File not found: [path]" - Path doesn’t exist
"Not a file" - Path points to a directory
"Relative paths not supported" - Must use absolute path
"Path validation error: [details]" - Permission or OS error
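The local checks map directly onto Python's standard os.path functions. This is a rough sketch of the described behavior, not the service's implementation: `validate_local` is a hypothetical name, and the order in which the checks run is an assumption.

```python
import os
from typing import Dict


def validate_local(path: str) -> Dict[str, object]:
    result: Dict[str, object] = {
        "path": path, "valid": False, "error": None, "content_type": None, "size": None,
    }
    try:
        if not os.path.isabs(path):
            result["error"] = "Relative paths not supported"
        elif not os.path.exists(path):
            result["error"] = f"File not found: {path}"
        elif not os.path.isfile(path):
            result["error"] = "Not a file"
        else:
            # Size comes from filesystem metadata; the file is not opened.
            result["size"] = os.path.getsize(path)
            result["valid"] = True
    except OSError as exc:
        result["error"] = f"Path validation error: {exc}"
    return result
```

The content_type field stays null here, which matches the local file entry in the example response.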

Error Handling

Empty Path

{
  "path": "",
  "valid": false,
  "error": "Empty path",
  "content_type": null,
  "size": null
}

Unknown Path Type

{
  "path": "ftp://example.com/file.pdf",
  "valid": false,
  "error": "Unknown path type: ftp",
  "content_type": null,
  "size": null
}
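Both cases above can be caught on the client before any request is sent. This is a minimal sketch with a hypothetical `precheck` helper; checking the path before the type is an assumption about ordering.

```python
from typing import Dict, Optional

VALID_TYPES = {"url", "s3", "local"}


def precheck(entry: Dict[str, str]) -> Optional[str]:
    """Return the expected error message for an entry, or None if it looks sendable."""
    if not entry.get("path"):
        return "Empty path"
    if entry.get("type") not in VALID_TYPES:
        return f"Unknown path type: {entry.get('type')}"
    return None
```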

Usage Example

cURL

curl -X POST "http://localhost:8000/api/batch/validate-paths" \
  -H "Authorization: Bearer your_access_token" \
  -H "Content-Type: application/json" \
  -d '{
    "paths": [
      {
        "path": "https://example.com/doc1.pdf",
        "type": "url"
      },
      {
        "path": "https://example.com/doc2.pdf",
        "type": "url"
      }
    ]
  }'

JavaScript/TypeScript

interface ValidationRequest {
  paths: Array<{
    path: string;
    type: 'url' | 's3' | 'local';
  }>;
}

interface ValidationResult {
  path: string;
  valid: boolean;
  error: string | null;
  content_type: string | null;
  size: number | null;
}

interface ValidationResponse {
  results: ValidationResult[];
  total: number;
  valid_count: number;
  invalid_count: number;
}

async function validatePaths(
  paths: Array<{ path: string; type: 'url' | 's3' | 'local' }>,
  token: string
): Promise<ValidationResponse> {
  const response = await fetch('http://localhost:8000/api/batch/validate-paths', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ paths }),
  });
  
  if (!response.ok) {
    throw new Error(`Validation failed: ${response.status} ${response.statusText}`);
  }
  
  return response.json();
}

// Usage
const results = await validatePaths(
  [
    { path: 'https://example.com/doc.pdf', type: 'url' },
    { path: 's3://my-bucket/doc.pdf', type: 's3' },
  ],
  'your_access_token'
);

console.log(`Validated ${results.total} paths`);
console.log(`Valid: ${results.valid_count}, Invalid: ${results.invalid_count}`);

results.results.forEach(result => {
  if (!result.valid) {
    console.error(`Invalid: ${result.path} - ${result.error}`);
  }
});

Python

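import requests
from typing import Any, Dict, List

def validate_paths(
    paths: List[Dict[str, str]],
    token: str,
    base_url: str = "http://localhost:8000",
    timeout: float = 60.0,
) -> Dict[str, Any]:
    """POST the paths to the validation endpoint and return the parsed JSON body."""
    response = requests.post(
        f"{base_url}/api/batch/validate-paths",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json"
        },
        json={"paths": paths},
        # Client-side timeout in seconds (an assumed default); without one,
        # requests waits indefinitely for a response.
        timeout=timeout,
    )
    # Raise requests.HTTPError on 4xx/5xx responses
    response.raise_for_status()
    return response.json()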

# Usage
paths = [
    {"path": "https://example.com/doc1.pdf", "type": "url"},
    {"path": "s3://bucket/doc2.pdf", "type": "s3"},
    {"path": "/tmp/doc3.pdf", "type": "local"},
]

result = validate_paths(paths, "your_access_token")

print(f"Valid: {result['valid_count']}/{result['total']}")

for validation in result['results']:
    if not validation['valid']:
        print(f"Error: {validation['path']} - {validation['error']}")
    else:
        print(f"OK: {validation['path']} ({validation['size']} bytes)")

Performance Notes

Validation runs in parallel for all paths. Total time depends on the slowest path to validate.

For large batches (100+ files), consider validating in chunks to avoid long request timeouts.
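Chunked validation can be sketched as below. The helper names and the chunk size of 50 are assumptions, and `validate` stands for any callable that takes a list of path objects and returns a response body in the documented shape. Summing the counters across chunks assumes each response counts only its own paths.

```python
from typing import Any, Callable, Dict, Iterator, List

PathEntry = Dict[str, str]


def chunked(items: List[PathEntry], size: int) -> Iterator[List[PathEntry]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def validate_in_chunks(
    paths: List[PathEntry],
    validate: Callable[[List[PathEntry]], Dict[str, Any]],
    chunk_size: int = 50,
) -> Dict[str, Any]:
    """Validate chunk by chunk and merge the responses into one body."""
    merged: Dict[str, Any] = {
        "results": [], "total": 0, "valid_count": 0, "invalid_count": 0,
    }
    for chunk in chunked(paths, chunk_size):
        response = validate(chunk)
        merged["results"].extend(response["results"])
        for key in ("total", "valid_count", "invalid_count"):
            merged[key] += response[key]
    return merged
```

With the Python client from the usage example, the callable could be `lambda chunk: validate_paths(chunk, "your_access_token")`. Chunks are sent one after another, so result order matches input order.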

URL validation requires network requests and may be slow for unresponsive servers. The 10-second timeout per URL helps prevent hanging.
