Validate Paths

POST /api/batch/validate-paths
curl --request POST \
  --url https://api.example.com/api/batch/validate-paths \
  --header 'Authorization: Bearer your_access_token' \
  --header 'Content-Type: application/json' \
  --data '
{
  "paths": [
    {
      "path": "<string>",
      "type": "<string>"
    }
  ]
}
'
{
  "results": [
    {
      "path": "<string>",
      "valid": true,
      "error": "<string>",
      "content_type": "<string>",
      "size": 123
    }
  ],
  "total": 123,
  "valid_count": 123,
  "invalid_count": 123
}

Overview

Validates file paths before batch processing to catch errors early. Checks accessibility of URLs, S3 objects, and local files, returning detailed validation results including content type and file size. Use this endpoint before starting a batch job to ensure all file paths are valid and accessible.

Authentication

Requires authentication via Bearer token:
Authorization: Bearer your_access_token

Request Body

paths
array
required
Array of path objects to validate. Each object contains:
path
string
required
The file path or URL to validate
type
string
required
Path type: "url", "s3", or "local"

Example Request

{
  "paths": [
    {
      "path": "https://example.com/document.pdf",
      "type": "url"
    },
    {
      "path": "s3://my-bucket/documents/report.pdf",
      "type": "s3"
    },
    {
      "path": "/absolute/path/to/file.pdf",
      "type": "local"
    }
  ]
}

Response

results
array
Array of validation results, one per input path
path
string
The validated path
valid
boolean
Whether the path is valid and accessible
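error
string | null
Error message if validation failed; null if the path is valid
content_type
string | null
MIME type of the file (e.g., “application/pdf”); null when it cannot be determined, for example for local files or paths that failed validation
size
integer | null
File size in bytes; null when it cannot be determined, for example for paths that failed validation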
total
integer
Total number of paths validated
valid_count
integer
Number of valid paths
invalid_count
integer
Number of invalid paths
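The three counters are derived from `results`: `total` is the length of the array, and because every entry is either valid or invalid, `valid_count` plus `invalid_count` should equal `total`. A minimal client-side sanity check, assuming the response has already been decoded into a dict; `counts_are_consistent` is a hypothetical helper name, not part of the API.

```python
from typing import Any, Dict


def counts_are_consistent(response: Dict[str, Any]) -> bool:
    """Return True if total/valid_count/invalid_count agree with the results array."""
    results = response["results"]
    valid = sum(1 for r in results if r["valid"])
    return (
        response["total"] == len(results)
        and response["valid_count"] == valid
        and response["invalid_count"] == len(results) - valid
    )


response = {
    "results": [
        {"path": "https://example.com/document.pdf", "valid": True, "error": None,
         "content_type": "application/pdf", "size": 1024000},
        {"path": "s3://my-bucket/missing.pdf", "valid": False, "error": "Object not found",
         "content_type": None, "size": None},
    ],
    "total": 2,
    "valid_count": 1,
    "invalid_count": 1,
}
print(counts_are_consistent(response))  # True
```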

Example Response

{
  "results": [
    {
      "path": "https://example.com/document.pdf",
      "valid": true,
      "error": null,
      "content_type": "application/pdf",
      "size": 1024000
    },
    {
      "path": "s3://my-bucket/missing.pdf",
      "valid": false,
      "error": "Object not found",
      "content_type": null,
      "size": null
    },
    {
      "path": "/absolute/path/to/file.pdf",
      "valid": true,
      "error": null,
      "content_type": null,
      "size": 2048000
    }
  ],
  "total": 3,
  "valid_count": 2,
  "invalid_count": 1
}
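As the example shows, an inaccessible path is reported inside `results` rather than failing the whole request, so a client typically inspects the results before starting the batch job. A minimal sketch, assuming the response is already a decoded dict (for example from `response.json()`); `split_results` is a hypothetical name, and whether to abort on any invalid path or continue with the valid subset is left to the caller.

```python
from typing import Any, Dict, List, Tuple


def split_results(response: Dict[str, Any]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Separate valid paths from (path, error) pairs for the invalid ones."""
    ok: List[str] = []
    failed: List[Tuple[str, str]] = []
    for result in response["results"]:
        if result["valid"]:
            ok.append(result["path"])
        else:
            failed.append((result["path"], result["error"]))
    return ok, failed


example_response = {
    "results": [
        {"path": "https://example.com/document.pdf", "valid": True, "error": None,
         "content_type": "application/pdf", "size": 1024000},
        {"path": "s3://my-bucket/missing.pdf", "valid": False, "error": "Object not found",
         "content_type": None, "size": None},
        {"path": "/absolute/path/to/file.pdf", "valid": True, "error": None,
         "content_type": None, "size": 2048000},
    ],
    "total": 3,
    "valid_count": 2,
    "invalid_count": 1,
}

ok, failed = split_results(example_response)
for path, error in failed:
    print(f"Skipping {path}: {error}")
# ok now holds only the paths that passed validation.
```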

Validation Methods

URL Validation

For paths with type: "url":
  1. Checks the URL format (it must start with http:// or https://)
  2. Sends an HTTP HEAD request to verify that the URL is reachable
  3. If HEAD returns 405 (Method Not Allowed), falls back to a GET request
  4. Extracts the Content-Type and Content-Length response headers
  5. Applies a 10-second timeout per URL
Common URL errors:
  • "URL must start with http:// or https://" - Invalid URL format
  • "HTTP 404" - File not found
  • "HTTP 403" - Access forbidden
  • "Request timed out" - Server didn’t respond in time
  • "Request failed: [details]" - Network or connection error

S3 Validation

For paths with type: "s3":
  1. Accepts both s3://bucket/key and bucket/key formats
  2. If the path starts with http, validates it as a URL instead
  3. Calls the AWS SDK's head_object() to check that the object exists
  4. Extracts ContentType and ContentLength from the response
  5. Requires the S3 client to be configured with AWS credentials
Common S3 errors:
  • "S3 client not configured" - AWS credentials not set up
  • "Invalid S3 path format" - Missing key component
  • "Object not found" - S3 key doesn’t exist
  • "Bucket not found" - S3 bucket doesn’t exist
  • "S3 error: [details]" - Permission or network error
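A minimal sketch of the S3 branch, assuming `s3_client` is a boto3 S3 client (whose `head_object(Bucket=..., Key=...)` returns `ContentType` and `ContentLength`). The function names are hypothetical, the order of checks is assumed, the http fallback to URL validation is omitted, and how the service tells "Bucket not found" apart from "Object not found" is not documented, so that mapping is left out.

```python
from typing import Any, Dict, Optional, Tuple


def parse_s3_path(path: str) -> Tuple[Optional[str], Optional[str]]:
    """Split 's3://bucket/key' or 'bucket/key' into (bucket, key); (None, None) if malformed."""
    remainder = path[len("s3://"):] if path.startswith("s3://") else path
    bucket, _, key = remainder.partition("/")
    if not bucket or not key:
        return None, None  # missing bucket or key component
    return bucket, key


def validate_s3(path: str, s3_client: Any) -> Dict[str, Any]:
    """Check an S3 object with head_object (s3_client assumed to be a boto3 S3 client)."""
    result: Dict[str, Any] = {"path": path, "valid": False, "error": None,
                              "content_type": None, "size": None}
    if s3_client is None:
        result["error"] = "S3 client not configured"
        return result
    bucket, key = parse_s3_path(path)
    if bucket is None:
        result["error"] = "Invalid S3 path format"
        return result
    try:
        head = s3_client.head_object(Bucket=bucket, Key=key)
    except Exception as exc:  # botocore raises ClientError; kept generic to avoid the import
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        # HEAD responses carry no error body, so a missing object surfaces as code "404".
        result["error"] = "Object not found" if code in ("404", "NoSuchKey") else f"S3 error: {exc}"
        return result
    result.update(valid=True, content_type=head.get("ContentType"),
                  size=head.get("ContentLength"))
    return result
```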

Local File Validation

For paths with type: "local":
  1. Checks that the file exists on the filesystem
  2. Verifies that the path is a file (not a directory)
  3. Requires an absolute path (relative paths are rejected)
  4. Reads the file size from filesystem metadata
Common local file errors:
  • "File not found: [path]" - Path doesn’t exist
  • "Not a file" - Path points to a directory
  • "Relative paths not supported" - Must use absolute path
  • "Path validation error: [details]" - Permission or OS error
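The local-file checks can be sketched with `pathlib`. This is a hedged illustration: `validate_local` is a hypothetical name, and this sketch rejects relative paths first, whereas the documentation does not state the order in which the service applies its checks. Only the size comes from metadata, so `content_type` stays null, matching the example response.

```python
from pathlib import Path
from typing import Any, Dict


def validate_local(path: str) -> Dict[str, Any]:
    """Sketch of the local-file checks; error strings mirror the list above."""
    result: Dict[str, Any] = {"path": path, "valid": False, "error": None,
                              "content_type": None, "size": None}
    candidate = Path(path)
    try:
        if not candidate.is_absolute():
            result["error"] = "Relative paths not supported"
        elif not candidate.exists():
            result["error"] = f"File not found: {path}"
        elif not candidate.is_file():
            result["error"] = "Not a file"  # e.g. the path points to a directory
        else:
            # File size from filesystem metadata; no MIME type is determined here.
            result.update(valid=True, size=candidate.stat().st_size)
    except OSError as exc:  # permission or other OS-level failures
        result["error"] = f"Path validation error: {exc}"
    return result
```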

Error Handling

Empty Path

{
  "path": "",
  "valid": false,
  "error": "Empty path",
  "content_type": null,
  "size": null
}

Unknown Path Type

{
  "path": "ftp://example.com/file.pdf",
  "valid": false,
  "error": "Unknown path type: ftp",
  "content_type": null,
  "size": null
}
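The empty-path and unknown-type cases sit in front of the type-specific validators. A minimal dispatcher sketch, assuming per-type validator functions like the ones sketched in the sections above are passed in as a mapping; `validate_one`, `validate_all`, the thread pool, and `max_workers=16` are all hypothetical choices. It reflects three documented behaviors: empty paths and unknown types are rejected, an `s3` path that starts with http is validated as a URL, and (per the Performance Notes) all paths are validated in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Validator = Callable[[str], Dict[str, Any]]


def _failure(path: str, error: str) -> Dict[str, Any]:
    return {"path": path, "valid": False, "error": error, "content_type": None, "size": None}


def validate_one(spec: Dict[str, str], validators: Dict[str, Validator]) -> Dict[str, Any]:
    path, path_type = spec.get("path", ""), spec.get("type", "")
    if not path:
        return _failure(path, "Empty path")
    validator = validators.get(path_type)
    if validator is None:
        return _failure(path, f"Unknown path type: {path_type}")
    if path_type == "s3" and path.startswith("http"):
        validator = validators.get("url", validator)  # http(s) S3 paths are checked as URLs
    return validator(path)


def validate_all(specs: List[Dict[str, str]], validators: Dict[str, Validator]) -> Dict[str, Any]:
    # Run all checks concurrently; pool.map() returns results in input order.
    with ThreadPoolExecutor(max_workers=16) as pool:
        results = list(pool.map(lambda s: validate_one(s, validators), specs))
    valid_count = sum(r["valid"] for r in results)
    return {"results": results, "total": len(results),
            "valid_count": valid_count, "invalid_count": len(results) - valid_count}
```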

Usage Example

cURL

curl -X POST "http://localhost:8000/api/batch/validate-paths" \
  -H "Authorization: Bearer your_access_token" \
  -H "Content-Type: application/json" \
  -d '{
    "paths": [
      {
        "path": "https://example.com/doc1.pdf",
        "type": "url"
      },
      {
        "path": "https://example.com/doc2.pdf",
        "type": "url"
      }
    ]
  }'

JavaScript/TypeScript

interface ValidationRequest {
  paths: Array<{
    path: string;
    type: 'url' | 's3' | 'local';
  }>;
}

interface ValidationResult {
  path: string;
  valid: boolean;
  error: string | null;
  content_type: string | null;
  size: number | null;
}

interface ValidationResponse {
  results: ValidationResult[];
  total: number;
  valid_count: number;
  invalid_count: number;
}

async function validatePaths(
  paths: Array<{ path: string; type: 'url' | 's3' | 'local' }>,
  token: string
): Promise<ValidationResponse> {
  const response = await fetch('http://localhost:8000/api/batch/validate-paths', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ paths }),
  });
  
  if (!response.ok) {
    throw new Error(`Validation request failed: ${response.status} ${response.statusText}`);
  }
  
  return response.json();
}

// Usage
const results = await validatePaths(
  [
    { path: 'https://example.com/doc.pdf', type: 'url' },
    { path: 's3://my-bucket/doc.pdf', type: 's3' },
  ],
  'your_access_token'
);

console.log(`Validated ${results.total} paths`);
console.log(`Valid: ${results.valid_count}, Invalid: ${results.invalid_count}`);

results.results.forEach(result => {
  if (!result.valid) {
    console.error(`Invalid: ${result.path} - ${result.error}`);
  }
});

Python

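import requests
from typing import Any, Dict, List, Literal, TypedDict


class PathSpec(TypedDict):
    path: str
    type: Literal["url", "s3", "local"]


def validate_paths(
    paths: List[PathSpec],
    token: str,
    base_url: str = "http://localhost:8000",
    timeout: float = 60.0,  # client-side timeout in seconds; adjust for large batches
) -> Dict[str, Any]:
    """POST paths to /api/batch/validate-paths and return the decoded JSON response."""
    response = requests.post(
        f"{base_url}/api/batch/validate-paths",
        headers={"Authorization": f"Bearer {token}"},
        json={"paths": paths},  # json= also sets Content-Type: application/json
        timeout=timeout,
    )
    response.raise_for_status()  # raise on 4xx/5xx, e.g. a missing or invalid token
    return response.json()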

# Usage
paths = [
    {"path": "https://example.com/doc1.pdf", "type": "url"},
    {"path": "s3://bucket/doc2.pdf", "type": "s3"},
    {"path": "/tmp/doc3.pdf", "type": "local"},
]

result = validate_paths(paths, "your_access_token")

print(f"Valid: {result['valid_count']}/{result['total']}")

for validation in result['results']:
    if not validation['valid']:
        print(f"Error: {validation['path']} - {validation['error']}")
    else:
        print(f"OK: {validation['path']} ({validation['size']} bytes)")

Performance Notes

Validation runs in parallel for all paths, so the total request time is governed by the slowest path.
For large batches (100+ files), consider validating in chunks so that no single request runs long enough to hit a request timeout.
URL validation requires network requests and can be slow when servers are unresponsive; the 10-second per-URL timeout keeps one slow server from stalling the request indefinitely.
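Chunking can be done entirely on the client by splitting `paths` and merging the responses. A minimal sketch: `validate_in_chunks` is a hypothetical helper and the default `chunk_size=50` is an arbitrary assumption, not a documented limit. The `validate` argument is any function that sends one chunk and returns the decoded response, for example `lambda chunk: validate_paths(chunk, "your_access_token")` using the Python client shown earlier.

```python
from typing import Any, Callable, Dict, List

ValidateFn = Callable[[List[Dict[str, str]]], Dict[str, Any]]


def validate_in_chunks(paths: List[Dict[str, str]], validate: ValidateFn,
                       chunk_size: int = 50) -> Dict[str, Any]:
    """Send paths in fixed-size chunks and merge the responses into one summary."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    merged: Dict[str, Any] = {"results": [], "total": 0, "valid_count": 0, "invalid_count": 0}
    for start in range(0, len(paths), chunk_size):
        response = validate(paths[start:start + chunk_size])
        merged["results"].extend(response["results"])  # preserves input order
        for field in ("total", "valid_count", "invalid_count"):
            merged[field] += response[field]
    return merged
```

Chunks are sent sequentially here to keep the sketch simple; since the server already validates each request's paths in parallel, sequential chunks mainly trade some total time for shorter individual requests.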
