Overview

Jobs represent asynchronous operations in the Connector Generator. Each major operation (discovery, scraping, digesting, code generation) creates a job that executes in the background, allowing your application to poll for progress and results. All jobs follow a consistent lifecycle with standardized status values, progress tracking, and error reporting.

Job Lifecycle

Job States

Jobs transition through four primary lifecycle states; a fifth enum value, not_found, is returned when a queried job ID cannot be resolved:
src/common/enums.py
class JobStatus(str, Enum):
    queued = "queued"        # Job created, waiting to start
    running = "running"      # Job actively processing
    finished = "finished"    # Job completed successfully
    failed = "failed"        # Job encountered errors
    not_found = "not_found"  # Job ID doesn't exist

State Descriptions

queued (Initial State)
The job has been created and queued for processing. Jobs are processed in FIFO order per job type.
  • No progress data available yet
  • Job hasn’t started consuming resources
  • Safe to cancel at this stage

running (Active Processing)
The job is actively executing. This is when most processing occurs.
  • Progress updates are available
  • Resource consumption is active
  • startedAt timestamp is set
  • Job cannot be cancelled (must complete or fail)

finished (Successful Completion)
The job completed successfully and results are available.
  • Full results are accessible via the result field
  • Session data is updated with outputs
  • finishedAt timestamp is set
  • Progress shows 100% completion

failed (Error State)
The job encountered errors and could not complete.
  • Detailed error messages available in the errors array
  • Partial results may or may not be available
  • Session state may be inconsistent
  • Requires manual intervention or retry
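
The lifecycle above can be captured in a small client-side helper. A minimal sketch; the status strings come from the JobStatus enum, while the helper names and the Set-based approach are our own:

```javascript
// Terminal states: once a job reaches one of these, polling can stop.
// not_found is included because a missing job will never progress.
const TERMINAL_STATES = new Set(['finished', 'failed', 'not_found']);

// Returns true when the job's state will no longer change.
function isTerminal(status) {
  return TERMINAL_STATES.has(status);
}

// Returns true while it is still safe to cancel (only before processing starts).
function isCancellable(status) {
  return status === 'queued';
}
```

A poll loop can call isTerminal after each response instead of comparing against individual status strings in several places.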

Job Schema

Jobs follow a consistent structure defined in the database models:
src/common/database/models/job.py
class Job(Base):
    __tablename__ = "jobs"
    
    job_id: UUID                    # Unique identifier
    session_id: UUID                # Associated session
    job_type: str                   # e.g., "discovery.getCandidateLinks"
    status: str                     # Current state (queued/running/finished/failed)
    
    created_at: datetime            # Job creation time
    updated_at: datetime            # Last update time
    started_at: datetime | None     # When processing began
    finished_at: datetime | None    # When processing completed
    
    input: Dict[str, Any]           # Job input parameters
    result: Dict[str, Any] | None   # Output data (when finished)
    errors: List[str] | None        # Error messages (when failed)

Progress Tracking

Jobs provide detailed progress information through the JobProgress model:
src/common/database/models/job_progress.py
class JobProgress(Base):
    __tablename__ = "job_progress"
    
    job_id: UUID                        # Associated job
    stage: str | None                   # Current processing stage
    message: str | None                 # Human-readable status message
    total_processing: int | None        # Total items to process
    processing_completed: int | None    # Items processed so far
    updated_at: datetime                # Last progress update
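
When both counters are set, a completion percentage can be derived client-side. A sketch; the camelCase field names (totalProcessing, processingCompleted) are an assumption about how the Python model above is serialized, and some job types report different progress fields:

```javascript
// Compute a completion percentage from JobProgress counters,
// or null while totals are not yet known (e.g. during chunking).
function progressPercent(progress) {
  if (!progress || !progress.totalProcessing) return null;
  const done = progress.processingCompleted ?? 0;
  return Math.min(100, Math.round((done / progress.totalProcessing) * 100));
}
```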

Processing Stages

Different job types use different stage names:
src/common/enums.py
class JobStage(str, Enum):
    queue = "queue"
    running = "running"
    chunking = "chunking"              # Splitting documentation
    processing_chunks = "processing_chunks"  # LLM analysis
    processing = "processing"          # General processing
    finished = "finished"

Monitoring Job Status

Basic Status Query

Check the current status of any job:
curl -X GET "http://localhost:8000/discovery/{session_id}/discovery?jobId={job_id}"
The jobId query parameter is optional. If omitted, the system retrieves the job ID from the session data based on the most recent job of that type.
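
The same query can be issued from JavaScript. A sketch based on the curl example above; the base URL and the discovery path shape are taken from that example, and checkJobStatus assumes a running server:

```javascript
// Build the status URL; jobId is optional, matching the endpoint above.
function buildStatusUrl(baseUrl, sessionId, jobId) {
  const url = new URL(`${baseUrl}/discovery/${sessionId}/discovery`);
  if (jobId) url.searchParams.set('jobId', jobId);
  return url.toString();
}

// Fetch the current job status as parsed JSON (requires a running server).
async function checkJobStatus(baseUrl, sessionId, jobId) {
  const res = await fetch(buildStatusUrl(baseUrl, sessionId, jobId));
  if (!res.ok) throw new Error(`Status check failed: HTTP ${res.status}`);
  return res.json();
}
```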

Status Response Format

The response format varies slightly by job type but follows these base schemas:
Used by discovery and most single-operation jobs:
{
  "jobId": "8f2c5d90-3a17-4b3e-9c4e-7fa8b1d6e8a2",
  "status": "running",
  "createdAt": "2026-03-10T12:00:00Z",
  "startedAt": "2026-03-10T12:00:05Z",
  "updatedAt": "2026-03-10T12:01:30Z",
  "progress": {
    "stage": "processing",
    "message": "Analyzing documentation chunks"
  }
}
Schema: JobStatusStageResponse (src/common/schema.py:86)

Finished Job Response

When a job completes successfully, the response includes the full results:
{
  "jobId": "8f2c5d90-3a17-4b3e-9c4e-7fa8b1d6e8a2",
  "status": "finished",
  "createdAt": "2026-03-10T12:00:00Z",
  "startedAt": "2026-03-10T12:00:05Z",
  "updatedAt": "2026-03-10T12:02:45Z",
  "progress": {
    "stage": "finished",
    "message": "completed"
  },
  "result": {
    "candidateLinks": [
      "https://api.example.com/docs",
      "https://docs.example.com/api/reference"
    ],
    "searchQuery": "Example API documentation",
    "totalFound": 2
  }
}

Failed Job Response

Failed jobs provide detailed error information:
{
  "jobId": "8f2c5d90-3a17-4b3e-9c4e-7fa8b1d6e8a2",
  "status": "failed",
  "createdAt": "2026-03-10T12:00:00Z",
  "startedAt": "2026-03-10T12:00:05Z",
  "updatedAt": "2026-03-10T12:01:30Z",
  "errors": [
    "Failed to extract schema from documentation",
    "Insufficient documentation coverage for object 'Account'",
    "Consider uploading more comprehensive API documentation"
  ]
}
The errors field is an array of strings, with each element representing a distinct error message or line. This format supports multi-line error details while maintaining backward compatibility.
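
Because errors arrive as an array of strings, display code only needs to join the elements. A minimal sketch (the fallback message is our own choice):

```javascript
// Flatten the errors array into one human-readable, multi-line message.
function formatErrors(errors) {
  if (!errors || errors.length === 0) return 'Unknown error';
  return errors.join('\n');
}
```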

Polling Best Practices

async function waitForJob(sessionId, jobId, jobType) {
  const maxAttempts = 120;  // 10 minutes at 5s intervals
  let attempts = 0;
  
  while (attempts < maxAttempts) {
    const response = await fetch(
      `http://localhost:8000/${jobType}/${sessionId}/${jobType}?jobId=${jobId}`
    );
    const status = await response.json();
    
    console.log(`Job ${jobId}: ${status.status}`);
    
    if (status.progress) {
      const { stage, message, processedDocuments, totalDocuments } = status.progress;
      console.log(`  Stage: ${stage}`);
      console.log(`  Message: ${message}`);
      if (totalDocuments) {
        console.log(`  Progress: ${processedDocuments}/${totalDocuments}`);
      }
    }
    
    // Check terminal states
    if (status.status === 'finished') {
      return { success: true, result: status.result };
    }
    
    if (status.status === 'failed') {
      return { success: false, errors: status.errors };
    }
    
    // Continue polling
    await new Promise(resolve => setTimeout(resolve, 5000));
    attempts++;
  }
  
  throw new Error('Job polling timeout after 10 minutes');
}

Polling Recommendations

Interval

Poll every 5-10 seconds during processing. More frequent polling provides minimal benefit and increases server load.

Timeout

Set appropriate timeouts based on job type:
  • Documentation upload: 5-10 minutes
  • Discovery: 2-5 minutes
  • Scraping: 10-30 minutes
  • Digester: 10-20 minutes
  • Codegen: 5-15 minutes

Progress Display

Show progress percentages when available:
Progress: 15/50 documents (30%)
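
A small formatter can produce lines in exactly that shape. A sketch; the "pending" fallback and the default unit label are our own choices:

```javascript
// Render a "Progress: 15/50 documents (30%)"-style line when totals are known.
function formatProgress(done, total, unit = 'documents') {
  if (!total) return 'Progress: pending';
  const pct = Math.round((done / total) * 100);
  return `Progress: ${done}/${total} ${unit} (${pct}%)`;
}
```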

Error Handling

Handle network errors separately from job failures. Retry on network errors, but don’t retry failed jobs automatically.

List All Jobs in Session

Retrieve all jobs associated with a session to see the complete history:
curl -X GET "http://localhost:8000/session/{session_id}/jobs"
This endpoint is useful for:
  • Viewing complete workflow history
  • Debugging workflow issues
  • Identifying failed jobs that need retry
  • Monitoring overall progress across multiple operations
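
For the retry use case, the job list can be filtered down to failures. A sketch; the response shape (an array of objects with a status field, mirroring the Job schema above) is an assumption:

```javascript
// Pick out failed jobs from a session's job list so they can be retried.
function failedJobs(jobs) {
  return jobs.filter(job => job.status === 'failed');
}
```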

Job Types

The system defines several job types corresponding to different operations:
Job Type                          Description                              Module
discovery.getCandidateLinks       Search for documentation URLs            Discovery
scrape.getRelevantDocumentation   Scrape and process web documentation     Scraper
documentation.processUpload       Process uploaded documentation files     Session
digester.getObjectClass           Extract schema from documentation        Digester
codegen.generateConnector         Generate connector code                  CodeGen

Advanced Progress Tracking

Real-time Progress Updates

For long-running jobs, implement exponential backoff or WebSocket connections for more efficient progress monitoring:
async function pollWithBackoff(sessionId, jobId, jobType) {
  let interval = 2000;  // Start with 2s
  const maxInterval = 10000;  // Cap at 10s
  
  while (true) {
    const status = await checkJobStatus(sessionId, jobId, jobType);
    
    if (status.status === 'finished' || status.status === 'failed') {
      return status;
    }
    
    // Exponential backoff
    await new Promise(resolve => setTimeout(resolve, interval));
    interval = Math.min(interval * 1.5, maxInterval);
  }
}

Progress Calculation

Calculate percentage completion when progress data is available:
function calculateProgress(progress) {
  if (!progress) return null;
  
  const { processedDocuments, totalDocuments, completedIterations, totalIterations } = progress;
  
  if (totalDocuments > 0) {
    return Math.round((processedDocuments / totalDocuments) * 100);
  }
  
  if (totalIterations > 0) {
    return Math.round((completedIterations / totalIterations) * 100);
  }
  
  return null;
}

Error Handling

Common Error Scenarios

Status: not_found
The provided job ID doesn’t exist or has been deleted.
Resolution: Verify the job ID or retrieve it from session data.

Errors: "Insufficient documentation coverage for object..."
The digester couldn’t extract complete schema information.
Resolution: Upload more comprehensive documentation or adjust filter instructions.

Errors: "LLM processing failed for chunk..."
AI processing encountered errors on specific documentation chunks.
Resolution: Check documentation format and content quality. Non-critical errors may allow the job to complete with warnings.

Errors: "Job cancelled/interrupted..."
The job was interrupted due to a system shutdown or timeout.
Resolution: Restart the operation. Results are not saved for cancelled jobs.

Retry Strategy

When a job fails, determine if retry is appropriate:
async function retryJob(sessionId, jobInput, jobType, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    console.log(`Attempt ${attempt}/${maxRetries}`);
    
    const jobResponse = await startJob(sessionId, jobInput, jobType);
    const result = await waitForJob(sessionId, jobResponse.jobId, jobType);
    
    if (result.success) {
      return result;
    }
    
    // Check if retry makes sense
    const errors = result.errors || [];
    const isRetriable = !errors.some(err => 
      err.includes('Insufficient documentation') ||
      err.includes('Invalid input')
    );
    
    if (!isRetriable) {
      console.log('Non-retriable error, aborting');
      throw new Error(`Job failed: ${errors.join(', ')}`);
    }
    
    // Exponential backoff before retry
    if (attempt < maxRetries) {
      const delay = Math.pow(2, attempt) * 1000;
      console.log(`Waiting ${delay}ms before retry...`);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  
  throw new Error('Max retries exceeded');
}

Sessions

Learn about session management and data persistence

Workflow

Understand the complete connector generation workflow