Overview

Jobs represent asynchronous operations in the Connector Generator. Each major operation (discovery, scraping, digesting, code generation) creates a job that executes in the background, allowing your application to poll for progress and results. All jobs follow a consistent lifecycle with standardized status values, progress tracking, and error reporting.

Job Lifecycle

Job States

Jobs transition through four primary lifecycle states; a fifth enum value, not_found, is returned when a queried job ID cannot be resolved:
src/common/enums.py
class JobStatus(str, Enum):
    queued = "queued"        # Job created, waiting to start
    running = "running"      # Job actively processing
    finished = "finished"    # Job completed successfully
    failed = "failed"        # Job encountered errors
    not_found = "not_found"  # Job ID doesn't exist

State Descriptions

queued (Initial State)
The job has been created and queued for processing. Jobs are processed in FIFO order per job type.
  • No progress data available yet
  • Job hasn’t started consuming resources
  • Safe to cancel at this stage

running (Active Processing)
The job is actively executing. This is when most processing occurs.
  • Progress updates are available
  • Resource consumption is active
  • startedAt timestamp is set
  • Job cannot be cancelled (must complete or fail)

finished (Successful Completion)
The job completed successfully and results are available.
  • Full results are accessible via the result field
  • Session data is updated with outputs
  • finishedAt timestamp is set
  • Progress shows 100% completion

failed (Error State)
The job encountered errors and could not complete.
  • Detailed error messages available in the errors array
  • Partial results may or may not be available
  • Session state may be inconsistent
  • Requires manual intervention or retry
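
The lifecycle above can be captured in a small client-side helper. A minimal sketch; the status strings come from the JobStatus enum, while the helper names and the Set-based approach are our own:

```javascript
// Terminal states: once a job reaches one of these, polling can stop.
// not_found is included because a missing job will never progress.
const TERMINAL_STATES = new Set(['finished', 'failed', 'not_found']);

// Returns true when the job's state will no longer change.
function isTerminal(status) {
  return TERMINAL_STATES.has(status);
}

// Returns true while it is still safe to cancel (only before processing starts).
function isCancellable(status) {
  return status === 'queued';
}
```

A poll loop can call isTerminal after each response instead of comparing against individual status strings in several places.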

Job Schema

Jobs follow a consistent structure defined in the database models:
src/common/database/models/job.py
class Job(Base):
    __tablename__ = "jobs"
    
    job_id: UUID                    # Unique identifier
    session_id: UUID                # Associated session
    job_type: str                   # e.g., "discovery.getCandidateLinks"
    status: str                     # Current state (queued/running/finished/failed)
    
    created_at: datetime            # Job creation time
    updated_at: datetime            # Last update time
    started_at: datetime | None     # When processing began
    finished_at: datetime | None    # When processing completed
    
    input: Dict[str, Any]           # Job input parameters
    result: Dict[str, Any] | None   # Output data (when finished)
    errors: List[str] | None        # Error messages (when failed)

Progress Tracking

Jobs provide detailed progress information through the JobProgress model:
src/common/database/models/job_progress.py
class JobProgress(Base):
    __tablename__ = "job_progress"
    
    job_id: UUID                        # Associated job
    stage: str | None                   # Current processing stage
    message: str | None                 # Human-readable status message
    total_processing: int | None        # Total items to process
    processing_completed: int | None    # Items processed so far
    updated_at: datetime                # Last progress update
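
When both counters are set, a completion percentage can be derived client-side. A sketch; the camelCase field names (totalProcessing, processingCompleted) are an assumption about how the Python model above is serialized, and some job types report different progress fields:

```javascript
// Compute a completion percentage from JobProgress counters,
// or null while totals are not yet known (e.g. during chunking).
function progressPercent(progress) {
  if (!progress || !progress.totalProcessing) return null;
  const done = progress.processingCompleted ?? 0;
  return Math.min(100, Math.round((done / progress.totalProcessing) * 100));
}
```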

Processing Stages

Different job types use different stage names:
src/common/enums.py
class JobStage(str, Enum):
    queue = "queue"
    running = "running"
    chunking = "chunking"              # Splitting documentation
    processing_chunks = "processing_chunks"  # LLM analysis
    processing = "processing"          # General processing
    finished = "finished"

Monitoring Job Status

Basic Status Query

Check the current status of any job:
curl -X GET "http://localhost:8000/discovery/{session_id}/discovery?jobId={job_id}"
The jobId query parameter is optional. If omitted, the system retrieves the job ID from the session data based on the most recent job of that type.
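
The same query can be issued from JavaScript. A sketch based on the curl example above; the base URL and the discovery path shape are taken from that example, and checkJobStatus assumes a running server:

```javascript
// Build the status URL; jobId is optional, matching the endpoint above.
function buildStatusUrl(baseUrl, sessionId, jobId) {
  const url = new URL(`${baseUrl}/discovery/${sessionId}/discovery`);
  if (jobId) url.searchParams.set('jobId', jobId);
  return url.toString();
}

// Fetch the current job status as parsed JSON (requires a running server).
async function checkJobStatus(baseUrl, sessionId, jobId) {
  const res = await fetch(buildStatusUrl(baseUrl, sessionId, jobId));
  if (!res.ok) throw new Error(`Status check failed: HTTP ${res.status}`);
  return res.json();
}
```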

Status Response Format

The response format varies slightly by job type but follows these base schemas:
Used by discovery and most single-operation jobs:
{
  "jobId": "8f2c5d90-3a17-4b3e-9c4e-7fa8b1d6e8a2",
  "status": "running",
  "createdAt": "2026-03-10T12:00:00Z",
  "startedAt": "2026-03-10T12:00:05Z",
  "updatedAt": "2026-03-10T12:01:30Z",
  "progress": {
    "stage": "processing",
    "message": "Analyzing documentation chunks"
  }
}
Schema: JobStatusStageResponse (src/common/schema.py:86)

Finished Job Response

When a job completes successfully, the response includes the full results:
{
  "jobId": "8f2c5d90-3a17-4b3e-9c4e-7fa8b1d6e8a2",
  "status": "finished",
  "createdAt": "2026-03-10T12:00:00Z",
  "startedAt": "2026-03-10T12:00:05Z",
  "updatedAt": "2026-03-10T12:02:45Z",
  "progress": {
    "stage": "finished",
    "message": "completed"
  },
  "result": {
    "candidateLinks": [
      "https://api.example.com/docs",
      "https://docs.example.com/api/reference"
    ],
    "searchQuery": "Example API documentation",
    "totalFound": 2
  }
}

Failed Job Response

Failed jobs provide detailed error information:
{
  "jobId": "8f2c5d90-3a17-4b3e-9c4e-7fa8b1d6e8a2",
  "status": "failed",
  "createdAt": "2026-03-10T12:00:00Z",
  "startedAt": "2026-03-10T12:00:05Z",
  "updatedAt": "2026-03-10T12:01:30Z",
  "errors": [
    "Failed to extract schema from documentation",
    "Insufficient documentation coverage for object 'Account'",
    "Consider uploading more comprehensive API documentation"
  ]
}
The errors field is an array of strings, with each element representing a distinct error message or line. This format supports multi-line error details while maintaining backward compatibility.
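
Because errors arrive as an array of strings, display code only needs to join the elements. A minimal sketch (the fallback message is our own choice):

```javascript
// Flatten the errors array into one human-readable, multi-line message.
function formatErrors(errors) {
  if (!errors || errors.length === 0) return 'Unknown error';
  return errors.join('\n');
}
```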

Polling Best Practices

async function waitForJob(sessionId, jobId, jobType) {
  const maxAttempts = 120;  // 10 minutes at 5s intervals
  let attempts = 0;
  
  while (attempts < maxAttempts) {
    const response = await fetch(
      `http://localhost:8000/${jobType}/${sessionId}/${jobType}?jobId=${jobId}`
    );
    const status = await response.json();
    
    console.log(`Job ${jobId}: ${status.status}`);
    
    if (status.progress) {
      const { stage, message, processedDocuments, totalDocuments } = status.progress;
      console.log(`  Stage: ${stage}`);
      console.log(`  Message: ${message}`);
      if (totalDocuments) {
        console.log(`  Progress: ${processedDocuments}/${totalDocuments}`);
      }
    }
    
    // Check terminal states
    if (status.status === 'finished') {
      return { success: true, result: status.result };
    }
    
    if (status.status === 'failed') {
      return { success: false, errors: status.errors };
    }
    
    // Continue polling
    await new Promise(resolve => setTimeout(resolve, 5000));
    attempts++;
  }
  
  throw new Error('Job polling timeout after 10 minutes');
}

Polling Recommendations

Interval

Poll every 5-10 seconds during processing. More frequent polling provides minimal benefit and increases server load.

Timeout

Set appropriate timeouts based on job type:
  • Documentation upload: 5-10 minutes
  • Discovery: 2-5 minutes
  • Scraping: 10-30 minutes
  • Digester: 10-20 minutes
  • Codegen: 5-15 minutes

Progress Display

Show progress percentages when available:
Progress: 15/50 documents (30%)
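
A small formatter can produce lines in exactly that shape. A sketch; the "pending" fallback and the default unit label are our own choices:

```javascript
// Render a "Progress: 15/50 documents (30%)"-style line when totals are known.
function formatProgress(done, total, unit = 'documents') {
  if (!total) return 'Progress: pending';
  const pct = Math.round((done / total) * 100);
  return `Progress: ${done}/${total} ${unit} (${pct}%)`;
}
```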

Error Handling

Handle network errors separately from job failures. Retry on network errors, but don’t retry failed jobs automatically.

List All Jobs in Session

Retrieve all jobs associated with a session to see the complete history:
curl -X GET "http://localhost:8000/session/{session_id}/jobs"
This endpoint is useful for:
  • Viewing complete workflow history
  • Debugging workflow issues
  • Identifying failed jobs that need retry
  • Monitoring overall progress across multiple operations
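
For the retry use case, the job list can be filtered down to failures. A sketch; the response shape (an array of objects with a status field, mirroring the Job schema above) is an assumption:

```javascript
// Pick out failed jobs from a session's job list so they can be retried.
function failedJobs(jobs) {
  return jobs.filter(job => job.status === 'failed');
}
```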

Job Types

The system defines several job types corresponding to different operations:
Job Type                          Description                              Module
discovery.getCandidateLinks       Search for documentation URLs            Discovery
scrape.getRelevantDocumentation   Scrape and process web documentation     Scraper
documentation.processUpload       Process uploaded documentation files     Session
digester.getObjectClass           Extract schema from documentation        Digester
codegen.generateConnector         Generate connector code                  CodeGen

Advanced Progress Tracking

Real-time Progress Updates

For long-running jobs, implement exponential backoff or WebSocket connections for more efficient progress monitoring:
async function pollWithBackoff(sessionId, jobId, jobType) {
  let interval = 2000;  // Start with 2s
  const maxInterval = 10000;  // Cap at 10s
  
  while (true) {
    const status = await checkJobStatus(sessionId, jobId, jobType);
    
    if (status.status === 'finished' || status.status === 'failed') {
      return status;
    }
    
    // Exponential backoff
    await new Promise(resolve => setTimeout(resolve, interval));
    interval = Math.min(interval * 1.5, maxInterval);
  }
}

Progress Calculation

Calculate percentage completion when progress data is available:
function calculateProgress(progress) {
  if (!progress) return null;
  
  const { processedDocuments, totalDocuments, completedIterations, totalIterations } = progress;
  
  if (totalDocuments > 0) {
    return Math.round((processedDocuments / totalDocuments) * 100);
  }
  
  if (totalIterations > 0) {
    return Math.round((completedIterations / totalIterations) * 100);
  }
  
  return null;
}

Error Handling

Common Error Scenarios

Status: not_found
The provided job ID doesn’t exist or has been deleted.
Resolution: Verify the job ID or retrieve it from session data.

Errors: "Insufficient documentation coverage for object..."
The digester couldn’t extract complete schema information.
Resolution: Upload more comprehensive documentation or adjust filter instructions.

Errors: "LLM processing failed for chunk..."
AI processing encountered errors on specific documentation chunks.
Resolution: Check documentation format and content quality. Non-critical errors may allow the job to complete with warnings.

Errors: "Job cancelled/interrupted..."
The job was interrupted due to a system shutdown or timeout.
Resolution: Restart the operation. Results are not saved for cancelled jobs.

Retry Strategy

When a job fails, determine if retry is appropriate:
async function retryJob(sessionId, jobInput, jobType, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    console.log(`Attempt ${attempt}/${maxRetries}`);
    
    const jobResponse = await startJob(sessionId, jobInput, jobType);
    const result = await waitForJob(sessionId, jobResponse.jobId, jobType);
    
    if (result.success) {
      return result;
    }
    
    // Check if retry makes sense
    const errors = result.errors || [];
    const isRetriable = !errors.some(err => 
      err.includes('Insufficient documentation') ||
      err.includes('Invalid input')
    );
    
    if (!isRetriable) {
      console.log('Non-retriable error, aborting');
      throw new Error(`Job failed: ${errors.join(', ')}`);
    }
    
    // Exponential backoff before retry
    if (attempt < maxRetries) {
      const delay = Math.pow(2, attempt) * 1000;
      console.log(`Waiting ${delay}ms before retry...`);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  
  throw new Error('Max retries exceeded');
}

Sessions

Learn about session management and data persistence

Workflow

Understand the complete connector generation workflow