Overview
Jobs represent asynchronous operations in the Connector Generator. Each major operation (discovery, scraping, digesting, code generation) creates a job that executes in the background, allowing your application to poll for progress and results. All jobs follow a consistent lifecycle with standardized status values, progress tracking, and error reporting.

Job Lifecycle
Job States
Jobs transition through four primary states, defined in src/common/enums.py.
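The enum source itself isn't reproduced on this page; as a minimal sketch, based only on the four states described below (names and values are assumptions, not the actual source):

```python
from enum import Enum

class JobStatus(str, Enum):
    """Sketch of the four primary job states; the string values
    match what the status endpoints report."""
    QUEUED = "queued"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"
```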
State Descriptions
queued

Initial State: The job has been created and queued for processing. Jobs are processed in FIFO order per job type.
- No progress data available yet
- Job hasn’t started consuming resources
- Safe to cancel at this stage
running

Active Processing: The job is actively executing. This is when most processing occurs.
- Progress updates are available
- Resource consumption is active
- startedAt timestamp is set
- Job cannot be cancelled (must complete or fail)
finished

Successful Completion: The job completed successfully and results are available.
- Full results are accessible via the result field
- Session data is updated with outputs
- finishedAt timestamp is set
- Progress shows 100% completion
failed

Error State: The job encountered errors and could not complete.
- Detailed error messages available in the errors array
- Partial results may or may not be available
- Session state may be inconsistent
- Requires manual intervention or retry
Job Schema
Jobs follow a consistent structure, defined in the database model at src/common/database/models/job.py.
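The model source isn't shown in this extract; the sketch below infers a plausible shape from the fields this page references (result, errors, startedAt, finishedAt, progress). Field names and types are assumptions, not the actual model:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional

@dataclass
class Job:
    # Hypothetical shape inferred from the fields discussed on this page;
    # the real model lives in src/common/database/models/job.py.
    id: str
    job_type: str                     # e.g. "digester.getObjectClass"
    status: str = "queued"            # queued | running | finished | failed
    progress: Optional[dict] = None   # JobProgress payload (see below)
    result: Optional[Any] = None      # populated when status == "finished"
    errors: list[str] = field(default_factory=list)  # populated on failure
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
```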
Progress Tracking
Jobs provide detailed progress information through the JobProgress model, defined in src/common/database/models/job_progress.py.
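Again as a sketch only: the page describes stage-, document-, and iteration-based progress, which suggests a shape along these lines (field names are assumptions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class JobProgress:
    # Hypothetical shape; the real model is in
    # src/common/database/models/job_progress.py.
    stage: Optional[str] = None   # stage-based jobs (e.g. discovery)
    current: int = 0              # documents/iterations completed so far
    total: int = 0                # total documents/iterations expected

    def percentage(self) -> float:
        # Avoid division by zero before totals are known.
        return 100.0 * self.current / self.total if self.total else 0.0
```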
Processing Stages
Different job types use different stage names:
- Documentation Processing
- Schema Extraction (Digester)
- Discovery & Scraping
- Code Generation
The stage name constants are defined in src/common/enums.py.
Monitoring Job Status
Basic Status Query
Check the current status of any job via the job status endpoint.
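The request example is missing from this extract; as a hedged sketch, assuming a `GET .../sessions/{sessionId}/jobs/status` route (the exact path is an assumption, adjust to the deployed API):

```python
import json
from typing import Optional
from urllib.request import urlopen

def build_status_url(base_url: str, session_id: str,
                     job_id: Optional[str] = None) -> str:
    # jobId is optional: when omitted, the server falls back to the
    # most recent job of that type recorded in the session data.
    url = f"{base_url}/sessions/{session_id}/jobs/status"
    return f"{url}?jobId={job_id}" if job_id else url

def get_job_status(base_url: str, session_id: str,
                   job_id: Optional[str] = None) -> dict:
    with urlopen(build_status_url(base_url, session_id, job_id)) as resp:
        return json.load(resp)
```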
The jobId query parameter is optional. If omitted, the system retrieves the job ID from the session data based on the most recent job of that type.

Status Response Format
The response format varies slightly by job type but follows these base schemas:
- Stage-Based Progress
- Document-Based Progress
- Iteration-Based Progress
Used by discovery and most single-operation jobs. Schema: JobStatusStageResponse (src/common/schema.py:86).

Finished Job Response
When a job completes successfully, the response includes the full results.

Failed Job Response
Failed jobs provide detailed error information. The errors field is an array of strings, with each element representing a distinct error message or line. This format supports multi-line error details while maintaining backward compatibility.

Polling Best Practices
Recommended Polling Pattern
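One way to implement the pattern, sketched with an injectable fetch function so the HTTP transport stays out of the loop (the terminal status values follow the lifecycle above; everything else is an assumption):

```python
import time
from typing import Callable

def poll_job(fetch_status: Callable[[], dict],
             interval: float = 5.0,
             timeout: float = 600.0) -> dict:
    """Poll until the job reaches a terminal state (finished/failed).

    fetch_status is any callable returning the status payload, e.g. a
    wrapper around the GET status endpoint.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            payload = fetch_status()
        except OSError:
            # Network errors are transient: retry on the next tick
            # instead of treating them as a job failure.
            time.sleep(interval)
            continue
        if payload.get("status") in ("finished", "failed"):
            return payload
        time.sleep(interval)
    raise TimeoutError("job did not reach a terminal state in time")
```

Note that a failed job is returned rather than retried: only the network layer retries automatically, matching the error-handling guidance below.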
Polling Recommendations
Interval
Poll every 5-10 seconds during processing. More frequent polling provides minimal benefit and increases server load.
Timeout
Set appropriate timeouts based on job type:
- Documentation upload: 5-10 minutes
- Discovery: 2-5 minutes
- Scraping: 10-30 minutes
- Digester: 10-20 minutes
- Codegen: 5-15 minutes
Progress Display
Show progress percentages when available:
Error Handling
Handle network errors separately from job failures. Retry on network errors, but don’t retry failed jobs automatically.
List All Jobs in Session
Retrieve all jobs associated with a session to see the complete history:
- Viewing complete workflow history
- Debugging workflow issues
- Identifying failed jobs that need retry
- Monitoring overall progress across multiple operations
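Assuming the list-jobs endpoint returns an array of job objects with "status" and "jobType" fields (an assumption about the payload shape), a small helper covering those use cases might look like:

```python
from collections import Counter

def summarize_session_jobs(jobs: list[dict]) -> dict:
    """Summarize a session's job history for debugging and retries.

    `jobs` is the array returned by the list-jobs endpoint; the
    "status"/"jobType" keys are assumed field names.
    """
    counts = Counter(j.get("status", "unknown") for j in jobs)
    failed = [j.get("jobType") for j in jobs if j.get("status") == "failed"]
    return {"counts": dict(counts), "needs_retry": failed}
```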
Job Types
The system defines several job types corresponding to different operations:

| Job Type | Description | Module |
|---|---|---|
| discovery.getCandidateLinks | Search for documentation URLs | Discovery |
| scrape.getRelevantDocumentation | Scrape and process web documentation | Scraper |
| documentation.processUpload | Process uploaded documentation files | Session |
| digester.getObjectClass | Extract schema from documentation | Digester |
| codegen.generateConnector | Generate connector code | CodeGen |
Advanced Progress Tracking
Real-time Progress Updates
For long-running jobs, implement exponential backoff or WebSocket connections for more efficient progress monitoring.

Progress Calculation

Calculate percentage completion when progress data is available.

Error Handling
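Putting the backoff and percentage ideas together, a sketch of exponential-backoff polling that also reports completion (the progress field names `current`, `total`, and `stage` are assumptions):

```python
import time
from typing import Callable

def poll_with_backoff(fetch_status: Callable[[], dict],
                      initial: float = 2.0,
                      maximum: float = 30.0) -> dict:
    """Exponential-backoff polling: double the wait after each check
    (capped), which suits long-running scrape/digest jobs."""
    interval = initial
    while True:
        payload = fetch_status()
        if payload.get("status") in ("finished", "failed"):
            return payload
        progress = payload.get("progress") or {}
        total = progress.get("total") or 0
        if total:
            pct = 100.0 * progress.get("current", 0) / total
            print(f"{progress.get('stage', 'working')}: {pct:.0f}%")
        time.sleep(interval)
        interval = min(interval * 2, maximum)
```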
Common Error Scenarios
Job Not Found

Status: not_found. The provided job ID doesn't exist or has been deleted.

Resolution: Verify the job ID or retrieve it from session data.

Insufficient Documentation
Errors: "Insufficient documentation coverage for object..."

The digester couldn't extract complete schema information.

Resolution: Upload more comprehensive documentation or adjust filter instructions.

LLM Processing Failure
Errors: "LLM processing failed for chunk..."

AI processing encountered errors on specific documentation chunks.

Resolution: Check documentation format and content quality. Non-critical errors may allow the job to complete with warnings.

Timeout/Cancellation
Errors: "Job cancelled/interrupted..."

The job was interrupted due to system shutdown or timeout.

Resolution: Restart the operation. Results are not saved for cancelled jobs.

Retry Strategy
When a job fails, determine whether a retry is appropriate based on the error type before restarting the operation.

Related Concepts
Sessions
Learn about session management and data persistence
Workflow
Understand the complete connector generation workflow