# Architecture Overview

Jenkins Job Insight is a FastAPI-based service that analyzes Jenkins job failures using AI CLI tools. The system follows a webhook-to-callback flow, with comprehensive failure analysis and optional Jira integration.

## Core Components
| Component | Location | Purpose |
|---|---|---|
| `main.py` | FastAPI application | HTTP endpoints, request handling, background tasks |
| `analyzer.py` | Analysis engine | AI CLI calls, failure grouping, parallel execution |
| `jenkins.py` | Jenkins client | Jenkins API wrapper, build info retrieval |
| `jira.py` | Jira integration | Optional bug deduplication via Jira search |
| `storage.py` | SQLite storage | Result persistence, HTML report caching |
## Data Flow: Webhook to Callback

### Async Flow Details
`main.py:366-451` implements the `/analyze` endpoint:

- Request arrives — `analyze()` endpoint handler receives `AnalyzeRequest`
- Job ID generation — UUID created before the background task is queued (main.py:427)
- Initial save — Record saved with `status="pending"` (main.py:433)
- Background task — `process_analysis_with_id()` queued (main.py:434)
- Immediate response — Returns `job_id`, `result_url`, `html_report_url`
- Status update — `status="running"` (main.py:330)
- Analysis — `analyze_job()` called (main.py:334-336)
- Jira enrichment — Optional post-processing (main.py:339-345)
- Result storage — Full result saved with `status="completed"` (main.py:353)
- Callback delivery — `deliver_results()` sends HTTP POST (main.py:359)
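The flow above can be sketched with plain asyncio (a simplified stand-alone reconstruction: `DB`, `analyze_job`, and `deliver_results` here are stand-ins for the real storage layer, analysis engine, and callback sender, and the real service queues the task via FastAPI's background machinery):

```python
import asyncio
import uuid

DB: dict[str, dict] = {}  # stands in for the SQLite store


async def analyze_job(job_id: str) -> dict:
    await asyncio.sleep(0)  # placeholder for the real AI CLI analysis
    return {"verdict": "PRODUCT BUG"}


async def deliver_results(job_id: str, result: dict) -> None:
    pass  # placeholder for the HTTP POST callback


async def process_analysis_with_id(job_id: str) -> None:
    DB[job_id]["status"] = "running"                       # status update
    result = await analyze_job(job_id)                     # analysis
    DB[job_id].update(status="completed", result=result)   # result storage
    await deliver_results(job_id, result)                  # callback delivery


async def analyze(request: dict) -> dict:
    job_id = str(uuid.uuid4())                # job ID generated up front
    DB[job_id] = {"status": "pending"}        # initial save
    asyncio.create_task(process_analysis_with_id(job_id))  # background task
    return {                                  # immediate response
        "job_id": job_id,
        "result_url": f"/results/{job_id}",
        "html_report_url": f"/results/{job_id}.html",
    }
```

The key property this preserves is that the caller gets `job_id` back immediately, while the status transitions (`pending` → `running` → `completed`) happen in the background task.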
## Analysis Pipeline

### Main Job Analysis (analyzer.py:1104-1401)
### Failure Grouping & Deduplication

`analyzer.py:136-152` — `get_failure_signature()`:

- Creates MD5 hash from error message + first 5 stack trace lines
- Groups identical failures together
- Reduces redundant AI CLI calls
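A minimal sketch of the signature idea (the exact fields hashed live in `get_failure_signature`; this assumes message plus trace as described above):

```python
import hashlib


def get_failure_signature(error_message: str, stack_trace: str) -> str:
    """Hash the error message plus the first 5 stack-trace lines, so
    identical failures collapse into one group and one AI CLI call.
    Simplified sketch of analyzer.py's get_failure_signature."""
    top_frames = "\n".join(stack_trace.splitlines()[:5])
    return hashlib.md5(f"{error_message}\n{top_frames}".encode()).hexdigest()


trace = "at a()\nat b()\nat c()\nat d()\nat e()\nat f()"
sig1 = get_failure_signature("NullPointerException", trace)
# Extra frames beyond the first 5 don't change the signature:
sig2 = get_failure_signature("NullPointerException", trace + "\nat g()")
```

Truncating to the top frames is what makes the grouping robust: deep frames often differ per run (timestamps, line shifts), while the message and topmost frames identify the failure.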
### Parallel Execution

`analyzer.py:154-177` — `run_parallel_with_limit()`:

- Semaphore-based concurrency control (default: 10 concurrent)
- Uses `asyncio.gather(..., return_exceptions=True)`
- One failure doesn't crash entire batch
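The bounded-gather pattern can be reconstructed as follows (a sketch, not the exact code at analyzer.py:154-177):

```python
import asyncio


async def run_parallel_with_limit(coros, limit: int = 10) -> list:
    """Run coroutines concurrently, at most `limit` at a time.
    return_exceptions=True means one failure is returned as a value
    instead of crashing the whole batch."""
    sem = asyncio.Semaphore(limit)

    async def bounded(coro):
        async with sem:          # at most `limit` coroutines inside
            return await coro

    return await asyncio.gather(*(bounded(c) for c in coros),
                                return_exceptions=True)


async def ok(n):
    return n * 2


async def boom(n):
    raise ValueError(n)


results = asyncio.run(run_parallel_with_limit([ok(1), boom(2), ok(3)]))
# results[1] is a ValueError instance; the other analyses still succeed
```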
### Where Parallelism Happens

- Child job analysis (main.py:1215-1230) — All failed child jobs analyzed in parallel
- Failure group analysis (main.py:1296-1307) — Each unique error analyzed in parallel
- Jira searches (jira.py:398-399) — Multiple keyword sets searched in parallel
Parallelism is bounded to prevent overwhelming the AI CLI or external services. All parallel operations handle exceptions gracefully.
## Jira Integration Flow

`main.py:339-345` — post-analysis enrichment via `enrich_with_jira_matches()`:
- Collect all PRODUCT BUG failures
- Deduplicate by keyword set (same keywords = one search)
- Search Jira for each unique keyword set (parallel)
- AI filters candidates for relevance
- Attach matches to all reports sharing keywords
Jira integration is completely optional and non-blocking. All exceptions are caught and logged; failures never interrupt the analysis pipeline.
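The dedup-and-fan-out shape of the steps above can be sketched like this (`search_jira` is a hypothetical stand-in for the real Jira query plus AI relevance filter):

```python
import asyncio


async def search_jira(keywords: frozenset) -> list:
    # Placeholder for the real Jira search + AI candidate filtering.
    return [f"BUG-1 ({', '.join(sorted(keywords))})"]


async def enrich_with_jira_matches(reports: list) -> None:
    """Deduplicate PRODUCT BUG reports by keyword set, search each
    unique set once (in parallel), then attach the results to every
    report sharing those keywords.  Simplified sketch."""
    bugs = [r for r in reports if r["verdict"] == "PRODUCT BUG"]
    unique = {frozenset(r["keywords"]) for r in bugs}   # one search per set
    searches = {ks: search_jira(ks) for ks in unique}
    done = dict(zip(searches, await asyncio.gather(*searches.values())))
    for r in bugs:                                      # fan results back out
        r["jira_matches"] = done[frozenset(r["keywords"])]


reports = [
    {"verdict": "PRODUCT BUG", "keywords": ["timeout", "db"]},
    {"verdict": "PRODUCT BUG", "keywords": ["db", "timeout"]},  # same set
    {"verdict": "TEST BUG", "keywords": ["flaky"]},             # skipped
]
asyncio.run(enrich_with_jira_matches(reports))
```

Using `frozenset` as the dedup key is what makes "same keywords = one search" order-insensitive: the two PRODUCT BUG reports above trigger a single Jira query between them.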
## Storage & Result Retrieval

### SQLite Database (storage.py)

Schema (storage.py:24-33); job `status` values:

- `pending` — Job queued, not started
- `running` — Analysis in progress
- `completed` — Analysis successful, result available
- `failed` — Analysis failed, error in result
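A plausible minimal version of that table (the real DDL lives in storage.py:24-33; the column names here are assumptions, only the four status values come from the docs):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        job_id     TEXT PRIMARY KEY,
        status     TEXT NOT NULL CHECK (status IN
                   ('pending', 'running', 'completed', 'failed')),
        result     TEXT,                          -- JSON blob, NULL until done
        created_at TEXT DEFAULT (datetime('now'))
    )
""")
# Lifecycle: insert as pending, then update as the analysis progresses.
conn.execute("INSERT INTO jobs (job_id, status) VALUES (?, ?)",
             ("abc123", "pending"))
conn.execute("UPDATE jobs SET status = ? WHERE job_id = ?",
             ("completed", "abc123"))
```

The `CHECK` constraint keeps the status column to exactly the four documented states.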
### HTML Report Caching

`main.py:633-684` — `/results/{job_id}.html` endpoint:

- Check disk cache first (unless `?refresh=1`)
- If pending/running: return status page with auto-refresh
- If completed: generate HTML from stored result, cache to disk
- Future requests served from cache
- Reports saved to `/data/reports/{job_id}.html`
- Lazy generation: only created when first accessed
- Cache invalidation via `?refresh=1` query parameter
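The cache-then-generate logic reads roughly like this sketch (paths and the status-page markup are simplified; the real service writes under /data/reports):

```python
import tempfile
from pathlib import Path

REPORTS_DIR = Path(tempfile.mkdtemp())  # real service: /data/reports


def get_html_report(job_id: str, record: dict, refresh: bool = False) -> str:
    """Lazy, disk-cached report generation (sketch of the
    /results/{job_id}.html endpoint logic)."""
    path = REPORTS_DIR / f"{job_id}.html"
    if path.exists() and not refresh:            # serve from the disk cache
        return path.read_text()
    if record["status"] in ("pending", "running"):
        # Auto-refreshing status page; nothing is cached yet.
        return '<meta http-equiv="refresh" content="5">Analysis in progress'
    html = f"<h1>Report {job_id}</h1><pre>{record.get('result', '')}</pre>"
    path.write_text(html)                        # cache for future requests
    return html
```

Because generation only happens on the first completed-state request, jobs whose reports are never viewed cost nothing; `refresh=True` forces regeneration, matching the `?refresh=1` behavior.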
## Configuration & Settings

`main.py:216-276` — `_merge_settings()`:

Every environment variable can be overridden per-request:
| Environment Variable | Request Field | Purpose |
|---|---|---|
| `JENKINS_URL` | `jenkins_url` | Jenkins server URL |
| `AI_PROVIDER` | `ai_provider` | AI CLI provider (claude/gemini/cursor) |
| `AI_MODEL` | `ai_model` | Model identifier |
| `JIRA_URL` | `jira_url` | Jira server URL |
| `ENABLE_JIRA` | `enable_jira` | Enable Jira integration |
| `AI_CLI_TIMEOUT` | `ai_cli_timeout` | Timeout in minutes |
Request body values always override environment variables. This allows per-request customization without restarting the service.
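The precedence rule can be sketched as follows (a reconstruction of the idea behind `_merge_settings`, not its actual code; the env-var/field pairs come from the table above, and type coercion is omitted):

```python
import os

# Built-in defaults; the request field name uppercased gives the env var
# name (jenkins_url -> JENKINS_URL), matching the table above.
DEFAULTS = {
    "jenkins_url": None,
    "ai_provider": "claude",
    "ai_cli_timeout": 10,   # minutes
}


def merge_settings(request: dict) -> dict:
    """Precedence: request body > environment variable > default."""
    settings = {}
    for field, default in DEFAULTS.items():
        env_value = os.environ.get(field.upper())
        req_value = request.get(field)
        if req_value is not None:
            settings[field] = req_value      # request body wins
        elif env_value is not None:
            settings[field] = env_value      # env var next
        else:
            settings[field] = default        # built-in fallback
    return settings


os.environ["AI_PROVIDER"] = "gemini"
merged = merge_settings({"ai_provider": "cursor"})
# merged["ai_provider"] == "cursor": the request overrides the env var
```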
## Error Handling

### Jenkins API Errors

`analyzer.py:600-650` — `handle_jenkins_exception()`:

- 404 → `HTTPException(status_code=404)` "Job not found"
- 401 → `HTTPException(status_code=502)` "Authentication failed"
- 403 → `HTTPException(status_code=502)` "Access denied"
### AI CLI Failures

`analyzer.py:481-545` — `call_ai_cli()`:

- Timeout → Returns `(False, "Analysis timed out after N minutes")`
- Non-zero exit → Returns `(False, stderr_or_stdout)`
- Success → Returns `(True, stdout)`
- `analyzer.py:179-223` — Multi-strategy JSON parsing
- `analyzer.py:225-326` — Regex-based recovery from malformed JSON
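The recovery idea can be illustrated with a simplified multi-strategy parser (the real fallback chain at analyzer.py:179-326 is more involved; this sketch shows the strict-then-fenced-then-braced progression):

```python
import json
import re
from typing import Optional

FENCE = "`" * 3  # three backticks, i.e. a markdown code fence


def parse_ai_json(raw: str) -> Optional[dict]:
    """Try progressively looser strategies to pull JSON out of AI CLI
    output that may be wrapped in prose or a markdown code fence."""
    # Strategy 1: the whole output is already valid JSON.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Strategy 2: JSON inside a fenced code block.
    fenced = re.search(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE,
                       raw, re.DOTALL)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass
    # Strategy 3: first brace-delimited span anywhere in the text.
    braced = re.search(r"\{.*\}", raw, re.DOTALL)
    if braced:
        try:
            return json.loads(braced.group(0))
        except json.JSONDecodeError:
            pass
    return None


out = f'Here is the analysis:\n{FENCE}json\n{{"verdict": "PRODUCT BUG"}}\n{FENCE}'
parsed = parse_ai_json(out)  # recovered despite the surrounding prose
```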
### Graceful Degradation
- No test report → Falls back to console output analysis
- Repository clone fails → Analysis continues without code context
- Jira search fails → Logged, no matches attached, analysis completes
- One failure group fails → Other groups still analyzed and returned
## Logging Strategy

CLAUDE.md:76-80 — Logging principles:

- INFO: Milestones (job started, AI calls, completed)
- DEBUG: Detailed operations (response lengths, extracted data)
- WARNING: Recoverable errors (repo clone failed, JSON parsing fallback)
- ERROR/EXCEPTION: Failures requiring attention

Verbosity is controlled via the `LOG_LEVEL` environment variable.
## Next Steps

- **Failure Deduplication** — Deep dive into signature-based grouping
- **CLI-Based AI** — Why subprocess over SDKs