```python
def get_failure_signature(failure: TestFailure) -> str:
    """Create a signature for grouping identical failures.

    Uses error message and first few lines of stack trace to identify
    failures that are essentially the same issue.
    """
    # Use error message and first 5 lines of stack trace
    stack_lines = failure.stack_trace.split("\n")[:5]
    signature_text = f"{failure.error_message}|{'|'.join(stack_lines)}"
    return hashlib.sha256(signature_text.encode()).hexdigest()
```
**Why first 5 stack trace lines?**

- Captures the unique call path to the error
- Ignores test-specific frames higher in the stack
- Balances precision vs. over-grouping

**Why SHA-256 hash?**

- Consistent signature regardless of message length
- Works as a dictionary key for grouping
- No collision risk for practical purposes
The signature uses error message + stack trace, not test name. Tests with different names but identical errors are grouped together.
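To make that concrete, here is a minimal sketch of the grouping behavior. `TestFailure` is stood in by a small dataclass with the field names the snippet above implies; the real model lives elsewhere in the codebase.

```python
import hashlib
from dataclasses import dataclass

# Minimal stand-in for the analyzer's TestFailure model (field names assumed).
@dataclass
class TestFailure:
    test_name: str
    error_message: str
    stack_trace: str

def get_failure_signature(failure: TestFailure) -> str:
    stack_lines = failure.stack_trace.split("\n")[:5]
    signature_text = f"{failure.error_message}|{'|'.join(stack_lines)}"
    return hashlib.sha256(signature_text.encode()).hexdigest()

trace = 'File "client.py", line 42, in connect\nConnectionError: refused'
a = TestFailure("test_login", "ConnectionError: refused", trace)
b = TestFailure("test_checkout", "ConnectionError: refused", trace)

# Different test names, identical error: identical signature.
assert get_failure_signature(a) == get_failure_signature(b)
```

Because the test name never enters `signature_text`, renaming a test cannot split a failure group.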
```python
# Analyze each unique failure group in parallel
failure_tasks = [
    analyze_failure_group(
        failures=group,
        console_context=console_context,
        repo_path=repo_path,
        ai_provider=ai_provider,
        ai_model=ai_model,
        ai_cli_timeout=settings.ai_cli_timeout,
    )
    for group in failure_groups.values()
]
group_results = await run_parallel_with_limit(failure_tasks)

# Flatten results and handle exceptions
failures = []
group_list = list(failure_groups.values())
for i, result in enumerate(group_results):
    if isinstance(result, Exception):
        # Create error entries for all failures in this group
        for tf in group_list[i]:
            failures.append(
                FailureAnalysis(
                    test_name=tf.test_name,
                    error=tf.error_message,
                    analysis=AnalysisDetail(
                        details=f"Analysis failed: {result}"
                    ),
                )
            )
    else:
        failures.extend(result)
```
Groups are analyzed in parallel, bounded by `MAX_CONCURRENT_AI_CALLS = 10`. If one group's analysis fails, the others continue.
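The implementation of `run_parallel_with_limit` isn't shown in this section; a common shape for it, sketched here as an assumption, is a semaphore-bounded `asyncio.gather` with `return_exceptions=True` so that a failing coroutine yields its exception in place rather than cancelling its siblings:

```python
import asyncio

MAX_CONCURRENT_AI_CALLS = 10

async def run_parallel_with_limit(tasks, limit=MAX_CONCURRENT_AI_CALLS):
    """Run coroutines concurrently, at most `limit` at a time.

    Exceptions are returned in place of results (like
    asyncio.gather(..., return_exceptions=True)), so one failing
    group does not cancel the others.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(coro):
        async with semaphore:
            return await coro

    return await asyncio.gather(
        *(bounded(t) for t in tasks), return_exceptions=True
    )

# Demo coroutines (hypothetical, for illustration only).
async def double(n):
    return n * 2

async def boom():
    raise ValueError("analysis failed")

results = asyncio.run(run_parallel_with_limit([double(1), boom(), double(3)]))
assert results[0] == 2
assert isinstance(results[1], ValueError)
assert results[2] == 6
```

This matches the behavior the surrounding code relies on: the caller checks each result with `isinstance(result, Exception)` instead of wrapping the whole gather in `try/except`.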
```python
async def analyze_failure_group(
    failures: list[TestFailure],
    console_context: str,
    repo_path: Path | None,
    ai_provider: str = "",
    ai_model: str = "",
    ai_cli_timeout: int | None = None,
) -> list[FailureAnalysis]:
    """Analyze a group of failures with the same error signature.

    Only calls Claude CLI once for the group, then applies the analysis
    to all failures in the group.
    """
    # Use the first failure as representative
    representative = failures[0]
    test_names = [f.test_name for f in failures]

    prompt = f"""Analyze this test failure from a Jenkins CI job.

AFFECTED TESTS ({len(failures)} tests with same error):
{chr(10).join(f"- {name}" for name in test_names)}

ERROR: {representative.error_message}

STACK TRACE:
{representative.stack_trace}

CONSOLE CONTEXT:
{console_context}

You have access to the test repository. Explore the code to understand the failure.

Note: Multiple tests failed with the same error. Provide ONE analysis that applies to all of them.

{_JSON_RESPONSE_SCHEMA}"""
```
Key design decisions:

- **Single representative** — the first failure provides the error + stack trace
- **All test names listed** — the AI sees the full scope of impact
- **One AI call** — the prompt explicitly requests a unified analysis
- **Same analysis replicated** — applied to all failures in the group (analyzer.py:846-853)
```python
# Apply the same analysis to all failures in the group
return [
    FailureAnalysis(
        test_name=f.test_name,
        error=f.error_message,
        analysis=parsed,
    )
    for f in failures
]
```
The same deduplication logic applies to child job analysis. From analyzer.py:1000-1023 (in `analyze_child_job()`):
```python
# Group failures by signature to avoid analyzing identical errors multiple times
failure_groups: dict[str, list[TestFailure]] = defaultdict(list)
for tf in test_failures:
    sig = get_failure_signature(tf)
    failure_groups[sig].append(tf)

logger.info(
    f"Grouped {len(test_failures)} failures into {len(failure_groups)} unique error types"
)

# Analyze each unique failure group in parallel
tasks = [
    analyze_failure_group(
        failures=group,
        console_context=console_context,
        repo_path=repo_path,
        ai_provider=ai_provider,
        ai_model=ai_model,
        ai_cli_timeout=ai_cli_timeout,
    )
    for group in failure_groups.values()
]
group_results = await run_parallel_with_limit(tasks)
```
Deduplication happens at every level of the job hierarchy:
Jira searches are also deduplicated, keyed by keyword set. From jira.py:368-375 (`enrich_with_jira_matches()`):
```python
# Deduplicate by keyword set — same keywords = one Jira search
keyword_to_reports: dict[tuple[str, ...], list[ProductBugReport]] = {}
for report in reports:
    if not report.jira_search_keywords:
        continue
    key = tuple(sorted(report.jira_search_keywords))
    keyword_to_reports.setdefault(key, []).append(report)
```
If 10 PRODUCT BUG failures share the same `jira_search_keywords`, a single Jira search is performed and the results are shared across all of them.
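A minimal sketch of the keying, with `ProductBugReport` stood in by a dataclass whose fields are assumed from the snippet above. Sorting the keywords before building the tuple means keyword order doesn't matter, and reports with no keywords are skipped entirely:

```python
from dataclasses import dataclass, field

# Minimal stand-in for ProductBugReport (field names assumed).
@dataclass
class ProductBugReport:
    test_name: str
    jira_search_keywords: list[str] = field(default_factory=list)

reports = [
    ProductBugReport("test_a", ["timeout", "gateway"]),
    ProductBugReport("test_b", ["gateway", "timeout"]),  # same set, different order
    ProductBugReport("test_c", ["oom"]),
    ProductBugReport("test_d"),  # no keywords: no Jira search
]

keyword_to_reports: dict[tuple[str, ...], list[ProductBugReport]] = {}
for report in reports:
    if not report.jira_search_keywords:
        continue
    key = tuple(sorted(report.jira_search_keywords))
    keyword_to_reports.setdefault(key, []).append(report)

# Two distinct keyword sets -> two Jira searches instead of three.
assert len(keyword_to_reports) == 2
assert len(keyword_to_reports[("gateway", "timeout")]) == 2
```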
```
INFO: Grouped 45 failures into 8 unique error types
INFO: Calling CLAUDE CLI for failure group (12 tests with same error)
INFO: Calling CLAUDE CLI for failure group (8 tests with same error)
INFO: Calling CLAUDE CLI for failure group (5 tests with same error)
...
```