Understanding Results
The GTM Research Engine returns rich, structured results combining evidence from multiple sources with AI-powered analysis. This guide explains how to interpret confidence scores, evidence quality, and performance metrics.
Result Structure
Every research response follows a consistent structure:
Research Metadata
Top-level fields provide execution context:
- research_id: Unique identifier for this research run
- total_companies: Number of domains analyzed
- search_strategies_generated: Strategies created by the LLM
- total_searches_executed: Total searches performed (companies × strategies)
- processing_time_ms: End-to-end execution time
- search_performance: Performance metrics
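In client code, this metadata can be sanity-checked directly. A minimal sketch in Python (field names from this guide; all values, and the research_id format, are invented for illustration):

```python
# Illustrative top-level response (field names from this guide; values invented).
response = {
    "research_id": "res_abc123",            # hypothetical ID format
    "total_companies": 50,
    "search_strategies_generated": 4,
    "total_searches_executed": 200,         # companies x strategies
    "processing_time_ms": 42000,
    "search_performance": {"queries_per_second": 9.5, "failed_requests": 3},
    "results": [],                          # one entry per company
}

# Sanity check: executed searches should equal companies x strategies.
assert response["total_searches_executed"] == (
    response["total_companies"] * response["search_strategies_generated"]
)
```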
Company Results
Each company in the results array contains comprehensive findings:
Core Fields
- domain
- confidence_score
- evidence_sources
- findings
Company identifier: the domain exactly as provided in the request, used to correlate results with input data.
Confidence Scoring
Confidence scores are generated by LLM analysis of all collected evidence:
Scoring Factors
Evidence Quantity
More evidence items generally correlate with higher confidence:
- 1-3 items: Limited basis for assessment
- 4-8 items: Good evidence foundation
- 9+ items: Strong evidence base
Source Diversity
Evidence from multiple sources increases confidence:
- Single source: Possible bias or limited scope
- 2-3 sources: Good cross-validation
- 4+ sources: Excellent corroboration
Technology Matches
Direct mentions of research goal technologies:
- Exact technology names ("Kubernetes")
- Related technologies ("container orchestration")
- Implementation details ("production k8s cluster")
Confidence Labels
The frontend translates scores to human-readable labels.
Confidence Thresholds
The confidence_threshold parameter filters results, keeping only companies at or above the given score.
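The same filtering can be reproduced client-side over a returned results array. A sketch (assuming each result carries confidence_score as described above; the domains are made up):

```python
def filter_by_confidence(results, confidence_threshold=0.7):
    """Keep only companies at or above the threshold.

    Client-side sketch mirroring the server-side confidence_threshold
    parameter; not the engine's actual implementation.
    """
    return [r for r in results if r["confidence_score"] >= confidence_threshold]

companies = [
    {"domain": "a.com", "confidence_score": 0.91},
    {"domain": "b.com", "confidence_score": 0.55},
]
high = filter_by_confidence(companies, 0.8)  # only a.com survives
```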
Technologies Array
Extracted technologies are automatically identified by the LLM:
- Primary technologies: Directly related to research goal
- Supporting technologies: Commonly used with primary tech
- Infrastructure: Platforms, cloud providers, tools
- Languages: Programming languages mentioned
Technologies are deduplicated and normalized (e.g., "K8s" → "Kubernetes").
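A toy version of this normalization, using a hypothetical alias map (the engine's actual mapping is internal to the service; these entries are examples only):

```python
# Hypothetical alias map in the spirit of the engine's normalization.
ALIASES = {
    "k8s": "Kubernetes",
    "kube": "Kubernetes",
    "postgres": "PostgreSQL",
}

def normalize_technologies(raw):
    """Map aliases to canonical names and drop duplicates, keeping order."""
    seen, out = set(), []
    for name in raw:
        canonical = ALIASES.get(name.strip().lower(), name.strip())
        if canonical.lower() not in seen:
            seen.add(canonical.lower())
            out.append(canonical)
    return out
```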
Technology Relevance
Not all technologies carry equal weight:
High Relevance
Technologies directly matching the research goal.
Research goal: "Companies using Kubernetes"
High relevance:
- Kubernetes
- K8s
- Container orchestration
- kubectl
Medium Relevance
Supporting or related technologies.
Research goal: "Companies using Kubernetes"
Medium relevance:
- Docker
- Helm
- Istio
- Prometheus
Low Relevance
Tangentially related or common technologies.
Research goal: "Companies using Kubernetes"
Low relevance:
- Python
- React
- PostgreSQL
- AWS
Evidence Structure
Evidence items are the foundation of all findings:
Evidence Fields
- url
- title
- snippet
- source_name
Source URL: direct link to the evidence source.
Examples:
- Blog posts: https://company.com/blog/kubernetes-migration
- Job postings: https://company.com/careers/devops-engineer
- News: https://techcrunch.com/2024/company-kubernetes
- Documentation: https://company.com/docs/infrastructure.pdf
Evidence Quality
Not all evidence is equally valuable.
Evidence Grouping
The frontend groups evidence by source for easier review:
- google_search (12 items)
- jobs_search (5 items)
- news_search (3 items)
Grouping helps identify which sources provided strongest evidence and detect potential gaps.
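The same grouping is easy to reproduce client-side. A sketch over evidence items shaped as described above (the URLs are placeholders):

```python
from collections import defaultdict

def group_evidence_by_source(evidence_items):
    """Group evidence items by source_name, as the frontend displays them."""
    groups = defaultdict(list)
    for item in evidence_items:
        groups[item["source_name"]].append(item)
    return dict(groups)

evidence = [
    {"source_name": "google_search", "url": "https://example.com/a"},
    {"source_name": "jobs_search", "url": "https://example.com/b"},
    {"source_name": "google_search", "url": "https://example.com/c"},
]
counts = {src: len(items) for src, items in group_evidence_by_source(evidence).items()}
```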
Signals Found
Signal count indicates goal-match strength.
Signal Types
Signals are discrete indicators of goal match:
- Direct Statements
- Job Requirements
- Technical Documentation
- News & Announcements
- External Mentions
- Employee Profiles
Company explicitly states technology use.
Examples:
- "We use Kubernetes for container orchestration"
- "Our infrastructure runs on GKE"
- "Migrated to Kubernetes in 2024"
Signal Interpretation
| Signal Count | Interpretation |
|---|---|
| 0-2 | Minimal evidence, likely weak or no match |
| 3-5 | Some evidence, possible match but limited |
| 6-9 | Good evidence across multiple types |
| 10-15 | Strong evidence with diverse signal types |
| 16+ | Very strong evidence, high confidence match |
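The interpretation table above can be encoded as a small helper (the band names are shorthand for the interpretations, not values the API returns):

```python
def interpret_signal_count(n):
    """Map a signals_found count to the interpretation bands in the table."""
    if n <= 2:
        return "minimal"      # likely weak or no match
    if n <= 5:
        return "some"         # possible match but limited
    if n <= 9:
        return "good"         # evidence across multiple types
    if n <= 15:
        return "strong"       # diverse signal types
    return "very strong"      # high confidence match
```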
Performance Metrics
Every response includes performance data:
Metrics Breakdown
processing_time_ms
Total execution time in milliseconds. Includes:
- Strategy generation (~2-5s)
- Evidence collection (varies by depth)
- LLM analysis (~1-3s per company)
Typical ranges by depth:
- Quick depth: 15,000-30,000 ms (15-30 s)
- Standard depth: 30,000-60,000 ms (30-60 s)
- Comprehensive: 60,000-120,000 ms (1-2 min)
queries_per_second
Search execution throughput.
Typical values:
- 8-12 QPS: Good performance
- 13-20 QPS: Excellent performance
- Less than 8 QPS: Possible bottlenecks
Throughput depends on:
- max_parallel_searches setting
- Source API response times
- Network latency
- Rate limiting
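If you want to recompute throughput from a response's own counters, a rough client-side approximation (this assumes the reported metric is roughly searches divided by wall-clock time, which is not guaranteed by the API):

```python
def queries_per_second(total_searches_executed, processing_time_ms):
    """Approximate QPS from response counters (client-side estimate only)."""
    return total_searches_executed / (processing_time_ms / 1000)

# e.g. 200 searches completed in 20 seconds
qps = queries_per_second(200, 20000)
```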
failed_requests
Count of failed searches. Incremented when:
- Source API errors
- Network timeouts
- Circuit breaker trips
- Invalid responses
Failure-rate guidance:
- 0-2%: Normal (transient errors)
- 3-5%: Monitor (possible issues)
- >5%: Investigate (systematic problems)
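A helper that classifies a response against these bands (threshold values taken from the list above):

```python
def failure_rate_status(failed_requests, total_searches_executed):
    """Return (rate, status) using the guidance bands from this guide."""
    rate = failed_requests / total_searches_executed
    if rate <= 0.02:
        return rate, "normal"       # transient errors
    if rate <= 0.05:
        return rate, "monitor"      # possible issues
    return rate, "investigate"      # systematic problems
```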
Empty Results
Companies without evidence receive empty results.
Empty results are still valuable: they indicate companies that likely don't match your criteria.
Frontend Result Display
The web interface presents results with visual hierarchy:
Visual Indicators
- Confidence Colors
- Technology Chips
- Evidence Tabs
- Expandable Cards
- 🟢 Green: High confidence (≥0.8)
- 🟠 Orange: Moderate (0.6-0.8)
- 🔴 Red: Low (<0.6)
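The same bands can be applied client-side (the labels mirror this guide; they are not fields the API returns):

```python
def confidence_label(score):
    """Map a confidence_score to the frontend's color bands."""
    if score >= 0.8:
        return "high"       # green
    if score >= 0.6:
        return "moderate"   # orange
    return "low"            # red
```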
Streaming Results
Streaming mode provides incremental results.
Event Types
Best Practices
Set Appropriate Confidence Thresholds
Use case-specific thresholds:
- Automated filtering: 0.8-0.9 (high precision)
- Human review: 0.6-0.7 (balanced recall/precision)
- Exploratory research: 0.5-0.6 (high recall)
- All results: 0.0 (no filtering)
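These guidelines can be captured as presets (the exact values within each range are a judgment call, not API defaults):

```python
# Illustrative presets within the ranges above; names and values are
# this guide's suggestions, not parameters defined by the API.
CONFIDENCE_PRESETS = {
    "automated_filtering": 0.85,   # high precision
    "human_review": 0.65,          # balanced recall/precision
    "exploratory": 0.55,           # high recall
    "all_results": 0.0,            # no filtering
}

threshold = CONFIDENCE_PRESETS["human_review"]
```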
Validate Evidence Quality
Click through to source URLs:
- Verify evidence is current
- Check context around snippets
- Assess source reliability
- Look for implementation details
Consider Source Diversity
Higher evidence_sources = stronger findings:
- Single source: Verify independently
- 2-3 sources: Good corroboration
- 4+ sources: Strong validation
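Counting distinct sources per company is a one-liner client-side (assuming evidence items carry source_name as described earlier):

```python
def source_diversity(evidence_items):
    """Number of distinct evidence sources backing a company's findings."""
    return len({item["source_name"] for item in evidence_items})
```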
Interpret Technologies in Context
Not all technologies are equally relevant:
- Focus on research goal matches
- Consider supporting technologies
- Ignore common/generic tech
Monitor Performance Metrics
Track metrics over time:
- Processing time trends
- Failure rate patterns
- QPS variations
Troubleshooting
Low Confidence Scores Across All Results
Possible causes:
- Research goal too specific
- Technologies not widely publicized
- Wrong company domains
Solutions:
- Broaden research criteria
- Try comprehensive search depth
- Verify domain accuracy
- Review evidence manually
High Confidence But Irrelevant Technologies
Possible causes:
- Research goal ambiguity
- LLM misinterpreting goal
- Related but different technologies
Solutions:
- Make research goal more specific
- Review actual evidence snippets
- Adjust filtering logic
Many Empty Results
Possible causes:
- Invalid domains
- Limited public information
- Search strategies not finding evidence
Solutions:
- Validate domain list
- Increase search depth
- Try different research goal phrasing
- Check sample companies manually
Inconsistent Evidence Quality
Possible causes:
- Mixed source reliability
- Broad search strategies
- Varied company documentation
Solutions:
- Filter by source type
- Increase confidence threshold
- Review high-confidence results only
- Validate manually
Next Steps
Running Research
Learn execution modes and parameters
Search Strategies
Understand strategy generation and optimization