Running Research
The GTM Research Engine provides multiple ways to execute research queries. This guide covers both batch and streaming execution modes, along with configuration options and best practices.

Quick Start
Configure Your Research Goal
Define a clear, specific research objective that describes what you're looking for.
Select Search Depth
Choose the appropriate search depth based on your needs:
- quick: 4-6 strategies, fastest execution
- standard: 7-10 strategies, balanced coverage
- comprehensive: 11-13 strategies, maximum evidence
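Putting these two steps together, a request configuration can be sketched as follows. The field names match the request parameters documented in this guide; the goal and domains are illustrative placeholders:

```python
# A minimal research configuration. Field names follow the request
# parameters documented in this guide; goal and domains are illustrative.
config = {
    "research_goal": "Find fintech companies using AI for fraud detection",
    "company_domains": ["example.com", "acme.io"],
    "search_depth": "standard",        # "quick" | "standard" | "comprehensive"
    "max_parallel_searches": 10,
    "confidence_threshold": 0.7,
}
```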
Execution Modes
The research engine supports two execution modes: batch and streaming.

Batch Mode
Batch mode executes all research and returns complete results in a single response, making it the best choice when you need all results at once.

Batch mode is ideal for:
- Small to medium datasets (< 50 companies)
- Automated workflows requiring complete results
- Scenarios where latency is less critical
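A batch request can be sketched as a single blocking POST. This is an illustrative Python client, not an official SDK; the endpoint URL is a placeholder to replace with your deployment's actual research endpoint:

```python
import json
import urllib.request

# Placeholder endpoint; substitute your deployment's research URL.
API_URL = "https://api.example.com/research"

def run_batch_research(payload: dict, api_url: str = API_URL) -> dict:
    """POST a research request and block until the complete result returns."""
    req = urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Batch mode: one request, one complete response body.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = {
    "research_goal": "SaaS companies with open engineering positions",
    "company_domains": ["example.com"],
    "search_depth": "standard",
    "max_parallel_searches": 10,
    "confidence_threshold": 0.7,
}
# results = run_batch_research(payload)  # uncomment against a real endpoint
```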
Streaming Mode
Streaming mode provides real-time progress updates via Server-Sent Events (SSE), making it the best choice for large datasets or when you need live progress feedback.

Request Parameters

All research requests require the following parameters:
research_goal (required)
Type: string

The high-level research objective describing what you're looking for.

Examples:
- "Find fintech companies using AI for fraud detection"
- "SaaS companies with open engineering positions"
- "Healthcare companies implementing blockchain technology"

Tips:
- Be specific about the technology or criteria
- Include industry context when relevant
- Use natural language descriptions
company_domains (required)

Type: array<string>

List of company domains to analyze. Domains should be in the format example.com (no protocol).

Limits:
- Recommended: 1-100 domains per request
- For larger datasets, use streaming mode
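Since domains must be bare hostnames, a small normalization helper (illustrative, not part of the API) can clean up user input before submission:

```python
import re

def normalize_domain(raw: str) -> str:
    """Coerce user input into the bare example.com form the API expects:
    strip protocol, path, and whitespace, and lowercase the host."""
    host = re.sub(r"^[a-z]+://", "", raw.strip(), flags=re.IGNORECASE)
    host = host.split("/", 1)[0]   # drop any path component
    return host.lower()

# normalize_domain("https://Example.com/about") -> "example.com"
# normalize_domain("acme.io")                  -> "acme.io"
```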
search_depth (required)

Type: "quick" | "standard" | "comprehensive"

Controls the number of search strategies generated and executed:

- quick: 4-6 strategies per domain
  - Fastest execution
  - Focuses on high-yield sources
  - Best for preliminary research
- standard: 7-10 strategies per domain
  - Balanced speed and coverage
  - Diverse search types
  - Recommended for most use cases
- comprehensive: 11-13 strategies per domain
  - Maximum evidence gathering
  - Exhaustive coverage
  - Best for critical decisions
Typical execution times:
- Quick: ~15-30 seconds
- Standard: ~30-60 seconds
- Comprehensive: ~60-120 seconds
max_parallel_searches (required)

Type: integer

Maximum number of concurrent search requests per source.

Recommended Values:
- Quick searches: 5-10
- Standard searches: 10-15
- Comprehensive searches: 15-20

Notes:
- Higher values = faster execution but more API load
- Rate limits apply per data source
- Circuit breakers prevent overload
confidence_threshold (required)

Type: float (0.0 - 1.0)

Minimum confidence score for results. Only companies meeting or exceeding this threshold are included in high-confidence results.

Recommended Thresholds:
- 0.9-1.0: Very high confidence only (strict filtering)
- 0.7-0.9: High confidence (balanced approach)
- 0.5-0.7: Medium confidence (broader results)
- 0.0-0.5: Include all findings

All results are returned regardless of confidence, but results meeting the threshold are also collected into a high_confidence_results array.

Understanding the Pipeline
The research pipeline executes in two main phases, followed by result building:

- Phase 1: Evidence Collection
- Phase 2: LLM Analysis
- Result Building
During Phase 1, the engine collects evidence from multiple sources in parallel.

Key Features:
- Parallel execution across all sources
- Per-source rate limiting via semaphores
- Circuit breakers prevent cascade failures
- Automatic retry on transient errors
Progress Reporting:
- Updates at 25%, 50%, 75%, 100% completion
- Reports domains with evidence found
- Tracks total evidence collected
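The parallel collection with per-source semaphores described above can be sketched in a few lines of asyncio. The source names and search function are stand-ins for illustration, not the engine's actual internals:

```python
import asyncio

SOURCES = ["web", "news", "jobs"]   # hypothetical source names
MAX_PARALLEL = 10                   # mirrors max_parallel_searches

async def run_search(source: str, query: str) -> str:
    await asyncio.sleep(0)          # stand-in for a real API call
    return f"{source}:{query}"

async def collect_evidence(queries: list[str]) -> list[str]:
    # One semaphore per source caps that source's concurrent requests.
    limits = {s: asyncio.Semaphore(MAX_PARALLEL) for s in SOURCES}

    async def bounded(source: str, query: str) -> str:
        async with limits[source]:  # per-source rate limiting
            return await run_search(source, query)

    # Fan out every (source, query) pair in parallel.
    tasks = [bounded(s, q) for s in SOURCES for q in queries]
    return await asyncio.gather(*tasks)

evidence = asyncio.run(collect_evidence(["fraud detection", "AI"]))
```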
Frontend Usage
The web interface provides a visual way to execute research.

Configuration Steps
Configure Settings
Click the settings icon to configure:
- Company domains (one per line)
- Search depth (quick/standard/comprehensive)
- Max parallel searches (5-20)
- Confidence threshold (0.0-1.0)
Performance Optimization
Rate Limiting
The engine uses semaphores for per-source rate limiting.

Circuit Breakers

Circuit breakers prevent cascade failures:

- Opens after N consecutive failures
- Blocks requests while open
- Automatically resets after timeout
- Prevents overloading failing services
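The open/block/reset cycle above can be sketched as a minimal breaker class. This illustrates the behavior described, not the engine's implementation; the default threshold and timeout are arbitrary:

```python
import time

class CircuitBreaker:
    """Minimal sketch: opens after `threshold` consecutive failures,
    rejects calls while open, resets after `reset_after` seconds."""

    def __init__(self, threshold: int = 5, reset_after: float = 30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True                # closed: let the request through
        if time.monotonic() - self.opened_at >= self.reset_after:
            self.opened_at = None      # timeout elapsed: reset
            self.failures = 0
            return True
        return False                   # still open: block the request

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()   # trip the breaker

    def record_success(self) -> None:
        self.failures = 0              # any success clears the streak
```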
Metrics Tracking
The engine tracks performance metrics:

- Total queries executed
- Failed requests count
- Queries per second
- Processing time
- Circuit breaker states
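A client-side mirror of these counters might look like the following sketch (the engine's own metrics object may differ):

```python
class PipelineMetrics:
    """Sketch of the counters listed above; illustrative only."""

    def __init__(self):
        self.total_queries = 0
        self.failed_requests = 0
        self.processing_time = 0.0   # seconds

    def record(self, elapsed: float, failed: bool = False) -> None:
        self.total_queries += 1
        self.processing_time += elapsed
        if failed:
            self.failed_requests += 1

    @property
    def queries_per_second(self) -> float:
        if self.processing_time == 0:
            return 0.0
        return self.total_queries / self.processing_time
```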
Best Practices
Choose the Right Search Depth
- Quick: Use for preliminary research or testing
- Standard: Default for most production use cases
- Comprehensive: Critical decisions requiring maximum evidence
Optimize Parallel Searches
Start with max_parallel_searches: 10 and adjust based on:

- Response times
- Error rates
- Data source rate limits
Set Appropriate Confidence Thresholds
- Start with 0.7 for balanced results
- Increase to 0.8-0.9 for high-precision use cases
- Decrease to 0.5-0.6 for exploratory research

Use the high_confidence_results array for filtered results.
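The threshold-based split can be illustrated with a small helper. The result shape here is assumed for illustration; the actual response fields may differ:

```python
def split_by_confidence(results: list[dict], threshold: float = 0.7):
    """All results are kept; those at or above the threshold are mirrored
    into a high-confidence list, as the API does with its
    high_confidence_results array."""
    high = [r for r in results if r["confidence"] >= threshold]
    return results, high

results = [
    {"domain": "acme.io", "confidence": 0.91},
    {"domain": "example.com", "confidence": 0.55},
]
all_results, high_confidence_results = split_by_confidence(results, 0.7)
# high_confidence_results -> [{"domain": "acme.io", "confidence": 0.91}]
```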
Use Streaming for Large Datasets
For 20+ companies:
- Use streaming mode for progress visibility
- Implement proper SSE handling
- Process results incrementally
- Monitor heartbeats to detect disconnections
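Proper SSE handling boils down to reading `event:`/`data:` lines and dispatching on blank-line boundaries. A minimal parser sketch follows; heartbeat monitoring and reconnection are left out:

```python
def parse_sse(stream_lines):
    """Minimal SSE parser: yields (event, data) pairs from the text lines
    of an event stream. A real client would read lines from the HTTP
    response and watch heartbeat events to detect disconnections."""
    event, data = "message", []
    for line in stream_lines:
        line = line.rstrip("\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and data:
            yield event, "\n".join(data)   # blank line ends the event
            event, data = "message", []

sample = [
    "event: progress\n",
    "data: {\"completion\": 25}\n",
    "\n",
]
events = list(parse_sse(sample))
# events -> [("progress", '{"completion": 25}')]
```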
Handle Errors Gracefully
Common error scenarios:
- Circuit breaker open (wait for reset)
- Rate limit exceeded (reduce max_parallel_searches)
- Invalid domains (validate before submission)
- Network timeouts (implement retry logic)
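Retry logic for transient failures can be sketched with exponential backoff. The flaky function below simulates two timeouts before succeeding:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.5):
    """Retry a callable on transient errors with exponential backoff,
    as suggested for network timeouts and rate-limit errors."""
    for attempt in range(attempts):
        try:
            return fn()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise                       # out of attempts: re-raise
            time.sleep(base_delay * (2 ** attempt))

calls = {"count": 0}

def flaky():
    """Fails twice with a timeout, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("transient")
    return "ok"

result = with_retries(flaky, attempts=3, base_delay=0.0)
# result -> "ok" after two retried timeouts
```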
Monitoring and Debugging
Response Metrics
Every response includes performance metrics.

Logging
The pipeline logs key events.

Monitor logs for:
- Query generation time (should be < 5s)
- Pipeline execution time
- Circuit breaker events
- Rate limit warnings
Troubleshooting
No Results Returned
Possible Causes:
- Invalid company domains
- Overly restrictive research goal
- All sources failed (check circuit breakers)
Solutions:
- Verify domain format (no https://)
- Broaden research criteria
- Check source availability
- Review error logs
Slow Execution Times
Possible Causes:
- Too many domains
- Low max_parallel_searches value
- Network latency
- Source API slowness
Solutions:
- Batch large domain lists
- Increase parallel search limit
- Use quick search depth
- Check source performance metrics
High Error Rates
Possible Causes:
- Excessive concurrency
- Source API issues
- Invalid search strategies
Solutions:
- Reduce max_parallel_searches
- Check circuit breaker status
- Review generated strategies
- Verify API credentials
Next Steps
Search Strategies
Learn how strategies are generated and optimized
Understanding Results
Interpret confidence scores and evidence data