Overview
The GTM Research Engine collects evidence from multiple independent data sources in parallel, providing comprehensive intelligence about companies. Each source is optimized for specific types of information, and together they create a complete picture of a company’s technology stack and business activities.
Supported Data Sources
Google Search
Site-specific searches, file type filtering, Boolean queries via Tavily API
News Search
Press releases, funding news, partnerships via NewsAPI
Jobs Search
Job postings with TF-IDF semantic matching via Greenhouse API
How It Works
The research pipeline executes searches across all sources simultaneously, using async worker pools to maximize throughput while respecting rate limits.
Source Configuration
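One way to picture per-source rate limiting is a separate asyncio.Semaphore for each source, so a saturated NewsAPI pool never blocks Google or jobs queries. This is a minimal sketch: the SOURCE_LIMITS values and the build_source_pools helper are hypothetical, not the engine's actual configuration.

```python
import asyncio

# Hypothetical per-source concurrency caps; tune them to each provider's limits.
SOURCE_LIMITS: dict[str, int] = {
    "google": 4,  # Tavily-backed site: searches
    "news": 2,    # NewsAPI
    "jobs": 4,    # Greenhouse job board API
}


def build_source_pools(limits: dict[str, int]) -> dict[str, asyncio.Semaphore]:
    """Create one independent semaphore per source (hypothetical helper)."""
    return {source: asyncio.Semaphore(limit) for source, limit in limits.items()}


async def demo() -> None:
    pools = build_source_pools(SOURCE_LIMITS)
    # Take every slot in the news pool...
    for _ in range(SOURCE_LIMITS["news"]):
        await pools["news"].acquire()
    print("news saturated:", pools["news"].locked())      # True
    # ...the other pools are unaffected, so their queries can still start.
    print("google saturated:", pools["google"].locked())  # False


if __name__ == "__main__":
    asyncio.run(demo())
```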
Each data source has its own semaphore pool for rate limiting.
Query Execution
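A minimal sketch of concurrent execution across all sources and domains with asyncio.gather: every (source, domain) pair becomes one task, and each task is guarded by its source's semaphore. fake_search is a hypothetical stand-in for the real Tavily, NewsAPI, and Greenhouse calls, and max_parallel=2 is an arbitrary illustrative value.

```python
import asyncio
import itertools

SOURCES = ["google", "news", "jobs"]


async def fake_search(source: str, domain: str) -> str:
    """Hypothetical stand-in for a real API call."""
    await asyncio.sleep(0.01)  # simulated network latency
    return f"{source}:{domain}"


async def run_all(domains: list[str], max_parallel: int = 2) -> list[str]:
    # One semaphore per source, created inside the running event loop.
    pools = {source: asyncio.Semaphore(max_parallel) for source in SOURCES}

    async def guarded(source: str, domain: str) -> str:
        async with pools[source]:  # wait for a free slot in this source's pool
            return await fake_search(source, domain)

    tasks = [guarded(s, d) for s, d in itertools.product(SOURCES, domains)]
    # gather preserves input order, so results line up with the task list.
    return list(await asyncio.gather(*tasks))


if __name__ == "__main__":
    print(asyncio.run(run_all(["acme.com", "example.org"])))
```

Because each task waits only on its own source's semaphore, a slow or rate-limited source delays its own queries without holding back the others.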
Queries are executed concurrently across all sources and domains.
Source-Specific Features
Google Search (Tavily API)
Executes site-specific searches with configurable depth and result limits:
- site:{DOMAIN} [keywords]: search within the company domain
- site:{DOMAIN}/blog [keywords]: target a specific section of the site, such as the blog
- site:{DOMAIN} filetype:pdf [keywords]: find technical documentation
- Boolean operators for precision queries
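A minimal sketch of expanding the query templates above for one domain. build_site_queries is a hypothetical helper; it only builds query strings and makes no assumption about how the Tavily API interprets the operators.

```python
def build_site_queries(domain: str, keywords: str) -> list[str]:
    """Expand the site-specific templates for one company domain."""
    templates = [
        "site:{domain} {kw}",               # whole company domain
        "site:{domain}/blog {kw}",          # one section of the site
        "site:{domain} filetype:pdf {kw}",  # technical documentation
    ]
    return [t.format(domain=domain, kw=keywords) for t in templates]


# Boolean operators go into the keywords themselves.
for query in build_site_queries("acme.com", '"Kubernetes" OR "EKS"'):
    print(query)
```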
News Search (NewsAPI)
Searches news articles and press releases with company context:
- Funding announcements
- Partnership press releases
- Security incidents
- Product launches
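A minimal sketch of turning the categories above into a NewsAPI query against its /v2/everything endpoint, using the documented q, language, sortBy, and pageSize parameters and the X-Api-Key header. The topic keywords and the build_news_request helper are illustrative assumptions; the sketch builds the request but does not send it.

```python
from urllib.parse import urlencode

NEWSAPI_EVERYTHING = "https://newsapi.org/v2/everything"

# Illustrative keywords mirroring the categories above.
TOPICS = ["funding", "partnership", "breach", "launch"]


def build_news_request(company: str, page_size: int = 3) -> tuple[str, dict[str, str]]:
    """Return the URL and headers for a company-scoped news query."""
    topics = " OR ".join(TOPICS)
    params = {
        "q": f'"{company}" AND ({topics})',  # exact company name plus any topic
        "language": "en",
        "sortBy": "publishedAt",
        "pageSize": str(page_size),
    }
    # Sending the key as a header keeps it out of URLs and access logs.
    headers = {"X-Api-Key": "YOUR_NEWSAPI_KEY"}
    return f"{NEWSAPI_EVERYTHING}?{urlencode(params)}", headers


url, headers = build_news_request("Acme")
print(url)
```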
Jobs Search (Greenhouse API)
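A dependency-free sketch of TF-IDF matching between a research query and job descriptions. It assumes cosine similarity as the scoring function and the smoothed IDF form that scikit-learn's TfidfVectorizer uses by default; fetching postings from Greenhouse is omitted, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """TF-IDF weights with smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            t: (count / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, count in tf.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rank_jobs(query: str, job_descriptions: list[str]) -> list[tuple[float, str]]:
    """Score every job description against the query, best match first."""
    vectors = tfidf_vectors([query] + job_descriptions)
    q_vec, job_vecs = vectors[0], vectors[1:]
    scored = [(cosine(q_vec, v), d) for v, d in zip(job_vecs, job_descriptions)]
    return sorted(scored, reverse=True)


jobs = [
    "Senior engineer, Kubernetes and Terraform on AWS",
    "Account executive for enterprise sales",
    "Data engineer with Spark and Airflow",
]
for score, job in rank_jobs("kubernetes terraform", jobs):
    print(f"{score:.2f}  {job}")
```

TF-IDF weights rare terms such as "terraform" above common ones such as "engineer", so postings that mention the specific technologies in the query rank first.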
Uses TF-IDF vectorization for semantic matching of job descriptions.
Evidence Deduplication
Redis-based deduplication ensures unique evidence across all sources. Deduplication happens at the evidence level, not the source level, so all unique information is captured even when multiple sources return similar URLs.
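A minimal sketch of evidence-level deduplication with a Redis set. The fingerprint (SHA-256 of the normalized URL plus normalized snippet text), the evidence:{run_id} key name, and the helper names are assumptions. InMemorySet mimics the one Redis command used; a redis.Redis client's sadd(name, *values) returns the number of newly added members in the same way, so it can be passed in its place.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def evidence_key(url: str, snippet: str) -> str:
    """Fingerprint one piece of evidence from its URL and text, not its source."""
    parts = urlsplit(url.strip())
    # Lower-case scheme and host, drop the fragment and any trailing slash.
    normalized_url = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), parts.query, "")
    )
    normalized_text = " ".join(snippet.lower().split())
    return hashlib.sha256(f"{normalized_url}\n{normalized_text}".encode()).hexdigest()


class InMemorySet:
    """Test double for Redis SADD."""

    def __init__(self) -> None:
        self._sets: dict[str, set[str]] = {}

    def sadd(self, name: str, *values: str) -> int:
        members = self._sets.setdefault(name, set())
        added = len(set(values) - members)
        members.update(values)
        return added


def dedupe(store, run_id: str, items: list[dict]) -> list[dict]:
    """Keep only evidence whose fingerprint has not been seen in this run."""
    unique = []
    for item in items:
        key = evidence_key(item["url"], item["snippet"])
        # SADD returns 1 for a new member and 0 if it was already present.
        if store.sadd(f"evidence:{run_id}", key) == 1:
            unique.append(item)
    return unique
```

Because the key ignores which source returned the item, the same article found by two sources is stored once, while two different snippets from the same URL are both kept.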
Making a Multi-Source Request
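The request endpoint is not specified on this page, so the sketch below only shows a hypothetical way to assemble and validate a request body before sending it. Apart from search_depth and max_parallel_searches (described under Configuration Options), the field names are assumptions.

```python
import json
from dataclasses import asdict, dataclass

VALID_DEPTHS = {"quick", "standard", "comprehensive"}


@dataclass
class ResearchRequest:
    """Hypothetical request body for a multi-source research run."""
    research_goal: str
    company_domains: list[str]
    search_depth: str
    max_parallel_searches: int

    def __post_init__(self) -> None:
        # Reject invalid values before any API quota is spent.
        if self.search_depth not in VALID_DEPTHS:
            raise ValueError(f"search_depth must be one of {sorted(VALID_DEPTHS)}")
        if self.max_parallel_searches < 1:
            raise ValueError("max_parallel_searches must be at least 1")
        if not self.company_domains:
            raise ValueError("provide at least one company domain")

    def to_json(self) -> str:
        return json.dumps(asdict(self))


req = ResearchRequest(
    research_goal="Find companies running Kubernetes in production",
    company_domains=["acme.com", "example.org"],
    search_depth="standard",
    max_parallel_searches=8,
)
print(req.to_json())
```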
Response Format
Configuration Options
Environment Variables
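A minimal sketch of loading per-source API keys from the environment. The variable names TAVILY_API_KEY and NEWSAPI_KEY are hypothetical; use whatever names your deployment defines. Sources without a key are reported as disabled rather than raising, which fits a pipeline that keeps running when a source is unavailable.

```python
import os
from collections.abc import Mapping

# Hypothetical variable names. Greenhouse's public job board API needs no key,
# so the jobs source does not appear here.
REQUIRED_KEYS = {
    "google": "TAVILY_API_KEY",
    "news": "NEWSAPI_KEY",
}


def load_api_keys(env: Mapping[str, str] = os.environ) -> tuple[dict[str, str], list[str]]:
    """Split sources into those with a configured key and those without."""
    keys: dict[str, str] = {}
    disabled: list[str] = []
    for source, var in REQUIRED_KEYS.items():
        value = env.get(var, "").strip()
        if value:
            keys[source] = value
        else:
            disabled.append(source)
    return keys, disabled
```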
Configure API keys for the sources that require them (Tavily and NewsAPI; the Greenhouse job board API needs no authentication).
Search Depth
Controls the number of results per source:
- quick: 2 results per source (fastest, lowest cost)
- standard: 3 results per source (balanced)
- comprehensive: 5 results per source (most thorough)
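A quick way to estimate how much raw evidence a run can produce at each depth. This sketch assumes one query per source per domain and three sources; the engine may issue more queries per source, so treat the result as a rough illustration rather than an exact count.

```python
# Results per source for each depth, as listed above.
RESULTS_PER_SOURCE = {"quick": 2, "standard": 3, "comprehensive": 5}


def max_evidence(depth: str, num_domains: int, num_sources: int = 3) -> int:
    """Upper bound on raw results before deduplication (one query per source per domain)."""
    return RESULTS_PER_SOURCE[depth] * num_sources * num_domains


print(max_evidence("quick", 100))        # 600
print(max_evidence("comprehensive", 1))  # 15
```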
Max Parallel Searches
Controls concurrency per source pool. Higher values increase speed but may hit rate limits.
Best Practices
Optimize Search Depth
Start with standard depth and adjust based on result quality:
- Use quick for large batch operations (100+ domains)
- Use comprehensive for high-value targets requiring maximum evidence
Monitor API Rate Limits
Each source has different rate limits:
- Tavily: Based on your plan tier
- NewsAPI: 1,000 requests/day (free tier)
- Greenhouse: No authentication required, but respect fair use
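For sources with a daily cap, such as NewsAPI's free tier, a client-side counter can stop requests before the provider starts rejecting them. This DailyQuota class is a hypothetical helper, not part of any provider SDK, and the UTC-midnight reset is an assumption; check your provider's documentation for its actual reset window.

```python
import datetime as dt
from typing import Callable, Optional


class DailyQuota:
    """Client-side guard for a per-day request cap (hypothetical helper)."""

    def __init__(self, limit: int, today: Optional[Callable[[], dt.date]] = None) -> None:
        self.limit = limit
        # Injectable clock makes the reset behaviour easy to test.
        self._today = today or (lambda: dt.datetime.now(dt.timezone.utc).date())
        self._day = self._today()
        self._used = 0

    def try_acquire(self) -> bool:
        """Return True and count the request if the daily budget allows it."""
        current = self._today()
        if current != self._day:  # a new day started: reset the counter
            self._day, self._used = current, 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True


news_quota = DailyQuota(1000)
if news_quota.try_acquire():
    pass  # safe to call NewsAPI here
```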
Balance Parallel Searches
The max_parallel_searches setting affects all sources.
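The sketch below checks that a semaphore of size max_parallel_searches caps how many searches are in flight at once, using a counter around simulated requests. If each of the three source pools is sized to this value, up to three times as many requests can run concurrently across sources; that figure is an inference from the per-source pool design, not a documented number. peak_concurrency is a hypothetical helper.

```python
import asyncio


async def peak_concurrency(max_parallel_searches: int, num_queries: int) -> int:
    """Run dummy searches through one pool and report the peak number in flight."""
    pool = asyncio.Semaphore(max_parallel_searches)
    in_flight = 0
    peak = 0

    async def search() -> None:
        nonlocal in_flight, peak
        async with pool:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # simulated request latency
            in_flight -= 1

    await asyncio.gather(*(search() for _ in range(num_queries)))
    return peak


print(asyncio.run(peak_concurrency(3, 10)))  # 3
```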
Handle Source Failures Gracefully
The pipeline continues even if individual sources fail.
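A minimal sketch of this behaviour with asyncio.gather(return_exceptions=True), which returns a failed coroutine's exception as a value instead of cancelling the whole batch. The three source functions are hypothetical stand-ins, and news_search always fails to simulate an outage.

```python
import asyncio


async def google_search(domain: str) -> list[str]:
    return [f"google result for {domain}"]


async def news_search(domain: str) -> list[str]:
    raise TimeoutError("NewsAPI did not respond")  # simulated outage


async def jobs_search(domain: str) -> list[str]:
    return [f"jobs result for {domain}"]


async def search_all_sources(domain: str) -> tuple[dict[str, list[str]], dict[str, str]]:
    sources = {"google": google_search, "news": news_search, "jobs": jobs_search}
    outcomes = await asyncio.gather(
        *(fn(domain) for fn in sources.values()),
        return_exceptions=True,  # failures come back as values, not raised
    )
    evidence: dict[str, list[str]] = {}
    errors: dict[str, str] = {}
    for name, outcome in zip(sources, outcomes):
        if isinstance(outcome, BaseException):
            errors[name] = repr(outcome)  # record the failure and keep going
        else:
            evidence[name] = outcome
    return evidence, errors


evidence, errors = asyncio.run(search_all_sources("acme.com"))
print(sorted(evidence), errors)
```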
Performance Metrics
Multi-source research returns results 3-5x faster than sequential execution.
Related Features
Parallel Processing
Learn how async worker pools maximize throughput
AI-Powered Analysis
Understand how evidence is analyzed with LLMs