The GTM Research Engine is a full-stack, AI-powered research platform built to aggregate and analyze company data from multiple sources in real time. The system follows a modern, microservices-inspired architecture with a clear separation between the backend research engine and the frontend user interface.

System Architecture

The platform consists of three primary layers:

Frontend Layer

React 18 with TypeScript and Material-UI v7

Backend Layer

FastAPI-powered research engine with async processing

Data Sources

Multi-source aggregation (Google, News, Jobs)

High-Level Architecture Diagram

Core Design Principles

1. Asynchronous Processing

The entire backend is built on Python’s asyncio framework, enabling:
  • Concurrent API requests to multiple data sources
  • Non-blocking I/O operations for database and cache access
  • Parallel research execution for multiple companies simultaneously
  • Real-time streaming of results via Server-Sent Events (SSE)
The system can run up to 20 searches concurrently, configurable per request via max_parallel_searches.
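The bullets above can be sketched with asyncio primitives. This is a minimal illustration, not the actual pipeline code: the function names are hypothetical, and `asyncio.sleep(0)` stands in for a real non-blocking HTTP call to a data source.

```python
import asyncio

async def bounded_search(sem: asyncio.Semaphore, query: str) -> str:
    # Wait for a free slot so no more than max_parallel_searches run at once
    async with sem:
        await asyncio.sleep(0)  # stand-in for the real non-blocking HTTP call
        return f"results for {query}"

async def run_searches(queries: list[str], max_parallel_searches: int = 20) -> list[str]:
    sem = asyncio.Semaphore(max_parallel_searches)
    # gather() launches every search concurrently; the semaphore caps parallelism
    return await asyncio.gather(*(bounded_search(sem, q) for q in queries))

results = asyncio.run(run_searches(["acme stack", "acme hiring"], max_parallel_searches=2))
```

The same pattern extends to researching multiple companies simultaneously: each company's searches share the semaphore, so the global concurrency cap still holds.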

2. Separation of Concerns

The architecture follows clean separation patterns:
| Layer          | Responsibility                  | Technology               |
|----------------|---------------------------------|--------------------------|
| Presentation   | User interface and interaction  | React, Material-UI       |
| API            | Request routing and validation  | FastAPI                  |
| Business Logic | Research pipeline orchestration | Python async/await       |
| Data Access    | Multi-source data retrieval     | HTTP clients, APIs       |
| Infrastructure | Caching, metrics, monitoring    | Redis, circuit breakers  |

3. AI-Powered Intelligence

The system leverages AI at multiple stages:
  1. Query Generation: AI generates search strategies from research goals using Gemini 2.5 Flash
  2. Data Collection: intelligent queries execute across multiple data sources in parallel
  3. Evidence Analysis: AI analyzes collected evidence to extract technologies and signals
  4. Confidence Scoring: AI assigns confidence scores based on evidence quality and relevance
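A minimal sketch of the four stages as one async pipeline. The stage functions are hypothetical stand-ins for the Gemini-backed implementations; the returned values are placeholders, not real analysis output.

```python
import asyncio

async def generate_queries(goal: str) -> list[str]:
    # Stage 1: in the real system, Gemini 2.5 Flash derives queries from the goal
    return [f"{goal} technology stack", f"{goal} engineering jobs"]

async def collect_evidence(queries: list[str]) -> list[dict]:
    # Stage 2: parallel multi-source search (stubbed)
    return [{"query": q, "snippet": "stub"} for q in queries]

async def analyze_evidence(evidence: list[dict]) -> dict:
    # Stage 3: AI extraction of technologies and signals (stubbed)
    return {"technologies": ["FastAPI"], "signal_count": len(evidence)}

async def score_confidence(analysis: dict) -> dict:
    # Stage 4: AI confidence scoring (stubbed with a fixed value)
    return {**analysis, "confidence": 0.8}

async def research(goal: str) -> dict:
    queries = await generate_queries(goal)
    evidence = await collect_evidence(queries)
    analysis = await analyze_evidence(evidence)
    return await score_confidence(analysis)

report = asyncio.run(research("Acme Corp"))
```

Each stage awaits the previous one per company, but multiple `research()` calls can run concurrently under the semaphore scheme described earlier.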

4. Resilience & Reliability

Built-in patterns for production reliability:
# Circuit breaker protection
breaker = CircuitBreaker(
    failure_threshold=5,
    reset_timeout_seconds=30.0
)

# Per-source concurrency limits (semaphores cap in-flight requests)
source_pools = {
    "google_search": asyncio.Semaphore(max_parallel_searches),
    "jobs_search": asyncio.Semaphore(max_parallel_searches),
    "news_search": asyncio.Semaphore(max_parallel_searches),
}
Circuit breakers automatically open after 5 consecutive failures, preventing cascading failures across data sources.
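A minimal sketch of how such a breaker can be implemented; this is an assumption about the internals, not the project's actual `circuit_breaker.py`. It tracks consecutive failures, opens at the threshold, and allows a probe request after the reset timeout.

```python
import time

class CircuitBreaker:
    """Sketch: opens after N consecutive failures, half-opens after a timeout."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_seconds: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout_seconds = reset_timeout_seconds
        self.failures = 0
        self.opened_at: float | None = None  # monotonic timestamp when opened

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Open: only allow a probe once the reset timeout has elapsed (half-open)
        return time.monotonic() - self.opened_at >= self.reset_timeout_seconds

    def record_success(self) -> None:
        # Any success closes the breaker and clears the failure streak
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # trip open
```

With the defaults from the snippet above, the fifth consecutive failure trips the breaker, and requests to that source are rejected for 30 seconds before a probe is allowed through.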

Technology Stack

Backend Stack

  • Framework: FastAPI 0.100+ (Python 3.11+)
  • Async Runtime: asyncio with uvicorn
  • Serialization: ORJSON for high-performance JSON
  • Caching: Redis for result caching and session management
  • AI: Gemini 2.5 Flash for query generation and analysis
  • Validation: Pydantic v2 for request/response models

Frontend Stack

  • Framework: React 18 with TypeScript
  • UI Library: Material-UI (MUI) v7
  • Build Tool: Vite 5.x
  • HTTP Client: Fetch API with streaming support
  • State Management: React hooks (useState, custom hooks)

Infrastructure

  • API Protocol: REST with SSE for streaming
  • CORS: Configured for cross-origin requests
  • Monitoring: Performance metrics and error tracking
  • Deployment: Docker-ready with health checks

Request Flow

A typical research request flows through the system as follows: the React frontend submits the request to the FastAPI layer, which validates it, orchestrates the research pipeline, fans out to the data sources in parallel, and returns the aggregated results.
For real-time updates, use the /research/batch/stream endpoint, which sends incremental results via Server-Sent Events.

Performance Characteristics

Typical Research Performance

  • Query Generation: 0.5-1.5 seconds
  • Evidence Collection: 5-15 seconds (depends on parallelism)
  • AI Analysis: 2-5 seconds per company
  • Total Processing Time: 10-25 seconds for 3-5 companies

Scalability Considerations

Horizontal Scaling

FastAPI workers can scale independently behind a load balancer

Rate Limiting

Per-source semaphores prevent API quota exhaustion

Caching

Redis caching reduces redundant API calls

Circuit Breakers

Automatic failover when sources become unavailable
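To make the caching point concrete, here is an in-memory TTL cache sketch standing in for the Redis result cache (the real system uses Redis; the class and TTL value here are illustrative assumptions):

```python
import time

class TTLCache:
    """In-memory stand-in for the Redis result cache: entries expire after a TTL."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}  # key -> (expiry, value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key: str, value) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)

cache = TTLCache(ttl_seconds=60.0)
cache.set("acme", {"confidence": 0.8})
```

A cache hit skips the whole search-and-analyze pipeline for that company, which is where the redundant-API-call savings come from.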

Directory Structure

Backend Structure

backend/app/
├── api/                    # API routes and endpoints
│   └── routes.py           # Research endpoints
├── core/                   # Core utilities
│   ├── circuit_breaker.py  # Circuit breaker pattern
│   ├── config.py           # Application settings
│   └── metrics.py          # Performance metrics
├── db/                     # Database layer
│   └── redis_cache.py      # Redis caching
├── models/                 # Pydantic models
│   ├── request.py          # Request schemas
│   ├── response.py         # Response schemas
│   └── search.py           # Search models
├── services/               # Business logic
│   ├── pipeline.py         # Research orchestration
│   ├── extractor.py        # AI analysis
│   └── enhanced_streaming_aggregator.py
├── sources/                # Data source integrations
└── server.py               # FastAPI application

Frontend Structure

frontend/src/
├── components/             # React components
│   ├── Header/             # Application header
│   ├── SearchInterface/    # Search form and controls
│   └── ResearchResults/    # Results display
├── hooks/                  # Custom React hooks
│   ├── useSearch.ts        # Search logic
│   └── useSettings.ts      # Settings management
├── types/                  # TypeScript definitions
│   ├── search.ts           # Search types
│   └── settings.ts         # Settings types
├── App.tsx                 # Main application
├── index.tsx               # Application entry
└── theme.ts                # MUI theme configuration

Configuration

Key configuration settings defined in backend/app/core/config.py:
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    max_parallel_searches: int = 20
    circuit_breaker_failures: int = 5
    circuit_breaker_reset_seconds: float = 30.0

    # API rate limits (requests per minute)
    tavily_rpm: int = 500
    gemini_rpm: int = 2000
    newsapi_rpm: int = 300
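Because Settings is a frozen dataclass, per-request overrides are produced with dataclasses.replace rather than mutation. A small sketch, redefining only the first three fields from the snippet above for self-containment:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Settings:
    max_parallel_searches: int = 20
    circuit_breaker_failures: int = 5
    circuit_breaker_reset_seconds: float = 30.0

defaults = Settings()
# A request asking for lower parallelism gets a derived, still-immutable copy;
# the shared defaults object is never mutated.
throttled = replace(defaults, max_parallel_searches=5)
```

This is how a per-request max_parallel_searches value can coexist safely with a single shared settings instance across concurrent requests.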

Next Steps

Data Flow

Learn how data flows through the system

Backend Architecture

Deep dive into backend components

Frontend Architecture

Explore frontend structure and patterns

API Reference

View complete API documentation
