Architecture Overview
The agent system orchestrates access to three specialized data source tools:
- Earnings Transcript Search - Hybrid vector + keyword search over quarterly earnings calls
- SEC 10-K Filings Agent - Specialized retrieval agent for annual SEC filings
- Tavily News Search - Real-time web search for breaking news
Key Architectural Concepts
1. Semantic Routing
Routes to data sources based on question intent, not keywords. The LLM analyzes what type of information would best answer the question.
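As a rough sketch (the prompt text, labels, and function names here are assumptions, not the project's actual code), intent-based routing can be a single classification call:

```python
# Hypothetical sketch of intent-based routing. The real ReasoningPlanner
# prompt and output schema are not shown in this document.
ROUTING_PROMPT = (
    "Given the user's question, pick the best data source:\n"
    "- transcripts: management commentary from quarterly earnings calls\n"
    "- 10k: audited annual-filing detail (risk factors, segments, financials)\n"
    "- news: breaking or very recent events\n"
    "Answer with exactly one label.\n\nQuestion: {question}"
)

VALID_SOURCES = {"transcripts", "10k", "news"}

def route(question: str, llm) -> str:
    """Route on intent; `llm` is any callable mapping a prompt to a string."""
    label = llm(ROUTING_PROMPT.format(question=question)).strip().lower()
    # Fall back to transcripts if the model returns something unexpected.
    return label if label in VALID_SOURCES else "transcripts"
```

Here `llm` stands in for whatever chat-completion wrapper the project actually uses; the point is that the decision comes from the question's intent, not from keyword matching.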
2. Research Planning
The agent explains its reasoning before searching (“I need to find…”), making the research approach transparent and structured.
3. Multi-Source RAG
Combines multiple data sources (earnings transcripts, SEC filings, news) based on the question’s requirements.
4. Self-Reflection
Evaluates answer quality and iterates until confidence thresholds are met, ensuring comprehensive responses.
5. Answer Modes
Configurable iteration depth (2-10 iterations) and quality thresholds (70-95%) based on question complexity.
6. Search-Optimized Follow-ups
Generates keyword phrases optimized for semantic search, not verbose questions, for better RAG retrieval.
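The difference is easiest to see side by side (illustrative strings, not taken from the source):

```python
# A verbose follow-up question vs. a keyword phrase tuned for hybrid
# (embedding + keyword) retrieval: the phrase keeps the content-bearing
# terms and drops the question framing.
verbose_question = "What did management say about data center revenue growth drivers?"
keyword_phrase = "data center revenue growth drivers"
```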
The Six-Stage Pipeline
Every question flows through a carefully orchestrated six-stage pipeline:
Stage 1: Setup & Initialization
Initializes RAG components and loads configuration:
- Initialize search engine and response generator
- Load available quarters from database
- Set up streaming event handlers
- Configure iteration limits based on question complexity
Stage 2: Combined Reasoning + Analysis
A single LLM call (via ReasoningPlanner) performs comprehensive question analysis:
- Extract entities: Company tickers ($AAPL, $MSFT)
- Detect time references: “Q4 2024”, “last 3 quarters”, “latest”
- Semantic routing: Choose data source based on intent
- Detect answer mode: direct, standard, or detailed
- Explain research approach: 2-3 sentence reasoning statement
- Validate question: Reject off-topic or invalid questions
- Preserve temporal phrases: Exact time references (no resolution yet)
Why combine reasoning + analysis? This single LLM call is faster than two separate calls and produces more coherent results because the reasoning drives the analysis.
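One plausible shape for the combined call's structured output (the field names here are assumptions; the document does not show the actual schema):

```python
import json

# Hypothetical JSON returned by the single reasoning + analysis call.
raw = """{
  "is_valid": true,
  "tickers": ["AAPL", "MSFT"],
  "time_references": ["last 3 quarters"],
  "data_source": "transcripts",
  "answer_mode": "standard",
  "reasoning": "I need to compare recent margin commentary across both companies."
}"""
analysis = json.loads(raw)
```

Note that the temporal phrase is carried through verbatim; resolving it to concrete quarters is deferred to Stage 2.1.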
Stage 2.1: Search Planning
Resolves temporal references to specific quarters and builds declarative searches:
- Resolve time references: “latest” → get_last_n_quarters_for_company(ticker, 1)
- Company-specific quarters: Each ticker gets its own most recent quarters
- Build search queries: Optimized for each data source (transcripts, 10-K, news)
- Return reasoning string: Streamed to frontend for transparency
Quarter resolution uses company-specific database queries. This ensures each company gets its own most recent quarters, not a global “latest”.
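A minimal sketch of per-company resolution, assuming the database exposes each ticker's available (year, quarter) pairs (the in-memory `db` dict below is a stand-in for the real query):

```python
def get_last_n_quarters_for_company(ticker, n, available_quarters):
    """Return the ticker's own n most recent (year, quarter) pairs."""
    quarters = sorted(available_quarters.get(ticker, ()), reverse=True)
    return quarters[:n]

# Companies report on different schedules, so "latest" differs per ticker.
db = {
    "AAPL": [(2024, 1), (2024, 2), (2024, 3)],
    "MSFT": [(2024, 2), (2024, 3), (2024, 4)],
}
```

With this data, “latest” resolves to Q3 2024 for AAPL but Q4 2024 for MSFT, which is exactly why a global “latest” would be wrong.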
Stage 2.5 & 2.6: News and 10-K Search
Parallel execution of specialized data source searches:
News Search (if needs_latest_news=true):
- Query Tavily API for real-time news
- Format with [N1], [N2] citation markers
- Include publication dates and URLs
10-K Search (if data_source="10k"):
- Invoke specialized retrieval agent for annual filings
- Planning-driven sub-question generation
- LLM-based section routing (Item 1, Item 7, Item 8, etc.)
- Hybrid search with cross-encoder reranking
- Iterative retrieval (up to 5 iterations)
- Format with [10K1], [10K2] citation markers
Stage 3: Transcript Search
Hybrid vector + keyword search over earnings transcripts:
- Single-ticker: Direct search with quarter filtering
- Multi-ticker: Parallel search per company
- Hybrid scoring: 70% vector similarity + 30% keyword matching
- Deduplication: Remove duplicate chunks across searches
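The scoring and deduplication steps above can be sketched as follows (function names are assumptions; only the 70/30 weighting comes from the source):

```python
def hybrid_score(vector_sim: float, keyword_score: float) -> float:
    """Blend the two signals with the 70% vector / 30% keyword weighting."""
    return 0.7 * vector_sim + 0.3 * keyword_score

def dedupe_chunks(chunks):
    """Keep the first occurrence of each chunk id across parallel searches."""
    seen, unique = set(), []
    for chunk in chunks:
        if chunk["id"] not in seen:
            seen.add(chunk["id"])
            unique.append(chunk)
    return unique
```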
Stage 4: Initial Answer Generation
Generates the first answer using all retrieved context:
- Single ticker: generate_openai_response() with company-specific context
- Multiple tickers: generate_multi_ticker_response() with cross-company synthesis
- Maintains period metadata: Preserves quarter information (“Q1 2025”, “FY 2024”)
- Includes all figures: Every financial metric from all sources
Stage 5: Iterative Improvement
The agent evaluates and improves the answer through iteration:
Evaluate Quality
Score the answer on completeness, specificity, accuracy, and clarity (0-100 scale).
Generate Follow-up Keywords
Create search-optimized keyword phrases (not verbose questions) for missing information.
Iteration stops when any of the following conditions is met:
- Confidence ≥ threshold (varies by answer mode: 70-95%)
- Max iterations reached (2-10 depending on mode)
- Agent decides answer is sufficient
- No follow-up keyword phrases generated
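The stopping check combines those four conditions; a minimal sketch (parameter names are assumptions):

```python
def should_stop(confidence, threshold, iteration, max_iterations,
                agent_says_sufficient, followup_phrases):
    """True when any stopping condition from the list above holds."""
    return (
        confidence >= threshold          # quality bar reached
        or iteration >= max_iterations   # iteration budget exhausted
        or agent_says_sufficient         # agent judges the answer complete
        or not followup_phrases          # nothing left to search for
    )
```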
| Mode | Iterations | Confidence | When Used |
|---|---|---|---|
| direct | 2 | 70% | Quick factual lookups |
| standard | 3 | 80% | Default balanced analysis |
| detailed | 4 | 90% | Comprehensive research |
| deep_search | 10 | 95% | Exhaustive search (reserved) |
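In code, the mode table maps naturally to a small configuration structure (the class and attribute names are assumptions; the numbers mirror the table):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnswerMode:
    max_iterations: int
    confidence_threshold: float

# Hypothetical mapping mirroring the answer-mode table.
ANSWER_MODES = {
    "direct": AnswerMode(2, 0.70),
    "standard": AnswerMode(3, 0.80),
    "detailed": AnswerMode(4, 0.90),
    "deep_search": AnswerMode(10, 0.95),
}
```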
Stage 6: Final Response Assembly
Assembles and streams the final response:
- Stream final answer with citations
- Include all source attributions (transcripts, 10-K, news)
- Return metadata (confidence, chunks used, timing)
- Update conversation memory for follow-up questions
Key Components
Core Files
| File | Description |
|---|---|
| __init__.py | Public API - exports Agent, RAGAgent, create_agent() |
| agent_config.py | Agent configuration and iteration settings |
| prompts.py | Centralized LLM prompt templates |
| rag/rag_agent.py | Orchestration engine with pipeline stages |
| rag/question_analyzer.py | LLM-based semantic routing |
| rag/reasoning_planner.py | Combined reasoning + analysis |
Data Source Tools
| File | Tool | Description |
|---|---|---|
| rag/search_engine.py | Transcript Search | Hybrid vector + keyword search |
| rag/sec_filings_service_smart_parallel.py | 10-K Agent | Planning-driven parallel retrieval |
| rag/tavily_service.py | News Search | Real-time news via Tavily API |
Supporting Components
| File | Description |
|---|---|
| rag/response_generator.py | LLM response generation and evaluation |
| rag/database_manager.py | PostgreSQL/pgvector operations |
| rag/conversation_memory.py | Multi-turn conversation state |
| rag/config.py | RAG configuration |
Streaming Events
The agent streams real-time progress to the frontend:
| Event Type | Description |
|---|---|
| progress | Generic progress updates |
| analysis | Question analysis complete |
| reasoning | Agent’s research planning statement |
| news_search | News search results |
| 10k_search | 10-K SEC search results |
| iteration_start | Beginning of iteration N |
| agent_decision | Agent’s quality assessment |
| iteration_followup | Follow-up keyword phrases being searched |
| iteration_search | New chunks found |
| iteration_complete | Iteration finished |
| result | Final answer with citations |
| rejected | Question rejected (out of scope) |
| error | Error occurred |
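If the transport is server-sent events, emitting one of these event types might look like the sketch below (the wire format is an assumption; the document does not specify it):

```python
import json

def format_sse(event_type: str, payload: dict) -> str:
    """Serialize one progress event as a server-sent-events frame."""
    body = json.dumps({"type": event_type, **payload})
    return f"data: {body}\n\n"
```

For example, `format_sse("iteration_start", {"iteration": 2})` produces a single `data:` frame the frontend can parse back into the event type plus its payload.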
Next Steps
Semantic Routing
Learn how the agent chooses the right data sources
RAG Pipeline
Deep dive into the retrieval and generation process
Data Sources
Explore the three specialized data source tools