This guide covers testing practices for Finance Agent. While a comprehensive test suite is under development, you can test the application locally using the methods described below.

Current Testing Status

Finance Agent is actively developing a formal testing framework. Currently, testing is primarily done through:
  • Manual API testing via Swagger UI
  • Local development server testing
  • Production validation via FinanceBench (91% accuracy on 10-K questions)
Contributions to expand test coverage are welcome! See the Contributing Guide for how to get involved.

Local Testing

Running the Development Server

1. Start the server

python -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
The --reload flag enables auto-reloading when code changes.
2. Access the application

Open your browser to:
  • Frontend: http://localhost:8000
  • API Documentation: http://localhost:8000/docs (Swagger UI)
  • Alternative API Docs: http://localhost:8000/redoc (ReDoc)

Testing API Endpoints

Using Swagger UI

The interactive API documentation at http://localhost:8000/docs allows you to test endpoints directly:
1. Navigate to Swagger UI

Open http://localhost:8000/docs in your browser.
2. Expand an endpoint

Click on any endpoint (e.g., POST /message/stream-v2) to see details.
3. Try it out

Click Try it out, fill in the request parameters, and click Execute.
4. View response

Swagger will show the response status, headers, and body.

Using cURL

Test endpoints from the command line:
curl -X GET "http://localhost:8000/companies/search?query=apple" \
  -H "accept: application/json"

Using Python Requests

Test programmatically with Python:
import requests
import json

BASE_URL = "http://localhost:8000"

# Test company search
response = requests.get(f"{BASE_URL}/companies/search", params={"query": "microsoft"})
print(response.json())

# Test RAG chat
response = requests.post(
    f"{BASE_URL}/message/stream-v2",
    json={
        "message": "What is Microsoft's cloud revenue?",
        "conversation_id": "test-conv-1"
    }
)
print(response.json())
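The `stream-v2` endpoint's name suggests a streamed response, in which case calling `response.json()` as above may not work. Here is a hedged sketch of consuming such a response incrementally; the exact wire format (bare NDJSON vs. SSE-style `data:` framing) is an assumption, so the parsing helper tolerates both:

```python
import json

def iter_stream_events(lines):
    """Parse newline-delimited JSON events from a streamed response body.

    `lines` can be `response.iter_lines()` from requests, or any iterable
    of bytes/str lines. Blank keep-alive lines are skipped.
    """
    for raw in lines:
        if not raw:
            continue
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        # Tolerate SSE-style "data: {...}" framing as well as bare JSON lines.
        if raw.startswith("data:"):
            raw = raw[len("data:"):].strip()
        yield json.loads(raw)

# Usage against a live server (event format is an assumption):
# response = requests.post(
#     f"{BASE_URL}/message/stream-v2",
#     json={"message": "...", "conversation_id": "test-conv-1"},
#     stream=True,
# )
# for event in iter_stream_events(response.iter_lines()):
#     print(event)
```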

Testing Key Features

Testing the RAG Agent

The core RAG agent can be tested in several ways:

1. Via API Endpoint

Use the streaming chat endpoint:
curl -X POST "http://localhost:8000/message/stream-v2" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What did Apple say about iPhone sales in Q4 2024?",
    "conversation_id": "test-123"
  }'

2. Direct Python Usage

Test the agent directly in Python:
import asyncio
from agent import create_agent

async def test_rag_agent():
    agent = create_agent()
    
    # Test earnings transcript question
    async for event in agent.execute_rag_flow(
        question="What was Apple's Q4 2024 revenue?",
        stream=True
    ):
        if event['type'] == 'reasoning':
            print(f"Planning: {event['message']}")
        elif event['type'] == 'result':
            print(f"Answer: {event['data']['answer']}")

asyncio.run(test_rag_agent())

Testing Data Sources

Earnings Transcripts

Test transcript search and retrieval:
import asyncio
import os

import asyncpg

from agent.rag.search_engine import search_similar_chunks

DATABASE_URL = os.environ["DATABASE_URL"]

async def test_transcript_search():
    conn = await asyncpg.connect(DATABASE_URL)
    
    results = await search_similar_chunks(
        query="revenue growth",
        top_k=10,
        quarter="2024_q4",
        ticker="AAPL",
        db=conn
    )
    
    for chunk in results:
        print(f"Score: {chunk['score']}, Text: {chunk['chunk_text'][:100]}")
    
    await conn.close()

asyncio.run(test_transcript_search())

SEC 10-K Filings

Test the SEC agent:
import asyncio

from agent.rag.sec_filings_service_smart_parallel import SECFilingsService

async def test_sec_agent():
    service = SECFilingsService()
    
    result = await service.search_10k(
        question="What was Tim Cook's compensation in 2023?",
        ticker="AAPL",
        year=2023
    )
    
    print(f"Answer: {result['answer']}")
    print(f"Citations: {result['citations']}")

asyncio.run(test_sec_agent())
News Search

Test the Tavily news integration:
from agent.rag.tavily_service import TavilyService

def test_news_search():
    service = TavilyService()
    
    results = service.search_news(
        query="NVIDIA latest developments",
        max_results=5
    )
    
    print(f"Summary: {results['answer']}")
    for article in results['results']:
        print(f"- {article['title']}: {article['url']}")

Testing Configuration

Environment Variables

For testing, you may want to use a different configuration:
# .env.test
ENVIRONMENT=development
RAG_DEBUG_MODE=true
AUTH_DISABLED=true
LOG_LEVEL=DEBUG
Load the test environment:
cp .env.test .env
python -m uvicorn app.main:app --reload
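If you prefer not to overwrite your working `.env`, a small stdlib loader can read `.env.test` into the process environment instead. This is a minimal sketch of the idea (the `python-dotenv` package does the same more robustly):

```python
import os

def load_env_file(path):
    """Load KEY=VALUE pairs from a dotenv-style file into os.environ.

    Skips blank lines and comments; does not handle quoting edge cases.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip()

# Usage before starting the server from Python:
# load_env_file(".env.test")
```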

Debug Mode

Enable debug mode for verbose logging:
# config.py or .env
RAG_DEBUG_MODE = True
LOG_LEVEL = "DEBUG"
This will output detailed information about:
  • Question analysis and routing decisions
  • Search queries and results
  • Agent reasoning and iterations
  • LLM prompts and responses
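Assuming `LOG_LEVEL` maps onto Python's standard logging level names (this wiring is illustrative, not necessarily how the app configures logging internally), a minimal sketch looks like:

```python
import logging
import os

def configure_logging():
    """Configure root logging from the LOG_LEVEL environment variable.

    Unknown level names fall back to INFO.
    """
    level_name = os.getenv("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, logging.INFO)
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    return level
```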

Performance Testing

Response Time Testing

Measure API response times:
import time
import requests

BASE_URL = "http://localhost:8000"

questions = [
    "What was Apple's Q4 2024 revenue?",
    "What is Microsoft's cloud strategy?",
    "Compare AAPL and MSFT revenue growth"
]

for question in questions:
    start = time.time()
    response = requests.post(
        f"{BASE_URL}/message/stream-v2",
        json={"message": question, "conversation_id": "perf-test"}
    )
    elapsed = time.time() - start
    print(f"Question: {question}")
    print(f"Time: {elapsed:.2f}s")
    print(f"Status: {response.status_code}\n")
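Single measurements are noisy; for more stable numbers, repeat each request and summarize the timings. The helper below is a pure function, so it works with elapsed times collected from any loop like the one above:

```python
import statistics

def summarize_latencies(samples):
    """Summarize a list of response times (seconds) into mean/median/p95/max."""
    ordered = sorted(samples)
    p95_index = max(0, round(0.95 * (len(ordered) - 1)))
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }

# Usage: append each `elapsed` to a list per question, then:
# print(summarize_latencies(times))
```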

Database Query Performance

Test search performance:
import asyncio
import os
import time

import asyncpg

DATABASE_URL = os.environ["DATABASE_URL"]

async def benchmark_search():
    conn = await asyncpg.connect(DATABASE_URL)
    
    queries = [
        "revenue growth",
        "operating expenses",
        "guidance outlook"
    ]
    
    for query in queries:
        # Embed the query text first; get_embedding is a placeholder for
        # your embedding helper.
        query_embedding = await get_embedding(query)
        
        start = time.time()
        results = await conn.fetch(
            """
            SELECT chunk_text, embedding <-> $1 AS distance
            FROM transcript_chunks
            WHERE ticker = $2
            ORDER BY distance
            LIMIT 10
            """,
            query_embedding, "AAPL"
        )
        elapsed = time.time() - start
        print(f"Query '{query}': {elapsed:.3f}s, {len(results)} results")
    
    await conn.close()

asyncio.run(benchmark_search())

Validation Testing

FinanceBench Evaluation

Finance Agent is validated against the FinanceBench dataset:
Current benchmark: 91% accuracy on 112 10-K questions, averaging ~10 seconds per question
The formal evaluation scripts live in the experiments/ directory (excluded from the repository), but you can test individual FinanceBench questions yourself:
import asyncio

from agent import create_agent

async def test_financebench_question():
    agent = create_agent()
    
    question = "What was Apple's total debt in fiscal 2023?"
    expected_answer = "$111.1 billion"  # From FinanceBench ground truth
    
    result = await agent.execute_rag_flow_async(question=question)
    
    print(f"Question: {question}")
    print(f"Expected: {expected_answer}")
    print(f"Got: {result['answer']}")
    print(f"Confidence: {result['metadata']['confidence']}")

asyncio.run(test_financebench_question())
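FinanceBench answers are often dollar figures, so a rough automatic check can extract the amounts from both strings and compare them within a tolerance. This is a heuristic sketch, not the project's actual grader:

```python
import re

_SCALE = {"million": 1e6, "billion": 1e9, "trillion": 1e12}

def extract_dollar_amounts(text):
    """Pull dollar figures like '$111.1 billion' out of free text, in USD."""
    pattern = r"\$([\d,]+(?:\.\d+)?)\s*(billion|million|trillion)?"
    amounts = []
    for number, scale in re.findall(pattern, text, flags=re.IGNORECASE):
        value = float(number.replace(",", ""))
        if scale:
            value *= _SCALE[scale.lower()]
        amounts.append(value)
    return amounts

def answers_roughly_match(expected, got, tolerance=0.02):
    """True if any dollar figure in `got` is within `tolerance` of one in `expected`."""
    return any(
        e and abs(e - g) / e <= tolerance
        for e in extract_dollar_amounts(expected)
        for g in extract_dollar_amounts(got)
    )
```

For example, `answers_roughly_match("$111.1 billion", "total debt of $111,110 million")` is true, since both normalize to roughly the same USD value.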

Troubleshooting Tests

Common Issues

Ensure PostgreSQL is running and the DATABASE_URL is correct:
# Test database connection
psql $DATABASE_URL -c "SELECT 1;"

# Check pgvector extension
psql $DATABASE_URL -c "SELECT * FROM pg_extension WHERE extname = 'vector';"
Verify all required API keys are set in .env:
# Check environment variables
echo $OPENAI_API_KEY
echo $CEREBRAS_API_KEY
echo $API_NINJAS_KEY
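A small preflight check can verify all of these at once before you run anything (the key names below are the ones shown above, plus `DATABASE_URL`):

```python
import os

REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "CEREBRAS_API_KEY",
    "API_NINJAS_KEY",
    "DATABASE_URL",
]

def missing_env_keys(required=REQUIRED_KEYS):
    """Return the names of required environment variables that are unset or empty."""
    return [key for key in required if not os.environ.get(key)]

# Usage:
#   missing = missing_env_keys()
#   if missing:
#       raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```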
Ensure all dependencies are installed:
pip install -r requirements.txt
Check database indexes:
-- Ensure vector index exists
CREATE INDEX IF NOT EXISTS idx_transcript_chunks_embedding 
  ON transcript_chunks USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- Check query plan
EXPLAIN ANALYZE 
SELECT * FROM transcript_chunks 
WHERE ticker = 'AAPL' AND year = 2024 
ORDER BY embedding <-> '[...]' 
LIMIT 10;

Future Testing Plans

Upcoming test framework improvements:
  • Unit tests for core agent components
  • Integration tests for API endpoints
  • Automated regression testing
  • Performance benchmarking suite
  • Mock data fixtures for testing without external APIs
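As a starting point for that last item, the fixture pattern can be sketched in plain Python: stub out the search/LLM dependency with a canned fake and test the surrounding logic with no database or API keys. All names here are illustrative, not the project's real interfaces:

```python
class FakeSearchEngine:
    """Canned stand-in for the vector search dependency."""

    def __init__(self, canned_chunks):
        self.canned_chunks = canned_chunks
        self.calls = []  # record calls for assertions

    def search(self, query, top_k=10):
        self.calls.append((query, top_k))
        return self.canned_chunks[:top_k]

def build_context(search_engine, question, top_k=3):
    """Example unit under test: assemble a context string from search hits."""
    chunks = search_engine.search(question, top_k=top_k)
    return "\n---\n".join(chunk["chunk_text"] for chunk in chunks)

def test_build_context():
    fake = FakeSearchEngine([
        {"chunk_text": "Revenue grew 8%."},
        {"chunk_text": "Margins expanded."},
    ])
    context = build_context(fake, "revenue growth", top_k=2)
    assert "Revenue grew 8%." in context
    assert fake.calls == [("revenue growth", 2)]
```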
Interested in contributing to testing infrastructure? See the Contributing Guide to get started.
