This guide covers testing practices for Finance Agent. While a comprehensive test suite is under development, you can test the application locally using the methods described below.

Current Testing Status

Finance Agent is actively developing a formal testing framework. Currently, testing is primarily done through:
  • Manual API testing via Swagger UI
  • Local development server testing
  • Production validation via FinanceBench (91% accuracy on 10-K questions)
Contributions to expand test coverage are welcome! See the Contributing Guide for how to get involved.

Local Testing

Running the Development Server

1. Start the server

python -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
The --reload flag enables auto-reloading when code changes.
2. Access the application

Open your browser to:
  • Frontend: http://localhost:8000
  • API Documentation: http://localhost:8000/docs (Swagger UI)
  • Alternative API Docs: http://localhost:8000/redoc (ReDoc)

Testing API Endpoints

Using Swagger UI

The interactive API documentation at http://localhost:8000/docs allows you to test endpoints directly:
1. Navigate to Swagger UI

Open http://localhost:8000/docs in your browser.
2. Expand an endpoint

Click on any endpoint (e.g., POST /message/stream-v2) to see details.
3. Try it out

Click Try it out, fill in the request parameters, and click Execute.
4. View response

Swagger will show the response status, headers, and body.

Using cURL

Test endpoints from the command line:
curl -X GET "http://localhost:8000/companies/search?query=apple" \
  -H "accept: application/json"

Using Python Requests

Test programmatically with Python:
import requests
import json

BASE_URL = "http://localhost:8000"

# Test company search
response = requests.get(f"{BASE_URL}/companies/search", params={"query": "microsoft"})
print(response.json())

# Test RAG chat
response = requests.post(
    f"{BASE_URL}/message/stream-v2",
    json={
        "message": "What is Microsoft's cloud revenue?",
        "conversation_id": "test-conv-1"
    }
)
print(response.json())
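The `stream-v2` endpoint's name suggests a streamed response, in which case calling `response.json()` as above may not work. Here is a hedged sketch of consuming such a response incrementally; the exact wire format (bare NDJSON vs. SSE-style `data:` framing) is an assumption, so the parsing helper tolerates both:

```python
import json

def iter_stream_events(lines):
    """Parse newline-delimited JSON events from a streamed response body.

    `lines` can be `response.iter_lines()` from requests, or any iterable
    of bytes/str lines. Blank keep-alive lines are skipped.
    """
    for raw in lines:
        if not raw:
            continue
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        # Tolerate SSE-style "data: {...}" framing as well as bare JSON lines.
        if raw.startswith("data:"):
            raw = raw[len("data:"):].strip()
        yield json.loads(raw)

# Usage against a live server (event format is an assumption):
# response = requests.post(
#     f"{BASE_URL}/message/stream-v2",
#     json={"message": "...", "conversation_id": "test-conv-1"},
#     stream=True,
# )
# for event in iter_stream_events(response.iter_lines()):
#     print(event)
```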

Testing Key Features

Testing the RAG Agent

The core RAG agent can be tested in several ways:

1. Via API Endpoint

Use the streaming chat endpoint:
curl -X POST "http://localhost:8000/message/stream-v2" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What did Apple say about iPhone sales in Q4 2024?",
    "conversation_id": "test-123"
  }'

2. Direct Python Usage

Test the agent directly in Python:
import asyncio
from agent import create_agent

async def test_rag_agent():
    agent = create_agent()
    
    # Test earnings transcript question
    async for event in agent.execute_rag_flow(
        question="What was Apple's Q4 2024 revenue?",
        stream=True
    ):
        if event['type'] == 'reasoning':
            print(f"Planning: {event['message']}")
        elif event['type'] == 'result':
            print(f"Answer: {event['data']['answer']}")

asyncio.run(test_rag_agent())

Testing Data Sources

Earnings Transcripts

Test transcript search and retrieval:
import asyncio
import os

import asyncpg

from agent.rag.search_engine import search_similar_chunks

DATABASE_URL = os.environ["DATABASE_URL"]

async def test_transcript_search():
    conn = await asyncpg.connect(DATABASE_URL)
    
    results = await search_similar_chunks(
        query="revenue growth",
        top_k=10,
        quarter="2024_q4",
        ticker="AAPL",
        db=conn
    )
    
    for chunk in results:
        print(f"Score: {chunk['score']}, Text: {chunk['chunk_text'][:100]}")
    
    await conn.close()

asyncio.run(test_transcript_search())

SEC 10-K Filings

Test the SEC agent:
import asyncio

from agent.rag.sec_filings_service_smart_parallel import SECFilingsService

async def test_sec_agent():
    service = SECFilingsService()
    
    result = await service.search_10k(
        question="What was Tim Cook's compensation in 2023?",
        ticker="AAPL",
        year=2023
    )
    
    print(f"Answer: {result['answer']}")
    print(f"Citations: {result['citations']}")

asyncio.run(test_sec_agent())
News Search

Test the Tavily news integration:
from agent.rag.tavily_service import TavilyService

def test_news_search():
    service = TavilyService()
    
    results = service.search_news(
        query="NVIDIA latest developments",
        max_results=5
    )
    
    print(f"Summary: {results['answer']}")
    for article in results['results']:
        print(f"- {article['title']}: {article['url']}")

Testing Configuration

Environment Variables

For testing, you may want to use a different configuration:
# .env.test
ENVIRONMENT=development
RAG_DEBUG_MODE=true
AUTH_DISABLED=true
LOG_LEVEL=DEBUG
Load the test environment:
cp .env.test .env
python -m uvicorn app.main:app --reload
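If you prefer not to overwrite your working `.env`, a small stdlib loader can read `.env.test` into the process environment instead. This is a minimal sketch of the idea (the `python-dotenv` package does the same more robustly):

```python
import os

def load_env_file(path):
    """Load KEY=VALUE pairs from a dotenv-style file into os.environ.

    Skips blank lines and comments; does not handle quoting edge cases.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip()

# Usage before starting the server from Python:
# load_env_file(".env.test")
```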

Debug Mode

Enable debug mode for verbose logging:
# config.py or .env
RAG_DEBUG_MODE = True
LOG_LEVEL = "DEBUG"
This will output detailed information about:
  • Question analysis and routing decisions
  • Search queries and results
  • Agent reasoning and iterations
  • LLM prompts and responses
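Assuming `LOG_LEVEL` maps onto Python's standard logging level names (this wiring is illustrative, not necessarily how the app configures logging internally), a minimal sketch looks like:

```python
import logging
import os

def configure_logging():
    """Configure root logging from the LOG_LEVEL environment variable.

    Unknown level names fall back to INFO.
    """
    level_name = os.getenv("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, logging.INFO)
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    return level
```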

Performance Testing

Response Time Testing

Measure API response times:
import time
import requests

BASE_URL = "http://localhost:8000"

questions = [
    "What was Apple's Q4 2024 revenue?",
    "What is Microsoft's cloud strategy?",
    "Compare AAPL and MSFT revenue growth"
]

for question in questions:
    start = time.time()
    response = requests.post(
        f"{BASE_URL}/message/stream-v2",
        json={"message": question, "conversation_id": "perf-test"}
    )
    elapsed = time.time() - start
    print(f"Question: {question}")
    print(f"Time: {elapsed:.2f}s")
    print(f"Status: {response.status_code}\n")
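Single measurements are noisy; for more stable numbers, repeat each request and summarize the timings. The helper below is a pure function, so it works with elapsed times collected from any loop like the one above:

```python
import statistics

def summarize_latencies(samples):
    """Summarize a list of response times (seconds) into mean/median/p95/max."""
    ordered = sorted(samples)
    p95_index = max(0, round(0.95 * (len(ordered) - 1)))
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }

# Usage: append each `elapsed` to a list per question, then:
# print(summarize_latencies(times))
```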

Database Query Performance

Test search performance:
import asyncio
import os
import time

import asyncpg

DATABASE_URL = os.environ["DATABASE_URL"]

async def benchmark_search():
    conn = await asyncpg.connect(DATABASE_URL)
    
    queries = [
        "revenue growth",
        "operating expenses",
        "guidance outlook"
    ]
    
    for query in queries:
        # Embed the query text first; get_embedding is a placeholder for
        # your embedding helper.
        query_embedding = await get_embedding(query)
        
        start = time.time()
        results = await conn.fetch(
            """
            SELECT chunk_text, embedding <-> $1 AS distance
            FROM transcript_chunks
            WHERE ticker = $2
            ORDER BY distance
            LIMIT 10
            """,
            query_embedding, "AAPL"
        )
        elapsed = time.time() - start
        print(f"Query '{query}': {elapsed:.3f}s, {len(results)} results")
    
    await conn.close()

asyncio.run(benchmark_search())

Validation Testing

FinanceBench Evaluation

Finance Agent is validated against the FinanceBench dataset:
Current benchmark: 91% accuracy on 112 10-K questions, averaging ~10 seconds per question
The formal evaluation scripts live in the experiments/ directory (excluded from the repository), but you can test individual FinanceBench questions yourself:
import asyncio

from agent import create_agent

async def test_financebench_question():
    agent = create_agent()
    
    question = "What was Apple's total debt in fiscal 2023?"
    expected_answer = "$111.1 billion"  # From FinanceBench ground truth
    
    result = await agent.execute_rag_flow_async(question=question)
    
    print(f"Question: {question}")
    print(f"Expected: {expected_answer}")
    print(f"Got: {result['answer']}")
    print(f"Confidence: {result['metadata']['confidence']}")

asyncio.run(test_financebench_question())
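FinanceBench answers are often dollar figures, so a rough automatic check can extract the amounts from both strings and compare them within a tolerance. This is a heuristic sketch, not the project's actual grader:

```python
import re

_SCALE = {"million": 1e6, "billion": 1e9, "trillion": 1e12}

def extract_dollar_amounts(text):
    """Pull dollar figures like '$111.1 billion' out of free text, in USD."""
    pattern = r"\$([\d,]+(?:\.\d+)?)\s*(billion|million|trillion)?"
    amounts = []
    for number, scale in re.findall(pattern, text, flags=re.IGNORECASE):
        value = float(number.replace(",", ""))
        if scale:
            value *= _SCALE[scale.lower()]
        amounts.append(value)
    return amounts

def answers_roughly_match(expected, got, tolerance=0.02):
    """True if any dollar figure in `got` is within `tolerance` of one in `expected`."""
    return any(
        e and abs(e - g) / e <= tolerance
        for e in extract_dollar_amounts(expected)
        for g in extract_dollar_amounts(got)
    )
```

For example, `answers_roughly_match("$111.1 billion", "total debt of $111,110 million")` is true, since both normalize to roughly the same USD value.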

Troubleshooting Tests

Common Issues

Ensure PostgreSQL is running and the DATABASE_URL is correct:
# Test database connection
psql $DATABASE_URL -c "SELECT 1;"

# Check pgvector extension
psql $DATABASE_URL -c "SELECT * FROM pg_extension WHERE extname = 'vector';"
Verify all required API keys are set in .env:
# Check environment variables
echo $OPENAI_API_KEY
echo $CEREBRAS_API_KEY
echo $API_NINJAS_KEY
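A small preflight check can verify all of these at once before you run anything (the key names below are the ones shown above, plus `DATABASE_URL`):

```python
import os

REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "CEREBRAS_API_KEY",
    "API_NINJAS_KEY",
    "DATABASE_URL",
]

def missing_env_keys(required=REQUIRED_KEYS):
    """Return the names of required environment variables that are unset or empty."""
    return [key for key in required if not os.environ.get(key)]

# Usage:
#   missing = missing_env_keys()
#   if missing:
#       raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```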
Ensure all dependencies are installed:
pip install -r requirements.txt
Check database indexes:
-- Ensure vector index exists
CREATE INDEX IF NOT EXISTS idx_transcript_chunks_embedding 
  ON transcript_chunks USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- Check query plan
EXPLAIN ANALYZE 
SELECT * FROM transcript_chunks 
WHERE ticker = 'AAPL' AND year = 2024 
ORDER BY embedding <-> '[...]' 
LIMIT 10;

Future Testing Plans

Upcoming test framework improvements:
  • Unit tests for core agent components
  • Integration tests for API endpoints
  • Automated regression testing
  • Performance benchmarking suite
  • Mock data fixtures for testing without external APIs
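As a starting point for that last item, the fixture pattern can be sketched in plain Python: stub out the search/LLM dependency with a canned fake and test the surrounding logic with no database or API keys. All names here are illustrative, not the project's real interfaces:

```python
class FakeSearchEngine:
    """Canned stand-in for the vector search dependency."""

    def __init__(self, canned_chunks):
        self.canned_chunks = canned_chunks
        self.calls = []  # record calls for assertions

    def search(self, query, top_k=10):
        self.calls.append((query, top_k))
        return self.canned_chunks[:top_k]

def build_context(search_engine, question, top_k=3):
    """Example unit under test: assemble a context string from search hits."""
    chunks = search_engine.search(question, top_k=top_k)
    return "\n---\n".join(chunk["chunk_text"] for chunk in chunks)

def test_build_context():
    fake = FakeSearchEngine([
        {"chunk_text": "Revenue grew 8%."},
        {"chunk_text": "Margins expanded."},
    ])
    context = build_context(fake, "revenue growth", top_k=2)
    assert "Revenue grew 8%." in context
    assert fake.calls == [("revenue growth", 2)]
```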
Interested in contributing to testing infrastructure? See the Contributing Guide to get started.
