Overview
Finance Agent uses PostgreSQL with the pgvector extension for:
- Vector embeddings storage and similarity search
- Hybrid search (vector + keyword/TF-IDF)
- User authentication and session management
- Chat history and conversation threads
- Query analytics and logging
The pgvector extension is required for Finance Agent to function. Standard PostgreSQL without pgvector will not work.
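The hybrid search mentioned above boils down to blending two scores per chunk. As a rough sketch (the weighting, the 0.7 value, and the function names here are illustrative, not the application's actual implementation):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity, the metric behind pgvector's vector_cosine_ops."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def hybrid_score(vec_sim, keyword_score, alpha=0.7):
    """Blend vector similarity with a keyword/TF-IDF score.

    alpha is an illustrative weight, not the application's setting.
    """
    return alpha * vec_sim + (1 - alpha) * keyword_score

sim = cosine_similarity([1.0, 0.0], [1.0, 0.0])  # identical direction -> 1.0
print(hybrid_score(sim, 0.5))  # ~0.85
```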
PostgreSQL Installation
macOS
# Using Homebrew
brew install postgresql@14
# Start PostgreSQL
brew services start postgresql@14
Ubuntu/Debian
# Install PostgreSQL
sudo apt update
sudo apt install postgresql postgresql-contrib
# Start PostgreSQL
sudo systemctl start postgresql
sudo systemctl enable postgresql
Windows
Download and install from postgresql.org/download/windows
pgvector Extension Setup
Install pgvector
macOS
# Using Homebrew
brew install pgvector
Ubuntu/Debian
# Install build dependencies
sudo apt install postgresql-server-dev-14 build-essential git
# Clone and build pgvector
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
Docker (Recommended for Development)
# Use official pgvector image
docker run -d \
--name finance-agent-db \
-e POSTGRES_PASSWORD=changeme \
-e POSTGRES_DB=stratalens \
-p 5432:5432 \
pgvector/pgvector:pg14
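The same container can be declared in a docker-compose file, which also persists data across restarts. The service and volume names below are illustrative:

```yaml
services:
  db:
    image: pgvector/pgvector:pg14
    container_name: finance-agent-db
    environment:
      POSTGRES_PASSWORD: changeme
      POSTGRES_DB: stratalens
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```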
Create the database
# Connect to PostgreSQL
psql -U postgres
Then, inside the psql session:
-- Create database
CREATE DATABASE stratalens;
-- Connect to the database
\c stratalens
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify installation
SELECT * FROM pg_extension WHERE extname = 'vector';
Expected output:
 oid  | extname | extowner | extnamespace | ...
------+---------+----------+--------------+-----
  ... | vector  |       10 |         2200 | ...
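A quick way to confirm the extension actually works, not just that it is installed, is a throwaway query (the temp table name is illustrative):

```sql
-- Throwaway sanity check; the temp table disappears when the session ends
CREATE TEMP TABLE vec_test (id int, embedding vector(3));
INSERT INTO vec_test VALUES (1, '[1,0,0]'), (2, '[0,1,0]');
-- <=> is cosine distance: 0 for identical vectors, 1 for orthogonal ones
SELECT id, embedding <=> '[1,0,0]' AS cosine_distance
FROM vec_test ORDER BY cosine_distance;
```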
Create a database user (optional)
For production or multi-user setups:
-- Create user
CREATE USER finance_agent WITH PASSWORD 'secure_password_here';
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE stratalens TO finance_agent;
-- Connect to database and grant schema privileges
\c stratalens
GRANT ALL ON SCHEMA public TO finance_agent;
Update environment variables
Add the database URL to your .env file:
# For default postgres user
DATABASE_URL=postgresql://postgres:changeme@localhost:5432/stratalens
# For custom user
DATABASE_URL=postgresql://finance_agent:secure_password_here@localhost:5432/stratalens
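A malformed DATABASE_URL is a common source of startup failures. A small sketch like this can sanity-check the URL before the application ever tries to connect (the function name is illustrative, and nothing here opens a connection):

```python
from urllib.parse import urlparse

def check_database_url(url):
    """Parse a DATABASE_URL and report obvious problems without connecting."""
    parts = urlparse(url)
    problems = []
    if parts.scheme not in ("postgresql", "postgres"):
        problems.append(f"unexpected scheme: {parts.scheme}")
    if not parts.hostname:
        problems.append("missing host")
    if not (parts.path and parts.path != "/"):
        problems.append("missing database name")
    return problems

url = "postgresql://postgres:changeme@localhost:5432/stratalens"
print(check_database_url(url))  # [] when the URL looks well-formed
```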
Schema Initialization
Finance Agent automatically creates all required tables on first startup. The schema includes:
Core Tables
Users and Authentication
- users - User accounts with Clerk integration
- invitations - User invitation codes and metadata
- user_preferences - User settings and preferences
RAG System
- transcript_chunks - Earnings transcript embeddings and metadata
- Contains: chunk_text, embedding (vector), ticker, year, quarter, chunk_index
- Primary data source for earnings call queries
- ten_k_chunks - SEC 10-K filing embeddings and metadata
- Contains: chunk_text, embedding (vector), ticker, fiscal_year, sec_section, chunk_index
- Used for annual report and financial statement queries
- ten_k_tables - Extracted tables from 10-K filings
- Contains: table_data, ticker, fiscal_year, is_financial_statement, statement_type
Chat and History
- chat_conversations - Conversation threads (like ChatGPT)
- chat_messages - Individual messages within conversations
- chat_history - Legacy chat history (backward compatibility)
- chat_analytics - Query analytics and performance metrics
Analytics and Logging
- query_history - User query history and execution details
- search_queries - Comprehensive search query logs
- user_usage - Usage tracking for rate limiting and billing
- system_logs - Application-level logging
All tables are created automatically when you first run the application. You don’t need to run SQL scripts manually.
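For orientation only, the transcript_chunks table implied by the column list above would look roughly like this. The real DDL is generated by the application, and the embedding dimension of 384 is an assumption (typical for small sentence-transformers models), not a confirmed value:

```sql
-- Illustrative only: the application creates the real schema itself.
CREATE TABLE transcript_chunks (
    id          bigserial PRIMARY KEY,
    ticker      text NOT NULL,
    year        int  NOT NULL,
    quarter     int  NOT NULL,
    chunk_index int  NOT NULL,
    chunk_text  text NOT NULL,
    embedding   vector(384)  -- dimension is an assumption
);
```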
Verify Database Setup
Check database connection
# Test PostgreSQL is running
pg_isready
# Expected output:
# /var/run/postgresql:5432 - accepting connections
Verify pgvector extension
psql -U postgres -d stratalens -c "SELECT extversion FROM pg_extension WHERE extname = 'vector';"
# Expected output (version may differ):
# extversion
# ------------
# 0.5.0
Start the application
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000
Check the logs for:
✅ Database pool created: min_size=10, max_size=50
✅ Usage tracking table created
✅ Chat conversations table created
✅ System logs table created
Verify tables were created
psql -U postgres -d stratalens -c "\dt"
You should see tables like:
- users
- chat_conversations
- chat_messages
- transcript_chunks
- ten_k_chunks
- query_history
- And more…
Data Ingestion
After setting up the database, you need to ingest financial data for the RAG system to work.
Ingest Earnings Transcripts
# Download transcripts
python agent/rag/data_ingestion/download_transcripts.py
# Ingest transcripts for a specific company and date range
python agent/rag/data_ingestion/ingest_with_structure.py \
--ticker AAPL \
--year-start 2020 \
--year-end 2025
This creates embeddings and stores them in the transcript_chunks table with pgvector indexes for fast similarity search.
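A retrieval query over this table typically looks something like the following sketch; the `$1` placeholder stands for the query embedding, and the filter and limit are illustrative:

```sql
-- Illustrative top-k similarity search; $1 is the query embedding
SELECT ticker, year, quarter, chunk_text,
       embedding <=> $1 AS cosine_distance
FROM transcript_chunks
WHERE ticker = 'AAPL'
ORDER BY embedding <=> $1
LIMIT 5;
```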
Ingest SEC 10-K Filings
# Ingest 10-K filings for a specific company
python agent/rag/data_ingestion/ingest_10k_to_database.py \
--ticker AMZN \
--year-start 2019 \
--year-end 2024
The ingestion process:
- Downloads filings via datamule
- Extracts text with structural metadata (sections, tables)
- Generates embeddings using sentence-transformers
- Stores chunks in the ten_k_chunks table
- Creates vector indexes for similarity search
Data ingestion can be resource-intensive. For large date ranges, consider running ingestion overnight or in batches.
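Before embedding, documents are split into overlapping chunks so that context is not lost at chunk boundaries. A minimal sketch of that idea, with sizes that are illustrative rather than the ingestion scripts' actual settings:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping word-based chunks.

    chunk_size/overlap are illustrative, not the ingestion scripts' settings.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

text = " ".join(str(i) for i in range(450))
chunks = chunk_text(text)
print(len(chunks), chunks[1].split()[0])  # 3 150
```

Each chunk starts 150 words after the previous one, so consecutive chunks share 50 words of context.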
Database Indexes
Finance Agent automatically creates indexes for optimal performance:
Vector Indexes (pgvector)
-- Automatically created during ingestion
CREATE INDEX ON transcript_chunks USING ivfflat (embedding vector_cosine_ops);
CREATE INDEX ON ten_k_chunks USING ivfflat (embedding vector_cosine_ops);
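If you ever need to tune these indexes manually, pgvector exposes two knobs: `lists` at build time and `probes` at query time. The values below are illustrative defaults for experimentation, not the application's settings:

```sql
-- lists controls how many clusters ivfflat partitions vectors into
CREATE INDEX ON transcript_chunks
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- probes trades recall for speed at query time (session-level setting)
SET ivfflat.probes = 10;
```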
B-tree Indexes
-- User and authentication indexes
CREATE INDEX idx_users_clerk_id ON users(clerk_user_id);
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_users_email ON users(email);
-- Chat and analytics indexes
CREATE INDEX idx_chat_messages_conversation_id ON chat_messages(conversation_id);
CREATE INDEX idx_chat_analytics_created_at ON chat_analytics(created_at);
CREATE INDEX idx_query_history_user_id ON query_history(user_id);
All indexes are created automatically by the application. You don’t need to create them manually.
Connection Pool Configuration
Finance Agent uses connection pooling for optimal performance:
Development Settings
# From config.py
DEVELOPMENT_MIN_SIZE = 10 # Minimum connections
DEVELOPMENT_MAX_SIZE = 50 # Maximum connections
DEVELOPMENT_COMMAND_TIMEOUT = 30 # seconds
DEVELOPMENT_TIMEOUT = 20 # seconds
Production Settings
# From config.py
PRODUCTION_MIN_SIZE = 5 # Minimum connections
PRODUCTION_MAX_SIZE = 30 # Maximum connections
PRODUCTION_COMMAND_TIMEOUT = 20 # seconds
PRODUCTION_TIMEOUT = 15 # seconds
The application automatically selects the appropriate configuration based on the ENVIRONMENT variable.
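The selection logic amounts to something like the sketch below. The constant values are copied from the settings above; the dictionary layout and function name are illustrative, not the actual config.py code:

```python
import os

# Values copied from the settings documented above; structure is illustrative.
POOL_SETTINGS = {
    "development": {"min_size": 10, "max_size": 50,
                    "command_timeout": 30, "timeout": 20},
    "production":  {"min_size": 5,  "max_size": 30,
                    "command_timeout": 20, "timeout": 15},
}

def pool_settings():
    """Pick pool settings based on ENVIRONMENT, defaulting to development."""
    env = os.getenv("ENVIRONMENT", "development").lower()
    return POOL_SETTINGS.get(env, POOL_SETTINGS["development"])

print(pool_settings()["max_size"])  # 50 when ENVIRONMENT is unset
```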
Troubleshooting
pgvector Extension Not Found
Error: ERROR: extension "vector" does not exist
Solution:
- Verify pgvector is installed:
ls $(pg_config --pkglibdir) | grep vector
- Reinstall pgvector (see installation steps above)
- Restart PostgreSQL:
sudo systemctl restart postgresql
Connection Pool Exhausted
Error: asyncpg.exceptions.TooManyConnectionsError
Solution:
- Increase max_connections in PostgreSQL:
-- Check current setting
SHOW max_connections;
-- Edit postgresql.conf
max_connections = 100
- Restart PostgreSQL
- Adjust pool sizes in config.py if needed
Slow Vector Queries
Symptom: Vector similarity searches taking >1 second
Solution:
- Ensure vector indexes exist:
SELECT indexname FROM pg_indexes WHERE tablename = 'transcript_chunks';
- Rebuild indexes if needed:
REINDEX INDEX CONCURRENTLY transcript_chunks_embedding_idx;
- Enable debug mode to inspect query plans
Database Locks
Error: deadlock detected
Solution:
- Check for long-running queries:
SELECT pid, now() - query_start as duration, query
FROM pg_stat_activity
WHERE state = 'active' AND now() - query_start > interval '5 minutes';
- Kill problematic queries if needed, substituting the pid from the previous query:
SELECT pg_terminate_backend(<pid>);
Production Recommendations
These settings are critical for production deployments.
Security
- Use strong passwords for database users
- Enable SSL/TLS for database connections:
DATABASE_URL=postgresql://user:pass@host:5432/db?sslmode=require
- Restrict network access using firewall rules
- Enable connection limits per user:
ALTER USER finance_agent CONNECTION LIMIT 50;
- Enable query logging for monitoring:
-- In postgresql.conf
log_min_duration_statement = 1000 # Log queries > 1 second
Performance
- Configure shared_buffers (25% of RAM)
- Enable autovacuum for maintenance
- Monitor table bloat and run VACUUM regularly
Backup
- Set up automated backups:
# Daily backup
pg_dump -U postgres stratalens > backup_$(date +%Y%m%d).sql
- Use point-in-time recovery (PITR) with WAL archiving
- Test restore procedures regularly
Next Steps