Introduction
Pricing Intelligence is built as a distributed microservices architecture that combines AI-powered natural language processing with rigorous mathematical constraint solving. The system follows a layered approach where each service has a specific responsibility in the pricing analysis pipeline.
High-Level Architecture
The platform consists of six main components working in concert.
The architecture follows the ReAct pattern (Reasoning + Acting), in which the Harvey agent orchestrates complex workflows by reasoning about user queries and acting through MCP tools.
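The reason-then-act cycle can be illustrated with a minimal sketch. This is not Harvey's implementation: the `react_loop` function, the step tuple format, the `extract_pricing` tool name, and the scripted reasoner standing in for the LLM are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# (kind, tool_name, payload): kind is "act" to call a tool or "final" to answer.
Step = Tuple[str, str, str]
# (tool_name, tool_input, observation) for each completed tool call.
Trace = List[Tuple[str, str, str]]


def react_loop(
    query: str,
    reason: Callable[[str, Trace], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Alternate reasoning and tool calls until the reasoner returns a final answer."""
    trace: Trace = []
    for _ in range(max_steps):
        kind, tool_name, payload = reason(query, trace)
        if kind == "final":
            return payload
        tool = tools.get(tool_name)
        # Unknown tools become an observation so the reasoner can recover.
        observation = tool(payload) if tool else f"unknown tool: {tool_name}"
        trace.append((tool_name, payload, observation))
    return "step budget exhausted"


def scripted_reason(query: str, trace: Trace) -> Step:
    """Stand-in for the LLM: call one tool, then answer from its observation."""
    if not trace:
        return ("act", "extract_pricing", "https://example.com/pricing")
    return ("final", "", f"Based on {trace[-1][2]}")


# Hypothetical tool; in the real system tools would be reached through MCP.
tools = {"extract_pricing": lambda url: f"pricing data from {url}"}
answer = react_loop("Which plan is cheapest?", scripted_reason, tools)
print(answer)
```

The step budget keeps a reasoner that never produces a final answer from looping indefinitely.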
System Components
Harvey API
AI agent implementing the ReAct pattern for natural language pricing queries
MCP Server
Model Context Protocol server exposing standardized tools to the agent
Analysis API
Node.js service for pricing validation, optimization, and statistics
A-MINT API
Python service for extracting structured pricing from URLs
CSP Service
Java-based Choco constraint solver for mathematical optimization
Frontend
React/Vite chat interface for user interaction
Architectural Principles
1. Separation of Concerns
Each service has a single, well-defined responsibility:
- Harvey: Natural language understanding and orchestration
- MCP Server: Protocol translation and tool routing
- Analysis API: Business logic for pricing analysis
- A-MINT: Pricing data extraction and transformation
- CSP Service: Mathematical constraint satisfaction
- Frontend: User interface and visualization
2. Grounding Strategy
One of the key innovations is the grounding mechanism that reduces LLM hallucinations:
Extract Structured Data
A-MINT converts unstructured pricing pages into the structured Pricing2Yaml format.
3. Protocol-First Design
The system uses Model Context Protocol (MCP) as the standard interface between the AI agent and backend tools. This provides:
- Standardization: Consistent tool invocation across different services
- Composability: Tools can be easily combined in workflows
- Observability: All tool calls are logged and traceable
- Extensibility: New tools can be added without changing the agent
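The extensibility and observability properties can be sketched with a small tool registry. This sketch deliberately does not use any MCP SDK; the `tool` decorator, `call_tool`, and the `summary` tool are hypothetical names that only illustrate how a single logged entry point lets new tools be added without touching the caller.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("tool_registry")

# Hypothetical in-process registry; a real MCP server would expose tools over the protocol.
_TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a function under a tool name."""
    def register(func: Callable[..., Any]) -> Callable[..., Any]:
        _TOOLS[name] = func
        return func
    return register


def call_tool(name: str, **params: Any) -> Any:
    """Single entry point for the agent: look up, log, and invoke a tool."""
    if name not in _TOOLS:
        raise KeyError(f"unknown tool: {name}")
    logger.info("tool_call name=%s params=%s", name, params)
    return _TOOLS[name](**params)


# Adding a tool is one decorated function; call_tool itself does not change.
@tool("summary")
def summary(plan_prices: list) -> dict:
    return {"plans": len(plan_prices), "min_price": min(plan_prices)}


result = call_tool("summary", plan_prices=[0, 12, 49])
print(result)
```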
4. Caching & Performance
Multiple caching layers optimize performance.
Deployment Architecture
All services are containerized and orchestrated via Docker Compose.
Service Communication
Services communicate via:
- HTTP/REST: Synchronous API calls between services
- Server-Sent Events (SSE): Real-time streaming from Harvey to frontend
- stdio: MCP Server launched as subprocess by Harvey
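For the SSE channel, a client has to split the text stream into events. The following is a minimal parser for the standard `event:` / `data:` line format, offered as a sketch; the event names `thought` and `answer` are invented for the example and are not taken from Harvey's actual stream.

```python
from typing import Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event; a blank line terminates an event."""
    event: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if event:
                yield event
                event = {}
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space
        if field == "data" and "data" in event:
            event["data"] += "\n" + value  # repeated data lines are joined with newlines
        else:
            event[field] = value


# Hypothetical stream content for illustration only.
stream = [
    "event: thought\n",
    "data: Looking up pricing\n",
    "\n",
    ": keep-alive\n",
    "event: answer\n",
    "data: The cheapest plan\n",
    "data: is Basic.\n",
    "\n",
]
events = list(parse_sse(stream))
print(events)
```

An event that is not followed by a blank line is dropped, which matches how incomplete events are treated in the SSE format.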
Port Mapping
| Service | Internal Port | External Port | Protocol |
|---|---|---|---|
| CSP Service | 8000 | 8000 | HTTP |
| A-MINT API | 8000 | 8001 | HTTP |
| Analysis API | 3000 | 8002 | HTTP |
| MCP Server | 8085 | 8085 | HTTP/SSE |
| Harvey API | 8086 | 8086 | HTTP/SSE |
| Frontend | 80 | 80 | HTTP |
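The table distinguishes the port a container listens on from the port published on the host. A small lookup helper makes the difference concrete; the service hostnames below (`analysis-api`, `a-mint-api`, and so on) are assumed Compose service names, and `localhost` is assumed as the host for external access.

```python
from typing import Dict, NamedTuple


class Ports(NamedTuple):
    internal: int
    external: int


# Port numbers come from the table above; the keys are hypothetical service names.
PORTS: Dict[str, Ports] = {
    "csp-service": Ports(8000, 8000),
    "a-mint-api": Ports(8000, 8001),
    "analysis-api": Ports(3000, 8002),
    "mcp-server": Ports(8085, 8085),
    "harvey-api": Ports(8086, 8086),
    "frontend": Ports(80, 80),
}


def base_url(service: str, inside_network: bool) -> str:
    """Build a base URL for a service from inside or outside the Compose network."""
    ports = PORTS[service]
    if inside_network:
        # Containers reach each other by service name and internal port.
        return f"http://{service}:{ports.internal}"
    # From the host, the published (external) port applies.
    return f"http://localhost:{ports.external}"


print(base_url("analysis-api", inside_network=True))
print(base_url("analysis-api", inside_network=False))
```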
Data Flow Patterns
Query Processing Flow
Pricing Extraction Flow
Scalability Considerations
Horizontal Scaling
- Stateless Services: Harvey, MCP, and Analysis APIs are stateless and can be replicated
- Load Balancing: Use nginx or cloud load balancers for traffic distribution
- Cache Layer: Redis can replace in-memory cache for shared state
Vertical Scaling
- CSP Service: Memory-intensive; can be scaled up for complex constraint problems
- Analysis API: CPU-intensive for large configuration spaces
Performance Optimizations
Caching Strategy
- Pricing YAML documents cached with TTL (default 3600s)
- CSP solver results cached by (yaml_hash, filters, solver)
- LLM responses not cached (dynamic context)
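A sketch of how such a cache could be keyed and expired follows. The `TTLCache` class and `solver_cache_key` function are hypothetical; using SHA-256 for the YAML hash and applying the 3600s TTL to solver results are assumptions, since the text only states the key shape and the YAML TTL.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


def solver_cache_key(yaml_text: str, filters: Dict[str, Any], solver: str) -> Tuple[str, str, str]:
    """Build a (yaml_hash, filters, solver) key; the hash function is an assumption."""
    yaml_hash = hashlib.sha256(yaml_text.encode("utf-8")).hexdigest()
    # sort_keys makes logically equal filter dicts produce the same key.
    return (yaml_hash, json.dumps(filters, sort_keys=True), solver)


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries: Dict[Any, Tuple[float, Any]] = {}

    def get(self, key: Any) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop expired entries lazily on read
            return None
        return value

    def set(self, key: Any, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


cache = TTLCache()
key = solver_cache_key("plans: {}", {"maxPrice": 50}, "choco")
cache.set(key, {"cost": 10})
print(cache.get(key))
```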
Asynchronous Processing
- Analysis API uses job queue for long-running CSP operations
- Clients poll /api/v1/pricing/analysis/{jobId} for results
- Prevents timeout issues for large configuration spaces
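On the client side, polling might look like the sketch below. The `status` field and its values (`PENDING`, `COMPLETED`, `FAILED`) are assumptions about the response body, not documented behavior, and `fetch` is an injected callable so that any HTTP client can be plugged in.

```python
import time
from typing import Any, Callable, Dict


def poll_job(
    job_id: str,
    fetch: Callable[[str], Dict[str, Any]],
    interval_seconds: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll the analysis job endpoint until it completes, fails, or attempts run out."""
    path = f"/api/v1/pricing/analysis/{job_id}"
    for attempt in range(max_attempts):
        body = fetch(path)
        status = body.get("status")  # assumed field name
        if status == "COMPLETED":
            return body
        if status == "FAILED":
            raise RuntimeError(f"job {job_id} failed")
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} attempts")


# Fake fetcher standing in for an HTTP GET against the Analysis API.
_responses = iter([
    {"status": "PENDING"},
    {"status": "COMPLETED", "result": {"cost": 42}},
])
requested_paths = []


def fake_fetch(path: str) -> Dict[str, Any]:
    requested_paths.append(path)
    return next(_responses)


job = poll_job("abc123", fake_fetch, sleep=lambda seconds: None)
print(job["result"])
```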
Connection Pooling
- HTTP clients use connection pooling (10 max connections)
- Database connections pooled (if persistent storage added)
Security Architecture
Authentication & Authorization
- API Keys: OpenAI API keys stored in environment variables
- No User Auth: Currently no user authentication (add OAuth2/JWT for production)
- CORS: Configured to allow frontend origin
Network Security
Input Validation
- Harvey: Validates action names against allowed list
- MCP: Validates tool parameters with TypedDict schemas
- Analysis API: Validates YAML structure and filter criteria
- CSP: Validates constraint model before solving
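The first two checks can be sketched as follows. A `TypedDict` is not enforced at runtime, so this sketch reads its annotations and checks them by hand; the action names in `ALLOWED_ACTIONS` and the `OptimalParams` schema are hypothetical, and only simple field types such as `str` are handled.

```python
from typing import Any, Dict, TypedDict, get_type_hints

# Hypothetical allow-list; the real action names are not given in this document.
ALLOWED_ACTIONS = {"optimal", "subscriptions", "summary"}


class OptimalParams(TypedDict):
    """Hypothetical parameter schema for one tool."""
    pricing_url: str
    objective: str


def validate_action(name: str) -> None:
    """Reject any action that is not on the allow-list."""
    if name not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {name}")


def validate_params(params: Dict[str, Any], schema: type) -> None:
    """Check required keys, unknown keys, and simple value types against a TypedDict."""
    hints = get_type_hints(schema)
    missing = set(schema.__required_keys__) - set(params)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    unknown = set(params) - set(hints)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    for key, value in params.items():
        if not isinstance(value, hints[key]):
            raise TypeError(f"{key} must be {hints[key].__name__}")


validate_action("optimal")
validate_params(
    {"pricing_url": "https://example.com/pricing", "objective": "minimize"},
    OptimalParams,
)
print("valid")
```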
Monitoring & Observability
Logging
Structured logging with consistent event names.
Health Checks
All services expose /health endpoints.
Tracing
Each request can be traced through the system:
- User query → Harvey generates request_id
- Harvey → MCP: passes request_id in context
- MCP → Analysis: includes in headers
- Analysis → CSP: includes in request body
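The hops above can be sketched as three small helpers that carry the same identifier in three different places. The UUID format, the `X-Request-ID` header name, and the `request_id` body field are assumptions chosen for illustration; the document does not specify them.

```python
import uuid
from typing import Any, Dict


def new_request_id() -> str:
    """Generate an identifier at the entry point (UUID4 is an assumption)."""
    return uuid.uuid4().hex


def mcp_context(request_id: str) -> Dict[str, str]:
    """Harvey to MCP: the id travels in the call context."""
    return {"request_id": request_id}


def analysis_headers(request_id: str) -> Dict[str, str]:
    """MCP to Analysis: the id travels in an HTTP header (header name assumed)."""
    return {"X-Request-ID": request_id}


def csp_body(request_id: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Analysis to CSP: the id travels in the request body next to the payload."""
    return {"request_id": request_id, **payload}


rid = new_request_id()
context = mcp_context(rid)
headers = analysis_headers(context["request_id"])
body = csp_body(headers["X-Request-ID"], {"solver": "choco"})
print(body["request_id"] == rid)
```

Because every hop forwards the identifier unchanged, log lines from all four services can be joined on it.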
Technology Stack
Languages
- Python 3.11+
- TypeScript/Node.js
- Java 17
Frameworks
- FastAPI (Python)
- Express.js (Node)
- Spring Boot (Java)
Infrastructure
- Docker
- Docker Compose
- nginx (optional)
AI/ML
- OpenAI GPT models
- MCP Protocol
- ReAct pattern
Solvers
- Choco Solver
- MiniZinc
Frontend
- React
- Vite
- TypeScript
Next Steps
Deep Dive: Services
Explore each microservice in detail
Data Flow
Understand request/response patterns