
Introduction

Pricing Intelligence is built as a distributed microservices architecture that combines AI-powered natural language processing with rigorous mathematical constraint solving. The system follows a layered approach where each service has a specific responsibility in the pricing analysis pipeline.

High-Level Architecture

The platform consists of six main components working in concert.

The architecture follows the ReAct pattern (Reasoning + Acting), in which the Harvey agent orchestrates complex workflows by reasoning about user queries and acting through MCP tools.
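The reason/act loop can be sketched as follows. This is a minimal illustration, not Harvey's implementation: the `reason` function below is a hard-coded stand-in for the LLM, and all tool and field names are hypothetical.

```python
# Minimal ReAct loop sketch: the agent alternates between reasoning
# (choosing a tool) and acting (invoking it) until it can answer.
def react_loop(question, tools, reason, max_steps=5):
    observations = []
    for _ in range(max_steps):
        thought = reason(question, observations)   # Reasoning step
        if thought["action"] == "final_answer":
            return thought["input"]
        tool = tools[thought["action"]]            # Acting step
        observations.append(tool(thought["input"]))
    raise RuntimeError("step budget exhausted")

# Stub tool and reasoner for demonstration only
tools = {"get_price": lambda plan: {"plan": plan, "price": 12.0}}

def reason(question, observations):
    if not observations:
        return {"action": "get_price", "input": "PRO"}
    return {"action": "final_answer",
            "input": f"PRO costs ${observations[0]['price']}/mo"}

answer = react_loop("How much is the PRO plan?", tools, reason)
```

In the real system the reasoning step is an LLM call and the tools are MCP tool invocations, but the control flow is the same.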

System Components

Harvey API

AI agent implementing the ReAct pattern for natural language pricing queries

MCP Server

Model Context Protocol server exposing standardized tools to the agent

Analysis API

Node.js service for pricing validation, optimization, and statistics

A-MINT API

Python service for extracting structured pricing from URLs

CSP Service

Java-based Choco constraint solver for mathematical optimization

Frontend

React/Vite chat interface for user interaction

Architectural Principles

1. Separation of Concerns

Each service has a single, well-defined responsibility:
  • Harvey: Natural language understanding and orchestration
  • MCP Server: Protocol translation and tool routing
  • Analysis API: Business logic for pricing analysis
  • A-MINT: Pricing data extraction and transformation
  • CSP Service: Mathematical constraint satisfaction
  • Frontend: User interface and visualization

2. Grounding Strategy

One of the key innovations is the grounding mechanism that reduces LLM hallucinations:
  1. Extract Structured Data: A-MINT converts unstructured pricing pages into the structured Pricing2Yaml format
  2. Verify Against Schema: Harvey validates feature names and limits against the actual YAML schema
  3. Delegate to Solvers: Mathematical operations are delegated to CSP solvers rather than LLM estimation
  4. Return Grounded Results: Answers include exact prices, plan names, and configurations from the data
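The schema-verification step can be sketched as a simple membership check: before answering, the agent confirms that every feature it is about to mention exists in the extracted data. The pricing dict below mirrors a simplified, hypothetical Pricing2Yaml shape.

```python
# Sketch of grounding step 2: flag feature names that are not present
# in the extracted pricing schema, before the LLM can reason about them.
def unknown_features(requested, pricing):
    known = set(pricing.get("features", {}))
    return [name for name in requested if name not in known]

# Hypothetical, simplified pricing structure
pricing = {
    "features": {
        "seats": {"type": "NUMERIC"},
        "sso": {"type": "BOOLEAN"},
    }
}

# "api_keys" does not exist in the schema, so it is caught here rather
# than hallucinated into an answer.
bad = unknown_features(["seats", "api_keys"], pricing)
```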

3. Protocol-First Design

The system uses Model Context Protocol (MCP) as the standard interface between the AI agent and backend tools. This provides:
  • Standardization: Consistent tool invocation across different services
  • Composability: Tools can be easily combined in workflows
  • Observability: All tool calls are logged and traceable
  • Extensibility: New tools can be added without changing the agent
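The extensibility property can be illustrated with a bare-bones tool registry, not the actual MCP SDK: tools register themselves by name, and the dispatch path the agent uses never changes when a new tool is added. All names below are illustrative.

```python
# Sketch of protocol-first extensibility: a registry decouples the
# agent's invocation path from the set of available tools.
TOOLS = {}

def tool(name):
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("extract_pricing")
def extract_pricing(url: str) -> dict:
    # Stand-in for the A-MINT extraction call
    return {"source": url, "plans": ["FREE", "PRO"]}

@tool("validate_pricing")
def validate_pricing(doc: dict) -> bool:
    # Stand-in for the Analysis API validation call
    return "plans" in doc

def invoke(name, *args):
    # Single uniform invocation path: logging and tracing hook in here,
    # which is what makes every tool call observable.
    return TOOLS[name](*args)

doc = invoke("extract_pricing", "https://example.com/pricing")
ok = invoke("validate_pricing", doc)
```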

4. Caching & Performance

Multiple caching layers optimize performance:
# MCP Server caches pricing YAML documents
cache_key = f"pricing:{url}"
if cached := await cache.get(cache_key):
    return cached

# Analysis API caches CSP solver results
# Harvey API maintains conversation context

Deployment Architecture

All services are containerized and orchestrated via Docker Compose:
services:
  choco-api:
    build: ./csp
    ports: ["8000:8000"]
    
  analysis-api:
    build: ./analysis_api
    ports: ["8002:3000"]
    environment:
      - CHOCO_API=http://choco-api:8000
    depends_on: [choco-api]
    
  a-mint-api:
    build: ./src
    ports: ["8001:8000"]
    environment:
      - ANALYSIS_API=http://analysis-api:3000/api/v1
    depends_on: [analysis-api]
    
  mcp-server:
    build: ./mcp_server
    ports: ["8085:8085"]
    environment:
      - AMINT_BASE_URL=http://a-mint-api:8000
      - ANALYSIS_BASE_URL=http://analysis-api:3000
    depends_on: [a-mint-api, analysis-api]
    
  harvey-api:
    build: ./harvey_api
    ports: ["8086:8086"]
    environment:
      - MCP_SERVER_URL=http://mcp-server:8085/sse
    depends_on: [mcp-server]
    
  mcp-frontend:
    build: ./frontend
    ports: ["80:80"]
    depends_on: [harvey-api]

Service Communication

Services communicate via:
  • HTTP/REST: Synchronous API calls between services
  • Server-Sent Events (SSE): Real-time streaming from Harvey to frontend
  • stdio: MCP Server launched as subprocess by Harvey
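The SSE stream from Harvey follows the standard `data:`-line wire format, where a blank line terminates each event. A minimal parser, with an illustrative payload shape, looks like this:

```python
# Sketch of parsing a Server-Sent Events stream such as the one Harvey
# uses to push tokens to the frontend. Field handling follows the SSE
# wire format; the JSON payload shape is an assumption.
def parse_sse(stream: str):
    events, data = [], []
    for line in stream.splitlines():
        if line.startswith("data:"):
            data.append(line[5:].strip())
        elif line == "" and data:      # Blank line terminates an event
            events.append("\n".join(data))
            data = []
    return events

raw = 'data: {"token": "The"}\n\ndata: {"token": "PRO plan"}\n\n'
events = parse_sse(raw)
```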

Port Mapping

| Service      | Internal Port | External Port | Protocol |
|--------------|---------------|---------------|----------|
| CSP Service  | 8000          | 8000          | HTTP     |
| A-MINT API   | 8000          | 8001          | HTTP     |
| Analysis API | 3000          | 8002          | HTTP     |
| MCP Server   | 8085          | 8085          | HTTP/SSE |
| Harvey API   | 8086          | 8086          | HTTP/SSE |
| Frontend     | 80            | 80            | HTTP     |

Data Flow Patterns

Query Processing Flow

A user query travels from the Frontend to Harvey, which reasons over the question and invokes MCP tools; the MCP Server routes calls to the Analysis API, which delegates constraint problems to the CSP Service before results stream back to the user over SSE.

Pricing Extraction Flow

Given a pricing page URL, A-MINT extracts a structured Pricing2Yaml document, which the Analysis API validates and the MCP Server caches for subsequent queries.

Scalability Considerations

Horizontal Scaling

  • Stateless Services: Harvey, MCP, and Analysis APIs are stateless and can be replicated
  • Load Balancing: Use nginx or cloud load balancers for traffic distribution
  • Cache Layer: Redis can replace in-memory cache for shared state

Vertical Scaling

  • CSP Service: Memory-intensive; can be scaled up for complex constraint problems
  • Analysis API: CPU-intensive for large configuration spaces

Performance Optimizations

  • Pricing YAML documents cached with TTL (default 3600s)
  • CSP solver results cached by (yaml_hash, filters, solver)
  • LLM responses not cached (dynamic context)
  • Analysis API uses job queue for long-running CSP operations
  • Clients poll /api/v1/pricing/analysis/{jobId} for results
  • Prevents timeout issues for large configuration spaces
  • HTTP clients use connection pooling (10 max connections)
  • Database connections pooled (if persistent storage added)
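The CSP result cache key above can be sketched by hashing the YAML content and canonicalising the filters, so that logically identical requests hit the same entry. The exact key layout below is an assumption.

```python
import hashlib
import json

# Sketch of a (yaml_hash, filters, solver) cache key. Sorting the
# filter keys makes the key order-insensitive.
def csp_cache_key(yaml_text: str, filters: dict, solver: str) -> str:
    yaml_hash = hashlib.sha256(yaml_text.encode()).hexdigest()[:16]
    canonical = json.dumps(filters, sort_keys=True)
    filters_hash = hashlib.sha256(canonical.encode()).hexdigest()[:16]
    return f"csp:{yaml_hash}:{filters_hash}:{solver}"

k1 = csp_cache_key("plans: ...", {"max_price": 50, "sso": True}, "choco")
k2 = csp_cache_key("plans: ...", {"sso": True, "max_price": 50}, "choco")
# Same filters in a different order produce the same key
```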

Security Architecture

Authentication & Authorization

  • API Keys: OpenAI API keys stored in environment variables
  • No User Auth: Currently no user authentication (add OAuth2/JWT for production)
  • CORS: Configured to allow frontend origin

Network Security

# Internal network isolation
networks:
  pricing_network:
    driver: bridge
    internal: true  # No external access
    
services:
  harvey-api:
    networks:
      - pricing_network
      - default  # Only Harvey exposed externally

Input Validation

  • Harvey: Validates action names against allowed list
  • MCP: Validates tool parameters with TypedDict schemas
  • Analysis API: Validates YAML structure and filter criteria
  • CSP: Validates constraint model before solving
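Validation against a TypedDict schema, in the spirit of the MCP server's parameter checks, can be sketched with the standard library alone. The schema and field names below are illustrative, not the server's actual tool signatures.

```python
from typing import TypedDict, get_type_hints

# Hypothetical parameter schema for an extraction tool
class ExtractParams(TypedDict):
    url: str
    timeout: int

def validate(params: dict, schema) -> list[str]:
    # Compare incoming parameters against the TypedDict's declared
    # fields and types, collecting human-readable errors.
    hints = get_type_hints(schema)
    errors = [f"missing: {k}" for k in hints if k not in params]
    errors += [
        f"wrong type: {k}" for k, t in hints.items()
        if k in params and not isinstance(params[k], t)
    ]
    return errors

# "5" is a string, not an int, so the call is rejected before it
# reaches the backend service.
errs = validate({"url": "https://example.com/pricing", "timeout": "5"},
                ExtractParams)
```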

Monitoring & Observability

Logging

Structured logging with consistent event names:
# harvey_api/src/harvey_api/logging.py
logger.info(
    "harvey.agent.plan_generated",
    question=question,
    actions=[a.name for a in actions],
    filters=filters
)

Health Checks

All services expose /health endpoints:
curl http://localhost:8086/health
# {"status": "UP"}
Docker Compose uses health checks for service dependencies:
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
  interval: 30s
  timeout: 10s
  retries: 3

Tracing

Each request can be traced through the system:
  1. User query → Harvey generates request_id
  2. Harvey → MCP: passes request_id in context
  3. MCP → Analysis: includes in headers
  4. Analysis → CSP: includes in request body
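Propagation of the trace id can be sketched as minting it once and forwarding it on every hop, so that logs across services can be joined on a single value. The header name below is an assumption.

```python
import uuid

# Sketch of request-id propagation: Harvey mints the id once, and each
# downstream hop forwards it in headers (or in the request body).
def new_request_id() -> str:
    return uuid.uuid4().hex

def with_trace(headers: dict, request_id: str) -> dict:
    return {**headers, "X-Request-ID": request_id}

rid = new_request_id()
mcp_headers = with_trace({"Content-Type": "application/json"}, rid)
analysis_headers = with_trace({}, rid)
# Every hop carries the same id, so log lines from Harvey, MCP,
# Analysis, and CSP can be correlated after the fact.
```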

Technology Stack

Languages

  • Python 3.11+
  • TypeScript/Node.js
  • Java 17

Frameworks

  • FastAPI (Python)
  • Express.js (Node)
  • Spring Boot (Java)

Infrastructure

  • Docker
  • Docker Compose
  • nginx (optional)

AI/ML

  • OpenAI GPT models
  • MCP Protocol
  • ReAct pattern

Solvers

  • Choco Solver
  • MiniZinc

Frontend

  • React
  • Vite
  • TypeScript

Next Steps

Deep Dive: Services

Explore each microservice in detail

Data Flow

Understand request/response patterns
