
Introduction

Pricing Intelligence is built as a distributed microservices architecture that combines AI-powered natural language processing with rigorous mathematical constraint solving. The system follows a layered approach where each service has a specific responsibility in the pricing analysis pipeline.

High-Level Architecture

The platform consists of six main components working in concert.

The architecture follows the ReAct pattern (Reasoning + Acting), in which the Harvey agent orchestrates complex workflows by reasoning about user queries and acting through MCP tools.
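The reason/act loop can be sketched as follows. This is a minimal illustration, not Harvey's implementation: the `reason` function below is a hard-coded stand-in for the LLM, and all tool and field names are hypothetical.

```python
# Minimal ReAct loop sketch: the agent alternates between reasoning
# (choosing a tool) and acting (invoking it) until it can answer.
def react_loop(question, tools, reason, max_steps=5):
    observations = []
    for _ in range(max_steps):
        thought = reason(question, observations)   # Reasoning step
        if thought["action"] == "final_answer":
            return thought["input"]
        tool = tools[thought["action"]]            # Acting step
        observations.append(tool(thought["input"]))
    raise RuntimeError("step budget exhausted")

# Stub tool and reasoner for demonstration only
tools = {"get_price": lambda plan: {"plan": plan, "price": 12.0}}

def reason(question, observations):
    if not observations:
        return {"action": "get_price", "input": "PRO"}
    return {"action": "final_answer",
            "input": f"PRO costs ${observations[0]['price']}/mo"}

answer = react_loop("How much is the PRO plan?", tools, reason)
```

In the real system the reasoning step is an LLM call and the tools are MCP tool invocations, but the control flow is the same.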

System Components

Harvey API

AI agent implementing the ReAct pattern for natural language pricing queries

MCP Server

Model Context Protocol server exposing standardized tools to the agent

Analysis API

Node.js service for pricing validation, optimization, and statistics

A-MINT API

Python service for extracting structured pricing from URLs

CSP Service

Java-based Choco constraint solver for mathematical optimization

Frontend

React/Vite chat interface for user interaction

Architectural Principles

1. Separation of Concerns

Each service has a single, well-defined responsibility:
  • Harvey: Natural language understanding and orchestration
  • MCP Server: Protocol translation and tool routing
  • Analysis API: Business logic for pricing analysis
  • A-MINT: Pricing data extraction and transformation
  • CSP Service: Mathematical constraint satisfaction
  • Frontend: User interface and visualization

2. Grounding Strategy

One of the key innovations is the grounding mechanism that reduces LLM hallucinations:
  1. Extract Structured Data: A-MINT converts unstructured pricing pages into the structured Pricing2Yaml format
  2. Verify Against Schema: Harvey validates feature names and limits against the actual YAML schema
  3. Delegate to Solvers: Mathematical operations are delegated to CSP solvers rather than LLM estimation
  4. Return Grounded Results: Answers include exact prices, plan names, and configurations from the data
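The schema-verification step can be sketched as a simple membership check: before answering, the agent confirms that every feature it is about to mention exists in the extracted data. The pricing dict below mirrors a simplified, hypothetical Pricing2Yaml shape.

```python
# Sketch of grounding step 2: flag feature names that are not present
# in the extracted pricing schema, before the LLM can reason about them.
def unknown_features(requested, pricing):
    known = set(pricing.get("features", {}))
    return [name for name in requested if name not in known]

# Hypothetical, simplified pricing structure
pricing = {
    "features": {
        "seats": {"type": "NUMERIC"},
        "sso": {"type": "BOOLEAN"},
    }
}

# "api_keys" does not exist in the schema, so it is caught here rather
# than hallucinated into an answer.
bad = unknown_features(["seats", "api_keys"], pricing)
```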

3. Protocol-First Design

The system uses Model Context Protocol (MCP) as the standard interface between the AI agent and backend tools. This provides:
  • Standardization: Consistent tool invocation across different services
  • Composability: Tools can be easily combined in workflows
  • Observability: All tool calls are logged and traceable
  • Extensibility: New tools can be added without changing the agent
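The extensibility property can be illustrated with a bare-bones tool registry, not the actual MCP SDK: tools register themselves by name, and the dispatch path the agent uses never changes when a new tool is added. All names below are illustrative.

```python
# Sketch of protocol-first extensibility: a registry decouples the
# agent's invocation path from the set of available tools.
TOOLS = {}

def tool(name):
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("extract_pricing")
def extract_pricing(url: str) -> dict:
    # Stand-in for the A-MINT extraction call
    return {"source": url, "plans": ["FREE", "PRO"]}

@tool("validate_pricing")
def validate_pricing(doc: dict) -> bool:
    # Stand-in for the Analysis API validation call
    return "plans" in doc

def invoke(name, *args):
    # Single uniform invocation path: logging and tracing hook in here,
    # which is what makes every tool call observable.
    return TOOLS[name](*args)

doc = invoke("extract_pricing", "https://example.com/pricing")
ok = invoke("validate_pricing", doc)
```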

4. Caching & Performance

Multiple caching layers optimize performance:
# MCP Server caches pricing YAML documents
cache_key = f"pricing:{url}"
if cached := await cache.get(cache_key):
    return cached

# Analysis API caches CSP solver results
# Harvey API maintains conversation context

Deployment Architecture

All services are containerized and orchestrated via Docker Compose:
services:
  choco-api:
    build: ./csp
    ports: ["8000:8000"]
    
  analysis-api:
    build: ./analysis_api
    ports: ["8002:3000"]
    environment:
      - CHOCO_API=http://choco-api:8000
    depends_on: [choco-api]
    
  a-mint-api:
    build: ./src
    ports: ["8001:8000"]
    environment:
      - ANALYSIS_API=http://analysis-api:3000/api/v1
    depends_on: [analysis-api]
    
  mcp-server:
    build: ./mcp_server
    ports: ["8085:8085"]
    environment:
      - AMINT_BASE_URL=http://a-mint-api:8000
      - ANALYSIS_BASE_URL=http://analysis-api:3000
    depends_on: [a-mint-api, analysis-api]
    
  harvey-api:
    build: ./harvey_api
    ports: ["8086:8086"]
    environment:
      - MCP_SERVER_URL=http://mcp-server:8085/sse
    depends_on: [mcp-server]
    
  mcp-frontend:
    build: ./frontend
    ports: ["80:80"]
    depends_on: [harvey-api]

Service Communication

Services communicate via:
  • HTTP/REST: Synchronous API calls between services
  • Server-Sent Events (SSE): Real-time streaming from Harvey to frontend
  • stdio: MCP Server launched as subprocess by Harvey
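The SSE stream from Harvey follows the standard `data:`-line wire format, where a blank line terminates each event. A minimal parser, with an illustrative payload shape, looks like this:

```python
# Sketch of parsing a Server-Sent Events stream such as the one Harvey
# uses to push tokens to the frontend. Field handling follows the SSE
# wire format; the JSON payload shape is an assumption.
def parse_sse(stream: str):
    events, data = [], []
    for line in stream.splitlines():
        if line.startswith("data:"):
            data.append(line[5:].strip())
        elif line == "" and data:      # Blank line terminates an event
            events.append("\n".join(data))
            data = []
    return events

raw = 'data: {"token": "The"}\n\ndata: {"token": "PRO plan"}\n\n'
events = parse_sse(raw)
```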

Port Mapping

| Service      | Internal Port | External Port | Protocol |
|--------------|---------------|---------------|----------|
| CSP Service  | 8000          | 8000          | HTTP     |
| A-MINT API   | 8000          | 8001          | HTTP     |
| Analysis API | 3000          | 8002          | HTTP     |
| MCP Server   | 8085          | 8085          | HTTP/SSE |
| Harvey API   | 8086          | 8086          | HTTP/SSE |
| Frontend     | 80            | 80            | HTTP     |

Data Flow Patterns

Query Processing Flow

A user query travels from the Frontend to Harvey, which reasons over the question and invokes MCP tools; the MCP Server routes calls to the Analysis API, which delegates constraint problems to the CSP Service before results stream back to the user over SSE.

Pricing Extraction Flow

Given a pricing page URL, A-MINT extracts a structured Pricing2Yaml document, which the Analysis API validates and the MCP Server caches for subsequent queries.

Scalability Considerations

Horizontal Scaling

  • Stateless Services: Harvey, MCP, and Analysis APIs are stateless and can be replicated
  • Load Balancing: Use nginx or cloud load balancers for traffic distribution
  • Cache Layer: Redis can replace in-memory cache for shared state

Vertical Scaling

  • CSP Service: Memory-intensive; can be scaled up for complex constraint problems
  • Analysis API: CPU-intensive for large configuration spaces

Performance Optimizations

  • Pricing YAML documents cached with TTL (default 3600s)
  • CSP solver results cached by (yaml_hash, filters, solver)
  • LLM responses not cached (dynamic context)
  • Analysis API uses job queue for long-running CSP operations
  • Clients poll /api/v1/pricing/analysis/{jobId} for results
  • Prevents timeout issues for large configuration spaces
  • HTTP clients use connection pooling (10 max connections)
  • Database connections pooled (if persistent storage added)
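The CSP result cache key above can be sketched by hashing the YAML content and canonicalising the filters, so that logically identical requests hit the same entry. The exact key layout below is an assumption.

```python
import hashlib
import json

# Sketch of a (yaml_hash, filters, solver) cache key. Sorting the
# filter keys makes the key order-insensitive.
def csp_cache_key(yaml_text: str, filters: dict, solver: str) -> str:
    yaml_hash = hashlib.sha256(yaml_text.encode()).hexdigest()[:16]
    canonical = json.dumps(filters, sort_keys=True)
    filters_hash = hashlib.sha256(canonical.encode()).hexdigest()[:16]
    return f"csp:{yaml_hash}:{filters_hash}:{solver}"

k1 = csp_cache_key("plans: ...", {"max_price": 50, "sso": True}, "choco")
k2 = csp_cache_key("plans: ...", {"sso": True, "max_price": 50}, "choco")
# Same filters in a different order produce the same key
```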

Security Architecture

Authentication & Authorization

  • API Keys: OpenAI API keys stored in environment variables
  • No User Auth: Currently no user authentication (add OAuth2/JWT for production)
  • CORS: Configured to allow frontend origin

Network Security

# Internal network isolation
networks:
  pricing_network:
    driver: bridge
    internal: true  # No external access
    
services:
  harvey-api:
    networks:
      - pricing_network
      - default  # Only Harvey exposed externally

Input Validation

  • Harvey: Validates action names against allowed list
  • MCP: Validates tool parameters with TypedDict schemas
  • Analysis API: Validates YAML structure and filter criteria
  • CSP: Validates constraint model before solving
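Validation against a TypedDict schema, in the spirit of the MCP server's parameter checks, can be sketched with the standard library alone. The schema and field names below are illustrative, not the server's actual tool signatures.

```python
from typing import TypedDict, get_type_hints

# Hypothetical parameter schema for an extraction tool
class ExtractParams(TypedDict):
    url: str
    timeout: int

def validate(params: dict, schema) -> list[str]:
    # Compare incoming parameters against the TypedDict's declared
    # fields and types, collecting human-readable errors.
    hints = get_type_hints(schema)
    errors = [f"missing: {k}" for k in hints if k not in params]
    errors += [
        f"wrong type: {k}" for k, t in hints.items()
        if k in params and not isinstance(params[k], t)
    ]
    return errors

# "5" is a string, not an int, so the call is rejected before it
# reaches the backend service.
errs = validate({"url": "https://example.com/pricing", "timeout": "5"},
                ExtractParams)
```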

Monitoring & Observability

Logging

Structured logging with consistent event names:
# harvey_api/src/harvey_api/logging.py
logger.info(
    "harvey.agent.plan_generated",
    question=question,
    actions=[a.name for a in actions],
    filters=filters
)

Health Checks

All services expose /health endpoints:
curl http://localhost:8086/health
# {"status": "UP"}
Docker Compose uses health checks for service dependencies:
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
  interval: 30s
  timeout: 10s
  retries: 3

Tracing

Each request can be traced through the system:
  1. User query → Harvey generates request_id
  2. Harvey → MCP: passes request_id in context
  3. MCP → Analysis: includes in headers
  4. Analysis → CSP: includes in request body
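Propagation of the trace id can be sketched as minting it once and forwarding it on every hop, so that logs across services can be joined on a single value. The header name below is an assumption.

```python
import uuid

# Sketch of request-id propagation: Harvey mints the id once, and each
# downstream hop forwards it in headers (or in the request body).
def new_request_id() -> str:
    return uuid.uuid4().hex

def with_trace(headers: dict, request_id: str) -> dict:
    return {**headers, "X-Request-ID": request_id}

rid = new_request_id()
mcp_headers = with_trace({"Content-Type": "application/json"}, rid)
analysis_headers = with_trace({}, rid)
# Every hop carries the same id, so log lines from Harvey, MCP,
# Analysis, and CSP can be correlated after the fact.
```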

Technology Stack

Languages

  • Python 3.11+
  • TypeScript/Node.js
  • Java 17

Frameworks

  • FastAPI (Python)
  • Express.js (Node)
  • Spring Boot (Java)

Infrastructure

  • Docker
  • Docker Compose
  • nginx (optional)

AI/ML

  • OpenAI GPT models
  • MCP Protocol
  • ReAct pattern

Solvers

  • Choco Solver
  • MiniZinc

Frontend

  • React
  • Vite
  • TypeScript

Next Steps

Deep Dive: Services

Explore each microservice in detail

Data Flow

Understand request/response patterns
