Pricing Extraction

Overview

The iPricing tool (part of the A-MINT service) transforms unstructured pricing pages into the machine-readable Pricing2Yaml format. This lets H.A.R.V.E.Y. analyze, validate, and optimize any SaaS pricing model from structured data rather than raw page text.

How iPricing Works

The extraction pipeline follows these steps:

URL Input

Provide a SaaS pricing page URL (e.g., https://buffer.com/pricing)

LLM-Powered Extraction

The A-MINT service uses OpenAI models to analyze the webpage content and extract:

Plan names and prices
Features and their types (e.g., DOMAIN, INTEGRATION, SUPPORT)
Usage limits and their value types (NUMERIC, BOOLEAN)
Plan relationships and constraints

Schema Generation

The extracted data is structured according to the Pricing2Yaml specification

Validation

The generated YAML is validated against the schema and can be checked for mathematical consistency

Caching

Results are cached to avoid re-extracting the same URL
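
The ordering of these steps can be sketched as a small function that chains the stages. This is only an illustration: run_pipeline, require_plans, and the stub stages are hypothetical names, not A-MINT APIs, and caching (the last step) is left out.

```python
from typing import Callable, Dict

# All stage functions are hypothetical placeholders, not A-MINT APIs.
Fetch = Callable[[str], str]        # URL -> raw page text
Extract = Callable[[str], Dict]     # page text -> structured pricing data
Validate = Callable[[Dict], None]   # raises ValueError on an invalid model


def run_pipeline(url: str, fetch: Fetch, extract: Extract, validate: Validate) -> Dict:
    """Run the stages in order; caching is intentionally omitted."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"not a full URL: {url!r}")
    page = fetch(url)        # URL input
    model = extract(page)    # LLM-powered extraction and schema generation
    validate(model)          # validation
    return model


def require_plans(model: Dict) -> None:
    """Toy validator: a pricing model needs at least one plan."""
    if not model.get("plans"):
        raise ValueError("no plans extracted")


# Usage with stub stages standing in for the real service
model = run_pipeline(
    "https://buffer.com/pricing",
    fetch=lambda url: "<html>stub page</html>",
    extract=lambda page: {"saasName": "Buffer", "plans": {"FREE": {"price": 0}}},
    validate=require_plans,
)
print(model["saasName"])
```

Passing the stages in as callables keeps the sketch independent of how the real service fetches pages or calls the LLM.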

Using iPricing via H.A.R.V.E.Y.

Automatic URL Detection

The easiest way to use iPricing is to include a URL directly in your question:

What is the cheapest plan for Buffer (https://buffer.com/pricing) that includes 10 channels?

H.A.R.V.E.Y. automatically:

Detects the URL in your question
Calls the iPricing tool to extract the pricing model
Adds the extracted YAML to the conversation context
Proceeds to answer your question using the structured data
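
The detection step can be approximated with a regular expression. The pattern and the detect_urls helper below are assumptions for illustration; the matching rules H.A.R.V.E.Y. actually uses are not documented here.

```python
import re
from typing import List

# Matches http(s) URLs up to whitespace, a bracket, or a quote character.
URL_PATTERN = re.compile(r"https?://[^\s()<>\"']+")


def detect_urls(question: str) -> List[str]:
    """Return URLs found in a question, without trailing punctuation."""
    return [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(question)]


question = (
    "What is the cheapest plan for Buffer (https://buffer.com/pricing) "
    "that includes 10 channels?"
)
print(detect_urls(question))  # ['https://buffer.com/pricing']
```

Excluding parentheses from the match matters for questions like the one above, where the URL is wrapped in brackets.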

Manual URL Addition

You can also add URLs manually through the control panel:

Click 'Add URL' in the context management section.

Paste the pricing page URL, entering the full URL including https://.

Wait for transformation. The system displays a “pending” status while extracting; once complete, it shows “done” with a ✓.

Query the model by asking questions about the pricing model.

Tool Invocation (MCP)

If you’re using H.A.R.V.E.Y. as an MCP client or calling the MCP server directly, the iPricing tool accepts the following parameters:

pricing_url (string, required): The URL of the SaaS pricing page to extract
pricing_yaml (string, optional): Existing YAML content to use instead of fetching from the URL
refresh (boolean, default: false): Force re-extraction even if the URL is cached
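
Before calling the tool, it can help to check the URL on the client side. The following sketch builds the argument dictionary for a URL-based call; build_ipricing_args is a hypothetical helper, not part of the MCP server.

```python
from typing import Any, Dict
from urllib.parse import urlparse


def build_ipricing_args(pricing_url: str, refresh: bool = False) -> Dict[str, Any]:
    """Build the argument dict for an iPricing call, rejecting partial URLs."""
    parsed = urlparse(pricing_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected a full URL including https://, got {pricing_url!r}")
    return {"pricing_url": pricing_url, "refresh": refresh}


print(build_ipricing_args("https://buffer.com/pricing"))
```

A URL such as buffer.com/pricing (no scheme) raises ValueError here instead of failing later inside the extraction service.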

Response Format

The tool returns a JSON object with the following shape:

{
  "request": {
    "url": "https://buffer.com/pricing",
    "refresh": false
  },
  "pricing_yaml": "saasName: Buffer\nsyntaxVersion: '2.1'\n...",
  "source": "amint"
}

request (object): Echo of the input parameters
pricing_yaml (string): The extracted Pricing2Yaml document as a YAML string
source (string): Indicates whether the YAML came from amint (extraction) or upload (user-provided)
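
A client might unpack this response as follows. The sketch assumes the response is available as a JSON string; parse_ipricing_response is a hypothetical helper, and the sample payload uses a shortened YAML string for illustration.

```python
import json
from typing import Tuple

KNOWN_SOURCES = {"amint", "upload"}


def parse_ipricing_response(raw: str) -> Tuple[str, str]:
    """Return (pricing_yaml, source) from the tool's JSON response text."""
    payload = json.loads(raw)
    source = payload["source"]
    if source not in KNOWN_SOURCES:
        raise ValueError(f"unexpected source: {source!r}")
    return payload["pricing_yaml"], source


# Sample payload shaped like the documented response (YAML shortened)
raw = json.dumps({
    "request": {"url": "https://buffer.com/pricing", "refresh": False},
    "pricing_yaml": "saasName: Buffer\nsyntaxVersion: '2.1'\n",
    "source": "amint",
})
pricing_yaml, source = parse_ipricing_response(raw)
print(source, pricing_yaml.splitlines()[0])
```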

Pricing2Yaml Format

The extracted YAML follows the Pricing2Yaml specification. Here’s an example structure:

overleaf-extract.yaml

saasName: Overleaf - Individual
syntaxVersion: '2.1'
version: '2023-11-27'
createdAt: '2023-11-27'
currency: USD
variables: {}

features:
  githubIntegration:
    description: Link your Overleaf projects directly to a GitHub repository
    valueType: BOOLEAN
    defaultValue: false
    type: INTEGRATION
    integrationType: WEB_SAAS
  fullDocumentHistory:
    description: See all edits and who made every change
    valueType: BOOLEAN
    defaultValue: false
    type: INFORMATION
  prioritySupport:
    description: Priority and escalated support requests
    valueType: BOOLEAN
    defaultValue: false
    type: SUPPORT

usageLimits:
  maxCollaboratorsPerProject:
    description: Number of people you can invite per project
    valueType: NUMERIC
    defaultValue: 1
    unit: collaborator
    type: NON_RENEWABLE
  compileTimeoutLimit:
    description: Time allowed to compile your project
    valueType: NUMERIC
    defaultValue: 1
    unit: minute
    type: TIME_DRIVEN

plans:
  FREE:
    description: ''
    monthlyPrice: 0
    annualPrice: 0
    unit: /month
    features: null
    usageLimits: null
    price: 0
  
  STANDARD:
    description: ''
    monthlyPrice: 21
    annualPrice: 16.59
    unit: /month
    features:
      githubIntegration:
        value: true
      fullDocumentHistory:
        value: true
      prioritySupport:
        value: true
    usageLimits:
      maxCollaboratorsPerProject:
        value: 11
      compileTimeoutLimit:
        value: 4
    price: 21
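
As a worked example of how the two price fields relate, the STANDARD plan above costs 21 per month when billed monthly and 16.59 per month when billed annually. The arithmetic below is a plain calculation on those two numbers, not output from any H.A.R.V.E.Y. tool.

```python
# Prices copied from the STANDARD plan in the example
monthly_price = 21.0    # monthlyPrice: per month, billed monthly
annual_price = 16.59    # annualPrice: per month, billed annually

yearly_cost_monthly_billing = monthly_price * 12
yearly_cost_annual_billing = annual_price * 12

# Round to cents / one decimal to hide floating-point noise
yearly_saving = round(yearly_cost_monthly_billing - yearly_cost_annual_billing, 2)
discount_pct = round((monthly_price - annual_price) / monthly_price * 100, 1)

print(yearly_saving, discount_pct)  # 52.92 21.0
```

Annual billing therefore saves 52.92 USD per year on this plan, a 21% discount.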

Schema Components

Metadata Fields

saasName: Name of the SaaS product
syntaxVersion: Pricing2Yaml spec version (currently 2.1)
version: Version identifier for this pricing model
createdAt: Extraction timestamp
currency: Price currency (USD, EUR, etc.)

Features

Each feature has:

description: Human-readable explanation
valueType: BOOLEAN, NUMERIC, TEXT
defaultValue: Base value (usually for free plan)
type: DOMAIN, INTEGRATION, SUPPORT, INFORMATION, MANAGEMENT, GUARANTEE, AUTOMATION, PAYMENT
integrationType (optional): API, WEB_SAAS, IDENTITY_PROVIDER, EXTENSION

Usage Limits

Quantifiable constraints:

valueType: NUMERIC or BOOLEAN
defaultValue: Base limit
unit: Measurement unit (users, requests, GB, etc.)
type: RENEWABLE, NON_RENEWABLE, TIME_DRIVEN, RESPONSE_DRIVEN
linkedFeatures: Features that depend on this limit

Plans

Each plan specifies:

monthlyPrice: Price per month
annualPrice: Price per month if billed annually
unit: Pricing unit (usually /month)
features: Override feature values for this plan
usageLimits: Override limits for this plan
price: Effective price (usually equals monthlyPrice)
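
Because plans only list overrides, the effective value of a feature or limit is its defaultValue unless the plan sets a value. The sketch below applies that rule to a trimmed-down copy of the Overleaf example, represented as the dictionary a YAML parser would produce; resolve is a hypothetical helper.

```python
from typing import Any, Dict

# Trimmed-down model mirroring the Overleaf example, as a parsed YAML mapping
pricing: Dict[str, Any] = {
    "features": {
        "githubIntegration": {"defaultValue": False},
        "prioritySupport": {"defaultValue": False},
    },
    "usageLimits": {"maxCollaboratorsPerProject": {"defaultValue": 1}},
    "plans": {
        "FREE": {"features": None, "usageLimits": None},
        "STANDARD": {
            "features": {
                "githubIntegration": {"value": True},
                "prioritySupport": {"value": True},
            },
            "usageLimits": {"maxCollaboratorsPerProject": {"value": 11}},
        },
    },
}


def resolve(pricing: Dict[str, Any], plan_name: str, section: str) -> Dict[str, Any]:
    """Merge a section's defaults with the plan's overrides (null means none)."""
    effective = {name: spec["defaultValue"] for name, spec in pricing[section].items()}
    overrides = pricing["plans"][plan_name].get(section) or {}
    for name, override in overrides.items():
        effective[name] = override["value"]
    return effective


print(resolve(pricing, "FREE", "features"))
print(resolve(pricing, "STANDARD", "usageLimits"))
```

The FREE plan, whose features and usageLimits are null, resolves entirely to the defaults.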

Extraction Best Practices

Use Official Pricing Pages

Extract from the official SaaS vendor pricing page, not third-party comparison sites.

Check for Completeness

After extraction, review the YAML to ensure all plans and features were captured correctly.

Refresh Stale Data

Use refresh: true to force re-extraction if pricing has changed since the last fetch.

Validate After Extraction

Use the validate tool to check the extracted model for mathematical consistency.
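
Part of the completeness review can be automated. The sketch below flags plan overrides that reference a feature or usage limit missing from the top-level declarations. It is a reviewing aid written for illustration (find_gaps is a hypothetical name) and does not replace the validate tool.

```python
from typing import Any, Dict, List


def find_gaps(pricing: Dict[str, Any]) -> List[str]:
    """List plan overrides that reference undeclared features or usage limits."""
    problems: List[str] = []
    for plan_name, plan in (pricing.get("plans") or {}).items():
        for section in ("features", "usageLimits"):
            declared = pricing.get(section) or {}
            for name in (plan.get(section) or {}):
                if name not in declared:
                    problems.append(f"{plan_name}: {section}.{name} is not declared")
    return problems


# Parsed model with one deliberate gap: compileTimeoutLimit is never declared
pricing = {
    "features": {"githubIntegration": {"defaultValue": False}},
    "usageLimits": {},
    "plans": {
        "STANDARD": {
            "features": {"githubIntegration": {"value": True}},
            "usageLimits": {"compileTimeoutLimit": {"value": 4}},
        }
    },
}
print(find_gaps(pricing))
```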

Caching Behavior

The MCP server caches extracted pricing models by URL:

Default TTL: Configured via CACHE_BACKEND and cache settings
Cache Key: pricing:<url>
Bypass Cache: Set refresh: true in the tool call

Caching reduces latency and API costs by avoiding redundant extractions. The cache is shared across all H.A.R.V.E.Y. sessions.
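
The interaction between the cache key, the TTL, and refresh can be sketched with an in-memory stand-in. PricingCache and the one-hour TTL are assumptions for illustration; the real backend and expiry are whatever CACHE_BACKEND and the cache settings configure.

```python
import time
from typing import Callable, Dict, List, Tuple


class PricingCache:
    """In-memory stand-in for the server cache; the TTL value is an assumption."""

    def __init__(self, extract: Callable[[str], str], ttl_seconds: float = 3600.0):
        self._extract = extract
        self._ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def key(url: str) -> str:
        return f"pricing:{url}"  # documented key format: pricing:<url>

    def get(self, url: str, refresh: bool = False) -> str:
        entry = self._store.get(self.key(url))
        if entry and not refresh and time.monotonic() - entry[0] < self._ttl:
            return entry[1]  # fresh cache hit
        yaml_text = self._extract(url)  # miss, expired entry, or refresh=True
        self._store[self.key(url)] = (time.monotonic(), yaml_text)
        return yaml_text


calls: List[str] = []


def fake_extract(url: str) -> str:
    """Stub extractor that records how often it is invoked."""
    calls.append(url)
    return "saasName: Buffer\n"


cache = PricingCache(fake_extract)
cache.get("https://buffer.com/pricing")                # extracts
cache.get("https://buffer.com/pricing")                # served from cache
cache.get("https://buffer.com/pricing", refresh=True)  # forced re-extraction
print(len(calls))  # 2
```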

Configuration

The A-MINT service requires an OpenAI API key:

docker-compose.yml

a-mint-api:
  environment:
    - OPENAI_API_KEY=${AMINT_API_KEY}
    - OPENAI_API_KEYS=${AMINT_API_KEYS}

Set the key in your .env file:

.env

AMINT_API_KEY=sk-...

Without a valid OpenAI API key, the iPricing tool will fail. Ensure your key has sufficient quota and permissions.
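
A quick preflight check can catch a missing key before the first extraction. The sketch below only inspects the environment variable; it cannot tell whether the key is valid or has quota, and the 'sk-' prefix test is a heuristic based on the example above. missing_key_reason is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def missing_key_reason(env: Mapping[str, str]) -> Optional[str]:
    """Return a problem with OPENAI_API_KEY, or None if it looks set."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "OPENAI_API_KEY is empty or unset"
    if not key.startswith("sk-"):
        return "OPENAI_API_KEY does not start with 'sk-'"
    return None


# Result depends on the current environment
print(missing_key_reason(os.environ))
```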

Troubleshooting Extraction

Incomplete Feature Extraction

Problem: Not all features or plans are extracted

Solutions:

Verify the pricing page has clear, structured pricing tables
Check if the page uses JavaScript rendering (may require Sphere integration)
Review and edit the YAML manually if the page has a complex layout

Type Misclassification

Problem: Features are classified with incorrect types

Solutions:

Review the generated YAML and manually correct types
Update descriptions to provide clearer context for the LLM
Use the Pricing2Yaml specification as a guide for correct classifications

Pricing Discrepancies

Problem: Extracted prices don’t match the website

Solutions:

Check if the page shows region-specific pricing
Verify the currency in the YAML matches the displayed currency
Use refresh: true to re-extract in case of cached stale data

Network or API Errors

Problem: Extraction fails with timeout or API errors

Solutions:

Check A-MINT service logs: docker-compose logs a-mint-api
Verify OpenAI API key is valid and has available quota
Ensure the target URL is publicly accessible

Programmatic Usage

If you’re calling the MCP server programmatically, here’s an example tool call:

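import asyncio
import json

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumption: the MCP server runs over stdio; this launch command is a placeholder.
server_params = StdioServerParameters(command="python", args=["path/to/mcp_server.py"])

async def main() -> str:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()  # MCP handshake must precede tool calls
            result = await session.call_tool(
                "iPricing",
                {"pricing_url": "https://buffer.com/pricing", "refresh": False},
            )
            # Assumption: the JSON response arrives as the first text content item
            return json.loads(result.content[0].text)["pricing_yaml"]

pricing_yaml = asyncio.run(main())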

Next Steps

Optimization Queries

Pricing Models
