
POST /api/v1/transform

The transformation endpoint extracts pricing information from a SaaS website URL and converts it into the Pricing2YAML format. This is an asynchronous operation that returns a task ID for status polling.
Extraction can take several minutes depending on the complexity of the pricing page and the number of validation iterations required.

Endpoint Details

URL: http://localhost:8001/api/v1/transform
Method: POST
Content-Type: application/json

Request Body

url
string
required
The full URL of the SaaS pricing page to extract data from. Example: https://slack.com/pricing
model
string
default:"gpt-5.2"
The OpenAI model to use for extraction. Common options:
  • gpt-5.2 (default)
  • gpt-4o
  • gpt-3.5-turbo
temperature
float
default:"0.7"
Controls randomness in model responses. Range: 0.0 to 1.0
  • Lower values (0.0-0.3): More deterministic and focused
  • Higher values (0.7-1.0): More creative but less consistent
max_tries
integer
default:"50"
Maximum number of validation and fixing iterations. The system will attempt to validate and correct the extracted YAML up to this many times.
base_url
string
default:"https://api.openai.com/v1"
Custom endpoint URL for OpenAI-compatible APIs. Useful for:
  • Self-hosted LLM endpoints
  • Azure OpenAI Service
  • Other OpenAI-compatible providers
better_model
string
default:"gpt-5.2"
Model to use for higher-quality refinement passes.
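The parameters above can be assembled into a request body on the client before submitting. The sketch below is a hypothetical helper (build_transform_payload is not part of A-MINT): it fills in the documented defaults and rejects values outside the documented ranges, though the server may apply its own, different validation.

```python
from typing import Any

# Defaults as documented for POST /api/v1/transform.
DEFAULTS: dict[str, Any] = {
    "model": "gpt-5.2",
    "temperature": 0.7,
    "max_tries": 50,
    "base_url": "https://api.openai.com/v1",
    "better_model": "gpt-5.2",
}


def build_transform_payload(url: str, **overrides: Any) -> dict[str, Any]:
    """Return a request body for /api/v1/transform, checked client-side.

    Hypothetical helper: the server may validate differently.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be a full http(s) URL")
    payload: dict[str, Any] = {"url": url, **DEFAULTS, **overrides}
    if not 0.0 <= payload["temperature"] <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if not isinstance(payload["max_tries"], int) or payload["max_tries"] < 1:
        raise ValueError("max_tries must be a positive integer")
    return payload
```

For example, build_transform_payload("https://buffer.com/pricing", temperature=0.2) yields a body with temperature 0.2 and every other field at its default.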

Response

The endpoint immediately returns a response with a task ID:
task_id
string
required
Unique identifier for the transformation task. Use this to poll for status and retrieve the result.
status
string
required
Initial status of the task. Will be "pending" when first created.
message
string
Human-readable message about the task status.

Example Request

curl -X POST http://localhost:8001/api/v1/transform \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://buffer.com/pricing",
    "model": "gpt-5.2",
    "temperature": 0.7,
    "max_tries": 50
  }'
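For scripts, the same request can be sent from Python with only the standard library. This is a sketch assuming the service runs at the documented localhost:8001 address; build_request and submit are illustrative names, and submit accepts an injectable opener so it can be exercised without a running server.

```python
import json
import urllib.request
from typing import Any, Callable

TRANSFORM_URL = "http://localhost:8001/api/v1/transform"


def build_request(payload: dict[str, Any]) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    return urllib.request.Request(
        TRANSFORM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(payload: dict[str, Any],
           opener: Callable = urllib.request.urlopen) -> str:
    """Send the request and return the task_id from the JSON response."""
    with opener(build_request(payload)) as resp:
        body = json.load(resp)
    return body["task_id"]
```

Usage: task_id = submit({"url": "https://buffer.com/pricing", "model": "gpt-5.2", "temperature": 0.7, "max_tries": 50}).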

Example Response

{
  "task_id": "a3f7e8c9-4b2d-4f1e-8a9c-7d3e5f6a8b2c",
  "status": "pending",
  "message": "Transformation started"
}

Checking Task Status

Use the returned task_id to poll for completion:
Endpoint: GET /api/v1/transform/status/{task_id}

Status Response

status
string
required
Current status of the task:
  • "pending" - Task is queued or in progress
  • "completed" - Task finished successfully
  • "error" - Task failed
result_file
string
Path to the generated YAML file (only present when status is "completed")
error
string
Error message if the task failed (only present when status is "error")

Polling Example

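import time

import requests

task_id = "a3f7e8c9-4b2d-4f1e-8a9c-7d3e5f6a8b2c"
status_url = f"http://localhost:8001/api/v1/transform/status/{task_id}"

while True:
    response = requests.get(status_url, timeout=30)

    # A completed task returns the YAML file itself (Content-Type:
    # application/x-yaml); otherwise the body is a JSON status object.
    # Checking the content type avoids mistaking a JSON "pending"
    # response for the finished file.
    content_type = response.headers.get("Content-Type", "")
    if response.status_code == 200 and content_type.startswith("application/x-yaml"):
        with open("pricing.yaml", "wb") as f:
            f.write(response.content)
        print("Pricing data saved to pricing.yaml")
        break

    data = response.json()

    if data["status"] == "error":
        print(f"Error: {data.get('error')}")
        break

    print(f"Status: {data['status']}")
    time.sleep(5)  # Poll every 5 seconds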

Extraction Process

The A-MINT service performs the following steps during transformation:
1. Fetch HTML Content

Uses Selenium WebDriver to load the pricing page and extract the rendered HTML (from src/amint/extractors/web_driver.py).
2. Convert to Markdown

Transforms the HTML into structured Markdown using LLM-powered conversion. Normalizes table separators and removes excessive formatting.
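The table-separator normalization can be pictured as a small post-processing pass over the Markdown. The sketch below is a hypothetical illustration, not A-MINT's code: it rewrites every separator row (for example |:---|-----:|) as | --- | --- |, drops alignment colons, and leaves horizontal rules alone by requiring a pipe in the line.

```python
import re

# Hypothetical pattern for a Markdown table separator row such as
# "|:---|-----:|" (dashes, optional alignment colons, pipes, spaces).
SEPARATOR_ROW = re.compile(r"^\s*\|?\s*:?-+:?\s*(?:\|\s*:?-+:?\s*)*\|?\s*$")


def normalize_table_separators(markdown: str) -> str:
    """Rewrite each table separator row as '| --- | --- | ...'.

    Alignment colons are dropped. A line must contain a pipe to count as a
    separator, so horizontal rules like '---' are left untouched.
    """
    out = []
    for line in markdown.splitlines():
        if "|" in line and SEPARATOR_ROW.match(line):
            n_cells = len(line.strip().strip("|").split("|"))
            line = "| " + " | ".join(["---"] * n_cells) + " |"
        out.append(line)
    return "\n".join(out)
```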
3. Extract Plans

Identifies pricing tiers, costs, and billing cycles. Extracts configuration such as currency and billing period.
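A-MINT performs this step with an LLM. Purely to illustrate what the step extracts, the regex heuristic below pulls currency, amount, and billing period out of strings such as "$8/month". The symbol table, the Price type, and the supported period spellings are assumptions made for this sketch.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical symbol-to-ISO-4217 table; extend as needed.
CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}

# Matches e.g. "$8/month", "€12.50 per month", "$1,200/yr".
PRICE = re.compile(
    r"(?P<symbol>[$€£])\s*(?P<amount>\d[\d,]*(?:\.\d{1,2})?)"
    r"\s*(?:/|per)\s*(?P<period>month|mo|year|yr)\b",
    re.IGNORECASE,
)


class Price(NamedTuple):
    currency: str  # ISO 4217 code
    amount: float
    period: str    # "month" or "year"


def parse_price(text: str) -> Optional[Price]:
    """Return the first recurring price found in text, or None."""
    m = PRICE.search(text)
    if m is None:
        return None
    amount = float(m["amount"].replace(",", ""))  # strip thousands separators
    period = "month" if m["period"].lower() in ("month", "mo") else "year"
    return Price(CURRENCY_SYMBOLS[m["symbol"]], amount, period)
```

Prices with no recurring period, such as "Contact sales" or one-off fees, return None under this heuristic.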
4. Extract Features

Identifies features and classifies each into one of these categories:
  • DOMAIN (core functionality)
  • INTEGRATION (external services)
  • SUPPORT (customer service)
  • AUTOMATION (workflow automation)
  • GUARANTEE (SLAs, compliance)
  • INFORMATION (analytics, reporting)
  • MANAGEMENT (admin controls)
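When checking extracted output, the seven categories can be modeled as an enum. This sketch assumes each parsed feature is a mapping with a type key holding one of these names; FeatureType and invalid_feature_types are hypothetical names, not part of A-MINT.

```python
from enum import Enum


class FeatureType(str, Enum):
    """The seven feature categories listed above."""
    DOMAIN = "DOMAIN"
    INTEGRATION = "INTEGRATION"
    SUPPORT = "SUPPORT"
    AUTOMATION = "AUTOMATION"
    GUARANTEE = "GUARANTEE"
    INFORMATION = "INFORMATION"
    MANAGEMENT = "MANAGEMENT"


def invalid_feature_types(features: dict[str, dict]) -> list[str]:
    """Return names of features whose 'type' is missing or unknown.

    Assumes a parsed 'features' section: {feature_name: {"type": ..., ...}}.
    """
    valid = {t.value for t in FeatureType}
    return [name for name, spec in features.items()
            if spec.get("type") not in valid]
```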
5. Extract Usage Limits

Finds numeric quotas and thresholds and classifies them as:
  • RENEWABLE (monthly resets)
  • NON_RENEWABLE (permanent limits)
  • TIME_DRIVEN (time-based quotas)
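As with features, the three limit types map naturally onto an enum. The classifier below is a hypothetical keyword heuristic for illustration only (A-MINT classifies limits with an LLM): it treats time units as TIME_DRIVEN, an explicit reset period such as "/month" or "per year" as RENEWABLE, and anything else as NON_RENEWABLE.

```python
import re
from enum import Enum


class UsageLimitType(str, Enum):
    RENEWABLE = "RENEWABLE"          # resets periodically, e.g. monthly
    NON_RENEWABLE = "NON_RENEWABLE"  # permanent cap
    TIME_DRIVEN = "TIME_DRIVEN"      # quota measured in units of time


# Hypothetical keyword heuristics, checked in this order.
TIME_UNITS = re.compile(r"\b(?:seconds?|minutes?|hours?)\b", re.IGNORECASE)
RESET_PERIOD = re.compile(
    r"(?:\b(?:per|each|every)\s+|/\s*)(?:day|week|month|year)\b", re.IGNORECASE
)


def classify_limit(text: str) -> UsageLimitType:
    """Guess the usage-limit type from a quota description."""
    if TIME_UNITS.search(text):
        return UsageLimitType.TIME_DRIVEN
    if RESET_PERIOD.search(text):
        return UsageLimitType.RENEWABLE
    return UsageLimitType.NON_RENEWABLE
```

Because time units are checked first, a quota like "1,000 build minutes per month" is classified as TIME_DRIVEN; a real classifier would need a policy for such overlaps.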
6. Extract Add-ons

Identifies optional extensions and overage costs.
7. Validate & Fix

Iteratively validates the YAML against the Analysis API and fixes the reported errors. Stops as soon as the YAML is valid, or gives up after max_tries iterations.
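The control flow of this step is a bounded loop. In the sketch below, validate and fix are hypothetical stand-ins for the Analysis API call and the LLM fixing call; the function returns the last YAML produced together with whether it passed validation.

```python
from typing import Callable


def validate_and_fix(
    yaml_text: str,
    validate: Callable[[str], list[str]],
    fix: Callable[[str, list[str]], str],
    max_tries: int = 50,
) -> tuple[str, bool]:
    """Alternate validation and fixing until valid or out of tries.

    validate returns error messages (empty list means valid); fix returns a
    corrected YAML string given those errors. Both are hypothetical.
    """
    for _ in range(max_tries):
        errors = validate(yaml_text)
        if not errors:
            return yaml_text, True
        yaml_text = fix(yaml_text, errors)
    # Check the result of the final fix attempt before giving up.
    return yaml_text, not validate(yaml_text)
```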
8. Save YAML

Writes the final validated YAML to the output directory as output/{uuid}.yaml.

Error Handling

Common errors and their meanings:
The WebDriver couldn’t load the pricing page. Possible causes:
  • Invalid URL
  • Website blocking automated access
  • Network connectivity issues
  • Page requires JavaScript that failed to execute
The extracted YAML couldn’t be validated even after max_tries attempts. This can happen when:
  • The pricing page structure is too complex
  • The LLM model consistently misinterprets the content
  • The page contains ambiguous or inconsistent information
Solution: Try increasing max_tries or using a more capable model.
The provided task_id doesn’t exist. This can occur if:
  • The task ID is incorrect
  • The task expired (tasks are stored in memory)
  • The service restarted

Output Format

When the task completes successfully, the status endpoint returns the YAML file directly, with:
  • Content-Type: application/x-yaml
  • Filename: pricing_{uuid}.yaml

YAML Structure

See the A-MINT Overview for the complete Pricing2YAML specification. The extracted YAML includes:
syntaxVersion: '2.1'
saasName: buffer
version: '2024-01-15'
currency: USD
url: https://buffer.com/pricing

features:
  # BOOLEAN, TEXT, or NUMERIC features
  # with descriptions and type classifications

usageLimits:
  # Numeric quotas linked to features
  # with value types and renewal periods

plans:
  # Named pricing tiers with costs
  # and feature/limit overrides

addOns:
  # Optional extensions with pricing
  # and plan availability

Logging and Debugging

A-MINT logs detailed information during extraction:
  • Application logs: /app/logs/amint_api.log
  • Transformation logs: /app/logs/transformation_logs.csv
The CSV log includes:
  • transformation_call_id: Unique ID for the transformation
  • timestamp: Start time
  • response_time: Total processing time in seconds
  • raw_html_length: Size of original HTML
  • cleaned_html_length: Size after cleaning
  • llm_call_ids: Comma-separated list of LLM API calls made
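Because the transformation log is a plain CSV file, it can be summarized with the standard csv module. This sketch assumes the file has a header row with exactly the column names listed above and that llm_call_ids is stored as a single quoted, comma-separated field; summarize_transformations is a hypothetical name.

```python
import csv
from typing import IO


def summarize_transformations(log: IO[str]) -> dict[str, float]:
    """Return run count, mean response time, and total LLM calls.

    Assumes a header row with the documented column names.
    """
    rows = list(csv.DictReader(log))
    if not rows:
        return {"runs": 0, "mean_response_time_s": 0.0, "total_llm_calls": 0}
    times = [float(r["response_time"]) for r in rows]
    # llm_call_ids holds IDs like "c1,c2,c3"; ignore empty entries.
    llm_calls = sum(
        len([i for i in r["llm_call_ids"].split(",") if i.strip()])
        for r in rows
    )
    return {
        "runs": len(rows),
        "mean_response_time_s": sum(times) / len(times),
        "total_llm_calls": llm_calls,
    }
```

Usage: with open("/app/logs/transformation_logs.csv", newline="") as f: print(summarize_transformations(f)).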

Performance Considerations

Extraction is computationally expensive and can take 2-10 minutes per pricing page depending on complexity.
Factors affecting performance:
  • Page complexity: More plans and features take longer to extract
  • Model speed: Faster models like gpt-3.5-turbo reduce latency
  • Validation iterations: Complex pages may require many fix attempts
  • Network latency: OpenAI API call overhead
Optimization tips:
  1. Use a lower temperature (0.0-0.3) for more deterministic results
  2. Reduce max_tries if you’re willing to accept partial extractions
  3. Consider caching results for frequently accessed pricing pages
  4. Use a faster model (model) for initial extraction and a more capable one (better_model) for refinement
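For tip 3, a small time-limited cache keyed by pricing-page URL is often enough. The PricingCache class below is a hypothetical client-side sketch; the 24-hour default TTL is an arbitrary assumption, and extract stands for whatever function submits the page and waits for the YAML.

```python
import time
from typing import Callable, Optional


class PricingCache:
    """In-memory cache of extracted YAML keyed by pricing-page URL.

    Entries expire after ttl_seconds, since pricing pages change over time.
    """

    def __init__(self, ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, url: str) -> Optional[str]:
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, yaml_text = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[url]  # expired: force a fresh extraction
            return None
        return yaml_text

    def put(self, url: str, yaml_text: str) -> None:
        self._entries[url] = (self._clock(), yaml_text)

    def get_or_extract(self, url: str, extract: Callable[[str], str]) -> str:
        cached = self.get(url)
        if cached is None:
            cached = extract(url)
            self.put(url, cached)
        return cached
```

The injectable clock keeps expiry testable; in production the default time.monotonic is unaffected by wall-clock changes.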

POST /api/v1/fix

Fix and validate an existing YAML file without re-extracting from HTML

Analysis API

Validate and analyze extracted YAML files

Next Steps

A-MINT Overview

Learn more about A-MINT’s architecture and capabilities

MCP Tools

See how the Harvey agent uses iPricing to call A-MINT
