Overview
The AnyMathsAdapter optimizes prompts for mathematical word problems of varying complexity. It:
- Works with any dataset containing math problems (GSM8K, MATH, AIME, etc.)
- Supports local models via Ollama (zero cost) or cloud APIs
- Enforces structured output with separate reasoning and answer fields
- Provides detailed feedback for incorrect solutions
Key Result: On GSM8K with ollama/gemma3:1b, GEPA improves accuracy from 9% → 38% (+29 pp) with a budget of 500 metric calls.
Installation
pip install gepa
# Install adapter-specific dependencies
pip install -r src/gepa/adapters/anymaths_adapter/requirements.txt
# For Ollama (local models)
# Install from: https://ollama.com
ollama pull qwen3:4b
ollama pull qwen3:8b
Quick Start
import gepa
from gepa.adapters.anymaths_adapter import AnyMathsAdapter
# Load dataset (e.g., GSM8K)
train_data = [
{
"input": "John has 5 apples. He buys 3 more. How many does he have?",
"answer": "8",
"additional_context": {}
},
# ... more examples
]
# Create adapter with Ollama (FREE)
adapter = AnyMathsAdapter(
model="ollama/qwen3:4b",
api_base="http://localhost:11434",
max_litellm_workers=4
)
# Optimize
result = gepa.optimize(
seed_candidate={
"system_prompt": """You are an AI assistant that solves mathematical word problems.
Provide step-by-step solution and final numerical answer."""
},
trainset=train_data[:50],
valset=train_data[50:100],
adapter=adapter,
max_metric_calls=500,
reflection_lm="ollama/qwen3:8b" # Larger model for reflection
)
print("Optimized prompt:")
print(result.best_candidate["system_prompt"])
Class Signature
Defined in src/gepa/adapters/anymaths_adapter/anymaths_adapter.py:31:
class AnyMathsAdapter(GEPAAdapter[AnyMathsDataInst, AnyMathsTrajectory, AnyMathsRolloutOutput]):
def __init__(
self,
model: str,
failure_score: float = 0.0,
api_base: str | None = "http://localhost:11434",
max_litellm_workers: int = 10,
)
Parameters

model (str, required)
Model for task execution. Supports:
- Ollama: "ollama/qwen3:4b", "ollama/gemma3:1b"
- OpenAI: "openai/gpt-4o-mini"
- Google: "vertex_ai/gemini-2.5-flash-lite"
- Any LiteLLM-supported provider

failure_score (float, default: 0.0)
Score assigned when the answer is incorrect or parsing fails.

api_base (str | None, default: "http://localhost:11434")
API base URL. Required for Ollama; set to None for cloud providers.

max_litellm_workers (int, default: 10)
Maximum number of parallel workers for batch completion.
Data Types
AnyMathsDataInst
Input data structure (src/gepa/adapters/anymaths_adapter/anymaths_adapter.py:9):
class AnyMathsDataInst(TypedDict):
input: str # Math problem statement
additional_context: dict[str, str] # Extra hints/context
answer: str # Expected numerical answer (string)
AnyMathsStructuredOutput
Enforced output schema (src/gepa/adapters/anymaths_adapter/anymaths_adapter.py:24):
class AnyMathsStructuredOutput(BaseModel):
final_answer: str # Numerical answer only (no units/text)
solution_pad: str # Step-by-step solution
AnyMathsTrajectory
Execution trace (src/gepa/adapters/anymaths_adapter/anymaths_adapter.py:15):
class AnyMathsTrajectory(TypedDict):
data: AnyMathsDataInst # Original problem
full_assistant_response: str # Formatted response with reasoning
Structured Output
The adapter enforces structured JSON output:
{
"final_answer": "42",
"solution_pad": "Step 1: Calculate 20 + 22\nStep 2: Result is 42"
}
Key constraints:
- final_answer: must contain only the numerical answer (no units, no text)
- solution_pad: contains the step-by-step reasoning
- The model must follow this format strictly (enforced via JSON schema)
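The adapter enforces this schema at generation time; as a stdlib-only sketch of the same validation, the hypothetical helper below (not part of the adapter, which uses a Pydantic model) parses a raw response and checks both fields:

```python
import json

def parse_structured_output(raw: str) -> tuple[str, str]:
    """Hypothetical helper: validate a raw model response against the schema."""
    data = json.loads(raw)  # raises ValueError on non-JSON output
    final_answer = data["final_answer"]  # raises KeyError if a field is missing
    solution_pad = data["solution_pad"]
    if not isinstance(final_answer, str) or not isinstance(solution_pad, str):
        raise ValueError("schema violation: both fields must be strings")
    return final_answer, solution_pad

raw = '{"final_answer": "42", "solution_pad": "Step 1: Calculate 20 + 22"}'
answer, steps = parse_structured_output(raw)
print(answer)  # 42
```

Any exception here corresponds to the parsing-failure case that receives failure_score.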
Methods
evaluate()
Evaluates candidate on batch of math problems.
def evaluate(
self,
batch: list[AnyMathsDataInst],
candidate: dict[str, str],
capture_traces: bool = False,
) -> EvaluationBatch[AnyMathsTrajectory, AnyMathsRolloutOutput]
Implementation: src/gepa/adapters/anymaths_adapter/anymaths_adapter.py:60
Behavior
- Extracts the system prompt from the candidate (first value)
- For each problem:
  - Sends system prompt + problem to the model
  - Enforces the AnyMathsStructuredOutput JSON schema
  - Parses the response to extract final_answer and solution_pad
  - Checks whether data["answer"] is contained in final_answer
- Returns scores (1.0 for correct, 0.0 for incorrect)
- Captures trajectories if capture_traces=True
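The per-example scoring described above boils down to a substring check; score_example is a hypothetical helper mirroring that behavior, not the adapter's actual code:

```python
def score_example(expected: str, final_answer: str, failure_score: float = 0.0) -> float:
    """Hypothetical helper mirroring the adapter's substring-containment check."""
    return 1.0 if expected in final_answer else failure_score

print(score_example("8", "8"))     # 1.0
print(score_example("8", "nine"))  # 0.0
```

Note that substring matching means an expected "8" would also match a final_answer of "18", which is one reason dataset answers must be clean numerical strings.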
make_reflective_dataset()
Generates reflective dataset with detailed feedback.
def make_reflective_dataset(
self,
candidate: dict[str, str],
eval_batch: EvaluationBatch[AnyMathsTrajectory, AnyMathsRolloutOutput],
components_to_update: list[str],
) -> dict[str, list[dict[str, Any]]]
Implementation: src/gepa/adapters/anymaths_adapter/anymaths_adapter.py:130
Returns
{
"system_prompt": [
{
"Inputs": "John has 5 apples. He buys 3 more. How many?",
"Generated Outputs": "Assistant's Solution: 5 + 3 = 8\nFinal Answer: 8",
"Feedback": "The generated response is correct. The final answer is: 8"
},
# ... more examples
]
}
Dataset Preparation
Datasets should follow this schema:
{
"question": "...", # or "input"
"solution": "...", # Optional step-by-step solution
"answer": "42" # Numerical answer only
}
From Hugging Face
from datasets import load_dataset
# Load GSM8K
dataset = load_dataset("openai/gsm8k", "main")
# Convert to AnyMaths format
def convert_example(example):
# Extract numerical answer from "#### 42" format
answer = example["answer"].split("####")[-1].strip()
return {
"input": example["question"],
"answer": answer,
"additional_context": {}
}
train_data = [convert_example(ex) for ex in dataset["train"]]
val_data = [convert_example(ex) for ex in dataset["test"].select(range(100))]
Custom Dataset
For custom datasets, ensure answers are numerical strings:
train_data = [
{
"input": "What is 2 + 2?",
"answer": "4", # Not "4 apples" or "The answer is 4"
"additional_context": {
"difficulty": "easy",
"category": "arithmetic"
}
},
# ... more examples
]
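A quick sanity check before optimization can catch answers that are not plain numeric strings; validate_answers below is a hypothetical helper, not part of the adapter:

```python
def validate_answers(data: list[dict]) -> list[tuple[int, str]]:
    """Return (index, answer) pairs whose answer is not a plain numeric string."""
    bad = []
    for i, ex in enumerate(data):
        ans = ex.get("answer", "")
        try:
            float(str(ans).replace(",", ""))  # accepts "42", "3.5", "1,234"
        except ValueError:
            bad.append((i, ans))
    return bad

examples = [
    {"input": "What is 2 + 2?", "answer": "4"},
    {"input": "Count the apples", "answer": "4 apples"},  # will be flagged
]
print(validate_answers(examples))  # [(1, '4 apples')]
```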
Complete Example
import gepa
from gepa.adapters.anymaths_adapter import AnyMathsAdapter
from datasets import load_dataset
# 1. Load GSM8K dataset
dataset = load_dataset("openai/gsm8k", "main")
def convert_example(example):
answer = example["answer"].split("####")[-1].strip()
return {
"input": example["question"],
"answer": answer,
"additional_context": {}
}
train_data = [convert_example(ex) for ex in dataset["train"].select(range(50))]
val_data = [convert_example(ex) for ex in dataset["train"].select(range(50, 100))]
test_data = [convert_example(ex) for ex in dataset["test"].select(range(50))]
# 2. Create seed prompt
seed_prompt = """
You are an AI assistant that solves mathematical word problems.
Provide:
1. Step-by-step solution in the solution_pad field
2. Final numerical answer in the final_answer field (no units, no text)
Be precise and show your work clearly.
"""
# 3. Create adapter (Ollama - FREE)
adapter = AnyMathsAdapter(
model="ollama/qwen3:4b",
api_base="http://localhost:11434",
max_litellm_workers=4,
failure_score=0.0
)
# 4. Optimize
result = gepa.optimize(
seed_candidate={"system_prompt": seed_prompt},
trainset=train_data,
valset=val_data,
adapter=adapter,
max_metric_calls=500,
reflection_lm="ollama/qwen3:8b"
)
# 5. Evaluate on test set
test_adapter = AnyMathsAdapter(
model="ollama/qwen3:4b",
api_base="http://localhost:11434"
)
test_result = test_adapter.evaluate(
batch=test_data,
candidate=result.best_candidate,
capture_traces=False
)
test_accuracy = sum(test_result.scores) / len(test_result.scores)
print(f"Test Accuracy: {test_accuracy:.1%}")
print(f"\nOptimized Prompt:")
print(result.best_candidate["system_prompt"])
Experimental Results
From the adapter README:
| Dataset | Base LM | Reflection LM | Accuracy Before | Accuracy After | Gain | Budget |
|---|---|---|---|---|---|---|
| GSM8K | ollama/qwen3:4b | ollama/qwen3:8b | 18% | 23% | +5 pp | 500 |
| GSM8K | vertex_ai/gemini-2.5-flash-lite | vertex_ai/gemini-2.5-flash | 31% | 33% | +2 pp | 500 |
| GSM8K | ollama/qwen3:0.6b | ollama/qwen3:8b | 7% | 5% | -2 pp | 500 |
| GSM8K | ollama/gemma3:1b | ollama/gemma3:4b | 9% | 38% | +29 pp | 500 |
Best result: +29 percentage points improvement on GSM8K with ollama/gemma3:1b.
Model-Specific Tips
Small Models (< 1B parameters)
Smaller models often struggle with structured output:
# Use very explicit seed prompt
seed_prompt = """
You MUST respond with valid JSON in this exact format:
{
"final_answer": "<number>",
"solution_pad": "<step by step>"
}
The final_answer field must contain ONLY the numerical answer.
No units, no text, no explanations.
"""
adapter = AnyMathsAdapter(
model="ollama/qwen3:0.6b",
api_base="http://localhost:11434"
)
Medium Models (1-4B parameters)
Good balance of cost and performance:
# Works well with moderate guidance
adapter = AnyMathsAdapter(
model="ollama/qwen3:4b", # Sweet spot
api_base="http://localhost:11434",
max_litellm_workers=4
)
# Use larger model for reflection
result = gepa.optimize(
...,
reflection_lm="ollama/qwen3:8b" # 2x larger for better reflection
)
Cloud Models
For production use:
# Google Vertex AI
adapter = AnyMathsAdapter(
model="vertex_ai/gemini-2.5-flash-lite",
api_base=None, # Uses default Vertex AI endpoint
max_litellm_workers=10
)
result = gepa.optimize(
...,
reflection_lm="vertex_ai/gemini-2.5-flash" # Stronger for reflection
)
# OpenAI
adapter = AnyMathsAdapter(
model="openai/gpt-4o-mini",
api_base=None,
max_litellm_workers=10
)
Prompt Evolution Patterns
GEPA discovers interesting patterns in optimal prompts:
For Small Models
- Goal-oriented: Clearly states the task objective
- Chain-of-Thought: Breaks down problem-solving into numbered steps
- Instruction Detail: Specific guidance on parsing problems and applying formulas
- Few-shot Learning: Concrete examples of different problem types
- Knowledge Base: Mini-rulebook with common pitfalls and edge cases
- Structured Output: Strict output format specification
For Provider Models
- Concise: Fewer tokens, more direct instructions
- Straightforward: Main instruction and output format at the top
- Structured Guidelines: Detailed guidelines follow main instruction
See full prompt examples in the adapter README.
Cost Comparison
Ollama (Local - FREE)
# Total cost: $0.00
adapter = AnyMathsAdapter(
model="ollama/qwen3:4b",
api_base="http://localhost:11434"
)
result = gepa.optimize(
...,
reflection_lm="ollama/qwen3:8b",
max_metric_calls=500
)
Requirements:
- Install Ollama locally
- Download models (3-4GB each)
- ~8GB RAM minimum
OpenAI API
# Approximate cost: $5-10 for 500 calls
adapter = AnyMathsAdapter(
model="openai/gpt-4o-mini", # $0.15/1M input tokens
api_base=None
)
result = gepa.optimize(
...,
reflection_lm="openai/gpt-4", # Proposal only (~10-20 calls)
max_metric_calls=500
)
Google Vertex AI
# Approximate cost: $2-5 for 500 calls
adapter = AnyMathsAdapter(
model="vertex_ai/gemini-2.5-flash-lite",
api_base=None
)
result = gepa.optimize(
...,
reflection_lm="vertex_ai/gemini-2.5-flash",
max_metric_calls=500
)
Best Practices
- Start Small: Test with 10-20 examples before full optimization
- Answer Format: Ensure answers are purely numerical strings
- Budget: Use 500 calls for small models, 200-300 for larger models
- Reflection LM: Use a model 2-4x larger than task model
- Local First: Develop with Ollama, deploy with cloud APIs
- Test Set: Always evaluate on held-out test set
Troubleshooting
JSON Parsing Errors
# Model not following structured output
# Solution: Make seed prompt more explicit
seed_prompt = """
You MUST respond with JSON in this EXACT format:
{
"final_answer": "42",
"solution_pad": "Step 1: ..."
}
Do NOT include any text outside the JSON.
The final_answer must be ONLY a number.
"""
Low Accuracy
# Check if answer matching is too strict
# Try substring matching vs exact matching
# Adapter already uses substring matching:
# score = 1.0 if data["answer"] in assistant_response["final_answer"] else 0.0
# If still failing, check answer format in dataset
print(train_data[0]["answer"]) # Should be "42", not "42 apples"
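If dataset answers contain units or surrounding text, normalizing them during conversion helps; normalize_answer is a hypothetical helper (not part of the adapter) that extracts the first numeric token:

```python
import re

def normalize_answer(raw: str) -> str:
    """Extract the first numeric token, dropping commas and surrounding text."""
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", raw)
    return match.group(0).replace(",", "") if match else raw.strip()

print(normalize_answer("42 apples"))            # 42
print(normalize_answer("The answer is 1,234"))  # 1234
```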
Ollama Connection Error
# Check Ollama is running
curl http://localhost:11434/api/tags
# Start Ollama if needed
ollama serve
# Verify model is downloaded
ollama list
Advanced Usage
Mixed Provider Strategy
# Use cheap model for task, expensive for reflection
adapter = AnyMathsAdapter(
model="ollama/qwen3:4b", # Free local
api_base="http://localhost:11434"
)
result = gepa.optimize(
...,
reflection_lm="openai/gpt-4" # Pay only for proposals
)
Domain-Specific Context
train_data = [
{
"input": "A train travels at 60 mph for 2 hours...",
"answer": "120",
"additional_context": {
"domain": "physics",
"concept": "speed-distance-time",
"difficulty": "medium"
}
},
# ... more examples
]
# Additional context is used in feedback but not shown to model
See Also