
Overview

The AnyMathsAdapter is designed for optimizing prompts for mathematical word problems of varying complexity. It:
  • Works with any dataset containing math problems (GSM8K, MATH, AIME, etc.)
  • Supports local models via Ollama (zero cost) or cloud APIs
  • Enforces structured output with separate reasoning and answer fields
  • Provides detailed feedback for incorrect solutions
Key Result: On GSM8K with ollama/gemma3:1b, GEPA improves accuracy from 9% → 38% (+29 pp) with a budget of 500 metric calls.
This adapter was contributed by Emmanuel G. Maminta.

Installation

pip install gepa

# Install adapter-specific dependencies
pip install -r src/gepa/adapters/anymaths_adapter/requirements.txt

# For Ollama (local models)
# Install from: https://ollama.com
ollama pull qwen3:4b
ollama pull qwen3:8b

Quick Start

import gepa
from gepa.adapters.anymaths_adapter import AnyMathsAdapter

# Load dataset (e.g., GSM8K)
train_data = [
    {
        "input": "John has 5 apples. He buys 3 more. How many does he have?",
        "answer": "8",
        "additional_context": {}
    },
    # ... more examples
]

# Create adapter with Ollama (FREE)
adapter = AnyMathsAdapter(
    model="ollama/qwen3:4b",
    api_base="http://localhost:11434",
    max_litellm_workers=4
)

# Optimize
result = gepa.optimize(
    seed_candidate={
        "system_prompt": """You are an AI assistant that solves mathematical word problems.
Provide a step-by-step solution and the final numerical answer."""
    },
    trainset=train_data[:50],
    valset=train_data[50:100],
    adapter=adapter,
    max_metric_calls=500,
    reflection_lm="ollama/qwen3:8b"  # Larger model for reflection
)

print("Optimized prompt:")
print(result.best_candidate["system_prompt"])

Class Signature

Defined in src/gepa/adapters/anymaths_adapter/anymaths_adapter.py:31:
class AnyMathsAdapter(GEPAAdapter[AnyMathsDataInst, AnyMathsTrajectory, AnyMathsRolloutOutput]):
    def __init__(
        self,
        model: str,
        failure_score: float = 0.0,
        api_base: str | None = "http://localhost:11434",
        max_litellm_workers: int = 10,
    )

Parameters

model (str, required)
Model for task execution. Supports:
  • Ollama: "ollama/qwen3:4b", "ollama/gemma3:1b"
  • OpenAI: "openai/gpt-4o-mini"
  • Google: "vertex_ai/gemini-2.5-flash-lite"
  • Any LiteLLM-supported provider

failure_score (float, default: 0.0)
Score assigned when the answer is incorrect or parsing fails.

api_base (str | None, default: "http://localhost:11434")
API base URL. Required for Ollama; use None for cloud providers.

max_litellm_workers (int, default: 10)
Maximum number of parallel workers for batch completion.

Data Types

AnyMathsDataInst

Input data structure (src/gepa/adapters/anymaths_adapter/anymaths_adapter.py:9):
class AnyMathsDataInst(TypedDict):
    input: str                      # Math problem statement
    additional_context: dict[str, str]  # Extra hints/context
    answer: str                     # Expected numerical answer (string)

AnyMathsStructuredOutput

Enforced output schema (src/gepa/adapters/anymaths_adapter/anymaths_adapter.py:24):
class AnyMathsStructuredOutput(BaseModel):
    final_answer: str               # Numerical answer only (no units/text)
    solution_pad: str               # Step-by-step solution

AnyMathsTrajectory

Execution trace (src/gepa/adapters/anymaths_adapter/anymaths_adapter.py:15):
class AnyMathsTrajectory(TypedDict):
    data: AnyMathsDataInst          # Original problem
    full_assistant_response: str    # Formatted response with reasoning

Structured Output

The adapter enforces structured JSON output:
{
  "final_answer": "42",
  "solution_pad": "Step 1: Calculate 20 + 22\nStep 2: Result is 42"
}
Key constraints:
  • final_answer: Must contain only the numerical answer (no units, no text)
  • solution_pad: Contains step-by-step reasoning
  • Model must follow this format strictly (enforced via JSON schema)
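
The parsing this schema implies can be sketched with the standard library alone (`parse_structured_output` is a hypothetical helper for illustration; the adapter itself enforces the schema via a JSON schema on the model response):

```python
import json

def parse_structured_output(raw: str) -> dict:
    """Parse a model response and check it matches the enforced schema (sketch)."""
    data = json.loads(raw)
    for field in ("final_answer", "solution_pad"):
        if field not in data or not isinstance(data[field], str):
            raise ValueError(f"missing or non-string field: {field}")
    return data

raw = '{"final_answer": "42", "solution_pad": "Step 1: Calculate 20 + 22\\nStep 2: Result is 42"}'
out = parse_structured_output(raw)
print(out["final_answer"])  # "42"
```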

Methods

evaluate()

Evaluates candidate on batch of math problems.
def evaluate(
    self,
    batch: list[AnyMathsDataInst],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[AnyMathsTrajectory, AnyMathsRolloutOutput]
Implementation: src/gepa/adapters/anymaths_adapter/anymaths_adapter.py:60

Behavior

  1. Extracts system prompt from candidate (first value)
  2. For each problem:
    • Sends system prompt + problem to model
    • Enforces AnyMathsStructuredOutput JSON schema
    • Parses response to extract final_answer and solution_pad
    • Checks if data["answer"] is contained in final_answer
  3. Returns scores (1.0 for correct, 0.0 for incorrect)
  4. Captures trajectories if capture_traces=True
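
The scoring step above can be sketched as follows (`score_batch` and the `responses` shape are illustrative assumptions, not the adapter's actual internals; the containment check mirrors the behavior described in step 2):

```python
def score_batch(batch, responses, failure_score=0.0):
    """Sketch of the per-problem scoring described above.

    `responses` holds parsed final_answer strings, or None when parsing failed.
    Correct answers score 1.0; incorrect or unparseable ones score failure_score.
    """
    scores = []
    for data, final_answer in zip(batch, responses):
        if final_answer is None:                  # JSON parsing failed
            scores.append(failure_score)
        elif data["answer"] in final_answer:      # substring containment check
            scores.append(1.0)
        else:
            scores.append(failure_score)
    return scores

batch = [{"input": "2 + 2?", "answer": "4"}, {"input": "3 + 3?", "answer": "6"}]
print(score_batch(batch, ["4", None]))  # [1.0, 0.0]
```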

make_reflective_dataset()

Generates reflective dataset with detailed feedback.
def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[AnyMathsTrajectory, AnyMathsRolloutOutput],
    components_to_update: list[str],
) -> dict[str, list[dict[str, Any]]]
Implementation: src/gepa/adapters/anymaths_adapter/anymaths_adapter.py:130

Returns

{
    "system_prompt": [
        {
            "Inputs": "John has 5 apples. He buys 3 more. How many?",
            "Generated Outputs": "Assistant's Solution: 5 + 3 = 8\nFinal Answer: 8",
            "Feedback": "The generated response is correct. The final answer is: 8"
        },
        # ... more examples
    ]
}
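
A feedback entry of this shape could be assembled from a trajectory roughly as follows (`assemble_feedback` is a hypothetical helper; the adapter's actual formatting lives in make_reflective_dataset):

```python
def assemble_feedback(trajectory, score):
    """Build one reflective-dataset entry from a trajectory and its score (sketch)."""
    data = trajectory["data"]
    if score == 1.0:
        feedback = f"The generated response is correct. The final answer is: {data['answer']}"
    else:
        feedback = f"The generated response is incorrect. The expected answer is: {data['answer']}"
    return {
        "Inputs": data["input"],
        "Generated Outputs": trajectory["full_assistant_response"],
        "Feedback": feedback,
    }

traj = {
    "data": {"input": "John has 5 apples. He buys 3 more. How many?", "answer": "8"},
    "full_assistant_response": "Assistant's Solution: 5 + 3 = 8\nFinal Answer: 8",
}
entry = assemble_feedback(traj, 1.0)
print(entry["Feedback"])  # "The generated response is correct. The final answer is: 8"
```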

Dataset Preparation

Expected Format

Datasets should follow this schema:
{
    "question": "...",  # or "input"
    "solution": "...",  # Optional step-by-step solution
    "answer": "42"      # Numerical answer only
}

From Hugging Face

from datasets import load_dataset

# Load GSM8K
dataset = load_dataset("openai/gsm8k", "main")

# Convert to AnyMaths format
def convert_example(example):
    # Extract numerical answer from "#### 42" format
    answer = example["answer"].split("####")[-1].strip()
    return {
        "input": example["question"],
        "answer": answer,
        "additional_context": {}
    }

train_data = [convert_example(ex) for ex in dataset["train"]]
# Use .select() for row subsets; dataset["test"][:100] returns columns, not rows
val_data = [convert_example(ex) for ex in dataset["test"].select(range(100))]

Custom Dataset

For custom datasets, ensure answers are numerical strings:
train_data = [
    {
        "input": "What is 2 + 2?",
        "answer": "4",  # Not "4 apples" or "The answer is 4"
        "additional_context": {
            "difficulty": "easy",
            "category": "arithmetic"
        }
    },
    # ... more examples
]
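
Before optimizing, it can help to sanity-check that every example matches this shape; `validate_dataset` below is a hypothetical helper for illustration, not part of the adapter:

```python
def validate_dataset(data):
    """Flag examples missing required fields or with non-numerical answers (sketch)."""
    problems = []
    for i, ex in enumerate(data):
        for key in ("input", "answer", "additional_context"):
            if key not in ex:
                problems.append(f"example {i}: missing '{key}'")
        ans = ex.get("answer", "")
        # Answers should be purely numerical strings, e.g. "4", "3.5", "-2"
        if ans and not ans.lstrip("-").replace(".", "", 1).isdigit():
            problems.append(f"example {i}: non-numerical answer {ans!r}")
    return problems

good = [{"input": "What is 2 + 2?", "answer": "4", "additional_context": {}}]
bad = [{"input": "What is 2 + 2?", "answer": "4 apples", "additional_context": {}}]
print(validate_dataset(good))  # []
print(validate_dataset(bad))   # flags the non-numerical answer
```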

Complete Example

import gepa
from gepa.adapters.anymaths_adapter import AnyMathsAdapter
from datasets import load_dataset

# 1. Load GSM8K dataset
dataset = load_dataset("openai/gsm8k", "main")

def convert_example(example):
    answer = example["answer"].split("####")[-1].strip()
    return {
        "input": example["question"],
        "answer": answer,
        "additional_context": {}
    }

# Use .select() for row subsets; slicing a Dataset returns columns, not rows
train_data = [convert_example(ex) for ex in dataset["train"].select(range(50))]
val_data = [convert_example(ex) for ex in dataset["train"].select(range(50, 100))]
test_data = [convert_example(ex) for ex in dataset["test"].select(range(50))]

# 2. Create seed prompt
seed_prompt = """
You are an AI assistant that solves mathematical word problems.

Provide:
1. Step-by-step solution in the solution_pad field
2. Final numerical answer in the final_answer field (no units, no text)

Be precise and show your work clearly.
"""

# 3. Create adapter (Ollama - FREE)
adapter = AnyMathsAdapter(
    model="ollama/qwen3:4b",
    api_base="http://localhost:11434",
    max_litellm_workers=4,
    failure_score=0.0
)

# 4. Optimize
result = gepa.optimize(
    seed_candidate={"system_prompt": seed_prompt},
    trainset=train_data,
    valset=val_data,
    adapter=adapter,
    max_metric_calls=500,
    reflection_lm="ollama/qwen3:8b"
)

# 5. Evaluate on test set

test_adapter = AnyMathsAdapter(
    model="ollama/qwen3:4b",
    api_base="http://localhost:11434"
)

test_result = test_adapter.evaluate(
    batch=test_data,
    candidate=result.best_candidate,
    capture_traces=False
)

test_accuracy = sum(test_result.scores) / len(test_result.scores)

print(f"Test Accuracy: {test_accuracy:.1%}")
print("\nOptimized Prompt:")
print(result.best_candidate["system_prompt"])

Experimental Results

From the adapter README:
| Dataset | Base LM | Reflection LM | Accuracy Before | Accuracy After | Gain | Budget |
|---------|---------|---------------|-----------------|----------------|------|--------|
| GSM8K | ollama/qwen3:4b | ollama/qwen3:8b | 18% | 23% | +5 pp | 500 |
| GSM8K | vertex_ai/gemini-2.5-flash-lite | vertex_ai/gemini-2.5-flash | 31% | 33% | +2 pp | 500 |
| GSM8K | ollama/qwen3:0.6b | ollama/qwen3:8b | 7% | 5% | -2 pp | 500 |
| GSM8K | ollama/gemma3:1b | ollama/gemma3:4b | 9% | 38% | +29 pp | 500 |
Best result: +29 percentage points improvement on GSM8K with ollama/gemma3:1b.

Model-Specific Tips

Small Models (< 1B parameters)

Smaller models struggle with structured output:
# Use very explicit seed prompt
seed_prompt = """
You MUST respond with valid JSON in this exact format:
{
  "final_answer": "<number>",
  "solution_pad": "<step by step>"
}

The final_answer field must contain ONLY the numerical answer.
No units, no text, no explanations.
"""

adapter = AnyMathsAdapter(
    model="ollama/qwen3:0.6b",
    api_base="http://localhost:11434"
)

Medium Models (1-4B parameters)

Good balance of cost and performance:
# Works well with moderate guidance
adapter = AnyMathsAdapter(
    model="ollama/qwen3:4b",  # Sweet spot
    api_base="http://localhost:11434",
    max_litellm_workers=4
)

# Use larger model for reflection
result = gepa.optimize(
    ...,
    reflection_lm="ollama/qwen3:8b"  # 2x larger for better reflection
)

Cloud Models

For production use:
# Google Vertex AI
adapter = AnyMathsAdapter(
    model="vertex_ai/gemini-2.5-flash-lite",
    api_base=None,  # Uses default Vertex AI endpoint
    max_litellm_workers=10
)

result = gepa.optimize(
    ...,
    reflection_lm="vertex_ai/gemini-2.5-flash"  # Stronger for reflection
)

# OpenAI
adapter = AnyMathsAdapter(
    model="openai/gpt-4o-mini",
    api_base=None,
    max_litellm_workers=10
)

Prompt Evolution Patterns

GEPA discovers recurring patterns in the optimized prompts:

For Small Models

  • Goal-oriented: Clearly states the task objective
  • Chain-of-Thought: Breaks down problem-solving into numbered steps
  • Instruction Detail: Specific guidance on parsing problems and applying formulas
  • Few-shot Learning: Concrete examples of different problem types
  • Knowledge Base: Mini-rulebook with common pitfalls and edge cases
  • Structured Output: Strict output format specification

For Cloud Models

  • Concise: Fewer tokens, more direct instructions
  • Straightforward: Main instruction and output format at the top
  • Structured Guidelines: Detailed guidelines follow main instruction
See full prompt examples in the adapter README.

Cost Comparison

Ollama (Local - FREE)

# Total cost: $0.00
adapter = AnyMathsAdapter(
    model="ollama/qwen3:4b",
    api_base="http://localhost:11434"
)
result = gepa.optimize(
    ...,
    reflection_lm="ollama/qwen3:8b",
    max_metric_calls=500
)
Requirements:
  • Install Ollama locally
  • Download models (3-4GB each)
  • ~8GB RAM minimum

OpenAI API

# Approximate cost: $5-10 for 500 calls
adapter = AnyMathsAdapter(
    model="openai/gpt-4o-mini",  # $0.15/1M input tokens
    api_base=None
)
result = gepa.optimize(
    ...,
    reflection_lm="openai/gpt-4",  # Proposal only (~10-20 calls)
    max_metric_calls=500
)

Google Vertex AI

# Approximate cost: $2-5 for 500 calls
adapter = AnyMathsAdapter(
    model="vertex_ai/gemini-2.5-flash-lite",
    api_base=None
)
result = gepa.optimize(
    ...,
    reflection_lm="vertex_ai/gemini-2.5-flash",
    max_metric_calls=500
)

Best Practices

  1. Start Small: Test with 10-20 examples before full optimization
  2. Answer Format: Ensure answers are purely numerical strings
  3. Budget: Use 500 calls for small models, 200-300 for larger models
  4. Reflection LM: Use a model 2-4x larger than task model
  5. Local First: Develop with Ollama, deploy with cloud APIs
  6. Test Set: Always evaluate on held-out test set

Troubleshooting

JSON Parsing Errors

# Model not following structured output
# Solution: Make seed prompt more explicit

seed_prompt = """
You MUST respond with JSON in this EXACT format:
{
  "final_answer": "42",
  "solution_pad": "Step 1: ..."
}

Do NOT include any text outside the JSON.
The final_answer must be ONLY a number.
"""

Low Accuracy

# Check if answer matching is too strict
# Try substring matching vs exact matching

# Adapter already uses substring matching:
# score = 1.0 if data["answer"] in assistant_response["final_answer"] else 0.0

# If still failing, check answer format in dataset
print(train_data[0]["answer"])  # Should be "42", not "42 apples"
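
Substring matching also has its own pitfall: a short expected answer can spuriously match inside a longer number. If that becomes an issue, a stricter numeric comparison is easy to sketch (`exact_match` is a hypothetical alternative, not the adapter's current behavior):

```python
# Substring matching can produce false positives:
print("8" in "18")  # True, even though 18 != 8

def exact_match(expected: str, final_answer: str) -> bool:
    """Stricter check: compare as numbers after stripping whitespace (sketch)."""
    try:
        return float(expected) == float(final_answer.strip())
    except ValueError:
        return False

print(exact_match("8", "18"))   # False
print(exact_match("8", " 8 "))  # True
```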

Ollama Connection Error

# Check Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama if needed
ollama serve

# Verify model is downloaded
ollama list

Advanced Usage

Mixed Provider Strategy

# Use cheap model for task, expensive for reflection
adapter = AnyMathsAdapter(
    model="ollama/qwen3:4b",  # Free local
    api_base="http://localhost:11434"
)

result = gepa.optimize(
    ...,
    reflection_lm="openai/gpt-4"  # Pay only for proposals
)

Domain-Specific Context

train_data = [
    {
        "input": "A train travels at 60 mph for 2 hours...",
        "answer": "120",
        "additional_context": {
            "domain": "physics",
            "concept": "speed-distance-time",
            "difficulty": "medium"
        }
    },
    # ... more examples
]

# Additional context is used in feedback but not shown to model
