All features are generally available as of February 18, 2026.

Overview

| Feature | Problem Solved | Token Savings | Availability |
| --- | --- | --- | --- |
| Programmatic Tool Calling | Multi-step agent loops burn tokens on round trips | ~37% reduction | API, Foundry (GA) |
| Dynamic Filtering | Web search/fetch results bloat context with irrelevant content | ~24% fewer input tokens | API, Foundry (GA) |
| Tool Search Tool | Too many tool definitions bloat context | ~85% reduction | API, Foundry (GA) |
| Tool Use Examples | Schema alone can't express usage patterns | 72% → 90% accuracy | API, Foundry (GA) |
Strategic layering — start with your biggest bottleneck:
  • Context bloat from tool definitions → Tool Search Tool
  • Large intermediate results → Programmatic Tool Calling
  • Web search noise → Dynamic Filtering
  • Parameter errors → Tool Use Examples

Programmatic Tool Calling (PTC)

The Paradigm Shift

Before: each tool call requires a full model round trip, so 3 tools = 3 inference passes:

User prompt → Claude → Tool call 1 → Response 1 → Claude → Tool call 2 → Response 2 → Claude → Tool call 3 → Response 3 → Claude → Final answer

After: Claude writes code that orchestrates all tools inside a sandbox, and only the final stdout enters the context window, so 3 tools = 1 inference pass.

How It Works

1. Define tools: you define tools with allowed_callers: ["code_execution_20250825"].
2. Claude writes Python: Claude writes Python that calls those tools as async functions inside a sandbox.
3. Tool execution: when a tool function is called, the sandbox pauses and the API returns a tool_use block.
4. Provide result: you provide the tool result; it goes to the running code, not Claude's context.
5. Code resumes: the code resumes, processes the results, and calls more tools if needed.
6. Final output: only stdout from the final execution reaches Claude.
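The six steps above amount to a pause/resume loop driven by your client. A minimal sketch of that control flow, with a stubbed client standing in for the real Messages API (FakeClient, run_tool, and the hard-coded data are illustrative stand-ins, not SDK types):

```python
import json

class FakeClient:
    """Stub for the Messages API: the first call 'pauses' on a tool_use
    block, the second returns the finished code's stdout."""
    def __init__(self):
        self.calls = 0

    def create(self, tool_results=None):
        self.calls += 1
        if self.calls == 1:
            # Step 3: the sandbox pauses and the API surfaces a tool_use block
            return {"stop_reason": "tool_use",
                    "tool_use": {"id": "toolu_1", "name": "query_database",
                                 "input": {"sql": "SELECT SUM(revenue) FROM sales"}}}
        # Step 6: only stdout from the final execution reaches Claude
        return {"stop_reason": "end_turn", "stdout": "Total revenue: $1,250,000"}

def run_tool(name, tool_input):
    # Step 4: you execute the tool yourself; the result feeds the
    # running code, not Claude's context. Hard-coded for the sketch.
    assert name == "query_database"
    return json.dumps([{"revenue": 1_250_000}])

client = FakeClient()
response = client.create()
while response["stop_reason"] == "tool_use":   # steps 3-5: pause, provide, resume
    block = response["tool_use"]
    result = run_tool(block["name"], block["input"])
    response = client.create(tool_results=[{"tool_use_id": block["id"],
                                            "content": result}])

print(response["stdout"])  # the only text that enters Claude's context
```

The shape of the loop is the point: your code keeps answering tool_use blocks until the run ends, and everything except the final stdout stays out of the model's context.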

Key Configuration

{
  "tools": [
    {
      "type": "code_execution_20250825",
      "name": "code_execution"
    },
    {
      "name": "query_database",
      "description": "Execute a SQL query. Returns rows as JSON objects with fields: id (str), name (str), revenue (float).",
      "input_schema": {
        "type": "object",
        "properties": {
          "sql": { "type": "string", "description": "SQL query to execute" }
        },
        "required": ["sql"]
      },
      "allowed_callers": ["code_execution_20250825"]
    }
  ]
}

The allowed_callers Field

| Value | Behavior |
| --- | --- |
| ["direct"] | Traditional tool calling only (default if omitted) |
| ["code_execution_20250825"] | Only callable from the Python sandbox |
| ["direct", "code_execution_20250825"] | Both modes available |
Recommendation: Choose one mode per tool, not both. This gives Claude clearer guidance.

Advanced Patterns

Process N items in 1 inference pass:
regions = ["West", "East", "Central", "North", "South"]
results = {}
for region in regions:
    data = await query_database(
        f"SELECT SUM(revenue) FROM sales WHERE region='{region}'"
    )
    results[region] = data[0]["revenue"]

top = max(results.items(), key=lambda x: x[1])
print(f"Top region: {top[0]} with ${top[1]:,}")
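Because each tool is exposed as an async function, the same batch can also run concurrently rather than one query at a time. A sketch using asyncio.gather, with a stubbed query_database (in the real sandbox the function is provided for you; the revenue figures here are invented):

```python
import asyncio

# Fake data backing the stub; in the sandbox the real tool answers these.
REVENUE = {"West": 4.2e6, "East": 3.9e6, "Central": 2.1e6,
           "North": 1.8e6, "South": 2.7e6}

async def query_database(sql: str) -> list[dict]:
    # Stub: pull the region out of the WHERE clause and return fake rows.
    region = sql.split("region='")[1].rstrip("'")
    return [{"revenue": REVENUE[region]}]

async def main() -> dict[str, float]:
    regions = list(REVENUE)
    rows_per_region = await asyncio.gather(*[
        query_database(f"SELECT SUM(revenue) FROM sales WHERE region='{r}'")
        for r in regions
    ])  # all five queries in flight at once
    return {r: rows[0]["revenue"] for r, rows in zip(regions, rows_per_region)}

results = asyncio.run(main())
top = max(results.items(), key=lambda kv: kv[1])
print(f"Top region: {top[0]} with ${top[1]:,.0f}")
```

Either way it is still one inference pass; the concurrent form just shortens wall-clock time when the tool calls are independent.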

Model Compatibility

Claude Opus 4.6
Claude Sonnet 4.6
Claude Sonnet 4.5
Claude Opus 4.5

Constraints

| Constraint | Detail |
| --- | --- |
| Not on Bedrock/Vertex | API and Foundry only |
| No MCP tools | MCP connector tools cannot be called programmatically |
| No web search/fetch | Web tools are not supported in PTC |
| No structured outputs | strict: true tools are incompatible |
| No forced tool choice | tool_choice cannot force PTC |
| Container lifetime | ~4.5 minutes before expiry |
| ZDR | Not covered by Zero Data Retention |
| Tool results as strings | Validate external results for code-injection risks |
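The last row is worth acting on: tool results arrive as strings and flow into running code, so parse and shape-check them before use. A minimal defensive sketch (the expected row shape follows the query_database description above; the specific checks are illustrative):

```python
import json

def parse_rows(raw: str) -> list[dict]:
    """Parse a tool-result string and reject anything with the wrong shape."""
    rows = json.loads(raw)  # raises ValueError on malformed input
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of row objects")
    for row in rows:
        if not isinstance(row, dict):
            raise ValueError("each row must be an object")
        # Shape from the tool description: id (str), name (str), revenue (float)
        if not isinstance(row.get("id"), str) or not isinstance(row.get("name"), str):
            raise ValueError("id and name must be strings")
        if not isinstance(row.get("revenue"), (int, float)):
            raise ValueError("revenue must be numeric")
    return rows

rows = parse_rows('[{"id": "a1", "name": "Acme", "revenue": 1250.5}]')
print(rows[0]["revenue"])  # 1250.5
```

Failing fast on a malformed string is cheaper than letting untrusted content shape the behavior of generated code downstream.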

When to Use PTC

Good Use Cases

  • Processing large datasets needing aggregates
  • 3+ dependent tool calls in sequence
  • Filtering/transforming results before Claude sees them
  • Parallel operations across many items
  • Conditional logic based on intermediate results

Less Ideal

  • Single tool calls with simple responses
  • Tools needing immediate user feedback
  • Very fast operations (overhead > benefit)

Token Efficiency

  • Tool results from programmatic calls are not added to Claude's context; only the final stdout is
  • Intermediate processing happens in code, not in model tokens
  • Calling 10 tools programmatically costs roughly 1/10th the tokens of 10 direct calls

Dynamic Filtering for Web Search/Fetch

The Problem

Web search and fetch tools dump full HTML pages into Claude’s context window. Most of that content is irrelevant — navigation, ads, boilerplate. Claude then reasons over all of it, wasting tokens and reducing accuracy.

The Solution

Claude now writes and executes Python code to filter web results before they enter the context window. Instead of reasoning over raw HTML, Claude filters, parses, and extracts only relevant content in a sandbox.
Before: Query → Search results → Fetch full HTML × N pages → All content enters context → Claude reasons over everything
After: Query → Search results → Fetch pages → Sandbox code filters and extracts → Only relevant excerpts enter context
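The filtering code Claude writes in the sandbox can be as simple as stripping markup and keeping only passages that match the query. A stdlib-only sketch of the idea (the real code is model-generated and varies per query; this page and keyword are invented):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style/nav/footer boilerplate."""
    SKIP = {"script", "style", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def relevant_text(html: str, keyword: str) -> list[str]:
    """Return only the text chunks that mention the keyword."""
    parser = TextExtractor()
    parser.feed(html)
    return [c for c in parser.chunks if keyword.lower() in c.lower()]

page = """<html><nav>Home | About</nav>
<p>Rate limits: 50 requests per minute.</p>
<p>Follow us on social media!</p>
<script>track()</script></html>"""
print(relevant_text(page, "rate limits"))
```

Only the one matching sentence would enter the context window; the navigation, promo copy, and script never cost a token.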

API Configuration

Uses updated tool type versions with a beta header:
{
  "model": "claude-opus-4-6",
  "max_tokens": 4096,
  "tools": [
    {
      "type": "web_search_20260209",
      "name": "web_search"
    },
    {
      "type": "web_fetch_20260209",
      "name": "web_fetch"
    }
  ]
}
Header required: anthropic-beta: code-execution-web-tools-2026-02-09
Enabled by default when using the new tool type versions with Sonnet 4.6 and Opus 4.6.

Benchmark Results

BrowseComp (finding specific information on websites):

| Model | Without Filtering | With Filtering | Improvement |
| --- | --- | --- | --- |
| Sonnet 4.6 | 33.3% | 46.6% | +13.3 pp |
| Opus 4.6 | 45.3% | 61.6% | +16.3 pp |

DeepsearchQA (multi-step research, F1 score):

| Model | Without Filtering | With Filtering | Improvement |
| --- | --- | --- | --- |
| Sonnet 4.6 | 52.6% | 59.4% | +6.8 pp |
| Opus 4.6 | 69.8% | 77.3% | +7.5 pp |
Token efficiency: Average 24% fewer input tokens. Sonnet 4.6 sees cost reduction; Opus 4.6 may increase slightly due to more complex filtering code.

Use Cases

  • Sifting through technical documentation
  • Verifying citations across multiple sources
  • Cross-referencing search results
  • Multi-step research queries
  • Finding specific data points buried in large pages

Tool Search Tool

The Problem

Loading all tool definitions upfront wastes context. If you have 50 MCP tools at ~1.5K tokens each, that’s 75K tokens before the user even asks a question.
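Spelling that arithmetic out, with an assumed hot set of 5 always-loaded tools (both the per-tool figure from above and the hot-set size are illustrative):

```python
TOKENS_PER_TOOL = 1_500   # rough per-definition cost from the estimate above
total_tools = 50
always_loaded = 5         # assumption: keep a small hot set, defer the rest

upfront = total_tools * TOKENS_PER_TOOL    # everything loaded eagerly
deferred = always_loaded * TOKENS_PER_TOOL  # only the hot set in context
print(f"all loaded: {upfront:,} tokens")    # 75,000
print(f"hot set:    {deferred:,} tokens")   # 7,500
print(f"saved:      {1 - deferred / upfront:.0%}")
```

The savings under these assumptions land in the same ballpark as the ~85% figure reported below; deferred tools then cost tokens only when actually discovered and used.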

The Solution

Mark infrequently-used tools with defer_loading: true. They’re excluded from the initial context. Claude discovers them on-demand via a Tool Search Tool.

Configuration

{
  "tools": [
    {
      "type": "mcp_toolset",
      "mcp_server_name": "google-drive",
      "default_config": { "defer_loading": true },
      "configs": {
        "search_files": { "defer_loading": false }
      }
    }
  ]
}

Best Practices

  • Keep essential tools loaded: keep the 3-5 most-used tools always loaded and defer the rest
  • Write clear descriptions: use clear, descriptive tool names and descriptions (search relies on them)
  • Document capabilities: document available capabilities in the system prompt

When to Use

  • Tool definitions consuming > 10K tokens
  • 10+ tools available
  • Multiple MCP servers
  • Tool selection accuracy issues from too many options

Token Savings

~85% reduction in tool definition tokens (77K → 8.7K in Anthropic’s benchmarks)

Claude Code Equivalent

Claude Code has MCP tool search auto mode (enabled by default since v2.1.7). When MCP tool descriptions exceed 10% of context, they’re deferred and discovered via MCPSearch. Configure the threshold with ENABLE_TOOL_SEARCH=auto:N where N is the context percentage (0-100).

Tool Use Examples

The Problem

JSON schemas define structure but can’t express:
  • When to include optional parameters
  • Which parameter combinations make sense
  • Format conventions (date formats, ID patterns)
  • Nested structure usage

The Solution

Add input_examples to tool definitions — concrete usage patterns beyond the schema.

Configuration

{
  "name": "create_ticket",
  "description": "Create a support ticket",
  "input_schema": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "priority": { 
        "type": "string", 
        "enum": ["low", "medium", "high", "critical"] 
      },
      "assignee": { "type": "string" },
      "labels": { 
        "type": "array", 
        "items": { "type": "string" } 
      }
    },
    "required": ["title"]
  },
  "input_examples": [
    {
      "title": "Login page returns 500 error",
      "priority": "critical",
      "assignee": "oncall-team",
      "labels": ["bug", "auth", "production"]
    },
    {
      "title": "Add dark mode support",
      "priority": "low",
      "labels": ["feature-request", "ui"]
    },
    {
      "title": "Update API docs for v2 endpoints"
    }
  ]
}
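Because input_examples sit beside the schema, they can silently drift out of sync with it. A small authoring-time check, hand-rolled with the stdlib (a real project might use a JSON Schema validator instead; the failing second example here is deliberate):

```python
def check_examples(tool: dict) -> list[str]:
    """Return a list of problems found in a tool's input_examples."""
    schema = tool["input_schema"]
    props = schema["properties"]
    required = schema.get("required", [])
    problems = []
    for i, example in enumerate(tool.get("input_examples", [])):
        for field in required:
            if field not in example:
                problems.append(f"example {i}: missing required field {field!r}")
        for field, value in example.items():
            if field not in props:
                problems.append(f"example {i}: unknown field {field!r}")
            elif "enum" in props[field] and value not in props[field]["enum"]:
                problems.append(f"example {i}: {value!r} not in enum for {field!r}")
    return problems

tool = {
    "name": "create_ticket",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "priority": {"type": "string",
                         "enum": ["low", "medium", "high", "critical"]},
        },
        "required": ["title"],
    },
    "input_examples": [
        {"title": "Login page returns 500 error", "priority": "critical"},
        {"priority": "urgent"},  # missing title, invalid enum value
    ],
}
for problem in check_examples(tool):
    print(problem)
```

Running a check like this in CI keeps examples trustworthy as the schema evolves, which matters because the model treats them as ground truth for usage patterns.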

Best Practices

  • Use realistic data, not placeholder strings like "example_value"
  • Show variety: minimal, partial, and full specifications
  • Keep it concise: 1-5 examples per tool
  • Focus on resolving ambiguity: target behavioral clarity over schema completeness
  • Show parameter correlations (e.g., priority: "critical" tends to come with an assignee)

Results

72% → 90% accuracy on complex parameter handling in Anthropic’s benchmarks

Claude Code Relevance

What applies directly to Claude Code users

| Feature | Claude Code Status | Action |
| --- | --- | --- |
| Tool Search | Built-in since v2.1.7 as MCPSearch auto mode | Tune ENABLE_TOOL_SEARCH=auto:N if you have many MCP tools |
| Dynamic Filtering | Not available in the CLI (API-level web tools) | Relevant for Agent SDK users doing web research |
| PTC | Not available in the CLI | Relevant for Agent SDK users building custom agents |
| Tool Use Examples | Not configurable in the CLI | Relevant for custom MCP server authors |

For Agent SDK developers

If you’re building agents with @anthropic-ai/claude-agent-sdk, PTC is immediately actionable:
1. Add code execution: add code_execution_20250825 to your tools array.
2. Set allowed_callers: set allowed_callers on the tools that benefit from batching or filtering.
3. Implement the tool result loop: pause → provide result → resume.
4. Return structured data: return JSON from tools for easier programmatic parsing.

For MCP server authors

If you’re building custom MCP servers, Tool Use Examples can improve how Claude uses your tools:
  • Add input_examples to tool schemas
  • Document return formats clearly in descriptions (PTC needs to parse them)
