
AI Model Configurations

AI coding assistants use various large language models (LLMs) with specific configurations to balance performance, cost, and capabilities. This page documents the models and their configurations across different tools.

Primary Models Used

Claude Sonnet 4

Provider: Anthropic
Used By: Cursor, Augment, Claude Code, Amp
Capabilities:
  • Advanced reasoning
  • Long context (200k+ tokens)
  • Tool use and function calling
  • Multi-modal (text + images)

GPT-4.1

Provider: OpenAI
Used By: Cursor Agent
Capabilities:
  • High-quality code generation
  • Structured output
  • Function calling
  • Vision capabilities

GPT-5

Provider: OpenAI
Used By: Amp (experimental)
Capabilities:
  • Enhanced reasoning
  • Better context utilization
  • Improved code understanding
  • Faster inference

o3

Provider: OpenAI
Used By: Amp Oracle
Capabilities:
  • Deep reasoning model
  • Code reviews
  • Architecture planning
  • Complex debugging

Model Configurations

Amp Configuration (Claude 4 Sonnet)

system:
  - type: text
    text: >
      You are Amp, a powerful AI coding agent built by Sourcegraph.
      You help the user with software engineering tasks.
      
      # Role & Agency
      - Do the task end to end. Don't hand back half-baked work.
      - Balance initiative with restraint
      - Do not add explanations unless asked
  - type: text
    text: >
      # Environment
      Today's date: Mon Sep 15 2025
      Working directory: /c:/Users/user/project
      Operating system: windows
    cache_control:
      type: ephemeral
  - type: text
    text: >
      You MUST answer concisely with fewer than 4 lines of text.
Key Features:
  • Multi-section system prompt
  • Ephemeral caching for environment data
  • Strict conciseness requirements
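The multi-section prompt above can be assembled programmatically. Below is a minimal sketch, assuming an Anthropic-style Messages API where `system` accepts a list of text blocks and `cache_control` marks a block as cacheable; the helper is illustrative, not Amp's actual code:

```python
from datetime import date

def build_system_blocks(working_dir: str, os_name: str) -> list[dict]:
    """Assemble a multi-section system prompt with ephemeral caching
    on the environment block, mirroring the Amp configuration above.
    (Illustrative helper, not Amp's real implementation.)"""
    return [
        {
            "type": "text",
            "text": (
                "You are Amp, a powerful AI coding agent built by Sourcegraph.\n"
                "You help the user with software engineering tasks."
            ),
        },
        {
            "type": "text",
            "text": (
                f"# Environment\n"
                f"Today's date: {date.today():%a %b %d %Y}\n"
                f"Working directory: {working_dir}\n"
                f"Operating system: {os_name}"
            ),
            # Ephemeral cache: this block changes per session, not per turn,
            # so it can be reused across requests at reduced token cost
            "cache_control": {"type": "ephemeral"},
        },
        {
            "type": "text",
            "text": "You MUST answer concisely with fewer than 4 lines of text.",
        },
    ]

blocks = build_system_blocks("/home/user/project", "linux")
```

Splitting the prompt this way lets the stable sections hit the cache while the environment block is refreshed per session.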

Amp Configuration (GPT-5)

model: gpt-5
~debugParamsUsed:
  model: gpt-5
  input:
    - role: system
      content: >
        You are Amp, a powerful AI coding agent.
        
        # Guardrails
        - Simple-first: prefer smallest, local fix
        - Reuse-first: search for existing patterns
        - No surprise edits: show plan if >3 files
        - No new deps without approval
        
        MINIMIZE REASONING: Think efficiently and act quickly.
  store: false
  include:
    - reasoning.encrypted_content
Key Features:
  • Emphasis on minimal reasoning
  • Guardrails for safe operations
  • Encrypted reasoning content
  • Non-persistent storage

Cursor Agent Configuration

You are powered by the model named GPT-4.1.
Knowledge cutoff: 2024-06

Image input capabilities: Enabled

You operate in Cursor.
You are an agent - keep going until resolved.
Features:
  • Explicit knowledge cutoff date
  • Multi-modal input support
  • Autonomous agent mode

Claude Code Configuration

You are powered by the model named Sonnet 4.
The exact model ID is claude-sonnet-4-20250514.
Assistant knowledge cutoff is January 2025.
Features:
  • Specific model version tracking
  • Clear knowledge cutoff
  • Minimal configuration overhead

Augment Code Configuration

# Identity
You are Augment Agent developed by Augment Code.
Base model: Claude Sonnet 4 by Anthropic.
The current date is 1848-15-03.
Features:
  • Explicit base model attribution
  • Dynamic date injection
  • Brand identity emphasis

Token Budget Management

Cursor Approach

<budget:token_budget>1000000</budget:token_budget>
  • Large token budget for complex tasks
  • Allows extensive context gathering
  • Supports parallel tool operations

Claude Code Approach

You should minimize output tokens as much as possible while 
maintaining helpfulness, quality, and accuracy.

Keep responses under 4 lines unless user asks for detail.
  • Aggressive token conservation
  • Minimal explanations
  • Direct, concise responses

Model Selection by Task

Best Models: GPT-4.1, Claude Sonnet 4
Why:
  • Strong code completion
  • Pattern recognition
  • Syntax accuracy
  • Multi-language support
Example Configuration:
When making code changes, NEVER output code to the USER.
Instead use code edit tools to implement the change.

Generated code must be run immediately by the USER.
Add all necessary imports and dependencies.

Temperature and Sampling

Most tools use default or near-default temperature settings:
# Typical configuration
temperature = 0.7  # Balanced creativity and consistency
top_p = 0.95       # Nucleus sampling
max_tokens = 4096  # Response length limit
Variations by Tool:
| Tool        | Temperature | Top P   | Max Tokens | Notes                     |
|-------------|-------------|---------|------------|---------------------------|
| Cursor      | Default     | Default | 4096       | Standard settings         |
| Claude Code | Default     | Default | Variable   | Optimized for conciseness |
| Amp         | Default     | Default | Variable   | Context-dependent         |
| Augment     | Default     | Default | 8192       | Longer responses allowed  |
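In practice these knobs are set per request. A generic sketch of building a request payload with the typical defaults above, where per-tool overrides (such as a larger `max_tokens`) replace the baseline values; the helper name is hypothetical:

```python
def make_request(messages: list[dict], **overrides) -> dict:
    """Build a completion request with typical sampling defaults,
    allowing per-tool overrides (e.g. a larger max_tokens budget)."""
    params = {
        "temperature": 0.7,   # balanced creativity and consistency
        "top_p": 0.95,        # nucleus sampling
        "max_tokens": 4096,   # response length limit
    }
    params.update(overrides)  # tool-specific settings win
    return {"messages": messages, **params}

# A tool allowing longer responses overrides only max_tokens
req = make_request([{"role": "user", "content": "hi"}], max_tokens=8192)
```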

Function Calling & Tool Use

Anthropic Format (Claude)

{
  "name": "multi_tool_use.parallel",
  "description": "Run multiple tools simultaneously",
  "parameters": {
    "tool_uses": [
      {
        "recipient_name": "functions.read_file",
        "parameters": {"target_file": "main.py"}
      },
      {
        "recipient_name": "functions.grep",
        "parameters": {"pattern": "class.*:"}
      }
    ]
  }
}
Features:
  • Parallel tool execution
  • Namespaced functions
  • Structured parameters
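A dispatcher for this format could fan out over `tool_uses` and run each entry concurrently. In the sketch below the handler registry and tool bodies are hypothetical stand-ins; the relevant pattern is keying handlers by `recipient_name` and executing them in parallel:

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tool implementations, keyed by their namespaced names
HANDLERS = {
    "functions.read_file": lambda p: f"<contents of {p['target_file']}>",
    "functions.grep": lambda p: f"<matches for {p['pattern']}>",
}

def run_parallel(tool_use_block: dict) -> list[str]:
    """Execute every entry in a multi_tool_use.parallel block concurrently,
    returning results in the original tool_uses order."""
    uses = tool_use_block["parameters"]["tool_uses"]
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(HANDLERS[u["recipient_name"]], u["parameters"])
            for u in uses
        ]
        return [f.result() for f in futures]

block = json.loads("""
{"name": "multi_tool_use.parallel",
 "parameters": {"tool_uses": [
   {"recipient_name": "functions.read_file",
    "parameters": {"target_file": "main.py"}},
   {"recipient_name": "functions.grep",
    "parameters": {"pattern": "class.*:"}}]}}
""")
results = run_parallel(block)
```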

OpenAI Format (GPT)

{
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "read_file",
        "arguments": "{\"path\": \"main.py\"}"
      }
    }
  ]
}
Features:
  • Unique call IDs
  • JSON string arguments
  • Sequential by default
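One consequence of this format: `arguments` arrives as a JSON *string*, not an object, so it must be decoded before dispatch. A minimal parsing sketch:

```python
import json

response = {
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "read_file",
                "arguments": "{\"path\": \"main.py\"}",  # JSON string, not dict
            },
        }
    ]
}

def parse_tool_calls(resp: dict) -> list[tuple[str, str, dict]]:
    """Return (call_id, name, decoded_arguments) for each tool call.
    The decoded dict is passed to the tool; the call_id must be echoed
    back in the corresponding tool-result message."""
    return [
        (c["id"], c["function"]["name"], json.loads(c["function"]["arguments"]))
        for c in resp["tool_calls"]
    ]

calls = parse_tool_calls(response)
```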

Context Window Optimization

Caching Strategies

cache_control:
  type: ephemeral
Used for:
  • Environment information
  • File directory listings
  • Repository context
  • Static documentation
Benefits:
  • Reduced token costs
  • Faster response times
  • Consistent context

Context Pruning

As conversations grow, older messages may be pruned to:
- Stay within context limits
- Reduce latency
- Lower costs

Critical information is retained:
- User goals
- Recent file changes
- Active task state
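One simple way to implement this is to always retain the system prompt and the most recent turns, dropping the oldest messages in between until a rough token budget is met. A simplified sketch; the 4-characters-per-token estimate, budget value, and `keep_recent` cutoff are illustrative choices, not any tool's documented policy:

```python
def estimate_tokens(msg: dict) -> int:
    # Rough heuristic: ~4 characters per token
    return len(msg["content"]) // 4 + 1

def prune_history(messages: list[dict], budget: int,
                  keep_recent: int = 4) -> list[dict]:
    """Drop the oldest non-system messages until the estimated token
    count fits the budget. The system prompt (user goals, task framing)
    and the most recent turns are always retained."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while (
        sum(estimate_tokens(m) for m in system + rest) > budget
        and len(rest) > keep_recent
    ):
        rest.pop(0)  # discard oldest first
    return system + rest
```

Real implementations are more selective, e.g. summarizing dropped turns or pinning messages tied to active task state, but the keep-ends/drop-middle shape is the common baseline.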

Streaming and Real-Time Updates

Most tools support streaming responses:
// Typical streaming implementation
const stream = await model.streamCompletion({
  messages: [...],
  tools: [...],
  stream: true
});

for await (const chunk of stream) {
  if (chunk.type === 'content_block_delta') {
    process(chunk.delta);
  } else if (chunk.type === 'tool_use') {
    executeTools(chunk.tools);
  }
}
Advantages:
  • Perceived faster responses
  • Progressive rendering
  • Early tool execution
  • Better user experience

Cost Optimization Patterns

Minimize Reasoning

MINIMIZE REASONING: Avoid verbose
reasoning blocks. Think efficiently
and act quickly.
Saves tokens by reducing explanation overhead.

Parallel Execution

Default to parallel for all independent work:
reads, searches, diagnostics, writes.
Reduces round trips and total conversation length.

Tool Result Filtering

Results capped for responsiveness.
Output limited to 50000 characters.
Use head_limit to control size.
Prevents excessive token usage from large results.

Aggressive Conciseness

Answer concisely with fewer than 4 lines.
One word answers are best.
Dramatically reduces output token costs.

Extended Context Windows

Models are moving toward:
  • 1M+ token context windows
  • Better long-range coherence
  • Reduced need for pruning
  • Full repository context

Specialized Models

Trend toward role-specific models:
  • Fast models: Quick responses, simple tasks
  • Reasoning models: Complex planning, reviews
  • Code models: Optimized for programming
  • Multimodal models: Code + diagrams + UI

Model configurations are frequently updated. Check your tool's documentation for the latest supported models and parameters.
