The agents section controls how Grip’s AI agents behave, which models they use, and how they manage context and memory.

Agent Defaults

Default parameters applied to every agent run unless overridden by profiles or CLI flags.

Model Selection

agents.defaults.model
string
default:"openrouter/anthropic/claude-sonnet-4"
Default LLM model in provider/model format. Examples:
  • openrouter/anthropic/claude-sonnet-4
  • anthropic/claude-sonnet-4-20250514
  • openai/gpt-4o
  • deepseek/deepseek-chat
agents.defaults.provider
string
default:""
Explicit provider name to override prefix-based detection. Useful when model names are ambiguous (e.g., openai/gpt-oss-120b on OpenRouter). Options: openrouter, anthropic, openai, deepseek, groq, gemini, etc.
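For the ambiguous case above, a config fragment might pin the provider explicitly (a sketch using the same nesting as the Example Configuration below):

```json
{
  "agents": {
    "defaults": {
      "model": "openai/gpt-oss-120b",
      "provider": "openrouter"
    }
  }
}
```

Without the provider field, the openai/ prefix would otherwise be resolved as the OpenAI provider.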
agents.defaults.engine
string
default:"claude_sdk"
Agent execution engine.
  • claude_sdk - Primary engine using Claude’s Agent SDK (Claude models only)
  • litellm - Fallback engine supporting any model via LiteLLM
agents.defaults.sdk_model
string
default:"claude-sonnet-4-6"
Claude model to use when engine=claude_sdk. Options:
  • claude-opus-4-6
  • claude-sonnet-4-6
  • claude-haiku-4-5-20251001
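To run a non-Claude model, the engine and model fields would be paired roughly like this (a hypothetical fragment following this page's config format):

```json
{
  "agents": {
    "defaults": {
      "engine": "litellm",
      "model": "deepseek/deepseek-chat"
    }
  }
}
```

With engine=claude_sdk (the default), sdk_model takes effect instead and the model field is ignored for Claude runs.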

Generation Parameters

agents.defaults.max_tokens
integer
default:"8192"
Maximum tokens the LLM can generate per response.
agents.defaults.temperature
float
default:"0.7"
Sampling temperature for LLM responses.
  • Lower (0.0-0.5): More deterministic, focused
  • Medium (0.5-1.0): Balanced creativity
  • Higher (1.0-2.0): More creative, varied
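A more deterministic setup, e.g. for repeatable code edits, might look like this (a sketch; the values are illustrative, not recommendations from this page):

```json
{
  "agents": {
    "defaults": {
      "temperature": 0.2,
      "max_tokens": 4096
    }
  }
}
```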

Execution Control

agents.defaults.max_tool_iterations
integer
default:"0"
Maximum LLM-tool round-trips before the agent stops.
  • 0 = unlimited (default)
  • Set to a positive number to limit agent autonomy
agents.defaults.dry_run
boolean
default:"false"
When true, tools simulate execution without writing files or running commands. Useful for testing agent behavior safely.
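A conservative setup for an untrusted environment could combine both controls (an illustrative fragment in the same format as the Example Configuration below):

```json
{
  "agents": {
    "defaults": {
      "max_tool_iterations": 10,
      "dry_run": true
    }
  }
}
```

Here the agent stops after 10 tool round-trips and never actually writes files or runs commands.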

Memory and Context

agents.defaults.memory_window
integer
default:"50"
Number of recent messages to include in LLM context. Larger values provide more context but consume more tokens.
agents.defaults.auto_consolidate
boolean
default:"true"
Automatically consolidate old messages when the session exceeds 2x the memory window. Summarizes older messages to reduce token usage while preserving key information.
agents.defaults.consolidation_model
string
default:""
LLM model for summarization/consolidation.
  • Empty string = use main model
  • Set to a cheaper model (e.g., openrouter/google/gemini-flash-2.0) to save tokens
agents.defaults.enable_self_correction
boolean
default:"true"
When true, the agent reflects on failed tool calls before proceeding. Improves reliability but adds extra LLM calls for error recovery.
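For a long-running task, the memory settings above might be combined like this (a sketch; with memory_window=100, consolidation would kick in past 200 messages per the 2x rule above):

```json
{
  "agents": {
    "defaults": {
      "memory_window": 100,
      "auto_consolidate": true,
      "consolidation_model": "openrouter/google/gemini-flash-2.0"
    }
  }
}
```

Pointing consolidation_model at a cheaper model keeps summarization costs low while the main model handles the actual task.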

Caching and Rate Limiting

agents.defaults.semantic_cache_enabled
boolean
default:"true"
Cache LLM responses for identical queries to save tokens and latency.
agents.defaults.semantic_cache_ttl
integer
default:"3600"
Time-to-live for cached responses in seconds (default: 1 hour).
agents.defaults.max_daily_tokens
integer
default:"0"
Maximum total tokens (prompt + completion) per day.
  • 0 = unlimited
  • Set a limit to control costs
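A cost-conscious fragment might raise the cache TTL and cap daily usage (illustrative values, not recommendations from this page):

```json
{
  "agents": {
    "defaults": {
      "semantic_cache_enabled": true,
      "semantic_cache_ttl": 7200,
      "max_daily_tokens": 2000000
    }
  }
}
```

Once 2,000,000 prompt + completion tokens are consumed in a day, further runs are blocked until the limit resets.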

Workspace

agents.defaults.workspace
string
default:"~/.grip/workspace"
Root workspace directory for agent files, sessions, and memory.
agents.defaults.sdk_permission_mode
string
default:"acceptEdits"
SDK permission mode for file operations.
  • acceptEdits - Auto-accept file edits (default)
  • bypassPermissions - Skip all permission checks
  • default - Prompt for each operation
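To keep agent state in a project-local directory and require confirmation for every file operation, the fragment might look like (hypothetical path, same structure as the Example Configuration below):

```json
{
  "agents": {
    "defaults": {
      "workspace": "~/projects/agent-work",
      "sdk_permission_mode": "default"
    }
  }
}
```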

Model Tiers (Cost-Aware Routing)

Automatic model routing based on prompt complexity: simple tasks go to cheaper models, complex tasks to more powerful ones.
agents.model_tiers.enabled
boolean
default:"false"
Enable automatic model routing based on prompt complexity.
agents.model_tiers.low
string
default:""
Model for simple queries (greetings, lookups, regex). Example: openrouter/google/gemini-flash-2.0
agents.model_tiers.medium
string
default:""
Model for moderate tasks (code changes, explanations). Leave empty to use agents.defaults.model.
agents.model_tiers.high
string
default:""
Model for complex tasks (architecture, refactors, debugging). Example: anthropic/claude-opus-4-6

Example Configuration

{
  "agents": {
    "defaults": {
      "model": "openrouter/anthropic/claude-sonnet-4",
      "temperature": 0.7,
      "max_tokens": 8192,
      "memory_window": 50
    },
    "model_tiers": {
      "enabled": true,
      "low": "openrouter/google/gemini-flash-2.0",
      "medium": "",
      "high": "anthropic/claude-opus-4-6"
    }
  }
}

Agent Profiles

Named profiles with custom models, tool subsets, and system prompts. Profiles let you create specialized agents for specific tasks.
agents.profiles.{name}.model
string
default:""
Model override for this profile. Empty = inherit from defaults.
agents.profiles.{name}.max_tokens
integer
default:"0"
Max tokens override. 0 = inherit from defaults.
agents.profiles.{name}.temperature
float
default:"-1.0"
Temperature override. -1.0 = inherit from defaults.
agents.profiles.{name}.max_tool_iterations
integer
default:"0"
Max iterations override. 0 = inherit from defaults.
agents.profiles.{name}.tools_allowed
array
default:"[]"
Tool names this profile can use. Empty = all tools. Supports wildcards: ["read", "write", "mcp__*"]
agents.profiles.{name}.tools_denied
array
default:"[]"
Tool names explicitly blocked for this profile.
agents.profiles.{name}.system_prompt_file
string
default:""
Workspace-relative path to a custom identity file. Example: agents/researcher.md

Example Profiles

{
  "agents": {
    "profiles": {
      "researcher": {
        "model": "openrouter/google/gemini-flash-2.0",
        "tools_allowed": ["web_search", "read", "write"],
        "system_prompt_file": "agents/researcher.md"
      },
      "coder": {
        "model": "anthropic/claude-sonnet-4",
        "tools_denied": ["web_search"],
        "max_tool_iterations": 20
      },
      "safe_agent": {
        "tools_denied": ["bash", "shell"]
      }
    }
  }
}

Using Profiles

# Use a specific profile
grip chat --profile researcher

# Environment variable
export GRIP_PROFILE=coder
grip chat

CLI Overrides

All agent settings can be overridden via CLI flags:
# Override model
grip chat --model "anthropic/claude-opus-4-6"

# Override temperature
grip chat --temperature 0.9

# Set max iterations
grip chat --max-iterations 10

# Dry run mode
grip chat --dry-run

Best Practices

  • Development/Testing: Use faster, cheaper models like gemini-flash or gpt-4o-mini
  • Production: Use claude-sonnet-4 for balanced performance
  • Complex Tasks: Use claude-opus-4 for architecture and refactoring
  • Cost Optimization: Enable model tiers to route automatically
  • Default memory_window=50 works for most conversations
  • Increase to 100-200 for complex, long-running tasks
  • Enable auto_consolidate to prevent context overflow
  • Use a cheap consolidation_model to reduce costs
  • Set max_tool_iterations for untrusted environments
  • Use dry_run=true to test agent behavior
  • Create profiles with tools_denied for restricted agents
  • Monitor max_daily_tokens to control costs
