The `agents` section controls how Grip's AI agents behave, which models they use, and how they manage context and memory.
## Agent Defaults
Default parameters applied to every agent run unless overridden by profiles or CLI flags.

### Model Selection

Default LLM model, in `provider/model` format. Examples:

- `openrouter/anthropic/claude-sonnet-4`
- `anthropic/claude-sonnet-4-20250514`
- `openai/gpt-4o`
- `deepseek/deepseek-chat`
Explicit provider name to override prefix-based detection. Useful when model names are ambiguous (e.g., `openai/gpt-oss-120b` on OpenRouter). Options: `openrouter`, `anthropic`, `openai`, `deepseek`, `groq`, `gemini`, etc.

Agent execution engine:

- `claude_sdk` - Primary engine using Claude's Agent SDK (Claude models only)
- `litellm` - Fallback engine supporting any model via LiteLLM
Claude model to use when `engine=claude_sdk`. Options:

- `claude-opus-4-6`
- `claude-sonnet-4-6`
- `claude-haiku-4-5-20251001`
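As a sketch, assuming a YAML configuration file (the file format and the `provider` key name are assumptions; `agents.defaults.model` and the `engine` values do appear on this page), model selection might look like:

```yaml
agents:
  defaults:
    # provider/model format; the prefix normally selects the provider
    model: openrouter/anthropic/claude-sonnet-4
    # assumed key: force a provider when the model prefix is ambiguous
    provider: openrouter
    # claude_sdk (Claude models only) or litellm (any model)
    engine: litellm
```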
### Generation Parameters
Maximum tokens the LLM can generate per response.
Sampling temperature for LLM responses.
- Lower (0.0-0.5): More deterministic, focused
- Medium (0.5-1.0): Balanced creativity
- Higher (1.0-2.0): More creative, varied
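The two generation parameters above might be set like this (a sketch assuming YAML; the `max_tokens` and `temperature` key names are assumptions, since the page does not show the literal keys):

```yaml
agents:
  defaults:
    max_tokens: 4096   # assumed key: cap on tokens generated per response
    temperature: 0.3   # assumed key: low value for deterministic, focused output
```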
### Execution Control
Maximum LLM-tool round-trips before the agent stops.
- `0` = unlimited (default)
- Set to a positive number to limit agent autonomy
When `true`, tools simulate execution without writing files or running commands. Useful for testing agent behavior safely.
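For example (assuming YAML; the `max_tool_iterations` and `dry_run` names are taken from the Best Practices section of this page):

```yaml
agents:
  defaults:
    max_tool_iterations: 25  # stop after 25 LLM-tool round-trips; 0 = unlimited
    dry_run: true            # simulate tool execution without side effects
```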
### Memory and Context
Number of recent messages to include in LLM context. Larger values provide more context but consume more tokens.
Automatically consolidate old messages when the session exceeds 2x the memory window. Summarizes older messages to reduce token usage while preserving key information.
LLM model for summarization/consolidation.

- Empty string = use the main model
- Set to a cheaper model (e.g., `openrouter/google/gemini-flash-2.0`) to save tokens
When `true`, the agent reflects on failed tool calls before proceeding. Improves reliability but adds extra LLM calls for error recovery.
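Putting the memory settings together (a YAML sketch; `memory_window`, `auto_consolidate`, and `consolidation_model` are named elsewhere on this page, while the reflection key name is not shown and is omitted here):

```yaml
agents:
  defaults:
    memory_window: 100        # recent messages kept in LLM context
    auto_consolidate: true    # summarize once the session exceeds 2x the window
    consolidation_model: openrouter/google/gemini-flash-2.0  # cheap summarizer
```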
### Caching and Rate Limiting
Cache LLM responses for identical queries to save tokens and latency.
Time-to-live for cached responses in seconds (default: 1 hour).
Maximum total tokens (prompt + completion) per day.

- `0` = unlimited
- Set a limit to control costs
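A possible YAML sketch of these settings (`max_daily_tokens` appears in the Best Practices section; the `cache_responses` and `cache_ttl` key names are assumptions):

```yaml
agents:
  defaults:
    cache_responses: true    # assumed key: cache LLM responses for identical queries
    cache_ttl: 3600          # assumed key: cache time-to-live in seconds (1 hour)
    max_daily_tokens: 500000 # daily prompt+completion budget; 0 = unlimited
```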
### Workspace
Root workspace directory for agent files, sessions, and memory.
SDK permission mode for file operations:

- `acceptEdits` - Auto-accept file edits (default)
- `bypassPermissions` - Skip all permission checks
- `default` - Prompt for each operation
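A YAML sketch of the workspace settings (the `workspace` and `permission_mode` key names are assumptions; the permission values are listed above):

```yaml
agents:
  defaults:
    workspace: ~/grip-workspace   # assumed key: root for agent files, sessions, memory
    permission_mode: acceptEdits  # assumed key; acceptEdits | bypassPermissions | default
```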
## Model Tiers (Cost-Aware Routing)
Automatic model routing based on prompt complexity: cheaper models handle simple tasks, and powerful models handle complex ones. A boolean option enables this routing.
Model for simple queries (greetings, lookups, regex). Example: `openrouter/google/gemini-flash-2.0`

Model for moderate tasks (code changes, explanations). Leave empty to use `agents.defaults.model`.

Model for complex tasks (architecture, refactors, debugging). Example: `anthropic/claude-opus-4-6`

### Example Configuration
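A sketch of what a tiered configuration might look like, assuming YAML; the `tiers`, `enabled`, `simple`, `moderate`, and `complex` key names are hypothetical, while the model strings come from the examples above:

```yaml
agents:
  defaults:
    model: openrouter/anthropic/claude-sonnet-4
  tiers:
    enabled: true
    simple: openrouter/google/gemini-flash-2.0  # greetings, lookups, regex
    moderate: ""                                # empty = use agents.defaults.model
    complex: anthropic/claude-opus-4-6          # architecture, refactors, debugging
```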
## Agent Profiles
Named profiles with custom models, tool subsets, and system prompts. Profiles let you create specialized agents for specific tasks.

Model override for this profile. Empty = inherit from defaults.
Max tokens override. 0 = inherit from defaults.
Temperature override. -1.0 = inherit from defaults.
Max iterations override. 0 = inherit from defaults.
Tool names this profile can use. Empty = all tools. Supports wildcards: `["read", "write", "mcp__*"]`

Tool names explicitly blocked for this profile.

Workspace-relative path to a custom identity file. Example: `agents/researcher.md`

### Example Profiles
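A profile sketch, assuming YAML; `tools_denied` appears in the Best Practices section, while the `profiles`, `tools`, and `identity` key names and the `researcher` profile name are hypothetical. The sentinel values (`0`, `-1.0`, empty) follow the inheritance rules described above:

```yaml
agents:
  profiles:
    researcher:                       # hypothetical profile name
      model: openai/gpt-4o            # override; empty = inherit from defaults
      max_tokens: 0                   # 0 = inherit from defaults
      temperature: -1.0               # -1.0 = inherit from defaults
      tools: ["read", "mcp__*"]       # allowed tools; wildcards supported
      tools_denied: ["write"]         # explicitly blocked tools
      identity: agents/researcher.md  # workspace-relative identity file
```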
### Using Profiles
## CLI Overrides

All agent settings can be overridden via CLI flags.

## Best Practices
### Choosing Models

- Development/Testing: Use faster, cheaper models like `gemini-flash` or `gpt-4o-mini`
- Production: Use `claude-sonnet-4` for balanced performance
- Complex Tasks: Use `claude-opus-4` for architecture and refactoring
- Cost Optimization: Enable model tiers to route automatically
### Memory Management

- Default `memory_window=50` works for most conversations
- Increase to 100-200 for complex, long-running tasks
- Enable `auto_consolidate` to prevent context overflow
- Use a cheap `consolidation_model` to reduce costs
### Safety Controls

- Set `max_tool_iterations` for untrusted environments
- Use `dry_run=true` to test agent behavior
- Create profiles with `tools_denied` for restricted agents
- Monitor `max_daily_tokens` to control costs