
Debug mode

Enable debug logging as a first step for any issue. It writes a detailed trace of all API calls, tool executions, and internal events:
```shell
# Logs to ~/.cagent/cagent.debug.log by default
docker agent run config.yaml --debug

# Write to a custom location
docker agent run config.yaml --debug --log-file ./debug.log

# Enable OpenTelemetry tracing for deeper analysis
docker agent run config.yaml --otel
```
Always enable --debug when reporting issues. Attach the log file to your GitHub issue.

Common errors

Error message: context_length_exceeded or similar. The conversation no longer fits in the model's context window. To fix:
  • Use /compact in the TUI to summarize and reduce conversation history
  • Set num_history_items in your agent config to limit messages sent per turn
  • Switch to a model with a larger context window (for example, Claude 200K or Gemini 2M)
  • Break large tasks into smaller conversations
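The num_history_items fix above can be sketched as follows; the field's placement under the agent entry is an assumption based on "agent config", and the model name is just an example:

```yaml
agents:
  root:
    model: anthropic/claude-sonnet-4-0
    num_history_items: 20   # only the most recent 20 messages are sent each turn
```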
The agent hit its max_iterations limit without completing the task.
  • Increase max_iterations in the agent config (default is unlimited, but many agents set 20–50)
  • Enable --debug to check whether the agent is looping on the same tool call
  • Break complex tasks into smaller steps
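A minimal sketch of raising the iteration cap, assuming max_iterations sits at the agent level as the bullet above implies:

```yaml
agents:
  root:
    model: anthropic/claude-sonnet-4-0
    max_iterations: 50   # stop after 50 agent loop iterations instead of 20
```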
When the primary model fails, docker-agent automatically switches to configured fallback models. Look for log messages like "Switching to fallback model".
| Error code | Behavior |
| --- | --- |
| HTTP 429 | Rate limited; stays on the fallback for the cooldown period |
| HTTP 5xx | Retries with exponential backoff, then falls back |
| HTTP 4xx | Client error; skips directly to the next fallback model |
Configure fallback behavior in your agent config:
```yaml
agents:
  root:
    model: anthropic/claude-sonnet-4-0
    fallback:
      models: [openai/gpt-4o, openai/gpt-4o-mini]
      retries: 2
      cooldown: 1m
```

Agent not responding

Each model provider requires its own API key as an environment variable:
| Provider | Environment variable |
| --- | --- |
| OpenAI | OPENAI_API_KEY |
| Anthropic | ANTHROPIC_API_KEY |
| Google Gemini | GOOGLE_API_KEY |
| Mistral | MISTRAL_API_KEY |
| xAI | XAI_API_KEY |
| AWS Bedrock | AWS_BEARER_TOKEN_BEDROCK or AWS credentials chain |
```shell
# Check that your keys are set
env | grep API_KEY
```
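Beyond grepping, a small helper can warn about every missing key before you launch. The helper name and the key list below are illustrative, not part of cagent:

```shell
# check_keys warns about any unset environment variables passed to it
check_keys() {
  for key in "$@"; do
    [ -n "$(printenv "$key")" ] || echo "warning: $key is not set"
  done
}

# List the keys for the providers your config actually uses
check_keys OPENAI_API_KEY ANTHROPIC_API_KEY
```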
Model names must match the provider’s naming exactly. Common mistakes:
  • Using gpt-4 instead of gpt-4o
  • Using a deprecated model name
  • Model references are case-sensitive: openai/gpt-4o is not the same as openai/GPT-4o
If the agent hangs or times out without an error message, check that your machine can reach the provider’s API endpoint. Firewalls, VPNs, and proxy settings can silently block outbound requests.

Tool execution failures

  • Ensure the MCP tool command is installed and on your PATH
  • Check file permissions — tools must be executable
  • Test MCP tools independently before integrating with docker-agent
  • For Docker-based MCP tools (ref: docker:*), ensure Docker Desktop is running
  • Verify the agent has the correct toolset configured (type: filesystem, type: shell)
  • Check that the working directory exists and is accessible
  • On macOS, ensure the terminal app has the necessary permissions (for example, Full Disk Access)
MCP tools using stdio transport must complete the initialization handshake before becoming available. If tools fail silently:
  1. Enable --debug and look for MCP protocol messages in the log
  2. Check that the MCP server process starts and responds to initialize
  3. Verify that environment variables required by the tool are set (check env and env_file in the toolset config)
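To exercise step 2 by hand, you can send the initialize request yourself. The JSON below follows the MCP specification (2024-11-05 revision); pipe it into the server command from your toolset config, which is shown here only as a commented placeholder:

```shell
# Minimal JSON-RPC initialize request per the MCP spec
INIT='{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"probe","version":"0.1"}}}'

# Pipe it to the server command from your toolset config, e.g.:
#   printf '%s\n' "$INIT" | your-mcp-server
# A healthy server replies with a JSON-RPC result on stdout.
printf '%s\n' "$INIT"
```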

Configuration errors

docker-agent validates config at startup and reports errors with line numbers. Common problems:
  • Incorrect indentation (YAML is whitespace-sensitive)
  • Missing quotes around values containing special characters (:, #, {, })
  • Tabs instead of spaces
Use the JSON schema in your editor for real-time validation and autocompletion.
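As an example of the quoting pitfall (the field name here is illustrative), a scalar containing a colon must be quoted or YAML misparses it as a nested mapping:

```yaml
agents:
  root:
    instruction: "Review pull requests: focus on security"   # the ':' requires quotes
```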
  • All agents listed in sub_agents must be defined in the agents section
  • Named model references must exist in the models section (or use inline format like openai/gpt-4o)
  • RAG source names referenced by agents must be defined in the rag section
  • The path field is only valid for memory toolsets
  • MCP toolsets need either command (stdio), remote (SSE/HTTP), or ref (Docker)
  • Provider names must be one of the supported values: openai, anthropic, google, amazon-bedrock, dmr, etc.
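A sketch of a named model reference resolving correctly; the key names are illustrative:

```yaml
models:
  fast:
    provider: openai
    model: gpt-4o-mini

agents:
  root:
    model: fast   # must match a key under models, or use an inline form like openai/gpt-4o-mini
```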

Session and connectivity issues

When running docker-agent as an API server or MCP server, ensure the port is not already in use:
```shell
# Check if port 8080 is in use
lsof -i :8080

# Use a different port
docker agent serve api config.yaml --listen :9090
```
For remote MCP servers, verify the endpoint is accessible:
```shell
# Test an SSE endpoint
curl -v https://mcp-server.example.com/sse
```
In API server mode, each client receives isolated sessions. If sessions appear to bleed into each other:
  • Verify that client IDs are unique per connection
  • Check session timeouts and cleanup events in the debug log

Performance issues

  • Large context windows (64K+ tokens) consume significant memory — consider reducing max_tokens
  • Set num_history_items in the agent config to cap conversation history
  • For DMR (local models), tune runtime_flags for your hardware (for example, --ngl for GPU layers)
  • Check if MCP tools are adding latency — this is visible in the debug log as time between tool call and result events
  • Use /cost in the TUI to see token usage and identify expensive interactions
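For the DMR tuning point, runtime_flags passes raw flags through to the local runtime. The model name and values below are placeholders; pick flags that match your hardware:

```yaml
models:
  local:
    provider: dmr
    model: ai/example-model
    runtime_flags: ["--ngl", "33"]   # e.g. offload 33 layers to the GPU
```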

Log analysis

When reviewing debug logs, search for these patterns:
| Log pattern | What it indicates |
| --- | --- |
| "Starting runtime stream" | Agent execution beginning |
| "Tool call" | A tool is being executed |
| "Tool call result" | Tool execution completed |
| "Stream stopped" | Agent finished processing |
| HTTP 429 | Rate limiting; consider adding a fallback model |
| context canceled | Operation was interrupted (timeout or user cancel) |
| [RAG Manager] | RAG retrieval operations |
| [Reranker] | Reranking operations |
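These patterns can be pulled out with standard tools. The sketch below runs on a fabricated log excerpt so it is self-contained; point grep at your real log file instead:

```shell
# Create a small sample log (stand-in for ~/.cagent/cagent.debug.log)
cat > /tmp/cagent-sample.log <<'EOF'
Starting runtime stream
Tool call: filesystem
Tool call result: ok
Tool call: filesystem
Tool call result: ok
Stream stopped
EOF

# Count every "Tool call" line, including results
grep -c "Tool call" /tmp/cagent-sample.log

# Repeated identical tool calls can indicate the agent is looping
grep "Tool call:" /tmp/cagent-sample.log | sort | uniq -c | sort -rn
```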

Agent store issues

If pushing or pulling agents to a registry fails, check connectivity and the pulled content first:

```shell
# Test registry connectivity
docker pull docker.io/username/agent:latest

# Verify pulled agent content
docker agent share pull docker.io/username/agent:latest
```
  • Validate the YAML locally with docker agent run before pushing
  • Check that resources referenced in the config (MCP tools, files) are available on the target machine
  • For auto-refresh (--pull-interval), verify that the registry is reachable from the server
If these steps don’t resolve your issue, file a bug on the GitHub issue tracker with your debug log attached, or ask on Slack.
