Each provider implements a common `ApiHandler` interface, allowing seamless switching between different AI services.
## Supported Providers

The code generator supports over 40 LLM providers.

### Major Cloud Providers
- Anthropic - Claude models with extended thinking
- OpenAI - GPT models including o1/o3/o4 reasoning models
- OpenRouter - Access to multiple models through a unified API
- Google Gemini - Gemini models with thinking capabilities
- AWS Bedrock - Claude and other models via AWS
- Azure OpenAI - OpenAI models through Azure
- Google Vertex AI - Gemini models via Vertex
### Specialized Providers
- Reasoning Models: DeepSeek, XAI (Grok)
- Fast Inference: Groq, Cerebras, Fireworks
- Open Source: Ollama, LM Studio, Together AI
- Enterprise: SAP AI Core, Huawei Cloud MaaS
- Development: LiteLLM, Vercel AI Gateway
- Regional: Qwen, Doubao, Moonshot, Minimax, ZAi
### Self-Hosted & Custom
- Ollama - Run models locally
- LM Studio - Local model hosting
- Custom Providers - Add your own provider
## Provider Architecture

### ApiHandler Interface

All providers implement the shared `ApiHandler` interface.

#### Key Components

- Main method that sends messages to the LLM and returns a streaming response
- Returns the current model ID and metadata (context window, pricing, features)
- Retrieves detailed usage statistics after streaming completes
- Cancels an in-progress request
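The components above can be sketched as a TypeScript interface. This is a simplified sketch: the method names (`createMessage`, `getModel`, `getApiStreamUsage`, `abort`) and field shapes are illustrative assumptions, not necessarily the project's actual identifiers.

```typescript
// Hypothetical sketch of the ApiHandler surface described above.
interface ModelInfo {
  contextWindow: number;
  supportsTools: boolean;
}

interface ApiStreamChunk {
  type: string;
}

interface ApiHandler {
  // Sends messages to the LLM and returns a streaming response.
  createMessage(systemPrompt: string, messages: unknown[]): AsyncGenerator<ApiStreamChunk>;
  // Returns the current model ID and metadata.
  getModel(): { id: string; info: ModelInfo };
  // Retrieves detailed usage statistics after streaming completes.
  getApiStreamUsage?(): Promise<{ inputTokens: number; outputTokens: number }>;
  // Cancels an in-progress request.
  abort?(): void;
}

// Minimal stub implementation to show the shape in use.
class ExampleHandler implements ApiHandler {
  getModel() {
    return { id: "example-model", info: { contextWindow: 200_000, supportsTools: true } };
  }
  async *createMessage(_systemPrompt: string, _messages: unknown[]): AsyncGenerator<ApiStreamChunk> {
    yield { type: "text" };
  }
}
```

Because every provider exposes this same surface, the rest of the system can hold an `ApiHandler` reference without knowing which backend is behind it.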
## How Provider Selection Works

The system builds handlers dynamically based on configuration.

### Dual Mode Support

The code generator supports two operational modes:

- Plan Mode: Strategic planning and decision-making
- Act Mode: Code execution and implementation
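Per-mode selection can be sketched as a small lookup over the settings object; the field names here (`planMode`, `actMode`, `modelId`) are assumptions for illustration.

```typescript
// Hypothetical sketch of per-mode provider selection.
type Mode = "plan" | "act";

interface ProviderConfig {
  provider: string;
  modelId: string;
}

interface ApiSettings {
  planMode: ProviderConfig;
  actMode: ProviderConfig;
}

// Pick the provider configuration for the current operating mode,
// so a handler can be constructed from the matching config.
function selectConfig(settings: ApiSettings, mode: Mode): ProviderConfig {
  return mode === "plan" ? settings.planMode : settings.actMode;
}
```

Keeping a separate config per mode lets Plan Mode use, say, a slower reasoning model while Act Mode uses a faster model for edits.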
## Common Handler Options

All handlers accept a shared set of base options, including a callback invoked when retrying failed requests.
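A rough sketch of that shared options shape follows. Only the retry callback is described in the text above; the other field and all names here are illustrative assumptions.

```typescript
// Hypothetical shape of the shared base options.
interface CommonHandlerOptions {
  // Assumption: credential for the chosen provider.
  apiKey?: string;
  // Callback invoked when retrying a failed request.
  onRetryAttempt?: (attempt: number, error: unknown) => void;
}

// Example: log each retry attempt.
const options: CommonHandlerOptions = {
  onRetryAttempt: (attempt, error) => {
    console.log(`retry #${attempt}:`, error);
  },
};
```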
## Stream Format

Providers return an `ApiStream` that yields several chunk types:
- Text chunks
- Tool call chunks
- Reasoning chunks
- Usage chunks
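The four chunk types above can be modeled as a discriminated union; the field names in this sketch are assumptions based on the chunk categories, not the codebase's exact shapes.

```typescript
// Hypothetical chunk shapes for the ApiStream.
type ApiStreamChunk =
  | { type: "text"; text: string }
  | { type: "tool_call"; name: string; arguments: string }
  | { type: "reasoning"; reasoning: string }
  | { type: "usage"; inputTokens: number; outputTokens: number };

// A fake stream standing in for a real provider response.
async function* exampleStream(): AsyncGenerator<ApiStreamChunk> {
  yield { type: "reasoning", reasoning: "Deciding on an approach..." };
  yield { type: "text", text: "Hello, " };
  yield { type: "text", text: "world" };
  yield { type: "usage", inputTokens: 12, outputTokens: 4 };
}

// Consumers switch on chunk.type; the union narrows each branch.
async function collectText(stream: AsyncIterable<ApiStreamChunk>): Promise<string> {
  let out = "";
  for await (const chunk of stream) {
    if (chunk.type === "text") {
      out += chunk.text;
    }
  }
  return out;
}
```

A discriminated union keeps consumers exhaustive: adding a new chunk type forces every `switch` over `chunk.type` to handle it.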
## Message Transformation

Each provider transforms messages to its native format:

- Anthropic: Uses `sanitizeAnthropicMessages()` with cache control
- OpenAI: Uses `convertToOpenAiMessages()`
- Gemini: Uses `convertAnthropicMessageToGemini()`
- Ollama: Uses `convertToOllamaMessages()`
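A minimal converter in the spirit of `convertToOpenAiMessages()` might look like the following; both message shapes here are simplified assumptions, not the real function's signature.

```typescript
// Hypothetical provider-neutral message shape.
interface NeutralMessage {
  role: "user" | "assistant";
  text: string;
}

// Simplified OpenAI-style chat message.
interface OpenAiStyleMessage {
  role: "user" | "assistant";
  content: string;
}

// Map the neutral shape into the provider's native field names.
function toOpenAiStyleMessages(messages: NeutralMessage[]): OpenAiStyleMessage[] {
  return messages.map((m) => ({ role: m.role, content: m.text }));
}
```

Real converters also handle tool calls, images, and provider-specific features such as cache-control markers, but the core idea is the same field-by-field mapping.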
## Features by Provider
| Feature | Anthropic | OpenAI | OpenRouter | Gemini |
|---|---|---|---|---|
| Tool Calling | ✓ | ✓ | ✓ | ✓ |
| Streaming | ✓ | ✓ | ✓ | ✓ |
| Prompt Caching | ✓ | ✓ | ✓ | ✓ |
| Extended Thinking | ✓ | o1/o3/o4 | Model-dependent | ✓ |
| Reasoning Effort | Budget tokens | low/medium/high | Model-dependent | low/high |
## Error Handling

All handlers use the `@withRetry()` decorator for automatic retry logic:
- Max retries: 3
- Base delay: 1000ms
- Max delay: 10000ms
- Exponential backoff with jitter
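The retry policy above can be sketched as follows. For brevity this is written as a plain higher-order function rather than the decorator the codebase uses, with the documented defaults (3 retries, 1000 ms base delay, 10000 ms cap, jitter).

```typescript
interface RetryOptions {
  maxRetries?: number;
  baseDelay?: number; // milliseconds
  maxDelay?: number;  // milliseconds
}

// Hypothetical sketch of retry with exponential backoff and jitter.
async function withRetry<T>(
  fn: () => Promise<T>,
  { maxRetries = 3, baseDelay = 1000, maxDelay = 10000 }: RetryOptions = {},
): Promise<T> {
  let lastError: unknown;
  // One initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break;
      // Exponential backoff capped at maxDelay, then randomized
      // (50-100% of the computed delay) so clients don't retry in lockstep.
      const delay = Math.min(baseDelay * 2 ** attempt, maxDelay);
      const jittered = delay * (0.5 + Math.random() * 0.5);
      await new Promise((resolve) => setTimeout(resolve, jittered));
    }
  }
  throw lastError;
}
```

The jitter matters in practice: without it, many clients that failed at the same moment would all retry at the same moment and collide again.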
## Next Steps

- **Anthropic Setup**: Configure Claude models with extended thinking
- **OpenAI Setup**: Set up GPT models and reasoning models
- **OpenRouter Setup**: Access multiple providers through one API
- **Custom Provider**: Add your own LLM provider