Overview
HAI Build Code Generator supports multiple LLM providers, giving you the flexibility to use the best models for your workflow. Configure API keys, base URLs, and model-specific settings for each provider.

Supported Providers
HAI Build integrates with the following LLM providers:

- Anthropic: Claude models with prompt caching
- OpenAI: GPT models including GPT-4 and reasoning models
- OpenRouter: access to multiple providers through one API
- Google Gemini: Gemini models with native API
- AWS Bedrock: Claude and other models via AWS
- Vertex AI: Google Cloud AI models
- Azure OpenAI: OpenAI models via Azure
- Ollama: local model execution
- Groq: ultra-fast inference
- Mistral: Mistral AI models
- DeepSeek: DeepSeek reasoning models
- Cerebras: high-performance inference
Additional providers: Together AI, Fireworks, Hugging Face, LiteLLM, LM Studio, SambaNova, xAI, Qwen, Moonshot, Nebius, SAP AI Core, and more.
Provider Configuration
- Anthropic
- OpenAI
- Azure OpenAI
- OpenRouter
- Google Gemini
- Ollama
Anthropic (Claude)
Configure Claude models with support for prompt caching and extended thinking.

- API key: your Anthropic API key from console.anthropic.com
- Base URL: custom base URL for the Anthropic API (optional)
- Thinking budget: token budget for extended thinking mode (reasoning models)
Supported Models
- `claude-3-5-sonnet-20241022`: latest Claude 3.5 Sonnet
- `claude-3-5-haiku-20241022`: fast and efficient
- `claude-3-opus-20240229`: most capable Claude 3 model
- `claude-sonnet-4-20250514`: Claude 4 Sonnet
Features
Prompt Caching
Anthropic models support prompt caching to reduce costs on repeated context:
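At the API level, caching is requested with a `cache_control` marker on a system block. A minimal sketch of the request shape, following Anthropic's public Messages API (how HAI Build wires this up internally is an assumption):

```typescript
// Sketch of an Anthropic Messages API request body with prompt caching.
// Field shapes follow Anthropic's public API; the prompt text is illustrative.
const cachedRequest = {
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are a code generator. <large, stable project context>",
      // Marks this block as cacheable: later requests that repeat the same
      // prefix are billed at the cheaper cache-read rate.
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Generate a REST handler." }],
};
```

Only the stable prefix (system prompt, project context) should carry the cache marker; the per-turn user message changes on every request and gains nothing from caching.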
Extended Thinking
Enable reasoning capabilities with a thinking budget:
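A minimal sketch of what an extended-thinking request looks like at the API level, using Anthropic's public `thinking` field; the budget value is illustrative, not a HAI Build default:

```typescript
// Sketch of an extended-thinking request (Anthropic public API shape).
// Note: temperature, top_p, and top_k must be left unset when thinking
// is enabled, and max_tokens must exceed the thinking budget.
const thinkingRequest = {
  model: "claude-sonnet-4-20250514",
  max_tokens: 16000, // must be larger than budget_tokens
  thinking: {
    type: "enabled",
    budget_tokens: 8192, // tokens reserved for internal reasoning
  },
  messages: [{ role: "user", content: "Plan a refactor of this module." }],
};
```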
Thinking is not compatible with temperature, top_p, or top_k modifications.
1M Context Window
Some models support extended context windows:
- Add the `-1m` suffix to the model ID
- Requires the beta header: `anthropic-beta: context-1m-2025-08-07`
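Putting the two steps together, a sketch (the exact model ID is illustrative; the suffix convention and header value come from the docs above):

```typescript
// Sketch: opting in to the 1M-token context window.
const modelId = "claude-sonnet-4-20250514-1m"; // base model ID plus "-1m" suffix
const extraHeaders = {
  "anthropic-beta": "context-1m-2025-08-07", // required beta header
};
```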
Example Configuration
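A hedged sketch of an Anthropic provider setup; the key names mirror the settings described above but are hypothetical, not necessarily HAI Build's exact identifiers:

```typescript
// Hypothetical Anthropic provider configuration; key names are illustrative.
const anthropicConfig = {
  provider: "anthropic",
  apiKey: "<your-anthropic-api-key>", // from console.anthropic.com
  baseUrl: "https://api.anthropic.com", // optional; omit to use the default
  modelId: "claude-3-5-sonnet-20241022",
  thinkingBudgetTokens: 0, // 0 disables extended thinking
};
```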
Advanced Provider Settings
Custom Headers
Add custom HTTP headers for API requests.

Proxy Support

HAI Build respects system proxy settings. All providers use a configured fetch implementation with proxy support.

Retry Logic

All providers implement automatic retry with exponential backoff.

Model Information

Each provider exposes model metadata.

Cost Optimization
Prompt Caching
Providers that support prompt caching (Anthropic, Gemini) can significantly reduce costs:

- System prompts are automatically cached
- Cache breakpoints minimize redundant processing
- Costs are split between immediate and ongoing storage
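As a rough illustration of that cost split, using Anthropic's published multipliers at the time of writing (cache writes bill at roughly 1.25x the base input rate, cache reads at roughly 0.1x; the $3 per million input tokens base rate is Claude 3.5 Sonnet's list price and may change):

```typescript
// Back-of-the-envelope caching economics for a large, stable system prompt.
// Rates are assumptions drawn from Anthropic's published pricing.
const baseInputPerMTok = 3.0; // USD per million input tokens
const promptTokens = 100_000; // a large, stable system prompt

const uncachedPerRequest = (promptTokens / 1e6) * baseInputPerMTok; // ~$0.30 every request
const firstCacheWrite = uncachedPerRequest * 1.25; // ~$0.375, paid once per cache window
const cachedReadPerRequest = uncachedPerRequest * 0.1; // ~$0.03 on each cache hit
```

After a handful of requests against the same context, cache reads dominate and the one-time write premium is recovered quickly.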
Reasoning Budgets
Control reasoning token usage by setting a thinking budget (see Extended Thinking above).

Troubleshooting
API Key Errors
Error: `API key is required`

Solution: Ensure your API key is correctly set:

- Check for typos
- Verify the key is active
- Confirm it has the necessary permissions
Rate Limiting
Error: `429 Too Many Requests`

Solution: Automatic retry handles most rate limits. For persistent issues:

- Upgrade your provider tier
- Implement request throttling
- Consider using OpenRouter for automatic fallback
Azure Connection Issues
Error: Connection fails to the Azure endpoint

Solution: Verify your configuration:
- Ensure `baseUrl` includes the full Azure domain
- Check that `azureApiVersion` is current
- For Azure Identity, verify permissions in Azure Portal
Model Not Found
Error: Model ID not recognized

Solution:
- Check model ID spelling and format
- Verify model is available in your region
- Ensure you have access to the model tier
Next Steps
- Settings: configure extension settings
- Telemetry: set up monitoring and analytics