Overview
HAI Build Code Generator supports multiple LLM providers, giving you the flexibility to use the best models for your workflow. Configure API keys, base URLs, and model-specific settings for each provider.

Supported Providers
HAI Build integrates with the following LLM providers:

- Anthropic: Claude models with prompt caching
- OpenAI: GPT models including GPT-4 and reasoning models
- OpenRouter: access to multiple providers through one API
- Google Gemini: Gemini models with native API
- AWS Bedrock: Claude and other models via AWS
- Vertex AI: Google Cloud AI models
- Azure OpenAI: OpenAI models via Azure
- Ollama: local model execution
- Groq: ultra-fast inference
- Mistral: Mistral AI models
- DeepSeek: DeepSeek reasoning models
- Cerebras: high-performance inference
Additional providers: Together AI, Fireworks, Hugging Face, LiteLLM, LM Studio, SambaNova, xAI, Qwen, Moonshot, Nebius, SAP AI Core, and more.
Provider Configuration
- Anthropic
- OpenAI
- Azure OpenAI
- OpenRouter
- Google Gemini
- Ollama
Anthropic (Claude)
Configure Claude models with support for prompt caching and extended thinking.

- API key: your Anthropic API key from console.anthropic.com
- Base URL: custom base URL for the Anthropic API (optional)
- Thinking budget: token budget for extended thinking mode (reasoning models)
Supported Models
- `claude-3-5-sonnet-20241022`: latest Claude 3.5 Sonnet
- `claude-3-5-haiku-20241022`: fast and efficient
- `claude-3-opus-20240229`: most capable Claude 3 model
- `claude-sonnet-4-20250514`: Claude 4 Sonnet
Features
Prompt Caching
Anthropic models support prompt caching to reduce costs on repeated context:
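At the API level, caching is requested with a `cache_control` marker on a system block. A minimal sketch of the request shape, following Anthropic's public Messages API (how HAI Build wires this up internally is an assumption):

```typescript
// Sketch of an Anthropic Messages API request body with prompt caching.
// Field shapes follow Anthropic's public API; the prompt text is illustrative.
const cachedRequest = {
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are a code generator. <large, stable project context>",
      // Marks this block as cacheable: later requests that repeat the same
      // prefix are billed at the cheaper cache-read rate.
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Generate a REST handler." }],
};
```

Only the stable prefix (system prompt, project context) should carry the cache marker; the per-turn user message changes on every request and gains nothing from caching.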
Extended Thinking
Enable reasoning capabilities with a thinking budget:
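A minimal sketch of what an extended-thinking request looks like at the API level, using Anthropic's public `thinking` field; the budget value is illustrative, not a HAI Build default:

```typescript
// Sketch of an extended-thinking request (Anthropic public API shape).
// Note: temperature, top_p, and top_k must be left unset when thinking
// is enabled, and max_tokens must exceed the thinking budget.
const thinkingRequest = {
  model: "claude-sonnet-4-20250514",
  max_tokens: 16000, // must be larger than budget_tokens
  thinking: {
    type: "enabled",
    budget_tokens: 8192, // tokens reserved for internal reasoning
  },
  messages: [{ role: "user", content: "Plan a refactor of this module." }],
};
```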
Thinking is not compatible with temperature, top_p, or top_k modifications.
1M Context Window
Some models support extended context windows:
- Add the `-1m` suffix to the model ID
- Requires the beta header: `anthropic-beta: context-1m-2025-08-07`
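Putting the two steps together, a sketch (the exact model ID is illustrative; the suffix convention and header value come from the docs above):

```typescript
// Sketch: opting in to the 1M-token context window.
const modelId = "claude-sonnet-4-20250514-1m"; // base model ID plus "-1m" suffix
const extraHeaders = {
  "anthropic-beta": "context-1m-2025-08-07", // required beta header
};
```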
Example Configuration
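A hedged sketch of an Anthropic provider setup; the key names mirror the settings described above but are hypothetical, not necessarily HAI Build's exact identifiers:

```typescript
// Hypothetical Anthropic provider configuration; key names are illustrative.
const anthropicConfig = {
  provider: "anthropic",
  apiKey: "<your-anthropic-api-key>", // from console.anthropic.com
  baseUrl: "https://api.anthropic.com", // optional; omit to use the default
  modelId: "claude-3-5-sonnet-20241022",
  thinkingBudgetTokens: 0, // 0 disables extended thinking
};
```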
Advanced Provider Settings
Custom Headers
Add custom HTTP headers for API requests.

Proxy Support

HAI Build respects system proxy settings. All providers use a configured fetch implementation with proxy support.

Retry Logic

All providers implement automatic retry with exponential backoff.

Model Information

Each provider exposes model metadata.

Cost Optimization
Prompt Caching
Providers that support prompt caching (Anthropic, Gemini) can significantly reduce costs:

- System prompts are automatically cached
- Cache breakpoints minimize redundant processing
- Costs are split between immediate and ongoing storage
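As a rough illustration of that cost split, using Anthropic's published multipliers at the time of writing (cache writes bill at roughly 1.25x the base input rate, cache reads at roughly 0.1x; the $3 per million input tokens base rate is Claude 3.5 Sonnet's list price and may change):

```typescript
// Back-of-the-envelope caching economics for a large, stable system prompt.
// Rates are assumptions drawn from Anthropic's published pricing.
const baseInputPerMTok = 3.0; // USD per million input tokens
const promptTokens = 100_000; // a large, stable system prompt

const uncachedPerRequest = (promptTokens / 1e6) * baseInputPerMTok; // ~$0.30 every request
const firstCacheWrite = uncachedPerRequest * 1.25; // ~$0.375, paid once per cache window
const cachedReadPerRequest = uncachedPerRequest * 0.1; // ~$0.03 on each cache hit
```

After a handful of requests against the same context, cache reads dominate and the one-time write premium is recovered quickly.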
Reasoning Budgets
Control reasoning token usage by setting a thinking budget (see Extended Thinking above).

Troubleshooting
API Key Errors
Error: `API key is required`

Solution: Ensure your API key is correctly set:

- Check for typos
- Verify the key is active
- Confirm it has the necessary permissions
Rate Limiting
Error: `429 Too Many Requests`

Solution: Automatic retry handles most rate limits. For persistent issues:

- Upgrade your provider tier
- Implement request throttling
- Consider using OpenRouter for automatic fallback
Azure Connection Issues
Error: Connection fails to the Azure endpoint

Solution: Verify your configuration:
- Ensure `baseUrl` includes the full Azure domain
- Check that `azureApiVersion` is current
- For Azure Identity, verify permissions in Azure Portal
Model Not Found
Error: Model ID not recognized

Solution:
- Check model ID spelling and format
- Verify model is available in your region
- Ensure you have access to the model tier
Next Steps
- Settings: configure extension settings
- Telemetry: set up monitoring and analytics