Asta supports multiple AI providers with automatic fallback. This page explains how to configure each provider, their unique features, and how the fallback chain works.

Supported Providers

Anthropic Claude
  • Models: Claude 3.5 Sonnet, Claude 4.0
  • Native vision and PDF support
  • Extended thinking mode
  • Tool calling support

Google Gemini
  • Models: Gemini 2.0, Gemini 1.5 Pro/Flash
  • Native vision support
  • Fast response times
  • Free tier available

OpenRouter
  • Access to 100+ models
  • Kimi k2.5 with reasoning
  • Trinity models
  • Pay-per-token pricing

Ollama (Local)
  • Run models locally
  • No API key required
  • Privacy-focused
  • Works offline

OpenAI
  • GPT-4, GPT-4 Turbo
  • Vision and tool support
  • Production-grade reliability

Groq
  • Ultra-fast inference
  • Llama 3, Mixtral models
  • Tool calling support

Provider Configuration

Setting Up API Keys

You can configure provider API keys in two ways:
Desktop App:
  1. Open Settings (gear icon or Cmd/Ctrl+,)
  2. Go to Keys tab
  3. Enter your API key for each provider
  4. Click Save
Supported Keys:
  • Anthropic API Key (Claude)
  • Google AI API Key (Gemini)
  • OpenRouter API Key
  • OpenAI API Key
  • Groq API Key
Ollama doesn't require an API key; it connects to localhost:11434 by default.

Choosing Your Default Provider

1. Open Settings

Navigate to the Settings → General tab.

2. Select Default Provider

Choose from the dropdown:
  • Claude (recommended for best results)
  • Google (fast and free)
  • OpenRouter (most model choices)
  • Ollama (local and private)

3. Select Model

Choose the specific model for that provider in the Settings → Models tab.
Asta will automatically fall back to other providers if your default fails, so you don’t need to worry about API limits or outages.

Provider Fallback Chain

Asta uses a fixed fallback order inspired by OpenClaw.

How Fallback Works

Provider Runtime State (backend/app/provider_flow.py):
  1. Manual Disable: You can disable providers in Settings
  2. Auto-Disable: Asta automatically disables providers on:
    • Authentication failures (invalid API key)
    • Billing failures (quota exceeded, payment required)
    • Rate limit errors (temporarily)
  3. Re-enable: Auto-disabled providers can be re-enabled in Settings once the issue is resolved
Fallback Criteria (backend/app/providers/fallback.py):
For a provider to be used:
  • Must have valid API key (except Ollama)
  • Must be enabled (not manually disabled)
  • Must not be auto-disabled
  • Must not be excluded from current request
# From backend/app/providers/fallback.py
async def get_available_fallback_providers(
    db,
    user_id: str,
    exclude_provider: str,
) -> list[str]:
    """Return fixed-order fallback providers that are configured and active."""
    ordered = resolve_main_provider_order(exclude_provider)
    states = await db.get_provider_runtime_states(user_id, ordered)
    available: list[str] = []
    for provider_name in ordered:
        if provider_name == exclude_provider:
            continue
        state = states.get(provider_name) or {}
        if not bool(state.get("enabled", True)):
            continue
        if bool(state.get("auto_disabled", False)):
            continue
        if await _provider_has_key(db, provider_name):
            available.append(provider_name)
    return available
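The same criteria can be restated as a synchronous helper, which makes the filtering easy to reason about without a database. This is an illustrative mirror of the function above, not part of Asta:

```python
# Illustrative, database-free restatement of the fallback criteria.
# `states` mirrors get_provider_runtime_states; `has_key` mirrors
# _provider_has_key from the snippet above.
def filter_available(
    ordered: list[str],
    states: dict[str, dict],
    has_key: dict[str, bool],
    exclude: str,
) -> list[str]:
    available: list[str] = []
    for name in ordered:
        if name == exclude:
            continue
        state = states.get(name) or {}
        if not state.get("enabled", True):      # manually disabled
            continue
        if state.get("auto_disabled", False):   # auto-disabled on error
            continue
        if has_key.get(name, False):            # must have a valid key
            available.append(name)
    return available
```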

Provider-Specific Features

Claude (Anthropic)


File: backend/app/providers/claude.py
Unique Features:
  • Extended thinking mode with <thinking> blocks
  • Native PDF document reading (full-fidelity)
  • Supports reasoning effort levels
  • Best for complex reasoning tasks
Configuration:
# Environment variable
ANTHROPIC_API_KEY=sk-ant-...

# Default model
claude-3-5-sonnet-20241022

# Thinking levels supported
thinking_level: off | minimal | low | medium | high | xhigh
When to use:
  • Complex multi-step reasoning
  • Document analysis (PDFs)
  • Creative writing
  • Code generation with detailed explanations

Google Gemini


File: backend/app/providers/google.py
Unique Features:
  • Free tier with generous limits
  • Fast response times
  • Native vision support
  • Tool calling support
Configuration:
# Environment variable
GEMINI_API_KEY=AIza...

# Default models
gemini-2.0-flash-exp
gemini-1.5-pro-latest

# Free tier limits
- 15 requests per minute
- 1 million tokens per minute
When to use:
  • Everyday chat interactions
  • Quick questions and answers
  • When you need fast responses
  • Image analysis

OpenRouter


File: backend/app/providers/openrouter.py
Unique Features:
  • Access to 100+ models from different providers
  • Kimi k2.5 with native reasoning (moonshotai/kimi-k2.5)
  • Trinity models with reasoning support
  • Vision preprocessor fallback for non-vision models
  • Reasoning effort and streaming
Configuration:
# Environment variable
OPENROUTER_API_KEY=sk-or-v1-...

# Recommended models
moonshotai/kimi-k2.5              # Reasoning model
anthropic/claude-3.5-sonnet       # Claude via OpenRouter
google/gemini-2.0-flash-exp:free  # Free Gemini

# Reasoning configuration
reasoning_effort: low | medium | high
include_reasoning: true
Reasoning Support: OpenRouter injects <think>...</think> tags around reasoning content, which Asta's stream state machine parses for UI display.
When to use:
  • Comparing multiple models
  • Accessing models not available elsewhere
  • Using reasoning models like Kimi k2.5
  • Vision preprocessing for non-vision providers
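A simplified, non-streaming version of the <think>-tag parsing mentioned above looks like this. Asta's real implementation is an incremental state machine over stream chunks, not a regex over the full response:

```python
import re

# Matches OpenRouter's injected reasoning tags (non-greedy, multiline).
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate reasoning content from the final answer text."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer
```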

Ollama (Local)


File: backend/app/providers/ollama.py
Unique Features:
  • Runs models locally on your machine
  • No API key required
  • Complete privacy (no data sent to cloud)
  • Works offline
  • Free to use
Configuration:
# Install Ollama first
# macOS: brew install ollama
# Linux: curl -fsSL https://ollama.ai/install.sh | sh

# Pull models
ollama pull llama3.2
ollama pull qwen2.5-coder
ollama pull mistral

# Optional: custom base URL
OLLAMA_BASE_URL=http://localhost:11434
Model Capabilities: Asta detects tool-capable Ollama models automatically. Recommended tool-capable models:
  • llama3.2 (3B, 1B)
  • mistral (7B)
  • qwen2.5-coder (7B)
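Tool-capability detection can be approximated with a name-based check like the sketch below; Asta's actual detection may inspect model metadata instead, and the family list here is illustrative:

```python
# Model families assumed tool-capable for this sketch (illustrative list).
TOOL_CAPABLE_FAMILIES = ("llama3.1", "llama3.2", "mistral", "qwen2.5")

def supports_tools(model_name: str) -> bool:
    """Guess tool support from a model name like 'qwen2.5-coder:7b'."""
    base = model_name.split(":")[0].lower()  # strip the ':tag' suffix
    return any(base.startswith(f) for f in TOOL_CAPABLE_FAMILIES)
```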
When to use:
  • Privacy-sensitive tasks
  • Offline work
  • Learning and experimentation
  • Avoiding API costs

OpenAI


File: backend/app/providers/openai.py
Unique Features:
  • Production-grade reliability
  • Vision support
  • Tool calling
  • Streaming responses
Configuration:
# Environment variable
OPENAI_API_KEY=sk-proj-...

# Recommended models
gpt-4-turbo
gpt-4
gpt-3.5-turbo
When to use:
  • Production applications requiring stability
  • When you have OpenAI credits
  • Tool-heavy workflows

Groq


File: backend/app/providers/groq.py
Unique Features:
  • Ultra-fast inference (500+ tokens/sec)
  • Open-source models (Llama, Mixtral)
  • Tool calling support
  • Generous free tier
Configuration:
# Environment variable
GROQ_API_KEY=gsk_...

# Available models
llama-3.3-70b-versatile
llama-3.1-70b-versatile
mixtral-8x7b-32768
When to use:
  • When speed is critical
  • Rapid prototyping and testing
  • Batch processing of many requests

Vision Pipeline

Asta has a hybrid vision pipeline (backend/app/handler.py:_run_vision_preprocessor):
1. Native Vision Providers

Claude, Google, and OpenAI receive images and PDFs directly:
  • Images as base64-encoded content
  • PDFs as native document blocks (Claude) or extracted text
  • No preprocessing required
2. Vision Preprocessor Fallback

Non-vision providers (Ollama, Groq) use OpenRouter vision models. Fallback chain:
  1. google/gemma-3-27b-it:free
  2. nvidia/nemotron-nano-12b-v2-vl:free
  3. google/gemma-3-12b-it:free
  4. openrouter/auto
The vision analysis is injected as [VISION_ANALYSIS ...] into the user message.
3. Final Reasoning

The main provider (even if non-vision) receives the preprocessed vision analysis and performs reasoning/tool execution.
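The preprocessor's fallback chain can be sketched as a loop over the model list; `call_model` below is a hypothetical stand-in for the real OpenRouter request, and the injected tag format is illustrative:

```python
# Vision-model fallback chain, in the order documented above.
VISION_FALLBACK_CHAIN = [
    "google/gemma-3-27b-it:free",
    "nvidia/nemotron-nano-12b-v2-vl:free",
    "google/gemma-3-12b-it:free",
    "openrouter/auto",
]

def preprocess_vision(image_b64: str, call_model) -> str:
    """Try each vision model in order; return the first successful analysis."""
    last_error: Exception | None = None
    for model in VISION_FALLBACK_CHAIN:
        try:
            analysis = call_model(model, image_b64)
            # Tag format is illustrative; the docs elide the exact contents.
            return f"[VISION_ANALYSIS model={model}]\n{analysis}"
        except Exception as exc:  # model unavailable or request failed
            last_error = exc
    raise RuntimeError("all vision fallback models failed") from last_error
```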

Reasoning and Thinking Modes

Asta supports advanced reasoning modes with provider-specific implementations:
Configure in Settings → General → Thinking Level:
  • off - No thinking blocks
  • minimal - Brief internal thoughts
  • low - Short reasoning
  • medium - Moderate reasoning
  • high - Detailed reasoning
  • xhigh - Extended thinking (Claude only)
Supported by: Claude, OpenRouter (Kimi/Trinity)

Troubleshooting

Provider Auto-Disabled

Causes:
  • Invalid API key → Check key in Settings
  • Quota exceeded → Add credits or wait for reset
  • Billing issue → Verify payment method
Solution:
  1. Fix the underlying issue (add credits, update key)
  2. Go to Settings → Models
  3. Find the provider in runtime state
  4. Click Re-enable
Ollama Not Connecting

Checklist:
  • Ollama is installed: ollama --version
  • Ollama service is running: ollama list
  • Models are pulled: ollama pull llama3.2
  • Correct base URL: Check OLLAMA_BASE_URL in Settings
Common fix:
# Start Ollama service
ollama serve

# In another terminal, pull a model
ollama pull llama3.2
Images or PDFs Not Working

For native vision providers (Claude, Google, OpenAI):
  • Ensure the model supports vision (e.g., claude-3-5-sonnet, gemini-2.0-flash)
  • Check image format is supported (JPEG, PNG, WebP)
  • Verify image size is within limits (< 5MB)
For non-vision providers (Ollama, Groq):
  • Ensure OpenRouter API key is configured (used for vision preprocessing)
  • Check OpenRouter has credits
All Providers Failing

Check:
  1. Internet connectivity
  2. API keys are valid (test in provider’s dashboard)
  3. No firewall blocking requests
  4. Check backend logs: tail -f backend/backend.log
Emergency fallback: Set up Ollama for offline access:
ollama pull llama3.2
# Asta will automatically use Ollama as last resort

Best Practices

Cost Optimization

  • Use Google Gemini for everyday tasks (free tier)
  • Reserve Claude for complex reasoning
  • Use Ollama for privacy and cost-free experimentation
  • Monitor usage in provider dashboards

Performance

  • Use Groq for speed-critical tasks
  • Enable streaming for better UX
  • Choose smaller models for simple tasks
  • Use Ollama for low-latency local inference

Reliability

  • Configure at least 2 providers with valid keys
  • Keep fallback chain enabled
  • Monitor runtime state in Settings
  • Set up Ollama as ultimate fallback

Privacy

  • Use Ollama for sensitive data
  • Be aware of provider data policies
  • Check provider terms for commercial use
  • Consider on-premise deployments for compliance

Next Steps

Architecture

Understand how providers fit into Asta’s architecture

Skills System

Learn how skills leverage different provider capabilities

Quickstart

Get started with configuring your first provider

API Reference

See how to use providers via the API
