BioAgents supports multiple LLM providers with automatic fallback and retry logic. Each agent type can be configured with a specific provider and model.

Supported Providers

  • OpenAI - GPT-5, GPT-4, and GPT-3.5 models
  • Anthropic - Claude 3 Opus, Sonnet, and Haiku
  • Google - Gemini Pro and Gemini Ultra
  • OpenRouter - Access to 100+ models from multiple providers

Provider Configuration

OpenAI

1. Get API Key

Sign up at OpenAI Platform and create an API key.

2. Set Environment Variable

OPENAI_API_KEY=sk-...

3. Configure Models

REPLY_LLM_PROVIDER=openai
REPLY_LLM_MODEL=gpt-4

Available models:
  • gpt-5 - Latest GPT-5 model
  • gpt-4-turbo - GPT-4 Turbo with 128k context
  • gpt-4 - GPT-4 with 8k context
  • gpt-3.5-turbo - Fast and cost-effective

Anthropic

1. Get API Key

Sign up at Anthropic Console and create an API key.

2. Set Environment Variable

ANTHROPIC_API_KEY=sk-ant-...

3. Configure Models

REPLY_LLM_PROVIDER=anthropic
REPLY_LLM_MODEL=claude-3-opus-20240229

Available models:
  • claude-sonnet-4-5-20250929 - Latest Claude Sonnet 4.5
  • claude-3-opus-20240229 - Most capable model
  • claude-3-sonnet-20240229 - Balanced performance
  • claude-3-haiku-20240307 - Fast and cost-effective

Google

1. Get API Key

Enable the Generative Language API in Google AI Studio.

2. Set Environment Variable

GOOGLE_API_KEY=AIza...

3. Configure Models

REPLY_LLM_PROVIDER=google
REPLY_LLM_MODEL=gemini-pro

Available models:
  • gemini-pro - Balanced performance
  • gemini-ultra - Most capable model

OpenRouter

1. Get API Key

Sign up at OpenRouter and create an API key.

2. Set Environment Variable

OPENROUTER_API_KEY=sk-or-...

3. Configure Models

REPLY_LLM_PROVIDER=openrouter
REPLY_LLM_MODEL=anthropic/claude-3-opus

OpenRouter uses the format provider/model-name:
  • anthropic/claude-3-opus
  • openai/gpt-4
  • google/gemini-pro
  • meta-llama/llama-3-70b

See OpenRouter Models for the full list.

Agent-Specific Configuration

BioAgents uses different LLMs for different agent types to optimize for cost and performance.

Reply Agent (Chat)

Generates conversational responses to user messages.
REPLY_LLM_PROVIDER=openai
REPLY_LLM_MODEL=gpt-4
Use a fast model like gpt-3.5-turbo or claude-3-haiku for cost-effective chat.

Hypothesis Agent

Synthesizes research outputs into scientific claims.
HYP_LLM_PROVIDER=anthropic
HYP_LLM_MODEL=claude-3-opus-20240229
Use a capable reasoning model like claude-3-opus or gpt-4 for hypothesis generation.

Planning Agent

Decides what research tasks to execute.
PLANNING_LLM_PROVIDER=openai
PLANNING_LLM_MODEL=gpt-4
Use a strong reasoning model for planning. GPT-4 or Claude Opus work well.

Structured Output Agent

Generates structured data (JSON, etc.).
STRUCTURED_LLM_PROVIDER=openai
STRUCTURED_LLM_MODEL=gpt-4
OpenAI models have native JSON mode support, making them ideal for structured output.
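As an illustration, JSON mode is enabled by setting `response_format` on the request. The `response_format: { type: "json_object" }` parameter is the documented OpenAI Chat Completions option; the `buildStructuredRequest` helper below is a hypothetical sketch, not BioAgents code:

```typescript
// Sketch: building a Chat Completions payload with OpenAI's JSON mode.
type ChatMessage = { role: "system" | "user"; content: string };

function buildStructuredRequest(model: string, messages: ChatMessage[]) {
  return {
    model,
    messages,
    // Forces the model to emit syntactically valid JSON. Note: per the
    // OpenAI docs, the prompt must still mention "JSON" or the API
    // rejects the request.
    response_format: { type: "json_object" as const },
  };
}

const req = buildStructuredRequest("gpt-4", [
  { role: "system", content: "Reply with a JSON object." },
  { role: "user", content: "List two model providers." },
]);
```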

Continue Research Agent

Decides whether to continue autonomous research.
CONTINUE_RESEARCH_LLM_PROVIDER=anthropic
CONTINUE_RESEARCH_LLM_MODEL=claude-sonnet-4-5-20250929

Implementation Details

BioAgents implements a robust LLM system with adapters for each provider.

Provider Adapter Pattern

Each provider has an adapter that implements the LLMAdapter interface:
src/llm/provider.ts
private createAdapter(provider: LLMProvider): LLMAdapter {
  switch (provider.name) {
    case "openai":
      return new OpenAIAdapter(provider);
    case "google":
      return new GoogleAdapter(provider);
    case "anthropic":
      return new AnthropicAdapter(provider);
    case "openrouter":
      return new OpenRouterAdapter(provider);
    default:
      throw new Error(`Unsupported provider: ${provider.name}`);
  }
}
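The `LLMAdapter` interface itself is not reproduced in this document. A minimal sketch of what that contract might look like, with request/response shapes inferred from the surrounding examples (the field names and the `EchoAdapter` stub are assumptions for illustration, not the actual source):

```typescript
// Hypothetical shapes inferred from the examples in this page.
interface LLMRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  maxTokens?: number;
}

interface LLMResponse {
  content: string;
  finishReason?: string;
  usage?: { promptTokens: number; completionTokens: number; totalTokens: number };
}

interface LLMAdapter {
  // Every provider adapter normalizes its native API into this one method,
  // so the rest of the system never touches provider-specific SDKs.
  createChatCompletion(request: LLMRequest): Promise<LLMResponse>;
}

// A trivial stub showing how the contract is satisfied (illustrative only).
class EchoAdapter implements LLMAdapter {
  async createChatCompletion(request: LLMRequest): Promise<LLMResponse> {
    const last = request.messages[request.messages.length - 1];
    return { content: last.content, finishReason: "stop" };
  }
}
```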

Automatic Retry & Fallback

BioAgents automatically retries failed requests and falls back to alternative providers:
src/llm/provider.ts
async createChatCompletion(request: LLMRequest): Promise<LLMResponse> {
  const startTime = Date.now();

  try {
    // Try with retries on primary provider
    const result = await withRetry(
      () => this.adapter.createChatCompletion(request),
      this.providerName
    );

    const duration = Date.now() - startTime;
    this.trackTokenUsage(result, request, duration);
    return result;
  } catch (error: any) {
    // Check if we need to try fallback provider
    if (error.requiresFallback) {
      return this.attemptFallbackCompletion(request, startTime, error);
    }
    throw error;
  }
}
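The `withRetry` helper lives in `src/llm/retry.ts` and is not shown here. A minimal exponential-backoff sketch of the idea follows; the attempt count, delays, and error handling are illustrative assumptions, and the real implementation may classify which errors are retryable:

```typescript
// Illustrative retry helper with exponential backoff; not the actual
// src/llm/retry.ts implementation.
async function withRetry<T>(
  fn: () => Promise<T>,
  label: string,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Exponential backoff: 500ms, 1000ms, 2000ms, ...
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // All attempts exhausted; surface the last error to the caller,
  // which may then trigger the fallback-provider path shown above.
  throw lastError;
}
```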

Finish Reason Monitoring

BioAgents monitors response finish reasons to detect truncation:
src/llm/provider.ts
private static readonly NORMAL_FINISH_REASONS = new Set([
  "stop",        // OpenAI, OpenRouter
  "end_turn",    // Anthropic
  "STOP",        // Google
]);

private checkFinishReason(result: LLMResponse, request: LLMRequest): void {
  if (!result.finishReason) return;

  const isNormal = LLM.NORMAL_FINISH_REASONS.has(result.finishReason);
  if (!isNormal) {
    logger.warn(
      {
        finishReason: result.finishReason,
        provider: this.providerName,
        model: request.model,
        maxTokens: request.maxTokens,
      },
      "llm_response_truncated_or_abnormal_finish"
    );
  }
}

Token Usage Tracking

All LLM calls track token usage in the database:
src/llm/provider.ts
private trackTokenUsage(
  result: LLMResponse,
  request: LLMRequest,
  duration: number,
  providerOverride?: string
): void {
  if (
    result.usage &&
    request.usageType &&
    (request.messageId || request.paperId)
  ) {
    createTokenUsage({
      message_id: request.messageId,
      paper_id: request.paperId,
      type: request.usageType,
      provider: providerOverride || this.providerName,
      model: request.model,
      prompt_tokens: result.usage.promptTokens,
      completion_tokens: result.usage.completionTokens,
      total_tokens: result.usage.totalTokens,
      duration_ms: duration,
    });
  }
}

Cost Optimization Strategies

Configure cost-effective models for high-volume tasks:
# Cheap for chat
REPLY_LLM_PROVIDER=openai
REPLY_LLM_MODEL=gpt-3.5-turbo

# Expensive for critical reasoning
HYP_LLM_PROVIDER=anthropic
HYP_LLM_MODEL=claude-3-opus-20240229
OpenRouter provides access to the same models with higher rate limits:
REPLY_LLM_PROVIDER=openrouter
REPLY_LLM_MODEL=anthropic/claude-3-opus
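Because OpenRouter exposes an OpenAI-compatible API, switching to it is largely a matter of pointing an OpenAI-style client at OpenRouter's base URL. A sketch under that assumption (the `openRouterClientConfig` helper is hypothetical; the base URL is OpenRouter's documented endpoint):

```typescript
// Illustrative: OpenRouter speaks the OpenAI wire protocol, so an
// OpenAI-style client config only needs a different base URL and key.
function openRouterClientConfig(apiKey: string) {
  return {
    baseURL: "https://openrouter.ai/api/v1", // OpenRouter's OpenAI-compatible endpoint
    apiKey,
  };
}

const cfg = openRouterClientConfig("sk-or-...");
```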
Track token usage in the database to identify expensive operations:
SELECT 
  type,
  provider,
  model,
  AVG(total_tokens) as avg_tokens,
  SUM(total_tokens) as total_tokens
FROM token_usage
GROUP BY type, provider, model
ORDER BY total_tokens DESC;

Troubleshooting

Symptom: 429 Too Many Requests errors

Solutions:
  1. Increase retry delay in src/llm/retry.ts
  2. Use OpenRouter as an alternative
  3. Reduce concurrent requests with CHAT_QUEUE_CONCURRENCY

Symptom: 404 Model not found errors

Solutions:
  1. Verify model name matches provider documentation
  2. Check API key has access to the model
  3. For OpenRouter, use provider/model-name format

Symptom: Incomplete responses with length finish reason

Solutions:
  1. Increase maxTokens in the request
  2. Use a model with a larger context window
  3. Simplify the prompt to reduce input tokens

Symptom: 401 Unauthorized errors

Solutions:
  1. Verify the API key is correct and not expired
  2. Check the environment variable is loaded: echo $OPENAI_API_KEY
  3. Restart the server after changing .env

Example Configurations

# Use cheap models for most tasks
REPLY_LLM_PROVIDER=openai
REPLY_LLM_MODEL=gpt-3.5-turbo

HYP_LLM_PROVIDER=openai
HYP_LLM_MODEL=gpt-4

PLANNING_LLM_PROVIDER=anthropic
PLANNING_LLM_MODEL=claude-3-haiku-20240307

STRUCTURED_LLM_PROVIDER=openai
STRUCTURED_LLM_MODEL=gpt-3.5-turbo

CONTINUE_RESEARCH_LLM_PROVIDER=anthropic
CONTINUE_RESEARCH_LLM_MODEL=claude-3-haiku-20240307

Next Steps

Environment Variables

View all configuration options

Authentication

Configure JWT or payment-based auth
