BioAgents supports multiple LLM providers with automatic fallback and retry logic. Each agent type can be configured with a specific provider and model.

Supported Providers

  • OpenAI - GPT-5, GPT-4, and GPT-3.5 models
  • Anthropic - Claude 3 Opus, Sonnet, and Haiku
  • Google - Gemini Pro and Gemini Ultra
  • OpenRouter - Access to 100+ models from multiple providers

Provider Configuration

OpenAI

1. Get API Key

Sign up at OpenAI Platform and create an API key.

2. Set Environment Variable

OPENAI_API_KEY=sk-...

3. Configure Models

REPLY_LLM_PROVIDER=openai
REPLY_LLM_MODEL=gpt-4

Available models:
  • gpt-5 - Latest GPT-5 model
  • gpt-4-turbo - GPT-4 Turbo with 128k context
  • gpt-4 - GPT-4 with 8k context
  • gpt-3.5-turbo - Fast and cost-effective

Anthropic

1. Get API Key

Sign up at Anthropic Console and create an API key.

2. Set Environment Variable

ANTHROPIC_API_KEY=sk-ant-...

3. Configure Models

REPLY_LLM_PROVIDER=anthropic
REPLY_LLM_MODEL=claude-3-opus-20240229

Available models:
  • claude-sonnet-4-5-20250929 - Latest Claude Sonnet 4.5
  • claude-3-opus-20240229 - Most capable model
  • claude-3-sonnet-20240229 - Balanced performance
  • claude-3-haiku-20240307 - Fast and cost-effective

Google

1. Get API Key

Enable the Generative Language API in Google AI Studio.

2. Set Environment Variable

GOOGLE_API_KEY=AIza...

3. Configure Models

REPLY_LLM_PROVIDER=google
REPLY_LLM_MODEL=gemini-pro

Available models:
  • gemini-pro - Balanced performance
  • gemini-ultra - Most capable model

OpenRouter

1. Get API Key

Sign up at OpenRouter and create an API key.

2. Set Environment Variable

OPENROUTER_API_KEY=sk-or-...

3. Configure Models

REPLY_LLM_PROVIDER=openrouter
REPLY_LLM_MODEL=anthropic/claude-3-opus

OpenRouter uses the format provider/model-name:
  • anthropic/claude-3-opus
  • openai/gpt-4
  • google/gemini-pro
  • meta-llama/llama-3-70b

See OpenRouter Models for the full list.

Agent-Specific Configuration

BioAgents uses different LLMs for different agent types to optimize for cost and performance.

Reply Agent (Chat)

Generates conversational responses to user messages.
REPLY_LLM_PROVIDER=openai
REPLY_LLM_MODEL=gpt-4
Use a fast model like gpt-3.5-turbo or claude-3-haiku for cost-effective chat.

Hypothesis Agent

Synthesizes research outputs into scientific claims.
HYP_LLM_PROVIDER=anthropic
HYP_LLM_MODEL=claude-3-opus-20240229
Use a capable reasoning model like claude-3-opus or gpt-4 for hypothesis generation.

Planning Agent

Decides what research tasks to execute.
PLANNING_LLM_PROVIDER=openai
PLANNING_LLM_MODEL=gpt-4
Use a strong reasoning model for planning. GPT-4 or Claude Opus work well.

Structured Output Agent

Generates structured data (JSON, etc.).
STRUCTURED_LLM_PROVIDER=openai
STRUCTURED_LLM_MODEL=gpt-4
OpenAI models have native JSON mode support, making them ideal for structured output.
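As an illustration, JSON mode is enabled by setting `response_format` on the request. The `response_format: { type: "json_object" }` parameter is the documented OpenAI Chat Completions option; the `buildStructuredRequest` helper below is a hypothetical sketch, not BioAgents code:

```typescript
// Sketch: building a Chat Completions payload with OpenAI's JSON mode.
type ChatMessage = { role: "system" | "user"; content: string };

function buildStructuredRequest(model: string, messages: ChatMessage[]) {
  return {
    model,
    messages,
    // Forces the model to emit syntactically valid JSON. Note: per the
    // OpenAI docs, the prompt must still mention "JSON" or the API
    // rejects the request.
    response_format: { type: "json_object" as const },
  };
}

const req = buildStructuredRequest("gpt-4", [
  { role: "system", content: "Reply with a JSON object." },
  { role: "user", content: "List two model providers." },
]);
```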

Continue Research Agent

Decides whether to continue autonomous research.
CONTINUE_RESEARCH_LLM_PROVIDER=anthropic
CONTINUE_RESEARCH_LLM_MODEL=claude-sonnet-4-5-20250929

Implementation Details

BioAgents implements a robust LLM system with adapters for each provider.

Provider Adapter Pattern

Each provider has an adapter that implements the LLMAdapter interface:
src/llm/provider.ts
private createAdapter(provider: LLMProvider): LLMAdapter {
  switch (provider.name) {
    case "openai":
      return new OpenAIAdapter(provider);
    case "google":
      return new GoogleAdapter(provider);
    case "anthropic":
      return new AnthropicAdapter(provider);
    case "openrouter":
      return new OpenRouterAdapter(provider);
    default:
      throw new Error(`Unsupported provider: ${provider.name}`);
  }
}
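The `LLMAdapter` interface itself is not reproduced in this document. A minimal sketch of what that contract might look like, with request/response shapes inferred from the surrounding examples (the field names and the `EchoAdapter` stub are assumptions for illustration, not the actual source):

```typescript
// Hypothetical shapes inferred from the examples in this page.
interface LLMRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  maxTokens?: number;
}

interface LLMResponse {
  content: string;
  finishReason?: string;
  usage?: { promptTokens: number; completionTokens: number; totalTokens: number };
}

interface LLMAdapter {
  // Every provider adapter normalizes its native API into this one method,
  // so the rest of the system never touches provider-specific SDKs.
  createChatCompletion(request: LLMRequest): Promise<LLMResponse>;
}

// A trivial stub showing how the contract is satisfied (illustrative only).
class EchoAdapter implements LLMAdapter {
  async createChatCompletion(request: LLMRequest): Promise<LLMResponse> {
    const last = request.messages[request.messages.length - 1];
    return { content: last.content, finishReason: "stop" };
  }
}
```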

Automatic Retry & Fallback

BioAgents automatically retries failed requests and falls back to alternative providers:
src/llm/provider.ts
async createChatCompletion(request: LLMRequest): Promise<LLMResponse> {
  const startTime = Date.now();

  try {
    // Try with retries on primary provider
    const result = await withRetry(
      () => this.adapter.createChatCompletion(request),
      this.providerName
    );

    const duration = Date.now() - startTime;
    this.trackTokenUsage(result, request, duration);
    return result;
  } catch (error: any) {
    // Check if we need to try fallback provider
    if (error.requiresFallback) {
      return this.attemptFallbackCompletion(request, startTime, error);
    }
    throw error;
  }
}
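The `withRetry` helper lives in `src/llm/retry.ts` and is not shown here. A minimal exponential-backoff sketch of the idea follows; the attempt count, delays, and error handling are illustrative assumptions, and the real implementation may classify which errors are retryable:

```typescript
// Illustrative retry helper with exponential backoff; not the actual
// src/llm/retry.ts implementation.
async function withRetry<T>(
  fn: () => Promise<T>,
  label: string,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Exponential backoff: 500ms, 1000ms, 2000ms, ...
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // All attempts exhausted; surface the last error to the caller,
  // which may then trigger the fallback-provider path shown above.
  throw lastError;
}
```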

Finish Reason Monitoring

BioAgents monitors response finish reasons to detect truncation:
src/llm/provider.ts
private static readonly NORMAL_FINISH_REASONS = new Set([
  "stop",        // OpenAI, OpenRouter
  "end_turn",    // Anthropic
  "STOP",        // Google
]);

private checkFinishReason(result: LLMResponse, request: LLMRequest): void {
  if (!result.finishReason) return;

  const isNormal = LLM.NORMAL_FINISH_REASONS.has(result.finishReason);
  if (!isNormal) {
    logger.warn(
      {
        finishReason: result.finishReason,
        provider: this.providerName,
        model: request.model,
        maxTokens: request.maxTokens,
      },
      "llm_response_truncated_or_abnormal_finish"
    );
  }
}

Token Usage Tracking

All LLM calls track token usage in the database:
src/llm/provider.ts
private trackTokenUsage(
  result: LLMResponse,
  request: LLMRequest,
  duration: number,
  providerOverride?: string
): void {
  if (
    result.usage &&
    request.usageType &&
    (request.messageId || request.paperId)
  ) {
    createTokenUsage({
      message_id: request.messageId,
      paper_id: request.paperId,
      type: request.usageType,
      provider: providerOverride || this.providerName,
      model: request.model,
      prompt_tokens: result.usage.promptTokens,
      completion_tokens: result.usage.completionTokens,
      total_tokens: result.usage.totalTokens,
      duration_ms: duration,
    });
  }
}

Cost Optimization Strategies

Configure cost-effective models for high-volume tasks:
# Cheap for chat
REPLY_LLM_PROVIDER=openai
REPLY_LLM_MODEL=gpt-3.5-turbo

# Expensive for critical reasoning
HYP_LLM_PROVIDER=anthropic
HYP_LLM_MODEL=claude-3-opus-20240229
OpenRouter provides access to the same models with higher rate limits:
REPLY_LLM_PROVIDER=openrouter
REPLY_LLM_MODEL=anthropic/claude-3-opus
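Because OpenRouter exposes an OpenAI-compatible API, switching to it is largely a matter of pointing an OpenAI-style client at OpenRouter's base URL. A sketch under that assumption (the `openRouterClientConfig` helper is hypothetical; the base URL is OpenRouter's documented endpoint):

```typescript
// Illustrative: OpenRouter speaks the OpenAI wire protocol, so an
// OpenAI-style client config only needs a different base URL and key.
function openRouterClientConfig(apiKey: string) {
  return {
    baseURL: "https://openrouter.ai/api/v1", // OpenRouter's OpenAI-compatible endpoint
    apiKey,
  };
}

const cfg = openRouterClientConfig("sk-or-...");
```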
Track token usage in the database to identify expensive operations:
SELECT 
  type,
  provider,
  model,
  AVG(total_tokens) as avg_tokens,
  SUM(total_tokens) as total_tokens
FROM token_usage
GROUP BY type, provider, model
ORDER BY total_tokens DESC;

Troubleshooting

Symptom: 429 Too Many Requests errors

Solutions:
  1. Increase retry delay in src/llm/retry.ts
  2. Use OpenRouter as an alternative
  3. Reduce concurrent requests with CHAT_QUEUE_CONCURRENCY

Symptom: 404 Model not found errors

Solutions:
  1. Verify model name matches provider documentation
  2. Check API key has access to the model
  3. For OpenRouter, use provider/model-name format

Symptom: Incomplete responses with length finish reason

Solutions:
  1. Increase maxTokens in the request
  2. Use a model with a larger context window
  3. Simplify the prompt to reduce input tokens

Symptom: 401 Unauthorized errors

Solutions:
  1. Verify the API key is correct and not expired
  2. Check the environment variable is loaded: echo $OPENAI_API_KEY
  3. Restart the server after changing .env

Example Configurations

# Use cheap models for most tasks
REPLY_LLM_PROVIDER=openai
REPLY_LLM_MODEL=gpt-3.5-turbo

HYP_LLM_PROVIDER=openai
HYP_LLM_MODEL=gpt-4

PLANNING_LLM_PROVIDER=anthropic
PLANNING_LLM_MODEL=claude-3-haiku-20240307

STRUCTURED_LLM_PROVIDER=openai
STRUCTURED_LLM_MODEL=gpt-3.5-turbo

CONTINUE_RESEARCH_LLM_PROVIDER=anthropic
CONTINUE_RESEARCH_LLM_MODEL=claude-3-haiku-20240307

Next Steps

Environment Variables

View all configuration options

Authentication

Configure JWT or payment-based auth
