Longshot uses an OpenAI-compatible API interface for LLM requests. The LLM client supports:
  • Multiple endpoints with weighted load balancing
  • Automatic failover on errors
  • Latency-adaptive weight rebalancing
  • Health tracking and recovery probes

Basic Configuration

Set your LLM provider in the .env file:
.env
# Base URL for OpenAI-compatible API
LLM_BASE_URL=https://api.openai.com/v1

# Your API key
LLM_API_KEY=sk-your-api-key-here

# Model name
LLM_MODEL=gpt-4o

# Max tokens for responses
LLM_MAX_TOKENS=65536

# Temperature (0.0 = deterministic, 1.0 = creative)
LLM_TEMPERATURE=0.7

Provider Examples

OpenAI:
.env
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=sk-proj-...
LLM_MODEL=gpt-4o
LLM_MAX_TOKENS=65536
LLM_TEMPERATURE=0.7
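The same variables work with any OpenAI-compatible provider. For example, a local Ollama setup might look like the following (Ollama serves an OpenAI-compatible API under /v1 on port 11434 and ignores the API key, so any placeholder works; `llama3.1` stands in for whatever model you have pulled locally):
.env
# Local Ollama instance (OpenAI-compatible API under /v1)
LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=ollama
LLM_MODEL=llama3.1
LLM_MAX_TOKENS=8192
LLM_TEMPERATURE=0.7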

Advanced: Multiple Endpoints

For production deployments, use multiple endpoints for load balancing and failover:
.env
LLM_ENDPOINTS='[
  {
    "name": "openai-primary",
    "baseUrl": "https://api.openai.com/v1",
    "apiKey": "sk-proj-...",
    "weight": 70
  },
  {
    "name": "azure-backup",
    "baseUrl": "https://YOUR_RESOURCE.openai.azure.com/openai/deployments/gpt-4o",
    "apiKey": "your-azure-key",
    "weight": 30
  }
]'

# These settings apply to all endpoints:
LLM_MODEL=gpt-4o
LLM_MAX_TOKENS=65536
LLM_TEMPERATURE=0.7
When LLM_ENDPOINTS is set, it overrides LLM_BASE_URL and LLM_API_KEY.

Weight-Based Load Balancing

Endpoints are selected using weighted random sampling:
  • Weight 70 gets ~70% of requests
  • Weight 30 gets ~30% of requests
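The selection step can be sketched as weighted random sampling over the configured weights (an illustrative implementation, not Longshot's actual code; `pickEndpoint` is a hypothetical name):

```typescript
interface Endpoint {
  name: string;
  weight: number;
}

// Draw a random point in [0, totalWeight) and walk the cumulative
// weights until the draw falls inside an endpoint's slice.
function pickEndpoint(
  endpoints: Endpoint[],
  rand: () => number = Math.random,
): Endpoint {
  const total = endpoints.reduce((sum, e) => sum + e.weight, 0);
  let r = rand() * total;
  for (const e of endpoints) {
    r -= e.weight;
    if (r < 0) return e;
  }
  return endpoints[endpoints.length - 1]; // guard against float rounding
}
```

Over many requests, an endpoint with weight 70 lands in its slice about 70% of the time.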

Latency-Adaptive Rebalancing

The LLM client automatically adjusts effective weights based on latency:
  • Faster endpoints get up to 2x their base weight
  • Endpoints 2x slower than the fastest get 0.5x their base weight
This ensures traffic naturally flows to performant endpoints.

Health Tracking

Endpoints are automatically marked unhealthy after 3 consecutive failures. When unhealthy:
  1. The endpoint moves to the end of the selection order
  2. Healthy endpoints handle all requests
  3. After 30 seconds, a recovery probe is sent
  4. If successful, the endpoint is marked healthy again
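The state machine above can be sketched as a small tracker (names and structure here are hypothetical, chosen only to mirror the documented thresholds of 3 failures and a 30-second cooldown):

```typescript
class EndpointHealth {
  private consecutiveFailures = 0;
  private unhealthySince: number | null = null;

  // Any success resets the failure streak and restores health.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.unhealthySince = null;
  }

  // The 3rd consecutive failure marks the endpoint unhealthy.
  recordFailure(nowMs: number): void {
    this.consecutiveFailures++;
    if (this.consecutiveFailures >= 3 && this.unhealthySince === null) {
      this.unhealthySince = nowMs;
    }
  }

  isHealthy(): boolean {
    return this.unhealthySince === null;
  }

  // After 30 seconds unhealthy, the endpoint is eligible for a recovery probe.
  shouldProbe(nowMs: number): boolean {
    return this.unhealthySince !== null && nowMs - this.unhealthySince >= 30_000;
  }
}
```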

Timeouts and Retries

Request Timeout

Set a maximum duration for individual LLM requests:
.env
# Timeout in milliseconds (default: 120000 = 2 minutes)
LLM_TIMEOUT_MS=180000
When a request times out, the client automatically fails over to the next endpoint.

Readiness Timeout

On startup, Longshot waits for at least one LLM endpoint to become ready:
.env
# How long to wait for endpoint readiness (default: 120000ms)
LLM_READINESS_TIMEOUT_MS=120000
This is useful when using cloud providers that cold-start model instances.

Configuration Validation

Required Fields

The following environment variables are required:
  • LLM_BASE_URL (or LLM_ENDPOINTS)
  • LLM_API_KEY (or LLM_ENDPOINTS)
  • LLM_MODEL

Validation Rules

  • LLM_MAX_TOKENS must be positive (default: 65536)
  • LLM_TEMPERATURE must be between 0.0 and 1.0 (default: 0.7)
  • At least one endpoint must be configured
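These rules amount to a few simple checks; a minimal sketch (a hypothetical helper, not Longshot's actual validator) might look like:

```typescript
interface LlmConfig {
  maxTokens: number;
  temperature: number;
  endpointCount: number;
}

// Return a list of human-readable validation errors; empty means valid.
function validateConfig(cfg: LlmConfig): string[] {
  const errors: string[] = [];
  if (cfg.maxTokens <= 0) {
    errors.push("LLM_MAX_TOKENS must be positive");
  }
  if (cfg.temperature < 0.0 || cfg.temperature > 1.0) {
    errors.push("LLM_TEMPERATURE must be between 0.0 and 1.0");
  }
  if (cfg.endpointCount < 1) {
    errors.push("At least one endpoint must be configured");
  }
  return errors;
}
```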

Understanding LLM Client Behavior

Request Flow

  1. Endpoint Selection: Weighted random selection from healthy endpoints
  2. Request Execution: POST to {baseUrl}/chat/completions with a timeout (the /v1 prefix is already part of the base URL)
  3. Success: Record latency, update effective weights
  4. Failure: Mark failure, try next endpoint in priority order
  5. All Failed: Throw error with details from last failure
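The failover portion of this flow (steps 4 and 5) reduces to trying endpoints in order and surfacing the last error. A sketch, where `sendRequest` is a stand-in for the real chat-completions POST:

```typescript
// Try each endpoint in priority order; return the first successful
// response, or throw with details from the last failure.
async function completeWithFailover<T>(
  endpoints: string[],
  sendRequest: (endpoint: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (const endpoint of endpoints) {
    try {
      return await sendRequest(endpoint);
    } catch (err) {
      lastError = err; // record and fall through to the next endpoint
    }
  }
  throw new Error(`All ${endpoints.length} LLM endpoints failed: ${lastError}`);
}
```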

Logging

The LLM client logs detailed information at different levels:
  • Endpoint initialization
  • Readiness probes
  • Health status changes
Set LOG_LEVEL=debug in .env to see detailed LLM client operations.

Monitoring Endpoint Health

The orchestrator exposes endpoint statistics via the LLM client:
const stats = llmClient.getEndpointStats();
// Returns:
// [
//   {
//     name: "openai-primary",
//     endpoint: "https://api.openai.com/v1",
//     healthy: true,
//     effectiveWeight: 78.5,
//     avgLatencyMs: 892,
//     totalRequests: 156,
//     totalFailures: 2
//   },
//   ...
// ]
Use this data to:
  • Monitor endpoint health
  • Identify slow providers
  • Debug configuration issues
  • Analyze cost distribution

Best Practices

Production Deployments

Use multiple endpoints from different providers for maximum reliability:
  • Primary: High-performance provider (70-80% weight)
  • Secondary: Backup provider (20-30% weight)
  • This keeps requests flowing even if one provider has an outage

Development

  • Start with a single endpoint for simplicity
  • Use local models (Ollama) for cost-free testing
  • Enable debug logging to understand request patterns

Cost Optimization

  1. Use tiered endpoints: Route most traffic to cheaper models, failover to premium models only when needed
  2. Set appropriate max tokens: Don’t use 65536 if your tasks typically need <10k tokens
  3. Monitor total requests: Check llmClient.totalRequests to understand usage patterns

Performance Tuning

  • Lower temperature (0.2-0.5) for more deterministic, focused outputs
  • Higher temperature (0.7-0.9) for creative tasks or when generating diverse solutions
  • Increase timeout for complex tasks that require long reasoning chains

Troubleshooting

Error: “All N LLM endpoints failed”

  1. Check that LLM_BASE_URL is accessible from your network
  2. Verify LLM_API_KEY is valid and not expired
  3. Confirm LLM_MODEL is available on your endpoint
  4. Check firewall/proxy settings

Endpoint Shows as Unhealthy

Review logs for:
  • Network connectivity issues
  • Rate limiting (HTTP 429)
  • Invalid credentials (HTTP 401/403)
  • Model not found (HTTP 404)

Slow Response Times

If requests take longer than expected:
  1. Check avgLatencyMs in endpoint stats
  2. Consider using a faster model
  3. Enable multiple endpoints to distribute load
  4. Verify network latency to provider

Readiness Timeout on Startup

If Longshot fails to start:
  1. Increase LLM_READINESS_TIMEOUT_MS
  2. Verify the endpoint is accessible via curl (LLM_BASE_URL already includes the /v1 prefix, so append /models directly):
    curl -H "Authorization: Bearer $LLM_API_KEY" \
         $LLM_BASE_URL/models
  3. Check for cold-start delays with cloud providers

Next Steps

Sandbox Configuration

Configure Modal sandboxes for agent execution

Running with Dashboard

Monitor LLM usage in real-time
