Longshot uses an OpenAI-compatible API interface for LLM requests. The LLM client supports:
  • Multiple endpoints with weighted load balancing
  • Automatic failover on errors
  • Latency-adaptive weight rebalancing
  • Health tracking and recovery probes

Basic Configuration

Set your LLM provider in the .env file:
.env
# Base URL for OpenAI-compatible API
LLM_BASE_URL=https://api.openai.com/v1

# Your API key
LLM_API_KEY=sk-your-api-key-here

# Model name
LLM_MODEL=gpt-4o

# Max tokens for responses
LLM_MAX_TOKENS=65536

# Temperature (0.0 = deterministic, 1.0 = creative)
LLM_TEMPERATURE=0.7

Provider Examples

OpenAI:
.env
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=sk-proj-...
LLM_MODEL=gpt-4o
LLM_MAX_TOKENS=65536
LLM_TEMPERATURE=0.7
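The same variables work with any OpenAI-compatible provider. For example, a local Ollama setup might look like the following (Ollama serves an OpenAI-compatible API under /v1 on port 11434 and ignores the API key, so any placeholder works; `llama3.1` stands in for whatever model you have pulled locally):
.env
# Local Ollama instance (OpenAI-compatible API under /v1)
LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=ollama
LLM_MODEL=llama3.1
LLM_MAX_TOKENS=8192
LLM_TEMPERATURE=0.7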

Advanced: Multiple Endpoints

For production deployments, use multiple endpoints for load balancing and failover:
.env
LLM_ENDPOINTS='[
  {
    "name": "openai-primary",
    "baseUrl": "https://api.openai.com/v1",
    "apiKey": "sk-proj-...",
    "weight": 70
  },
  {
    "name": "azure-backup",
    "baseUrl": "https://YOUR_RESOURCE.openai.azure.com/openai/deployments/gpt-4o",
    "apiKey": "your-azure-key",
    "weight": 30
  }
]'

# These settings apply to all endpoints:
LLM_MODEL=gpt-4o
LLM_MAX_TOKENS=65536
LLM_TEMPERATURE=0.7
When LLM_ENDPOINTS is set, it overrides LLM_BASE_URL and LLM_API_KEY.

Weight-Based Load Balancing

Endpoints are selected using weighted random sampling:
  • Weight 70 gets ~70% of requests
  • Weight 30 gets ~30% of requests
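The selection step can be sketched as weighted random sampling over the configured weights (an illustrative implementation, not Longshot's actual code; `pickEndpoint` is a hypothetical name):

```typescript
interface Endpoint {
  name: string;
  weight: number;
}

// Draw a random point in [0, totalWeight) and walk the cumulative
// weights until the draw falls inside an endpoint's slice.
function pickEndpoint(
  endpoints: Endpoint[],
  rand: () => number = Math.random,
): Endpoint {
  const total = endpoints.reduce((sum, e) => sum + e.weight, 0);
  let r = rand() * total;
  for (const e of endpoints) {
    r -= e.weight;
    if (r < 0) return e;
  }
  return endpoints[endpoints.length - 1]; // guard against float rounding
}
```

Over many requests, an endpoint with weight 70 lands in its slice about 70% of the time.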

Latency-Adaptive Rebalancing

The LLM client automatically adjusts effective weights based on latency:
  • Faster endpoints get up to 2x their base weight
  • Endpoints 2x slower than the fastest get 0.5x their base weight
This ensures traffic naturally flows to performant endpoints.

Health Tracking

Endpoints are automatically marked unhealthy after 3 consecutive failures. When unhealthy:
  1. The endpoint moves to the end of the selection order
  2. Healthy endpoints handle all requests
  3. After 30 seconds, a recovery probe is sent
  4. If successful, the endpoint is marked healthy again
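The state machine above can be sketched as a small tracker (names and structure here are hypothetical, chosen only to mirror the documented thresholds of 3 failures and a 30-second cooldown):

```typescript
class EndpointHealth {
  private consecutiveFailures = 0;
  private unhealthySince: number | null = null;

  // Any success resets the failure streak and restores health.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.unhealthySince = null;
  }

  // The 3rd consecutive failure marks the endpoint unhealthy.
  recordFailure(nowMs: number): void {
    this.consecutiveFailures++;
    if (this.consecutiveFailures >= 3 && this.unhealthySince === null) {
      this.unhealthySince = nowMs;
    }
  }

  isHealthy(): boolean {
    return this.unhealthySince === null;
  }

  // After 30 seconds unhealthy, the endpoint is eligible for a recovery probe.
  shouldProbe(nowMs: number): boolean {
    return this.unhealthySince !== null && nowMs - this.unhealthySince >= 30_000;
  }
}
```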

Timeouts and Retries

Request Timeout

Set a maximum duration for individual LLM requests:
.env
# Timeout in milliseconds (default: 120000 = 2 minutes)
LLM_TIMEOUT_MS=180000
When a request times out, the client automatically fails over to the next endpoint.

Readiness Timeout

On startup, Longshot waits for at least one LLM endpoint to become ready:
.env
# How long to wait for endpoint readiness (default: 120000ms)
LLM_READINESS_TIMEOUT_MS=120000
This is useful when using cloud providers that cold-start model instances.

Configuration Validation

Required Fields

The following environment variables are required:
  • LLM_BASE_URL (or LLM_ENDPOINTS)
  • LLM_API_KEY (or LLM_ENDPOINTS)
  • LLM_MODEL

Validation Rules

  • LLM_MAX_TOKENS must be positive (default: 65536)
  • LLM_TEMPERATURE must be between 0.0 and 1.0 (default: 0.7)
  • At least one endpoint must be configured
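These rules amount to a few simple checks; a minimal sketch (a hypothetical helper, not Longshot's actual validator) might look like:

```typescript
interface LlmConfig {
  maxTokens: number;
  temperature: number;
  endpointCount: number;
}

// Return a list of human-readable validation errors; empty means valid.
function validateConfig(cfg: LlmConfig): string[] {
  const errors: string[] = [];
  if (cfg.maxTokens <= 0) {
    errors.push("LLM_MAX_TOKENS must be positive");
  }
  if (cfg.temperature < 0.0 || cfg.temperature > 1.0) {
    errors.push("LLM_TEMPERATURE must be between 0.0 and 1.0");
  }
  if (cfg.endpointCount < 1) {
    errors.push("At least one endpoint must be configured");
  }
  return errors;
}
```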

Understanding LLM Client Behavior

Request Flow

  1. Endpoint Selection: Weighted random selection from healthy endpoints
  2. Request Execution: POST to {baseUrl}/chat/completions with a timeout (the /v1 prefix is already part of the base URL)
  3. Success: Record latency, update effective weights
  4. Failure: Mark failure, try next endpoint in priority order
  5. All Failed: Throw error with details from last failure
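The failover portion of this flow (steps 4 and 5) reduces to trying endpoints in order and surfacing the last error. A sketch, where `sendRequest` is a stand-in for the real chat-completions POST:

```typescript
// Try each endpoint in priority order; return the first successful
// response, or throw with details from the last failure.
async function completeWithFailover<T>(
  endpoints: string[],
  sendRequest: (endpoint: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (const endpoint of endpoints) {
    try {
      return await sendRequest(endpoint);
    } catch (err) {
      lastError = err; // record and fall through to the next endpoint
    }
  }
  throw new Error(`All ${endpoints.length} LLM endpoints failed: ${lastError}`);
}
```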

Logging

The LLM client logs detailed information at different levels:
  • Endpoint initialization
  • Readiness probes
  • Health status changes
Set LOG_LEVEL=debug in .env to see detailed LLM client operations.

Monitoring Endpoint Health

The orchestrator exposes endpoint statistics via the LLM client:
const stats = llmClient.getEndpointStats();
// Returns:
// [
//   {
//     name: "openai-primary",
//     endpoint: "https://api.openai.com/v1",
//     healthy: true,
//     effectiveWeight: 78.5,
//     avgLatencyMs: 892,
//     totalRequests: 156,
//     totalFailures: 2
//   },
//   ...
// ]
Use this data to:
  • Monitor endpoint health
  • Identify slow providers
  • Debug configuration issues
  • Analyze cost distribution

Best Practices

Production Deployments

Use multiple endpoints from different providers for maximum reliability:
  • Primary: High-performance provider (70-80% weight)
  • Secondary: Backup provider (20-30% weight)
  • This keeps requests flowing even if one provider has an outage

Development

  • Start with a single endpoint for simplicity
  • Use local models (Ollama) for cost-free testing
  • Enable debug logging to understand request patterns

Cost Optimization

  1. Use tiered endpoints: Route most traffic to cheaper models, failover to premium models only when needed
  2. Set appropriate max tokens: Don’t use 65536 if your tasks typically need <10k tokens
  3. Monitor total requests: Check llmClient.totalRequests to understand usage patterns

Performance Tuning

  • Lower temperature (0.2-0.5) for more deterministic, focused outputs
  • Higher temperature (0.7-0.9) for creative tasks or when generating diverse solutions
  • Increase timeout for complex tasks that require long reasoning chains

Troubleshooting

Error: “All N LLM endpoints failed”

  1. Check that LLM_BASE_URL is accessible from your network
  2. Verify LLM_API_KEY is valid and not expired
  3. Confirm LLM_MODEL is available on your endpoint
  4. Check firewall/proxy settings

Endpoint Shows as Unhealthy

Review logs for:
  • Network connectivity issues
  • Rate limiting (HTTP 429)
  • Invalid credentials (HTTP 401/403)
  • Model not found (HTTP 404)

Slow Response Times

If requests take longer than expected:
  1. Check avgLatencyMs in endpoint stats
  2. Consider using a faster model
  3. Enable multiple endpoints to distribute load
  4. Verify network latency to provider

Readiness Timeout on Startup

If Longshot fails to start:
  1. Increase LLM_READINESS_TIMEOUT_MS
  2. Verify the endpoint is accessible via curl (LLM_BASE_URL already includes the /v1 prefix, so append /models directly):
    curl -H "Authorization: Bearer $LLM_API_KEY" \
         $LLM_BASE_URL/models
  3. Check for cold-start delays with cloud providers

Next Steps

Sandbox Configuration

Configure Modal sandboxes for agent execution

Running with Dashboard

Monitor LLM usage in real-time
