- Multiple endpoints with weighted load balancing
- Automatic failover on errors
- Latency-adaptive weight rebalancing
- Health tracking and recovery probes
Basic Configuration
Set your LLM provider in the .env file:
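A minimal `.env` sketch for a single endpoint — the values below are placeholders, not real credentials or URLs:

```env
# Placeholder values — substitute your provider's URL, key, and model
LLM_BASE_URL=https://your-provider.example.com/v1
LLM_API_KEY=your-api-key
LLM_MODEL=your-model-name
```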
Provider Examples
- OpenAI
- Anthropic
- Azure OpenAI
- Local (Ollama)
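Hedged sketches for two of the providers above, assuming OpenAI-compatible endpoints (keys and model names are placeholders):

```env
# OpenAI (placeholder key and model)
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=sk-your-key
LLM_MODEL=gpt-4o

# Local (Ollama) via its OpenAI-compatible API — no real key needed
# LLM_BASE_URL=http://localhost:11434/v1
# LLM_API_KEY=ollama
# LLM_MODEL=llama3.1
```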
Advanced: Multiple Endpoints
For production deployments, use multiple endpoints for load balancing and failover:
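The exact `LLM_ENDPOINTS` schema is project-specific; one plausible shape, shown purely as an illustration, is a JSON array with per-endpoint weights — check the project's configuration reference for the real format:

```env
# Illustrative only — the actual LLM_ENDPOINTS schema may differ
LLM_ENDPOINTS=[{"baseUrl":"https://api.openai.com/v1","apiKey":"sk-primary","model":"gpt-4o","weight":70},{"baseUrl":"https://backup.example.com/v1","apiKey":"sk-backup","model":"gpt-4o-mini","weight":30}]
```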
Weight-Based Load Balancing
Endpoints are selected using weighted random sampling:
- Weight 70 gets ~70% of requests
- Weight 30 gets ~30% of requests
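Weighted random sampling can be sketched as follows — this is an illustration of the technique, not the project's actual implementation, and the `Endpoint` shape is assumed:

```typescript
// Illustrative weighted random selection: an endpoint with weight 70
// is picked roughly 70% of the time against one with weight 30.
interface Endpoint {
  url: string;
  weight: number;
}

function pickEndpoint(endpoints: Endpoint[]): Endpoint {
  const total = endpoints.reduce((sum, e) => sum + e.weight, 0);
  let r = Math.random() * total;
  for (const e of endpoints) {
    r -= e.weight;
    if (r <= 0) return e;
  }
  // Floating-point safety net: fall back to the last endpoint
  return endpoints[endpoints.length - 1];
}
```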
Latency-Adaptive Rebalancing
The LLM client automatically adjusts effective weights based on latency:
- Faster endpoints get up to 2x their base weight
- Endpoints 2x slower than the fastest get 0.5x their base weight
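One interpolation consistent with the two anchor points above (the fastest endpoint gets 2x, an endpoint 2x slower gets 0.5x) is to scale by 2/ratio², clamped. This is a sketch under that assumption — the client's real formula may differ:

```typescript
// Sketch: effective weight as a function of an endpoint's latency relative
// to the fastest endpoint. Matches the doc's two anchor points (ratio 1 -> 2x,
// ratio 2 -> 0.5x); the actual interpolation used by the client is unknown.
function effectiveWeight(baseWeight: number, latencyMs: number, fastestMs: number): number {
  const ratio = latencyMs / fastestMs; // 1.0 for the fastest endpoint
  const multiplier = Math.min(2, Math.max(0.5, 2 / (ratio * ratio)));
  return baseWeight * multiplier;
}
```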
Health Tracking
Endpoints are automatically marked unhealthy after 3 consecutive failures. When unhealthy:
- The endpoint moves to the end of the selection order
- Healthy endpoints handle all requests
- After 30 seconds, a recovery probe is sent
- If successful, the endpoint is marked healthy again
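The lifecycle above can be sketched as a small state machine — names and the injected clock are illustrative, not the project's actual types:

```typescript
// Illustrative health tracker: 3 consecutive failures mark the endpoint
// unhealthy; after 30 s a recovery probe is allowed; a success heals it.
class EndpointHealth {
  private consecutiveFailures = 0;
  private unhealthySinceMs = 0;
  healthy = true;

  recordFailure(nowMs: number): void {
    this.consecutiveFailures++;
    if (this.consecutiveFailures >= 3 && this.healthy) {
      this.healthy = false;
      this.unhealthySinceMs = nowMs;
    }
  }

  recordSuccess(): void {
    // A successful request (or recovery probe) resets the endpoint to healthy
    this.consecutiveFailures = 0;
    this.healthy = true;
  }

  shouldProbe(nowMs: number): boolean {
    return !this.healthy && nowMs - this.unhealthySinceMs >= 30_000;
  }
}
```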
Timeouts and Retries
Request Timeout
Set a maximum duration for individual LLM requests:
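For example — note the variable name below is an assumption, not confirmed by this page; check the project's configuration reference for the exact key:

```env
# Hypothetical key: abort any single LLM request after 120 s
LLM_REQUEST_TIMEOUT_MS=120000
```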
Readiness Timeout
On startup, Longshot waits for at least one LLM endpoint to become ready:
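For example, using the `LLM_READINESS_TIMEOUT_MS` variable referenced in Troubleshooting below (the value shown is illustrative):

```env
# Wait up to 60 s for an endpoint to become ready before startup fails
LLM_READINESS_TIMEOUT_MS=60000
```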
Configuration Validation
Required Fields
The following environment variables are required:
- `LLM_BASE_URL` (or `LLM_ENDPOINTS`)
- `LLM_API_KEY` (or `LLM_ENDPOINTS`)
- `LLM_MODEL`
Validation Rules
- `LLM_MAX_TOKENS` must be positive (default: 65536)
- `LLM_TEMPERATURE` must be between 0.0 and 1.0 (default: 0.7)
- At least one endpoint must be configured
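The rules above can be sketched as a validation function — illustrative, not the project's actual validator:

```typescript
// Sketch of the validation rules above; returns a list of error messages
// (empty means the configuration is valid).
function validateConfig(env: Record<string, string | undefined>): string[] {
  const errors: string[] = [];
  const hasEndpoints = Boolean(env.LLM_ENDPOINTS);
  if (!hasEndpoints && !env.LLM_BASE_URL) errors.push("LLM_BASE_URL (or LLM_ENDPOINTS) is required");
  if (!hasEndpoints && !env.LLM_API_KEY) errors.push("LLM_API_KEY (or LLM_ENDPOINTS) is required");
  if (!env.LLM_MODEL) errors.push("LLM_MODEL is required");
  const maxTokens = Number(env.LLM_MAX_TOKENS ?? "65536");
  if (!(maxTokens > 0)) errors.push("LLM_MAX_TOKENS must be positive");
  const temperature = Number(env.LLM_TEMPERATURE ?? "0.7");
  if (temperature < 0 || temperature > 1) errors.push("LLM_TEMPERATURE must be between 0.0 and 1.0");
  return errors;
}
```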
Understanding LLM Client Behavior
Request Flow
- Endpoint Selection: Weighted random selection from healthy endpoints
- Request Execution: POST to `/v1/chat/completions` with timeout
- Success: Record latency, update effective weights
- Failure: Mark failure, try next endpoint in priority order
- All Failed: Throw error with details from last failure
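The flow above can be sketched with the HTTP transport injected, so the failover logic itself is visible — an illustration only, not the client's real code:

```typescript
// Illustrative failover loop: try each endpoint in order, remember the last
// error, and fail only after every endpoint has failed.
type RequestFn = (baseUrl: string) => Promise<string>;

async function completeWithFailover(baseUrls: string[], request: RequestFn): Promise<string> {
  let lastError: unknown;
  for (const baseUrl of baseUrls) {
    try {
      return await request(baseUrl); // success: caller records latency here
    } catch (err) {
      lastError = err; // mark failure, fall through to the next endpoint
    }
  }
  throw new Error(`All ${baseUrls.length} LLM endpoints failed: ${String(lastError)}`);
}
```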
Logging
The LLM client logs detailed information at the Info, Debug, Warn, and Error levels, covering:
- Endpoint initialization
- Readiness probes
- Health status changes
Set `LOG_LEVEL=debug` in `.env` to see detailed LLM client operations.
Monitoring Endpoint Health
The orchestrator exposes endpoint statistics via the LLM client. Use them to:
- Monitor endpoint health
- Identify slow providers
- Debug configuration issues
- Analyze cost distribution
Best Practices
Production Deployments
Development
- Start with a single endpoint for simplicity
- Use local models (Ollama) for cost-free testing
- Enable debug logging to understand request patterns
Cost Optimization
- Use tiered endpoints: Route most traffic to cheaper models, failover to premium models only when needed
- Set appropriate max tokens: Don’t use 65536 if your tasks typically need <10k tokens
- Monitor total requests: Check `llmClient.totalRequests` to understand usage patterns
Performance Tuning
- Lower temperature (0.2-0.5) for more deterministic, focused outputs
- Higher temperature (0.7-0.9) for creative tasks or when generating diverse solutions
- Increase timeout for complex tasks that require long reasoning chains
Troubleshooting
Error: “All N LLM endpoints failed”
- Check that `LLM_BASE_URL` is accessible from your network
- Verify `LLM_API_KEY` is valid and not expired
- Confirm `LLM_MODEL` is available on your endpoint
- Check firewall/proxy settings
Endpoint Shows as Unhealthy
Review logs for:
- Network connectivity issues
- Rate limiting (HTTP 429)
- Invalid credentials (HTTP 401/403)
- Model not found (HTTP 404)
Slow Response Times
If requests take longer than expected:
- Check `avgLatencyMs` in endpoint stats
- Consider using a faster model
- Enable multiple endpoints to distribute load
- Verify network latency to provider
Readiness Timeout on Startup
If Longshot fails to start:
- Increase `LLM_READINESS_TIMEOUT_MS`
- Verify the endpoint is accessible via `curl`
- Check for cold-start delays with cloud providers
Next Steps
Sandbox Configuration
Configure Modal sandboxes for agent execution
Running with Dashboard
Monitor LLM usage in real-time