Overview
PicoClaw supports any OpenAI-compatible API endpoint, enabling you to use:
- Custom API proxies and gateways
- Self-hosted models (vLLM, Ollama)
- LiteLLM proxy for unified access
- Local inference servers
- Enterprise deployments
Custom API Endpoints
Basic Configuration
Any OpenAI-compatible endpoint can be configured:
{
"model_list": [
{
"model_name": "my-custom-model",
"model": "openai/custom-model",
"api_base": "https://my-api.example.com/v1",
"api_key": "your-api-key",
"request_timeout": 300
}
],
"agents": {
"defaults": {
"model_name": "my-custom-model"
}
}
}
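Before launching, it can help to sanity-check that each model_list entry carries the required fields. The validate_entry helper below is a hypothetical sketch for illustration, not part of PicoClaw:

```python
REQUIRED = ("model_name", "model", "api_base")

def validate_entry(entry):
    """Return the required model_list keys missing from a config entry."""
    return [key for key in REQUIRED if key not in entry]

entry = {
    "model_name": "my-custom-model",
    "model": "openai/custom-model",
    "api_base": "https://my-api.example.com/v1",
}
print(validate_entry(entry))                 # []
print(validate_entry({"model_name": "x"}))   # ['model', 'api_base']
```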
Configuration Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model_name | string | Yes | - | Alias for this model configuration |
| model | string | Yes | - | Model identifier (any prefix) |
| api_base | string | Yes | - | Your custom API endpoint URL |
| api_key | string | No | - | API key (if required by the endpoint) |
| request_timeout | integer | No | 120 | Request timeout in seconds |
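In practice, "OpenAI-compatible" means the endpoint accepts the standard chat-completions request shape built from these parameters. A minimal stdlib sketch; build_chat_request is an illustrative helper, not a PicoClaw API:

```python
import json

def build_chat_request(api_base, api_key, model, prompt):
    """Assemble a standard OpenAI-style chat-completions request."""
    url = api_base.rstrip("/") + "/chat/completions"
    headers = {"Content-Type": "application/json"}
    if api_key:  # api_key is optional, e.g. for local servers
        headers["Authorization"] = "Bearer " + api_key
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

url, headers, body = build_chat_request(
    "https://my-api.example.com/v1", "your-api-key", "custom-model", "Hello")
print(url)  # https://my-api.example.com/v1/chat/completions
```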
LiteLLM Proxy
What is LiteLLM?
LiteLLM is a unified proxy that translates requests across 100+ LLM providers. It provides:
- Single API for multiple providers
- Load balancing and fallbacks
- Cost tracking and budgets
- Rate limiting
- Caching
Setup LiteLLM
1. Install LiteLLM
pip install litellm[proxy]
2. Create Configuration
Create litellm_config.yaml:
model_list:
- model_name: gpt-4
litellm_params:
model: openai/gpt-4
api_key: sk-...
- model_name: claude
litellm_params:
model: anthropic/claude-sonnet-4.6
api_key: sk-ant-...
- model_name: llama
litellm_params:
model: ollama/llama3
api_base: http://localhost:11434
general_settings:
master_key: sk-1234 # Your LiteLLM proxy key
3. Start LiteLLM Proxy
litellm --config litellm_config.yaml --port 4000
4. Configure PicoClaw
Edit ~/.picoclaw/config.json:
{
"model_list": [
{
"model_name": "gpt4",
"model": "litellm/gpt-4",
"api_base": "http://localhost:4000/v1",
"api_key": "sk-1234"
},
{
"model_name": "claude",
"model": "litellm/claude",
"api_base": "http://localhost:4000/v1",
"api_key": "sk-1234"
}
]
}
PicoClaw strips the litellm/ prefix, so litellm/gpt-4 sends gpt-4 to the proxy.
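The stripping behavior can be illustrated with a one-liner; this is a sketch of the described behavior, not PicoClaw's actual source:

```python
def strip_provider_prefix(model):
    """Drop the leading 'provider/' segment, keeping the remainder intact."""
    return model.split("/", 1)[1] if "/" in model else model

print(strip_provider_prefix("litellm/gpt-4"))             # gpt-4
print(strip_provider_prefix("vllm/Llama-3-8B-Instruct"))  # Llama-3-8B-Instruct
```

Only the first path segment is removed, so model names that themselves contain slashes pass through unchanged after the prefix.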
5. Test Connection
picoclaw agent -m "Test LiteLLM proxy"
Advanced LiteLLM Features
Load Balancing
LiteLLM config with two deployments sharing the same model_name; LiteLLM load-balances requests across them:
model_list:
- model_name: gpt-4
litellm_params:
model: openai/gpt-4
api_key: sk-key1
api_base: https://api1.example.com/v1
- model_name: gpt-4
litellm_params:
model: openai/gpt-4
api_key: sk-key2
api_base: https://api2.example.com/v1
PicoClaw config:
{
"model_list": [
{
"model_name": "gpt4",
"model": "litellm/gpt-4",
"api_base": "http://localhost:4000/v1",
"api_key": "sk-1234"
}
]
}
vLLM (Self-Hosted)
What is vLLM?
vLLM is a high-performance inference server for running LLMs locally or in the cloud.
Setup vLLM
1. Install vLLM
pip install vllm
2. Start vLLM Server
vllm serve meta-llama/Llama-3-8B-Instruct \
--host 0.0.0.0 \
--port 8000 \
--api-key your-api-key
3. Configure PicoClaw
Edit ~/.picoclaw/config.json:
{
"model_list": [
{
"model_name": "llama3",
"model": "vllm/Llama-3-8B-Instruct",
"api_base": "http://localhost:8000/v1",
"api_key": "your-api-key",
"request_timeout": 600
}
],
"agents": {
"defaults": {
"model_name": "llama3"
}
}
}
4. Test Connection
picoclaw agent -m "Test vLLM server"
vLLM with Multiple GPUs
vllm serve meta-llama/Llama-3-70B-Instruct \
--tensor-parallel-size 4 \
--host 0.0.0.0 \
--port 8000
vLLM Best Practices
- GPU memory: Ensure sufficient VRAM for your model
- Batch size: Tune for throughput vs. latency
- Context length: Set --max-model-len appropriately
- Timeouts: Increase request_timeout for large contexts
Ollama (Local Models)
What is Ollama?
Ollama makes it easy to run open-source LLMs locally on your machine.
Setup Ollama
1. Install Ollama
# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh
# Or download from https://ollama.ai
2. Pull a Model
ollama pull llama3
Browse https://ollama.ai for other available models.
3. Start Ollama Server
ollama serve
Default endpoint: http://localhost:11434
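To confirm the server is reachable, you can list models through its OpenAI-compatible /v1/models route. A minimal stdlib sketch, with the parsing factored out so it can be exercised against a sample payload (the sample follows the standard OpenAI models-list shape):

```python
import json

def model_ids(raw):
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in json.loads(raw)["data"]]

# Against a live server:
#   from urllib.request import urlopen
#   raw = urlopen("http://localhost:11434/v1/models").read().decode()

sample = '{"object": "list", "data": [{"id": "llama3", "object": "model"}]}'
print(model_ids(sample))  # ['llama3']
```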
4. Configure PicoClaw
Edit ~/.picoclaw/config.json:
{
"model_list": [
{
"model_name": "llama3",
"model": "ollama/llama3",
"api_base": "http://localhost:11434/v1"
}
],
"agents": {
"defaults": {
"model_name": "llama3"
}
}
}
No API key needed for Ollama - it’s completely local!
5. Test Connection
picoclaw agent -m "Test Ollama"
Ollama Best Practices
- Model selection: Choose models that fit your hardware
- Context window: Larger models support longer contexts
- Performance: Use GPU for better performance
- Updates: Keep Ollama updated for latest features
Custom Proxy Configuration
HTTP Proxy
Route requests through an HTTP proxy:
{
"model_list": [
{
"model_name": "proxied-model",
"model": "openai/gpt-4",
"api_base": "https://api.openai.com/v1",
"api_key": "sk-..."
}
],
"providers": {
"openai": {
"proxy": "http://proxy.example.com:8080"
}
}
}
Reverse Proxy
Run your own reverse proxy:
# nginx.conf
server {
listen 443 ssl;
server_name my-llm-proxy.com;
location /v1/ {
proxy_pass https://api.openai.com/v1/;
proxy_set_header Authorization "Bearer sk-...";
proxy_set_header Content-Type "application/json";
}
}
PicoClaw config:
{
"model_list": [
{
"model_name": "gpt4",
"model": "openai/gpt-4",
"api_base": "https://my-llm-proxy.com/v1"
}
]
}
Enterprise Deployments
Azure OpenAI
{
"model_list": [
{
"model_name": "azure-gpt4",
"model": "openai/gpt-4",
"api_base": "https://your-resource.openai.azure.com/openai/deployments/gpt-4",
"api_key": "your-azure-key"
}
]
}
AWS Bedrock
Use through LiteLLM proxy:
model_list:
- model_name: claude-bedrock
litellm_params:
model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
aws_access_key_id: xxx
aws_secret_access_key: xxx
aws_region_name: us-east-1
GCP Vertex AI
Use through LiteLLM proxy:
model_list:
- model_name: gemini-vertex
litellm_params:
model: vertex_ai/gemini-pro
vertex_project: your-project
vertex_location: us-central1
Troubleshooting
Connection Refused
Ensure your custom endpoint is running:
curl http://localhost:8000/v1/models
Timeout Errors
Increase timeout for slow endpoints:
{
"model_name": "slow-model",
"model": "custom/model",
"api_base": "http://localhost:8000/v1",
"request_timeout": 600
}
API Key Issues
Some endpoints don’t require keys:
{
"model_name": "local-model",
"model": "ollama/llama3",
"api_base": "http://localhost:11434/v1"
}
Omit api_key for local servers.
Protocol Mismatches
Ensure your endpoint is OpenAI-compatible:
- Endpoint: /v1/chat/completions
- Request format: OpenAI JSON schema
- Response format: OpenAI JSON schema
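These checks can be automated with a small shape test. looks_openai_compatible below is an illustrative helper that inspects only the minimal response fields:

```python
def looks_openai_compatible(resp):
    """Check the minimal chat-completions response shape."""
    try:
        choice = resp["choices"][0]
        return isinstance(choice["message"]["content"], str)
    except (KeyError, IndexError, TypeError):
        return False

ok = {"choices": [{"message": {"role": "assistant", "content": "hi"}}]}
print(looks_openai_compatible(ok))              # True
print(looks_openai_compatible({"text": "hi"}))  # False
```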
Best Practices
- Multi-provider setups: Use LiteLLM for complex routing
- Local development: Use Ollama for privacy
- Production: Use vLLM for performance
- Monitoring: Add health checks and logging
- Security: Use HTTPS and authentication
- Timeouts: Set appropriate timeouts for your use case
- Fallbacks: Configure backup providers
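The fallback idea can be sketched as a simple loop over model aliases. Both complete_with_fallback and the fake_send stub are hypothetical placeholders for illustration, not PicoClaw features:

```python
def complete_with_fallback(aliases, send):
    """Try each model alias in order, returning the first success."""
    errors = {}
    for name in aliases:
        try:
            return send(name)
        except Exception as exc:  # in practice, catch specific errors
            errors[name] = exc
    raise RuntimeError("all providers failed: %r" % errors)

def fake_send(name):
    """Stub transport: the primary provider is down."""
    if name == "primary":
        raise ConnectionError("primary down")
    return "response from " + name

print(complete_with_fallback(["primary", "fallback"], fake_send))
# response from fallback
```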
Example Configurations
Multi-Provider Setup
{
"model_list": [
{
"model_name": "primary",
"model": "openai/gpt-5.2",
"api_key": "sk-..."
},
{
"model_name": "fallback",
"model": "anthropic/claude-sonnet-4.6",
"api_key": "sk-ant-..."
},
{
"model_name": "local",
"model": "ollama/llama3",
"api_base": "http://localhost:11434/v1"
},
{
"model_name": "proxy",
"model": "litellm/gpt-4",
"api_base": "http://localhost:4000/v1",
"api_key": "sk-1234"
}
]
}
Development Setup
{
"model_list": [
{
"model_name": "dev",
"model": "ollama/llama3",
"api_base": "http://localhost:11434/v1"
}
],
"agents": {
"defaults": {
"model_name": "dev",
"max_tokens": 2048
}
}
}