Overview
Virtual keys allow you to create API keys with:
- Custom budgets and rate limits
- Model access restrictions
- Expiration dates
- Team associations
- Metadata and tags
Generate a Key
Create a virtual key using the master key:
curl -X POST 'http://localhost:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"models": ["gpt-3.5-turbo", "gpt-4"],
"max_budget": 10.0,
"duration": "30d"
}'
Response:
{
"key": "sk-1234567890abcdef",
"key_name": null,
"expires": "2024-04-15T10:30:00Z",
"models": ["gpt-3.5-turbo", "gpt-4"],
"max_budget": 10.0,
"budget_duration": "30d",
"budget_reset_at": "2024-04-15T10:30:00Z"
}
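The same request can be assembled from Python. The sketch below uses only the standard library and builds the request without sending it. `build_generate_request` is a hypothetical helper name, and the URL and master key are the placeholder values from the curl example above.

```python
import json
import urllib.request


def build_generate_request(base_url, master_key, **params):
    """Build (but do not send) a POST /key/generate request."""
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/key/generate",
        data=body,
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_generate_request(
    "http://localhost:4000",  # placeholder proxy address
    "sk-1234",                # placeholder master key
    models=["gpt-3.5-turbo", "gpt-4"],
    max_budget=10.0,
    duration="30d",
)
print(req.get_method(), req.full_url)
```

To send it against a running proxy, pass `req` to `urllib.request.urlopen` and decode the JSON body of the response.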
Key Generation Parameters
Basic Parameters
{
"key_name": "production-api-key", // Optional friendly name
"duration": "30d", // Key expiration (e.g., 30d, 24h, null for no expiry)
"models": ["gpt-3.5-turbo", "gpt-4"], // Allowed models
"metadata": { // Custom metadata
"environment": "production",
"team": "backend"
}
}
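Duration strings such as 30d and 24h are easy to mistype, so it can help to validate them on the client before calling the proxy. The following is a minimal sketch: `parse_duration` is a hypothetical helper, and the set of unit suffixes (s, m, h, d) is an assumption, since the proxy may accept a different grammar.

```python
import re
from datetime import timedelta

# Assumed unit suffixes; the proxy itself may accept a different set.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(value):
    """Convert a string such as '30d' or '24h' to a timedelta.

    None is passed through unchanged, mirroring "null for no expiry".
    """
    if value is None:
        return None
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


print(parse_duration("30d"))
print(parse_duration("24h"))
```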
Budget Parameters
{
"max_budget": 100.0, // Maximum spend in USD
"budget_duration": "30d", // Budget reset period
"soft_budget": 80.0 // Alert threshold (80% of max_budget)
}
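Because soft_budget is meant to fire before max_budget is reached, it is convenient to derive it from a ratio rather than type both numbers by hand. A small sketch under that assumption (`budget_params` and `soft_ratio` are hypothetical names, not proxy parameters):

```python
def budget_params(max_budget, budget_duration, soft_ratio=0.8):
    """Return budget fields for a key request, deriving soft_budget from a ratio."""
    if max_budget <= 0:
        raise ValueError("max_budget must be positive")
    if not 0 < soft_ratio <= 1:
        raise ValueError("soft_ratio must be in (0, 1]")
    return {
        "max_budget": float(max_budget),
        "budget_duration": budget_duration,
        # Rounded to cents so the payload stays readable.
        "soft_budget": round(max_budget * soft_ratio, 2),
    }


print(budget_params(100.0, "30d"))
```

The returned dictionary can be merged into the JSON body of a key generation request.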
Rate Limiting
{
"rpm": 100, // Requests per minute
"tpm": 100000, // Tokens per minute
"max_parallel_requests": 10
}
Team Association
{
"team_id": "team-abc-123",
"user_id": "user-xyz-456"
}
Complete Example
curl -X POST 'http://localhost:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"key_name": "production-backend",
"duration": "90d",
"models": ["gpt-3.5-turbo", "gpt-4"],
"max_budget": 100.0,
"budget_duration": "30d",
"soft_budget": 80.0,
"rpm": 100,
"tpm": 100000,
"metadata": {
"environment": "production",
"team": "backend"
}
}'
Get Key Info
Retrieve information about a key:
curl -X GET 'http://localhost:4000/key/info' \
-H 'Authorization: Bearer sk-1234567890abcdef'
Response:
{
"key": "sk-1234...def",
"key_name": "production-backend",
"team_id": null,
"max_budget": 100.0,
"spend": 45.23,
"budget_reset_at": "2024-04-15T10:30:00Z",
"models": ["gpt-3.5-turbo", "gpt-4"],
"rpm": 100,
"tpm": 100000,
"expires": "2024-07-15T10:30:00Z",
"metadata": {
"environment": "production",
"team": "backend"
}
}
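The fields in this response are enough to derive the remaining budget and the time left before expiry. The sketch below assumes the response shape shown above, with a numeric max_budget and an ISO 8601 expires timestamp; `summarize_key` is a hypothetical helper.

```python
from datetime import datetime, timezone


def summarize_key(info, now=None):
    """Derive remaining budget and days until expiry from a key info response."""
    now = now or datetime.now(timezone.utc)
    # fromisoformat() on older Python versions does not accept a trailing "Z".
    expires = datetime.fromisoformat(info["expires"].replace("Z", "+00:00"))
    return {
        "remaining_budget": round(info["max_budget"] - info["spend"], 2),
        "days_until_expiry": (expires - now).days,
    }


sample = {
    "max_budget": 100.0,
    "spend": 45.23,
    "expires": "2024-07-15T10:30:00Z",
}
print(summarize_key(sample, now=datetime(2024, 7, 1, 10, 30, tzinfo=timezone.utc)))
```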
Update a Key
Modify key properties:
curl -X POST 'http://localhost:4000/key/update' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"key": "sk-1234567890abcdef",
"max_budget": 200.0,
"models": ["gpt-3.5-turbo", "gpt-4", "claude-3-opus"],
"rpm": 200
}'
You can only update a key using the master key, not the key itself.
Delete a Key
Revoke a virtual key:
curl -X POST 'http://localhost:4000/key/delete' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"keys": ["sk-1234567890abcdef"]
}'
List All Keys
Get all virtual keys:
curl -X GET 'http://localhost:4000/key/list' \
-H 'Authorization: Bearer sk-1234'
Key Auto-Rotation
Configure automatic key rotation:
curl -X POST 'http://localhost:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"key_alias": "production-key",
"auto_rotate": true,
"rotation_interval": "90d",
"models": ["gpt-3.5-turbo"],
"max_budget": 100.0
}'
The key will automatically rotate every 90 days. The key_alias remains constant while the underlying key changes.
Budget Tracking
Check Spend
Monitor key spending:
curl -X GET 'http://localhost:4000/key/info' \
-H 'Authorization: Bearer sk-1234567890abcdef'
The response includes:
- spend: Current spend
- max_budget: Budget limit
- budget_reset_at: When the budget resets
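A monitoring script can turn these fields into a simple status. The following sketch is a client-side classification only, not the proxy's own enforcement logic; `budget_status` and its return labels are hypothetical.

```python
def budget_status(spend, max_budget, soft_budget=None):
    """Classify spend as 'ok', 'warning' (soft budget reached) or 'exceeded'."""
    if max_budget is not None and spend >= max_budget:
        return "exceeded"
    if soft_budget is not None and spend >= soft_budget:
        return "warning"
    return "ok"


print(budget_status(45.23, 100.0, soft_budget=80.0))
print(budget_status(85.0, 100.0, soft_budget=80.0))
```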
Budget Alerts
Set a soft budget to trigger alerts before the hard limit is reached:
{
"max_budget": 100.0,
"soft_budget": 80.0 // Alert at 80% usage
}
Configure a Slack webhook for alerts in your config:
litellm_settings:
  alerting:
    - slack
  alerting_threshold: 0.8  # Alert at 80% budget
  slack_webhook_url: os.environ/SLACK_WEBHOOK_URL
Model Access Control
Restrict to Specific Models
{
"models": ["gpt-3.5-turbo", "gpt-4"]
}
Requests to other models will be rejected:
{
"error": {
"message": "API key does not have access to model: claude-3-opus",
"type": "invalid_request_error"
}
}
Allow All Models
Omit the models parameter or use null:
{
"models": null // Access to all configured models
}
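The allow-list rule can be mirrored on the client to fail fast before a request is sent. This is a sketch of the rule as described here (null means every configured model), not the proxy's actual implementation; `is_model_allowed` is a hypothetical name.

```python
def is_model_allowed(allowed_models, requested):
    """Return True if the key may call `requested`.

    None stands for "models": null, i.e. access to all configured models.
    """
    if allowed_models is None:
        return True
    return requested in allowed_models


restricted = ["gpt-3.5-turbo", "gpt-4"]
print(is_model_allowed(restricted, "gpt-4"))
print(is_model_allowed(restricted, "claude-3-opus"))
```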
Rate Limiting
Per-Key Rate Limits
{
"rpm": 100, // 100 requests per minute
"tpm": 100000, // 100k tokens per minute
"max_parallel_requests": 10 // Max concurrent requests
}
When a rate limit is exceeded, the request is rejected with an error:
{
"error": {
"message": "Rate limit exceeded. Retry after 60 seconds.",
"type": "rate_limit_error"
}
}
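Clients usually handle this error by retrying with a growing delay. The sketch below shows exponential backoff around an arbitrary callable. `RateLimitError` is a hypothetical exception that your own client code would raise when it sees a rate-limit response, and the delays are illustrative rather than derived from the proxy's message.

```python
import time


class RateLimitError(Exception):
    """Hypothetical exception raised by the caller on a rate-limit response."""


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send(), retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.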
Team Keys
Generate keys associated with teams:
Create a Team
curl -X POST 'http://localhost:4000/team/new' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"team_alias": "engineering",
"max_budget": 1000.0,
"budget_duration": "30d"
}'
Generate Team Key
curl -X POST 'http://localhost:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"team_id": "team-abc-123",
"models": ["gpt-3.5-turbo"],
"max_budget": 100.0
}'
Team keys inherit the team's budget and settings. A max_budget set on the key itself is tracked separately from the team budget.
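With two separate budgets in play, the amount a key can still spend depends on both. The sketch below assumes a request must fit within the key budget and the team budget at the same time, which this page does not state explicitly; `effective_remaining` is a hypothetical helper.

```python
def effective_remaining(key_max, key_spend, team_max, team_spend):
    """Return the spend still available under both limits, or None if unlimited.

    Assumption: whichever budget is closer to exhaustion is the binding one.
    """
    remaining = [
        limit - spend
        for limit, spend in ((key_max, key_spend), (team_max, team_spend))
        if limit is not None
    ]
    if not remaining:
        return None
    return max(0.0, min(remaining))


# Key has 60 left, but the team only has 50 left.
print(effective_remaining(100.0, 40.0, 1000.0, 950.0))
```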
Key Metadata
Attach custom metadata to keys:
{
"metadata": {
"environment": "production",
"service": "backend-api",
"owner": "owner@example.com",
"cost_center": "engineering"
}
}
Use metadata for:
- Cost allocation
- Usage tracking
- Access auditing
- Organizational reporting
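As an illustration of cost allocation, spend can be grouped by a metadata field. The sketch assumes each key record carries the spend and metadata fields shown in the key info response. The sample records are made up, and `spend_by_metadata` is a hypothetical helper.

```python
from collections import defaultdict


def spend_by_metadata(keys, field):
    """Sum spend per value of a metadata field; keys without it go to 'unassigned'."""
    totals = defaultdict(float)
    for record in keys:
        label = (record.get("metadata") or {}).get(field, "unassigned")
        totals[label] += record.get("spend", 0.0)
    return dict(totals)


# Made-up records for illustration only.
records = [
    {"spend": 10.0, "metadata": {"cost_center": "engineering"}},
    {"spend": 5.5, "metadata": {"cost_center": "engineering"}},
    {"spend": 2.0, "metadata": {"cost_center": "research"}},
    {"spend": 1.0, "metadata": None},
]
print(spend_by_metadata(records, "cost_center"))
```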
Security Best Practices
1. Master Key Protection
Never expose the master key in client applications. Use virtual keys instead.
# Store master key securely
export LITELLM_MASTER_KEY=$(cat /secure/path/master_key.txt)
2. Key Rotation
Rotate keys regularly:
# Generate new key
curl -X POST 'http://localhost:4000/key/generate' ...
# Update applications
# Delete old key
curl -X POST 'http://localhost:4000/key/delete' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{"keys": ["old-key"]}'
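The order of these steps matters: the old key should be revoked only after applications have switched to the new one. A minimal sketch of that sequence, with the three operations passed in as callables so it stays independent of any particular client (`rotate_key` and `deploy` are hypothetical names):

```python
def rotate_key(create_key, deploy, delete_key, old_key, **key_params):
    """Create a replacement key, roll it out, then revoke the old key."""
    new_key = create_key(**key_params)["key"]
    deploy(new_key)      # update applications before revoking anything
    delete_key(old_key)  # the old key stops working from here on
    return new_key
```

`create_key` is expected to return the generation response as a dictionary with a "key" field, as in the response shown under Generate a Key.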
3. Principle of Least Privilege
Grant minimum required access:
{
"models": ["gpt-3.5-turbo"], // Only specific model
"max_budget": 10.0, // Low budget
"duration": "7d", // Short expiration
"rpm": 10 // Low rate limit
}
4. Monitor Usage
Regularly audit key usage:
# List all keys
curl -X GET 'http://localhost:4000/key/list' \
-H 'Authorization: Bearer sk-1234'
# Check spend
curl -X GET 'http://localhost:4000/spend/keys' \
-H 'Authorization: Bearer sk-1234'
Programmatic Key Management
import requests


class LiteLLMKeyManager:
    """Thin wrapper around the proxy's key management endpoints."""

    def __init__(self, base_url, master_key):
        self.base_url = base_url
        self.headers = {
            'Authorization': f'Bearer {master_key}',
            'Content-Type': 'application/json'
        }

    def create_key(self, **kwargs):
        # kwargs become the JSON body (models, max_budget, duration, ...)
        response = requests.post(
            f'{self.base_url}/key/generate',
            headers=self.headers,
            json=kwargs
        )
        response.raise_for_status()  # fail loudly on HTTP errors
        return response.json()

    def delete_key(self, key):
        response = requests.post(
            f'{self.base_url}/key/delete',
            headers=self.headers,
            json={'keys': [key]}
        )
        response.raise_for_status()
        return response.json()

    def get_key_info(self, key):
        # Authenticates with the virtual key itself, as in the curl example
        response = requests.get(
            f'{self.base_url}/key/info',
            headers={'Authorization': f'Bearer {key}'}
        )
        response.raise_for_status()
        return response.json()


# Usage
manager = LiteLLMKeyManager(
    base_url='http://localhost:4000',
    master_key='sk-1234'
)

# Create key
key = manager.create_key(
    models=['gpt-3.5-turbo'],
    max_budget=10.0,
    duration='30d'
)
print(f"Created: {key['key']}")

# Get info
info = manager.get_key_info(key['key'])
print(f"Spend: ${info['spend']}")

# Delete key
manager.delete_key(key['key'])