Fishnet provides secure proxying for OpenAI and Anthropic APIs with automatic cost tracking, rate limiting, and prompt guardrails.

Supported Providers

Fishnet proxies requests to:
  • OpenAI (/proxy/openai/*) - ChatGPT, GPT-4, and other OpenAI models
  • Anthropic (/proxy/anthropic/*) - Claude models
Each provider's credentials are stored securely in Fishnet's encrypted vault.

API Base URLs

By default, Fishnet proxies to the official APIs:
  • OpenAI: https://api.openai.com
  • Anthropic: https://api.anthropic.com
You can override these with environment variables (the defaults are shown below; point them at any compatible endpoint):
export FISHNET_OPENAI_API_BASE="https://api.openai.com"
export FISHNET_ANTHROPIC_API_BASE="https://api.anthropic.com"
This is useful for:
  • Custom OpenAI-compatible endpoints
  • Self-hosted models with OpenAI API compatibility
  • Testing with mock servers

Credential Setup

Store your API keys in Fishnet’s encrypted vault:
1. Add OpenAI Credential

fishnet vault set openai
# Enter your OpenAI API key when prompted
The key is stored with the service name openai and automatically attached to requests as:
Authorization: Bearer sk-...
2. Add Anthropic Credential

fishnet vault set anthropic
# Enter your Anthropic API key when prompted
The key is stored with the service name anthropic and attached as:
x-api-key: sk-ant-...
Credentials are encrypted at rest using AES-256-GCM with a vault master key derived from your password.

Making Proxied Requests

OpenAI

Replace https://api.openai.com with http://localhost:8473/proxy/openai (the OpenAI client still expects the /v1 path):
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8473/proxy/openai/v1",
    api_key="dummy"  # Fishnet provides the real key from vault
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

Anthropic

Replace https://api.anthropic.com with http://localhost:8473/proxy/anthropic:
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8473/proxy/anthropic",
    api_key="dummy"  # Fishnet provides the real key from vault
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

Spend Tracking

Fishnet automatically tracks token usage and costs for every LLM request.

Enabling Spend Tracking

[llm]
track_spend = true
daily_budget_usd = 20.0  # Daily limit across all providers
budget_warning_pct = 80   # Alert at 80% of budget
llm.track_spend (boolean, default: true)
  Parse response bodies to extract token usage and calculate costs based on llm.model_pricing.

llm.daily_budget_usd (float, default: 20.0)
  Combined daily budget for OpenAI + Anthropic in USD. Spend is tracked per-day in UTC.

llm.budget_warning_pct (integer, default: 80)
  Alert when daily spend reaches this percentage of the budget (0-100).

How Costs Are Calculated

  1. Parse Usage: Fishnet extracts token counts from response bodies:
    • OpenAI: usage.prompt_tokens and usage.completion_tokens
    • Anthropic: usage.input_tokens and usage.output_tokens
  2. Lookup Pricing: Match the model name to llm.model_pricing configuration
  3. Calculate Cost:
    cost_usd = (input_tokens * input_per_million_usd / 1_000_000) +
               (output_tokens * output_per_million_usd / 1_000_000)
    
  4. Record: Store in SQLite database with service name, date, and cost in micro-USD
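Steps 3 and 4 can be sketched in a few lines of Python. This is an illustrative sketch of the formula above, not Fishnet's actual source:

```python
def calculate_cost_usd(input_tokens, output_tokens,
                       input_per_million_usd, output_per_million_usd):
    """Step 3: convert token counts to USD using per-million-token pricing."""
    return (input_tokens * input_per_million_usd / 1_000_000
            + output_tokens * output_per_million_usd / 1_000_000)

def to_micro_usd(cost_usd):
    """Step 4: integer micro-USD, as recorded in the SQLite database."""
    return round(cost_usd * 1_000_000)

# A gpt-4o call with 1,000 input and 500 output tokens at the default
# $2.50 / $10.00 per-million pricing costs about $0.0075 (7,500 micro-USD).
cost = calculate_cost_usd(1_000, 500, 2.50, 10.0)
```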

Streaming Responses

For streaming requests (stream: true), Fishnet:
  • Automatically adds stream_options: {include_usage: true} for OpenAI chat completions
  • Parses usage events in the Server-Sent Events stream
  • Tracks costs when usage data appears in the stream
Note: Interrupted streams may not include usage data, resulting in untracked costs.
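Picking the usage event out of an SSE stream takes only a few lines of parsing. A minimal sketch (the event shape follows OpenAI's include_usage format; this is not Fishnet's actual parser):

```python
import json

def extract_usage(sse_lines):
    """Return the usage dict from an OpenAI-style SSE stream, or None.

    With stream_options {"include_usage": true}, the last data event
    before [DONE] carries a non-null "usage" object.
    """
    usage = None
    for line in sse_lines:
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        event = json.loads(line[len("data: "):])
        if event.get("usage"):
            usage = event["usage"]
    return usage

stream = [
    'data: {"choices": [{"delta": {"content": "Hi"}}], "usage": null}',
    'data: {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 3}}',
    "data: [DONE]",
]
extract_usage(stream)  # → {'prompt_tokens': 12, 'completion_tokens': 3}
```

If the stream is interrupted before the final event, `extract_usage` returns None, which matches the untracked-cost caveat above.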

Model Pricing

Define per-token costs for accurate spend tracking.

Default Pricing

Fishnet includes built-in pricing for common models:
Model           Input (per 1M tokens)   Output (per 1M tokens)
gpt-4o          $2.50                   $10.00
gpt-4o-mini     $0.15                   $0.60
claude-sonnet   $3.00                   $15.00

Custom Pricing

Add or override model pricing in fishnet.toml:
[llm.model_pricing."gpt-4o"]
input_per_million_usd = 2.50
output_per_million_usd = 10.0

[llm.model_pricing."gpt-4o-mini"]
input_per_million_usd = 0.15
output_per_million_usd = 0.60

[llm.model_pricing."claude-sonnet"]
input_per_million_usd = 3.0
output_per_million_usd = 15.0

[llm.model_pricing."claude-opus"]
input_per_million_usd = 15.0
output_per_million_usd = 75.0
llm.model_pricing.<model> (object)
  Pricing for a specific model. Model names are trimmed and normalized.

llm.model_pricing.<model>.input_per_million_usd (float, required)
  Cost per million input tokens in USD. Must be non-negative and finite.

llm.model_pricing.<model>.output_per_million_usd (float, required)
  Cost per million output tokens in USD. Must be non-negative and finite.

Model Name Matching

Fishnet uses fuzzy matching to handle model versioning:
  1. Exact match: gpt-4o matches gpt-4o
  2. Case-insensitive: GPT-4O matches gpt-4o
  3. Prefix match: gpt-4o-2024-05-13 matches pricing for gpt-4o
  4. Longest match wins: If both gpt-4o and gpt-4o-mini are configured, gpt-4o-mini-2024-07-18 uses the gpt-4o-mini pricing
This allows you to define pricing for model families without listing every version.
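The matching rules above can be sketched as a longest-prefix lookup (an illustrative sketch, not Fishnet's actual matcher):

```python
def match_pricing(model, pricing):
    """Pick the pricing entry for `model` per the rules above:
    case-insensitive prefix match, longest configured key wins.
    An exact match is just the longest possible prefix match."""
    name = model.strip().lower()
    best_key, best_len = None, -1
    for configured in pricing:
        key = configured.strip().lower()
        if name.startswith(key) and len(key) > best_len:
            best_key, best_len = configured, len(key)
    return pricing.get(best_key) if best_key is not None else None

pricing = {
    "gpt-4o": {"input_per_million_usd": 2.50, "output_per_million_usd": 10.0},
    "gpt-4o-mini": {"input_per_million_usd": 0.15, "output_per_million_usd": 0.60},
}
match_pricing("gpt-4o-mini-2024-07-18", pricing)  # longest match: the gpt-4o-mini entry
```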

Rate Limiting

Prevent runaway agent loops from exhausting API quotas.
[llm]
rate_limit_per_minute = 60  # Max 60 requests/minute per provider
llm.rate_limit_per_minute (integer, default: 60)
  Maximum requests per minute per provider (OpenAI and Anthropic have separate limits). Set to 0 to disable rate limiting.
When the rate limit is exceeded:
  • Request is denied with HTTP 429 Too Many Requests
  • Response includes retry_after_seconds field
  • Alert is created if alerts.rate_limit_hit = true
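The per-minute window can be illustrated with a small sliding-window limiter. This is a sketch of the behavior described above (per-provider counters, 0 disables the limit), not Fishnet's internal implementation:

```python
import time
from collections import deque

class PerMinuteLimiter:
    """Allow at most `limit` requests in any rolling 60-second window."""

    def __init__(self, limit):
        self.limit = limit
        self.stamps = deque()   # timestamps of recently allowed requests

    def check(self, now=None):
        """Return (allowed, retry_after_seconds)."""
        if self.limit == 0:     # 0 disables rate limiting
            return True, 0.0
        now = time.monotonic() if now is None else now
        while self.stamps and now - self.stamps[0] >= 60:
            self.stamps.popleft()           # drop requests older than 60 s
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True, 0.0
        # Denied: the proxy would map this to HTTP 429 plus the
        # retry_after_seconds field described above.
        return False, 60 - (now - self.stamps[0])
```

One limiter instance per provider gives OpenAI and Anthropic independent windows.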

Model Restrictions

Restrict which models can be used (e.g., prevent expensive GPT-4 usage in development).
[llm]
allowed_models = ["gpt-4o-mini", "claude-sonnet"]
llm.allowed_models (array[string], default: [])
  Whitelist of allowed model names. Empty array allows all models. Model names are matched case-insensitively.
When a request uses a non-allowed model:
  • Request is denied with HTTP 403 Forbidden
  • Error message: model not in allowlist: <model_name>
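The check itself is simple. A sketch matching the rules above (empty list allows all, case-insensitive matching; the function name is hypothetical):

```python
def check_model_allowed(model, allowed_models):
    """Return (allowed, error_message) per the allowlist rules above."""
    if not allowed_models:                  # empty list allows all models
        return True, None
    if model.lower() in {m.lower() for m in allowed_models}:
        return True, None
    # Denied requests get HTTP 403 Forbidden with this message.
    return False, f"model not in allowlist: {model}"
```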

Budget Alerts

Fishnet automatically creates alerts when spend thresholds are reached.
[llm]
track_spend = true
daily_budget_usd = 20.0
budget_warning_pct = 80

[alerts]
budget_warning = true
budget_exceeded = true

Alert Deduplication

Fishnet only creates one budget alert per day:
  • One budget_warning alert per UTC day
  • One budget_exceeded alert per UTC day (replaces warning)
Alerts are dispatched via configured webhooks.
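The once-per-day rule amounts to a small dedup check keyed by UTC date. An illustrative sketch (`should_alert` and its state dict are hypothetical, not Fishnet's internals):

```python
from datetime import datetime, timezone

def should_alert(kind, sent):
    """Decide whether to emit a budget alert today.

    `sent` maps a UTC date to the strongest alert kind already sent.
    budget_exceeded replaces an earlier budget_warning for the same day;
    nothing further is sent after budget_exceeded.
    """
    today = datetime.now(timezone.utc).date()
    prev = sent.get(today)
    if prev == kind or prev == "budget_exceeded":
        return False
    sent[today] = kind
    return True

sent = {}
should_alert("budget_warning", sent)   # True: first warning today
should_alert("budget_warning", sent)   # False: deduplicated
should_alert("budget_exceeded", sent)  # True: upgrades the warning
```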

Prompt Guardrails

See the main fishnet.toml reference for:
  • Prompt Drift Detection: Alert on unexpected system prompt changes
  • Prompt Size Guard: Prevent excessively large prompts

Viewing Spend Data

Access spend data through:

Dashboard

View daily spend by provider at http://localhost:8473/dashboard

API

Query spend programmatically:
# Get today's spend totals
curl -H "Authorization: Bearer $SESSION_TOKEN" \
  http://localhost:8473/api/spend/today

# Response:
# [
#   {"service": "openai", "cost_usd": 1.2345},
#   {"service": "anthropic", "cost_usd": 0.5678}
# ]
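The per-service totals can be checked against the configured budget client-side. A sketch using the response shape shown above (`budget_status` is a hypothetical helper, not part of Fishnet's API):

```python
def budget_status(entries, daily_budget_usd=20.0, warning_pct=80):
    """Classify today's spend as "ok", "warning", or "exceeded"."""
    total = sum(e["cost_usd"] for e in entries)
    pct = 100 * total / daily_budget_usd
    if pct >= 100:
        return total, "exceeded"
    if pct >= warning_pct:
        return total, "warning"
    return total, "ok"

entries = [
    {"service": "openai", "cost_usd": 1.2345},
    {"service": "anthropic", "cost_usd": 0.5678},
]
budget_status(entries)  # about $1.80 of a $20 budget: "ok"
```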

Audit Log

Every LLM request is logged with cost information:
curl -H "Authorization: Bearer $SESSION_TOKEN" \
  http://localhost:8473/api/audit
Example entry:
{
  "id": "aud_...",
  "intent_type": "api_call",
  "service": "openai",
  "action": "POST /v1/chat/completions",
  "decision": "approved",
  "cost_usd": 0.0015,
  "timestamp": 1735689600
}

Complete Example

[llm]
# Spend tracking
track_spend = true
daily_budget_usd = 50.0
budget_warning_pct = 80

# Rate limiting
rate_limit_per_minute = 100

# Model restrictions
allowed_models = [
  "gpt-4o",
  "gpt-4o-mini",
  "claude-sonnet",
]

# Custom pricing for new models
[llm.model_pricing."gpt-4o"]
input_per_million_usd = 2.50
output_per_million_usd = 10.0

[llm.model_pricing."claude-sonnet-4-20250514"]
input_per_million_usd = 3.0
output_per_million_usd = 15.0

# Prompt guardrails
[llm.prompt_drift]
enabled = true
mode = "alert"

[llm.prompt_size_guard]
enabled = true
max_prompt_tokens = 50000
action = "deny"

# Alerts
[alerts]
budget_warning = true
budget_exceeded = true
rate_limit_hit = true
prompt_drift = true
prompt_size = true
