Fishnet provides secure proxying for OpenAI and Anthropic APIs with automatic cost tracking, rate limiting, and prompt guardrails.

Supported Providers

Fishnet proxies requests to:
  • OpenAI (/proxy/openai/*) - ChatGPT, GPT-4, and other OpenAI models
  • Anthropic (/proxy/anthropic/*) - Claude models
Each provider's credentials are stored securely in Fishnet's encrypted vault.

API Base URLs

By default, Fishnet proxies to the official APIs:
  • OpenAI: https://api.openai.com
  • Anthropic: https://api.anthropic.com
You can override these with environment variables (the defaults are shown below; point them at any compatible endpoint):
export FISHNET_OPENAI_API_BASE="https://api.openai.com"
export FISHNET_ANTHROPIC_API_BASE="https://api.anthropic.com"
This is useful for:
  • Custom OpenAI-compatible endpoints
  • Self-hosted models with OpenAI API compatibility
  • Testing with mock servers

Credential Setup

Store your API keys in Fishnet’s encrypted vault:
1. Add OpenAI Credential

fishnet vault set openai
# Enter your OpenAI API key when prompted
The key is stored with the service name openai and automatically attached to requests as:
Authorization: Bearer sk-...
2. Add Anthropic Credential

fishnet vault set anthropic
# Enter your Anthropic API key when prompted
The key is stored with the service name anthropic and attached as:
x-api-key: sk-ant-...
Credentials are encrypted at rest using AES-256-GCM with a vault master key derived from your password.

Making Proxied Requests

OpenAI

Replace https://api.openai.com with http://localhost:8473/proxy/openai (the OpenAI client still expects the /v1 path):
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8473/proxy/openai/v1",
    api_key="dummy"  # Fishnet provides the real key from vault
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

Anthropic

Replace https://api.anthropic.com with http://localhost:8473/proxy/anthropic:
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8473/proxy/anthropic",
    api_key="dummy"  # Fishnet provides the real key from vault
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

Spend Tracking

Fishnet automatically tracks token usage and costs for every LLM request.

Enabling Spend Tracking

[llm]
track_spend = true
daily_budget_usd = 20.0  # Daily limit across all providers
budget_warning_pct = 80   # Alert at 80% of budget
llm.track_spend (boolean, default: true)
  Parse response bodies to extract token usage and calculate costs based on llm.model_pricing.

llm.daily_budget_usd (float, default: 20.0)
  Combined daily budget for OpenAI + Anthropic in USD. Spend is tracked per-day in UTC.

llm.budget_warning_pct (integer, default: 80)
  Alert when daily spend reaches this percentage of the budget (0-100).

How Costs Are Calculated

  1. Parse Usage: Fishnet extracts token counts from response bodies:
    • OpenAI: usage.prompt_tokens and usage.completion_tokens
    • Anthropic: usage.input_tokens and usage.output_tokens
  2. Lookup Pricing: Match the model name to llm.model_pricing configuration
  3. Calculate Cost:
    cost_usd = (input_tokens * input_per_million_usd / 1_000_000) +
               (output_tokens * output_per_million_usd / 1_000_000)
    
  4. Record: Store in SQLite database with service name, date, and cost in micro-USD
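Steps 3 and 4 can be sketched in a few lines of Python. This is an illustrative sketch of the formula above, not Fishnet's actual source:

```python
def calculate_cost_usd(input_tokens, output_tokens,
                       input_per_million_usd, output_per_million_usd):
    """Step 3: convert token counts to USD using per-million-token pricing."""
    return (input_tokens * input_per_million_usd / 1_000_000
            + output_tokens * output_per_million_usd / 1_000_000)

def to_micro_usd(cost_usd):
    """Step 4: integer micro-USD, as recorded in the SQLite database."""
    return round(cost_usd * 1_000_000)

# A gpt-4o call with 1,000 input and 500 output tokens at the default
# $2.50 / $10.00 per-million pricing costs about $0.0075 (7,500 micro-USD).
cost = calculate_cost_usd(1_000, 500, 2.50, 10.0)
```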

Streaming Responses

For streaming requests (stream: true), Fishnet:
  • Automatically adds stream_options: {include_usage: true} for OpenAI chat completions
  • Parses usage events in the Server-Sent Events stream
  • Tracks costs when usage data appears in the stream
Note: Interrupted streams may not include usage data, resulting in untracked costs.
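Picking the usage event out of an SSE stream takes only a few lines of parsing. A minimal sketch (the event shape follows OpenAI's include_usage format; this is not Fishnet's actual parser):

```python
import json

def extract_usage(sse_lines):
    """Return the usage dict from an OpenAI-style SSE stream, or None.

    With stream_options {"include_usage": true}, the last data event
    before [DONE] carries a non-null "usage" object.
    """
    usage = None
    for line in sse_lines:
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        event = json.loads(line[len("data: "):])
        if event.get("usage"):
            usage = event["usage"]
    return usage

stream = [
    'data: {"choices": [{"delta": {"content": "Hi"}}], "usage": null}',
    'data: {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 3}}',
    "data: [DONE]",
]
extract_usage(stream)  # → {'prompt_tokens': 12, 'completion_tokens': 3}
```

If the stream is interrupted before the final event, `extract_usage` returns None, which matches the untracked-cost caveat above.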

Model Pricing

Define per-token costs for accurate spend tracking.

Default Pricing

Fishnet includes built-in pricing for common models:
Model           Input (per 1M tokens)   Output (per 1M tokens)
gpt-4o          $2.50                   $10.00
gpt-4o-mini     $0.15                   $0.60
claude-sonnet   $3.00                   $15.00

Custom Pricing

Add or override model pricing in fishnet.toml:
[llm.model_pricing."gpt-4o"]
input_per_million_usd = 2.50
output_per_million_usd = 10.0

[llm.model_pricing."gpt-4o-mini"]
input_per_million_usd = 0.15
output_per_million_usd = 0.60

[llm.model_pricing."claude-sonnet"]
input_per_million_usd = 3.0
output_per_million_usd = 15.0

[llm.model_pricing."claude-opus"]
input_per_million_usd = 15.0
output_per_million_usd = 75.0
llm.model_pricing.<model> (object)
  Pricing for a specific model. Model names are trimmed and normalized.

llm.model_pricing.<model>.input_per_million_usd (float, required)
  Cost per million input tokens in USD. Must be non-negative and finite.

llm.model_pricing.<model>.output_per_million_usd (float, required)
  Cost per million output tokens in USD. Must be non-negative and finite.

Model Name Matching

Fishnet uses fuzzy matching to handle model versioning:
  1. Exact match: gpt-4o matches gpt-4o
  2. Case-insensitive: GPT-4O matches gpt-4o
  3. Prefix match: gpt-4o-2024-05-13 matches pricing for gpt-4o
  4. Longest match wins: If both gpt-4o and gpt-4o-mini are configured, gpt-4o-mini-2024-07-18 uses the gpt-4o-mini pricing
This allows you to define pricing for model families without listing every version.
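The matching rules above can be sketched as a longest-prefix lookup (an illustrative sketch, not Fishnet's actual matcher):

```python
def match_pricing(model, pricing):
    """Pick the pricing entry for `model` per the rules above:
    case-insensitive prefix match, longest configured key wins.
    An exact match is just the longest possible prefix match."""
    name = model.strip().lower()
    best_key, best_len = None, -1
    for configured in pricing:
        key = configured.strip().lower()
        if name.startswith(key) and len(key) > best_len:
            best_key, best_len = configured, len(key)
    return pricing.get(best_key) if best_key is not None else None

pricing = {
    "gpt-4o": {"input_per_million_usd": 2.50, "output_per_million_usd": 10.0},
    "gpt-4o-mini": {"input_per_million_usd": 0.15, "output_per_million_usd": 0.60},
}
match_pricing("gpt-4o-mini-2024-07-18", pricing)  # longest match: the gpt-4o-mini entry
```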

Rate Limiting

Prevent runaway agent loops from exhausting API quotas.
[llm]
rate_limit_per_minute = 60  # Max 60 requests/minute per provider
llm.rate_limit_per_minute (integer, default: 60)
  Maximum requests per minute per provider (OpenAI and Anthropic have separate limits). Set to 0 to disable rate limiting.
When the rate limit is exceeded:
  • Request is denied with HTTP 429 Too Many Requests
  • Response includes retry_after_seconds field
  • Alert is created if alerts.rate_limit_hit = true
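The per-minute window can be illustrated with a small sliding-window limiter. This is a sketch of the behavior described above (per-provider counters, 0 disables the limit), not Fishnet's internal implementation:

```python
import time
from collections import deque

class PerMinuteLimiter:
    """Allow at most `limit` requests in any rolling 60-second window."""

    def __init__(self, limit):
        self.limit = limit
        self.stamps = deque()   # timestamps of recently allowed requests

    def check(self, now=None):
        """Return (allowed, retry_after_seconds)."""
        if self.limit == 0:     # 0 disables rate limiting
            return True, 0.0
        now = time.monotonic() if now is None else now
        while self.stamps and now - self.stamps[0] >= 60:
            self.stamps.popleft()           # drop requests older than 60 s
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True, 0.0
        # Denied: the proxy would map this to HTTP 429 plus the
        # retry_after_seconds field described above.
        return False, 60 - (now - self.stamps[0])
```

One limiter instance per provider gives OpenAI and Anthropic independent windows.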

Model Restrictions

Restrict which models can be used (e.g., prevent expensive GPT-4 usage in development).
[llm]
allowed_models = ["gpt-4o-mini", "claude-sonnet"]
llm.allowed_models (array[string], default: [])
  Whitelist of allowed model names. Empty array allows all models. Model names are matched case-insensitively.
When a request uses a non-allowed model:
  • Request is denied with HTTP 403 Forbidden
  • Error message: model not in allowlist: <model_name>
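The check itself is simple. A sketch matching the rules above (empty list allows all, case-insensitive matching; the function name is hypothetical):

```python
def check_model_allowed(model, allowed_models):
    """Return (allowed, error_message) per the allowlist rules above."""
    if not allowed_models:                  # empty list allows all models
        return True, None
    if model.lower() in {m.lower() for m in allowed_models}:
        return True, None
    # Denied requests get HTTP 403 Forbidden with this message.
    return False, f"model not in allowlist: {model}"
```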

Budget Alerts

Fishnet automatically creates alerts when spend thresholds are reached.
[llm]
track_spend = true
daily_budget_usd = 20.0
budget_warning_pct = 80

[alerts]
budget_warning = true
budget_exceeded = true

Alert Deduplication

Fishnet only creates one budget alert per day:
  • One budget_warning alert per UTC day
  • One budget_exceeded alert per UTC day (replaces warning)
Alerts are dispatched via configured webhooks.
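The once-per-day rule amounts to a small dedup check keyed by UTC date. An illustrative sketch (`should_alert` and its state dict are hypothetical, not Fishnet's internals):

```python
from datetime import datetime, timezone

def should_alert(kind, sent):
    """Decide whether to emit a budget alert today.

    `sent` maps a UTC date to the strongest alert kind already sent.
    budget_exceeded replaces an earlier budget_warning for the same day;
    nothing further is sent after budget_exceeded.
    """
    today = datetime.now(timezone.utc).date()
    prev = sent.get(today)
    if prev == kind or prev == "budget_exceeded":
        return False
    sent[today] = kind
    return True

sent = {}
should_alert("budget_warning", sent)   # True: first warning today
should_alert("budget_warning", sent)   # False: deduplicated
should_alert("budget_exceeded", sent)  # True: upgrades the warning
```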

Prompt Guardrails

See the main fishnet.toml reference for:
  • Prompt Drift Detection: Alert on unexpected system prompt changes
  • Prompt Size Guard: Prevent excessively large prompts

Viewing Spend Data

Access spend data through:

Dashboard

View daily spend by provider at http://localhost:8473/dashboard

API

Query spend programmatically:
# Get today's spend totals
curl -H "Authorization: Bearer $SESSION_TOKEN" \
  http://localhost:8473/api/spend/today

# Response:
# [
#   {"service": "openai", "cost_usd": 1.2345},
#   {"service": "anthropic", "cost_usd": 0.5678}
# ]
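The per-service totals can be checked against the configured budget client-side. A sketch using the response shape shown above (`budget_status` is a hypothetical helper, not part of Fishnet's API):

```python
def budget_status(entries, daily_budget_usd=20.0, warning_pct=80):
    """Classify today's spend as "ok", "warning", or "exceeded"."""
    total = sum(e["cost_usd"] for e in entries)
    pct = 100 * total / daily_budget_usd
    if pct >= 100:
        return total, "exceeded"
    if pct >= warning_pct:
        return total, "warning"
    return total, "ok"

entries = [
    {"service": "openai", "cost_usd": 1.2345},
    {"service": "anthropic", "cost_usd": 0.5678},
]
budget_status(entries)  # about $1.80 of a $20 budget: "ok"
```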

Audit Log

Every LLM request is logged with cost information:
curl -H "Authorization: Bearer $SESSION_TOKEN" \
  http://localhost:8473/api/audit
Example entry:
{
  "id": "aud_...",
  "intent_type": "api_call",
  "service": "openai",
  "action": "POST /v1/chat/completions",
  "decision": "approved",
  "cost_usd": 0.0015,
  "timestamp": 1735689600
}

Complete Example

[llm]
# Spend tracking
track_spend = true
daily_budget_usd = 50.0
budget_warning_pct = 80

# Rate limiting
rate_limit_per_minute = 100

# Model restrictions
allowed_models = [
  "gpt-4o",
  "gpt-4o-mini",
  "claude-sonnet",
]

# Custom pricing for new models
[llm.model_pricing."gpt-4o"]
input_per_million_usd = 2.50
output_per_million_usd = 10.0

[llm.model_pricing."claude-sonnet-4-20250514"]
input_per_million_usd = 3.0
output_per_million_usd = 15.0

# Prompt guardrails
[llm.prompt_drift]
enabled = true
mode = "alert"

[llm.prompt_size_guard]
enabled = true
max_prompt_tokens = 50000
action = "deny"

# Alerts
[alerts]
budget_warning = true
budget_exceeded = true
rate_limit_hit = true
prompt_drift = true
prompt_size = true
