Overview
Prompt drift detection protects against prompt injection attacks by hashing system prompts and comparing them to a stored baseline. When drift is detected, Fishnet can alert or block requests based on your configured policy.
How It Works
Baseline Capture
On the first request for each provider, Fishnet:
- Extracts the system prompt from the request body
- Normalizes the text (optional whitespace collapsing)
- Computes a Keccak256 hash of the prompt
- Stores the hash as the baseline for that provider
Source: ~/workspace/source/crates/server/src/llm_guard.rs:265-329
Drift Detection
On subsequent requests:
- The current system prompt is hashed using the same algorithm
- The hash is compared to the stored baseline
- If hashes differ, drift is detected and the configured action is taken
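The capture-then-compare flow above can be sketched as follows. This is a minimal illustration, not Fishnet's implementation: `BaselineStore`, `DriftCheck`, and `hash_prompt` are hypothetical names, the store is in-memory only, and `DefaultHasher` stands in for Keccak256 because the Rust standard library has no Keccak implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Result of checking one request's system prompt against the baseline.
#[derive(Debug, PartialEq)]
enum DriftCheck {
    /// First request for this provider: the hash was stored as the baseline.
    BaselineCaptured,
    /// The hash matches the stored baseline.
    Unchanged,
    /// The hash differs from the stored baseline.
    Drift { baseline: u64, current: u64 },
}

/// Stand-in hash for illustration only (assumption: Fishnet uses Keccak256,
/// which would come from an external crate).
fn hash_prompt(prompt: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    prompt.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical in-memory baseline store, keyed by provider name.
#[derive(Default)]
struct BaselineStore {
    baselines: HashMap<String, u64>,
}

impl BaselineStore {
    /// Captures a baseline on the first call per provider, then compares.
    /// The baseline is left unchanged when drift is found.
    fn check(&mut self, provider: &str, system_prompt: &str) -> DriftCheck {
        let current = hash_prompt(system_prompt);
        match self.baselines.get(provider).copied() {
            None => {
                self.baselines.insert(provider.to_string(), current);
                DriftCheck::BaselineCaptured
            }
            Some(baseline) if baseline == current => DriftCheck::Unchanged,
            Some(baseline) => DriftCheck::Drift { baseline, current },
        }
    }
}
```

Keeping the original baseline after a drift result means a repeated altered prompt is flagged every time, rather than becoming the new reference.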
Supported Providers
- OpenAI: Extracts from messages[].content where role == "system"
- Anthropic: Extracts from the top-level system field
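As a rough sketch of the two extraction rules, the function below works on simplified typed request shapes rather than raw JSON. The types `Message` and `ProviderRequest` are hypothetical, the Anthropic `system` field is assumed to be a plain string, and taking the first system message for OpenAI is an assumption; the source does not say how multiple system messages are handled.

```rust
/// One chat message in an OpenAI-style request body.
struct Message {
    role: String,
    content: String,
}

/// Simplified, hypothetical view of the two supported request shapes.
enum ProviderRequest {
    OpenAi { messages: Vec<Message> },
    Anthropic { system: Option<String> },
}

/// Returns the system prompt, or None if the request does not carry one.
fn extract_system_prompt(request: &ProviderRequest) -> Option<&str> {
    match request {
        // OpenAI: look through messages[] for role == "system".
        // Assumption: only the first system message is used.
        ProviderRequest::OpenAi { messages } => messages
            .iter()
            .find(|m| m.role == "system")
            .map(|m| m.content.as_str()),
        // Anthropic: the prompt lives in the top-level `system` field.
        ProviderRequest::Anthropic { system } => system.as_deref(),
    }
}
```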
Configuration
Add to your fishnet.toml:
Parameters
Enable prompt drift detection
Action when drift is detected:
- "alert": Log and create alert, but allow request
- "deny": Block request immediately
- "ignore": Detect but take no action (logs only)
Number of characters to hash from start of prompt:
- 0: Hash the entire prompt (detects any change)
- >0: Hash only first N characters (ignores changes beyond that point)
Collapse all whitespace to single spaces before hashing. Prevents false positives from formatting changes.
Hash algorithm (currently only "keccak256" is supported)
Detection Modes
Alert Mode (Recommended)
- Requests proceed normally
- Alert is created with AlertType::PromptDrift and severity Critical
- Alert message includes old hash, new hash, and character range
- Webhook notifications are dispatched if configured
Deny Mode (High Security)
- Request is immediately rejected with 403 Forbidden
- Error message: "System prompt drift detected. Request blocked by policy."
- Alert is still created for audit trail
- Repeated injection attempts continue to be blocked
Source: ~/workspace/source/crates/server/src/llm_guard.rs:362-383
Ignore Mode (Monitoring Only)
- Drift is logged to stderr
- No alerts or blocking
- Useful for testing baseline behavior before enforcement
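The three modes differ only in whether the request is rejected and whether an alert is created. The sketch below summarizes that mapping; `DriftMode`, `Decision`, and `on_drift` are hypothetical names used for illustration and are not taken from llm_guard.rs.

```rust
/// The three configured actions described above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DriftMode {
    Alert,
    Deny,
    Ignore,
}

/// What happens to a request once drift has been detected.
#[derive(Debug, PartialEq)]
struct Decision {
    /// HTTP status returned instead of forwarding, if the request is blocked.
    reject_with: Option<u16>,
    /// Whether a PromptDrift alert is created.
    create_alert: bool,
}

/// Maps the configured mode to the behavior listed for each mode.
fn on_drift(mode: DriftMode) -> Decision {
    match mode {
        // Alert: the request proceeds and an alert is created.
        DriftMode::Alert => Decision { reject_with: None, create_alert: true },
        // Deny: the request is rejected with 403 and an alert is still created.
        DriftMode::Deny => Decision { reject_with: Some(403), create_alert: true },
        // Ignore: drift is only logged, with no alert and no blocking.
        DriftMode::Ignore => Decision { reject_with: None, create_alert: false },
    }
}
```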
Baseline Storage
Persistence
Baselines are persisted to disk by default.
Default path: ~/.local/share/fishnet/baselines.json
Baselines survive server restarts. On startup, Fishnet loads existing baselines and continues monitoring.
Source: ~/workspace/source/crates/server/src/llm_guard.rs:44-90
Clearing Baselines
Use the API to reset baselines:
Advanced Usage
Partial Prompt Hashing
Hash only the static prefix of your system prompt:
Verify dynamic changes are ignored
Send requests with different dynamic suffixes. No drift should be detected.
Whitespace Normalization
Prevent false positives from formatting changes:
Alert Integration
Enable webhook notifications for prompt drift:
Alert Payload
Security Considerations
Baseline capture timing: The first request sets the baseline. If an attacker’s request arrives first, their injected prompt becomes the baseline.
Mitigation: Capture baselines during application startup by sending a known-good request from your own code before accepting external traffic.
Implementation Details
Hash Algorithm
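Fishnet uses Keccak256 for prompt hashing. This is the original Keccak variant used by Ethereum; its padding differs from the standardized NIST SHA3-256, so the two produce different digests for the same input.
Source: ~/workspace/source/crates/server/src/llm_guard.rs:256-263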
Normalization Process
Source: ~/workspace/source/crates/server/src/llm_guard.rs:242-254
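A minimal sketch of what normalization could look like, combining whitespace collapsing with the hash_chars prefix limit. The function name `normalize_prompt` and its parameters are hypothetical, and the order of operations (collapse first, then truncate) is an assumption; the referenced source may do it differently. Note that `split_whitespace` also drops leading and trailing whitespace.

```rust
/// Prepares a system prompt for hashing.
///
/// - `collapse_whitespace`: replace every run of whitespace with one space.
/// - `hash_chars`: 0 keeps the whole prompt; N > 0 keeps the first N characters.
fn normalize_prompt(prompt: &str, collapse_whitespace: bool, hash_chars: usize) -> String {
    // Step 1 (assumed to come first): collapse whitespace if enabled.
    let text = if collapse_whitespace {
        prompt.split_whitespace().collect::<Vec<_>>().join(" ")
    } else {
        prompt.to_string()
    };

    // Step 2: keep only the prefix that should be hashed.
    // Counting by `chars()` avoids cutting a multi-byte character in half.
    if hash_chars == 0 {
        text
    } else {
        text.chars().take(hash_chars).collect()
    }
}
```

With a prefix limit in place, two prompts that share the same first N characters normalize to the same string, so their hashes match even when the dynamic suffix differs.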
Example Scenarios
Scenario 1: Detecting Prompt Injection
Baseline prompt:
Scenario 2: Ignoring Dynamic Context
Troubleshooting
Baseline not captured
Symptom: Every request shows “BaselineCaptured” in logs.
Cause: Persistence path is not writable or baselines are being cleared on restart.
Fix:
False positives from formatting
Symptom: Drift detected for identical prompts with different whitespace.
Fix: Enable whitespace normalization:
Drift not detected for suffix changes
Symptom: Changes to the end of the prompt are ignored.
Cause: hash_chars is set too low.
Fix: Increase hash_chars or set to 0 to hash the full prompt.
Next Steps
Endpoint Blocking
Block dangerous API endpoints by pattern
Onchain Permits
Sign EIP-712 permits for blockchain transactions