This guide covers operational best practices for running AIP in production. Whether you’re deploying locally or in Kubernetes, these recommendations will help you harden security, maintain auditability, and ensure high availability.

Security Hardening

Policy File Protection

The policy file is the root of trust for AIP. Compromising it means bypassing all authorization checks.
Critical: AIP automatically protects the policy file from agent access, but you must secure it at the filesystem and deployment level.

Local Deployment

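# Set restrictive permissions (owner read-only)
chmod 400 /etc/aip/agent.yaml
chown root:root /etc/aip/agent.yaml
# If AIP runs as a non-root user (e.g., User=aip in systemd), grant read-only access through its group instead:
#   chown root:aip /etc/aip/agent.yaml && chmod 440 /etc/aip/agent.yaml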

# Verify
ls -la /etc/aip/agent.yaml
# Expected: -r-------- 1 root root 1234 Mar 03 10:00 agent.yaml

Kubernetes Deployment

Store policies in Secrets instead of ConfigMaps. By default, Secrets are only base64-encoded in etcd, so also enable encryption at rest for Secrets on the API server:
apiVersion: v1
kind: Secret
metadata:
  name: agent-policy
  namespace: default
type: Opaque
stringData:
  policy.yaml: |
    apiVersion: aip.io/v1alpha1
    kind: AgentPolicy
    metadata:
      name: production-agent
    spec:
      mode: enforce
      allowed_tools:
        - read_file
Mount as read-only volume:
volumeMounts:
  - name: aip-policy
    mountPath: /etc/aip
    readOnly: true  # Prevent writes
volumes:
  - name: aip-policy
    secret:
      secretName: agent-policy
      defaultMode: 0400  # Read-only for owner

Policy Signing (v1alpha2)

Sign policies to prevent tampering:
# Generate signing key
openssl genpkey -algorithm Ed25519 -out policy-signing-key.pem

# Sign policy
aip sign-policy --key policy-signing-key.pem --policy agent.yaml > agent.signed.yaml
The signed policy includes a cryptographic signature in metadata:
metadata:
  name: production-agent
  signature: "ed25519:YWJjZGVm..."
AIP verifies the signature on load. Unsigned or tampered policies are rejected.
Store signing keys in a secrets manager (AWS Secrets Manager, HashiCorp Vault, Kubernetes Secrets with encryption at rest).

Principle of Least Privilege

Start with minimal permissions and expand only as needed:
1. Deploy with an empty allowlist

spec:
  mode: monitor  # Don't block yet
  allowed_tools: []  # Deny all by default
2. Run for 24-48 hours and collect audit logs

# Identify all tools the agent attempted to use
cat aip-audit.jsonl | jq -r '.tool' | sort | uniq
3. Add only the necessary tools to allowed_tools

spec:
  mode: enforce  # Now block violations
  allowed_tools:
    - read_file
    - list_directory
    - github_get_repo  # Only what's needed
4. Review logs monthly and prune unused tools

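# List tools actually used in the last 30 days (GNU date); any allowed_tools entry missing here is a pruning candidate
since=$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%SZ)
jq -r --arg since "$since" 'select(.timestamp > $since) | .tool' aip-audit.jsonl | sort -u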
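The command above lists the tools that were used. The pruning candidates are the allowlisted tools that do not appear in that list. The following is a minimal Python sketch of that diff. It assumes each audit line is a JSON object with tool and ISO 8601 timestamp fields, as the jq filters imply, and that you pass in the policy's allowed_tools yourself. find_unused_tools is a hypothetical helper, not part of AIP:

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; 'Z' and naive values are treated as UTC."""
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def find_unused_tools(
    audit_lines: Iterable[str],
    allowed_tools: Iterable[str],
    days: int = 30,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return allowlisted tools with no audit entry in the last `days` days."""
    cutoff = (now or datetime.now(timezone.utc)) - timedelta(days=days)
    used = set()
    for line in audit_lines:
        if not line.strip():
            continue  # tolerate blank lines
        entry = json.loads(line)
        if "tool" in entry and "timestamp" in entry and parse_ts(entry["timestamp"]) >= cutoff:
            used.add(entry["tool"])
    return sorted(set(allowed_tools) - used)


sample_log = [
    '{"timestamp": "2026-03-01T12:00:00Z", "tool": "read_file", "decision": "ALLOW"}',
    '{"timestamp": "2026-01-05T09:30:00Z", "tool": "list_directory", "decision": "ALLOW"}',
]
print(find_unused_tools(
    sample_log,
    ["read_file", "list_directory", "github_get_repo"],
    now=datetime(2026, 3, 3, tzinfo=timezone.utc),
))  # ['github_get_repo', 'list_directory']
```

In practice, pass the opened aip-audit.jsonl file object as audit_lines. Treat the result as a list of candidates for human review, not an automatic removal list.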

DLP Pattern Hardening

Use comprehensive DLP patterns to prevent exfiltration:
spec:
  dlp:
    enabled: true
    patterns:
      # API Keys and Tokens
      - name: "AWS Key"
        regex: "(A3T[A-Z0-9]|AKIA|AGPA|AIDA|AROA|AIPA|ANPA|ANVA|ASIA)[A-Z0-9]{16}"
      
      - name: "GitHub Token"
        regex: "gh[pousr]_[a-zA-Z0-9]{36,255}"
      
      - name: "OpenAI API Key"
        regex: "sk-[a-zA-Z0-9]{48}"  # legacy format; newer keys (e.g., sk-proj-...) also contain - and _
      
      - name: "Stripe Key"
        regex: "sk_(test|live)_[a-zA-Z0-9]{24,99}"
      
      # PII
      - name: "Email Address"
        regex: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
      
      - name: "SSN"
        regex: "\\b\\d{3}-\\d{2}-\\d{4}\\b"
      
      - name: "Credit Card"
        regex: "\\b(?:\\d{4}[- ]?){3}\\d{4}\\b"
      
      # Private Keys
      - name: "Private Key"
        regex: "-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
      
      # Internal IP addresses (RFC 1918)
      - name: "Internal IP"
        regex: "\\b(10\\.\\d{1,3}|172\\.(1[6-9]|2[0-9]|3[01])|192\\.168)\\.\\d{1,3}\\.\\d{1,3}\\b"
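DLP patterns must use RE2 syntax (not PCRE), so lookarounds and backreferences are not available. Do not test with grep -P: it uses PCRE and accepts patterns that RE2 rejects. For simple patterns, grep -E is a closer smoke test (it has no \d, so write [0-9]). Validate final patterns with an RE2 implementation before deployment:
echo "AKIAIOSFODNN7EXAMPLE" | grep -E "AKIA[A-Z0-9]{16}"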
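Python's re module is a backtracking engine, not RE2, but its syntax overlaps enough for a pre-deployment smoke test. The sketch below is a hypothetical helper, not an AIP command. It runs each pattern against known positive and negative samples. It also uses a rough textual check to flag constructs that RE2 rejects (lookaround, backreferences, atomic groups, possessive quantifiers). That check can raise false alarms, so confirm with a real RE2 implementation such as Go's regexp package. Patterns here use single backslashes, whereas the YAML double-quoted strings above double them:

```python
import re

# Constructs that PCRE (and Python) accept but RE2 rejects. This is a rough
# textual heuristic: escaped characters can trigger false alarms.
RE2_UNSUPPORTED = [
    (re.compile(r"\(\?<?[=!]"), "lookaround"),
    (re.compile(r"\\[1-9]"), "backreference"),
    (re.compile(r"\(\?>"), "atomic group"),
    (re.compile(r"[*+?}]\+"), "possessive quantifier"),
]


def check_pattern(regex: str, should_match: list[str], should_not_match: list[str]) -> list[str]:
    """Return a list of problems with a DLP pattern; an empty list means it passed."""
    problems = [f"uses {name}" for detector, name in RE2_UNSUPPORTED if detector.search(regex)]
    try:
        compiled = re.compile(regex)
    except re.error as exc:
        return problems + [f"does not compile: {exc}"]
    problems += [f"missed {s!r}" for s in should_match if not compiled.search(s)]
    problems += [f"false positive on {s!r}" for s in should_not_match if compiled.search(s)]
    return problems


AWS_KEY = r"(A3T[A-Z0-9]|AKIA|AGPA|AIDA|AROA|AIPA|ANPA|ANVA|ASIA)[A-Z0-9]{16}"
print(check_pattern(AWS_KEY, ["AKIAIOSFODNN7EXAMPLE"], ["AKIA-not-a-key"]))  # []
print(check_pattern(r"sk-(?=[a-z])\w+", [], []))  # ['uses lookaround']
```

Keeping a few real-looking positives and near-miss negatives per pattern in version control also makes the quarterly DLP pattern updates safer to review.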

Protected Paths

Block access to sensitive files and directories:
spec:
  protected_paths:
    # SSH keys
    - ~/.ssh
    - /root/.ssh
    
    # Cloud credentials
    - ~/.aws/credentials
    - ~/.config/gcloud
    - ~/.azure
    
    # Environment files
    - .env
    - .env.local
    - .env.production
    
    # Package manager credentials
    - ~/.npmrc
    - ~/.pypirc
    - ~/.docker/config.json
    
    # Database credentials
    - /etc/postgresql
    - /var/lib/mysql
AIP automatically blocks any tool argument containing these paths.
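A plain substring check is easy to sidestep with paths such as ~/./.ssh/id_rsa or ../.ssh/config, so path arguments need to be normalized before comparison. The following sketch shows one way to do that. It is a hypothetical illustration, not AIP's matching logic. It assumes POSIX paths, a known home directory and working directory, and that the argument being checked is itself a path rather than free text:

```python
import posixpath


def normalize(path: str, home: str, cwd: str = "/") -> str:
    """Expand a leading ~, resolve relative paths against cwd, collapse . and .."""
    if path == "~" or path.startswith("~/"):
        path = home + path[1:]
    if not path.startswith("/"):
        path = posixpath.join(cwd, path)
    return posixpath.normpath(path)


def is_protected(arg: str, protected_paths: list[str], home: str, cwd: str = "/") -> bool:
    """True if `arg` refers to a protected path or anything beneath it.

    Entries containing '/' (such as ~/.ssh) are treated as path prefixes;
    bare names (such as .env) match that file name in any directory.
    """
    target = normalize(arg, home, cwd)
    for entry in protected_paths:
        if "/" in entry:
            root = normalize(entry, home)
            if target == root or target.startswith(root + "/"):
                return True
        elif entry in target.split("/"):
            return True
    return False


PROTECTED = ["~/.ssh", "/root/.ssh", "~/.aws/credentials", ".env"]
print(is_protected("/home/agent/./.ssh/../.ssh/id_ed25519", PROTECTED, home="/home/agent"))  # True
```

This sketch does not resolve symlinks. A check against the real filesystem would also need os.path.realpath, because a link elsewhere can point into ~/.ssh.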

Policy Management

Versioning and Change Control

Treat policies as infrastructure-as-code:
metadata:
  name: production-agent
  version: "2.1.0"  # Semantic versioning
  owner: "<owner-email>"
spec:
  # ...
Recommended workflow:
  1. Store policies in Git
  2. Require pull requests for changes
  3. Run conformance tests in CI/CD
  4. Tag releases (e.g., v2.1.0)
  5. Deploy via GitOps (ArgoCD, Flux)

Policy Review Cadence

| Frequency | Review Type | Action |
|---|---|---|
| Weekly | Audit log review | Check for unexpected tool usage |
| Monthly | Permission pruning | Remove unused tools from allowed_tools |
| Quarterly | DLP pattern updates | Add new secret patterns (e.g., new API key formats) |
| Annually | Full security audit | External review of policy logic |

Environment-Specific Policies

Use separate policies for dev, staging, and production:
# Development (monitor mode)
policies/dev/agent.yaml

# Staging (enforce with logging)
policies/staging/agent.yaml

# Production (enforce + DLP + rate limits)
policies/production/agent.yaml
Example production policy (stricter than dev):
# policies/production/agent.yaml
apiVersion: aip.io/v1alpha1
kind: AgentPolicy
metadata:
  name: production-agent
  version: "2.0.0"
spec:
  mode: enforce  # Never use monitor in production
  allowed_tools:
    - read_file
    - github_get_repo
  tool_rules:
    - tool: github_create_pull
      action: ask  # Human approval required
      rate_limit: "5/hour"
  dlp:
    enabled: true
    patterns:
      - name: "Production Database URL"
        regex: "postgres://.*@prod-db\\."

Monitoring and Alerting

Audit Log Monitoring

Critical Events to Alert On

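# Alert if >10 blocked requests in the last 5 minutes (GNU date; assumes ISO 8601 UTC timestamps)
since=$(date -u -d '5 minutes ago' +%Y-%m-%dT%H:%M:%SZ)
count=$(jq -c --arg since "$since" 'select(.decision == "BLOCK" and .timestamp > $since)' aip-audit.jsonl | wc -l)
[ "$count" -gt 10 ] && echo "ALERT: $count blocked requests in the last 5 minutes"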
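The shell check above suits a cron job. A stream consumer needs a true sliding window instead. The following is a small Python sketch of one. It is hypothetical and assumes chronologically ordered lines with decision and ISO 8601 timestamp fields. It keeps only the BLOCK timestamps that fall inside the window:

```python
import json
from collections import deque
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; 'Z' and naive values are treated as UTC."""
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def block_bursts(
    audit_lines: Iterable[str],
    threshold: int = 10,
    window: timedelta = timedelta(minutes=5),
) -> Iterator[tuple[datetime, int]]:
    """Yield (timestamp, count) each time more than `threshold` BLOCK decisions
    fall within the trailing `window`. Assumes lines arrive in time order."""
    recent: deque = deque()
    for line in audit_lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("decision") != "BLOCK":
            continue
        ts = parse_ts(entry["timestamp"])
        recent.append(ts)
        while ts - recent[0] > window:
            recent.popleft()  # drop BLOCKs that have aged out of the window
        if len(recent) > threshold:
            yield ts, len(recent)
```

Each yielded tuple is a point where the threshold was exceeded. Route it to whatever paging or chat integration you already use.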

Prometheus Alerting Rules

For Kubernetes deployments:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: aip-alerts
  namespace: aip-system
spec:
  groups:
    - name: aip_policy_violations
      interval: 1m
      rules:
        - alert: AIPHighBlockRate
          expr: rate(aip_policy_decisions{decision="BLOCK"}[5m]) > 0.1
          for: 5m
          annotations:
            summary: "High rate of blocked agent requests"
            description: "Agent {{ $labels.agent }} has {{ $value }} blocked requests/sec"
        
        - alert: AIPDLPTriggered
          expr: increase(aip_dlp_redactions_total[5m]) > 0
          for: 1m
          annotations:
            summary: "DLP pattern matched in agent response"
            description: "Rule {{ $labels.rule }} triggered in agent {{ $labels.agent }}"
        
        - alert: AIPPolicyLoadFailed
          expr: increase(aip_policy_load_errors_total[5m]) > 0
          for: 1m
          annotations:
            summary: "AIP sidecar failed to load policy"
            description: "Check policy syntax and signature"

Centralized Logging

Forward audit logs to a SIEM or log aggregator:

Splunk

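# Forward JSONL logs to Splunk HEC: wrap each line in an {"event": ...} envelope and
# send with --data-binary (plain -d strips the newlines between records)
jq -c '{event: .}' aip-audit.jsonl | \
  curl -X POST https://splunk.company.com:8088/services/collector \
    -H "Authorization: Splunk <HEC-TOKEN>" \
    --data-binary @-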

Elasticsearch

# Bulk import to Elasticsearch
cat aip-audit.jsonl | jq -c '{"index": {"_index": "aip-audit"}}, .' | \
  curl -X POST http://elasticsearch:9200/_bulk \
    -H 'Content-Type: application/x-ndjson' \
    --data-binary @-

AWS CloudWatch

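# Ship logs to CloudWatch (the log group and stream must already exist).
# --log-events takes a JSON array of {timestamp (epoch ms), message}, at most 10,000 events / 1 MB per call;
# this stamps every event with the upload time, and the original timestamp stays inside the message.
jq -s -c '[.[] | {timestamp: (now * 1000 | floor), message: tojson}]' aip-audit.jsonl > events.json
aws logs put-log-events \
  --log-group-name /aip/audit \
  --log-stream-name "$(hostname)" \
  --log-events file://events.json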
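A single put-log-events call accepts at most 10,000 events and 1,048,576 bytes. Each event counts as its UTF-8 message size plus 26 bytes, so larger files must be split across several calls. The following Python sketch uses hypothetical helpers. It converts JSONL lines into events using each entry's own timestamp, then groups them into batches within those limits. It does not enforce CloudWatch's separate rule that one batch may not span more than 24 hours:

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Iterator

MAX_BATCH_BYTES = 1_048_576  # UTF-8 message bytes + 26 bytes per event
MAX_BATCH_EVENTS = 10_000
PER_EVENT_OVERHEAD = 26


def jsonl_to_events(lines: Iterable[str]) -> list[dict]:
    """Turn audit JSONL into {'timestamp': epoch_ms, 'message': str}, sorted by time."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ts = datetime.fromisoformat(json.loads(line)["timestamp"].replace("Z", "+00:00"))
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # assume UTC for naive timestamps
        events.append({"timestamp": round(ts.timestamp() * 1000), "message": line})
    return sorted(events, key=lambda e: e["timestamp"])


def batch_log_events(events: Iterable[dict]) -> Iterator[list[dict]]:
    """Yield batches that respect the PutLogEvents event-count and byte-size limits."""
    batch: list[dict] = []
    size = 0
    for event in events:
        event_size = len(event["message"].encode("utf-8")) + PER_EVENT_OVERHEAD
        if batch and (size + event_size > MAX_BATCH_BYTES or len(batch) == MAX_BATCH_EVENTS):
            yield batch
            batch, size = [], 0
        batch.append(event)
        size += event_size
    if batch:
        yield batch
```

Write each batch to its own file and pass it to aws logs put-log-events --log-events file://..., or pass it as logEvents to boto3's put_log_events.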

Compliance Reporting

Generate compliance reports from audit logs:
# SOC 2: Who accessed what, when?
cat aip-audit.jsonl | jq -r '[.timestamp, .tool, .decision] | @csv'

# GDPR: All actions on customer data
cat aip-audit.jsonl | jq 'select(.args.user_id != null)'

# HIPAA: All access to PHI
cat aip-audit.jsonl | jq 'select(.tool == "read_patient_record")'

High Availability

Local Deployment HA

For critical agents, run AIP in a supervisor that restarts on failure:
# systemd unit file: /etc/systemd/system/aip-proxy.service
[Unit]
Description=AIP Proxy for Production Agent
After=network.target

[Service]
Type=simple
User=aip
ExecStart=/usr/local/bin/aip --target "python /opt/agent/server.py" --policy /etc/aip/agent.yaml
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
Reload systemd, then enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable --now aip-proxy

Kubernetes HA

Sidecars inherit pod-level HA from Kubernetes:
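apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent
spec:
  replicas: 3  # Multiple instances for HA
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent  # must match spec.selector
    spec:
      containers:
        # The agent container itself is omitted here
        - name: aip-proxy
          image: <aip-proxy-image>  # placeholder: use your AIP sidecar image
          livenessProbe:
            httpGet:
              path: /healthz
              port: 9091
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /readyz
              port: 9091
            initialDelaySeconds: 3
            periodSeconds: 5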

Failure Modes

Configure fail-closed behavior:
spec:
  mode: enforce
  failover:
    on_policy_load_error: block  # Block all if policy invalid
    on_regex_error: block        # Block if regex patterns fail to compile
    on_dlp_error: block          # Block if the DLP scanner crashes (allow would trade security for availability)
Never set a failover option to allow in production. Always fail closed so that an internal error cannot become a bypass.
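Failing closed amounts to treating any evaluation error as a denial. The following is a minimal Python sketch of that pattern. The names Decision and fail_closed are hypothetical and are not AIP's API:

```python
import logging
from enum import Enum
from typing import Callable, Mapping


class Decision(Enum):
    ALLOW = "ALLOW"
    BLOCK = "BLOCK"


Evaluator = Callable[[Mapping], Decision]


def fail_closed(evaluate: Evaluator) -> Evaluator:
    """Wrap a policy evaluator so that any internal error yields BLOCK, never ALLOW."""

    def guarded(request: Mapping) -> Decision:
        try:
            return evaluate(request)
        except Exception:
            # Policy, regex, or DLP failure: record it and deny the call.
            logging.exception("policy evaluation failed; blocking request")
            return Decision.BLOCK

    return guarded
```

The wrapper catches Exception rather than BaseException, so KeyboardInterrupt and SystemExit can still stop the process.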

Key Rotation and Credential Management

Policy Signing Key Rotation

1. Generate new signing key

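openssl genpkey -algorithm Ed25519 -out policy-signing-key-v2.pem
# Export the public key that AIP will trust in step 3
openssl pkey -in policy-signing-key-v2.pem -pubout -out key-v2.pub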
2. Re-sign all policies with new key

for policy in policies/*.yaml; do
  aip sign-policy --key policy-signing-key-v2.pem --policy "$policy" > "${policy}.signed"
done
3. Update AIP proxy to trust both keys (grace period)

aip --policy agent.yaml --trusted-keys key-v1.pub,key-v2.pub
4. After 30 days, remove old key

aip --policy agent.yaml --trusted-keys key-v2.pub

MCP Server Credential Rotation

When rotating credentials for MCP servers (e.g., GitHub tokens):
  1. Add the old token value to your DLP patterns so that any leak of it is detected
  2. Rotate the secret in your secrets manager
  3. Restart agents to pick up new credentials
  4. Monitor audit logs for old token usage (should be zero)
# Add the old (revoked) token value to DLP to detect leaks
dlp:
  patterns:
    - name: "GitHub Token (OLD - REVOKED)"
      regex: "ghp_OldTokenPattern123456"

Performance Optimization

Policy Evaluation Latency

AIP policy evaluation is designed to take less than 1 ms per request. To optimize further:

Reduce Regex Complexity

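# Avoid: unanchored and permissive. If patterns match anywhere in the value, this also
# accepts https://evil.example/?next=github.com/x, and the engine must scan the whole value
allow_args:
  url: "(https?://)?(www\\.)?(github\\.com|gitlab\\.com|bitbucket\\.org)/.*"

# Prefer: anchored and specific, so the entire value must have this exact shape
allow_args:
  url: "^https://github\\.com/[a-zA-Z0-9_-]+/[a-zA-Z0-9_-]+$"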
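Whether the first pattern is also a security problem depends on how AIP applies allow_args. This Python sketch assumes search-style matching, where the pattern may match anywhere in the value, as with Python's re.search or Go's regexp MatchString. Under that assumption, the unanchored pattern accepts a URL on another host and the anchored one rejects it:

```python
import re

LOOSE = r"(https?://)?(www\.)?(github\.com|gitlab\.com|bitbucket\.org)/.*"
STRICT = r"^https://github\.com/[a-zA-Z0-9_-]+/[a-zA-Z0-9_-]+$"


def arg_allowed(pattern: str, value: str) -> bool:
    """Search-style check: True if the pattern matches anywhere in the value."""
    return re.search(pattern, value) is not None


for url in ["https://github.com/org/repo", "https://evil.example/?next=github.com/x"]:
    print(url, arg_allowed(LOOSE, url), arg_allowed(STRICT, url))
# https://github.com/org/repo True True
# https://evil.example/?next=github.com/x True False
```

One difference matters if you reuse this check in Python. Python's $ also matches just before a trailing newline, whereas RE2's $ without the m flag matches only at the end of the text. Use \Z in Python for the stricter behavior.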

Cache Policy Evaluation Results

For Kubernetes deployments with high request volume:
sidecar:
  cache:
    enabled: true
    ttl: "5m"  # Cache allow/block decisions
    max_entries: 10000
Only enable caching for deterministic policies (no action: ask, rate_limit, or time-based rules); a cached decision skips those checks.
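To make ttl and max_entries concrete, here is a small Python sketch of a decision cache that combines a TTL with least-recently-used eviction. It is a hypothetical illustration, not the sidecar's implementation. Keying on the tool name plus canonical JSON of the arguments is an assumption:

```python
import json
import time
from collections import OrderedDict
from typing import Callable, Hashable, Optional


def cache_key(tool: str, args: dict) -> tuple[str, str]:
    """Build a key from the tool name and canonical (key-sorted) JSON arguments."""
    return tool, json.dumps(args, sort_keys=True)


class DecisionCache:
    """Allow/block decisions that expire after ttl_seconds, capped at max_entries (LRU)."""

    def __init__(self, ttl_seconds: float = 300.0, max_entries: int = 10_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self.clock = clock
        self._entries: OrderedDict = OrderedDict()  # key -> (expires_at, decision)

    def get(self, key: Hashable) -> Optional[str]:
        item = self._entries.get(key)
        if item is None:
            return None
        expires_at, decision = item
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: force a fresh evaluation
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return decision

    def put(self, key: Hashable, decision: str) -> None:
        self._entries[key] = (self.clock() + self.ttl, decision)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
```

Never put decisions for tools governed by action: ask or a rate_limit into a cache like this.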

DLP Scanning Performance

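DLP scanning can add latency for large responses. Both settings below let data through unscanned: a response larger than max_scan_size, or one that takes longer than timeout to scan, skips DLP. Weigh them against the fail-closed guidance under Failure Modes: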
dlp:
  enabled: true
  max_scan_size: "1MB"  # Skip DLP for responses >1MB
  timeout: "100ms"      # Fail-open if scan takes >100ms

Audit Log Performance

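Rotate logs frequently to keep individual files small:
# Rotate daily: copy, then truncate in place so AIP's open file handle stays valid
# (a process holding the file open keeps writing to it after a plain mv; logrotate's copytruncate works like this)
0 0 * * * cp /var/log/aip/audit.jsonl /var/log/aip/audit-$(date +\%Y\%m\%d).jsonl && : > /var/log/aip/audit.jsonl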
Use async log shipping to avoid blocking requests:
audit:
  async: true  # Write logs in background thread
  buffer_size: 1000  # Buffer up to 1000 entries
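The following is a minimal Python sketch of what async: true with a bounded buffer_size can look like: a background thread drains a bounded queue. The class name is hypothetical. The design choice worth copying is that a full buffer blocks the caller instead of dropping entries, because silently lost audit records would undermine the audit trail:

```python
import json
import queue
import threading
from typing import TextIO


class AsyncAuditWriter:
    """Write audit entries as JSONL on a background thread via a bounded queue."""

    _STOP = object()  # sentinel telling the worker to exit

    def __init__(self, sink: TextIO, buffer_size: int = 1000) -> None:
        self._sink = sink
        self._queue: queue.Queue = queue.Queue(maxsize=buffer_size)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, entry: dict) -> None:
        """Enqueue an entry; blocks (applies backpressure) when the buffer is full."""
        self._queue.put(entry)

    def close(self) -> None:
        """Write everything still queued, then stop the worker."""
        self._queue.put(self._STOP)
        self._worker.join()

    def _drain(self) -> None:
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            self._sink.write(json.dumps(item) + "\n")
            if self._queue.empty():
                self._sink.flush()  # flush whenever the backlog is cleared
        self._sink.flush()
```

Call close() during shutdown. Otherwise, entries still in the queue are lost when the daemon thread exits with the process.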

Disaster Recovery

Backup Policies

Store policy backups in version control and object storage:
# Automated daily backup (upload only if the export succeeds)
0 2 * * * kubectl get agentpolicies -A -o yaml > /backups/aip-policies-$(date +\%Y\%m\%d).yaml && aws s3 cp /backups/aip-policies-$(date +\%Y\%m\%d).yaml s3://backups/aip/

Audit Log Retention

Comply with regulatory requirements:
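| Regulation | Minimum Retention | Recommended Storage |
|---|---|---|
| SOC 2 | 1 year | S3 Glacier |
| GDPR | No fixed minimum (keep only as long as necessary) | Encrypted EBS/S3 |
| HIPAA | 6 years | WORM storage (AWS S3 Object Lock) |
| SOX | 7 years | Immutable storage |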
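Example S3 lifecycle policy (moves logs to Glacier after 90 days and to Deep Archive after one year, then deletes them after 2,555 days, about 7 years):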
{
  "Rules": [
    {
      "ID": "archive-aip-logs",
      "Filter": {"Prefix": ""},
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {
        "Days": 2555
      }
    }
  ]
}

Recovery Testing

Quarterly DR drills:
  1. Delete all policies in staging environment
  2. Restore from backup within SLA (e.g., 15 minutes)
  3. Verify audit logs are intact and queryable
  4. Test agent functionality after restore

Security Incident Response

Suspected Policy Bypass

1. Immediately switch to monitor mode

kubectl patch agentpolicy production-agent --type=merge -p '{"spec":{"mode":"monitor"}}'
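This logs all requests without blocking, which preserves forensic evidence. It also disables enforcement entirely. If the agent can reach destructive tools, consider pausing it instead (for example, kubectl scale deployment agent --replicas=0).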
2. Export audit logs for analysis

# With -l, kubectl logs returns only the last 10 lines per pod unless --tail=-1 is set
kubectl logs -l app=agent -c aip-proxy --since=24h --tail=-1 > incident-logs.jsonl
3. Identify the bypass vector

# Look for unexpected tools being allowed
cat incident-logs.jsonl | jq 'select(.decision == "ALLOW" and (.tool | IN("exec_command", "delete_file")))'
4. Patch the policy and redeploy

# Add explicit block rule
kubectl edit agentpolicy production-agent
# Switch back to enforce mode
kubectl patch agentpolicy production-agent --type=merge -p '{"spec":{"mode":"enforce"}}'

DLP Leak Detection

If DLP detects a secret in agent output:
  1. Revoke the secret immediately (GitHub token, API key, etc.)
  2. Audit all requests from that agent in the past 7 days
  3. Check for exfiltration (did the secret appear in logs, external services?)
  4. Root cause analysis: Why did the agent access the secret?

Operational Checklists

Pre-Deployment Checklist

  • Policy file has restrictive permissions (chmod 400)
  • Policy is signed with Ed25519 key
  • DLP patterns cover all known secret formats
  • Audit logging is enabled and forwarded to SIEM
  • Alerts configured for policy violations and DLP triggers
  • mode: enforce is set (not monitor)
  • Protected paths include SSH keys, cloud credentials, .env files
  • Backup and recovery procedures tested

Monthly Review Checklist

  • Review blocked requests for false positives
  • Prune unused tools from allowed_tools
  • Update DLP patterns for new secret formats
  • Verify audit log retention meets compliance requirements
  • Check for policy drift (prod vs staging)
  • Review and rotate policy signing keys if >90 days old

Quarterly Security Audit Checklist

  • Penetration test: Attempt policy bypass
  • Review all action: ask approvals (were they legitimate?)
  • Analyze DLP redaction patterns (new threat vectors?)
  • Validate NetworkPolicy rules still align with AIP policy
  • Disaster recovery drill (restore policies and logs from backup)
  • External security review of policy logic

Next Steps

  • Local Deployment: Run AIP on your local machine
  • Kubernetes Deployment: Deploy AIP in Kubernetes with the sidecar pattern
  • Policy Reference: Complete policy schema and examples
  • Security Policy: Report vulnerabilities responsibly
