Self-Hosting Guide

Overview

Grip AI is designed for safe self-hosting with built-in security features. This guide covers production deployment best practices, security considerations, and common configurations.

Security Architecture

Grip implements multiple layers of security:

1. Non-Root User

The official Dockerfile creates and runs as a non-root user by default:

# User: grip (UID 1000, GID 1000)
USER grip

Running as a non-root user limits the impact of privilege escalation attempts and reduces the blast radius if the container is compromised.

Never run Grip as root in production. The container is designed to run as UID 1000.

2. Directory Trust Model

Grip restricts file access by default:

Workspace First: Agent can always access its workspace directory
Explicit Consent: External directories require explicit trust via /trust <path>
Persistent Decisions: Trust settings saved in workspace/state/trusted_dirs.json

Configure trust mode:

# Prompt before accessing new directories (default, safest)
GRIP_TOOLS__TRUST_MODE="prompt"

# Allow any directory the OS user can access (not recommended for production)
GRIP_TOOLS__TRUST_MODE="trust_all"

# Restrict to workspace only (most restrictive)
GRIP_TOOLS__TRUST_MODE="workspace_only"
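
The three trust modes can be pictured as a path check. The following is a minimal sketch of that idea, not Grip's actual implementation: the function name `is_path_allowed` and its parameters are hypothetical, and in `prompt` mode a `False` result is assumed to mean "ask the user first".

```python
from pathlib import Path


def is_path_allowed(target: str, workspace: str, trusted_dirs: list[str],
                    mode: str = "prompt") -> bool:
    """Return True if `target` may be accessed without asking the user.

    Hypothetical helper illustrating the trust modes described above.
    """
    if mode == "trust_all":
        return True

    # Resolve ".." segments and symlinks so "workspace/../etc" cannot slip through.
    resolved = Path(target).resolve()
    roots = [Path(workspace).resolve()]

    # In "prompt" mode, previously trusted directories are also allowed.
    # In "workspace_only" mode, only the workspace root counts.
    if mode == "prompt":
        roots += [Path(d).resolve() for d in trusted_dirs]

    return any(resolved == root or root in resolved.parents for root in roots)
```

For example, `is_path_allowed("/srv/ws/../etc/passwd", "/srv/ws", [])` returns `False` because the resolved path falls outside the workspace.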

3. Shell Command Deny-List

Every shell command is scanned against 50+ dangerous patterns before execution:

Destructive commands: rm -rf /, mkfs, dd if=/dev/zero
System control: shutdown, reboot, systemctl poweroff
Credential exfiltration: cat ~/.ssh/id_rsa, cat .env
Remote code injection: curl | bash, wget -O - | sh

This blocks common accidental or malicious commands before they reach the shell.
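
To illustrate how pattern-based scanning of this kind can work, here is a small sketch covering the example commands listed above. The patterns, category labels, and the `find_violation` function are illustrative assumptions and are not Grip's real deny-list.

```python
import re
from typing import Optional

# Illustrative patterns only; the real deny-list is larger and may differ.
DENY_PATTERNS = [
    ("destructive", re.compile(r"\brm\s+-\w*r\w*\s+/(?:\s|$)")),
    ("destructive", re.compile(r"\bmkfs(\.\w+)?\b")),
    ("destructive", re.compile(r"\bdd\s+.*if=/dev/zero")),
    ("system control", re.compile(r"\b(shutdown|reboot)\b")),
    ("system control", re.compile(r"\bsystemctl\s+poweroff\b")),
    ("credential exfiltration", re.compile(r"\bcat\s+\S*\.ssh/id_\w+")),
    ("credential exfiltration", re.compile(r"\bcat\s+\S*\.env\b")),
    ("remote code injection", re.compile(r"\b(curl|wget)\b[^|]*\|\s*(ba)?sh\b")),
]


def find_violation(command: str) -> Optional[str]:
    """Return the category of the first matching pattern, or None if clean."""
    for category, pattern in DENY_PATTERNS:
        if pattern.search(command):
            return category
    return None
```

A deny-list of this shape catches known-bad command forms but can be bypassed by obfuscation, so it complements the non-root user and trust model rather than replacing them.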

4. Shield Policy (Runtime Threat Feed)

The agent’s system prompt includes a SHIELD.md policy that evaluates actions against active threats:

Scopes: prompt, skill.install, tool.call, network.egress, secrets.read, mcp
Actions: block, require_approval, log
Confidence threshold: >= 0.85 for enforcement

Shield policy is stored at workspace/SHIELD.md and can be customized per deployment.
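
The interaction between scopes, actions, and the confidence threshold might look like the sketch below. This is an assumption-heavy illustration: the `ThreatMatch` type, the `decide` function, the `"allow"` default, and the choice to downgrade low-confidence matches to `log` are all hypothetical and not taken from Grip's source.

```python
from dataclasses import dataclass

ENFORCEMENT_THRESHOLD = 0.85  # from the policy description above
SEVERITY = {"log": 0, "require_approval": 1, "block": 2}


@dataclass(frozen=True)
class ThreatMatch:
    scope: str         # e.g. "tool.call", "network.egress"
    action: str        # "block", "require_approval", or "log"
    confidence: float  # 0.0 to 1.0


def decide(scope: str, matches: list[ThreatMatch]) -> str:
    """Pick the strictest applicable action for `scope`."""
    effective = []
    for match in matches:
        if match.scope != scope:
            continue
        # Assumption: matches below the threshold are recorded but not enforced.
        enforced = match.confidence >= ENFORCEMENT_THRESHOLD
        effective.append(match.action if enforced else "log")
    if not effective:
        return "allow"  # hypothetical default, not named in the source
    return max(effective, key=SEVERITY.__getitem__)
```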

5. Credential Scrubbing

Tool outputs are automatically redacted before storage:

sk-... API keys (OpenAI/Anthropic)
ghp_... GitHub tokens
xoxb-... Slack tokens
Bearer <token> headers
password=... parameters

This prevents credential leakage in logs and session history.
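
Redaction along these lines can be sketched with a handful of regular expressions. The patterns and the `scrub` function below are assumptions for illustration; Grip's actual rules may match different token shapes.

```python
import re

# (pattern, replacement) pairs. Bearer headers are handled first so that a key
# carried inside a header is redacted once rather than twice.
REDACTIONS = [
    (re.compile(r"\bBearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    (re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"), "sk-[REDACTED]"),
    (re.compile(r"\bghp_[A-Za-z0-9]{8,}"), "ghp_[REDACTED]"),
    (re.compile(r"\bxoxb-[A-Za-z0-9-]{8,}"), "xoxb-[REDACTED]"),
    (re.compile(r"\b(password=)[^\s&]+", re.IGNORECASE), r"\1[REDACTED]"),
]


def scrub(text: str) -> str:
    """Replace credential-looking substrings before text is stored or logged."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text
```

Keeping the prefix (`sk-`, `ghp_`, and so on) in the replacement makes it clear in logs which kind of secret was removed without revealing any of its value.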

6. API Security

The REST API includes multiple security layers:

# Bearer token authentication
GRIP_GATEWAY__API__AUTH_TOKEN="grip_your_secret_token"

# Rate limiting (per-IP and per-token)
GRIP_GATEWAY__API__RATE_LIMIT_PER_MINUTE=60
GRIP_GATEWAY__API__RATE_LIMIT_PER_MINUTE_PER_IP=30

# Request size limit (1MB default)
GRIP_GATEWAY__API__MAX_REQUEST_BODY_BYTES=1048576

# Disable direct tool execution (disabled by default)
GRIP_GATEWAY__API__ENABLE_TOOL_EXECUTE=false

Security headers are automatically set:

X-Content-Type-Options: nosniff
X-Frame-Options: DENY
Content-Security-Policy: default-src 'self'
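
To show how the per-token and per-IP limits could combine, here is a sliding-window limiter sketch. The class and function names are hypothetical and the algorithm is an assumption; the source does not state which rate-limiting strategy Grip uses.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float = 60.0, clock=time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True


# Mirrors the example values above: 60 per minute per token, 30 per minute per IP.
token_limiter = SlidingWindowLimiter(limit=60)
ip_limiter = SlidingWindowLimiter(limit=30)


def allow_request(token: str, ip: str) -> bool:
    """A request must pass both limits; the IP check runs first."""
    return ip_limiter.allow(ip) and token_limiter.allow(token)
```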

Production Deployment

Recommended Docker Configuration

version: '3.8'

services:
  grip:
    image: grip:latest
    container_name: grip-production
    restart: unless-stopped
    user: "1000:1000"
    
    # Network
    ports:
      - "127.0.0.1:18800:18800"
    
    # Environment
    environment:
      # Engine
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - GRIP_AGENTS__DEFAULTS__ENGINE=claude_sdk
      - GRIP_AGENTS__DEFAULTS__SDK_MODEL=claude-sonnet-4-6
      
      # Channels
      - GRIP_CHANNELS__TELEGRAM__ENABLED=true
      - GRIP_CHANNELS__TELEGRAM__TOKEN=${TELEGRAM_BOT_TOKEN}
      - GRIP_CHANNELS__TELEGRAM__ALLOW_FROM=${TELEGRAM_ALLOWED_USERS}
      
      # Gateway
      - GRIP_GATEWAY__HOST=0.0.0.0
      - GRIP_GATEWAY__PORT=18800
      - GRIP_GATEWAY__API__AUTH_TOKEN=${API_AUTH_TOKEN}
      - GRIP_GATEWAY__API__RATE_LIMIT_PER_MINUTE=60
      - GRIP_GATEWAY__API__RATE_LIMIT_PER_MINUTE_PER_IP=30
      - GRIP_GATEWAY__API__ENABLE_TOOL_EXECUTE=false
      
      # Security
      - GRIP_TOOLS__TRUST_MODE=prompt
      - GRIP_TOOLS__SHELL_TIMEOUT=60
    
    # Volumes
    volumes:
      - grip-data:/home/grip/.grip
    
    # Resource limits
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 2G
        reservations:
          cpus: '1.0'
          memory: 1G
    
    # Health check
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:18800/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    
    # Logging
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

volumes:
  grip-data:
    driver: local
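
The `API_AUTH_TOKEN` value referenced in the configuration above should be long and random. One way to generate such a value is sketched below; the `grip_` prefix follows the example token shown earlier, and whether Grip requires that prefix is an assumption.

```python
import secrets


def generate_auth_token(prefix: str = "grip_", nbytes: int = 32) -> str:
    """Return a URL-safe random token with 256 bits of entropy by default."""
    return prefix + secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    # Print a token suitable for pasting into a .env file next to the compose file.
    print(f"API_AUTH_TOKEN={generate_auth_token()}")
```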

Network Configuration

Local Access Only

Bind to localhost for local-only access:

# Only accessible from the host machine
GRIP_GATEWAY__HOST="127.0.0.1"

# Or in docker-compose ports
ports:
  - "127.0.0.1:18800:18800"

External Access with Reverse Proxy

For external access, use a reverse proxy (nginx, Caddy, Traefik) with HTTPS:

# Rate limiting zone (limit_req_zone is only valid in the http context, outside server)
limit_req_zone $binary_remote_addr zone=grip_limit:10m rate=10r/s;

server {
    listen 443 ssl http2;
    server_name grip.example.com;
    
    ssl_certificate /etc/letsencrypt/live/grip.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/grip.example.com/privkey.pem;
    
    # Security headers
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-Frame-Options "DENY" always;
    
    # Rate limiting (uses the zone declared above)
    limit_req zone=grip_limit burst=20 nodelay;
    
    location / {
        proxy_pass http://127.0.0.1:18800;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        
        # Long-lived connections (WebSocket upgrade headers; buffering disabled for SSE)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_buffering off;
        proxy_read_timeout 86400;
    }
}

Never expose Grip directly to the internet without a reverse proxy. Always use HTTPS and rate limiting.

Resource Management

Memory Limits

Grip’s memory usage scales with session count and context size:

Minimum: 512MB (basic operation)
Recommended: 1-2GB (production with caching)
Large deployments: 4GB+ (many concurrent sessions)

deploy:
  resources:
    limits:
      memory: 2G
    reservations:
      memory: 1G

CPU Limits

CPU usage spikes during:

LLM API calls (minimal, mostly network I/O)
Message consolidation (LLM-based summarization)
Tool execution (especially shell commands)

deploy:
  resources:
    limits:
      cpus: '2.0'
    reservations:
      cpus: '0.5'

Disk Space

Monitor these directories:

| Path | Typical Size | Notes |
|------|--------------|-------|
| `~/.grip/sessions/` | 10-100MB | Session history (JSON files) |
| `~/.grip/workspace/` | Varies | Agent workspace files |
| `~/.grip/logs/` | 50-200MB | Application logs (if enabled) |
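
A small script can report how much space each of these directories uses. This is a sketch under the assumption that the data lives under `~/.grip` as listed above; the function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def dir_size_bytes(path) -> int:
    """Sum the sizes of regular files under `path` (0 if it does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):  # skip symlinks to avoid double counting
                total += os.path.getsize(file_path)
    return total


def grip_disk_report(base: Optional[Path] = None,
                     subdirs=("sessions", "workspace", "logs")) -> dict:
    """Return {subdirectory name: size in bytes} for the monitored paths."""
    base = base if base is not None else Path.home() / ".grip"
    return {name: dir_size_bytes(base / name) for name in subdirs}


if __name__ == "__main__":
    for name, size in grip_disk_report().items():
        print(f"{name:<10} {size / 1_000_000:8.1f} MB")
```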

Implement log rotation:

logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"

Monitoring & Observability

Health Checks

Grip provides two health endpoints:

# Public health check (no auth, for load balancers)
curl http://localhost:18800/health
# Response: {"status": "ok"}

# Authenticated health check (with version and uptime)
curl -H "Authorization: Bearer grip_your_token" \
  http://localhost:18800/api/v1/health
# Response: {"status": "ok", "version": "0.1.0", "uptime": 3600}

Metrics

Query runtime metrics:

curl -H "Authorization: Bearer grip_your_token" \
  http://localhost:18800/api/v1/metrics

Returns:

Request counts
Token usage
Session counts
Tool execution stats
Error rates

Logging

Grip logs to stdout/stderr by default. View and filter the logs with Docker:

# View logs
docker logs grip-production

# Follow logs
docker logs -f grip-production

# Filter by level
docker logs grip-production 2>&1 | grep ERROR

OpenTelemetry (Optional)

Enable tracing for observability:

# Install with observability extra
uv sync --extra observe

# Configure OTEL endpoint
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

Backup & Recovery

What to Back Up

# Configuration
~/.grip/config.json

# Session data
~/.grip/sessions/

# Workspace files
~/.grip/workspace/

# Trust decisions
~/.grip/workspace/state/trusted_dirs.json

# Memory & history
~/.grip/workspace/MEMORY.md
~/.grip/workspace/HISTORY.md

Backup Script

backup.sh

#!/bin/bash
set -euo pipefail

STAMP="$(date +%Y%m%d-%H%M%S)"
BACKUP_ROOT="/backups/grip"
# Adjust if Compose prefixed the volume with a project name (check `docker volume ls`)
VOLUME_DATA="/var/lib/docker/volumes/grip-data/_data"

mkdir -p "$BACKUP_ROOT"

# Stop container gracefully; restart it on exit even if the backup fails
docker stop grip-production
trap 'docker start grip-production' EXIT

# Archive the volume contents directly (paths inside the archive are relative)
sudo tar -czf "$BACKUP_ROOT/$STAMP.tar.gz" -C "$VOLUME_DATA" .

echo "Backup complete: $BACKUP_ROOT/$STAMP.tar.gz"

Restore

# Stop container
docker stop grip-production

# Clear the volume, then extract the backup into it
sudo rm -rf /var/lib/docker/volumes/grip-data/_data
sudo mkdir -p /var/lib/docker/volumes/grip-data/_data
sudo tar -xzf /backups/grip/20260228-120000.tar.gz \
  -C /var/lib/docker/volumes/grip-data/_data

# Fix permissions
sudo chown -R 1000:1000 /var/lib/docker/volumes/grip-data/_data

# Start container
docker start grip-production

Updates

Update Strategy

1. Pull the latest image:
```
docker pull grip:latest
```
2. Stop the current container:
```
docker stop grip-production
```
3. Back up data (see Backup & Recovery above)
4. Start the new container:
```
docker-compose up -d
```

5. Verify health:

docker logs grip-production
curl http://localhost:18800/health

Rolling Updates

For zero-downtime updates, run multiple instances behind a load balancer:

docker-compose.yml

services:
  grip-1:
    # ... config ...
  grip-2:
    # ... config ...
  
  nginx:
    image: nginx:alpine
    ports:
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - grip-1
      - grip-2

Update one instance at a time:

docker-compose up -d --no-deps grip-1
# Wait for health check
docker-compose up -d --no-deps grip-2
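
The "wait for health check" step between the two updates can be automated by polling the public `/health` endpoint until it reports `{"status": "ok"}`. The sketch below assumes that response shape from the Health Checks section; the function names and the per-instance URL are hypothetical.

```python
import http.client
import json
import time
import urllib.error
import urllib.request
from typing import Optional


def fetch_status(url: str, timeout: float = 5.0) -> Optional[str]:
    """Return the "status" field of the health response, or None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.load(resp)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None
    return body.get("status") if isinstance(body, dict) else None


def wait_until_healthy(url: str, attempts: int = 30, delay: float = 2.0,
                       fetch=fetch_status, sleep=time.sleep) -> bool:
    """Poll `url` until it reports "ok" or the attempts run out."""
    for attempt in range(attempts):
        if fetch(url) == "ok":
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    # Hypothetical address of the first instance; adjust to your port mapping.
    healthy = wait_until_healthy("http://localhost:18800/health")
    raise SystemExit(0 if healthy else 1)
```

Run it after updating `grip-1` and proceed to `grip-2` only when it exits with status 0.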

Troubleshooting

Container Won’t Start

# Check logs
docker logs grip-production

# Verify environment variables
docker exec grip-production env | grep GRIP_

# Validate config
docker exec grip-production grip config show

Permission Errors

# Fix volume permissions
sudo chown -R 1000:1000 ~/.grip

# Or in volume
sudo chown -R 1000:1000 /var/lib/docker/volumes/grip-data/_data

API Not Responding

# Check if service is listening
docker exec grip-production netstat -tlnp | grep 18800

# Test health endpoint
docker exec grip-production curl http://localhost:18800/health

# Check firewall
sudo ufw status
sudo iptables -L

High Memory Usage

# Check session count
curl -H "Authorization: Bearer grip_your_token" \
  http://localhost:18800/api/v1/sessions | jq 'length'

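# Clear all stored sessions (destructive; back up first).
# sh -c makes the glob expand inside the container instead of on the host.
docker exec grip-production sh -c 'rm -rf /home/grip/.grip/sessions/*'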

# Restart container
docker restart grip-production
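
If deleting every session is too aggressive, a gentler option is to remove only sessions that have not been modified recently. The sketch below assumes sessions are stored as `*.json` files directly inside the sessions directory, as the Disk Space table suggests; the function name and the 30-day default are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def prune_sessions(sessions_dir, max_age_days: float = 30,
                   now: Optional[float] = None, dry_run: bool = True) -> list:
    """Return session files older than `max_age_days`; delete them unless dry_run."""
    current = now if now is not None else time.time()
    cutoff = current - max_age_days * 86400
    stale = sorted(
        p for p in Path(sessions_dir).glob("*.json")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale


if __name__ == "__main__":
    # Dry run by default: prints what would be removed without deleting anything.
    for path in prune_sessions(Path.home() / ".grip" / "sessions"):
        print(path)
```

Run it with `dry_run=True` first to review the list, then stop the container before deleting so that no session file is removed while in use.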

Security Checklist

Review this checklist before deploying to production:

Next Steps

Configure Docker deployment with volumes and ports
Review environment variables reference
Configure Docker deployment for containerized environments
Explore API endpoints for integration
