System Overview

Grip AI is built around a gateway-centric architecture that orchestrates multiple components through a central message bus. The gateway runs as a single long-lived process (grip gateway) that coordinates:
  • REST API (FastAPI) — 27 endpoints with bearer auth and rate limiting
  • Chat Channels (Telegram, Discord, Slack) — Multi-platform bot integration
  • Message Bus (asyncio.Queue) — Decouples channels from the engine
  • Pluggable Engine (SDKRunner | LiteLLMRunner) — Dual-engine for Claude + 15 providers
  • Tool Registry — 26 built-in tools across 16 modules
  • MCP Manager — Model Context Protocol server integration
  • Memory Layer — Dual-layer memory with TF-IDF retrieval
  • Session Manager — Per-key JSON files with LRU cache
  • Cron Service — Scheduled task execution with channel delivery
  • Heartbeat Service — Periodic autonomous agent wake-up
  • Workflow Engine — Multi-agent DAG orchestration

Architecture Diagram

grip gateway
├── REST API (FastAPI :18800)          27 endpoints, bearer auth, rate limiting
│   ├── /api/v1/chat                   blocking + SSE streaming
│   ├── /api/v1/sessions               CRUD
│   ├── /api/v1/tools                  list + execute
│   ├── /api/v1/mcp                    server management + OAuth
│   └── /api/v1/management             config, cron, skills, memory, metrics, workflows
├── Channels
│   ├── Telegram                       bot commands, photos, docs, voice
│   ├── Discord                        discord.py integration
│   └── Slack                          Socket Mode (slack-sdk)
├── Message Bus                        asyncio.Queue decoupling channels ↔ engine
├── Engine (pluggable)
│   ├── SDKRunner (claude_sdk)         Claude Agent SDK — full agentic loop
│   └── LiteLLMRunner (litellm)        any model via LiteLLM + grip's AgentLoop
├── Tool Registry                      26 tools across 16 modules
│   ├── filesystem                     read/write/edit/append/list/delete/trash
│   ├── shell                          exec with 50+ pattern deny-list
│   ├── web                            web_search + web_fetch
│   ├── research                       deep web_research
│   ├── message                        send_message + send_file
│   ├── spawn                          subagent spawn/check/list
│   ├── todo                           todo_write + todo_read (task tracking)
│   ├── workflow                       multi-agent DAG execution
│   ├── scheduler                      cron scheduling
│   ├── finance                        yfinance (optional)
│   └── mcp                            MCP tool proxy
├── MCP Manager                        stdio + HTTP/SSE servers, OAuth 2.0 + PKCE
├── Memory
│   ├── MEMORY.md                      durable facts (TF-IDF search, Jaccard dedup)
│   ├── HISTORY.md                     timestamped summaries (time-decay search)
│   ├── SemanticCache                  SHA-256 keyed response cache with TTL
│   └── KnowledgeBase                  structured typed facts
├── Session Manager                    per-key JSON files, LRU cache (200)
├── Cron Service                       croniter schedules, channel delivery
├── Heartbeat Service                  periodic autonomous agent wake-up
└── Workflow Engine                    DAG execution with topological parallelism

Dual-Engine Architecture

Grip uses a pluggable engine design with two implementations:

SDKRunner (claude_sdk)

The Claude Agent SDK is the recommended engine for Anthropic’s Claude models. It provides:
  • Full agentic loop — SDK handles tool execution, context management, and iteration
  • Native Claude support — Optimized for Claude 3.5 Sonnet, Claude 4 Sonnet, Claude Opus
  • Streaming responses — Real-time token streaming for low-latency UX
  • Automatic retries — Built-in retry logic with exponential backoff
  • Context window management — SDK handles message compaction and summarization
Configuration:
{
  "agents": {
    "defaults": {
      "engine": "claude_sdk",
      "sdk_model": "claude-sonnet-4-20250514"
    }
  },
  "providers": {
    "anthropic": {
      "api_key": "sk-ant-api03-..."
    }
  }
}
When to use:
  • You’re using Claude models (Sonnet, Opus, Haiku)
  • You want the best agentic experience with minimal configuration
  • You need streaming responses and automatic context management
The engine is configured via agents.defaults.engine and can be changed at runtime with grip config set agents.defaults.engine "litellm".

Message Flow

Interactive CLI Flow

User Input (grip agent)

CLI Handler

Engine.run(message)

[Tool Execution Loop]
    ├── Parse tool calls
    ├── Execute tools (filesystem, web, shell, etc.)
    ├── Update memory (MEMORY.md, HISTORY.md)
    ├── Check task list (tasks.json)
    └── Repeat until done

Response → CLI Output

Gateway Flow (Multi-Channel)

Telegram/Discord/Slack Bot

Channel Handler

Message Bus (asyncio.Queue)

Gateway Dispatcher

Engine.run(message)

[Tool Execution Loop]

Response → Message Bus

Channel Handler

Bot sends reply
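
The bus step above can be sketched with a plain `asyncio.Queue`: channels only put messages, the dispatcher only gets them, so neither side knows about the other. This is an illustrative sketch, not Grip's actual dispatcher code; the message shape and function names are assumptions.

```python
import asyncio

async def channel_handler(bus: asyncio.Queue) -> None:
    # A channel (e.g. Telegram) pushes inbound messages onto the bus
    # without knowing which engine will consume them.
    await bus.put({"channel": "telegram", "text": "hello"})

async def dispatcher(bus: asyncio.Queue, results: list) -> None:
    # The gateway dispatcher pulls messages off the bus and hands
    # them to the engine; channels and engine stay decoupled.
    msg = await bus.get()
    results.append(f"engine ran: {msg['text']}")
    bus.task_done()

async def main() -> list:
    bus: asyncio.Queue = asyncio.Queue()
    results: list = []
    await asyncio.gather(channel_handler(bus), dispatcher(bus, results))
    return results

print(asyncio.run(main()))
```

The same queue carries responses back to the channel handlers, which is why a single gateway process can serve Telegram, Discord, and Slack concurrently.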

REST API Flow

HTTP Request → POST /api/v1/chat

Auth Middleware (bearer token)

Rate Limiter (30/min per-IP, 60/min per-token)

Chat Endpoint Handler

Engine.run(message)

[Tool Execution Loop]

JSON Response
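
The per-IP and per-token limits in the flow above could be enforced with a sliding-window counter like the sketch below. Grip's actual middleware algorithm isn't documented here, so treat the class name and window logic as assumptions.

```python
import time
from collections import defaultdict

class RateLimiter:
    """Sliding-window rate limiter: at most `limit` requests per
    `window` seconds per key (e.g. client IP or bearer token).
    Illustrative only; the real middleware may differ."""

    def __init__(self, limit: int = 30, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.hits: dict[str, list[float]] = defaultdict(list)

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        # Drop timestamps that fell out of the window, then count.
        self.hits[key] = [t for t in self.hits[key] if now - t < self.window]
        if len(self.hits[key]) >= self.limit:
            return False
        self.hits[key].append(now)
        return True

limiter = RateLimiter(limit=30, window=60.0)
print(all(limiter.allow("203.0.113.7") for _ in range(30)))  # first 30 pass
print(limiter.allow("203.0.113.7"))                          # 31st rejected
```

With `limit=30, window=60.0` keyed by IP and a second instance with `limit=60` keyed by token, you get the "30/min per-IP, 60/min per-token" behavior described above.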

Tool Registry

Grip includes 26 built-in tools organized into 16 modules:

filesystem

| Tool | Description |
| --- | --- |
| file_read | Read file contents (text or binary) |
| file_write | Write/overwrite file |
| file_edit | Patch-based editing (line ranges) |
| file_append | Append to file |
| file_list | List directory contents |
| file_delete | Delete file or directory |
| file_trash | Move to trash (safer than delete) |
All file tools respect the directory trust model — only workspace and explicitly trusted paths are accessible.
See the Built-in Tools reference for detailed documentation.

Memory System

Grip implements a dual-layer memory architecture for long-term knowledge retention:

MEMORY.md

Durable facts and knowledge
  • TF-IDF search for relevant fact retrieval
  • Jaccard deduplication (threshold: 0.8)
  • Automatic consolidation when size exceeds 100KB
  • Injected into system prompt on every request
Example:
# Grip AI Memory

## User Preferences
- Prefers Python 3.12+
- Uses vim for editing

## Project Context
- Building a FastAPI REST API
- Using PostgreSQL for database
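
The Jaccard deduplication step (threshold 0.8) can be illustrated with word-set similarity: a new fact is dropped if it is too close to one already in MEMORY.md. The helper names are hypothetical; only the metric and threshold come from the description above.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over word sets: |A ∩ B| / |A ∪ B|."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)

def add_fact(memory: list[str], fact: str, threshold: float = 0.8) -> bool:
    """Append `fact` only if no existing entry is >= threshold similar."""
    if any(jaccard(fact, existing) >= threshold for existing in memory):
        return False  # near-duplicate, drop it
    memory.append(fact)
    return True

facts: list[str] = []
print(add_fact(facts, "Prefers Python 3.12+ for all projects"))  # True
print(add_fact(facts, "Prefers Python 3.12+ for all projects"))  # False (duplicate)
```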

HISTORY.md

Timestamped conversation summaries
  • Time-decay weighted search
  • Auto-summarization of old sessions
  • Used for context when sessions are compacted
Example:
# Conversation History

## 2026-02-28 14:30
User asked for help building a REST API.
Created 5-step task plan and implemented
authentication endpoints.

## 2026-02-27 09:15
Discussed deployment strategies.
Recommended Docker + nginx reverse proxy.
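
Time-decay weighting means a relevance match in an old summary counts less than the same match in a recent one. A minimal sketch using exponential decay, assuming a half-life parameter that the docs don't specify:

```python
from datetime import datetime, timezone

def decayed_score(base_score: float, entry_time: datetime,
                  now: datetime, half_life_days: float = 7.0) -> float:
    """Weight a relevance score by exponential time decay: an entry
    half_life_days old counts half as much. The half-life value is
    an assumption for illustration."""
    age_days = (now - entry_time).total_seconds() / 86400.0
    return base_score * 0.5 ** (age_days / half_life_days)

now = datetime(2026, 2, 28, tzinfo=timezone.utc)
recent = decayed_score(1.0, datetime(2026, 2, 28, tzinfo=timezone.utc), now)
week_old = decayed_score(1.0, datetime(2026, 2, 21, tzinfo=timezone.utc), now)
print(recent)              # 1.0
print(round(week_old, 2))  # 0.5
```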

SemanticCache

Response caching
  • SHA-256 keyed cache
  • TTL-based expiration (default: 1 hour)
  • Reduces API calls for repeated queries
Automatically caches:
  • Web search results
  • File read operations
  • Tool execution outputs

KnowledgeBase

Structured facts
  • Typed entities (person, place, concept)
  • Confidence scores
  • Relationship tracking
Example:
{
  "entity": "FastAPI",
  "type": "technology",
  "facts": [
    "Python web framework",
    "Supports async/await",
    "Auto-generates OpenAPI docs"
  ],
  "confidence": 0.95
}

Mid-Run Compaction

When in-flight messages exceed 50, Grip automatically:
  1. Selects the oldest 30 messages
  2. Sends them to the consolidation_model for summarization
  3. Replaces the 30 messages with a single summary block
  4. Keeps the 20 most recent messages intact
  5. Saves the full history to HISTORY.md
This enables unlimited iterations without context overflow. Configuration:
# Use a cheap model for consolidation
grip config set agents.defaults.consolidation_model "openrouter/google/gemini-flash-2.0"
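
The five-step compaction above can be sketched as a pure function over the message list. The summarizer call is stubbed with a placeholder string; in Grip the old messages would go to the `consolidation_model` and the full history to HISTORY.md.

```python
def compact(messages: list[dict], max_len: int = 50,
            summarize_count: int = 30) -> list[dict]:
    """If the list exceeds max_len, replace the oldest summarize_count
    messages with one summary block and keep the rest intact.
    The LLM summarization step is stubbed for illustration."""
    if len(messages) <= max_len:
        return messages
    old, recent = messages[:summarize_count], messages[summarize_count:]
    summary = f"[summary of {len(old)} earlier messages]"  # stand-in for LLM call
    return [{"role": "system", "content": summary}] + recent

history = [{"role": "user", "content": f"msg {i}"} for i in range(51)]
compacted = compact(history)
print(len(compacted))  # 22: 1 summary block + 21 recent messages
```

At exactly 50 in-flight messages, nothing happens; at 51 the oldest 30 collapse into one block, bounding context size no matter how many iterations the agent runs.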

Session Management

Sessions are identified by a session_key (default: "default"):
  • Storage: JSON files in ~/.grip/workspace/state/sessions/
  • LRU Cache: 200 sessions in-memory for fast access
  • Persistence: Auto-saved after every exchange
  • Isolation: Each session has independent memory, history, and task list
Session structure:
{
  "key": "default",
  "model": "claude-sonnet-4-20250514",
  "engine": "claude_sdk",
  "messages": [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"}
  ],
  "created_at": "2026-02-28T10:00:00Z",
  "updated_at": "2026-02-28T10:01:00Z",
  "message_count": 2,
  "tool_calls": 0
}

Security Architecture

Grip implements multi-layered security to make self-hosting safe:

Directory Trust Model

The agent is sandboxed to its workspace by default. Access to external directories requires explicit opt-in:
  • Workspace: Always accessible (~/.grip/workspace/)
  • Trusted Directories: User must grant access via /trust <path>
  • Persistent: Trust decisions saved to state/trusted_dirs.json
  • Validation: Every file operation checks trust before execution
Example flow:
❯ Read the file ~/Downloads/report.pdf

❌ Permission denied: /home/user/Downloads is not trusted

❯ /trust ~/Downloads
✓ Granted access to: /home/user/Downloads

❯ Read the file ~/Downloads/report.pdf
✓ Reading /home/user/Downloads/report.pdf...
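
The check behind that flow could look like the sketch below: resolve the target path, then allow it only if it sits inside the workspace or a trusted root. Function and variable names are assumptions; only the trust semantics come from the description above.

```python
from pathlib import Path

WORKSPACE = Path.home() / ".grip" / "workspace"

def is_trusted(path: str, trusted_dirs: set[Path]) -> bool:
    """Allow access only inside the workspace or an explicitly trusted
    directory. Sketch of the check each file tool would run."""
    target = Path(path).expanduser().resolve()
    for root in {WORKSPACE.resolve(), *trusted_dirs}:
        if target == root or root in target.parents:
            return True
    return False

trusted: set[Path] = set()
print(is_trusted("/etc/passwd", trusted))    # False: outside workspace
trusted.add(Path("/etc").resolve())          # simulate /trust /etc
print(is_trusted("/etc/passwd", trusted))    # True
```

Resolving the path before checking matters: without it, a symlink or `../` segment inside the workspace could escape the sandbox.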
While Grip implements multiple security layers, no system is infallible. Always:
  • Run Grip with a non-root user
  • Review critical actions before execution
  • Use /trust sparingly and only for necessary directories
  • Monitor agent behavior in production environments

Workflow Engine

Grip supports multi-agent workflows with DAG-based orchestration:

Workflow Structure

{
  "name": "research-and-summarize",
  "description": "Research a topic and produce a summary",
  "steps": [
    {
      "name": "research",
      "prompt": "Research the latest developments in quantum computing",
      "profile": "researcher",
      "timeout_seconds": 600
    },
    {
      "name": "summarize",
      "prompt": "Summarize the following research: {{research.output}}",
      "profile": "writer",
      "depends_on": ["research"]
    },
    {
      "name": "visualize",
      "prompt": "Create a chart of key findings: {{research.output}}",
      "profile": "analyst",
      "depends_on": ["research"]
    },
    {
      "name": "report",
      "prompt": "Compile final report: {{summarize.output}} {{visualize.output}}",
      "depends_on": ["summarize", "visualize"]
    }
  ]
}

Execution Model

  1. DAG Construction — Build dependency graph from depends_on fields
  2. Cycle Detection — Validate no circular dependencies (fails at creation time)
  3. Topological Sort — Kahn’s algorithm for execution order
  4. Parallel Execution — Steps with no dependencies run in parallel
  5. Variable Interpolation — {{step_name.output}} replaced with actual outputs
  6. Error Handling — Failed step stops dependent steps but allows independent paths
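
Steps 1–4 can be sketched as Kahn-style layering: each "wave" contains steps whose dependencies are all satisfied, so its members can run in parallel, and a leftover node count reveals a cycle. A sketch only; Grip's engine also handles timeouts, profiles, and interpolation.

```python
from collections import defaultdict

def execution_waves(steps: dict[str, list[str]]) -> list[list[str]]:
    """Layer a DAG of step -> dependencies into parallelizable waves
    (Kahn's algorithm). Raises on cycles, mirroring Grip's
    creation-time validation."""
    indegree = {name: len(deps) for name, deps in steps.items()}
    dependents = defaultdict(list)
    for name, deps in steps.items():
        for dep in deps:
            dependents[dep].append(name)
    wave = [n for n, d in indegree.items() if d == 0]
    waves, done = [], 0
    while wave:
        waves.append(sorted(wave))
        done += len(wave)
        next_wave = []
        for finished in wave:
            for child in dependents[finished]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_wave.append(child)
        wave = next_wave
    if done != len(steps):
        raise ValueError("workflow has a circular dependency")
    return waves

graph = {"research": [], "summarize": ["research"],
         "visualize": ["research"], "report": ["summarize", "visualize"]}
print(execution_waves(graph))
# [['research'], ['summarize', 'visualize'], ['report']]
```

Applied to the research-and-summarize workflow above, this yields exactly the three phases shown in the sample output: research alone, then summarize and visualize in parallel, then report.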
Execution:
grip workflow run research-and-summarize
Output:
[Parallel: research]
✓ research completed in 45s

[Parallel: summarize, visualize]
✓ summarize completed in 12s
✓ visualize completed in 8s

[Sequential: report]
✓ report completed in 5s

Workflow completed in 70s
See Workflows for detailed documentation.

Performance & Scalability

Benchmarks

| Metric | Value |
| --- | --- |
| Cold start time | < 2s |
| API response time (chat) | 50-200ms (excluding LLM latency) |
| Concurrent sessions | 200 (LRU cached) |
| Tool execution throughput | 10-50/sec (depends on tool type) |
| Memory consolidation | < 5s for 100KB MEMORY.md |
| Workflow DAG build | < 100ms for 100-node graph |

Resource Usage

Minimal deployment (Docker):
  • RAM: 256MB idle, 512MB under load
  • CPU: < 5% idle, 20-40% during tool execution
  • Disk: 50MB (code + dependencies)
Production deployment:
  • RAM: 1GB (recommended for 50+ concurrent sessions)
  • CPU: 2 cores (for parallel workflow execution)
  • Disk: 1GB (includes session storage and memory files)

Scaling Strategies

  1. Horizontal Scaling — Run multiple gateway instances behind a load balancer
  2. Session Sharding — Route sessions to specific instances by hash
  3. Database Backend — Replace JSON file storage with PostgreSQL/Redis
  4. Message Queue — Use RabbitMQ/Redis for inter-service communication
  5. Caching — Add Redis cache layer for semantic cache and session storage

Next Steps

Core Concepts

Learn about engines, agents, tools, memory, and sessions

Configuration

Configure engines, providers, tools, and security

Built-in Tools

Explore all 26 tools and their capabilities

Deployment

Deploy to production with Docker and environment variables