Grip AI supports long-running agent tasks that may require dozens or hundreds of tool iterations, with automatic memory management to prevent context overflow.

Unlimited Iterations

Configuration

From grip/config/schema.py:76:
max_tool_iterations: int = Field(
    default=0,
    ge=0,
    description="Maximum LLM-tool round-trips before the agent stops. 0 = unlimited (default).",
)
Set in ~/.grip/config.json:
{
  "agents": {
    "defaults": {
      "max_tool_iterations": 0
    }
  }
}
max_tool_iterations: 0 means unlimited iterations. The agent continues working until the task is complete or it decides to stop.

How It Works

Each iteration consists of:
  1. Agent reasoning: LLM analyzes the current state and decides next action
  2. Tool execution: One or more tools are called (e.g., read_file, exec, web_search)
  3. Result processing: Tool outputs are fed back to the LLM
  4. Repeat: Process continues until agent returns a final text response
Example: Complex build-fix loop
Iteration 1: read_file("src/main.py")  
Iteration 2: exec("pytest tests/") → 5 failures
Iteration 3: read_file("tests/test_auth.py")
Iteration 4: edit_file("src/auth.py") → fix bug 1
Iteration 5: exec("pytest tests/test_auth.py") → 2 failures
Iteration 6: read_file("src/models.py")
Iteration 7: edit_file("src/models.py") → fix bug 2
Iteration 8: exec("pytest tests/") → all pass
Iteration 9: Final response: "All tests passing. Fixed authentication and model validation bugs."
This task required 9 iterations. With max_tool_iterations: 5, it would have stopped prematurely.
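The four numbered steps above amount to a simple loop. Below is a minimal, self-contained sketch of that loop; Reply, call_llm, and run_tool are hypothetical stand-ins for Grip's actual engine types, not its real API.

```python
# Illustrative sketch of the agent loop; not Grip's actual engine code.
# call_llm and run_tool are stand-ins for the real LLM and tool layers.
from dataclasses import dataclass, field

@dataclass
class Reply:
    text: str = ""
    tool_calls: list = field(default_factory=list)

def run_agent(call_llm, run_tool, messages, max_tool_iterations=0):
    """Loop until the LLM returns a final text response.
    max_tool_iterations == 0 means unlimited, matching Grip's config."""
    iterations = 0
    while True:
        iterations += 1
        reply = call_llm(messages)            # 1. agent reasoning
        if not reply.tool_calls:              # final text answer: stop
            return reply.text, iterations
        for call in reply.tool_calls:         # 2. tool execution
            messages.append(run_tool(call))   # 3. feed result back
        # 4. repeat, unless a finite limit is configured
        if max_tool_iterations and iterations >= max_tool_iterations:
            return "Stopped: iteration limit reached", iterations
```

In the build-fix trace above, each numbered iteration is one pass through this loop; with max_tool_iterations left at 0, only a final text response ends the run.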

When to Use Unlimited Iterations

Good use cases:
  • Build/test/fix cycles: Iterate until all tests pass
  • Multi-file refactoring: Touch dozens of files in a complex codebase
  • Research tasks: Search, fetch, analyze, synthesize across many sources
  • Data processing: ETL pipelines with validation and retry logic
  • System debugging: Trace through logs, config, code to find root cause
Not recommended for:
  • User-facing chatbots: Can lead to long response times
  • Tight budget constraints: Each iteration costs tokens
  • Untrusted tasks: Risk of infinite loops in edge cases

Mid-Run Compaction

The Problem

Long tasks generate large conversation histories:
Iteration 1: Read file (500 tokens)
Iteration 2: Run tests (2000 tokens of output)
Iteration 3: Read test file (800 tokens)
Iteration 4: Fix code (200 tokens)
...
Iteration 50: Total context = 80,000 tokens (exceeds model limit)
At some point, the context window fills up and the agent loses early context.

Automatic Consolidation

From grip/config/schema.py:87:
auto_consolidate: bool = Field(
    default=True,
    description="Automatically consolidate old messages when session exceeds 2x memory_window.",
)
How it works:
  1. Agent tracks message count per session
  2. When len(messages) > 2 × memory_window, consolidation triggers
  3. Old messages (beyond memory_window) are summarized using consolidation_model
  4. Summary replaces old messages, freeing context space
  5. Agent continues with fresh context
Example (with memory_window: 50):
Messages before consolidation: 120  
Trigger threshold: 100 (2 × 50)

Step 1: Take messages 1-70 (the oldest 70 messages)
Step 2: Send to consolidation_model: "Summarize this conversation"
Step 3: Replace messages 1-70 with summary (1 message, ~300 tokens)
Step 4: Keep messages 71-120 (the most recent 50) intact

New message count: 51 (1 summary + 50 recent)
Context freed: ~30,000 tokens
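The five steps reduce to a threshold check plus a summarize-and-splice. A minimal sketch of that logic, with a summarize callable standing in for the consolidation_model call (this is an approximation, not Grip's actual implementation):

```python
def maybe_consolidate(messages, memory_window, summarize):
    """Consolidate once a session exceeds 2x memory_window.
    summarize() stands in for the call to consolidation_model."""
    if len(messages) <= 2 * memory_window:
        return messages                       # under threshold: no-op
    old = messages[:-memory_window]           # everything beyond the window
    recent = messages[-memory_window:]        # most recent memory_window msgs
    return [summarize(old)] + recent          # e.g. 120 messages -> 51
```

With memory_window 50 and 120 messages, old is the oldest 70 and the result is 51 messages, matching the example above.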
From grip/engines/litellm_engine.py:138, consolidation is triggered automatically during run() when auto_consolidate: true.

Manual Consolidation

Force compaction mid-run:
# Interactive CLI
grip agent
> /compact

# Via Python
from grip.engines import create_engine

engine = create_engine(config, workspace, session_mgr, memory_mgr)
await engine.consolidate_session("my-session-key")

Consolidation Model Selection

Use a cheap model for summarization to save costs:
{
  "agents": {
    "defaults": {
      "model": "anthropic/claude-sonnet-4",
      "consolidation_model": "openrouter/google/gemini-flash-2.0",
      "auto_consolidate": true,
      "memory_window": 50
    }
  }
}
Cost comparison (per consolidation):
Model               Input (70 msgs @ 40K tokens)   Output (summary @ 300 tokens)   Total cost
Claude Sonnet-4     $0.12                          $0.045                          $0.165
Gemini Flash 2.0    $0.004                         $0.0003                         $0.0043
Gemini Flash is 38x cheaper for consolidation and produces equivalent summaries for most tasks.

Task Persistence

Session Management

Sessions are automatically persisted to disk:
~/.grip/workspace/sessions/
├── cli:default.json
├── telegram:12345.json
└── api:task-xyz.json
Each session file stores:
  • Full message history
  • Conversation summary (if consolidated)
  • Metadata (timestamps, token counts)
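For illustration, a session file might look roughly like this (the field names here are hypothetical, not Grip's exact on-disk schema):
```json
{
  "messages": [
    {"role": "user", "content": "Run the test suite"},
    {"role": "assistant", "content": "All tests pass."}
  ],
  "summary": "Earlier in this session the agent fixed two auth bugs...",
  "metadata": {
    "created_at": "2025-01-10T09:00:00Z",
    "updated_at": "2025-01-12T17:30:00Z",
    "total_tokens": 183920
  }
}
```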
From grip/session.py (inferred from usage in agent_cmd.py:387):
class SessionManager:
    def get_or_create(self, session_key: str) -> Session:
        """Load existing session or create new one."""
    
    def save(self, session: Session) -> None:
        """Persist session to disk."""
    
    def delete(self, session_key: str) -> None:
        """Remove session from disk."""

Long-Running Task Pattern

Scenario: Process a large dataset over multiple days
import asyncio
from grip import GripClient

client = GripClient()

# Day 1: Start processing
result = await client.run(
    "Process files 1-100 from data/input/",
    session_key="batch-processing-job-1"
)
print(result.response)

# System reboot, process restarts

# Day 2: Resume from same session
result = await client.run(
    "Continue processing files 101-200",
    session_key="batch-processing-job-1"  # Same key = same session
)
print(result.response)
# Agent remembers: "I previously processed files 1-100..."

Resuming Interrupted Tasks

Interactive CLI:
grip agent
> Start building the project and fix all errors

# Agent runs for 20 iterations, then you Ctrl+C

# Later, resume:
grip agent
> Continue where you left off

# Agent has full context from previous session
Via API:
# Start task
curl -X POST http://localhost:18800/api/v1/agent/run \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "message": "Analyze all Python files and generate coverage report",
    "session_key": "analysis-job-123"
  }'

# Check status later
curl http://localhost:18800/api/v1/agent/sessions/analysis-job-123 \
  -H "Authorization: Bearer $TOKEN"

# Resume
curl -X POST http://localhost:18800/api/v1/agent/run \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "message": "Continue analysis",
    "session_key": "analysis-job-123"
  }'

Memory Window Tuning for Long Tasks

Small Window (10-30 messages)

Pros:
  • Low token usage per iteration
  • Fast consolidation
  • Efficient for tool-heavy workflows
Cons:
  • Frequent consolidation (every ~50 iterations)
  • Agent may forget early context
Best for: Data processing, file operations, automated testing

Large Window (100-200 messages)

Pros:
  • Agent retains full context for hundreds of iterations
  • Rare consolidation
  • Better decision-making with complete history
Cons:
  • High token usage (10K-50K tokens per iteration)
  • Slow requests when context is full
Best for: Complex debugging, architecture design, research synthesis

Configuration Example

{
  "agents": {
    "defaults": {
      "memory_window": 50
    },
    "profiles": {
      "batch-processor": {
        "memory_window": 20,
        "auto_consolidate": true,
        "max_tool_iterations": 0
      },
      "architect": {
        "memory_window": 150,
        "auto_consolidate": true,
        "max_tool_iterations": 0
      }
    }
  }
}

Monitoring Long Tasks

Token Usage Tracking

From grip/engines/types.py:24:
@dataclass(slots=True)
class AgentRunResult:
    response: str
    iterations: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    tool_calls_made: list[str] = field(default_factory=list)
    tool_details: list[ToolCallDetail] = field(default_factory=list)
    
    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens
Check token usage:
result = await client.run("Long task...", session_key="job-1")

print(f"Iterations: {result.iterations}")
print(f"Total tokens: {result.total_tokens}")
print(f"Tools used: {result.tool_calls_made}")

# Output:
# Iterations: 47
# Total tokens: 183920
# Tools used: ['read_file', 'exec', 'edit_file', 'write_file']

Iteration Count Limits

Set a soft limit to prevent runaway tasks:
{
  "agents": {
    "profiles": {
      "safe-automation": {
        "max_tool_iterations": 100,
        "memory_window": 30
      }
    }
  }
}
The agent will stop after 100 iterations even if the task is incomplete, preventing infinite loops.
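Because sessions persist, a run that stops at the cap can simply be resumed in the same session. A hedged sketch built on the client.run pattern shown earlier; the iterations >= max_iterations check is a heuristic, since AgentRunResult carries no explicit stopped-at-limit flag:

```python
async def run_until_done(client, task, session_key, max_iterations=100):
    """Re-run in the same session until a run finishes under the cap.
    Heuristic: a run that used >= max_iterations likely hit the limit."""
    result = await client.run(task, session_key=session_key)
    while result.iterations >= max_iterations:
        # Same session_key, so the agent resumes with full prior context.
        result = await client.run("Continue where you left off",
                                  session_key=session_key)
    return result
```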

Example: Multi-Day Research Project

Day 1: Initial research
grip agent
> Research quantum computing trends 2024-2025. Search academic papers, 
  industry reports, and company announcements. Organize findings in 
  research/quantum-computing.md

# Agent runs 30 iterations:
# - 10 web searches
# - 15 web_fetch calls
# - 5 write_file/edit_file operations
# Result: Draft report with 20 sources
Day 2: Deep dive
grip agent
> Continue quantum computing research. Focus on IBM and Google's latest 
  hardware developments. Add a "Hardware" section to the report.

# Agent remembers:
# - Previous searches (avoids duplicates)
# - Report structure
# - Sources already cited
# Runs 25 more iterations, updates report
Day 3: Finalization
grip agent
> Finalize quantum computing report. Add executive summary, 
  verify all citations, generate bibliography.

# Agent:
# - Reviews consolidated summary of days 1-2
# - Accesses recent 50 messages for context
# - Completes report in 12 iterations
Total: 67 iterations across 3 days, single persistent session, automatic consolidation prevented context overflow.

Best Practices

  1. Enable auto_consolidate: Always set to true for long tasks
  2. Use unlimited iterations cautiously: Monitor first few runs to ensure no infinite loops
  3. Set max_daily_tokens: Prevent cost overruns on runaway tasks
  4. Choose appropriate memory_window: Smaller for automation, larger for research/debugging
  5. Use cheap consolidation_model: Gemini Flash or GPT-4o-mini saves 90%+ on compaction costs
  6. Monitor token usage: Check AgentRunResult.total_tokens to track costs
  7. Name sessions descriptively: Use session_key like "project-build-fix-123" for easy tracking
  8. Clear old sessions: Run /new or delete_session() when starting unrelated tasks
