Longshot provides multiple debugging tools and observability features to help you diagnose issues quickly.

Log Levels

Configure logging verbosity in .env:
.env
# Log level: debug | info | warn | error
LOG_LEVEL=info
The debug level gives maximum verbosity and logs:
  • All LLM requests/responses
  • Git operations
  • Sandbox lifecycle events
  • Merge queue operations
  • Task state transitions
Use it for deep troubleshooting and understanding system behavior.
Start with the info level, then switch to debug when troubleshooting a specific issue.

NDJSON Logs

Longshot writes structured NDJSON (newline-delimited JSON) logs to logs/run-*.ndjson:
{"timestamp":1705334522000,"level":"info","agentId":"main","agentRole":"root-planner","message":"Task created","data":{"taskId":"task-001","desc":"Implement user authentication"}}
{"timestamp":1705334523000,"level":"info","agentId":"worker-pool","agentRole":"root-planner","message":"Dispatching task to ephemeral sandbox","data":{"taskId":"task-001"}}
{"timestamp":1705334545000,"level":"info","agentId":"main","agentRole":"root-planner","message":"Task completed","data":{"taskId":"task-001","status":"complete"}}

Analyzing Logs

Use jq to filter and analyze logs:
# Show all errors
cat logs/run-latest.ndjson | jq 'select(.level == "error")'

# Show task completions
cat logs/run-latest.ndjson | jq 'select(.message == "Task completed")'

# Show merge results
cat logs/run-latest.ndjson | jq 'select(.message == "Merge result")'

# Extract task durations
cat logs/run-latest.ndjson | jq 'select(.message == "Task completed") | .data.durationMs'

# Count events by message type
cat logs/run-latest.ndjson | jq -r '.message' | sort | uniq -c | sort -rn
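Timestamps in the log are epoch milliseconds. A quick way to turn the log into a readable timeline, using only the fields shown in the examples above:

```shell
# Convert epoch-ms timestamps to ISO times and print a compact timeline
jq -r '[(.timestamp / 1000 | floor | todate), .level, .message] | @tsv' logs/run-latest.ndjson
```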

Dashboard Debugging

The Rich TUI dashboard is your primary debugging interface.

Identifying Stuck Tasks

In the Planner Tree (In Progress panel), look for:
  • Long durations: Tasks showing [5m30s] or longer in yellow/red
  • No progress updates: Tasks without recent “Worker progress” messages
  • Pending state: Tasks that stay pending despite available workers
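The same check can be done from the logs by diffing created tasks against completed ones. This sketch assumes the "Task created"/"Task completed" message names shown in the NDJSON examples above:

```shell
# Task IDs that were created but never reached "Task completed"
jq -r 'select(.message == "Task created") | .data.taskId' logs/run-latest.ndjson | sort -u > /tmp/created
jq -r 'select(.message == "Task completed") | .data.taskId' logs/run-latest.ndjson | sort -u > /tmp/completed
comm -23 /tmp/created /tmp/completed   # lines only in "created" = unfinished tasks
```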

Activity Feed Analysis

Press Tab to view the Activity Feed. Look for patterns:

Rapid Failures

14:23:45  ERR  Worker timed out  task-045
14:24:12  ERR  Worker timed out  task-046
14:24:39  ERR  Worker timed out  task-047
Diagnosis: Systematic timeout issue. Check WORKER_TIMEOUT, LLM endpoint health, or task complexity.
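To confirm the failure is systematic rather than scattered, tally error messages across the whole run; a single dominant line points at one systemic cause:

```shell
# Count error messages by type, most frequent first
jq -r 'select(.level == "error") | .message' logs/run-latest.ndjson | sort | uniq -c | sort -rn
```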

Merge Conflicts

14:25:10  >> merged  worker/task-048
14:25:15  !! conflict  worker/task-049
14:25:20  !! conflict  worker/task-049 (retry)
14:25:25  !! conflict  worker/task-049 (retry)
Diagnosis: Persistent conflict. Check conflicting files, review task scope overlap.
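Repeat offenders can also be ranked from the structured logs. The "Merge result" message and `status` field appear in the troubleshooting examples below; the `.data.branch` field name is an assumption here, so adjust it to whatever your merge-result entries actually carry:

```shell
# Branches ranked by conflict count (".data.branch" is a hypothetical field name)
jq -r 'select(.message == "Merge result" and .data.status == "conflict") | .data.branch' logs/run-latest.ndjson | sort | uniq -c | sort -rn
```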

LLM Errors

14:26:00  ERR  LLM request failed (500) from openai-primary
14:26:05  ERR  LLM request failed (500) from azure-backup
14:26:10  ERR  All 2 LLM endpoints failed
Diagnosis: All endpoints down. Check API status, credentials, network connectivity.

Metrics Analysis

Low Commits/Hour

If commits/hour is below expectations:
  1. Check Velocity sparkline - is it consistently low?
  2. Review Active Workers - are agents actually running?
  3. Look at Pending count - is the planner creating tasks?
  4. Check Failed count - are tasks failing silently?
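As a cross-check that doesn't depend on Longshot internals, you can measure the commit rate directly from the target repository with plain git:

```shell
# Commits landed on the current branch in the last hour
git -C target-repo log --since="1 hour ago" --oneline | wc -l
```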

High Failure Rate

If Failed count grows:
  1. Switch to Activity Feed to see error messages
  2. Check Merge Queue - are failures due to merge issues?
  3. Review logs for common error patterns
  4. Verify LLM endpoint health in metrics panel

Poor Merge Success Rate

If merge rate is <70%:
  1. Review Conflicts count in Merge Queue panel
  2. Check if conflict retries are working (Activity Feed)
  3. Consider switching merge strategy (see Merge Strategies)
  4. Review task boundaries for overlap
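The actual rate can be computed from the logs using the "Merge result" message shown in the troubleshooting examples below (this assumes statuses like "merged" and "conflict" in `.data.status`):

```shell
# Breakdown of merge outcomes; success rate = merged / total
jq -r 'select(.message == "Merge result") | .data.status' logs/run-latest.ndjson | sort | uniq -c
```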

Span Traces

Longshot includes a distributed tracing system for detailed execution analysis.

Enabling Traces

Traces are automatically written to logs/spans/:
logs/spans/
  span-1705334522000-abc123.json
  span-1705334523000-def456.json
  ...

Span Structure

Each span captures a unit of work:
{
  "spanId": "abc123",
  "parentSpanId": null,
  "name": "planner.decomposeTask",
  "startTime": 1705334522000,
  "endTime": 1705334525000,
  "durationMs": 3000,
  "attributes": {
    "taskId": "task-001",
    "agentId": "root-planner",
    "complexity": 8
  },
  "status": "ok",
  "events": [
    {"time": 1705334523000, "name": "llm.start"},
    {"time": 1705334524500, "name": "llm.end"}
  ]
}

Common Span Types

  • planner.decomposeTask - Task decomposition by planner
  • sandbox.worker - Full worker execution
  • llm.complete - LLM request
  • merge.attempt - Merge queue operation
  • reconciler.sweep - Build/test validation

Analyzing Traces

Find slow operations:
# Find spans longer than 30 seconds
find logs/spans -name '*.json' -exec jq 'select(.durationMs > 30000) | {name, durationMs, taskId: .attributes.taskId}' {} \;

# Find failed LLM requests
find logs/spans -name '*.json' -exec jq 'select(.name == "llm.complete" and .status == "error")' {} \;

# Build execution timeline
find logs/spans -name '*.json' -exec jq '{start: .startTime, end: .endTime, name, task: .attributes.taskId}' {} \; | jq -s 'sort_by(.start)'
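To get an aggregate picture rather than individual spans, slurp all span files and group by name (fields as in the span structure above):

```shell
# Count and mean duration per span type
cat logs/spans/*.json | jq -s 'group_by(.name) | map({name: .[0].name, count: length, meanMs: (map(.durationMs) | add / length)})'
```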

Common Issues

Planner Not Creating Tasks

Symptoms:
  • Dashboard shows 0 active workers
  • Pending count is 0
  • No “Task created” events in Activity Feed
Diagnosis:
  1. Check planner LLM requests in debug logs
  2. Verify planning-prompt.md exists and is valid
  3. Review planner span traces for errors
  4. Check if iteration completed successfully
Solutions:
# Enable debug logging
LOG_LEVEL=debug pnpm start

# Check planner prompt
cat planning-prompt.md

# Review planner spans
find logs/spans -name '*.json' -exec jq 'select(.name == "planner.decomposeTask")' {} \;

Workers Timing Out

Symptoms:
  • Tasks show “TIMEOUT” in red in Activity Feed
  • Workers run for exactly WORKER_TIMEOUT seconds
  • Tasks stay in “running” state until timeout
Diagnosis:
  1. Check worker logs: grep "\[worker:task-" logs/run-latest.ndjson
  2. Review sandbox progress events
  3. Check if agent is stuck in a loop
  4. Verify tests aren’t hanging
Solutions:
# Increase timeout temporarily
WORKER_TIMEOUT=3600 pnpm start

# Review worker output
cat logs/run-latest.ndjson | jq 'select(.message == "Worker progress" and .data.taskId == "task-045")'

# Check for hanging tests
# (Look for "Running tests..." without completion)

Merge Conflicts Not Resolving

Symptoms:
  • Same branch conflicts multiple times
  • Conflict retry count reaches max (2)
  • Fix tasks are created but don’t help
Diagnosis:
  1. Check conflicting files in merge result data
  2. Review if files overlap across multiple tasks
  3. Verify rebase is working (check for “rebased” in logs)
  4. Check if fix task is actually resolving conflicts
Solutions:
# Review conflict details
cat logs/run-latest.ndjson | jq 'select(.message == "Merge result" and .data.status == "conflict")'

# Switch to merge-commit strategy temporarily
MERGE_STRATEGY=merge-commit pnpm start

# Manually inspect conflict
git checkout worker/task-049
git rebase main
# Review conflict markers
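When the rebase stops on a conflict, plain git lists the files still unmerged:

```shell
# Files with unresolved conflicts after the rebase stops
git diff --name-only --diff-filter=U
```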

LLM Endpoint Failures

Symptoms:
  • “All N LLM endpoints failed” errors
  • Tasks fail immediately after assignment
  • Planner shows “PLANNING” indefinitely
Diagnosis:
  1. Check endpoint health: curl -H "Authorization: Bearer $LLM_API_KEY" $LLM_BASE_URL/v1/models
  2. Review LLM client logs for HTTP status codes
  3. Check API key validity and rate limits
  4. Verify network connectivity to provider
Solutions:
# Test endpoint manually
curl -H "Authorization: Bearer $LLM_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"model":"gpt-4o","messages":[{"role":"user","content":"test"}]}' \
     $LLM_BASE_URL/v1/chat/completions

# Check endpoint stats in logs
cat logs/run-latest.ndjson | jq 'select(.message == "LLM request starting")'

# Enable debug for detailed LLM logs
LOG_LEVEL=debug pnpm start

Sandbox Creation Slow

Symptoms:
  • “sandbox created” messages take >10 seconds
  • Workers spend most time in “Dispatching task to ephemeral sandbox” phase
  • Low overall throughput despite high MAX_WORKERS
Diagnosis:
  1. Check Modal dashboard for cold start times
  2. Review sandbox image size
  3. Verify Modal account has sufficient capacity
  4. Check regional availability
Solutions:
# Test sandbox creation speed
modal run infra/sandbox_image.py

# Optimize image (remove unused packages)
# Edit infra/sandbox_image.py

# Pre-warm Modal capacity (create dummy sandboxes)
modal run infra/prewarm.py

Git Push Failures

Symptoms:
  • Workers complete successfully but branches aren’t pushed
  • “Failed to push origin/main” errors in merge queue
  • Dashboard shows completed tasks but merge queue is empty
Diagnosis:
  1. Verify GIT_TOKEN has push access: git ls-remote https://x-access-token:$GIT_TOKEN@github.com/org/repo.git
  2. Check for branch protection rules
  3. Review git credentials configuration
  4. Verify network connectivity to GitHub
Solutions:
# Test git push manually
cd target-repo
git push https://x-access-token:$GIT_TOKEN@github.com/org/repo.git main

# Regenerate token with correct permissions
# Go to GitHub → Settings → Developer Settings → Personal Access Tokens
# Ensure "repo" scope is enabled

# Update .env with new token
GIT_TOKEN=ghp_new_token_here

Debug Workflow

Follow this systematic approach when debugging:

1. Observe

Start the dashboard and observe:
  • Are tasks being created? (Pending count increasing)
  • Are workers running? (Active count > 0)
  • Are tasks completing? (Completed count increasing)
  • What’s the failure rate? (Failed count growth)

2. Narrow Down

Identify the failing component:
  • Planner: No tasks created → Check planner logs
  • Worker Pool: Tasks pending but not assigned → Check worker pool
  • Sandboxes: Tasks assigned but timing out → Check sandbox logs
  • Merge Queue: Tasks complete but not merged → Check merge logs

3. Enable Debug Logging

LOG_LEVEL=debug pnpm start

4. Isolate

Reduce parallelism to isolate issues:
MAX_WORKERS=1 pnpm start
This makes logs easier to follow and exposes race conditions.

5. Reproduce

Create a minimal reproduction:
  1. Use a small test repository
  2. Create a single simple task
  3. Run with MAX_WORKERS=1
  4. Review logs for the specific failure

6. Review Logs

Analyze NDJSON logs for the failing task:
cat logs/run-latest.ndjson | jq 'select(.data.taskId == "task-001")' | jq -s 'sort_by(.timestamp)'
This gives you a complete timeline of the task.

Advanced Debugging

Attach to Sandbox

For deep debugging, you can attach to a running sandbox:
import modal

app = modal.App.lookup("longshot")
# Sandbox.list returns a generator, so materialize it before indexing
sandboxes = list(modal.Sandbox.list(app_id=app.app_id))
if sandboxes:
    sb = sandboxes[0]
    # exec returns a process handle; wait for it, then read captured stdout
    p = sb.exec("ls", "-la", "/workspace/repo")
    p.wait()
    print(p.stdout.read())

Replay NDJSON Logs

Replay a previous run for analysis:
python dashboard.py --replay logs/run-20250115-143022.ndjson --speed 1.0
Step through slowly to identify exactly when an issue occurred.

Custom Instrumentation

Add custom logging to the orchestrator:
import { createLogger } from "@longshot/core";

const logger = createLogger("my-component", "root-planner");

logger.debug("Custom debug point", { variable: value });
This appears in NDJSON logs and the dashboard.

Getting Help

If you’re stuck:
  1. Check logs: Review logs/run-latest.ndjson with jq
  2. Enable debug logging: Set LOG_LEVEL=debug
  3. Review spans: Analyze logs/spans/ for slow operations
  4. Search issues: Check GitHub issues for similar problems
  5. Create reproduction: Minimize the issue to a small test case
  6. Report bug: Open an issue with logs and reproduction steps

Next Steps

Running with Dashboard

Master the dashboard for real-time debugging

LLM Configuration

Troubleshoot LLM endpoint issues
