Longshot provides multiple debugging tools and observability features to help you diagnose issues quickly.

Log Levels

Configure logging verbosity in .env:
.env
# Log level: debug | info | warn | error
LOG_LEVEL=info
The debug level gives maximum verbosity and logs:
  • All LLM requests/responses
  • Git operations
  • Sandbox lifecycle events
  • Merge queue operations
  • Task state transitions
Use it for deep troubleshooting and understanding system behavior.
Start with the info level, then switch to debug when troubleshooting a specific issue.

NDJSON Logs

Longshot writes structured NDJSON (newline-delimited JSON) logs to logs/run-*.ndjson:
{"timestamp":1705334522000,"level":"info","agentId":"main","agentRole":"root-planner","message":"Task created","data":{"taskId":"task-001","desc":"Implement user authentication"}}
{"timestamp":1705334523000,"level":"info","agentId":"worker-pool","agentRole":"root-planner","message":"Dispatching task to ephemeral sandbox","data":{"taskId":"task-001"}}
{"timestamp":1705334545000,"level":"info","agentId":"main","agentRole":"root-planner","message":"Task completed","data":{"taskId":"task-001","status":"complete"}}

Analyzing Logs

Use jq to filter and analyze logs:
# Show all errors
cat logs/run-latest.ndjson | jq 'select(.level == "error")'

# Show task completions
cat logs/run-latest.ndjson | jq 'select(.message == "Task completed")'

# Show merge results
cat logs/run-latest.ndjson | jq 'select(.message == "Merge result")'

# Extract task durations
cat logs/run-latest.ndjson | jq 'select(.message == "Task completed") | .data.durationMs'

# Count events by message type
cat logs/run-latest.ndjson | jq -r '.message' | sort | uniq -c | sort -rn
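Timestamps in the log are epoch milliseconds. A quick way to turn the log into a readable timeline, using only the fields shown in the examples above:

```shell
# Convert epoch-ms timestamps to ISO times and print a compact timeline
jq -r '[(.timestamp / 1000 | floor | todate), .level, .message] | @tsv' logs/run-latest.ndjson
```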

Dashboard Debugging

The Rich TUI dashboard is your primary debugging interface.

Identifying Stuck Tasks

In the Planner Tree (In Progress panel), look for:
  • Long durations: Tasks showing [5m30s] or longer in yellow/red
  • No progress updates: Tasks without recent “Worker progress” messages
  • Pending state: Tasks that stay pending despite available workers
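The same check can be done from the logs by diffing created tasks against completed ones. This sketch assumes the "Task created"/"Task completed" message names shown in the NDJSON examples above:

```shell
# Task IDs that were created but never reached "Task completed"
jq -r 'select(.message == "Task created") | .data.taskId' logs/run-latest.ndjson | sort -u > /tmp/created
jq -r 'select(.message == "Task completed") | .data.taskId' logs/run-latest.ndjson | sort -u > /tmp/completed
comm -23 /tmp/created /tmp/completed   # lines only in "created" = unfinished tasks
```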

Activity Feed Analysis

Press Tab to view the Activity Feed. Look for patterns:

Rapid Failures

14:23:45  ERR  Worker timed out  task-045
14:24:12  ERR  Worker timed out  task-046
14:24:39  ERR  Worker timed out  task-047
Diagnosis: Systematic timeout issue. Check WORKER_TIMEOUT, LLM endpoint health, or task complexity.
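To confirm the failure is systematic rather than scattered, tally error messages across the whole run; a single dominant line points at one systemic cause:

```shell
# Count error messages by type, most frequent first
jq -r 'select(.level == "error") | .message' logs/run-latest.ndjson | sort | uniq -c | sort -rn
```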

Merge Conflicts

14:25:10  >> merged  worker/task-048
14:25:15  !! conflict  worker/task-049
14:25:20  !! conflict  worker/task-049 (retry)
14:25:25  !! conflict  worker/task-049 (retry)
Diagnosis: Persistent conflict. Check conflicting files, review task scope overlap.
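Repeat offenders can also be ranked from the structured logs. The "Merge result" message and `status` field appear in the troubleshooting examples below; the `.data.branch` field name is an assumption here, so adjust it to whatever your merge-result entries actually carry:

```shell
# Branches ranked by conflict count (".data.branch" is a hypothetical field name)
jq -r 'select(.message == "Merge result" and .data.status == "conflict") | .data.branch' logs/run-latest.ndjson | sort | uniq -c | sort -rn
```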

LLM Errors

14:26:00  ERR  LLM request failed (500) from openai-primary
14:26:05  ERR  LLM request failed (500) from azure-backup
14:26:10  ERR  All 2 LLM endpoints failed
Diagnosis: All endpoints down. Check API status, credentials, network connectivity.

Metrics Analysis

Low Commits/Hour

If commits/hour is below expectations:
  1. Check Velocity sparkline - is it consistently low?
  2. Review Active Workers - are agents actually running?
  3. Look at Pending count - is the planner creating tasks?
  4. Check Failed count - are tasks failing silently?
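As a cross-check that doesn't depend on Longshot internals, you can measure the commit rate directly from the target repository with plain git:

```shell
# Commits landed on the current branch in the last hour
git -C target-repo log --since="1 hour ago" --oneline | wc -l
```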

High Failure Rate

If Failed count grows:
  1. Switch to Activity Feed to see error messages
  2. Check Merge Queue - are failures due to merge issues?
  3. Review logs for common error patterns
  4. Verify LLM endpoint health in metrics panel

Poor Merge Success Rate

If merge rate is <70%:
  1. Review Conflicts count in Merge Queue panel
  2. Check if conflict retries are working (Activity Feed)
  3. Consider switching merge strategy (see Merge Strategies)
  4. Review task boundaries for overlap
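The actual rate can be computed from the logs using the "Merge result" message shown in the troubleshooting examples below (this assumes statuses like "merged" and "conflict" in `.data.status`):

```shell
# Breakdown of merge outcomes; success rate = merged / total
jq -r 'select(.message == "Merge result") | .data.status' logs/run-latest.ndjson | sort | uniq -c
```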

Span Traces

Longshot includes a distributed tracing system for detailed execution analysis.

Enabling Traces

Traces are automatically written to logs/spans/:
logs/spans/
  span-1705334522000-abc123.json
  span-1705334523000-def456.json
  ...

Span Structure

Each span captures a unit of work:
{
  "spanId": "abc123",
  "parentSpanId": null,
  "name": "planner.decomposeTask",
  "startTime": 1705334522000,
  "endTime": 1705334525000,
  "durationMs": 3000,
  "attributes": {
    "taskId": "task-001",
    "agentId": "root-planner",
    "complexity": 8
  },
  "status": "ok",
  "events": [
    {"time": 1705334523000, "name": "llm.start"},
    {"time": 1705334524500, "name": "llm.end"}
  ]
}

Common Span Types

  • planner.decomposeTask - Task decomposition by planner
  • sandbox.worker - Full worker execution
  • llm.complete - LLM request
  • merge.attempt - Merge queue operation
  • reconciler.sweep - Build/test validation

Analyzing Traces

Find slow operations:
# Find spans longer than 30 seconds
find logs/spans -name '*.json' -exec jq 'select(.durationMs > 30000) | {name, durationMs, taskId: .attributes.taskId}' {} \;

# Find failed LLM requests
find logs/spans -name '*.json' -exec jq 'select(.name == "llm.complete" and .status == "error")' {} \;

# Build execution timeline
find logs/spans -name '*.json' -exec jq '{start: .startTime, end: .endTime, name, task: .attributes.taskId}' {} \; | jq -s 'sort_by(.start)'
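To get an aggregate picture rather than individual spans, slurp all span files and group by name (fields as in the span structure above):

```shell
# Count and mean duration per span type
cat logs/spans/*.json | jq -s 'group_by(.name) | map({name: .[0].name, count: length, meanMs: (map(.durationMs) | add / length)})'
```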

Common Issues

Planner Not Creating Tasks

Symptoms:
  • Dashboard shows 0 active workers
  • Pending count is 0
  • No “Task created” events in Activity Feed
Diagnosis:
  1. Check planner LLM requests in debug logs
  2. Verify planning-prompt.md exists and is valid
  3. Review planner span traces for errors
  4. Check if iteration completed successfully
Solutions:
# Enable debug logging
LOG_LEVEL=debug pnpm start

# Check planner prompt
cat planning-prompt.md

# Review planner spans
find logs/spans -name '*.json' -exec jq 'select(.name == "planner.decomposeTask")' {} \;

Workers Timing Out

Symptoms:
  • Tasks show “TIMEOUT” in red in Activity Feed
  • Workers run for exactly WORKER_TIMEOUT seconds
  • Tasks stay in “running” state until timeout
Diagnosis:
  1. Check worker logs: grep "\[worker:task-" logs/run-latest.ndjson
  2. Review sandbox progress events
  3. Check if agent is stuck in a loop
  4. Verify tests aren’t hanging
Solutions:
# Increase timeout temporarily
WORKER_TIMEOUT=3600 pnpm start

# Review worker output
cat logs/run-latest.ndjson | jq 'select(.message == "Worker progress" and .data.taskId == "task-045")'

# Check for hanging tests
# (Look for "Running tests..." without completion)

Merge Conflicts Not Resolving

Symptoms:
  • Same branch conflicts multiple times
  • Conflict retry count reaches max (2)
  • Fix tasks are created but don’t help
Diagnosis:
  1. Check conflicting files in merge result data
  2. Review if files overlap across multiple tasks
  3. Verify rebase is working (check for “rebased” in logs)
  4. Check if fix task is actually resolving conflicts
Solutions:
# Review conflict details
cat logs/run-latest.ndjson | jq 'select(.message == "Merge result" and .data.status == "conflict")'

# Switch to merge-commit strategy temporarily
MERGE_STRATEGY=merge-commit pnpm start

# Manually inspect conflict
git checkout worker/task-049
git rebase main
# Review conflict markers
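When the rebase stops on a conflict, plain git lists the files still unmerged:

```shell
# Files with unresolved conflicts after the rebase stops
git diff --name-only --diff-filter=U
```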

LLM Endpoint Failures

Symptoms:
  • “All N LLM endpoints failed” errors
  • Tasks fail immediately after assignment
  • Planner shows “PLANNING” indefinitely
Diagnosis:
  1. Check endpoint health: curl -H "Authorization: Bearer $LLM_API_KEY" $LLM_BASE_URL/v1/models
  2. Review LLM client logs for HTTP status codes
  3. Check API key validity and rate limits
  4. Verify network connectivity to provider
Solutions:
# Test endpoint manually
curl -H "Authorization: Bearer $LLM_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"model":"gpt-4o","messages":[{"role":"user","content":"test"}]}' \
     $LLM_BASE_URL/v1/chat/completions

# Check endpoint stats in logs
cat logs/run-latest.ndjson | jq 'select(.message == "LLM request starting")'

# Enable debug for detailed LLM logs
LOG_LEVEL=debug pnpm start

Sandbox Creation Slow

Symptoms:
  • “sandbox created” messages take >10 seconds
  • Workers spend most time in “Dispatching task to ephemeral sandbox” phase
  • Low overall throughput despite high MAX_WORKERS
Diagnosis:
  1. Check Modal dashboard for cold start times
  2. Review sandbox image size
  3. Verify Modal account has sufficient capacity
  4. Check regional availability
Solutions:
# Test sandbox creation speed
modal run infra/sandbox_image.py

# Optimize image (remove unused packages)
# Edit infra/sandbox_image.py

# Pre-warm Modal capacity (create dummy sandboxes)
modal run infra/prewarm.py

Git Push Failures

Symptoms:
  • Workers complete successfully but branches aren’t pushed
  • “Failed to push origin/main” errors in merge queue
  • Dashboard shows completed tasks but merge queue is empty
Diagnosis:
  1. Verify GIT_TOKEN has push access: git ls-remote https://x-access-token:$GIT_TOKEN@github.com/org/repo.git
  2. Check for branch protection rules
  3. Review git credentials configuration
  4. Verify network connectivity to GitHub
Solutions:
# Test git push manually
cd target-repo
git push https://x-access-token:$GIT_TOKEN@github.com/org/repo.git main

# Regenerate token with correct permissions
# Go to GitHub → Settings → Developer Settings → Personal Access Tokens
# Ensure "repo" scope is enabled

# Update .env with new token
GIT_TOKEN=ghp_new_token_here

Debug Workflow

Follow this systematic approach when debugging:

1. Observe

Start the dashboard and observe:
  • Are tasks being created? (Pending count increasing)
  • Are workers running? (Active count > 0)
  • Are tasks completing? (Completed count increasing)
  • What’s the failure rate? (Failed count growth)

2. Narrow Down

Identify the failing component:
  • Planner: No tasks created → Check planner logs
  • Worker Pool: Tasks pending but not assigned → Check worker pool
  • Sandboxes: Tasks assigned but timing out → Check sandbox logs
  • Merge Queue: Tasks complete but not merged → Check merge logs

3. Enable Debug Logging

LOG_LEVEL=debug pnpm start

4. Isolate

Reduce parallelism to isolate issues:
MAX_WORKERS=1 pnpm start
This makes logs easier to follow and exposes race conditions.

5. Reproduce

Create a minimal reproduction:
  1. Use a small test repository
  2. Create a single simple task
  3. Run with MAX_WORKERS=1
  4. Review logs for the specific failure

6. Review Logs

Analyze NDJSON logs for the failing task:
cat logs/run-latest.ndjson | jq 'select(.data.taskId == "task-001")' | jq -s 'sort_by(.timestamp)'
This gives you a complete timeline of the task.

Advanced Debugging

Attach to Sandbox

For deep debugging, you can attach to a running sandbox:
import modal

app = modal.App.lookup("longshot")
# Sandbox.list returns a generator, so materialize it before indexing
sandboxes = list(modal.Sandbox.list(app_id=app.app_id))
if sandboxes:
    sb = sandboxes[0]
    # exec returns a process handle; wait for it, then read captured stdout
    p = sb.exec("ls", "-la", "/workspace/repo")
    p.wait()
    print(p.stdout.read())

Replay NDJSON Logs

Replay a previous run for analysis:
python dashboard.py --replay logs/run-20250115-143022.ndjson --speed 1.0
Step through slowly to identify exactly when an issue occurred.

Custom Instrumentation

Add custom logging to the orchestrator:
import { createLogger } from "@longshot/core";

const logger = createLogger("my-component", "root-planner");

logger.debug("Custom debug point", { variable: value });
This appears in NDJSON logs and the dashboard.

Getting Help

If you’re stuck:
  1. Check logs: Review logs/run-latest.ndjson with jq
  2. Enable debug logging: Set LOG_LEVEL=debug
  3. Review spans: Analyze logs/spans/ for slow operations
  4. Search issues: Check GitHub issues for similar problems
  5. Create reproduction: Minimize the issue to a small test case
  6. Report bug: Open an issue with logs and reproduction steps

Next Steps

Running with Dashboard

Master the dashboard for real-time debugging

LLM Configuration

Troubleshoot LLM endpoint issues
