
Overview

RAPTOR uses multi-agent orchestration to coordinate specialized AI agents that work together on complex security analysis tasks. Each agent has a specific role, skills, and tools.

17 Specialized Agents

Purpose-built agents for evidence collection, analysis, verification, and reporting

Sequential & Parallel Execution

Run agents in parallel for efficiency or sequentially when dependencies exist

Agent Communication

Agents communicate through shared files and structured data formats

Workflow Orchestration

Main orchestrator manages agent lifecycle and coordinates complex workflows

Architecture

Orchestrator Responsibilities

1

Agent Spawning

The orchestrator is the ONLY component that spawns agents. Agents never spawn other agents.
2

Workflow Coordination

Manages phase transitions and ensures agents run in correct order with proper dependencies.
3

Data Passing

Provides working directory path to all agents for shared file access.
4

Error Handling

Catches agent failures, implements retry logic, and handles graceful degradation.

Agent Types

Evidence Collection Agents

These agents collect forensic evidence from different sources and write to the shared evidence store.
Purpose: Query GitHub Archive via BigQuery for tamper-proof event data
Skills: github-archive, github-evidence-kit
Collects:
  • PushEvent (commits pushed)
  • PullRequestEvent (PRs opened/closed/merged)
  • IssuesEvent (issues opened/closed)
  • CreateEvent/DeleteEvent (branches/tags created/deleted)
  • WorkflowRunEvent (GitHub Actions runs)
Output: Writes events to evidence.json
Invocation:
Task: oss-investigator-gh-archive-agent
  Prompt: "Collect evidence from GH Archive for <research question>.
           Working directory: <workdir>
           Targets: repos=<repos>, actors=<actors>, dates=<dates>"
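The GH Archive agent's BigQuery calls run against the public `githubarchive` dataset, whose day tables are named `githubarchive.day.YYYYMMDD`. A minimal sketch of building such a query; the helper name is illustrative, not part of RAPTOR:

```python
# Sketch of the kind of SQL the GH Archive agent might issue against the
# public githubarchive BigQuery dataset. build_gharchive_query is a
# hypothetical helper for illustration.
def build_gharchive_query(event_type: str, actor: str, date: str) -> str:
    """Build a query for one actor's events on one UTC day (date: YYYYMMDD)."""
    return (
        "SELECT type, actor.login, repo.name, created_at, payload\n"
        f"FROM `githubarchive.day.{date}`\n"
        f"WHERE type = '{event_type}' AND actor.login = '{actor}'\n"
        "ORDER BY created_at"
    )

query = build_gharchive_query("PushEvent", "lkmanka58", "20250713")
print(query)
```

Because GH Archive tables are append-only snapshots of public events, re-running the same query later should return the same rows, which is what makes this evidence source tamper-proof.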
Purpose: Query live GitHub API for current repository state
Skills: github-evidence-kit
Collects:
  • Commit content and metadata
  • File contents at specific refs
  • Branch and tag information
  • PR and issue details (if not deleted)
  • Fork relationships
Output: Writes observations to evidence.json
Use Case: Retrieve commit content after getting SHA from GH Archive
Purpose: Recover deleted content from Wayback Machine
Skills: github-wayback-recovery, github-evidence-kit
Collects:
  • Archived README and documentation
  • Deleted issue/PR content
  • Repository metadata snapshots
  • Wiki pages
Output: Writes snapshots to evidence.json
Limitation: Cannot recover private content or full git history
Purpose: Analyze cloned repositories for dangling commits
Skills: github-evidence-kit
Collects:
  • Dangling commits (not reachable from any ref)
  • Reflog entries
  • Force-pushed commit content
Output: Writes commit observations to evidence.json
Value: Reveals force-pushed or deleted commits that attackers tried to hide
Purpose: Extract IOCs (Indicators of Compromise) from vendor security reports
Skills: github-evidence-kit
Collects:
  • Commit SHAs
  • File hashes
  • Usernames
  • Repository URLs
  • IP addresses and domains
Output: Writes IOC observations to evidence.json
Invocation: Only spawned if vendor report URL is in research question

Analysis Agents

These agents analyze collected evidence and form hypotheses about security incidents.
Purpose: Analyze evidence and form hypotheses OR request additional evidence
Skills: github-evidence-kit
Input: Reads evidence.json, previous rebuttal (if retry)
Output: Writes either:
  • evidence-request-{N}.md if more evidence needed
  • hypothesis-{N}.md if evidence sufficient
Key Behavior: Can request specific follow-up investigations:
# Evidence Request 001

## Missing Evidence
- **Need**: PushEvents for actor 'user' on 2025-07-13
- **Agent**: oss-investigator-gh-archive-agent
- **Query**: "Query PushEvents where actor.login='user'"

## Reason
Cannot determine timeline without push events.
Hypothesis Format:
  • Research question restatement
  • Executive summary
  • Timeline table with evidence citations
  • Attribution with confidence levels
  • Intent analysis (evidence-based)
  • Impact assessment
  • Evidence citations table
Purpose: Validate hypothesis claims against verified evidence
Skills: github-evidence-kit
Input: Reads hypothesis-{N}.md and evidence.json
Output: Writes either:
  • hypothesis-{N}-confirmed.md if all claims validated
  • hypothesis-{N}-rebuttal.md if claims fail verification
Validation Rules:
  • Every claim must cite evidence by ID
  • Evidence IDs must exist in store with verified status
  • Attribution confidence must match evidence quality
  • Timeline must use exact UTC timestamps from evidence
  • No speculation or unsupported claims
Rebuttal Format:
# Hypothesis Rebuttal

## Issues Found
1. Claim "attacker created tag at 19:41" has no evidence citation
2. Evidence [EVD-003] cited but not in evidence store
3. Attribution confidence "HIGH" but only one source

## Required Fixes
- Add evidence citation for tag creation claim
- Remove or verify [EVD-003]
- Reduce confidence to MEDIUM or add corroborating source
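The citation checks behind rebuttals like the one above can be sketched in a few lines. This is an illustrative reading of the validation rules, not RAPTOR's actual checker API; the `[EVD-xxx]` citation pattern and the store shape are assumptions:

```python
import re

# Hypothetical sketch of the checker's citation rule: every [EVD-xxx] style
# citation in a hypothesis must resolve to a *verified* entry in the
# evidence store. Names and patterns are assumptions for illustration.
def find_citation_issues(hypothesis_text: str, evidence_store: dict) -> list:
    issues = []
    for evid in re.findall(r"\[([A-Za-z]+-\d+)\]", hypothesis_text):
        entry = evidence_store.get(evid)
        if entry is None:
            issues.append(f"{evid} cited but not in evidence store")
        elif not entry.get("verified"):
            issues.append(f"{evid} exists but is not verified")
    return issues

store = {"EVD-001": {"verified": True}, "EVD-002": {"verified": False}}
text = "Tag created at 19:41 [EVD-001]; pushed by attacker [EVD-003]."
issues = find_citation_issues(text, store)
print(issues)  # ['EVD-003 cited but not in evidence store']
```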
Purpose: Deep root-cause analysis of C/C++ crashes using rr debugger
Skills: rr-debugger, function-tracing, gcov-coverage
Input: Bug tracker URL, git repo URL
Output: Root cause analysis with:
  • Crash location and stack trace
  • Function call sequence leading to crash
  • Code coverage data
  • Memory state at crash point
Workflow: Deterministic replay debugging with rr
Purpose: Validate crash analysis accuracy
Input: Crash analysis report
Output: Verification report with:
  • Claim validation results
  • Code path verification
  • Alternative explanations (if any)
Method: Re-executes traces and checks coverage data
Purpose: Determine if vulnerabilities are exploitable
Skills: exploitability-validation
Input: SARIF findings, target binary path
Output: Validation report with:
  • Reachability analysis
  • Exploit feasibility assessment
  • Mitigation effectiveness check
Stages: 0 (Inventory) → A (One-Shot) → B (Process) → C (Sanity) → D (Ruling) → E (Feasibility)
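The staged flow above can be sketched as a short pipeline where a failing stage ends the run early. Stage names follow the list; the stage functions and result shape are assumptions for illustration, not RAPTOR's API:

```python
# Illustrative staged pipeline: stages run in order and the first stage
# that rejects the finding stops the run. Stage functions are stand-ins.
STAGES = ["0-inventory", "A-one-shot", "B-process", "C-sanity", "D-ruling", "E-feasibility"]

def run_stages(finding: dict, stage_fns: dict) -> dict:
    """Run stages in order; stop at the first stage that rejects the finding."""
    completed = []
    for name in STAGES:
        ok = stage_fns[name](finding)
        completed.append(name)
        if not ok:
            return {"exploitable": False, "stopped_at": name, "completed": completed}
    return {"exploitable": True, "stopped_at": None, "completed": completed}

# Example: the sanity stage rejects the finding, so D and E never run.
fns = {name: (lambda f: True) for name in STAGES}
fns["C-sanity"] = lambda f: False
result = run_stages({"id": "F-1"}, fns)
print(result)
```

Short-circuiting this way keeps the expensive later stages (ruling, feasibility) from running on findings that already failed a cheap sanity check.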

Verification Agents

Purpose: Re-verify all collected evidence against original sources
Skills: github-evidence-kit
Input: Reads evidence.json
Output: Writes evidence-verification-report.md
Verification Process:
from src import EvidenceStore
from src.verifiers import ConsistencyVerifier

store = EvidenceStore.load(f"{workdir}/evidence.json")
verifier = ConsistencyVerifier()

results = verifier.verify_all(store.get_all())

# Report includes:
# - Total evidence count
# - Verified count
# - Failed verifications with reasons
# - Unverifiable evidence (source unavailable)
Verification Methods:
  • GH Archive: Re-query BigQuery with same parameters
  • GitHub API: Re-fetch from API endpoints
  • Wayback: Re-check snapshot availability
  • Local Git: Re-validate commit existence

Reporting Agents

Purpose: Generate final forensic report from confirmed hypothesis
Skills: github-evidence-kit
Input: Reads:
  • hypothesis-{N}-confirmed.md
  • evidence.json
  • evidence-verification-report.md
Output: Writes forensic-report.md
Report Sections:
  1. Executive Summary
  2. Timeline (chronological with evidence)
  3. Attribution (actors, confidence, evidence)
  4. Intent Analysis
  5. Impact Assessment
  6. IOCs (Indicators of Compromise)
  7. Evidence Appendix (full details)

Execution Modes

Parallel Execution

Use When: Agents have no dependencies on each other’s outputs
Pattern: Spawn multiple agents in a single message
# CORRECT: Single message with multiple Task calls
Task: oss-investigator-gh-archive-agent
  Prompt: "Collect from GH Archive..."
  
Task: oss-investigator-github-agent
  Prompt: "Collect from GitHub API..."
  
Task: oss-investigator-wayback-agent
  Prompt: "Collect from Wayback..."
  
Task: oss-investigator-local-git-agent
  Prompt: "Analyze local repo..."
Benefit: 4x faster than sequential execution for evidence collection phase
Common Mistake: Spawning agents in separate messages
# WRONG: Separate messages (runs sequentially)
message 1: spawn oss-investigator-gh-archive-agent
message 2: spawn oss-investigator-github-agent
message 3: spawn oss-investigator-wayback-agent
This defeats the purpose of parallelization!

Sequential Execution

Use When: Agent B depends on Agent A’s output
Pattern: Wait for completion before next spawn
# Phase 3: Hypothesis Formation
result1 = spawn_agent("oss-hypothesis-former-agent")
# Wait for completion and check output

if evidence_request_exists:
    # Phase 3b: Follow-up Evidence Collection
    result2 = spawn_agent("oss-investigator-gh-archive-agent", query=request)
    # Wait for completion
    
    # Phase 3c: Retry Hypothesis Formation
    result3 = spawn_agent("oss-hypothesis-former-agent")

Agent Communication

Shared File System

Agents communicate through files in the working directory:
.out/oss-forensics-20250713-143022/
├── evidence.json              # Shared evidence store
├── evidence-request-001.md    # Hypothesis former → Orchestrator
├── hypothesis-001.md          # Hypothesis former → Checker
├── hypothesis-001-rebuttal.md # Checker → Hypothesis former
├── hypothesis-002.md          # Revised hypothesis
├── hypothesis-002-confirmed.md # Checker → Report generator
├── evidence-verification-report.md # Verifier → Report generator
└── forensic-report.md         # Final output

Data Formats

Evidence Store (JSON):
{
  "evidence": [
    {
      "evidence_id": "evt-001",
      "type": "PushEvent",
      "observed_when": "2025-07-13T20:30:24Z",
      "observed_by": "gharchive",
      "observed_what": "Commit pushed to main branch",
      "verification": {
        "source": "gharchive",
        "url": "bigquery://githubarchive.day.20250713",
        "verified_at": "2025-07-14T10:15:33Z"
      },
      "payload": { /* full event data */ }
    }
  ]
}
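A minimal sketch of reading and sanity-checking this format. The field names follow the JSON example above; the `load_evidence` helper and its required-field set are illustrative, not RAPTOR's `EvidenceStore` API:

```python
import json

# Hypothetical loader for the evidence store format shown above: parse the
# JSON and reject entries missing required fields.
raw = """
{
  "evidence": [
    {
      "evidence_id": "evt-001",
      "type": "PushEvent",
      "observed_when": "2025-07-13T20:30:24Z",
      "observed_by": "gharchive",
      "observed_what": "Commit pushed to main branch",
      "verification": {
        "source": "gharchive",
        "url": "bigquery://githubarchive.day.20250713",
        "verified_at": "2025-07-14T10:15:33Z"
      },
      "payload": {}
    }
  ]
}
"""

REQUIRED = {"evidence_id", "type", "observed_when",
            "observed_by", "observed_what", "verification"}

def load_evidence(text: str) -> list:
    """Parse the store, raising if any entry is missing a required field."""
    items = json.loads(text)["evidence"]
    for item in items:
        missing = REQUIRED - item.keys()
        if missing:
            raise ValueError(f"{item.get('evidence_id')}: missing {sorted(missing)}")
    return items

items = load_evidence(raw)
print(items[0]["evidence_id"])  # evt-001
```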
Agent Communication Pattern:
1

Write

Agent A writes structured data to shared file
2

Signal

Agent A completes and returns to orchestrator
3

Orchestrator Checks

Orchestrator checks for expected output files
4

Read

Agent B reads shared file created by Agent A
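Step 3 of the pattern above, the orchestrator checking for expected output files before spawning the next agent, can be sketched as follows. The file names follow the working-directory layout; the `outputs_ready` helper is illustrative:

```python
import pathlib
import tempfile

# Sketch of the orchestrator's post-agent check: which expected output
# files has the agent not yet written? Helper name is hypothetical.
def outputs_ready(workdir: str, expected: list) -> list:
    """Return the expected files that are missing from the working directory."""
    root = pathlib.Path(workdir)
    return [name for name in expected if not (root / name).exists()]

with tempfile.TemporaryDirectory() as wd:
    # Simulate an agent that wrote the evidence store but no hypothesis yet.
    (pathlib.Path(wd) / "evidence.json").write_text("{}")
    missing = outputs_ready(wd, ["evidence.json", "hypothesis-001.md"])
    print(missing)  # ['hypothesis-001.md']
```

If the list is non-empty, the orchestrator can retry the agent or fall back to partial results instead of letting a downstream agent fail on a missing file.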

Error Handling

Agent Failure Strategies

When: Transient errors (network, API rate limits)
import time
import logging

logger = logging.getLogger(__name__)

max_retries = 3
for attempt in range(max_retries):
    try:
        result = spawn_agent("oss-investigator-github-agent", ...)
        break
    except Exception as e:
        if attempt < max_retries - 1:
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, then 2s
        else:
            logger.error(f"Agent failed after {max_retries} attempts: {e}")
            raise

Error Recovery Workflow

def orchestrate_with_recovery():
    try:
        # Phase 2: Parallel evidence collection
        results = spawn_parallel_agents([
            "oss-investigator-gh-archive-agent",
            "oss-investigator-github-agent",
            "oss-investigator-wayback-agent",
        ])
        
        # Check which agents succeeded
        successful = [r for r in results if r.success]
        failed = [r for r in results if not r.success]
        
        if len(successful) == 0:
            raise RuntimeError("All evidence collection failed")
        
        if failed:
            logger.warning(f"{len(failed)} agents failed, continuing with available evidence")
        
        # Phase 3: Hypothesis formation (proceeds with available evidence)
        spawn_agent("oss-hypothesis-former-agent", ...)
        
    except Exception as e:
        logger.error(f"Orchestration failed: {e}")
        generate_partial_report()

Best Practices

Spawn Parallel When Possible

Evidence collectors have no dependencies on each other, so always spawn them in parallel for a 4-5x speedup

Single Responsibility

Each agent does ONE thing well. Don’t ask evidence collectors to form hypotheses.

Pass Working Directory

Every agent needs the working directory path to read/write shared files

Verify Outputs

Orchestrator should check that expected output files exist before proceeding

Agent Design Principles

Only the orchestrator spawns agents. This prevents infinite loops and makes workflows debuggable.
Agents declare their skills (e.g., github-archive, github-evidence-kit). Skills are documentation loaded into agent context.
Agents declare their tools (e.g., Bash, Read, Write). Tools are actual capabilities the agent can use.
Agents communicate through files, not return values. This makes workflows resumable and debuggable.

Performance Optimization

Parallelization Impact

Evidence Collection Phase:
| Execution Mode        | Time | Speedup |
|-----------------------|------|---------|
| Sequential (4 agents) | 120s | 1x      |
| Parallel (4 agents)   | 30s  | 4x      |
Why Parallel Works:
  • GH Archive agent: Waits for BigQuery
  • GitHub API agent: Waits for API responses
  • Wayback agent: Waits for Archive.org
  • Local Git agent: Waits for git commands
All are I/O-bound, not CPU-bound!
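Because the collectors are I/O-bound, a plain thread pool captures the speedup described above. This is a generic concurrency sketch, not RAPTOR's spawning mechanism; `spawn_agent` here is a stand-in that just sleeps to simulate waiting on BigQuery, the GitHub API, and so on:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for an I/O-bound agent: the "work" is waiting on a remote service.
def spawn_agent(name: str) -> str:
    time.sleep(0.1)  # simulated network wait
    return f"{name}: done"

agents = ["gh-archive", "github-api", "wayback", "local-git"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(agents)) as pool:
    # All four waits overlap, so wall time is ~one agent's wait, not four.
    results = list(pool.map(spawn_agent, agents))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Threads suffice here precisely because the agents spend their time blocked on I/O; CPU-bound work would need processes instead.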

Orchestrator Optimization

# ❌ BAD: Spawn agents one at a time
for agent in ["agent1", "agent2", "agent3"]:
    spawn_agent(agent)
    # Total time: 30s + 30s + 30s = 90s

# ✅ GOOD: Spawn all agents in parallel
spawn_parallel_agents(["agent1", "agent2", "agent3"])
# Total time: max(30s, 30s, 30s) = 30s

Debugging

Workflow Tracing

Enable detailed orchestrator logging:
export RAPTOR_LOG_LEVEL=DEBUG
/oss-forensics "your question"
Log Output:
[ORCHESTRATOR] Phase 0: Initialize → workdir created
[ORCHESTRATOR] Phase 1: Parse prompt → repo=aws/aws-toolkit-vscode, actor=lkmanka58
[ORCHESTRATOR] Phase 2: Spawn 4 investigators in parallel
[AGENT] oss-investigator-gh-archive-agent → started
[AGENT] oss-investigator-github-agent → started
[AGENT] oss-investigator-wayback-agent → started
[AGENT] oss-investigator-local-git-agent → started
[AGENT] oss-investigator-gh-archive-agent → completed (42 events)
[AGENT] oss-investigator-github-agent → completed (5 commits)
[AGENT] oss-investigator-wayback-agent → failed (no snapshots)
[AGENT] oss-investigator-local-git-agent → completed (2 dangling commits)
[ORCHESTRATOR] Phase 3: Spawn hypothesis former

Common Issues

Agent spawns but produces no output

Cause: Agent completed but didn’t write expected output file
Debug:
ls -la .out/oss-forensics-*/
# Check if evidence.json, hypothesis-*.md, etc. exist
Fix: Check agent logs for errors during file write
Parallel agents run sequentially

Cause: Agents spawned in separate messages instead of a single message
Fix: Use a single message with multiple Task calls
Evidence store grows unbounded

Expected behavior: evidence accumulates across the investigation
If concerned: evidence is deduplicated by ID, so duplicates don’t inflate size
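The dedup-by-ID behavior can be illustrated with a store keyed by `evidence_id`, so re-adding an existing ID replaces the entry rather than growing the store. The `add_evidence` helper is hypothetical, not RAPTOR's `EvidenceStore` API:

```python
# Illustrative dedup-by-ID: the store is a dict keyed by evidence_id, so
# inserting a duplicate ID overwrites rather than appends.
def add_evidence(store: dict, item: dict) -> None:
    store[item["evidence_id"]] = item  # keyed by ID; duplicates collapse

store = {}
add_evidence(store, {"evidence_id": "evt-001", "type": "PushEvent"})
add_evidence(store, {"evidence_id": "evt-001", "type": "PushEvent"})  # duplicate
add_evidence(store, {"evidence_id": "evt-002", "type": "IssuesEvent"})
print(len(store))  # 2
```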

Further Reading

Agent Definitions

Full agent specifications with skills and tools

Orchestrator Code

Source code for workflow orchestration logic

Evidence Kit API

Python API for evidence collection and storage

Creating Custom Agents

Guide to building your own specialized agents
