
Overview

This page traces a pull request review from the moment a developer opens a PR on GitHub to when Nectr posts the AI-generated review as a GitHub comment.
The entire flow is asynchronous and non-blocking: the webhook handler returns HTTP 200 within one second, and the PR is processed in a background task.

PR Review Flow Diagram

  Developer opens / updates a Pull Request on GitHub


  GitHub → POST /api/v1/webhooks/github

           ├─ Verify HMAC-SHA256 signature
           ├─ Deduplicate (ignore duplicate events within 1hr)
           ├─ Create Event row (status = pending)
           └─ Return HTTP 200 immediately


  BackgroundTask: process_pr_in_background()

           ├─ 1. Fetch PR data from GitHub
           │      get_pr_diff()  get_pr_files()  get_file_content()

           ├─ 2. Pull MCP context (if configured)
           │      ├─ Linear: linked issues & task descriptions
           │      ├─ Sentry: related errors for changed files
           │      └─ Slack: relevant channel messages

           ├─ 3. Build ReviewContext (parallel)
           │      ├─ Mem0: project patterns, decisions, rules
           │      ├─ Mem0: developer-specific patterns & strengths
           │      ├─ Neo4j: file experts (who touched these files most)
           │      └─ Neo4j: related past PRs with file overlap

           ├─ 4. AI Analysis — two modes (set PARALLEL_REVIEW_AGENTS)

           │      STANDARD (default)               PARALLEL (opt-in)
           │      ──────────────────               ────────────────────
           │      Single agentic loop              asyncio.gather() runs:
           │      with 8 MCP-style tools           ├─ Security agent
           │      (search code, fetch              ├─ Performance agent
           │       issues, get errors…)            └─ Style agent
           │                                        ▼
           │                                  Synthesis agent combines
           │                                  all three into final review

           ├─ 5. Post Review on GitHub PR
           │      • Posts as your GitHub account (PAT)
           │      • Inline review comments + top-level summary

           ├─ 6. Index PR in Neo4j Graph
           │      Creates: PullRequest + Developer nodes
           │      Edges:   TOUCHES → Files
           │               AUTHORED_BY → Developer
           │               CLOSES → Issues

           ├─ 7. Extract & Store Memories in Mem0
           │      Claude extracts: project_pattern, decision,
           │      developer_pattern, developer_strength, risk_module,
           │      contributor_profile

           └─ 8. Update Event status → completed / failed

Step-by-Step Walkthrough

Step 1: GitHub Webhook Event

Location: app/api/v1/webhooks.py

When a developer opens or updates a PR, GitHub sends a pull_request event to the configured webhook URL:
@router.post("/webhooks/github")
async def handle_github_webhook(
    request: Request,
    background_tasks: BackgroundTasks,
    db: AsyncSession = Depends(get_db),
):
    # 1. Parse payload
    payload = await request.json()
    event_type = request.headers.get("X-GitHub-Event")
    signature = request.headers.get("X-Hub-Signature-256")
    
    # 2. Verify HMAC-SHA256 signature
    body = await request.body()
    if not verify_signature(body, signature, webhook_secret):
        raise HTTPException(status_code=401, detail="Invalid signature")
    
    # 3. Deduplicate
    event_hash = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    existing = await db.execute(
        select(Event).where(
            Event.event_type == event_type,
            Event.payload_hash == event_hash,
            Event.created_at > datetime.now() - timedelta(hours=1),
        )
    )
    if existing.scalar_one_or_none():
        return {"status": "duplicate", "message": "Event already processed"}
    
    # 4. Create Event row
    event = Event(
        event_type=event_type,
        source="github",
        payload=json.dumps(payload),
        payload_hash=event_hash,
        status="pending",
    )
    db.add(event)
    await db.commit()
    
    # 5. Return HTTP 200 immediately (< 1 second)
    background_tasks.add_task(process_pr_in_background, payload, event.id)
    return {"status": "received", "event_id": event.id}
GitHub sometimes sends duplicate webhook events (network retries, misconfigured hooks). Deduplication prevents processing the same PR twice within a 1-hour window.

Step 2: Fetch PR Data from GitHub

Location: app/services/pr_review_service.py:473
async def process_pr_review(self, payload: dict, event: Event, db: AsyncSession, github_token: str | None = None):
    pr = payload["pull_request"]
    repo_full_name = payload["repository"]["full_name"]
    pr_number = pr["number"]
    owner, repo = repo_full_name.split("/")
    
    # Fetch diff and changed files concurrently
    diff, files = await asyncio.gather(
        github_client.get_pr_diff(owner, repo, pr_number, token=github_token),
        github_client.get_pr_files(owner, repo, pr_number, token=github_token),
    )
GitHub REST API calls:
  • GET /repos/{owner}/{repo}/pulls/{pr_number} - PR metadata
  • GET /repos/{owner}/{repo}/pulls/{pr_number}/files - Changed files
  • GET /repos/{owner}/{repo}/pulls/{pr_number} with Accept: application/vnd.github.v3.diff - Unified diff
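The three calls above differ only in path and media type; a small sketch of assembling them (the helper name and return shape are hypothetical, not Nectr's client API):

```python
GITHUB_API = "https://api.github.com"


def build_pr_requests(owner: str, repo: str, pr_number: int, token: str) -> dict:
    """Assemble the three REST calls above as (url, headers) pairs."""
    base = f"{GITHUB_API}/repos/{owner}/{repo}/pulls/{pr_number}"
    auth = {"Authorization": f"Bearer {token}"}
    return {
        "metadata": (base, {**auth, "Accept": "application/vnd.github+json"}),
        "files": (f"{base}/files", {**auth, "Accept": "application/vnd.github+json"}),
        # Same endpoint as metadata; the diff media type switches the response body
        "diff": (base, {**auth, "Accept": "application/vnd.github.v3.diff"}),
    }
```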

Step 3: Pull MCP Context (Optional)

Location: app/services/pr_review_service.py (ReviewToolExecutor)

If MCP integrations are configured, Nectr pulls live context:
from app.mcp.client import mcp_client

# Linear issues
issues = await mcp_client.get_linear_issues(team_id="", query="authentication")

# Sentry errors
errors = await mcp_client.get_sentry_errors(project="backend", filename="app/auth/jwt_utils.py")
If LINEAR_MCP_URL or SENTRY_MCP_URL is not set, the corresponding calls gracefully return empty lists instead of raising.
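That graceful degradation can be sketched as a guard around each MCP call (the wrapper name and shape are assumptions; only the env var names come from the config above):

```python
import os
from typing import Any, Awaitable, Callable


async def safe_mcp_call(env_var: str, fetch: Callable[..., Awaitable[list]], *args: Any) -> list:
    """Return [] when the integration's URL env var is unset, instead of raising."""
    if not os.environ.get(env_var):
        # Integration not configured: degrade to an empty result
        return []
    return await fetch(*args)
```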

Step 4: Build Review Context

Location: app/services/context_service.py:58
context = await build_review_context(
    repo_full_name=repo_full_name,
    pr_title=pr["title"],
    pr_description=pr["body"],
    file_paths=file_paths,
    author=author,
    pr_number=pr_number,
)
Parallel queries:
  1. Mem0: Project patterns, decisions, rules
  2. Mem0: Developer-specific patterns, strengths
  3. Neo4j: File experts (developers who touched these files most)
  4. Neo4j: Related past PRs (PRs that touched the same files)
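The fan-out over these four queries can be sketched with `asyncio.gather`; the stub coroutines stand in for the real Mem0/Neo4j lookups and their names are assumptions:

```python
import asyncio
from dataclasses import dataclass


# Stand-ins for the real Mem0 / Neo4j queries (names are assumptions)
async def search_project_memories(repo):
    return [f"{repo}: pattern"]

async def search_developer_memories(author):
    return [f"{author}: strength"]

async def find_file_experts(repo, paths):
    return ["alice"]

async def find_related_prs(repo, paths):
    return [41]


@dataclass
class ReviewContext:
    project_memories: list
    developer_memories: list
    file_experts: list
    related_prs: list


async def build_context(repo: str, author: str, paths: list[str]) -> ReviewContext:
    # One gather: total latency is roughly the slowest backend, not the sum
    proj, dev, experts, related = await asyncio.gather(
        search_project_memories(repo),
        search_developer_memories(author),
        find_file_experts(repo, paths),
        find_related_prs(repo, paths),
    )
    return ReviewContext(proj, dev, experts, related)
```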

Step 5: Agentic AI Analysis

Location: app/services/ai_service.py:536

Nectr supports two review modes:

Standard Mode (Default)

review_result = await ai_service.analyze_pull_request_agentic(
    pr, diff, files, tool_executor, issue_refs=issue_refs
)
Claude receives:
  • PR metadata (title, body, author)
  • Diff (up to 15,000 chars)
  • File list (name, additions, deletions)
  • 8 tools to fetch additional context on-demand
Tool execution loop:
  1. Claude analyzes diff
  2. Calls read_file("app/auth/jwt_utils.py") - needs full context
  3. Calls search_project_memory("JWT token handling") - checks past decisions
  4. Calls get_file_history(["app/auth/jwt_utils.py"]) - finds experts
  5. Returns final review with verdict + inline suggestions
{
  "type": "tool_use",
  "id": "toolu_123",
  "name": "read_file",
  "input": {
    "path": "app/auth/jwt_utils.py"
  }
}
Tool result:
"### app/auth/jwt_utils.py\n```python\nimport jwt\nfrom datetime import datetime, timedelta\n\ndef create_token(user_id: int) -> str:\n    payload = {\n        'user_id': user_id,\n        'exp': datetime.utcnow() + timedelta(hours=24)\n    }\n    return jwt.encode(payload, SECRET_KEY, algorithm='HS256')\n```"

Parallel Mode (Opt-In)

review_result = await ai_service.analyze_pull_request_parallel(
    pr, diff, files, tool_executor, issue_refs=issue_refs
)
Runs 3 specialized agents concurrently:
  1. Security agent - Injection, auth flaws, secrets
  2. Performance agent - N+1 queries, memory leaks, O(n²)
  3. Style agent - Missing tests, unclear names, dead code
Then a synthesis agent combines all findings into one review.
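The fan-out/fan-in shape of parallel mode can be sketched as follows (the agent stub and its output format are placeholders, not the real prompts):

```python
import asyncio


async def run_agent(role: str, diff: str) -> str:
    # Stand-in for a focused Claude call with a role-specific system prompt
    return f"[{role}] findings for {len(diff)}-char diff"


async def parallel_review(diff: str) -> str:
    # All three specialists run concurrently on the same diff
    security, performance, style = await asyncio.gather(
        run_agent("security", diff),
        run_agent("performance", diff),
        run_agent("style", diff),
    )
    # A real synthesis agent would merge and deduplicate the three reports
    return "\n".join([security, performance, style])
```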

Step 6: Post Review to GitHub

Location: app/services/pr_review_service.py:716
await github_client.post_pr_review(
    owner, repo, pr_number,
    commit_id=head_sha,
    body=comment_body,
    event=github_event,  # "APPROVE" | "REQUEST_CHANGES" | "COMMENT"
    comments=inline_comments,
    token=github_token,
)
GitHub REST API:
  • POST /repos/{owner}/{repo}/pulls/{pr_number}/reviews
Payload:
{
  "commit_id": "abc123...",
  "body": "## Summary\n...",
  "event": "APPROVE",
  "comments": [
    {
      "path": "app/auth/jwt_utils.py",
      "line": 42,
      "side": "RIGHT",
      "body": "Consider using `timedelta(days=1)` for clarity\n\n```suggestion\nexp = datetime.utcnow() + timedelta(days=1)\n```"
    }
  ]
}
Inline suggestions use GitHub’s suggestion format. Users can click “Commit suggestion” to apply the fix directly.

Step 7: Index PR in Neo4j

Location: app/services/graph_builder.py:197
await index_pr(
    repo_full_name=repo_full_name,
    pr_number=pr_number,
    title=pr_title,
    author=author,
    files_changed=file_paths,
    verdict=review_result.verdict,
    issue_numbers=issue_refs,
)
Cypher queries:
-- Create PullRequest node
MERGE (pr:PullRequest {repo: $repo, number: $number})
SET pr.title = $title, pr.author = $author, pr.verdict = $verdict

-- Create Developer node + AUTHORED_BY edge
MERGE (d:Developer {login: $author})
MERGE (pr)-[:AUTHORED_BY]->(d)

-- Create TOUCHES edges for each file
UNWIND $files AS f
MERGE (file:File {repo: $repo, path: f.path})
MERGE (pr)-[:TOUCHES]->(file)

-- Create CLOSES edges for linked issues
UNWIND $issue_nums AS num
MERGE (i:Issue {repo: $repo, number: num})
MERGE (pr)-[:CLOSES]->(i)

Step 8: Extract Memories to Mem0

Location: app/services/memory_extractor.py

Claude extracts learnings from the PR review:
await extract_and_store(
    repo_full_name=repo_full_name,
    pr_number=pr_number,
    author=author,
    title=pr_title,
    files=files,
    review_summary=summary,
)
Memory extraction prompt:
Analyze this PR review and extract learnings:

Categories:
1. project_pattern - Repo-wide patterns (e.g., "All auth routes use JWT middleware")
2. decision - Architectural decisions (e.g., "We use Pydantic for config, not env vars")
3. developer_pattern - Author habits (e.g., "@alice tends to forget error handling")
4. developer_strength - Author strengths (e.g., "@bob writes excellent tests")
5. risk_module - Fragile areas (e.g., "app/auth/jwt_utils.py needs extra scrutiny")

Output JSON:
[
  {"category": "project_pattern", "content": "All auth routes require JWT middleware"},
  {"category": "developer_strength", "content": "@alice writes clear docstrings"}
]
Memories are stored in Mem0 and indexed by:
  • repo: Repository full name
  • developer: GitHub username (for developer-specific memories)
  • category: Memory type

Step 9: Update Event Status

Location: app/services/pr_review_service.py:744
workflow.status = "completed"
workflow.result = json.dumps({
    "ai_summary": summary,
    "files_analyzed": len(files),
    "comment_posted": True,
    "verdict": review_result.verdict,
    "inline_suggestions": len(inline_comments),
})
workflow.completed_at = datetime.now()

event.status = "completed"
event.processed_at = datetime.now()

await db.flush()

Failure Handling

If any step fails (GitHub API error, Claude timeout, Neo4j unreachable), the workflow:
  1. Logs the error with full traceback
  2. Updates event.status = "failed"
  3. Stores error message in workflow.error
  4. Does not retry automatically (prevents duplicate reviews)
Users can view failed events in the dashboard and manually trigger a rescan if needed.
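The failure path wraps the whole pipeline; a minimal sketch, assuming the `event` and `workflow` object shapes from the snippets above:

```python
import logging
import traceback

logger = logging.getLogger("nectr.pr_review")


async def run_with_failure_handling(event, workflow, pipeline) -> None:
    """Mark the event failed on any exception; never retry automatically."""
    try:
        await pipeline()
        event.status = "completed"
    except Exception as exc:
        # 1. Log with full traceback
        logger.error("PR review failed: %s\n%s", exc, traceback.format_exc())
        # 2-3. Persist the failure for the dashboard
        event.status = "failed"
        workflow.error = str(exc)
        # 4. No automatic retry: a re-run could post a duplicate review
```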

Performance Metrics

Step                                    Typical Duration
Webhook verification                    < 50 ms
Fetch PR data from GitHub               200-500 ms
Build review context (Neo4j + Mem0)     300-800 ms
Agentic AI analysis (3-5 tool calls)    8-15 seconds
Post review to GitHub                   200-400 ms
Index PR in Neo4j                       100-300 ms
Extract memories to Mem0                2-4 seconds
Total (background)                      10-25 seconds
The webhook returns HTTP 200 in < 1 second. All processing happens in the background.

Next Steps

Service Layer

Deep dive into PR review, AI, and context services

Neo4j Graph

Learn about the knowledge graph schema
