## Overview
The GitHub Webhook Server is built on a FastAPI-based event-driven architecture that processes GitHub webhooks through specialized handlers. The system is designed for high performance, type safety, and fail-fast reliability.
## Core Architecture Components

### FastAPI Application Layer
The webhook server runs as a FastAPI application (webhook_server/app.py) with:
- **Asynchronous processing** - Non-blocking webhook handling using Python asyncio
- **IP-based security** - GitHub and Cloudflare IP allowlist verification
- **Webhook signature validation** - HMAC-SHA256 signature verification
- **Health check endpoints** - `/webhook_server/healthcheck` for monitoring
- **Structured logging** - JSON-based webhook execution tracking
```python
# Main webhook endpoint
@FASTAPI_APP.post(
    f"{APP_URL_ROOT_PATH}",
    dependencies=[Depends(gate_by_allowlist_ips_dependency)],
)
async def webhook_handler(request: Request) -> JSONResponse:
    # Create structured logging context
    # (hook_id, event_type, etc. are extracted from the request headers and payload)
    ctx = create_context(
        hook_id=hook_id,
        event_type=event_type,
        repository=repository_full_name,
        action=action,
        sender=sender_login,
        api_user=api_user,
    )

    # Process the webhook with the GithubWebhook class
    github_webhook = GithubWebhook(hook_data, headers, logger)
    await github_webhook.process()
```
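The signature validation mentioned above follows GitHub's documented scheme: the `X-Hub-Signature-256` header is compared against an HMAC-SHA256 digest of the raw request body computed with the shared webhook secret. A minimal sketch (the `verify_signature` helper is illustrative, not the server's exact function):

```python
import hashlib
import hmac


def verify_signature(payload_body: bytes, secret: str, signature_header: str) -> bool:
    """Compare GitHub's X-Hub-Signature-256 header against our own digest."""
    expected = "sha256=" + hmac.new(
        secret.encode("utf-8"), payload_body, hashlib.sha256
    ).hexdigest()
    # Constant-time comparison to avoid timing attacks
    return hmac.compare_digest(expected, signature_header)


body = b'{"action": "opened"}'
secret = "webhook-secret"
header = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(verify_signature(body, secret, header))  # True
```

Requests that fail this check are rejected before any handler runs, complementing the IP allowlist dependency.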
### GithubWebhook Class - Central Orchestrator
The GithubWebhook class (webhook_server/libs/github_api.py) is the central orchestrator that:
- **Validates webhook data** - Ensures required fields exist
- **Loads configuration** - Merges the global config with the repository-specific `.github-webhook-server.yaml`
- **Authenticates with GitHub** - Selects the token with the highest remaining rate limit
- **Routes to handlers** - Dispatches events to specialized handlers based on event type
- **Tracks metrics** - Monitors API rate limit consumption and processing time
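The routing step can be pictured as a dictionary dispatch keyed on the `X-GitHub-Event` header. This is an illustrative simplification, not the actual dispatch code in `github_api.py`:

```python
# Hypothetical event-to-handler mapping; handler names mirror the
# specialized handlers documented below.
EVENT_HANDLERS = {
    "pull_request": "PullRequestHandler",
    "issue_comment": "IssueCommentHandler",
    "pull_request_review": "PullRequestReviewHandler",
    "check_run": "CheckRunHandler",
    "push": "PushHandler",
}


def route(github_event: str) -> str:
    """Resolve an event type to a handler name, failing fast on unknown events."""
    try:
        return EVENT_HANDLERS[github_event]
    except KeyError:
        # Fail fast rather than silently ignoring an unexpected event
        raise ValueError(f"Unsupported event type: {github_event}")


print(route("push"))  # PushHandler
```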
#### Initialization
```python
class GithubWebhook:
    def __init__(self, hook_data: dict, headers: Headers, logger: logging.Logger):
        self.hook_data = hook_data
        self.repository_name = hook_data["repository"]["name"]
        self.repository_full_name = hook_data["repository"]["full_name"]
        self.github_event = headers["X-GitHub-Event"]
        self.config = Config(repository=self.repository_name)

        # Get GitHub API client with the highest remaining rate limit
        github_api, self.token, self.api_user = get_api_with_highest_rate_limit(
            config=self.config, repository_name=self.repository_name
        )

        # Get repository instance
        self.repository = get_github_repo_api(
            github_app_api=github_api,
            repository=self.repository_full_name,
        )
```
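Token selection can be pictured as choosing the credential whose client reports the most remaining API calls. A minimal sketch with a hypothetical `pick_best_token` helper (the real `get_api_with_highest_rate_limit` also constructs PyGithub clients and handles GitHub App authentication):

```python
def pick_best_token(remaining_by_token: dict[str, int]) -> str:
    """Return the token with the most remaining API calls.

    Illustrative stand-in for get_api_with_highest_rate_limit.
    """
    token, remaining = max(remaining_by_token.items(), key=lambda item: item[1])
    if remaining == 0:
        # Fail fast instead of sending doomed requests
        raise RuntimeError("All tokens are rate-limited")
    return token


print(pick_best_token({"ghp_a": 120, "ghp_b": 4500}))  # ghp_b
```

Spreading load across multiple tokens this way keeps a single busy repository from exhausting one credential's quota.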
## Event-Driven Handler Architecture

### Handler Pattern
All handlers follow a consistent pattern (webhook_server/libs/handlers/):
```python
class SomeHandler:
    def __init__(self, github_webhook: GithubWebhook, owners_file_handler: OwnersFileHandler):
        self.github_webhook = github_webhook
        self.owners_file_handler = owners_file_handler
        self.logger = github_webhook.logger
        self.repository = github_webhook.repository

    async def process_event(self, event_data: dict) -> None:
        # Event-specific processing logic
        pass
```
### Specialized Handlers

| Handler | File | Responsibilities |
| --- | --- | --- |
| PullRequestHandler | `pull_request_handler.py` | PR opened/reopened/edited events, reviewer assignment, label management, merge checks |
| IssueCommentHandler | `issue_comment_handler.py` | User commands (`/verified`, `/lgtm`, `/retest`, `/cherry-pick`) |
| PullRequestReviewHandler | `pull_request_review_handler.py` | Review submitted/dismissed, approval tracking, review labels |
| CheckRunHandler | `check_run_handler.py` | CI check completion, merge eligibility, auto-merge |
| PushHandler | `push_handler.py` | Branch pushes, tag creation, container building |
| OwnersFileHandler | `owners_files_handler.py` | OWNERS file parsing, reviewer/approver assignment |
| LabelsHandler | `labels_handler.py` | Label application, PR size calculation, branch labels |
| RunnerHandler | `runner_handler.py` | Test execution (tox, pre-commit, container builds) |
## Repository Data Pre-Fetch Pattern

**Performance optimization:** Repository data is fetched once per webhook, before handlers are instantiated, preventing duplicate API calls.
The GithubWebhook class pre-fetches comprehensive repository data:
```python
# In GithubWebhook.process() - after PR data, before handlers
self.repository_data = await self.unified_api.get_comprehensive_repository_data(
    owner, repo
)

# Handlers access pre-fetched data directly
collaborators = self.github_webhook.repository_data["collaborators"]["edges"]
protected_branches = self.github_webhook.repository_data["protected_branches"]
```
Benefits:

- ⚡ **Reduced API calls** - A single fetch instead of multiple per-handler calls
- 🚀 **Faster processing** - Handlers read from a cached dict instead of the network
- 💰 **Lower rate limit consumption** - Critical for high-volume repositories
## Repository Cloning Strategy

### Optimized Cloning for check_run Events

For `check_run` events, the server implements early-exit conditions to avoid unnecessary cloning:
```python
if self.github_event == "check_run":
    action = self.hook_data.get("action", "")
    if action != "completed":
        return None  # Skip clone for non-completed actions (e.g. 'created')

    check_run_name = self.hook_data.get("check_run", {}).get("name", "")
    check_run_conclusion = self.hook_data.get("check_run", {}).get("conclusion", "")
    if check_run_name == "can-be-merged" and check_run_conclusion != "success":
        return None  # Skip clone for failed can-be-merged checks

# Only clone when actually needed
await self._clone_repository(pull_request=pull_request)
```
Impact:

- 🎯 90-95% reduction in unnecessary repository cloning
- ⏱️ 5-30 seconds saved per skipped clone
- 📉 Lower server resource usage
### Worktree Isolation
Handlers create isolated worktrees from a single repository clone:
```python
# Single clone per webhook
await self._clone_repository(pull_request=pull_request)

# Handlers create isolated worktrees for concurrent operations
worktree_dir = f"{self.clone_repo_dir}-worktree-{handler_name}"
await run_command(f"git worktree add {worktree_dir} {branch}")
```
## Non-Blocking PyGithub Operations

**Critical requirement:** PyGithub is synchronous, so ALL of its operations MUST run through `asyncio.to_thread()` to prevent blocking the event loop:
```python
# ✅ CORRECT - Non-blocking
await asyncio.to_thread(pull_request.create_issue_comment, "Comment")
await asyncio.to_thread(pull_request.add_to_labels, "verified")
is_draft = await asyncio.to_thread(lambda: pull_request.draft)

# ❌ WRONG - Blocks the event loop
pull_request.create_issue_comment("Comment")  # Freezes the server!
is_draft = pull_request.draft  # Blocks for 100ms-2s
```
Why this matters:

- 🔴 Blocking calls freeze the entire server
- 🚫 Incoming webhooks must wait until the blocking call returns
- ⏱️ Each GitHub API call blocks for 100ms-2 seconds
- 🎯 `asyncio.to_thread()` keeps the event loop responsive
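Because each wrapped call runs in its own worker thread, independent GitHub operations can also be overlapped with `asyncio.gather`. A self-contained sketch using a stand-in for a blocking PyGithub call (the function below is hypothetical, purely for illustration):

```python
import asyncio
import time


def blocking_github_call(name: str) -> str:
    """Stand-in for a synchronous PyGithub call (100ms-2s in practice)."""
    time.sleep(0.1)
    return f"{name}: done"


async def main() -> list[str]:
    # Each call runs in its own thread; the event loop stays free to
    # accept new webhooks while both requests are in flight.
    return await asyncio.gather(
        asyncio.to_thread(blocking_github_call, "add_to_labels"),
        asyncio.to_thread(blocking_github_call, "create_issue_comment"),
    )


print(asyncio.run(main()))  # ['add_to_labels: done', 'create_issue_comment: done']
```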
## Configuration System

The configuration system (webhook_server/libs/config.py) supports:

### Hierarchical Configuration
```python
class Config:
    def get_value(self, value: str, return_on_none: Any = None) -> Any:
        # Order of precedence:
        # 1. Repository-specific .github-webhook-server.yaml
        # 2. Repository level in the global config.yaml
        # 3. Root level in the global config.yaml
        for scope in (self.repository_data, self.root_data):
            result = self._get_nested_value(value, scope)
            if result is not None:
                return result
        return return_on_none
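The precedence lookup can be made concrete with a small, self-contained sketch. The dotted-key convention and the `get_nested_value` helper shown here are assumptions for illustration, not the exact implementation in `config.py`:

```python
from typing import Any


def get_nested_value(key: str, scope: dict) -> Any:
    """Walk a dotted key like 'docker.username' through nested dicts."""
    current: Any = scope
    for part in key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


# Hypothetical merged scopes: repository-specific values win over root values
repository_data = {"verified-job": False}
root_data = {"verified-job": True, "docker": {"username": "bot"}}


def get_value(key: str, return_on_none: Any = None) -> Any:
    for scope in (repository_data, root_data):
        result = get_nested_value(key, scope)
        if result is not None:
            return result
    return return_on_none


print(get_value("verified-job"))     # False (repository level overrides root)
print(get_value("docker.username"))  # bot
```

Note that the `is not None` check lets an explicit `False` at the repository level override a `True` at the root, which is why falsy values still take precedence.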
### Schema Validation

- **JSON Schema** - `webhook_server/config/schema.yaml` defines all valid fields
- **IDE support** - The schema URL in the config YAML enables autocompletion
- **Type checking** - Validates strings, integers, booleans, arrays, and objects
- **Cross-field validation** - Ensures configuration consistency
```yaml
# yaml-language-server: $schema=https://raw.githubusercontent.com/myk-org/github-webhook-server/refs/heads/main/webhook_server/config/schema.yaml
github-app-id: 123456
webhook-ip: https://your-domain.com/webhook_server
github-tokens:
  - ghp_your_github_token
repositories:
  my-repository:
    name: my-org/my-repository
    protected-branches:
      main: []
```
## Structured Logging & Metrics

### Webhook Context Tracking

Every webhook execution creates a structured context (webhook_server/utils/context.py):
```python
ctx = create_context(
    hook_id="github-delivery-id",
    event_type="pull_request",
    repository="org/repo",
    action="opened",
    sender="username",
    api_user="api-token-user",
)

# Step tracking
ctx.start_step("assign_reviewers", pr_number=123)
ctx.complete_step("assign_reviewers", reviewers_assigned=3)
```
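The `start_step`/`complete_step` API can be approximated with a tiny timing helper. This is a hypothetical stand-in, not the real implementation in `webhook_server/utils/context.py`:

```python
import time


class WebhookContext:
    """Minimal stand-in for the structured-logging context."""

    def __init__(self) -> None:
        self.workflow_steps: dict[str, dict] = {}
        self._starts: dict[str, float] = {}

    def start_step(self, name: str, **fields) -> None:
        self._starts[name] = time.monotonic()
        self.workflow_steps[name] = {"status": "started", **fields}

    def complete_step(self, name: str, **fields) -> None:
        # Record elapsed wall time for the step in milliseconds
        duration_ms = int((time.monotonic() - self._starts.pop(name)) * 1000)
        self.workflow_steps[name].update(
            status="completed", duration_ms=duration_ms, **fields
        )


ctx = WebhookContext()
ctx.start_step("assign_reviewers", pr_number=123)
ctx.complete_step("assign_reviewers", reviewers_assigned=3)
print(ctx.workflow_steps["assign_reviewers"]["status"])  # completed
```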
Logs are written to `{config.data_dir}/logs/webhooks_YYYY-MM-DD.json`:
```json
{
  "hook_id": "github-delivery-id",
  "event_type": "pull_request",
  "pr": {"number": 968, "title": "Add new feature"},
  "timing": {
    "started_at": "2026-01-05T10:30:00.123Z",
    "duration_ms": 7712
  },
  "workflow_steps": {
    "clone_repository": {"status": "completed", "duration_ms": 4823},
    "assign_reviewers": {"status": "completed", "duration_ms": 1234}
  },
  "token_spend": 4,
  "success": true
}
```
## Architecture Diagram
## Fail-Fast Philosophy

The architecture follows a fail-fast philosophy: exceptions propagate immediately and abort webhook processing, rather than masking bugs with fabricated default data.
```python
# ✅ CORRECT - Fail-fast
collaborators = self.github_webhook.repository_data["collaborators"]
if "edges" not in collaborators:
    raise ValueError("Missing collaborators data")

# ❌ WRONG - Hides bugs
collaborators = self.github_webhook.repository_data.get("collaborators", {})
return collaborators.get("edges", [])  # Returns an empty list, hiding missing data
```
Benefits:

- 🐛 **Bugs surface immediately** - No silent failures
- 🔍 **Clear error messages** - The traceback shows the exact issue
- 🚀 **Faster debugging** - The root cause is visible in logs
- ✅ **Type safety enforced** - mypy strict mode catches issues at development time
| Metric | Value | Notes |
| --- | --- | --- |
| Webhook Processing | 2-10 seconds | Depends on handler complexity |
| Repository Clone | 5-30 seconds | Optimized with early exits |
| API Rate Limit | 2-10 calls/webhook | Pre-fetching reduces calls |
| Concurrent Webhooks | 10 workers (default) | Configurable via `max-workers` |
| Memory Usage | ~200 MB per worker | Scales linearly with workers |
- **Webhook Events** - Supported GitHub events and processing flow
- **OWNERS Files** - Approver and reviewer assignment system
- **Configuration** - Schema validation and configuration system
- **API Reference** - Webhook endpoint specifications