Architecture Overview
AWX can be deployed in a clustered configuration with multiple control plane nodes working together to handle API requests and execute jobs.
Deployment Types
There are two main deployment types:
Virtual Machines (VM)
Ansible Automation Platform (AAP) can be installed on VMs with traditional OS-level processes
Kubernetes (K8S)
Both AAP and upstream AWX support K8S deployments with containerized services
The upstream AWX project can only be installed via a K8S deployment. Either deployment type supports cluster scaling.
Control Node Components
VM Deployments
Control plane nodes run background services managed by supervisord:
- dispatcher - Job scheduling and task management
- wsbroadcast - WebSocket communication between nodes
- callback receiver - Ansible callback processing
- receptor - Mesh networking (managed under systemd)
- redis - Caching and message broker (managed under systemd)
- uwsgi - WSGI application server
- daphne - ASGI server for WebSockets
- rsyslog - Logging service
Kubernetes Deployments
Background processes are containerized:
- awx-ee: receptor
- awx-web: uwsgi, daphne, wsbroadcast, rsyslog
- awx-task: dispatcher, callback receiver
- redis: redis
Monolithic Design
Each control node is monolithic and contains all necessary components for handling API requests and running jobs.
- Load balancer distributes incoming requests across control nodes
- All control nodes interact with a single, shared PostgreSQL database
- If any of these services fails repeatedly, the entire instance is automatically placed offline for remediation
Scaling the Cluster
AAP Deployments
Add or remove nodes by updating the installer inventory and re-running the setup playbook.
Kubernetes Deployments
Scale control plane pods by changing the replicas field in the AWX custom resource.
Instance Types
Nodes can be configured with different types based on their role:
| Type | AAP Only | Description |
|---|---|---|
| control | No | Control plane node that cannot run jobs |
| hybrid | Yes | Control plane node that can also run jobs |
| execution | No | Not a control node, can only run jobs |
| hop | Yes | Routes traffic from control to execution nodes |
Communication Between Nodes
Connection Matrix
| Node Type | Connection Type | Purpose |
|---|---|---|
| Control node | websockets, receptor | Sending websockets, heartbeat |
| Execution | receptor | Submitting jobs, heartbeat |
| Hop (AAP only) | receptor | Routing traffic to execution nodes |
| Postgres | postgres TCP/IP | Read and write queries, pg notify |
Receptor
Receptor provides an overlay network connecting control, execution, and hop nodes.
How It Works:
- Establishes periodic heartbeats between nodes
- Submits jobs to execution nodes
- Forms a mesh via persistent TCP/IP connections
- Routes traffic through intermediate nodes
Node A is reachable from node C (and vice versa) even without a direct connection. Receptor routes traffic through node B.
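The indirect-routing idea above can be sketched as a breadth-first search over the mesh's peer connections. This is a toy model of the topology only, not receptor's actual routing protocol or API:

```python
from collections import deque

def route(mesh, src, dst):
    """Find a path through the mesh; `mesh` maps each node to the
    peers it holds persistent TCP connections with."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for peer in mesh.get(path[-1], ()):
            if peer not in seen:
                seen.add(peer)
                queue.append(path + [peer])
    return None  # no route; nodes are partitioned

# A and C have no direct link, but both peer with B.
mesh = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(route(mesh, "A", "C"))  # ['A', 'B', 'C']
```

Traffic between A and C transits B, which is exactly how a hop node sits between control and execution nodes.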
WebSocket Backplane
Each control node establishes websocket connections to all other control nodes.
- Stream real-time data to the UI (job events, logs)
- Load balancer determines which control node browsers connect to
- Control nodes broadcast messages to all other nodes
- Ensures users see real-time updates regardless of which node generates them
The websocket backplane is handled by the wsbroadcast service that starts with the application.
PostgreSQL
AWX uses psycopg3 to connect to PostgreSQL:
- Only control nodes need direct database access
- Uses pg_notify for inter-process communication
- Enables the dispatcher system to coordinate parallel processes
- Task manager communicates with the main dispatcher thread via notifications
Node Health Management
Node health is determined by the cluster_node_heartbeat periodic task running on each control node.
Heartbeat Process
Inspect Execution Nodes
- Acquire DB advisory lock (single control node inspects at a time)
- Set last_seen based on Receptor heartbeat
- Gather node info via receptorctl status
- Run execution_node_health_check
- Execute ansible-runner --worker-info to get CPU, memory, version
- Calculate capacity for the instance
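The capacity calculation can be sketched roughly as below. The defaults (4 forks per CPU, 100 MB per fork, 2048 MB reserved) are assumptions based on AWX's documented SYSTEM_TASK_FORKS_CPU/SYSTEM_TASK_FORKS_MEM settings, and the linear blend via capacity_adjustment is a simplification of the real algorithm:

```python
def node_capacity(cpu_count, mem_bytes, capacity_adjustment=1.0,
                  forks_per_cpu=4, mem_per_fork_mb=100, mem_reserve_mb=2048):
    """Estimate instance capacity (in forks) from CPU and memory.

    All parameter defaults are assumptions, not authoritative values.
    """
    cpu_cap = cpu_count * forks_per_cpu
    mem_cap = max(0, (mem_bytes // 2**20 - mem_reserve_mb) // mem_per_fork_mb)
    # capacity_adjustment slides between the two estimates:
    # 0.0 -> memory-based, 1.0 -> CPU-based (assumed blend).
    return int(mem_cap + (cpu_cap - mem_cap) * capacity_adjustment)

# 4 CPUs, 8 GiB RAM:
print(node_capacity(4, 8 * 2**30, capacity_adjustment=1.0))  # 16 (CPU-bound)
print(node_capacity(4, 8 * 2**30, capacity_adjustment=0.0))  # 61 (memory-bound)
```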
Detect Lost Nodes
- Calculate grace period: CLUSTER_NODE_HEARTBEAT_PERIOD * CLUSTER_NODE_MISSED_HEARTBEAT_TOLERANCE
- Mark instances as lost if last_seen exceeds the grace period
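A minimal sketch of the lost-node check, using the setting names from the text (the numeric values here are assumed, not the actual defaults):

```python
from datetime import datetime, timedelta, timezone

# Assumed example values; the real settings live in AWX's configuration.
CLUSTER_NODE_HEARTBEAT_PERIOD = 60            # seconds between heartbeats
CLUSTER_NODE_MISSED_HEARTBEAT_TOLERANCE = 5   # missed beats before "lost"

def is_lost(last_seen, now=None):
    """A node is lost when last_seen is older than the grace period."""
    grace = timedelta(seconds=CLUSTER_NODE_HEARTBEAT_PERIOD
                      * CLUSTER_NODE_MISSED_HEARTBEAT_TOLERANCE)
    now = now or datetime.now(timezone.utc)
    return now - last_seen > grace
```

With these example values the grace period is 300 seconds: a node unseen for longer than that is marked lost.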
Check Local Health
- Determine if current node is lost
- Call get_cpu_count and get_mem_in_bytes from ansible-runner
Version Comparison
- Compare current node’s ansible-runner version with others
- If older, call stop_local_services and shut down
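The comparison step can be sketched as follows. This is an illustration only; a production version comparison should use a proper version parser rather than splitting on dots:

```python
def should_shut_down(local_version, peer_versions):
    """True if this node's ansible-runner version is older than any
    peer's, meaning the node should stop its local services."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return any(parse(local_version) < parse(p) for p in peer_versions)

print(should_shut_down("2.3.1", ["2.4.0", "2.3.1"]))  # True: a peer is newer
print(should_shut_down("2.4.0", ["2.4.0", "2.4.0"]))  # False: all equal
```

This keeps a rolling upgrade safe: outdated nodes remove themselves rather than running jobs with stale tooling.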
Handle Lost Instances
- Reap running, pending, and waiting jobs (mark as failed)
- Delete instance from database
Instance Groups
Instances can be organized into Instance Groups for workload management and resource allocation.
Creating Instance Groups
System Administrators can create Instance Groups.
Associating Instances
Instances can then be added to groups. Instances automatically reconfigure to listen on the group's work queue when added.
Instance Group Policies
Policies determine automatic instance assignment to groups:
Policy Fields
- policy_instance_percentage: Percentage (0-100) of active instances to assign to this group
- policy_instance_minimum: Minimum number of instances to maintain in the group
- policy_instance_list: Fixed list of instance names to always include
Policy Behavior
Percentage + Minimum Work Together: If you have a 50% percentage and a minimum of 2:
- With 6 instances → 3 assigned to the group
- With 2 instances → 2 assigned (meets minimum)
- With 1 instance → 1 assigned (can't meet minimum)
Percentages can also partition a cluster with no overlap:
- 4 instance groups with 25% each
- Instances distributed with no overlap
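The arithmetic in the examples above can be sketched as below. AWX's real policy task also decides *which* instances go where (round-robin distribution); this sketch only reproduces the per-group count:

```python
import math

def instances_for_group(total_active, percentage=0, minimum=0):
    """Instances one policy assigns to a group: the percentage share,
    raised to the minimum, capped at what actually exists."""
    by_percentage = math.ceil(total_active * percentage / 100)
    return min(total_active, max(by_percentage, minimum))

# 50% percentage, minimum of 2 (the worked example above):
print(instances_for_group(6, percentage=50, minimum=2))  # 3
print(instances_for_group(2, percentage=50, minimum=2))  # 2
print(instances_for_group(1, percentage=50, minimum=2))  # 1
```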
Manually Pinning Instances
To exclusively assign an instance to specific groups, exempt it from policy management so that percentage and minimum policies no longer apply, then associate it with the desired groups directly.
Job Runtime Behavior
When a job is submitted:
- It is pushed into the dispatcher queue via Postgres notify/listen
- It is handled by a dispatcher process on a specific AWX node
- If the instance fails during job execution, the work is marked as permanently failed
Instance Group Job Assignment
If the cluster has separate Instance Groups:
- Any instance in the group can receive jobs
- Capacity reduced from all groups an instance belongs to
- Provisioning instances expands work capacity
- De-provisioning removes capacity
Controlling Job Placement
Default Behavior
Jobs are submitted to:
- Default queue: for regular jobs (see DEFAULT_EXECUTION_QUEUE_NAME)
- Control plane queue: for administrative actions like project updates (see DEFAULT_CONTROL_PLANE_QUEUE_NAME)
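A minimal sketch of the two-queue split, using the setting names from the text. The queue name values and the exact set of administrative task types are assumptions for illustration:

```python
# Assumed values; the authoritative ones live in AWX's settings.
DEFAULT_EXECUTION_QUEUE_NAME = "default"
DEFAULT_CONTROL_PLANE_QUEUE_NAME = "controlplane"

def queue_for(task_type):
    """Administrative work stays on the control plane queue;
    everything else goes to the execution queue."""
    admin_tasks = {"project_update", "system_job"}  # illustrative set
    if task_type in admin_tasks:
        return DEFAULT_CONTROL_PLANE_QUEUE_NAME
    return DEFAULT_EXECUTION_QUEUE_NAME

print(queue_for("project_update"))  # controlplane
print(queue_for("job"))             # default
```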
Restricting Job Placement
Instance Groups can be associated with:
- Job Template (highest priority)
- Inventory (medium priority)
- Organization (lowest priority, via Inventory)
If all associated instance groups are at capacity, jobs remain in pending state until capacity frees up.
Preferred Instance Group Order
AWX checks in this order:
- Job Template instance groups
- Inventory instance groups (if template groups at capacity)
- Organization instance groups (if inventory groups at capacity)
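The fallback order above amounts to walking the associated groups in priority order and taking the first with free capacity. A minimal sketch, with a caller-supplied capacity check standing in for AWX's internal capacity accounting:

```python
def pick_instance_group(template_groups, inventory_groups, org_groups,
                        has_capacity):
    """Return the first instance group (in preference order) with free
    capacity, or None, meaning the job stays in pending."""
    for group in [*template_groups, *inventory_groups, *org_groups]:
        if has_capacity(group):
            return group
    return None

# Template group is full, so the job falls through to the inventory group.
full = {"tmpl-ig"}
print(pick_instance_group(["tmpl-ig"], ["inv-ig"], ["org-ig"],
                          lambda g: g not in full))  # inv-ig
```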
Project Synchronization
Project syncs run on the instance that prepares the ansible-runner private data directory.
Sync Behavior:
- Performed by the dispatcher control/launch process
- Updates source tree to correct version immediately before job transmission
- Skipped if correct revision already checked out and no Galaxy/Collections updates needed
- Recorded as a project update with launch_type: sync and job_type: run
- Does not change project status or version (except for "never updated" projects)
- Runs with container isolation, volume mounts to persistent projects folder
Instance Enable/Disable
Instances can be temporarily taken offline by disabling them. When disabled:
- No new jobs assigned to the instance
- Existing jobs finish normally
- Useful for maintenance without terminating running jobs
Status and Monitoring
Cluster Health Endpoint
The /api/v2/ping/ endpoint reports:
- Instance servicing the HTTP request
- Last heartbeat time of all other instances
- Instance Groups and membership
Detailed Views
Instances
/api/v2/instances/ - View instance details and running jobs
Instance Groups
/api/v2/instance_groups/ - View groups and membership
Best Practices
Load Balancer
Configure proper health checks and session affinity for WebSocket connections
Database Performance
Use dedicated PostgreSQL instance with appropriate resources and tuning
Network Reliability
Ensure stable, low-latency connections between cluster nodes
Capacity Planning
Monitor capacity and scale before reaching limits
Backup Strategy
Regular database backups are critical in clustered environments
Version Consistency
Keep all nodes on the same AWX version to prevent automatic shutdowns