What is AF_XDP?
AF_XDP is a Linux API providing kernel-bypass networking through shared memory ring buffers accessible from userspace. Key characteristics:
- Redirects packets from/to shared memory buffers
- Hardware-agnostic (unlike DPDK)
- Can share NIC with Linux networking stack
- Allows deployment in existing, heterogeneous networks
- Shared memory region called “UMEM”
XDP (eXpress Data Path)
XDP is a framework for installing hooks (eBPF programs) at an early stage of packet processing:
- Runs before tc and netfilter
- eBPF is JIT-compiled bytecode
- Some hardware/driver combinations offload eBPF to NICs
- Not to be confused with Solana BPF (sBPF)
Packet Flow Architecture
RX Block Diagram
XDP_FLAGS_DRV_MODE mode:
- NIC receives packet via hardware
- Driver passes packet to XDP facility
- XDP eBPF program inspects packet:
  - XDP_PASS: Continue to Linux stack (sk_buff, ip_rcv, tc, netfilter)
  - XDP_REDIRECT: Copy to XSK UMEM, allocate RX queue entry
- fd_aio backend provides async I/O interface
The XDP_FLAGS_SKB_MODE fallback uses sk_buff-based memory management (still skips most of the generic path) but is more widely available.
TX Block Diagram
- fd_aio: Userspace delivers packets to XSK/UMEM buffers
- Kernel: Forwards packets to NIC driver
- Driver: NIC transmits packets
The application is responsible for maintaining a routing table to resolve layer-3 destination addresses to NICs and layer-2 addresses. Netfilter (iptables, nftables) is not available in the XDP path.
Memory Management
UMEM Overview
The UMEM (User Memory) area:
- Allocated from userspace (recommended: huge pages via fd_util shmem/wksp APIs)
- Divided into equally sized frames
- Each frame owned by either userspace or kernel at any time
- All frames initially owned by userspace
Ring Buffers
Ownership changes and packet events are transmitted via four rings (FILL, RX, TX, and COMPLETION) allocated by the kernel and mmap’d into userspace:
XDP Ring Structure
Ring Descriptor
The kernel-provided descriptor ring memory layout is unstable. Field offsets must be queried using getsockopt(SOL_XDP, XDP_MMAP_OFFSETS) on each join.
Ring Operations
Rings are synchronized via incrementing sequence numbers that wrap at 2^64. To reduce shared-memory traffic, the ring code:
- Caches producer/consumer sequence numbers
- Only refreshes from shared memory when potentially empty/full
- Reduces expensive memory fence operations
XSK Socket Structure
XSK Parameters
Important Flags
XDP_FLAGS_DRV_MODE:
- Driver mode (fast path)
- XDP support in NIC driver before sk_buff allocation
- Best performance
- Hardware/driver dependent
XDP_FLAGS_SKB_MODE:
- Socket buffer mode (fallback)
- Uses Linux sk_buff for memory management
- Works everywhere
- Slower than driver mode
XDP_ZEROCOPY:
- Enables zero-copy I/O
- UMEM directly accessible to NIC via DMA
- Requires driver mode
- Eliminates software packet copies
Zero-Copy I/O
With XDP_FLAGS_DRV_MODE and XDP_ZEROCOPY:
- PCIe device initiates DMA write to DRAM
- Device driver signals new packet arrival
- Kernel XDP passes pointer through to net tile
- Net tile identifies recipient app tile
- Net tile passes pointer via mcache to app tile
- App tile reads packet data directly from UMEM
Firedancer did not initially have zero-copy RX. Adding it required changing fewer than 500 lines of code in the net tile, with no changes to app tile code. This demonstrates the power of the Tango message queue design.
IOMMU Protection
The UMEM region is shared with hardware. Access permissions:
- Firedancer app tiles: read-only
- Firedancer net tiles: read-write
- Linux kernel: read-write
- PCIe network devices: read-write (via IOMMU)
The IOMMU:
- Provides memory protection for DMA
- Prevents NIC from accessing arbitrary memory
- Allows zero-copy while maintaining security
Supported Drivers
Caveats when choosing a NIC driver for Firedancer:
- Driver mode support varies by driver
- Zero-copy support varies by driver
- Performance characteristics differ
Results may vary across drivers. Popular, well-tested drivers generally have better XDP support and fewer bugs.
Netlink Integration
Firedancer’s AF_XDP stack integrates with Linux networking via netlink:
Configuration Sources
- Interface table: Network interface information
- Route tables: IPv4 routes from the local and main tables
- Neighbor tables: ARP entries for XDP-enabled Ethernet interfaces
Netbase Shared Memory
The netlink tile maintains a read-only cache (“netbase” workspace) containing:
- Interface configurations
- Routing information
- Neighbor (ARP) entries
ARP Handling
Neighbor discovery (ARP on IPv4) works as follows:
- App tile needs to send to an IP address
- Net tile checks neighbor table for MAC address
- If not found, net tile notifies netlink tile
- Netlink tile requests neighbor solicitation via netlink
- Kernel broadcasts ARP request
- ARP response received, kernel updates neighbor table
- Netlink tile updates netbase shared memory
- Net tile can now send packet
The netlink tile deduplicates neighbor solicitation requests to prevent flooding from line-rate traffic.
Security Considerations
Required Capabilities
AF_XDP requires:
- CAP_SYS_ADMIN: System administration
- CAP_NET_RAW: Raw network access
Isolation
The netlink tile isolates itself from untrusted inputs:- Separate process with own sandbox
- Only net tiles can communicate with it
- Net tiles have read-only access to netbase
- Malicious netlink tile can compromise net tiles, but not vice versa
XDP Program
Firedancer installs an XDP program on the interface that:- Redirects Firedancer traffic to AF_XDP sockets
- Passes other traffic to Linux stack (XDP_PASS)
- Unloaded when Firedancer exits
Limitations
Monitoring
Standard capture tools such as tcpdump cannot see traffic redirected to AF_XDP sockets, since it bypasses the kernel capture path. Use Firedancer’s built-in packet capture facilities instead.
Interface Sharing
- Firedancer cannot share an interface with other AF_XDP apps
- May cause performance impact to other apps using Linux networking on shared interfaces
- Only one external network interface supported (plus loopback)
IPv6
Firedancer does not support IPv6:- Practically all Solana traffic uses IPv4 (as of Feb 2025)
- IPv6 would be more expensive (lower MTU, mandatory UDP checksums, longer addresses)
- Adding IPv6 support would be straightforward if needed
Performance Characteristics
Target Performance
- RX throughput: ~20 million packets per second
- Kernel wake-ups: ~20,000 times per second
- Busy polling: Net tile never sleeps
Optimizations
- Preferred busy polling: Reduces latency by polling more aggressively
- Zero-copy I/O: Eliminates software copies via DMA
- Batching: RX and TX processed in batches
- Cache optimization: Critical structures aligned to cache lines
Future Improvements
- SO_PREFER_BUSY_POLL not yet used
- IRQ affinity not yet configured
- NIC interrupts not yet disabled
- Could reduce RX mcache scaling from O(n*m) to O(max(n,m))
- tx_free ring probably obsolete (could move COMPLETION→TX directly)