Skip to main content
Tango is Firedancer’s zero-copy, lock-free IPC framework designed for ultra-low-latency message passing between processing units. It provides the foundational messaging primitives used throughout Firedancer for communication between tiles, processes, and system components.

Overview

Tango implements a sophisticated messaging system based on fragments (variable-size message chunks) with metadata-driven routing and filtering. The design enables:
  • Zero-copy communication: Direct memory access without data copying
  • Lock-free operations: No mutex contention in critical paths
  • Ordered delivery: Strict sequential ordering with gap detection
  • Overrun detection: Automatic detection of lost messages
  • Multi-producer/multi-consumer: Flexible topology support
Tango’s design allows seamless transition between single-process threaded and multi-process distributed deployments without code changes.

Core Concepts

Message Fragments

Messages in Tango are partitioned into fragments:
  • Each fragment carries 0-65,535 bytes of payload
  • Fragments have 64-bit globally unique sequence numbers
  • Multi-fragment messages are supported (unlimited fragments per message)
  • Zero-sized fragments are valid (heartbeat/keepalive)
Fragment metadata:
struct fd_frag_meta {
  ulong  seq;     // Sequence number (globally unique)
  ulong  sig;     // Application-defined signature (filtering)
  uint   chunk;   // Compressed payload location
  ushort sz;      // Fragment size (0-65535 bytes)
  ushort ctl;     // Control bits (SOM/EOM/ERR/origin)
  uint   tsorig;  // Origin timestamp (compressed)
  uint   tspub;   // Publish timestamp (compressed)
};

Control Bits

Fragment boundaries:
  • SOM (Start-of-Message): First fragment of a message
  • EOM (End-of-Message): Last fragment of a message
  • ERR (Error): Entire message is corrupt (discard)
Origin tracking:
  • 13-bit origin ID (0-8191)
  • Identifies message producer
  • Up to FD_FRAG_META_ORIG_MAX (8192) origins

Sequence Numbers

Properties:
// 64-bit sequence numbers
// - Monotonically increasing across all producers
// - Gaps indicate lost fragments (overrun)
// - Wrapping handled correctly (randomized initial values)

int fd_seq_lt(ulong a, ulong b);  // a < b?
int fd_seq_eq(ulong a, ulong b);  // a == b?
long fd_seq_diff(ulong a, ulong b);  // How far ahead is a?
Sequence number math:
ulong next = fd_seq_inc(seq, 1UL);      // Increment
ulong prev = fd_seq_dec(seq, 1UL);      // Decrement
long diff = fd_seq_diff(seq_a, seq_b);  // Distance

Architecture Components

Mcache (mcache/)

Hybrid ring buffer + direct-mapped cache for fragment metadata:Structure:
  • Ring buffer of fd_frag_meta_t entries
  • Power-of-2 size (typically 4K-1M entries)
  • Direct mapping: seq % depth → cache line
  • Lock-free atomic updates (SSE/AVX instructions)
Access patterns:
// Publisher: Write metadata atomically
fd_frag_meta_t * meta = mcache + (seq % depth);
meta->seq = seq;  // Atomic write (first SSE word)

// Consumer: Read metadata atomically
ulong seq = fd_frag_meta_seq_query(meta);
if (fd_seq_gt(seq, last_seq)) {
  // New fragment available
}
Depth sizing:
  • Small queues: 1K-4K entries (low latency)
  • Deep queues: 256K-1M entries (burst absorption)
  • Trade-off: Memory vs. overrun tolerance
Memory footprint:
  • Each entry: 32 bytes (FD_FRAG_META_SZ)
  • Aligned to 32 bytes (FD_FRAG_META_ALIGN)
  • Total: depth * 32 bytes

Dcache (dcache/)

Payload storage with chunk-granular allocation:Chunk system:
#define FD_CHUNK_SZ     (64UL)  // 64-byte chunks
#define FD_CHUNK_LG_SZ  (6)     // log2(64) = 6
Chunk addressing:
// Convert chunk ID to pointer
void * payload = fd_chunk_to_laddr(chunk0, chunk_id);

// Convert pointer to chunk ID
ulong chunk_id = fd_laddr_to_chunk(chunk0, payload);
Properties:
  • 32-bit chunk IDs (4 billion chunks max)
  • Chunk 0 is base address
  • All chunks 64-byte aligned
  • Max data region: 256 GB per dcache
Allocation strategies:
  • Linear allocation (sequential writes)
  • Circular buffer (fixed-size messages)
  • Custom allocators (variable-size)

Fctl (fctl/)

Credit-based flow control for backpressure:Mechanism:
  • Consumer publishes credits (“I can receive N fragments”)
  • Producer consumes credits before sending
  • Prevents overrun of slow consumers
  • Dynamic credit replenishment
Credit types:
  • Fragment credits: Number of fragments consumer can receive
  • Chunk credits: Amount of dcache space available
Usage:
// Producer checks credits
if (fctl_credits_avail >= 1) {
  // Send fragment
  fctl_credits_avail--;
}

// Consumer replenishes credits
fctl_publish_credits(processed_count);

Fseq (fseq/)

Shared sequence number for synchronization:Purpose:
  • Publisher advertises latest sequence number
  • Multiple consumers read the same fseq
  • Enables “pub-sub” patterns
  • Flow control signal publishing
Memory layout:
// Single 64-bit atomic sequence number
// Typically in shared memory (wksp)
// Cache-line aligned to avoid false sharing
Operations:
// Publisher: Update sequence
fd_fseq_update(fseq, new_seq);

// Consumer: Query latest
ulong latest = fd_fseq_query(fseq);

Tcache (tcache/)

Specialized cache for transaction metadata:Features:
  • Transaction-specific metadata storage
  • Deduplication support
  • Tag-based lookup
  • Recent transaction tracking
Use case:
  • Disco dedup tile uses tcache
  • Maps transaction signature → metadata
  • Fast duplicate detection

CNC (cnc/)

Tile lifecycle and health management:Functions:
  • Tile start/stop signaling
  • Heartbeat monitoring
  • State machine coordination
  • Diagnostic queries
State transitions:
BOOT → INIT → RUNNING → HALT → BOOT
         ↓       ↓
      ERROR    ERROR
Heartbeat:
  • Periodic “I’m alive” signal
  • Timeout detection
  • Automatic restart on failure

Tempo (tempo/)

Wallclock and timing utilities:Features:
  • Synchronized wallclock access
  • Timestamp compression/decompression
  • Temporal comparisons
  • Nanosecond precision
Timestamp compression:
// Compress 64-bit timestamp to 32 bits
ulong ts_comp = fd_frag_meta_ts_comp(ts);

// Decompress using reference timestamp
long ts = fd_frag_meta_ts_decomp(ts_comp, ts_ref);
Accuracy:
  • ±2.1 seconds from reference
  • Suitable for latency measurements
  • No clock synchronization required

Message Reassembly

Single-Fragment Messages

Simplest case (most common):
ctl = fd_frag_meta_ctl(origin, /*som*/1, /*eom*/1, /*err*/0);
// SOM and EOM both set → complete message in one fragment

Multi-Fragment Messages

// First fragment
ctl = fd_frag_meta_ctl(origin, /*som*/1, /*eom*/0, /*err*/0);

// Middle fragments
ctl = fd_frag_meta_ctl(origin, /*som*/0, /*eom*/0, /*err*/0);

// Last fragment
ctl = fd_frag_meta_ctl(origin, /*som*/0, /*eom*/1, /*err*/0);
Reassembly logic:
  1. Wait for SOM fragment from origin
  2. Accumulate fragments until EOM
  3. Check for sequence number gaps (overrun)
  4. If ERR bit set, discard entire message

Signature Filtering

Consumers can filter messages without reading payloads:
// Example: Route by protocol type
ulong sig = fd_disco_netmux_sig(hash, port, ip, proto, hdr_sz);

if (fd_disco_netmux_sig_proto(sig) == DST_PROTO_TPU_QUIC) {
  // Handle QUIC packets
} else {
  // Skip without reading dcache
}

Performance Characteristics

Atomic Operations

Tango uses x86 atomic instructions for lock-free updates:
// SSE (16-byte atomic)
__m128i sse0 = fd_frag_meta_seq_sig_query(meta);
// Reads seq and sig atomically (AVX CPUs)

// AVX (32-byte, not guaranteed atomic)
__m256i avx = meta->avx;
// Can hold entire metadata in one register

Memory Ordering

// Compiler fence (not CPU fence)
FD_COMPILER_MFENCE();

// Ensures compiler doesn't reorder loads/stores
// No runtime cost on x86 (TSO memory model)

Benchmarks (typical)

  • Latency: 20-50 ns (mcache read)
  • Throughput: 100M+ fragments/sec (single producer)
  • Scalability: Linear with # of producers
  • Overrun detection: Zero overhead

Constants & Limits

// Fragment metadata
#define FD_FRAG_META_ALIGN     (32UL)
#define FD_FRAG_META_SZ        (32UL)
#define FD_FRAG_META_ORIG_MAX  (8192UL)

// Chunk allocation
#define FD_CHUNK_ALIGN  (64UL)
#define FD_CHUNK_SZ     (64UL)

// Fragment size limits
#define FD_FRAG_SZ_MAX  (65535UL)  // 16-bit size field

Usage Example

// Publish a fragment
fd_frag_meta_t * meta = mcache + (seq % depth);
void * payload = fd_chunk_to_laddr(chunk0, chunk);

// Write payload
memcpy(payload, data, data_sz);

// Publish metadata atomically
meta->seq = seq;
meta->sig = signature;
meta->chunk = chunk;
meta->sz = (ushort)data_sz;
meta->ctl = fd_frag_meta_ctl(origin, som, eom, err);
meta->tsorig = fd_frag_meta_ts_comp(tsorig);
meta->tspub = fd_frag_meta_ts_comp(tspub);

seq++;

Workspace Integration

Tango objects live in Firedancer workspaces (wksp):
// Create mcache in workspace
fd_wksp_t * wksp = fd_wksp_attach("my_wksp");
void * mem = fd_wksp_alloc(wksp, fd_mcache_align(), fd_mcache_footprint(depth));
fd_mcache_t * mcache = fd_mcache_join(fd_mcache_new(mem, depth));
Benefits:
  • Huge page backed (2MB/1GB pages)
  • NUMA-aware allocation
  • Persistent across processes
  • Named objects for discovery
  • Standard UNIX permissions

Control Utility

Tango provides fd_tango_ctl for management and inspection:
# Inspect mcache
fd_tango_ctl mcache query <wksp>:<name>

# Monitor fseq
fd_tango_ctl fseq monitor <wksp>:<name>

# Check CNC health
fd_tango_ctl cnc status <wksp>:<name>

Header Files

#include "tango/fd_tango.h"       // Main header
#include "tango/fd_tango_base.h"  // Base types
#include "tango/mcache/fd_mcache.h"
#include "tango/dcache/fd_dcache.h"
#include "tango/fctl/fd_fctl.h"
#include "tango/fseq/fd_fseq.h"
#include "tango/cnc/fd_cnc.h"
  • Disco - Uses Tango for tile communication
  • Util - Workspace management (wksp)

Build docs developers (and LLMs) love