Architectural Overview
The X For You Feed Algorithm is built on a microservices architecture with clear separation of concerns. Each component has a specific responsibility and communicates through well-defined interfaces.
The system emphasizes composability, parallelism, and graceful degradation to achieve both performance and reliability.
Component Diagram
┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│ CLIENT APPLICATION │
│ (Mobile/Web App) │
└────────────────────────────────────────┬────────────────────────────────────────────────────┘
│ gRPC Request
│ ScoredPostsQuery
▼
┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│ HOME MIXER │
│ [Rust + gRPC Server] │
│ │
│ ┌──────────────────────────────────────────────────────────────────────────────────────┐ │
│ │ Phoenix Candidate Pipeline │ │
│ │ (Implements CandidatePipeline Trait) │ │
│ └──────────────────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────────┼────────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Sources │ │ Hydrators │ │ Filters │ │
│ ├──────────────┤ ├──────────────┤ ├──────────────┤ │
│ │ Thunder │ │ CoreData │ │ Age │ │
│ │ Phoenix │ │ Gizmoduck │ │ Duplicates │ │
│ │ │ │ Video │ │ Socialgraph │ │
│ │ │ │ Subscription │ │ Keywords │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └───────────────────┴───────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ Scorers │ │
│ ├──────────────────────┤ │
│ │ Phoenix Scorer │ │
│ │ Weighted Scorer │ │
│ │ Diversity Scorer │ │
│ └──────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ Selector │ │
│ │ (Sort by Score) │ │
│ └──────────┬───────────┘ │
└───────────────────────────────────────┼──────────────────────────────────────────────────────┘
│ ScoredPostsResponse
▼
┌─────────────────────┐
│ Ranked Feed │
│ [Top 50 Posts] │
└─────────────────────┘
┌────────────────────────────┐ ┌────────────────────────────┐ ┌────────────────────┐
│ THUNDER │ │ PHOENIX │ │ DATA STORES │
│ [Rust Service] │ │ [Python/JAX ML] │ │ │
│ │ │ │ │ • User Graphs │
│ ┌──────────────────────┐ │ │ ┌──────────────────────┐ │ │ • Post Content │
│ │ In-Memory Stores │ │ │ │ Retrieval Model │ │ │ • Media Assets │
│ │ - Per-user posts │ │ │ │ (Two-Tower) │ │ │ • Engagement DB │
│ │ - Indexed by author │ │ │ └──────────────────────┘ │ │ │
│ └──────────────────────┘ │ │ │ └────────────────────┘
│ ▲ │ │ ┌──────────────────────┐ │
│ │ │ │ │ Ranking Model │ │
│ ┌────────┴─────────────┐ │ │ │ (Grok Transformer) │ │
│ │ Kafka Consumer │ │ │ └──────────────────────┘ │
│ │ - Post events │ │ │ │
│ │ - Delete events │ │ └────────────────────────────┘
│ └──────────────────────┘ │
│ ▲ │
└───────────┼────────────────┘
│
┌────────┴─────────┐
│ KAFKA TOPICS │
│ - post.create │
│ - post.delete │
└──────────────────┘
Core Design Patterns
1. Candidate Pipeline Pattern
The Candidate Pipeline is a composable framework that defines a standard way to build recommendation systems.
Trait Architecture
The framework is built on Rust traits that define behavior contracts:

```rust
// From candidate_pipeline.rs
#[async_trait]
pub trait Source<Q, C>: Send + Sync {
    fn name(&self) -> &str;
    fn enable(&self, query: &Q) -> bool;
    async fn get_candidates(&self, query: &Q) -> Result<Vec<C>>;
}

#[async_trait]
pub trait Hydrator<Q, C>: Send + Sync {
    type H; // per-candidate hydration payload
    fn name(&self) -> &str;
    fn enable(&self, query: &Q) -> bool;
    async fn hydrate(&self, query: &Q, candidates: &[C]) -> Result<Vec<Self::H>>;
    fn update_all(&self, candidates: &mut [C], hydrated: Vec<Self::H>);
}

#[async_trait]
pub trait Filter<Q, C>: Send + Sync {
    fn name(&self) -> &str;
    fn enable(&self, query: &Q) -> bool;
    async fn filter(&self, query: &Q, candidates: Vec<C>) -> Result<FilterResult<C>>;
}

#[async_trait]
pub trait Scorer<Q, C>: Send + Sync {
    type S; // per-candidate score payload
    fn name(&self) -> &str;
    fn enable(&self, query: &Q) -> bool;
    async fn score(&self, query: &Q, candidates: &[C]) -> Result<Vec<Self::S>>;
    fn update_all(&self, candidates: &mut [C], scores: Vec<Self::S>);
}
```
Pipeline Execution
The pipeline orchestrates all components:

```rust
// From candidate_pipeline.rs:53
async fn execute(&self, query: Q) -> PipelineResult<Q, C> {
    // 1. Hydrate query (parallel)
    let hydrated_query = self.hydrate_query(query).await;
    // 2. Fetch candidates (parallel)
    let candidates = self.fetch_candidates(&hydrated_query).await;
    // 3. Hydrate candidates (parallel)
    let hydrated_candidates = self.hydrate(&hydrated_query, candidates).await;
    // 4. Filter (sequential); clone so the retrieved set survives for the result
    let (kept_candidates, filtered_candidates) =
        self.filter(&hydrated_query, hydrated_candidates.clone()).await;
    // 5. Score (sequential)
    let scored_candidates = self.score(&hydrated_query, kept_candidates).await;
    // 6. Select (synchronous)
    let selected_candidates = self.select(&hydrated_query, scored_candidates);
    // 7. Post-selection processing
    let final_candidates = self
        .filter_post_selection(&hydrated_query, selected_candidates)
        .await;
    // 8. Side effects (async, non-blocking); the query is shared via Arc
    let query = Arc::new(hydrated_query);
    self.run_side_effects(Arc::new(SideEffectInput {
        query: Arc::clone(&query),
        selected_candidates: final_candidates.clone(),
    }));
    PipelineResult {
        retrieved_candidates: hydrated_candidates,
        filtered_candidates,
        selected_candidates: final_candidates,
        query,
    }
}
```
Benefits
This pattern provides several key advantages:
- Composability: Add new sources, filters, or scorers without modifying the pipeline logic
- Testability: Each component can be tested independently with mocked dependencies
- Observability: Automatic logging and metrics for each pipeline stage
- Error Isolation: Failures in one component don't crash the entire pipeline
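The same pattern is easy to prototype outside Rust. Below is a minimal Python sketch (not the production code) showing how composable sources and filters plug into one pipeline driver; `StaticSource` and `AgeFilter` are hypothetical stand-ins for real components:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Candidate:
    post_id: int
    score: float = 0.0

class StaticSource:
    """Toy source returning a fixed candidate list (stand-in for Thunder/Phoenix)."""
    def __init__(self, ids):
        self.ids = ids
    async def get_candidates(self, query):
        return [Candidate(i) for i in self.ids]

class AgeFilter:
    """Keeps only candidates above an ID cutoff (stand-in for a recency check)."""
    def __init__(self, min_id):
        self.min_id = min_id
    async def filter(self, query, candidates):
        return [c for c in candidates if c.post_id >= self.min_id]

async def run_pipeline(query, sources, filters):
    # Sources run in parallel; filters run sequentially, mirroring the Rust pipeline.
    results = await asyncio.gather(*(s.get_candidates(query) for s in sources))
    candidates = [c for batch in results for c in batch]
    for f in filters:
        candidates = await f.filter(query, candidates)
    return candidates

candidates = asyncio.run(run_pipeline(
    query=None,
    sources=[StaticSource([1, 2]), StaticSource([3, 4])],
    filters=[AgeFilter(min_id=2)],
))
print([c.post_id for c in candidates])  # [2, 3, 4]
```

Adding a new source or filter is just another list entry; the driver never changes, which is the composability property described above.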
2. Parallelism Strategy
The system maximizes throughput by parallelizing independent operations:
Query Hydrators: Parallel
Multiple hydrators fetch different aspects of user context simultaneously:

```rust
let hydrate_futures = hydrators.iter().map(|h| h.hydrate(&query));
let results = join_all(hydrate_futures).await; // All run concurrently
```
Sources: Parallel
Thunder and Phoenix retrieval run simultaneously:

```rust
let source_futures = sources.iter().map(|s| s.get_candidates(query));
let results = join_all(source_futures).await; // Thunder + Phoenix in parallel
```
Hydrators: Parallel
All candidate hydrators enrich data concurrently:

```rust
let hydrate_futures = hydrators.iter().map(|h| h.hydrate(query, &candidates));
let results = join_all(hydrate_futures).await; // All hydrators in parallel
```
Filters: Sequential
Filters must run in order since each depends on the previous filter’s output
Scorers: Sequential
Scorers run in order since later scorers may modify scores from earlier ones
Why Sequential for Filters/Scorers? Each filter removes candidates, affecting the input to the next filter. Each scorer may adjust scores computed by previous scorers. Dependencies require sequential execution.
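A small Python analogue (using `asyncio.gather` in place of Rust's `join_all`) illustrates the latency difference between the two modes; the stage names and delays are made up:

```python
import asyncio
import time

async def stage(name, delay):
    # Simulate an I/O-bound pipeline stage (RPC, DB lookup, etc.).
    await asyncio.sleep(delay)
    return name

async def main():
    start = time.monotonic()
    # Parallel: all stages start at once, total time ~ max(delays).
    parallel = await asyncio.gather(stage("core", 0.05), stage("video", 0.05))
    parallel_elapsed = time.monotonic() - start

    start = time.monotonic()
    # Sequential: each stage waits on the previous one, total time ~ sum(delays).
    sequential = [await stage("age", 0.05), await stage("dupes", 0.05)]
    sequential_elapsed = time.monotonic() - start
    return parallel, parallel_elapsed, sequential, sequential_elapsed

parallel, pt, sequential, st = asyncio.run(main())
print(f"parallel={pt:.2f}s sequential={st:.2f}s")
```

With two 50ms stages, the parallel path takes roughly 50ms and the sequential path roughly 100ms, which is why the pipeline parallelizes everything whose inputs are independent.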
3. Candidate Isolation in Phoenix
One of the most important architectural decisions is candidate isolation during ranking.
Problem: If candidates could attend to each other during transformer inference, the score for Post A would depend on which other posts happen to be in the batch. This makes scores inconsistent and uncacheable.
Solution: Custom Attention Masking
```python
# From recsys_model.py - attention mask creation
import numpy as np

def create_candidate_isolation_mask(
    batch_size: int,
    history_len: int,
    num_candidates: int,
) -> np.ndarray:
    """
    Creates an attention mask where:
    - User + History can attend to each other (bidirectional)
    - Candidates can attend to User + History and to themselves
    - Candidates CANNOT attend to other candidates
    - User + History do not attend to candidates, so their
      representations are independent of the candidate batch
    """
    seq_len = 1 + history_len + num_candidates  # user + history + candidates
    mask = np.ones((seq_len, seq_len), dtype=np.float32)
    candidate_start = 1 + history_len
    # Block all attention into the candidate block...
    mask[:, candidate_start:] = 0.0
    # ...then re-allow each candidate's self-attention.
    for i in range(num_candidates):
        pos = candidate_start + i
        mask[pos, pos] = 1.0
    return mask
```
Attention Pattern:
Keys (attend TO) →
┌─────┬─────────────┬──────────────────────┐
│ U │ History │ Candidates (C1-C4) │
┌────┼─────┼─────────────┼──────────────────────┤
│ U │ ✓ │ ✓ ✓ ✓ │ ✗ ✗ ✗ ✗ │
Q ├────┼─────┼─────────────┼──────────────────────┤
u │ H1 │ ✓ │ ✓ ✓ ✓ │ ✗ ✗ ✗ ✗ │
e │ H2 │ ✓ │ ✓ ✓ ✓ │ ✗ ✗ ✗ ✗ │
r │ H3 │ ✓ │ ✓ ✓ ✓ │ ✗ ✗ ✗ ✗ │
i ├────┼─────┼─────────────┼──────────────────────┤
e │ C1 │ ✓ │ ✓ ✓ ✓ │ ✓ ✗ ✗ ✗ │ ← Can only self-attend
s │ C2 │ ✓ │ ✓ ✓ ✓ │ ✗ ✓ ✗ ✗ │
↓ │ C3 │ ✓ │ ✓ ✓ ✓ │ ✗ ✗ ✓ ✗ │
│ C4 │ ✓ │ ✓ ✓ ✓ │ ✗ ✗ ✗ ✓ │
└────┴─────┴─────────────┴──────────────────────┘
This isolation means we can pre-compute and cache candidate scores for specific user contexts, dramatically improving inference efficiency.
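A quick numpy check (a sketch, not the production test suite) confirms the properties shown in the diagram: each candidate sees the user, the history, and only itself, while user and history positions never see candidates:

```python
import numpy as np

def isolation_mask(history_len, num_candidates):
    # Rebuild the mask from the section above: 1.0 = attention allowed, 0.0 = blocked.
    seq_len = 1 + history_len + num_candidates
    mask = np.ones((seq_len, seq_len), dtype=np.float32)
    c0 = 1 + history_len
    mask[:, c0:] = 0.0                       # nothing attends to candidates...
    mask[c0:, c0:] = np.eye(num_candidates)  # ...except each candidate to itself
    return mask

mask = isolation_mask(history_len=3, num_candidates=4)
c0 = 1 + 3
# Candidate rows: full attention to user + history, identity within the candidate block.
assert np.all(mask[c0:, :c0] == 1.0)
assert np.array_equal(mask[c0:, c0:], np.eye(4, dtype=np.float32))
# User/history rows never attend to candidates, matching the crossed-out columns above.
assert np.all(mask[:c0, c0:] == 0.0)
print("isolation mask verified")
```

Because no row's attention pattern depends on which other candidates are present, each candidate's score is a pure function of (user context, candidate), which is what makes caching valid.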
4. Hash-Based Embeddings
Both retrieval and ranking models use hash-based embeddings instead of traditional vocabulary lookup tables.
```python
# From recsys_model.py - hash embedding
class HashConfig:
    num_user_hashes: int = 2  # Multiple hash functions per feature
    num_item_hashes: int = 2
    num_author_hashes: int = 2

def embed_with_hashing(feature_id: int, num_hashes: int, emb_size: int) -> np.ndarray:
    """
    Apply multiple hash functions and average the embeddings.
    This reduces collision effects and improves representation quality.
    """
    embeddings = []
    for hash_idx in range(num_hashes):
        # Different hash function per index
        hash_value = hash_function(feature_id, hash_idx) % HASH_TABLE_SIZE
        embedding = embedding_table[hash_value]  # Lookup
        embeddings.append(embedding)
    # Average across hash functions
    return np.mean(embeddings, axis=0)
```
Why Hashing?
- Constant Memory: Embedding table size is fixed regardless of vocabulary size (users, posts, authors)
- No OOV Problem: New users/posts/authors work immediately without retraining
- Collision Mitigation: Multiple hash functions reduce collision impact through averaging
- Sparse Features: Efficiently handles high-cardinality features like user IDs
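A toy numpy version of this scheme (the table size and hash function are illustrative, not the production values) shows the no-OOV property: any integer ID, including one never seen in training, maps straight to an embedding:

```python
import numpy as np

HASH_TABLE_SIZE = 1024  # illustrative; production tables are far larger
EMB_SIZE = 8
rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((HASH_TABLE_SIZE, EMB_SIZE))

def embed(feature_id, num_hashes=2):
    # Salt the ID with the hash index to simulate independent hash functions,
    # then average the looked-up rows to dilute collision noise.
    rows = [hash((feature_id, k)) % HASH_TABLE_SIZE for k in range(num_hashes)]
    return np.mean(embedding_table[rows], axis=0)

# A brand-new ID embeds immediately: no vocabulary entry, no retraining.
vec = embed(999_999_999_999)
print(vec.shape)  # (8,)
```

Two IDs that collide under one hash function almost never collide under both, so the averaged vectors stay distinguishable; that is the collision-mitigation argument above in miniature.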
The Phoenix ranking model uses the Grok transformer architecture adapted for recommendations:
```python
# From recsys_model.py - model configuration
recsys_model = PhoenixModelConfig(
    emb_size=128,
    num_actions=14,  # Multi-action prediction
    history_seq_len=32,
    candidate_seq_len=8,
    hash_config=HashConfig(
        num_user_hashes=2,
        num_item_hashes=2,
        num_author_hashes=2,
    ),
    product_surface_vocab_size=16,
    model=TransformerConfig(
        emb_size=128,
        widening_factor=2,
        key_size=64,
        num_q_heads=2,
        num_kv_heads=2,
        num_layers=2,
        attn_output_multiplier=0.125,
    ),
)
```
Key Adaptations from Grok:
Multi-Action Output Heads
Rather than next-token prediction, the model outputs probabilities for 14 engagement actions:

```python
# Output projection
logits = model(input_embeddings, attention_mask)
# Shape: [batch_size, num_candidates, num_actions]
probabilities = sigmoid(logits)
# Each candidate gets 14 probability scores
```
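Downstream, the Weighted Scorer collapses these per-action probabilities into a single ranking score. A minimal sketch, with entirely hypothetical action indices and weights:

```python
import numpy as np

NUM_ACTIONS = 14

def weighted_score(probs, weights):
    """Collapse per-action probabilities into one ranking score (hypothetical weights)."""
    return float(np.dot(probs, weights))

# Hypothetical weights: positive actions rewarded, negative signals penalized.
weights = np.zeros(NUM_ACTIONS)
weights[0] = 1.0    # e.g. like
weights[1] = 2.0    # e.g. repost
weights[13] = -5.0  # e.g. negative feedback

probs = np.full(NUM_ACTIONS, 0.1)  # uniform 10% probability per action
score = weighted_score(probs, weights)
print(score)  # ~ -0.2: the strong negative weight dominates here
```

The actual action set and weight values are tuning decisions; the point is that a linear combination turns a 14-dimensional prediction into one sortable number.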
Candidate Isolation Masking
Custom attention mask ensures candidates can’t attend to each other (see section above)
Input sequence format: [USER] [H1] [H2] ... [H_n] [C1] [C2] ... [C_m]
Where:
USER = User context embedding
H_i = History items (posts user engaged with)
C_i = Candidate items to rank
Data Flow Diagram
REQUEST FLOW:
1. Client Request
│
├─► [gRPC] ScoredPostsQuery
│ - viewer_id
│ - max_results
│ - filters
│
▼
2. Home Mixer
│
├─► Query Hydration (parallel)
│ ├─► Fetch engagement history from DB
│ ├─► Fetch following list from Graph Service
│ └─► Fetch user preferences
│
├─► Candidate Sourcing (parallel)
│ ├─► Thunder: Get in-network posts
│ │ └─► In-memory lookup by author IDs
│ │
│ └─► Phoenix Retrieval: Get out-of-network posts
│ ├─► Encode user context → embedding
│ ├─► ANN search over candidate embeddings
│ └─► Return top-K similar posts
│
├─► Candidate Hydration (parallel)
│ ├─► CoreData: Post text, media, timestamps
│ ├─► Gizmoduck: Author profiles
│ ├─► Video: Durations and thumbnails
│ └─► Subscription: Access eligibility
│
├─► Pre-Scoring Filters (sequential)
│ ├─► Remove duplicates
│ ├─► Remove old posts
│ ├─► Remove blocked authors
│ └─► Remove seen posts
│
├─► Scoring (sequential)
│ ├─► Phoenix Scorer
│ │ ├─► Batch candidates
│ │ ├─► Call Phoenix ranking model
│ │ └─► Get 14 action probabilities per candidate
│ │
│ ├─► Weighted Scorer
│ │ └─► Combine probabilities into final score
│ │
│ └─► Diversity Scorer
│ └─► Penalize repeated authors
│
├─► Selection
│ └─► Sort by score, take top K
│
├─► Post-Selection Filters (sequential)
│ ├─► Visibility filtering (spam, deleted, etc.)
│ └─► Conversation deduplication
│
└─► Response
└─► [gRPC] ScoredPostsResponse
- ranked post IDs
- scores
- metadata
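The Diversity Scorer step in the flow above can be sketched as a simple per-author decay; the rule and decay factor here are illustrative, not the production formula:

```python
def apply_diversity_penalty(candidates, decay=0.5):
    """Multiply each candidate's score by decay**k, where k is how many
    higher-ranked posts share its author (a sketch, not the production rule)."""
    seen = {}
    out = []
    for post_id, author, score in sorted(candidates, key=lambda c: -c[2]):
        k = seen.get(author, 0)
        out.append((post_id, author, score * decay ** k))
        seen[author] = k + 1
    return out

ranked = apply_diversity_penalty([
    (1, "a", 0.9),
    (2, "a", 0.8),  # second post by "a" -> halved
    (3, "b", 0.7),
])
print(ranked)  # [(1, 'a', 0.9), (2, 'a', 0.4), (3, 'b', 0.7)]
```

After the penalty, the second post from author "a" drops below author "b"'s post, so the final sort naturally interleaves authors.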
Deployment Architecture
Service Topology
┌─────────────────────────────────────────────────────┐
│ LOAD BALANCER │
└────────────────────┬────────────────────────────────┘
│
┌───────────────┼───────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Home │ │ Home │ │ Home │
│ Mixer │ │ Mixer │ │ Mixer │
│ Instance│ │ Instance│ │ Instance│
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
└──────────────┼──────────────┘
│
┌──────────────┼──────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌──────────┐
│ Thunder │ │ Phoenix │ │ Data │
│ Service │ │ ML │ │ Stores │
└─────────┘ └─────────┘ └──────────┘
Scaling Strategy
Horizontal Scaling
- Add more Home Mixer instances behind the load balancer
- Stateless design enables easy scaling
- Each instance handles the full request pipeline independently
Thunder Sharding
- Shard Thunder by user ID ranges
- Each shard maintains posts for a subset of authors
- Home Mixer fans out to relevant shards
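A minimal sketch of the fan-out, assuming a simple modulo shard function (the real shard-assignment scheme may differ):

```python
NUM_SHARDS = 8  # hypothetical shard count

def shard_for(author_id, num_shards=NUM_SHARDS):
    # Stable mapping so every Home Mixer instance routes an author to the same shard.
    return author_id % num_shards

def fan_out(following_ids, num_shards=NUM_SHARDS):
    """Group followed author IDs by shard so one request goes to each relevant shard."""
    plan = {}
    for author_id in following_ids:
        plan.setdefault(shard_for(author_id, num_shards), []).append(author_id)
    return plan

plan = fan_out([3, 8, 11, 16])
print(plan)  # {3: [3, 11], 0: [8, 16]}
```

Only shards that actually hold a followed author receive a request, keeping fan-out proportional to the following list rather than the shard count.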
Phoenix Model Serving
- Deploy Phoenix models with GPU acceleration
- Batch inference requests for efficiency
- Use a model-serving framework (TorchServe, Triton, etc.)
Caching Layer
Cache frequently accessed data:
- User engagement history
- Author profiles
- Phoenix predictions for popular posts
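A minimal TTL cache sketch illustrates the idea; a production deployment would typically use a shared cache such as Redis or Memcached rather than per-process storage:

```python
import time

class TtlCache:
    """Minimal TTL cache sketch for hot data like author profiles (not production code)."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]  # lazily evict stale entries on read
            return None
        return value

    def put(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

cache = TtlCache(ttl_seconds=60)
cache.put("author:42", {"name": "example"})
print(cache.get("author:42"))  # {'name': 'example'}
```

The TTL bounds staleness: cached Phoenix predictions or author profiles can be at most `ttl_seconds` out of date, which is the usual trade-off against recomputing on every request.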
Monitoring
Key metrics to track:
Latency Percentiles
- p50, p95, p99 request latency
- Per-stage latency breakdown
Error Rates
- Overall error rate
- Per-component failure rate
Candidate Metrics
- Candidates retrieved per source
- Candidates filtered at each stage
- Final selection count
Model Performance
- Phoenix inference latency
- Batch sizes
- GPU utilization
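Percentiles are computed over raw per-request latencies; a small numpy example (with made-up latency samples) shows why the tail matters more than the mean:

```python
import numpy as np

# Hypothetical per-request latencies in milliseconds; two slow outliers.
latencies_ms = np.array([12, 15, 14, 200, 13, 16, 15, 14, 500, 13])

p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])
print(f"p50={p50:.0f}ms p95={p95:.0f}ms p99={p99:.0f}ms")
# The mean hides the outliers that p95/p99 expose.
print(f"mean={latencies_ms.mean():.0f}ms")
```

Here the median stays near 15ms while p95/p99 reveal the multi-hundred-millisecond tail, which is exactly what per-stage latency breakdowns are meant to localize.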
Technology Choices
Why Rust for Backend?
- Performance: Zero-cost abstractions and no garbage-collection pauses
- Memory Safety: Compile-time guarantees prevent crashes and data races
- Async Runtime: Tokio provides efficient async I/O for high concurrency
- Type Safety: Strong typing catches bugs at compile time
Why JAX for ML?
- Performance: JIT compilation and the XLA backend for optimized execution
- Functional API: Pure functions enable easy testing and debugging
- Autodiff: Automatic differentiation for training
- Scaling: Easy parallelization across GPUs/TPUs
Why gRPC?
gRPC provides efficient binary serialization (Protocol Buffers), bidirectional streaming, and strong typing through service definitions.
```protobuf
// Service definition
service ScoredPostsService {
  rpc GetScoredPosts(ScoredPostsQuery) returns (ScoredPostsResponse);
}

message ScoredPostsQuery {
  uint64 viewer_id = 1;
  string client_app_id = 2;
  string country_code = 3;
  repeated uint64 seen_ids = 4;
  bool in_network_only = 5;
}

message ScoredPostsResponse {
  repeated ScoredPost scored_posts = 1;
}
```
Configuration Management
The system uses a multi-layer configuration approach:
Default Configuration
Hard-coded defaults in the codebase:

```rust
pub mod params {
    pub const MAX_GRPC_MESSAGE_SIZE: usize = 10 * 1024 * 1024; // 10MB
    pub const DEFAULT_RESULT_SIZE: usize = 50;
    pub const DEFAULT_HISTORY_LENGTH: usize = 32;
}
```
Command-Line Arguments
Override defaults at startup:

```rust
#[derive(Parser, Debug)]
struct Args {
    #[arg(long)]
    grpc_port: u16,
    #[arg(long)]
    metrics_port: u16,
}
```
Dynamic Configuration
Runtime configuration changes through a config service:
- Scoring weights
- Filter thresholds
- Model endpoints
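The layering can be sketched as a simple merge where later layers override earlier ones; the key names here are illustrative, not the real parameter set:

```python
# Hypothetical layered configuration: defaults < CLI args < dynamic config service.
DEFAULTS = {"max_results": 50, "history_length": 32, "phoenix_endpoint": "localhost:9000"}

def resolve_config(defaults, cli_overrides, dynamic_overrides):
    """Later layers win; a None value in a layer means 'not set'."""
    config = dict(defaults)
    for layer in (cli_overrides, dynamic_overrides):
        config.update({k: v for k, v in layer.items() if v is not None})
    return config

config = resolve_config(
    DEFAULTS,
    cli_overrides={"max_results": 100, "history_length": None},
    dynamic_overrides={"phoenix_endpoint": "phoenix.prod:9000"},
)
print(config)
# {'max_results': 100, 'history_length': 32, 'phoenix_endpoint': 'phoenix.prod:9000'}
```

The dynamic layer is the only one that can change without a restart, which is why tunables like scoring weights and filter thresholds live there.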
Next Steps
- Phoenix Deep Dive: Learn about the retrieval and ranking models in detail
- Deployment Guide: Set up the system in your infrastructure
- Customization: Add custom sources, filters, and scorers
- Performance Tuning: Optimize latency and throughput