Architectural Overview
The X For You Feed Algorithm is built on a microservices architecture with clear separation of concerns. Each component has a specific responsibility and communicates through well-defined interfaces.
The system emphasizes composability, parallelism, and graceful degradation to achieve both performance and reliability.
Component Diagram
┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│ CLIENT APPLICATION │
│ (Mobile/Web App) │
└────────────────────────────────────────┬────────────────────────────────────────────────────┘
│ gRPC Request
│ ScoredPostsQuery
▼
┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│ HOME MIXER │
│ [Rust + gRPC Server] │
│ │
│ ┌──────────────────────────────────────────────────────────────────────────────────────┐ │
│ │ Phoenix Candidate Pipeline │ │
│ │ (Implements CandidatePipeline Trait) │ │
│ └──────────────────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────────┼────────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Sources │ │ Hydrators │ │ Filters │ │
│ ├──────────────┤ ├──────────────┤ ├──────────────┤ │
│ │ Thunder │ │ CoreData │ │ Age │ │
│ │ Phoenix │ │ Gizmoduck │ │ Duplicates │ │
│ │ │ │ Video │ │ Socialgraph │ │
│ │ │ │ Subscription │ │ Keywords │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └───────────────────┴───────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ Scorers │ │
│ ├──────────────────────┤ │
│ │ Phoenix Scorer │ │
│ │ Weighted Scorer │ │
│ │ Diversity Scorer │ │
│ └──────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ Selector │ │
│ │ (Sort by Score) │ │
│ └──────────┬───────────┘ │
└───────────────────────────────────────┼──────────────────────────────────────────────────────┘
│ ScoredPostsResponse
▼
┌─────────────────────┐
│ Ranked Feed │
│ [Top 50 Posts] │
└─────────────────────┘
┌────────────────────────────┐ ┌────────────────────────────┐ ┌────────────────────┐
│ THUNDER │ │ PHOENIX │ │ DATA STORES │
│ [Rust Service] │ │ [Python/JAX ML] │ │ │
│ │ │ │ │ • User Graphs │
│ ┌──────────────────────┐ │ │ ┌──────────────────────┐ │ │ • Post Content │
│ │ In-Memory Stores │ │ │ │ Retrieval Model │ │ │ • Media Assets │
│ │ - Per-user posts │ │ │ │ (Two-Tower) │ │ │ • Engagement DB │
│ │ - Indexed by author │ │ │ └──────────────────────┘ │ │ │
│ └──────────────────────┘ │ │ │ └────────────────────┘
│ ▲ │ │ ┌──────────────────────┐ │
│ │ │ │ │ Ranking Model │ │
│ ┌────────┴─────────────┐ │ │ │ (Grok Transformer) │ │
│ │ Kafka Consumer │ │ │ └──────────────────────┘ │
│ │ - Post events │ │ │ │
│ │ - Delete events │ │ └────────────────────────────┘
│ └──────────────────────┘ │
│ ▲ │
└───────────┼────────────────┘
│
┌────────┴─────────┐
│ KAFKA TOPICS │
│ - post.create │
│ - post.delete │
└──────────────────┘
Core Design Patterns
1. Candidate Pipeline Pattern
The Candidate Pipeline is a composable framework that defines a standard way to build recommendation systems.
Trait Architecture
The framework is built on Rust traits that define behavior contracts:

```rust
// From candidate_pipeline.rs
#[async_trait]
pub trait Source<Q, C>: Send + Sync {
    fn name(&self) -> &str;
    fn enable(&self, query: &Q) -> bool;
    async fn get_candidates(&self, query: &Q) -> Result<Vec<C>>;
}

#[async_trait]
pub trait Hydrator<Q, C>: Send + Sync {
    type H; // per-candidate hydration payload
    fn name(&self) -> &str;
    fn enable(&self, query: &Q) -> bool;
    async fn hydrate(&self, query: &Q, candidates: &[C]) -> Result<Vec<Self::H>>;
    fn update_all(&self, candidates: &mut [C], hydrated: Vec<Self::H>);
}

#[async_trait]
pub trait Filter<Q, C>: Send + Sync {
    fn name(&self) -> &str;
    fn enable(&self, query: &Q) -> bool;
    async fn filter(&self, query: &Q, candidates: Vec<C>) -> Result<FilterResult<C>>;
}

#[async_trait]
pub trait Scorer<Q, C>: Send + Sync {
    type S; // per-candidate score payload
    fn name(&self) -> &str;
    fn enable(&self, query: &Q) -> bool;
    async fn score(&self, query: &Q, candidates: &[C]) -> Result<Vec<Self::S>>;
    fn update_all(&self, candidates: &mut [C], scores: Vec<Self::S>);
}
```
Pipeline Execution
The pipeline orchestrates all components:

```rust
// From candidate_pipeline.rs:53
async fn execute(&self, query: Q) -> PipelineResult<Q, C> {
    // 1. Hydrate query (parallel)
    let hydrated_query = self.hydrate_query(query).await;
    // 2. Fetch candidates (parallel)
    let candidates = self.fetch_candidates(&hydrated_query).await;
    // 3. Hydrate candidates (parallel)
    let hydrated_candidates = self.hydrate(&hydrated_query, candidates).await;
    // 4. Filter (sequential); clone so the retrieved set survives for the result
    let (kept_candidates, filtered_candidates) =
        self.filter(&hydrated_query, hydrated_candidates.clone()).await;
    // 5. Score (sequential)
    let scored_candidates = self.score(&hydrated_query, kept_candidates).await;
    // 6. Select (synchronous)
    let selected_candidates = self.select(&hydrated_query, scored_candidates);
    // 7. Post-selection processing
    let final_candidates = self
        .filter_post_selection(&hydrated_query, selected_candidates)
        .await;
    // 8. Side effects (async, non-blocking); the query is shared via Arc
    let query = Arc::new(hydrated_query);
    self.run_side_effects(Arc::new(SideEffectInput {
        query: Arc::clone(&query),
        selected_candidates: final_candidates.clone(),
    }));
    PipelineResult {
        retrieved_candidates: hydrated_candidates,
        filtered_candidates,
        selected_candidates: final_candidates,
        query,
    }
}
```
Benefits
This pattern provides several key advantages:
- Composability: Add new sources, filters, or scorers without modifying the pipeline logic
- Testability: Each component can be tested independently with mocked dependencies
- Observability: Automatic logging and metrics for each pipeline stage
- Error Isolation: Failures in one component don't crash the entire pipeline
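The same pattern is easy to prototype outside Rust. Below is a minimal Python sketch (not the production code) showing how composable sources and filters plug into one pipeline driver; `StaticSource` and `AgeFilter` are hypothetical stand-ins for real components:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Candidate:
    post_id: int
    score: float = 0.0

class StaticSource:
    """Toy source returning a fixed candidate list (stand-in for Thunder/Phoenix)."""
    def __init__(self, ids):
        self.ids = ids
    async def get_candidates(self, query):
        return [Candidate(i) for i in self.ids]

class AgeFilter:
    """Keeps only candidates above an ID cutoff (stand-in for a recency check)."""
    def __init__(self, min_id):
        self.min_id = min_id
    async def filter(self, query, candidates):
        return [c for c in candidates if c.post_id >= self.min_id]

async def run_pipeline(query, sources, filters):
    # Sources run in parallel; filters run sequentially, mirroring the Rust pipeline.
    results = await asyncio.gather(*(s.get_candidates(query) for s in sources))
    candidates = [c for batch in results for c in batch]
    for f in filters:
        candidates = await f.filter(query, candidates)
    return candidates

candidates = asyncio.run(run_pipeline(
    query=None,
    sources=[StaticSource([1, 2]), StaticSource([3, 4])],
    filters=[AgeFilter(min_id=2)],
))
print([c.post_id for c in candidates])  # [2, 3, 4]
```

Adding a new source or filter is just another list entry; the driver never changes, which is the composability property described above.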
2. Parallelism Strategy
The system maximizes throughput by parallelizing independent operations:
Query Hydrators: Parallel
Multiple hydrators fetch different aspects of user context simultaneously:

```rust
let hydrate_futures = hydrators.iter().map(|h| h.hydrate(&query));
let results = join_all(hydrate_futures).await; // All run concurrently
```
Sources: Parallel
Thunder and Phoenix retrieval run simultaneously:

```rust
let source_futures = sources.iter().map(|s| s.get_candidates(query));
let results = join_all(source_futures).await; // Thunder + Phoenix in parallel
```
Hydrators: Parallel
All candidate hydrators enrich data concurrently:

```rust
let hydrate_futures = hydrators.iter().map(|h| h.hydrate(query, &candidates));
let results = join_all(hydrate_futures).await; // All hydrators in parallel
```
Filters: Sequential
Filters must run in order since each depends on the previous filter’s output
Scorers: Sequential
Scorers run in order since later scorers may modify scores from earlier ones
Why Sequential for Filters/Scorers? Each filter removes candidates, affecting the input to the next filter. Each scorer may adjust scores computed by previous scorers. Dependencies require sequential execution.
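A small Python analogue (using `asyncio.gather` in place of Rust's `join_all`) illustrates the latency difference between the two modes; the stage names and delays are made up:

```python
import asyncio
import time

async def stage(name, delay):
    # Simulate an I/O-bound pipeline stage (RPC, DB lookup, etc.).
    await asyncio.sleep(delay)
    return name

async def main():
    start = time.monotonic()
    # Parallel: all stages start at once, total time ~ max(delays).
    parallel = await asyncio.gather(stage("core", 0.05), stage("video", 0.05))
    parallel_elapsed = time.monotonic() - start

    start = time.monotonic()
    # Sequential: each stage waits on the previous one, total time ~ sum(delays).
    sequential = [await stage("age", 0.05), await stage("dupes", 0.05)]
    sequential_elapsed = time.monotonic() - start
    return parallel, parallel_elapsed, sequential, sequential_elapsed

parallel, pt, sequential, st = asyncio.run(main())
print(f"parallel={pt:.2f}s sequential={st:.2f}s")
```

With two 50ms stages, the parallel path takes roughly 50ms and the sequential path roughly 100ms, which is why the pipeline parallelizes everything whose inputs are independent.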
3. Candidate Isolation in Phoenix
One of the most important architectural decisions is candidate isolation during ranking.
Problem: If candidates could attend to each other during transformer inference, the score for Post A would depend on which other posts happen to be in the batch. This makes scores inconsistent and uncacheable.
Solution: Custom Attention Masking
```python
# From recsys_model.py - attention mask creation
import numpy as np

def create_candidate_isolation_mask(
    batch_size: int,
    history_len: int,
    num_candidates: int,
) -> np.ndarray:
    """
    Creates an attention mask where:
    - User + History can attend to each other (bidirectional)
    - Candidates can attend to User + History and to themselves
    - Candidates CANNOT attend to other candidates
    - User + History do not attend to candidates, so their
      representations are independent of the candidate batch
    """
    seq_len = 1 + history_len + num_candidates  # user + history + candidates
    mask = np.ones((seq_len, seq_len), dtype=np.float32)
    candidate_start = 1 + history_len
    # Block all attention into the candidate block...
    mask[:, candidate_start:] = 0.0
    # ...then re-allow each candidate's self-attention.
    for i in range(num_candidates):
        pos = candidate_start + i
        mask[pos, pos] = 1.0
    return mask
```
Attention Pattern:
Keys (attend TO) →
┌─────┬─────────────┬──────────────────────┐
│ U │ History │ Candidates (C1-C4) │
┌────┼─────┼─────────────┼──────────────────────┤
│ U │ ✓ │ ✓ ✓ ✓ │ ✗ ✗ ✗ ✗ │
Q ├────┼─────┼─────────────┼──────────────────────┤
u │ H1 │ ✓ │ ✓ ✓ ✓ │ ✗ ✗ ✗ ✗ │
e │ H2 │ ✓ │ ✓ ✓ ✓ │ ✗ ✗ ✗ ✗ │
r │ H3 │ ✓ │ ✓ ✓ ✓ │ ✗ ✗ ✗ ✗ │
i ├────┼─────┼─────────────┼──────────────────────┤
e │ C1 │ ✓ │ ✓ ✓ ✓ │ ✓ ✗ ✗ ✗ │ ← Can only self-attend
s │ C2 │ ✓ │ ✓ ✓ ✓ │ ✗ ✓ ✗ ✗ │
↓ │ C3 │ ✓ │ ✓ ✓ ✓ │ ✗ ✗ ✓ ✗ │
│ C4 │ ✓ │ ✓ ✓ ✓ │ ✗ ✗ ✗ ✓ │
└────┴─────┴─────────────┴──────────────────────┘
This isolation means we can pre-compute and cache candidate scores for specific user contexts, dramatically improving inference efficiency.
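A quick numpy check (a sketch, not the production test suite) confirms the properties shown in the diagram: each candidate sees the user, the history, and only itself, while user and history positions never see candidates:

```python
import numpy as np

def isolation_mask(history_len, num_candidates):
    # Rebuild the mask from the section above: 1.0 = attention allowed, 0.0 = blocked.
    seq_len = 1 + history_len + num_candidates
    mask = np.ones((seq_len, seq_len), dtype=np.float32)
    c0 = 1 + history_len
    mask[:, c0:] = 0.0                       # nothing attends to candidates...
    mask[c0:, c0:] = np.eye(num_candidates)  # ...except each candidate to itself
    return mask

mask = isolation_mask(history_len=3, num_candidates=4)
c0 = 1 + 3
# Candidate rows: full attention to user + history, identity within the candidate block.
assert np.all(mask[c0:, :c0] == 1.0)
assert np.array_equal(mask[c0:, c0:], np.eye(4, dtype=np.float32))
# User/history rows never attend to candidates, matching the crossed-out columns above.
assert np.all(mask[:c0, c0:] == 0.0)
print("isolation mask verified")
```

Because no row's attention pattern depends on which other candidates are present, each candidate's score is a pure function of (user context, candidate), which is what makes caching valid.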
4. Hash-Based Embeddings
Both retrieval and ranking models use hash-based embeddings instead of traditional vocabulary lookup tables.
```python
# From recsys_model.py - hash embedding
class HashConfig:
    num_user_hashes: int = 2  # Multiple hash functions per feature
    num_item_hashes: int = 2
    num_author_hashes: int = 2

def embed_with_hashing(feature_id: int, num_hashes: int, emb_size: int) -> np.ndarray:
    """
    Apply multiple hash functions and average the embeddings.
    This reduces collision effects and improves representation quality.
    """
    embeddings = []
    for hash_idx in range(num_hashes):
        # Different hash function per index
        hash_value = hash_function(feature_id, hash_idx) % HASH_TABLE_SIZE
        embedding = embedding_table[hash_value]  # Lookup
        embeddings.append(embedding)
    # Average across hash functions
    return np.mean(embeddings, axis=0)
```
Why Hashing?
- Constant Memory: Embedding table size is fixed regardless of vocabulary size (users, posts, authors)
- No OOV Problem: New users/posts/authors work immediately without retraining
- Collision Mitigation: Multiple hash functions reduce collision impact through averaging
- Sparse Features: Efficiently handles high-cardinality features like user IDs
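A toy numpy version of this scheme (the table size and hash function are illustrative, not the production values) shows the no-OOV property: any integer ID, including one never seen in training, maps straight to an embedding:

```python
import numpy as np

HASH_TABLE_SIZE = 1024  # illustrative; production tables are far larger
EMB_SIZE = 8
rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((HASH_TABLE_SIZE, EMB_SIZE))

def embed(feature_id, num_hashes=2):
    # Salt the ID with the hash index to simulate independent hash functions,
    # then average the looked-up rows to dilute collision noise.
    rows = [hash((feature_id, k)) % HASH_TABLE_SIZE for k in range(num_hashes)]
    return np.mean(embedding_table[rows], axis=0)

# A brand-new ID embeds immediately: no vocabulary entry, no retraining.
vec = embed(999_999_999_999)
print(vec.shape)  # (8,)
```

Two IDs that collide under one hash function almost never collide under both, so the averaged vectors stay distinguishable; that is the collision-mitigation argument above in miniature.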
The Phoenix ranking model uses the Grok transformer architecture adapted for recommendations:
```python
# From recsys_model.py - model configuration
recsys_model = PhoenixModelConfig(
    emb_size=128,
    num_actions=14,  # Multi-action prediction
    history_seq_len=32,
    candidate_seq_len=8,
    hash_config=HashConfig(
        num_user_hashes=2,
        num_item_hashes=2,
        num_author_hashes=2,
    ),
    product_surface_vocab_size=16,
    model=TransformerConfig(
        emb_size=128,
        widening_factor=2,
        key_size=64,
        num_q_heads=2,
        num_kv_heads=2,
        num_layers=2,
        attn_output_multiplier=0.125,
    ),
)
```
Key Adaptations from Grok:
Multi-Action Output Heads
Rather than next-token prediction, the model outputs probabilities for 14 engagement actions:

```python
# Output projection
logits = model(input_embeddings, attention_mask)
# Shape: [batch_size, num_candidates, num_actions]
probabilities = sigmoid(logits)
# Each candidate gets 14 probability scores
```
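Downstream, the Weighted Scorer collapses these per-action probabilities into a single ranking score. A minimal sketch, with entirely hypothetical action indices and weights:

```python
import numpy as np

NUM_ACTIONS = 14

def weighted_score(probs, weights):
    """Collapse per-action probabilities into one ranking score (hypothetical weights)."""
    return float(np.dot(probs, weights))

# Hypothetical weights: positive actions rewarded, negative signals penalized.
weights = np.zeros(NUM_ACTIONS)
weights[0] = 1.0    # e.g. like
weights[1] = 2.0    # e.g. repost
weights[13] = -5.0  # e.g. negative feedback

probs = np.full(NUM_ACTIONS, 0.1)  # uniform 10% probability per action
score = weighted_score(probs, weights)
print(score)  # ~ -0.2: the strong negative weight dominates here
```

The actual action set and weight values are tuning decisions; the point is that a linear combination turns a 14-dimensional prediction into one sortable number.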
Candidate Isolation Masking
Custom attention mask ensures candidates can’t attend to each other (see section above)
Input sequence format: [USER] [H1] [H2] ... [H_n] [C1] [C2] ... [C_m]
Where:
USER = User context embedding
H_i = History items (posts user engaged with)
C_i = Candidate items to rank
Data Flow Diagram
REQUEST FLOW:
1. Client Request
│
├─► [gRPC] ScoredPostsQuery
│ - viewer_id
│ - max_results
│ - filters
│
▼
2. Home Mixer
│
├─► Query Hydration (parallel)
│ ├─► Fetch engagement history from DB
│ ├─► Fetch following list from Graph Service
│ └─► Fetch user preferences
│
├─► Candidate Sourcing (parallel)
│ ├─► Thunder: Get in-network posts
│ │ └─► In-memory lookup by author IDs
│ │
│ └─► Phoenix Retrieval: Get out-of-network posts
│ ├─► Encode user context → embedding
│ ├─► ANN search over candidate embeddings
│ └─► Return top-K similar posts
│
├─► Candidate Hydration (parallel)
│ ├─► CoreData: Post text, media, timestamps
│ ├─► Gizmoduck: Author profiles
│ ├─► Video: Durations and thumbnails
│ └─► Subscription: Access eligibility
│
├─► Pre-Scoring Filters (sequential)
│ ├─► Remove duplicates
│ ├─► Remove old posts
│ ├─► Remove blocked authors
│ └─► Remove seen posts
│
├─► Scoring (sequential)
│ ├─► Phoenix Scorer
│ │ ├─► Batch candidates
│ │ ├─► Call Phoenix ranking model
│ │ └─► Get 14 action probabilities per candidate
│ │
│ ├─► Weighted Scorer
│ │ └─► Combine probabilities into final score
│ │
│ └─► Diversity Scorer
│ └─► Penalize repeated authors
│
├─► Selection
│ └─► Sort by score, take top K
│
├─► Post-Selection Filters (sequential)
│ ├─► Visibility filtering (spam, deleted, etc.)
│ └─► Conversation deduplication
│
└─► Response
└─► [gRPC] ScoredPostsResponse
- ranked post IDs
- scores
- metadata
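The Diversity Scorer step in the flow above can be sketched as a simple per-author decay; the rule and decay factor here are illustrative, not the production formula:

```python
def apply_diversity_penalty(candidates, decay=0.5):
    """Multiply each candidate's score by decay**k, where k is how many
    higher-ranked posts share its author (a sketch, not the production rule)."""
    seen = {}
    out = []
    for post_id, author, score in sorted(candidates, key=lambda c: -c[2]):
        k = seen.get(author, 0)
        out.append((post_id, author, score * decay ** k))
        seen[author] = k + 1
    return out

ranked = apply_diversity_penalty([
    (1, "a", 0.9),
    (2, "a", 0.8),  # second post by "a" -> halved
    (3, "b", 0.7),
])
print(ranked)  # [(1, 'a', 0.9), (2, 'a', 0.4), (3, 'b', 0.7)]
```

After the penalty, the second post from author "a" drops below author "b"'s post, so the final sort naturally interleaves authors.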
Deployment Architecture
Service Topology
┌─────────────────────────────────────────────────────┐
│ LOAD BALANCER │
└────────────────────┬────────────────────────────────┘
│
┌───────────────┼───────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Home │ │ Home │ │ Home │
│ Mixer │ │ Mixer │ │ Mixer │
│ Instance│ │ Instance│ │ Instance│
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
└──────────────┼──────────────┘
│
┌──────────────┼──────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌──────────┐
│ Thunder │ │ Phoenix │ │ Data │
│ Service │ │ ML │ │ Stores │
└─────────┘ └─────────┘ └──────────┘
Scaling Strategy
Horizontal Scaling
- Add more Home Mixer instances behind the load balancer
- Stateless design enables easy scaling
- Each instance handles the full request pipeline independently
Thunder Sharding
- Shard Thunder by user ID ranges
- Each shard maintains posts for a subset of authors
- Home Mixer fans out to relevant shards
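A minimal sketch of the fan-out, assuming a simple modulo shard function (the real shard-assignment scheme may differ):

```python
NUM_SHARDS = 8  # hypothetical shard count

def shard_for(author_id, num_shards=NUM_SHARDS):
    # Stable mapping so every Home Mixer instance routes an author to the same shard.
    return author_id % num_shards

def fan_out(following_ids, num_shards=NUM_SHARDS):
    """Group followed author IDs by shard so one request goes to each relevant shard."""
    plan = {}
    for author_id in following_ids:
        plan.setdefault(shard_for(author_id, num_shards), []).append(author_id)
    return plan

plan = fan_out([3, 8, 11, 16])
print(plan)  # {3: [3, 11], 0: [8, 16]}
```

Only shards that actually hold a followed author receive a request, keeping fan-out proportional to the following list rather than the shard count.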
Phoenix Model Serving
- Deploy Phoenix models with GPU acceleration
- Batch inference requests for efficiency
- Use a model-serving framework (TorchServe, Triton, etc.)
Caching Layer
Cache frequently accessed data:
- User engagement history
- Author profiles
- Phoenix predictions for popular posts
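A minimal TTL cache sketch illustrates the idea; a production deployment would typically use a shared cache such as Redis or Memcached rather than per-process storage:

```python
import time

class TtlCache:
    """Minimal TTL cache sketch for hot data like author profiles (not production code)."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]  # lazily evict stale entries on read
            return None
        return value

    def put(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

cache = TtlCache(ttl_seconds=60)
cache.put("author:42", {"name": "example"})
print(cache.get("author:42"))  # {'name': 'example'}
```

The TTL bounds staleness: cached Phoenix predictions or author profiles can be at most `ttl_seconds` out of date, which is the usual trade-off against recomputing on every request.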
Monitoring
Key metrics to track:
Latency Percentiles
- p50, p95, p99 request latency
- Per-stage latency breakdown
Error Rates
- Overall error rate
- Per-component failure rate
Candidate Metrics
- Candidates retrieved per source
- Candidates filtered at each stage
- Final selection count
Model Performance
- Phoenix inference latency
- Batch sizes
- GPU utilization
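Percentiles are computed over raw per-request latencies; a small numpy example (with made-up latency samples) shows why the tail matters more than the mean:

```python
import numpy as np

# Hypothetical per-request latencies in milliseconds; two slow outliers.
latencies_ms = np.array([12, 15, 14, 200, 13, 16, 15, 14, 500, 13])

p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])
print(f"p50={p50:.0f}ms p95={p95:.0f}ms p99={p99:.0f}ms")
# The mean hides the outliers that p95/p99 expose.
print(f"mean={latencies_ms.mean():.0f}ms")
```

Here the median stays near 15ms while p95/p99 reveal the multi-hundred-millisecond tail, which is exactly what per-stage latency breakdowns are meant to localize.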
Technology Choices
Why Rust for Backend?
- Performance: Zero-cost abstractions and no garbage-collection pauses
- Memory Safety: Compile-time guarantees prevent crashes and data races
- Async Runtime: Tokio provides efficient async I/O for high concurrency
- Type Safety: Strong typing catches bugs at compile time
Why JAX for ML?
- Performance: JIT compilation and the XLA backend for optimized execution
- Functional API: Pure functions enable easy testing and debugging
- Autodiff: Automatic differentiation for training
- Scaling: Easy parallelization across GPUs/TPUs
Why gRPC?
gRPC provides efficient binary serialization (Protocol Buffers), bidirectional streaming, and strong typing through service definitions.
```protobuf
// Service definition
service ScoredPostsService {
  rpc GetScoredPosts(ScoredPostsQuery) returns (ScoredPostsResponse);
}

message ScoredPostsQuery {
  uint64 viewer_id = 1;
  string client_app_id = 2;
  string country_code = 3;
  repeated uint64 seen_ids = 4;
  bool in_network_only = 5;
}

message ScoredPostsResponse {
  repeated ScoredPost scored_posts = 1;
}
```
Configuration Management
The system uses a multi-layer configuration approach:
Default Configuration
Hard-coded defaults in the codebase:

```rust
pub mod params {
    pub const MAX_GRPC_MESSAGE_SIZE: usize = 10 * 1024 * 1024; // 10MB
    pub const DEFAULT_RESULT_SIZE: usize = 50;
    pub const DEFAULT_HISTORY_LENGTH: usize = 32;
}
```
Command-Line Arguments
Override defaults at startup:

```rust
#[derive(Parser, Debug)]
struct Args {
    #[arg(long)]
    grpc_port: u16,
    #[arg(long)]
    metrics_port: u16,
}
```
Dynamic Configuration
Runtime configuration changes through a config service:
- Scoring weights
- Filter thresholds
- Model endpoints
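The layering can be sketched as a simple merge where later layers override earlier ones; the key names here are illustrative, not the real parameter set:

```python
# Hypothetical layered configuration: defaults < CLI args < dynamic config service.
DEFAULTS = {"max_results": 50, "history_length": 32, "phoenix_endpoint": "localhost:9000"}

def resolve_config(defaults, cli_overrides, dynamic_overrides):
    """Later layers win; a None value in a layer means 'not set'."""
    config = dict(defaults)
    for layer in (cli_overrides, dynamic_overrides):
        config.update({k: v for k, v in layer.items() if v is not None})
    return config

config = resolve_config(
    DEFAULTS,
    cli_overrides={"max_results": 100, "history_length": None},
    dynamic_overrides={"phoenix_endpoint": "phoenix.prod:9000"},
)
print(config)
# {'max_results': 100, 'history_length': 32, 'phoenix_endpoint': 'phoenix.prod:9000'}
```

The dynamic layer is the only one that can change without a restart, which is why tunables like scoring weights and filter thresholds live there.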
Next Steps
- Phoenix Deep Dive: Learn about the retrieval and ranking models in detail
- Deployment Guide: Set up the system in your infrastructure
- Customization: Add custom sources, filters, and scorers
- Performance Tuning: Optimize latency and throughput