Pipeline Stages

Overview

The For You feed algorithm processes every request through a sequence of well-defined stages. Each stage transforms the data, enriching candidates with additional information, filtering out ineligible content, and ultimately producing a ranked list of posts.

Stage Flow

The pipeline executes the following stages in order:

┌─────────────────────────┐
│   1. Query Hydration    │  Fetch user context and engagement history
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│   2. Candidate Sourcing │  Retrieve candidates from Thunder and Phoenix
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  3. Candidate Hydration │  Enrich candidates with metadata
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  4. Pre-Scoring Filters │  Remove ineligible candidates
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│      5. Scoring         │  Predict engagement and compute scores
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│      6. Selection       │  Sort by score and select top K
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 7. Post-Selection       │  Final validation and visibility filtering
│    Filtering            │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│   Ranked Feed Response  │
└─────────────────────────┘

1. Query Hydration

Purpose: Load user context required for personalized recommendations.

What Gets Hydrated

User Action Sequence: Recent engagement history (likes, reposts, replies, clicks, etc.)
User Features: Following list, blocked/muted accounts, muted keywords, preferences
Bloom Filters: Compact probabilistic data structure for testing whether a post was previously seen (false positives are possible, false negatives are not)
Seen IDs: Posts the user has explicitly seen in recent sessions

This stage runs before candidate sourcing because both Thunder and Phoenix need user context to retrieve relevant candidates.
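
As a rough illustration of how a Bloom filter can answer "has this user probably seen this post?" in fixed space, here is a minimal sketch. The `SeenFilter` type, its sizing parameters, and the double-hashing scheme are assumptions made for illustration; the source does not describe the layout or hash functions of the production filter.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical Bloom filter over post IDs (illustrative only).
pub struct SeenFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl SeenFilter {
    pub fn new(num_bits: usize, num_hashes: u64) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        Self { bits: vec![false; num_bits], num_hashes }
    }

    /// Derive the i-th bit index via double hashing: h1 + i * h2.
    fn index(&self, post_id: u64, i: u64) -> usize {
        let mut h1 = DefaultHasher::new();
        post_id.hash(&mut h1);
        let a = h1.finish();

        let mut h2 = DefaultHasher::new();
        (post_id, 0x9E37_79B9_u64).hash(&mut h2);
        let b = h2.finish() | 1; // force a non-zero step

        (a.wrapping_add(i.wrapping_mul(b)) % (self.bits.len() as u64)) as usize
    }

    /// Record a post ID as served/seen.
    pub fn insert(&mut self, post_id: u64) {
        for i in 0..self.num_hashes {
            let idx = self.index(post_id, i);
            self.bits[idx] = true;
        }
    }

    /// `false` means definitely not seen; `true` means probably seen.
    pub fn probably_seen(&self, post_id: u64) -> bool {
        (0..self.num_hashes).all(|i| self.bits[self.index(post_id, i)])
    }
}
```

Because a `true` answer can be a false positive, a filter like this may occasionally hide an unseen post, but it never reports a seen post as unseen.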

2. Candidate Sourcing

Candidates are retrieved from two parallel sources:

Thunder (In-Network)

Retrieves recent posts from accounts the user follows:

Original posts
Replies
Reposts
Video posts

Thunder maintains an in-memory store for sub-millisecond lookups.

Phoenix Retrieval (Out-of-Network)

Uses a two-tower ML model to find relevant posts from the global corpus:

User Tower: Encodes user engagement history into an embedding
Candidate Tower: Encodes all posts into embeddings
Similarity Search: Retrieves top-K posts via dot product similarity

See Phoenix Retrieval for detailed architecture.
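
The similarity search step can be sketched as a brute-force scan, assuming embeddings are plain `f32` vectors of equal length. The function names and the linear scan are illustrative assumptions; a system searching a global corpus would presumably use an index rather than scoring every post, and the source does not say which.

```rust
/// Dot product of two equal-length embeddings.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Score every candidate embedding against the user embedding and
/// return the `k` highest-scoring (post_id, similarity) pairs.
pub fn top_k_by_dot(
    user: &[f32],
    candidates: &[(u64, Vec<f32>)],
    k: usize,
) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = candidates
        .iter()
        .map(|(id, emb)| (*id, dot(user, emb)))
        .collect();
    // Descending by similarity; `total_cmp` gives a total order over floats.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```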

3. Candidate Hydration

Purpose: Enrich candidates with additional metadata needed for filtering and scoring.

Hydrated Data

Core Post Data: Text, media URLs, timestamps
Author Information: Username, display name, verification status
Video Duration: For video posts (used in scoring eligibility)
Subscription Status: Whether the post requires paid subscription
Visibility Information: Safety labels, spam detection results

Independent hydrators run in parallel where possible to reduce latency.
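
One way to picture parallel hydration is two independent lookups running concurrently and being merged by position afterwards. The sketch below uses scoped threads from the standard library as a stand-in for the async tasks a real service would likely use; the `Candidate` fields and both `fetch_*` functions are hypothetical.

```rust
use std::thread;

#[derive(Debug, Clone, Default, PartialEq)]
pub struct Candidate {
    pub post_id: u64,
    pub author_name: Option<String>,
    pub video_duration_ms: Option<u64>,
}

// Hypothetical lookups standing in for real metadata services.
fn fetch_author_names(ids: &[u64]) -> Vec<String> {
    ids.iter().map(|id| format!("author-of-{id}")).collect()
}

fn fetch_video_durations(ids: &[u64]) -> Vec<Option<u64>> {
    ids.iter()
        .map(|id| if id % 2 == 0 { Some(30_000) } else { None })
        .collect()
}

/// Run two independent hydrators concurrently, then merge their outputs.
pub fn hydrate(mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    let ids: Vec<u64> = candidates.iter().map(|c| c.post_id).collect();

    let (names, durations) = thread::scope(|s| {
        let names = s.spawn(|| fetch_author_names(&ids));
        let durations = s.spawn(|| fetch_video_durations(&ids));
        (
            names.join().expect("author hydrator panicked"),
            durations.join().expect("video hydrator panicked"),
        )
    });

    // Each hydrator returns one entry per candidate, in input order.
    for ((c, name), duration) in candidates.iter_mut().zip(names).zip(durations) {
        c.author_name = Some(name);
        c.video_duration_ms = duration;
    }
    candidates
}
```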

4. Pre-Scoring Filters

Purpose: Remove candidates that should never be scored or shown. Filters run sequentially. See Filtering for complete details on each filter. Key filters include:

Duplicate removal
Age filtering
Self-post removal
Blocked/muted author filtering
Previously seen post filtering
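
The key filters listed above can be sketched as a chain of passes, each seeing only the survivors of the previous one. The `Post` and `Viewer` types and their field names are assumptions for illustration, and the order shown simply follows the list; the Filtering page remains the reference for the actual filters.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
pub struct Post {
    pub id: u64,
    pub author_id: u64,
    pub age_secs: u64,
}

/// Viewer context loaded during query hydration (hypothetical fields).
pub struct Viewer {
    pub user_id: u64,
    pub blocked_or_muted: HashSet<u64>,
    pub seen_ids: HashSet<u64>,
    pub max_age_secs: u64,
}

/// Apply the filters one after another.
pub fn pre_scoring_filter(viewer: &Viewer, mut posts: Vec<Post>) -> Vec<Post> {
    // Duplicate removal: keep the first occurrence of each post ID.
    let mut unique = HashSet::new();
    posts.retain(|p| unique.insert(p.id));
    // Age filtering.
    posts.retain(|p| p.age_secs <= viewer.max_age_secs);
    // Self-post removal.
    posts.retain(|p| p.author_id != viewer.user_id);
    // Blocked/muted author filtering.
    posts.retain(|p| !viewer.blocked_or_muted.contains(&p.author_id));
    // Previously seen post filtering.
    posts.retain(|p| !viewer.seen_ids.contains(&p.id));
    posts
}
```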

5. Scoring

Purpose: Predict user engagement and compute relevance scores. Scorers are applied sequentially, in the following order:

Phoenix Scorer

The Grok-based transformer model predicts probabilities for multiple engagement types:

P(favorite), P(reply), P(repost)
P(click), P(profile_click), P(video_view)
P(share), P(dwell), P(follow_author)
P(not_interested), P(block_author), P(mute_author), P(report)

Weighted Scorer

Combines the predictions into a single relevance score using a weighted sum:

Final Score = Σ (weight_i × P(action_i))

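Positive actions carry positive weights; negative actions (such as not_interested, block_author, mute_author, and report) carry negative weights, so a high predicted probability of a negative action lowers the final score.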
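
A minimal sketch of the weighted sum, restricted to five of the predicted actions to keep it short. The struct names are hypothetical, and any weight values used with it are placeholders rather than production weights, which the source does not list.

```rust
/// Predicted probabilities for a subset of the engagement types (illustrative).
#[derive(Debug, Clone, Copy, Default)]
pub struct Predictions {
    pub favorite: f64,
    pub reply: f64,
    pub repost: f64,
    pub not_interested: f64,
    pub report: f64,
}

/// One weight per action; negative actions are expected to carry negative weights.
#[derive(Debug, Clone, Copy)]
pub struct Weights {
    pub favorite: f64,
    pub reply: f64,
    pub repost: f64,
    pub not_interested: f64,
    pub report: f64,
}

/// Final Score = sum over actions of weight_i * P(action_i).
pub fn weighted_score(p: &Predictions, w: &Weights) -> f64 {
    w.favorite * p.favorite
        + w.reply * p.reply
        + w.repost * p.repost
        + w.not_interested * p.not_interested
        + w.report * p.report
}
```

For example, with placeholder weights (1.0, 2.0, 1.5, -4.0, -10.0) and predictions (0.5, 0.25, 0.0, 0.125, 0.0), the score is 0.5 + 0.5 + 0 - 0.5 - 0 = 0.5.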

Author Diversity Scorer

Attenuates scores for repeated authors to ensure feed diversity:

multiplier = (1 - floor) × decay^position + floor

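The first post from an author (position 0) keeps its full score, because the multiplier evaluates to 1. Each subsequent post from the same author is attenuated further, and the multiplier approaches floor as position grows (for 0 < decay < 1).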
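
The multiplier formula can be sketched as follows. The sketch assumes that position is a zero-based count of higher-scored posts by the same author and that candidates are walked in descending score order; the source gives the formula but not these details, and the function names are hypothetical.

```rust
use std::collections::HashMap;

/// multiplier = (1 - floor) * decay^position + floor, with a zero-based position.
pub fn diversity_multiplier(position: u32, decay: f64, floor: f64) -> f64 {
    (1.0 - floor) * decay.powi(position as i32) + floor
}

/// Walk (author_id, score) pairs in descending score order and attenuate
/// each author's second, third, ... post. Output keeps the sorted order.
pub fn apply_author_diversity(
    mut scored: Vec<(u64, f64)>,
    decay: f64,
    floor: f64,
) -> Vec<(u64, f64)> {
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    let mut seen: HashMap<u64, u32> = HashMap::new();
    for (author, score) in scored.iter_mut() {
        let position = seen.entry(*author).or_insert(0);
        *score *= diversity_multiplier(*position, decay, floor);
        *position += 1;
    }
    scored
}
```

With decay = 0.5 and floor = 0.25, the multipliers for positions 0, 1, and 2 are 1.0, 0.625, and 0.4375.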

OON Scorer

Adjusts out-of-network content scores by a configurable weight factor to balance in-network and out-of-network content.
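
Reading "weight factor" as a multiplier, the adjustment might look like the sketch below. The multiplicative form, the `Scored` type, and the `oon_weight` parameter name are assumptions; the source only says the factor is configurable.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Scored {
    pub post_id: u64,
    pub in_network: bool,
    pub score: f64,
}

/// Scale out-of-network scores by `oon_weight`; in-network scores are unchanged.
pub fn apply_oon_weight(candidates: &mut [Scored], oon_weight: f64) {
    for c in candidates.iter_mut().filter(|c| !c.in_network) {
        c.score *= oon_weight;
    }
}
```

Under this reading, a weight below 1 favors in-network posts and a weight above 1 favors out-of-network posts.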

See Scoring and Ranking for detailed formulas and implementation.

6. Selection

Purpose: Sort candidates by final score and select the top K. The selector:

Sorts all scored candidates in descending order by score
Selects the top K candidates (typically 100-500 depending on request parameters)
Passes selected candidates to post-selection filters
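
The selector steps above amount to a sort followed by a truncation. A minimal sketch, with a hypothetical `Ranked` type:

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Ranked {
    pub post_id: u64,
    pub score: f64,
}

/// Sort by score, highest first, and keep at most `k` candidates.
pub fn select_top_k(mut candidates: Vec<Ranked>, k: usize) -> Vec<Ranked> {
    // `sort_by` is stable, so candidates with equal scores keep their input order.
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    candidates.truncate(k);
    candidates
}
```

If fewer than `k` candidates survive the earlier stages, all of them are returned.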

7. Post-Selection Filtering

Purpose: Final validation before serving to the user. Post-selection filters include:

Visibility Filter (VF)

Removes posts that have been deleted, are flagged as spam, contain violence or gore, or otherwise violate content policies.

Conversation Deduplication

Deduplicates multiple branches of the same conversation thread to avoid repetitive content.

These filters run after selection because they may require expensive API calls or database lookups.
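
Conversation deduplication could be sketched as keeping only the first (highest-ranked) post per conversation. This assumes each candidate carries a conversation identifier and that the input is already sorted by score; the `ThreadPost` type and the "keep the best branch" policy are illustrative assumptions, not details given by the source.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
pub struct ThreadPost {
    pub post_id: u64,
    pub conversation_id: u64,
}

/// Keep the first post seen for each conversation, preserving rank order.
pub fn dedup_conversations(ranked: Vec<ThreadPost>) -> Vec<ThreadPost> {
    let mut seen = HashSet::new();
    ranked
        .into_iter()
        .filter(|p| seen.insert(p.conversation_id))
        .collect()
}
```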

Side Effects

After the main pipeline completes, side effects run asynchronously:

Cache request information for future use
Log served candidates for downstream analytics
Update Bloom filters with newly served post IDs

Side effects never block the response to the user.
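
The non-blocking behavior can be illustrated by handing the side-effect work to a background thread and returning the response right away. The sketch uses a standard-library thread and channel as stand-ins; the `serve` function and the channel-based log sink are hypothetical, and a real service would more likely spawn an async task.

```rust
use std::sync::mpsc::Sender;
use std::thread::{self, JoinHandle};

/// Return the response immediately; side effects run on a background thread.
/// The join handle is returned only so callers (such as tests) can wait for it.
pub fn serve(post_ids: Vec<u64>, log: Sender<Vec<u64>>) -> (Vec<u64>, JoinHandle<()>) {
    let served = post_ids.clone();
    let handle = thread::spawn(move || {
        // Stand-in for caching, analytics logging, and Bloom filter updates.
        // A send error is ignored: side effects must not affect the response.
        let _ = log.send(served);
    });
    (post_ids, handle)
}
```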

Performance Characteristics

The entire pipeline typically completes in 50-150ms from request to response.

Query Hydration: 5-10ms
Candidate Sourcing: 10-30ms (parallel execution)
Candidate Hydration: 15-40ms (parallel execution)
Filters: 5-15ms (sequential)
Scoring: 20-50ms (Phoenix model inference)
Selection & Post-Filters: 5-10ms

Implementation

The pipeline is implemented using the candidate-pipeline framework, which provides traits for each stage:

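/// Produces candidates for a query (for example, Thunder or Phoenix Retrieval).
pub trait Source<Q, C> {
    async fn get(&self, query: &Q) -> Result<Vec<C>, String>;
}

/// Enriches candidates with additional metadata.
pub trait Hydrator<Q, C> {
    async fn hydrate(&self, query: &Q, candidates: Vec<C>) -> Result<Vec<C>, String>;
}

/// Removes ineligible candidates. `FilterResult<C>` is defined by the framework (not shown here).
pub trait Filter<Q, C> {
    async fn filter(&self, query: &Q, candidates: Vec<C>) -> Result<FilterResult<C>, String>;
}

/// Predicts engagement and assigns scores to candidates.
pub trait Scorer<Q, C> {
    async fn score(&self, query: &Q, candidates: &[C]) -> Result<Vec<C>, String>;
}

/// Sorts scored candidates and selects the top K.
pub trait Selector<Q, C> {
    async fn select(&self, query: &Q, candidates: Vec<C>) -> Result<Vec<C>, String>;
}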

This architecture enables:

Separation of concerns: Each stage has a single responsibility
Parallel execution: Independent operations run concurrently
Graceful error handling: Failures in non-critical stages don’t crash the pipeline
Easy testing: Each stage can be tested in isolation
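
To make the "graceful error handling" point concrete, here is a sketch of a runner that records a failing hydrator's error and carries on with the candidates it had before that step. It uses a synchronous `SyncHydrator` trait as a simplified stand-in for the async `Hydrator` trait; the skip-and-continue policy, `run_hydrators`, and the two example hydrators are assumptions, not the framework's documented behavior.

```rust
/// Synchronous, simplified stand-in for the framework's `Hydrator` trait.
pub trait SyncHydrator<Q, C> {
    fn hydrate(&self, query: &Q, candidates: Vec<C>) -> Result<Vec<C>, String>;
}

/// Run hydrators in order. If one fails, keep the candidates from before
/// that step and collect the error instead of aborting the pipeline.
pub fn run_hydrators<Q, C: Clone>(
    query: &Q,
    mut candidates: Vec<C>,
    hydrators: &[Box<dyn SyncHydrator<Q, C>>],
) -> (Vec<C>, Vec<String>) {
    let mut errors = Vec::new();
    for hydrator in hydrators {
        match hydrator.hydrate(query, candidates.clone()) {
            Ok(next) => candidates = next,
            Err(e) => errors.push(e), // treated as non-critical: record and continue
        }
    }
    (candidates, errors)
}

/// Example hydrator that succeeds (increments each value).
pub struct AddOne;
impl SyncHydrator<(), u32> for AddOne {
    fn hydrate(&self, _query: &(), candidates: Vec<u32>) -> Result<Vec<u32>, String> {
        Ok(candidates.into_iter().map(|x| x + 1).collect())
    }
}

/// Example hydrator that always fails.
pub struct AlwaysFails;
impl SyncHydrator<(), u32> for AlwaysFails {
    fn hydrate(&self, _query: &(), _candidates: Vec<u32>) -> Result<Vec<u32>, String> {
        Err("metadata service unavailable".to_string())
    }
}
```

Because each stage sits behind a small trait, a test can substitute trivial implementations such as `AddOne` and `AlwaysFails` and check a stage in isolation.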

Related Pages

Scoring and Ranking

Deep dive into the Phoenix model and weighted scoring formula

Filtering

Complete reference of all pre-scoring and post-selection filters

Phoenix Architecture

Transformer architecture with candidate isolation

Thunder

In-memory post store for in-network candidates
