
Overview

The For You feed algorithm processes every request through a fixed sequence of stages. Together, the stages enrich candidates with additional information, filter out ineligible content, and ultimately produce a ranked list of posts.

Stage Flow

The pipeline executes the following stages in order:
┌─────────────────────────┐
│   1. Query Hydration    │  Fetch user context and engagement history
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│   2. Candidate Sourcing │  Retrieve candidates from Thunder and Phoenix
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  3. Candidate Hydration │  Enrich candidates with metadata
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  4. Pre-Scoring Filters │  Remove ineligible candidates
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│      5. Scoring         │  Predict engagement and compute scores
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│      6. Selection       │  Sort by score and select top K
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ 7. Post-Selection       │  Final validation and visibility filtering
│    Filtering            │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│   Ranked Feed Response  │
└─────────────────────────┘

1. Query Hydration

Purpose: Load user context required for personalized recommendations.

What Gets Hydrated

  • User Action Sequence: Recent engagement history (likes, reposts, replies, clicks, etc.)
  • User Features: Following list, blocked/muted accounts, muted keywords, preferences
  • Bloom Filters: Compact probabilistic sets for checking whether a post was previously seen (false positives are possible, false negatives are not)
  • Seen IDs: Posts the user has explicitly seen in recent sessions
This stage runs before candidate sourcing because both Thunder and Phoenix need user context to retrieve relevant candidates.
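
To make the Bloom filter entry above concrete, here is a minimal sketch of how a seen-post check could work. The type name `SeenPostFilter`, its parameters, and the use of `DefaultHasher` with a seed are all assumptions for illustration; the source does not describe how the production filters are built.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter over post IDs (hypothetical, for illustration only).
pub struct SeenPostFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl SeenPostFilter {
    pub fn new(num_bits: usize, num_hashes: u64) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        Self { bits: vec![false; num_bits], num_hashes }
    }

    // Derive one bit index by hashing the seed together with the post ID.
    fn index(&self, post_id: u64, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        post_id.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    /// Marks a post as seen by setting one bit per hash function.
    pub fn insert(&mut self, post_id: u64) {
        for seed in 0..self.num_hashes {
            let i = self.index(post_id, seed);
            self.bits[i] = true;
        }
    }

    /// `false` means definitely not seen; `true` means probably seen.
    pub fn might_contain(&self, post_id: u64) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.index(post_id, seed)])
    }
}
```

Because a Bloom filter can report a post as seen when it was not, a design like this trades a small chance of hiding an unseen post for a much smaller memory footprint than an exact set.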

2. Candidate Sourcing

Candidates are retrieved from two parallel sources:

Thunder (In-Network)

Retrieves recent posts from accounts the user follows:
  • Original posts
  • Replies
  • Reposts
  • Video posts
Thunder maintains an in-memory store for sub-millisecond lookups.

Phoenix Retrieval (Out-of-Network)

Uses a two-tower ML model to find relevant posts from the global corpus:
  • User Tower: Encodes user engagement history into an embedding
  • Candidate Tower: Encodes all posts into embeddings
  • Similarity Search: Retrieves top-K posts via dot product similarity
See Phoenix Retrieval for detailed architecture.
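
The similarity search step can be sketched as a brute-force dot product ranking. This is only an illustration under assumptions: the function name, the `(post_id, embedding)` tuple layout, and the exhaustive scan are hypothetical, and a system searching a global corpus would likely rely on an approximate nearest-neighbor index instead.

```rust
/// Dot product of two embeddings of equal length.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Brute-force top-K retrieval: returns (post_id, similarity), best first.
pub fn top_k_by_dot_product(
    user_embedding: &[f32],
    candidates: &[(u64, Vec<f32>)],
    k: usize,
) -> Vec<(u64, f32)> {
    // Score every candidate embedding against the user embedding.
    let mut scored: Vec<(u64, f32)> = candidates
        .iter()
        .map(|(id, emb)| (*id, dot(user_embedding, emb)))
        .collect();
    // Sort by similarity, highest first, then keep the first k entries.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```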

3. Candidate Hydration

Purpose: Enrich candidates with additional metadata needed for filtering and scoring.

Hydrated Data

  • Core Post Data: Text, media URLs, timestamps
  • Author Information: Username, display name, verification status
  • Video Duration: For video posts (used in scoring eligibility)
  • Subscription Status: Whether the post requires paid subscription
  • Visibility Information: Safety labels, spam detection results
Hydrators run in parallel where possible to keep latency low.
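
As a rough illustration of parallel hydration, the sketch below runs two independent lookups on scoped threads and merges the results. Everything here is assumed for the example: the `HydratedPost` struct, the two `lookup_*` helpers (stand-ins for real metadata services), and the use of OS threads rather than the async runtime the actual pipeline presumably uses.

```rust
use std::thread;

#[derive(Debug, Default, Clone, PartialEq)]
pub struct HydratedPost {
    pub post_id: u64,
    pub author_name: Option<String>,
    pub video_duration_secs: Option<u32>,
}

// Hypothetical lookups standing in for real metadata services.
fn lookup_author_name(post_id: u64) -> String {
    format!("author_of_{}", post_id)
}

fn lookup_video_duration(post_id: u64) -> Option<u32> {
    if post_id % 2 == 0 { Some(30) } else { None }
}

/// Runs both lookups concurrently, then writes the results onto each post.
pub fn hydrate_in_parallel(mut posts: Vec<HydratedPost>) -> Vec<HydratedPost> {
    let ids: Vec<u64> = posts.iter().map(|p| p.post_id).collect();
    let (names, durations) = thread::scope(|s| {
        // The two hydrators are independent, so each gets its own thread.
        let names = s.spawn(|| ids.iter().map(|&id| lookup_author_name(id)).collect::<Vec<_>>());
        let durations =
            s.spawn(|| ids.iter().map(|&id| lookup_video_duration(id)).collect::<Vec<_>>());
        (names.join().unwrap(), durations.join().unwrap())
    });
    for ((post, name), duration) in posts.iter_mut().zip(names).zip(durations) {
        post.author_name = Some(name);
        post.video_duration_secs = duration;
    }
    posts
}
```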

4. Pre-Scoring Filters

Purpose: Remove candidates that should never be scored or shown.

Filters run sequentially. See Filtering for complete details on each filter. Key filters include:
  • Duplicate removal
  • Age filtering
  • Self-post removal
  • Blocked/muted author filtering
  • Previously seen post filtering
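
The key filters listed above can be sketched as a chain applied in order. The `Post` and `Viewer` types, their fields, and the `max_age_secs` parameter are hypothetical; the actual filter set and its inputs are documented in Filtering. An iterator chain checks each candidate against the filters in the listed order, which yields the same result as running each filter over the whole list in sequence.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone)]
pub struct Post {
    pub id: u64,
    pub author_id: u64,
    pub age_secs: u64,
}

pub struct Viewer {
    pub user_id: u64,
    pub blocked_or_muted: HashSet<u64>,
    pub seen_post_ids: HashSet<u64>,
}

/// Applies the five key filters in the order they are listed.
pub fn apply_pre_scoring_filters(viewer: &Viewer, posts: Vec<Post>, max_age_secs: u64) -> Vec<Post> {
    let mut unique_ids = HashSet::new();
    posts
        .into_iter()
        // Duplicate removal: `insert` returns false for an ID already kept.
        .filter(|p| unique_ids.insert(p.id))
        // Age filtering.
        .filter(|p| p.age_secs <= max_age_secs)
        // Self-post removal.
        .filter(|p| p.author_id != viewer.user_id)
        // Blocked/muted author filtering.
        .filter(|p| !viewer.blocked_or_muted.contains(&p.author_id))
        // Previously seen post filtering.
        .filter(|p| !viewer.seen_post_ids.contains(&p.id))
        .collect()
}
```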

5. Scoring

Purpose: Predict user engagement and compute relevance scores.

Scorers are applied sequentially:

Step 1: Phoenix Scorer

The Grok-based transformer model predicts probabilities for multiple engagement types:
  • P(favorite), P(reply), P(repost)
  • P(click), P(profile_click), P(video_view)
  • P(share), P(dwell), P(follow_author)
  • P(not_interested), P(block_author), P(mute_author), P(report)
Step 2: Weighted Scorer

Combines the predictions into a single relevance score using a weighted sum:
Final Score = Σ (weight_i × P(action_i))
Positive actions (for example, favorite or reply) carry positive weights; negative actions (for example, block_author or report) carry negative weights.
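
A minimal sketch of the weighted sum follows. The function name and the parallel-slice representation are assumptions, and the weights in the usage note are made-up numbers rather than production values.

```rust
/// Weighted sum of predicted action probabilities.
/// `weights[i]` pairs with `probabilities[i]`; negative weights penalize
/// undesirable actions.
pub fn weighted_score(weights: &[f64], probabilities: &[f64]) -> f64 {
    assert_eq!(weights.len(), probabilities.len());
    weights.iter().zip(probabilities).map(|(w, p)| w * p).sum()
}
```

For example, with illustrative weights `[1.0, 2.0, -4.0]` for favorite, reply, and report, and predicted probabilities `[0.5, 0.25, 0.125]`, the score is 0.5 + 0.5 - 0.5 = 0.5.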
Step 3: Author Diversity Scorer

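Attenuates scores for repeated authors to ensure feed diversity:
multiplier = (1 - floor) × decay^position + floor
The first post from an author gets its full score (the multiplier is 1 at position 0); subsequent posts from the same author are attenuated, with the multiplier approaching floor as position grows.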
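
The multiplier formula can be sketched as follows. This assumes that `position` counts how many higher-ranked posts from the same author precede the current one (0 for the first), and that the input is already sorted by score; the function names and the `(post_id, author_id, score)` tuple layout are hypothetical.

```rust
use std::collections::HashMap;

/// multiplier = (1 - floor) * decay^position + floor
pub fn diversity_multiplier(position: u32, decay: f64, floor: f64) -> f64 {
    (1.0 - floor) * decay.powi(position as i32) + floor
}

/// `ranked` holds (post_id, author_id, score), sorted by score descending.
/// Returns (post_id, attenuated_score) in the same order.
pub fn apply_author_diversity(
    ranked: &[(u64, u64, f64)],
    decay: f64,
    floor: f64,
) -> Vec<(u64, f64)> {
    // Number of posts already emitted per author.
    let mut seen_per_author: HashMap<u64, u32> = HashMap::new();
    ranked
        .iter()
        .map(|&(post_id, author_id, score)| {
            let position = seen_per_author.entry(author_id).or_insert(0);
            let multiplier = diversity_multiplier(*position, decay, floor);
            *position += 1;
            (post_id, score * multiplier)
        })
        .collect()
}
```

With `decay = 0.5` and `floor = 0.5`, an author's first, second, and third posts receive multipliers of 1.0, 0.75, and 0.625.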
Step 4: OON Scorer

Adjusts the scores of out-of-network (OON) content by a configurable weight factor to balance in-network and out-of-network content.
See Scoring and Ranking for detailed formulas and implementation.
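
For the out-of-network adjustment, one plausible reading is a multiplicative factor applied only to out-of-network candidates. The sketch below assumes exactly that; the function name, the `in_network` flag, and the multiplicative form are assumptions, since the source only says the score is adjusted by a configurable weight factor.

```rust
/// Scales the score of out-of-network candidates by `oon_weight`;
/// in-network scores pass through unchanged.
pub fn apply_oon_weight(score: f64, in_network: bool, oon_weight: f64) -> f64 {
    if in_network { score } else { score * oon_weight }
}
```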

6. Selection

Purpose: Sort candidates by final score and select the top K.

The selector:
  1. Sorts all scored candidates in descending order by score
  2. Selects the top K candidates (typically 100-500 depending on request parameters)
  3. Passes selected candidates to post-selection filters
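
The selection steps above can be sketched in a few lines. The `ScoredCandidate` struct and function name are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredCandidate {
    pub post_id: u64,
    pub score: f64,
}

/// Sorts by score descending and keeps at most `k` candidates.
pub fn select_top_k(mut candidates: Vec<ScoredCandidate>, k: usize) -> Vec<ScoredCandidate> {
    // `total_cmp` defines a total order on f64, so the comparator stays
    // well-defined even if a score happens to be NaN.
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    candidates.truncate(k);
    candidates
}
```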

7. Post-Selection Filtering

Purpose: Final validation before serving to the user.

Post-selection filters include:

Visibility Filter (VF)

Removes posts that have been deleted, are flagged as spam, contain violence or gore, or otherwise violate content policies.

Conversation Deduplication

Deduplicates multiple branches of the same conversation thread to avoid repetitive content.
These filters run after selection because they may require expensive API calls or database lookups; applying them only to the selected top K candidates limits that cost.
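
Conversation deduplication could look like the sketch below, which keeps only the highest-ranked post per conversation. The `SelectedPost` struct, the `conversation_id` field, and the keep-first policy are assumptions; the source does not specify which branch of a thread survives.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
pub struct SelectedPost {
    pub post_id: u64,
    pub conversation_id: u64,
}

/// `ranked` is assumed to be in final ranked order. Keeps the first
/// (highest-ranked) post of each conversation and drops the rest.
pub fn dedup_conversations(ranked: Vec<SelectedPost>) -> Vec<SelectedPost> {
    let mut seen_conversations = HashSet::new();
    ranked
        .into_iter()
        // `insert` returns false when the conversation was already kept.
        .filter(|post| seen_conversations.insert(post.conversation_id))
        .collect()
}
```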

Side Effects

After the main pipeline completes, side effects run asynchronously:
  • Cache request information for future use
  • Log served candidates for downstream analytics
  • Update Bloom filters with newly served post IDs
Side effects never block the response to the user.
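
The non-blocking behavior can be illustrated with a background thread. This is a sketch under assumptions: the function name and the shared `served_log` are hypothetical stand-ins for the cache, analytics log, and Bloom filter updates, and the actual pipeline presumably spawns async tasks rather than OS threads.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Returns the response immediately and records the served IDs on a
/// background thread. The handle is returned only so callers (or tests)
/// can wait for the side effect if they choose to.
pub fn respond_then_record(
    served_ids: Vec<u64>,
    served_log: Arc<Mutex<Vec<u64>>>,
) -> (Vec<u64>, JoinHandle<()>) {
    let to_record = served_ids.clone();
    let handle = thread::spawn(move || {
        // Stand-in for cache writes, analytics logging, and Bloom filter updates.
        served_log.lock().unwrap().extend(to_record);
    });
    (served_ids, handle)
}
```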

Performance Characteristics

The entire pipeline typically completes in 50-150ms from request to response. Approximate per-stage latencies:
  • Query Hydration: 5-10ms
  • Candidate Sourcing: 10-30ms (parallel execution)
  • Candidate Hydration: 15-40ms (parallel execution)
  • Pre-Scoring Filters: 5-15ms (sequential)
  • Scoring: 20-50ms (Phoenix model inference)
  • Selection & Post-Filters: 5-10ms

Implementation

The pipeline is implemented using the candidate-pipeline framework, which provides traits for each stage:
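// Retrieves candidates for a query (for example, Thunder or Phoenix Retrieval).
pub trait Source<Q, C> {
    async fn get(&self, query: &Q) -> Result<Vec<C>, String>;
}

// Enriches candidates with additional metadata.
pub trait Hydrator<Q, C> {
    async fn hydrate(&self, query: &Q, candidates: Vec<C>) -> Result<Vec<C>, String>;
}

// Removes ineligible candidates. `FilterResult<C>` is not shown in this excerpt.
pub trait Filter<Q, C> {
    async fn filter(&self, query: &Q, candidates: Vec<C>) -> Result<FilterResult<C>, String>;
}

// Computes scores for the candidates and returns the scored candidates.
pub trait Scorer<Q, C> {
    async fn score(&self, query: &Q, candidates: &[C]) -> Result<Vec<C>, String>;
}

// Sorts the scored candidates and keeps the top K.
pub trait Selector<Q, C> {
    async fn select(&self, query: &Q, candidates: Vec<C>) -> Result<Vec<C>, String>;
}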
This architecture enables:
  • Separation of concerns: Each stage has a single responsibility
  • Parallel execution: Independent operations run concurrently
  • Graceful error handling: Failures in non-critical stages don’t crash the pipeline
  • Easy testing: Each stage can be tested in isolation
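
To illustrate the graceful error handling point, the sketch below runs a list of hydrators and skips any that fail instead of aborting. It uses a simplified synchronous trait, `SyncHydrator`, which is a hypothetical stand-in for the async `Hydrator` trait above; the `run_hydrators` function and the two example hydrators are likewise invented for this example and are not part of the candidate-pipeline framework.

```rust
/// Synchronous, simplified stand-in for the async `Hydrator` trait (hypothetical).
pub trait SyncHydrator<Q, C> {
    fn name(&self) -> &str;
    fn hydrate(&self, query: &Q, candidates: Vec<C>) -> Result<Vec<C>, String>;
}

/// Runs hydrators in order. A failing hydrator is recorded and skipped,
/// leaving the candidates as they were before that hydrator ran.
pub fn run_hydrators<Q, C: Clone>(
    query: &Q,
    mut candidates: Vec<C>,
    hydrators: &[&dyn SyncHydrator<Q, C>],
) -> (Vec<C>, Vec<String>) {
    let mut errors = Vec::new();
    for hydrator in hydrators {
        // Clone so a failed hydrator cannot consume the candidate list.
        match hydrator.hydrate(query, candidates.clone()) {
            Ok(enriched) => candidates = enriched,
            Err(e) => errors.push(format!("{}: {}", hydrator.name(), e)),
        }
    }
    (candidates, errors)
}

/// Example hydrator that always succeeds (uppercases each candidate).
pub struct Shout;
impl SyncHydrator<(), String> for Shout {
    fn name(&self) -> &str { "shout" }
    fn hydrate(&self, _query: &(), candidates: Vec<String>) -> Result<Vec<String>, String> {
        Ok(candidates.into_iter().map(|s| s.to_uppercase()).collect())
    }
}

/// Example hydrator that always fails, simulating an unavailable backend.
pub struct Unavailable;
impl SyncHydrator<(), String> for Unavailable {
    fn name(&self) -> &str { "unavailable" }
    fn hydrate(&self, _query: &(), _candidates: Vec<String>) -> Result<Vec<String>, String> {
        Err("backend unavailable".to_string())
    }
}
```

Because each stage is a small trait object, a test can exercise one stage with a stub like `Unavailable` without standing up the rest of the pipeline.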

Scoring and Ranking

Deep dive into the Phoenix model and weighted scoring formula

Filtering

Complete reference of all pre-scoring and post-selection filters

Phoenix Architecture

Transformer architecture with candidate isolation

Thunder

In-memory post store for in-network candidates
