Filters

Filters are the second stage of the Home Mixer pipeline, responsible for removing unsuitable candidates based on various criteria. Each filter implements the Filter trait and returns a FilterResult containing both kept and removed candidates.

Overview

Filters implement an asynchronous filter method that partitions candidates into kept and removed lists. Filters can optionally implement an enable method to conditionally activate based on query parameters.

pub struct FilterResult<T> {
    pub kept: Vec<T>,
    pub removed: Vec<T>,
}
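The contract described above can be sketched as follows. This is a simplified, synchronous stand-in (the real filter method is asynchronous); the trait's type parameters, the default enable body, and the names DemoQuery, MinLengthFilter, and run_filter are assumptions made for illustration, not the actual Home Mixer definitions.

```rust
/// Result of a filter pass (mirrors the struct shown above).
pub struct FilterResult<T> {
    pub kept: Vec<T>,
    pub removed: Vec<T>,
}

/// Simplified, synchronous stand-in for the Filter trait.
/// `Q` (query) and `C` (candidate) are placeholder type parameters.
pub trait Filter<Q, C> {
    /// Assumed default: a filter runs unless it overrides this method.
    fn enable(&self, _query: &Q) -> bool {
        true
    }

    /// Partitions candidates into kept and removed lists.
    fn filter(&self, query: &Q, candidates: Vec<C>) -> FilterResult<C>;
}

/// Hypothetical query type used only in this sketch.
pub struct DemoQuery {
    pub min_len: usize,
}

/// Hypothetical filter: keeps strings of at least `min_len` bytes.
pub struct MinLengthFilter;

impl Filter<DemoQuery, String> for MinLengthFilter {
    fn filter(&self, query: &DemoQuery, candidates: Vec<String>) -> FilterResult<String> {
        let (kept, removed): (Vec<_>, Vec<_>) = candidates
            .into_iter()
            .partition(|text| text.len() >= query.min_len);
        FilterResult { kept, removed }
    }
}

/// Runs a filter only when it is enabled; otherwise keeps every candidate.
pub fn run_filter<Q, C, F: Filter<Q, C>>(
    filter: &F,
    query: &Q,
    candidates: Vec<C>,
) -> FilterResult<C> {
    if filter.enable(query) {
        filter.filter(query, candidates)
    } else {
        FilterResult { kept: candidates, removed: Vec::new() }
    }
}
```

Returning the removed candidates alongside the kept ones, rather than discarding them, lets a caller log or inspect what each filter dropped.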

Available Filters

AgeFilter

Removes tweets older than a specified duration using Snowflake ID timestamps.

max_age (Duration, required): Maximum age threshold for tweets. Tweets older than this duration are filtered out.

Purpose: Ensures feed freshness by removing stale content.

Implementation:

home-mixer/filters/age_filter.rs

pub struct AgeFilter {
    pub max_age: Duration,
}

impl AgeFilter {
    fn is_within_age(&self, tweet_id: i64) -> bool {
        snowflake::duration_since_creation_opt(tweet_id)
            .map(|age| age <= self.max_age)
            .unwrap_or(false)
    }
}
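The age check relies on the creation time embedded in each Snowflake ID. A minimal sketch of that decoding, assuming the standard Snowflake layout (the bits above the low 22 hold milliseconds since the Twitter epoch, 1288834974657 in Unix milliseconds). The function names and the explicit now_ms parameter are hypothetical and stand in for snowflake::duration_since_creation_opt, whose actual signature is only partly visible above.

```rust
use std::time::Duration;

/// Twitter Snowflake epoch in Unix milliseconds.
const TWITTER_EPOCH_MS: u64 = 1_288_834_974_657;

/// Extracts the creation time (Unix ms) from a Snowflake ID.
/// Returns None for negative IDs, which cannot be valid Snowflakes.
fn snowflake_creation_ms(tweet_id: i64) -> Option<u64> {
    if tweet_id < 0 {
        return None;
    }
    // Shifting away the low 22 bits leaves ms since the Twitter epoch.
    Some(((tweet_id as u64) >> 22) + TWITTER_EPOCH_MS)
}

/// Hypothetical stand-in for the snowflake helper, with an explicit clock.
/// Returns None if the ID is invalid or appears to be from the future.
fn duration_since_creation(tweet_id: i64, now_ms: u64) -> Option<Duration> {
    let created_ms = snowflake_creation_ms(tweet_id)?;
    now_ms.checked_sub(created_ms).map(Duration::from_millis)
}

/// Same shape as AgeFilter::is_within_age: unknown ages count as too old.
fn is_within_age(tweet_id: i64, now_ms: u64, max_age: Duration) -> bool {
    duration_since_creation(tweet_id, now_ms)
        .map(|age| age <= max_age)
        .unwrap_or(false)
}
```

Because of the unwrap_or(false), a candidate whose age cannot be determined is treated as stale and removed rather than kept.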

AuthorSocialgraphFilter

Removes candidates from authors that the viewer has blocked or muted.

Purpose: Respects user preferences and social graph settings.

Logic:

Checks query.user_features.blocked_user_ids
Checks query.user_features.muted_user_ids
Removes candidates where author matches either list
Short-circuits if both lists are empty for performance

Code Example:

home-mixer/filters/author_socialgraph_filter.rs

let viewer_blocked_user_ids = query.user_features.blocked_user_ids.clone();
let viewer_muted_user_ids = query.user_features.muted_user_ids.clone();

for candidate in candidates {
    let author_id = candidate.author_id as i64;
    let muted = viewer_muted_user_ids.contains(&author_id);
    let blocked = viewer_blocked_user_ids.contains(&author_id);
    if muted || blocked {
        removed.push(candidate);
    } else {
        kept.push(candidate);
    }
}
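The excerpt above does not show the short-circuit for empty lists. One way it could look is sketched below; the Candidate struct, the function name, and the use of a combined HashSet for lookups are assumptions for illustration, not the code in author_socialgraph_filter.rs.

```rust
use std::collections::HashSet;

/// Hypothetical minimal candidate; only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    tweet_id: u64,
    author_id: u64,
}

/// Partitions candidates into (kept, removed) by blocked or muted author.
fn filter_by_socialgraph(
    candidates: Vec<Candidate>,
    blocked_user_ids: &[i64],
    muted_user_ids: &[i64],
) -> (Vec<Candidate>, Vec<Candidate>) {
    // Short-circuit: with no blocks or mutes, nothing can be removed.
    if blocked_user_ids.is_empty() && muted_user_ids.is_empty() {
        return (candidates, Vec::new());
    }

    // One combined set gives constant-time lookups per candidate.
    let excluded: HashSet<i64> = blocked_user_ids
        .iter()
        .chain(muted_user_ids.iter())
        .copied()
        .collect();

    candidates
        .into_iter()
        .partition(|c| !excluded.contains(&(c.author_id as i64)))
}
```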

CoreDataHydrationFilter

Filters out candidates that failed to hydrate essential data from backend services.

Purpose: Ensures all candidates have the minimum data required for display.

Criteria:

Author ID must be non-zero
Tweet text must be non-empty (after trimming)

Implementation:

home-mixer/filters/core_data_hydration_filter.rs

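// Keep only candidates with a non-zero author ID and non-blank text.
// The explicit tuple type tells partition which collections to build.
let (kept, removed): (Vec<_>, Vec<_>) = candidates
    .into_iter()
    .partition(|c| c.author_id != 0 && !c.tweet_text.trim().is_empty());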

DedupConversationFilter

Keeps only the highest-scored candidate per conversation branch.

Purpose: Prevents multiple tweets from the same conversation thread appearing in the feed.

Algorithm:

Groups candidates by conversation ID (the smallest ancestor ID, or the tweet's own ID when it has no ancestors)
For each conversation, keeps only the highest-scoring tweet
Removes all other tweets from that conversation

Code Example:

home-mixer/filters/dedup_conversation_filter.rs

fn get_conversation_id(candidate: &PostCandidate) -> u64 {
    candidate
        .ancestors
        .iter()
        .copied()
        .min()
        .unwrap_or(candidate.tweet_id as u64)
}

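// Maps conversation ID -> (index into `kept`, best score seen so far).
let mut best_per_convo = HashMap::new();

for candidate in candidates {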
    let conversation_id = get_conversation_id(&candidate);
    let score = candidate.score.unwrap_or(0.0);

    if let Some((kept_idx, best_score)) = best_per_convo.get_mut(&conversation_id) {
        if score > *best_score {
            let previous = std::mem::replace(&mut kept[*kept_idx], candidate);
            removed.push(previous);
            *best_score = score;
        } else {
            removed.push(candidate);
        }
    } else {
        let idx = kept.len();
        best_per_convo.insert(conversation_id, (idx, score));
        kept.push(candidate);
    }
}

DropDuplicatesFilter

Removes duplicate tweets based on tweet ID.

Purpose: Ensures each unique tweet appears at most once in the feed.

Implementation: Uses a HashSet to track seen tweet IDs.

home-mixer/filters/drop_duplicates_filter.rs

let mut seen_ids = HashSet::new();
let mut kept = Vec::new();
let mut removed = Vec::new();

for candidate in candidates {
    if seen_ids.insert(candidate.tweet_id) {
        kept.push(candidate);
    } else {
        removed.push(candidate);
    }
}

IneligibleSubscriptionFilter

Filters out subscription-only posts from authors the viewer hasn’t subscribed to.

Purpose: Enforces subscription access controls.

Logic:

If candidate.subscription_author_id is None, keep the candidate (not subscription-gated)
If Some(author_id), check if viewer is subscribed to that author
Remove if not subscribed

Code Example:

home-mixer/filters/ineligible_subscription_filter.rs

let subscribed_user_ids: HashSet<u64> = query
    .user_features
    .subscribed_user_ids
    .iter()
    .map(|id| *id as u64)
    .collect();

let (kept, removed): (Vec<_>, Vec<_>) =
    candidates
        .into_iter()
        .partition(|candidate| match candidate.subscription_author_id {
            Some(author_id) => subscribed_user_ids.contains(&author_id),
            None => true,
        });

MutedKeywordFilter

Filters out tweets containing keywords the user has muted.

tokenizer (Arc<TweetTokenizer>, required): Tokenizer for processing tweet text and muted keywords.

Purpose: Respects user content preferences and keyword mutes.

Algorithm:

Tokenizes user’s muted keywords
Creates a UserMutes matcher with token sequences
Tokenizes each candidate’s tweet text
Removes candidates matching any muted keyword pattern

Implementation:

home-mixer/filters/muted_keyword_filter.rs

let tokenized = muted_keywords.iter().map(|k| self.tokenizer.tokenize(k));
let token_sequences: Vec<TokenSequence> = tokenized.collect::<Vec<_>>();
let user_mutes = UserMutes::new(token_sequences);
let matcher = MatchTweetGroup::new(user_mutes);

for candidate in candidates {
    let tweet_text_token_sequence = self.tokenizer.tokenize(&candidate.tweet_text);
    if matcher.matches(&tweet_text_token_sequence) {
        removed.push(candidate);
    } else {
        kept.push(candidate);
    }
}
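To make the matching step concrete, here is a self-contained sketch of token-sequence matching. It does not use TweetTokenizer, UserMutes, or MatchTweetGroup, whose internals are not shown here. The tokenizer (lowercase, split on non-alphanumeric characters) and the rule that a muted phrase must appear as a contiguous run of tokens are both assumptions.

```rust
/// Hypothetical tokenizer: lowercases and splits on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|ch: char| !ch.is_alphanumeric())
        .filter(|token| !token.is_empty())
        .map(|token| token.to_lowercase())
        .collect()
}

/// True if `needle` occurs as a contiguous run of tokens inside `haystack`.
/// An empty needle never matches (and `windows(0)` is never called).
fn contains_sequence(haystack: &[String], needle: &[String]) -> bool {
    !needle.is_empty()
        && haystack
            .windows(needle.len())
            .any(|window| window == needle)
}

/// True if the tweet text matches any muted keyword or phrase.
fn matches_muted_keyword(tweet_text: &str, muted_keywords: &[&str]) -> bool {
    let tweet_tokens = tokenize(tweet_text);
    muted_keywords
        .iter()
        .map(|keyword| tokenize(keyword))
        .any(|keyword_tokens| contains_sequence(&tweet_tokens, &keyword_tokens))
}
```

Under these assumptions matching is token-based rather than substring-based, so muting "spoiler" would not hide a tweet that only contains "spoilers".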

PreviouslySeenPostsFilter

Filters out posts the user has already seen using Bloom filters and explicit seen ID lists.

Purpose: Prevents showing the same content multiple times.

Data Sources:

query.seen_ids: Explicit list of seen post IDs from the client
query.bloom_filter_entries: Probabilistic data structure for efficient seen-post checks

Related Posts: Also checks IDs of related posts (retweets, quoted tweets) to ensure conversation context isn’t repeated.

Implementation:

home-mixer/filters/previously_seen_posts_filter.rs

let bloom_filters = query
    .bloom_filter_entries
    .iter()
    .map(BloomFilter::from_entry)
    .collect::<Vec<_>>();

let (removed, kept): (Vec<_>, Vec<_>) = candidates.into_iter().partition(|c| {
    get_related_post_ids(c).iter().any(|&post_id| {
        query.seen_ids.contains(&post_id)
            || bloom_filters
                .iter()
                .any(|filter| filter.may_contain(post_id))
    })
});
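A toy Bloom filter illustrates why this check is probabilistic: a negative answer is definite, while a positive answer only means the post may have been seen. The sketch below is not the BloomFilter type used by Home Mixer; the bit layout, the seeded hashing, and all names are assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy Bloom filter for illustration only.
struct ToyBloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl ToyBloomFilter {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        assert!(num_bits > 0, "a Bloom filter needs at least one bit");
        Self { bits: vec![false; num_bits], num_hashes }
    }

    /// Derives the bit position for `post_id` under the given hash seed.
    fn bit_index(&self, post_id: u64, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        post_id.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    /// Sets one bit per hash function.
    fn insert(&mut self, post_id: u64) {
        for seed in 0..self.num_hashes {
            let idx = self.bit_index(post_id, seed);
            self.bits[idx] = true;
        }
    }

    /// False means definitely not inserted; true means possibly inserted.
    fn may_contain(&self, post_id: u64) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.bit_index(post_id, seed)])
    }
}

/// A post counts as seen if any related ID is in the explicit list
/// or is possibly contained in any of the filters.
fn is_previously_seen(
    related_ids: &[u64],
    seen_ids: &[u64],
    filters: &[ToyBloomFilter],
) -> bool {
    related_ids.iter().any(|&id| {
        seen_ids.contains(&id) || filters.iter().any(|f| f.may_contain(id))
    })
}
```

The trade-off is that a false positive can hide a post the user never saw, in exchange for a much smaller representation of the seen history than an explicit ID list.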

PreviouslyServedPostsFilter

Filters out posts that have already been served in previous responses (used for pagination).

Conditional Activation: Only enabled for bottom requests (pagination).

home-mixer/filters/previously_served_posts_filter.rs

fn enable(&self, query: &ScoredPostsQuery) -> bool {
    query.is_bottom_request
}

Purpose: Enables infinite scroll without duplicates.

Logic: Checks query.served_ids and filters candidates whose IDs (or related post IDs) appear in that set.
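The filtering step itself is not shown for this filter, so here is a minimal sketch of the logic described. The Candidate fields, the related_post_ids helper (a stand-in for get_related_post_ids), and the function signature are assumptions for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical minimal candidate; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    tweet_id: u64,
    retweeted_tweet_id: Option<u64>,
    quoted_tweet_id: Option<u64>,
}

/// Hypothetical stand-in for get_related_post_ids: the post itself
/// plus any retweeted or quoted post.
fn related_post_ids(c: &Candidate) -> Vec<u64> {
    std::iter::once(c.tweet_id)
        .chain(c.retweeted_tweet_id)
        .chain(c.quoted_tweet_id)
        .collect()
}

/// Returns (kept, removed). Mirrors `enable`: non-bottom requests keep everything.
fn filter_previously_served(
    candidates: Vec<Candidate>,
    served_ids: &HashSet<u64>,
    is_bottom_request: bool,
) -> (Vec<Candidate>, Vec<Candidate>) {
    if !is_bottom_request {
        return (candidates, Vec::new());
    }
    // Keep a candidate only if none of its related IDs was already served.
    candidates.into_iter().partition(|c| {
        !related_post_ids(c).iter().any(|id| served_ids.contains(id))
    })
}
```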

RetweetDeduplicationFilter

Deduplicates retweets, keeping only the first occurrence of a tweet.

Purpose: Prevents seeing both an original tweet and retweets of it.

Algorithm:

Track seen tweet IDs in a HashSet
For original tweets: mark ID as seen and keep
For retweets: check if retweeted tweet ID is already seen
Keep first occurrence, remove subsequent ones

Code Example:

home-mixer/filters/retweet_deduplication_filter.rs

let mut seen_tweet_ids: HashSet<u64> = HashSet::new();

for candidate in candidates {
    match candidate.retweeted_tweet_id {
        Some(retweeted_id) => {
            if seen_tweet_ids.insert(retweeted_id) {
                kept.push(candidate);
            } else {
                removed.push(candidate);
            }
        }
        None => {
            seen_tweet_ids.insert(candidate.tweet_id as u64);
            kept.push(candidate);
        }
    }
}

SelfTweetFilter

Removes tweets where the author is the viewer.

Purpose: Prevents users from seeing their own tweets in the For You feed.

Implementation:

home-mixer/filters/self_tweet_filter.rs

let viewer_id = query.user_id as u64;
let (kept, removed): (Vec<_>, Vec<_>) = candidates
    .into_iter()
    .partition(|c| c.author_id != viewer_id);

VFFilter

Visibility Filtering: applies safety and policy rules to filter inappropriate content.

Purpose: Enforces platform safety policies and content guidelines.

Logic:

Checks the candidate.visibility_reason field
If the reason is FilteredReason::SafetyResult, removes the candidate only when its action is Action::Drop
Removes candidates with any other filtered reason
Keeps candidates with no filtered reason (None)

Implementation:

home-mixer/filters/vf_filter.rs

fn should_drop(reason: &Option<FilteredReason>) -> bool {
    match reason {
        Some(FilteredReason::SafetyResult(safety_result)) => {
            matches!(safety_result.action, Action::Drop(_))
        }
        Some(_) => true,
        None => false,
    }
}

let (removed, kept): (Vec<_>, Vec<_>) = candidates
    .into_iter()
    .partition(|c| should_drop(&c.visibility_reason));

Filter Pipeline Order

Filters are typically applied in a specific order to optimize performance:

1. CoreDataHydrationFilter - Remove invalid candidates early
2. VFFilter - Apply safety rules
3. SelfTweetFilter - Remove viewer’s own tweets
4. AuthorSocialgraphFilter - Respect blocks and mutes
5. DropDuplicatesFilter - Remove duplicate IDs
6. PreviouslySeenPostsFilter - Filter previously seen content
7. PreviouslyServedPostsFilter - Filter paginated content
8. AgeFilter - Remove stale content
9. RetweetDeduplicationFilter - Deduplicate retweets
10. DedupConversationFilter - Deduplicate conversations
11. MutedKeywordFilter - Apply keyword mutes
12. IneligibleSubscriptionFilter - Enforce subscription access
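Sequential application can be sketched as a fold over an ordered list of steps, where each step only sees candidates that survived the earlier ones. This is an illustrative model, not the Home Mixer pipeline runner; the Candidate struct, the Step alias, and the step and run_pipeline functions are hypothetical.

```rust
/// Hypothetical minimal candidate; only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    tweet_id: u64,
    author_id: u64,
    tweet_text: String,
}

/// A pipeline step: a filter name plus a predicate that returns true to keep.
type Step = (&'static str, Box<dyn Fn(&Candidate) -> bool>);

/// Convenience constructor that boxes a closure into a step.
fn step(name: &'static str, keep: impl Fn(&Candidate) -> bool + 'static) -> Step {
    (name, Box::new(keep))
}

/// Applies each step in order. Returns the survivors plus a record of
/// which filter removed each dropped candidate.
fn run_pipeline(
    candidates: Vec<Candidate>,
    steps: &[Step],
) -> (Vec<Candidate>, Vec<(&'static str, Candidate)>) {
    let mut kept = candidates;
    let mut removed = Vec::new();
    for (name, keep) in steps {
        let (still_kept, dropped): (Vec<_>, Vec<_>) =
            kept.into_iter().partition(|c| keep(c));
        removed.extend(dropped.into_iter().map(|c| (*name, c)));
        kept = still_kept;
    }
    (kept, removed)
}
```

Placing cheap, high-rejection checks first means later, more expensive steps (for example tokenizing text for keyword mutes) run over fewer candidates.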

Related Components

Sources - Retrieve candidates before filtering
Scorers - Score candidates after filtering
