Filters are the second stage of the Home Mixer pipeline, responsible for removing unsuitable candidates based on various criteria. Each filter implements the Filter trait and returns a FilterResult containing both kept and removed candidates.

Overview

Filters implement an asynchronous filter method that partitions candidates into kept and removed lists. Filters can optionally implement an enable method to conditionally activate based on query parameters.
pub struct FilterResult<T> {
    pub kept: Vec<T>,
    pub removed: Vec<T>,
}
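
The contract described above can be sketched in a few lines. This is a simplified, synchronous stand-in: the real filter method is asynchronous, and the trait shape, the EvenOnly filter, and the run helper (including its "keep everything when disabled" fallback) are assumptions made for illustration, not the Home Mixer definitions.

```rust
/// Result of one filter pass, mirroring the struct shown above.
pub struct FilterResult<T> {
    pub kept: Vec<T>,
    pub removed: Vec<T>,
}

/// Simplified, synchronous stand-in for the Filter trait (assumed shape).
pub trait Filter<Q, T> {
    /// Filters run by default; override to gate on query parameters.
    fn enable(&self, _query: &Q) -> bool {
        true
    }

    /// Partitions candidates into kept and removed lists.
    fn filter(&self, query: &Q, candidates: Vec<T>) -> FilterResult<T>;
}

/// Hypothetical filter: keeps even numbers, but only when the query flag is set.
pub struct EvenOnly;

impl Filter<bool, u32> for EvenOnly {
    fn enable(&self, query: &bool) -> bool {
        *query
    }

    fn filter(&self, _query: &bool, candidates: Vec<u32>) -> FilterResult<u32> {
        let (kept, removed): (Vec<u32>, Vec<u32>) =
            candidates.into_iter().partition(|n| n % 2 == 0);
        FilterResult { kept, removed }
    }
}

/// Runs a filter if it is enabled; otherwise every candidate is kept (assumed fallback).
pub fn run<Q, T, F: Filter<Q, T>>(f: &F, query: &Q, candidates: Vec<T>) -> FilterResult<T> {
    if f.enable(query) {
        f.filter(query, candidates)
    } else {
        FilterResult { kept: candidates, removed: Vec::new() }
    }
}
```

Returning the removed list alongside the kept list, rather than discarding it, lets a caller log or count what each filter dropped.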

Available Filters

AgeFilter

Removes tweets older than a specified duration, using the creation timestamp encoded in each tweet's Snowflake ID. Tweets whose age cannot be derived from the ID are also removed.
Parameters:
  • max_age (Duration, required): Maximum age threshold for tweets. Tweets older than this duration are filtered out.
Purpose: Ensures feed freshness by removing stale content.
Implementation:
home-mixer/filters/age_filter.rs
pub struct AgeFilter {
    pub max_age: Duration,
}

impl AgeFilter {
    fn is_within_age(&self, tweet_id: i64) -> bool {
        snowflake::duration_since_creation_opt(tweet_id)
            .map(|age| age <= self.max_age)
            .unwrap_or(false)
    }
}
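
For context, a Snowflake ID stores milliseconds since the Snowflake epoch in the bits above its low 22 bits, so the age can be computed without a lookup. The sketch below is a hypothetical stand-in for snowflake::duration_since_creation_opt, not the actual helper: it takes the current time as an explicit now_ms parameter (an assumption, to keep it deterministic) and returns None for non-positive IDs or IDs that decode to a future time.

```rust
use std::time::Duration;

/// Twitter Snowflake epoch (2010-11-04T01:42:54.657Z) in Unix milliseconds.
pub const SNOWFLAKE_EPOCH_MS: i64 = 1_288_834_974_657;

/// Creation time in Unix ms: the bits above the low 22 hold ms since the Snowflake epoch.
pub fn creation_time_ms(tweet_id: i64) -> Option<i64> {
    if tweet_id <= 0 {
        return None;
    }
    Some((tweet_id >> 22) + SNOWFLAKE_EPOCH_MS)
}

/// Hypothetical stand-in for the snowflake helper, with an explicit clock.
pub fn duration_since_creation_opt(tweet_id: i64, now_ms: i64) -> Option<Duration> {
    let created = creation_time_ms(tweet_id)?;
    let elapsed = now_ms.checked_sub(created)?;
    if elapsed < 0 {
        // The ID decodes to a time after `now_ms`; treat the age as unknown.
        return None;
    }
    Some(Duration::from_millis(elapsed as u64))
}

/// Same decision as AgeFilter::is_within_age: an unknown age counts as too old.
pub fn is_within_age(tweet_id: i64, now_ms: i64, max_age: Duration) -> bool {
    duration_since_creation_opt(tweet_id, now_ms)
        .map(|age| age <= max_age)
        .unwrap_or(false)
}
```

Because of unwrap_or(false), a candidate whose age is unknown is removed rather than kept.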

AuthorSocialgraphFilter

Removes candidates from authors that the viewer has blocked or muted.
Purpose: Respects user preferences and social graph settings.
Logic:
  • Checks query.user_features.blocked_user_ids
  • Checks query.user_features.muted_user_ids
  • Removes candidates where author matches either list
  • Short-circuits if both lists are empty for performance
Code Example:
home-mixer/filters/author_socialgraph_filter.rs
let viewer_blocked_user_ids = query.user_features.blocked_user_ids.clone();
let viewer_muted_user_ids = query.user_features.muted_user_ids.clone();

for candidate in candidates {
    let author_id = candidate.author_id as i64;
    let muted = viewer_muted_user_ids.contains(&author_id);
    let blocked = viewer_blocked_user_ids.contains(&author_id);
    if muted || blocked {
        removed.push(candidate);
    } else {
        kept.push(candidate);
    }
}
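
The excerpt above omits the short-circuit mentioned in the logic list. A minimal sketch of how the whole step could look follows. The Candidate struct, the function name, and the merged HashSet lookup are assumptions for illustration; the original checks the two lists separately.

```rust
use std::collections::HashSet;

/// Minimal hypothetical candidate; the real type has many more fields.
pub struct Candidate {
    pub tweet_id: i64,
    pub author_id: u64,
}

/// Returns (kept, removed) after applying the viewer's block and mute lists.
pub fn filter_by_socialgraph(
    candidates: Vec<Candidate>,
    blocked: &[i64],
    muted: &[i64],
) -> (Vec<Candidate>, Vec<Candidate>) {
    // Short-circuit: with no blocks or mutes, nothing can be removed.
    if blocked.is_empty() && muted.is_empty() {
        return (candidates, Vec::new());
    }

    // Merge both lists into one set so each candidate costs a single lookup.
    let excluded: HashSet<i64> = blocked.iter().chain(muted.iter()).copied().collect();

    candidates
        .into_iter()
        .partition(|c| !excluded.contains(&(c.author_id as i64)))
}
```

A set lookup avoids a linear scan of both lists per candidate, which may matter for viewers with long block or mute lists.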

CoreDataHydrationFilter

Filters out candidates that failed to hydrate essential data from backend services.
Purpose: Ensures all candidates have the minimum data required for display.
Criteria:
  • Author ID must be non-zero
  • Tweet text must be non-empty (after trimming)
Implementation:
home-mixer/filters/core_data_hydration_filter.rs
let (kept, removed) = candidates
    .into_iter()
    .partition(|c| c.author_id != 0 && !c.tweet_text.trim().is_empty());

DedupConversationFilter

Keeps only the highest-scored candidate per conversation.
Purpose: Prevents multiple tweets from the same conversation thread from appearing in the feed.
Algorithm:
  1. Groups candidates by conversation ID (the smallest ancestor ID, or the tweet's own ID when it has no ancestors)
  2. For each conversation, keeps only the highest-scoring tweet; a missing score counts as 0.0, and on a tie the earlier candidate wins
  3. Removes all other tweets from that conversation
Code Example:
home-mixer/filters/dedup_conversation_filter.rs
fn get_conversation_id(candidate: &PostCandidate) -> u64 {
    candidate
        .ancestors
        .iter()
        .copied()
        .min()
        .unwrap_or(candidate.tweet_id as u64)
}

for candidate in candidates {
    let conversation_id = get_conversation_id(&candidate);
    let score = candidate.score.unwrap_or(0.0);

    if let Some((kept_idx, best_score)) = best_per_convo.get_mut(&conversation_id) {
        if score > *best_score {
            let previous = std::mem::replace(&mut kept[*kept_idx], candidate);
            removed.push(previous);
            *best_score = score;
        } else {
            removed.push(candidate);
        }
    } else {
        let idx = kept.len();
        best_per_convo.insert(conversation_id, (idx, score));
        kept.push(candidate);
    }
}
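
To make the replacement step concrete, here is the same loop wrapped in a self-contained function. The three-field PostCandidate and the dedup_conversations name are assumptions for illustration; the loop body follows the excerpt above.

```rust
use std::collections::HashMap;

/// Minimal hypothetical candidate with only the fields this filter reads.
pub struct PostCandidate {
    pub tweet_id: i64,
    pub ancestors: Vec<u64>,
    pub score: Option<f64>,
}

/// Smallest ancestor ID, or the tweet's own ID when it has no ancestors.
fn get_conversation_id(candidate: &PostCandidate) -> u64 {
    candidate
        .ancestors
        .iter()
        .copied()
        .min()
        .unwrap_or(candidate.tweet_id as u64)
}

/// Returns (kept, removed), keeping one best-scored candidate per conversation.
pub fn dedup_conversations(
    candidates: Vec<PostCandidate>,
) -> (Vec<PostCandidate>, Vec<PostCandidate>) {
    let mut kept: Vec<PostCandidate> = Vec::new();
    let mut removed: Vec<PostCandidate> = Vec::new();
    // conversation ID -> (index into `kept`, best score seen so far)
    let mut best_per_convo: HashMap<u64, (usize, f64)> = HashMap::new();

    for candidate in candidates {
        let conversation_id = get_conversation_id(&candidate);
        let score = candidate.score.unwrap_or(0.0);

        if let Some((kept_idx, best_score)) = best_per_convo.get_mut(&conversation_id) {
            if score > *best_score {
                // Swap the better candidate into the slot held by the previous best.
                let previous = std::mem::replace(&mut kept[*kept_idx], candidate);
                removed.push(previous);
                *best_score = score;
            } else {
                removed.push(candidate);
            }
        } else {
            let idx = kept.len();
            best_per_convo.insert(conversation_id, (idx, score));
            kept.push(candidate);
        }
    }
    (kept, removed)
}
```

Swapping in place means a conversation keeps the position of its first candidate in the kept list, even when a later, higher-scored reply replaces it.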

DropDuplicatesFilter

Removes duplicate tweets based on tweet ID.
Purpose: Ensures each unique tweet appears at most once in the feed.
Implementation: Uses a HashSet to track seen tweet IDs.
home-mixer/filters/drop_duplicates_filter.rs
let mut seen_ids = HashSet::new();
let mut kept = Vec::new();
let mut removed = Vec::new();

for candidate in candidates {
    if seen_ids.insert(candidate.tweet_id) {
        kept.push(candidate);
    } else {
        removed.push(candidate);
    }
}

IneligibleSubscriptionFilter

Filters out subscription-only posts from authors the viewer hasn’t subscribed to.
Purpose: Enforces subscription access controls.
Logic:
  • If candidate.subscription_author_id is None, keep the candidate (not subscription-gated)
  • If Some(author_id), check if viewer is subscribed to that author
  • Remove if not subscribed
Code Example:
home-mixer/filters/ineligible_subscription_filter.rs
let subscribed_user_ids: HashSet<u64> = query
    .user_features
    .subscribed_user_ids
    .iter()
    .map(|id| *id as u64)
    .collect();

let (kept, removed): (Vec<_>, Vec<_>) =
    candidates
        .into_iter()
        .partition(|candidate| match candidate.subscription_author_id {
            Some(author_id) => subscribed_user_ids.contains(&author_id),
            None => true,
        });

MutedKeywordFilter

Filters out tweets containing keywords the user has muted.
Parameters:
  • tokenizer (Arc<TweetTokenizer>, required): Tokenizer for processing tweet text and muted keywords.
Purpose: Respects user content preferences and keyword mutes.
Algorithm:
  1. Tokenizes user’s muted keywords
  2. Creates a UserMutes matcher with token sequences
  3. Tokenizes each candidate’s tweet text
  4. Removes candidates matching any muted keyword pattern
Implementation:
home-mixer/filters/muted_keyword_filter.rs
// Tokenize each muted keyword once, up front.
let token_sequences: Vec<TokenSequence> = muted_keywords
    .iter()
    .map(|k| self.tokenizer.tokenize(k))
    .collect();
let user_mutes = UserMutes::new(token_sequences);
let matcher = MatchTweetGroup::new(user_mutes);

for candidate in candidates {
    let tweet_text_token_sequence = self.tokenizer.tokenize(&candidate.tweet_text);
    if matcher.matches(&tweet_text_token_sequence) {
        removed.push(candidate);
    } else {
        kept.push(candidate);
    }
}
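
The internals of TweetTokenizer, UserMutes, and MatchTweetGroup are not shown here, so the following is only a rough illustration of token-sequence matching. It assumes a tokenizer that lowercases and splits on non-alphanumeric characters, and a match rule where a muted keyword must appear as a contiguous run of whole tokens. Both are assumptions and the real matcher may differ.

```rust
/// Hypothetical tokenizer: lowercases and splits on non-alphanumeric characters.
pub fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// True if `needle` occurs as a contiguous run of tokens inside `haystack`.
fn contains_sequence(haystack: &[String], needle: &[String]) -> bool {
    // An empty needle never matches; this also avoids calling windows(0), which panics.
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// True if the tweet text matches at least one muted keyword or phrase.
pub fn matches_any_mute(tweet_text: &str, muted_keywords: &[&str]) -> bool {
    let tweet_tokens = tokenize(tweet_text);
    muted_keywords
        .iter()
        .map(|k| tokenize(k))
        .any(|seq| contains_sequence(&tweet_tokens, &seq))
}
```

Matching on tokens rather than raw substrings means that, in this sketch, muting "spoiler" does not hide a tweet that only contains "unspoilered".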

PreviouslySeenPostsFilter

Filters out posts the user has already seen, using Bloom filters and explicit seen ID lists.
Purpose: Prevents showing the same content multiple times.
Data Sources:
  • query.seen_ids: Explicit list of seen post IDs from the client
  • query.bloom_filter_entries: Probabilistic data structure for efficient seen-post checks
Related Posts: Also checks IDs of related posts (retweets, quoted tweets) to ensure conversation context isn’t repeated.
Implementation:
home-mixer/filters/previously_seen_posts_filter.rs
let bloom_filters = query
    .bloom_filter_entries
    .iter()
    .map(BloomFilter::from_entry)
    .collect::<Vec<_>>();

// The predicate is true for already-seen posts, so the first partition is `removed`.
let (removed, kept): (Vec<_>, Vec<_>) = candidates.into_iter().partition(|c| {
    get_related_post_ids(c).iter().any(|&post_id| {
        query.seen_ids.contains(&post_id)
            || bloom_filters
                .iter()
                .any(|filter| filter.may_contain(post_id))
    })
});
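
For readers unfamiliar with Bloom filters, the sketch below shows why may_contain is a "maybe": a false result is definitive, while a true result can be a false positive. This is a toy filter built on the standard library hasher. Its layout, the insert method, and the is_seen helper are assumptions for illustration and do not reflect the format of query.bloom_filter_entries.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy Bloom filter: a bit array probed by several seeded hashes.
pub struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    pub fn new(num_bits: usize, num_hashes: u64) -> Self {
        BloomFilter {
            bits: vec![false; num_bits.max(1)],
            num_hashes: num_hashes.max(1),
        }
    }

    /// Bit position for `post_id` under the hash function identified by `seed`.
    fn index(&self, post_id: i64, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        post_id.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    pub fn insert(&mut self, post_id: i64) {
        for seed in 0..self.num_hashes {
            let i = self.index(post_id, seed);
            self.bits[i] = true;
        }
    }

    /// False means "definitely not inserted"; true means "possibly inserted".
    pub fn may_contain(&self, post_id: i64) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.index(post_id, seed)])
    }
}

/// Same check as the partition predicate above, for a single post ID.
pub fn is_seen(post_id: i64, seen_ids: &[i64], filters: &[BloomFilter]) -> bool {
    seen_ids.contains(&post_id) || filters.iter().any(|f| f.may_contain(post_id))
}
```

A false positive here only means an unseen post is occasionally dropped, which is usually an acceptable trade for not shipping the full seen history with each request.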

PreviouslyServedPostsFilter

Filters out posts that have already been served in previous responses (used for pagination).
Conditional Activation: Only enabled for bottom requests (pagination).
home-mixer/filters/previously_served_posts_filter.rs
fn enable(&self, query: &ScoredPostsQuery) -> bool {
    query.is_bottom_request
}
Purpose: Enables infinite scroll without duplicates.
Logic: Checks query.served_ids and filters candidates whose IDs (or related post IDs) appear in that set.
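
A minimal sketch of that logic, combined with the enable gate, might look like the following. The Query and Candidate structs, the related_post_ids helper, and the choice of retweeted and quoted IDs as the "related" set are assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical query with only the fields this filter reads.
pub struct Query {
    pub is_bottom_request: bool,
    pub served_ids: Vec<i64>,
}

/// Hypothetical candidate; the related-ID fields are assumed names.
pub struct Candidate {
    pub tweet_id: i64,
    pub retweeted_tweet_id: Option<i64>,
    pub quoted_tweet_id: Option<i64>,
}

/// The candidate's own ID plus any retweeted or quoted post ID.
fn related_post_ids(c: &Candidate) -> Vec<i64> {
    let mut ids = vec![c.tweet_id];
    ids.extend(c.retweeted_tweet_id);
    ids.extend(c.quoted_tweet_id);
    ids
}

/// Returns (kept, removed); a no-op unless the request is a pagination request.
pub fn filter_served(query: &Query, candidates: Vec<Candidate>) -> (Vec<Candidate>, Vec<Candidate>) {
    if !query.is_bottom_request {
        return (candidates, Vec::new());
    }
    let served: HashSet<i64> = query.served_ids.iter().copied().collect();
    candidates
        .into_iter()
        .partition(|c| !related_post_ids(c).iter().any(|id| served.contains(id)))
}
```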

RetweetDeduplicationFilter

Deduplicates retweets, keeping only the first occurrence of a tweet.
Purpose: Prevents seeing both an original tweet and retweets of it.
Algorithm:
  1. Track seen tweet IDs in a HashSet
  2. For original tweets: mark ID as seen and keep
  3. For retweets: check if retweeted tweet ID is already seen
  4. Keep first occurrence, remove subsequent ones
Code Example:
home-mixer/filters/retweet_deduplication_filter.rs
let mut seen_tweet_ids: HashSet<u64> = HashSet::new();

for candidate in candidates {
    match candidate.retweeted_tweet_id {
        Some(retweeted_id) => {
            if seen_tweet_ids.insert(retweeted_id) {
                kept.push(candidate);
            } else {
                removed.push(candidate);
            }
        }
        None => {
            seen_tweet_ids.insert(candidate.tweet_id as u64);
            kept.push(candidate);
        }
    }
}
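
Wrapping the loop above in a function makes its order dependence easier to see. The two-field Candidate and the dedup_retweets name are assumptions for illustration; the match arms follow the excerpt.

```rust
use std::collections::HashSet;

/// Minimal hypothetical candidate with only the fields this filter reads.
pub struct Candidate {
    pub tweet_id: i64,
    pub retweeted_tweet_id: Option<u64>,
}

/// Returns (kept, removed) using the same rules as the excerpt above.
pub fn dedup_retweets(candidates: Vec<Candidate>) -> (Vec<Candidate>, Vec<Candidate>) {
    let mut seen_tweet_ids: HashSet<u64> = HashSet::new();
    let mut kept = Vec::new();
    let mut removed = Vec::new();

    for candidate in candidates {
        match candidate.retweeted_tweet_id {
            // Retweet: keep it only if the underlying tweet has not been seen yet.
            Some(retweeted_id) => {
                if seen_tweet_ids.insert(retweeted_id) {
                    kept.push(candidate);
                } else {
                    removed.push(candidate);
                }
            }
            // Original: always kept, and recorded so later retweets of it are removed.
            None => {
                seen_tweet_ids.insert(candidate.tweet_id as u64);
                kept.push(candidate);
            }
        }
    }
    (kept, removed)
}
```

In this excerpted form, originals are never removed. If a retweet of tweet 100 arrives before tweet 100 itself, both are kept, so the outcome depends on candidate order.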

SelfTweetFilter

Removes tweets where the author is the viewer.
Purpose: Prevents users from seeing their own tweets in the For You feed.
Implementation:
home-mixer/filters/self_tweet_filter.rs
let viewer_id = query.user_id as u64;
let (kept, removed): (Vec<_>, Vec<_>) = candidates
    .into_iter()
    .partition(|c| c.author_id != viewer_id);

VFFilter

Visibility Filtering: applies safety and policy rules to filter inappropriate content.
Purpose: Enforces platform safety policies and content guidelines.
Logic:
  • Checks the candidate.visibility_reason field
  • If the reason is FilteredReason::SafetyResult, removes the candidate only when its action is Action::Drop
  • Removes candidates with any other filtered reason
  • Keeps candidates with no visibility reason (None)
Implementation:
home-mixer/filters/vf_filter.rs
fn should_drop(reason: &Option<FilteredReason>) -> bool {
    match reason {
        Some(FilteredReason::SafetyResult(safety_result)) => {
            matches!(safety_result.action, Action::Drop(_))
        }
        Some(_) => true,
        None => false,
    }
}

// `should_drop` is true for candidates to discard, so the first partition is `removed`.
let (removed, kept): (Vec<_>, Vec<_>) = candidates
    .into_iter()
    .partition(|c| should_drop(&c.visibility_reason));
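
The decision table in should_drop can be exercised in isolation. The type definitions below are hypothetical stand-ins, since the real FilteredReason, SafetyResult, and Action types have other variants and payloads. Only the should_drop function mirrors the excerpt.

```rust
/// Hypothetical stand-ins for the visibility types; real definitions differ.
pub enum Action {
    Allow,
    Drop(String),
}

pub struct SafetyResult {
    pub action: Action,
}

pub enum FilteredReason {
    SafetyResult(SafetyResult),
    Other(String),
}

/// Mirrors the excerpt: drop on a SafetyResult with Action::Drop or on any other reason.
pub fn should_drop(reason: &Option<FilteredReason>) -> bool {
    match reason {
        Some(FilteredReason::SafetyResult(safety_result)) => {
            matches!(safety_result.action, Action::Drop(_))
        }
        Some(_) => true,
        None => false,
    }
}
```

Under these assumed types, a SafetyResult carrying a non-drop action is the only filtered reason that lets a candidate through.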

Filter Pipeline Order

Filters are typically applied in a specific order to optimize performance:
  1. CoreDataHydrationFilter - Remove invalid candidates early
  2. VFFilter - Apply safety rules
  3. SelfTweetFilter - Remove viewer’s own tweets
  4. AuthorSocialgraphFilter - Respect blocks and mutes
  5. DropDuplicatesFilter - Remove duplicate IDs
  6. PreviouslySeenPostsFilter - Filter previously seen content
  7. PreviouslyServedPostsFilter - Filter paginated content
  8. AgeFilter - Remove stale content
  9. RetweetDeduplicationFilter - Deduplicate retweets
  10. DedupConversationFilter - Deduplicate conversations
  11. MutedKeywordFilter - Apply keyword mutes
  12. IneligibleSubscriptionFilter - Enforce subscription access
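
One way to picture sequential application is a list of named predicates run in order, each seeing only the survivors of the previous one. The Step alias and the step and run_pipeline helpers below are hypothetical and do not reflect how Home Mixer wires its filters.

```rust
/// A named filter step whose predicate returns true when the candidate should be kept.
pub type Step<T> = (&'static str, Box<dyn Fn(&T) -> bool>);

/// Convenience constructor for a step (hypothetical helper).
pub fn step<T, F>(name: &'static str, keep: F) -> Step<T>
where
    F: Fn(&T) -> bool + 'static,
{
    let boxed: Box<dyn Fn(&T) -> bool> = Box::new(keep);
    (name, boxed)
}

/// Applies steps in order; returns survivors plus each removed item tagged with the step name.
pub fn run_pipeline<T>(steps: &[Step<T>], candidates: Vec<T>) -> (Vec<T>, Vec<(&'static str, T)>) {
    let mut kept = candidates;
    let mut removed = Vec::new();
    for (name, keep) in steps {
        let (pass, fail): (Vec<T>, Vec<T>) = kept.into_iter().partition(|c| keep(c));
        kept = pass;
        removed.extend(fail.into_iter().map(|c| (*name, c)));
    }
    (kept, removed)
}
```

Because later steps never see candidates removed earlier, putting cheap, high-rejection checks first reduces the work done by more expensive ones.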

Related

  • Sources - Retrieve candidates before filtering
  • Scorers - Score candidates after filtering
