Overview

The Filter trait defines how candidates are evaluated and partitioned into kept and removed sets. Filters run sequentially, and each one narrows the candidate pool by its own criterion, such as age, quality, or whether the user has already seen the item.

Trait Definition

pub trait Filter<Q, C>: Any + Send + Sync
where
    Q: Clone + Send + Sync + 'static,
    C: Clone + Send + Sync + 'static,
{
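    // Has a default implementation that returns true; override to skip the filter for some queries.
    fn enable(&self, query: &Q) -> bool;
    // Required: partitions the candidates into kept and removed sets.
    async fn filter(&self, query: &Q, candidates: Vec<C>) -> Result<FilterResult<C>, String>;
    // Has a default implementation (a short type name); override for a custom identifier.
    fn name(&self) -> &'static str;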
}

Type Parameters

Q
generic
The query type that contains request context and parameters. Constraints: Clone + Send + Sync + 'static
C
generic
The candidate type to filter. Constraints: Clone + Send + Sync + 'static

FilterResult

pub struct FilterResult<C> {
    pub kept: Vec<C>,
    pub removed: Vec<C>,
}
kept
Vec<C>
Candidates that passed the filter criteria and continue to the next stage
removed
Vec<C>
Candidates that were filtered out and are excluded from further processing

Methods

enable

enable
fn
Determines whether this filter should run for the given query
fn enable(&self, query: &Q) -> bool
query
&Q
Reference to the query object
return
bool
Returns true if this filter should run, false to skip. Default implementation returns true.
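
As a rough illustration of conditional execution, the sketch below gates a filter on a query flag. It uses a simplified synchronous stand-in for the trait (SyncFilter) so that it compiles with the standard library alone. The Query type with its include_replies field, the Post type, and ReplyFilter are hypothetical names, not part of the library.

```rust
/// Hypothetical query type; a real query carries request context.
#[derive(Clone)]
pub struct Query {
    pub include_replies: bool,
}

/// Hypothetical candidate type.
#[derive(Clone, Debug, PartialEq)]
pub struct Post {
    pub id: u64,
    pub is_reply: bool,
}

pub struct FilterResult<C> {
    pub kept: Vec<C>,
    pub removed: Vec<C>,
}

/// Simplified synchronous stand-in for the async `Filter` trait (an assumption).
pub trait SyncFilter<Q, C> {
    /// Runs for every query unless overridden.
    fn enable(&self, _query: &Q) -> bool {
        true
    }
    fn filter(&self, query: &Q, candidates: Vec<C>) -> Result<FilterResult<C>, String>;
}

pub struct ReplyFilter;

impl SyncFilter<Query, Post> for ReplyFilter {
    /// Skip this filter entirely when the caller asked for replies.
    fn enable(&self, query: &Query) -> bool {
        !query.include_replies
    }

    fn filter(&self, _query: &Query, candidates: Vec<Post>) -> Result<FilterResult<Post>, String> {
        // Replies go to `removed`; everything else is kept.
        let (kept, removed): (Vec<_>, Vec<_>) =
            candidates.into_iter().partition(|p| !p.is_reply);
        Ok(FilterResult { kept, removed })
    }
}
```

A caller would check enable first and pass the candidates through untouched when it returns false.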

filter

filter
async fn
Evaluates candidates and partitions them into kept and removed sets
async fn filter(&self, query: &Q, candidates: Vec<C>) -> Result<FilterResult<C>, String>
query
&Q
Reference to the query object containing filtering parameters
candidates
Vec<C>
Vector of candidates to evaluate (takes ownership)
return
Result<FilterResult<C>, String>
Returns a FilterResult with kept and removed candidates, or an error message on failure

name

name
fn
Returns a stable name for logging and metrics
fn name(&self) -> &'static str
return
&'static str
A short type name derived from the implementing struct
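
The description above suggests the default name comes from the implementing type. One way to obtain such a name, offered here as an assumption rather than the library's actual code, is to trim the module path from std::any::type_name. The helper short_type_name is hypothetical, and the standard library documents the exact output of type_name as best-effort, so treat the result as a diagnostic label.

```rust
use std::any::type_name;

/// Hypothetical helper: keep only the last path segment of a type name,
/// so "my_crate::filters::AgeFilter" becomes "AgeFilter".
/// This simple split does not handle generic types such as `Foo<a::B>`.
pub fn short_type_name(full: &'static str) -> &'static str {
    full.rsplit("::").next().unwrap_or(full)
}

pub struct AgeFilter;

impl AgeFilter {
    /// Mirrors the shape of `Filter::name`: a `&'static str` for logs and metrics.
    pub fn name(&self) -> &'static str {
        short_type_name(type_name::<Self>())
    }
}
```

Overriding name() with a string literal avoids any dependence on how the compiler renders type paths.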

Example Implementation

Here’s a real example that filters tweets based on their age:
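use xai_candidate_pipeline::filter::{Filter, FilterResult};
use tonic::async_trait;
use std::time::Duration;
// ScoredPostsQuery, PostCandidate, and the snowflake helper are assumed to be
// in scope; their imports are not shown here.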

pub struct AgeFilter {
    pub max_age: Duration,
}

impl AgeFilter {
    pub fn new(max_age: Duration) -> Self {
        Self { max_age }
    }

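    // Returns true when the tweet is no older than max_age.
    // If no age can be derived from the ID, the candidate is treated as
    // out of range and ends up in the removed set.
    fn is_within_age(&self, tweet_id: i64) -> bool {
        snowflake::duration_since_creation_opt(tweet_id)
            .map(|age| age <= self.max_age)
            .unwrap_or(false)
    }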
}

#[async_trait]
impl Filter<ScoredPostsQuery, PostCandidate> for AgeFilter {
    async fn filter(
        &self,
        _query: &ScoredPostsQuery,
        candidates: Vec<PostCandidate>,
    ) -> Result<FilterResult<PostCandidate>, String> {
        // Partition candidates by age
        let (kept, removed): (Vec<_>, Vec<_>) = candidates
            .into_iter()
            .partition(|c| self.is_within_age(c.tweet_id));

        Ok(FilterResult { kept, removed })
    }
}

Usage Notes

  • Filters run sequentially in the order they are configured
  • Each filter receives the kept candidates from the previous filter
  • The removed candidates can be tracked for metrics and debugging
  • Filters take ownership of the candidate vector, so each candidate can be moved into kept or removed without cloning
  • Use partition for simple boolean criteria, or build custom logic for complex rules
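
The chaining described in these notes can be sketched as a small runner. This is a minimal illustration under stated assumptions, not the pipeline's actual executor: SyncFilter is a synchronous stand-in for the async trait, run_filters and MaxValue are hypothetical names, and aborting on the first error is one possible policy among several.

```rust
pub struct FilterResult<C> {
    pub kept: Vec<C>,
    pub removed: Vec<C>,
}

/// Simplified synchronous stand-in for the async `Filter` trait (an assumption).
pub trait SyncFilter<Q, C> {
    fn enable(&self, _query: &Q) -> bool {
        true
    }
    fn filter(&self, query: &Q, candidates: Vec<C>) -> Result<FilterResult<C>, String>;
    fn name(&self) -> &'static str;
}

/// Hypothetical runner: applies filters in order and collects everything removed.
pub fn run_filters<Q, C>(
    filters: &[Box<dyn SyncFilter<Q, C>>],
    query: &Q,
    candidates: Vec<C>,
) -> Result<FilterResult<C>, String> {
    let mut kept = candidates;
    let mut removed = Vec::new();
    for f in filters {
        if !f.enable(query) {
            continue; // a skipped filter leaves the pool untouched
        }
        // Assumption: a failing filter aborts the run; the error names the filter.
        let result = f
            .filter(query, kept)
            .map_err(|e| format!("{}: {}", f.name(), e))?;
        // The next filter only sees what this one kept.
        kept = result.kept;
        removed.extend(result.removed);
    }
    Ok(FilterResult { kept, removed })
}

/// Hypothetical filter over plain integers, used only to exercise the runner.
pub struct MaxValue(pub i64);

impl SyncFilter<(), i64> for MaxValue {
    fn filter(&self, _query: &(), candidates: Vec<i64>) -> Result<FilterResult<i64>, String> {
        let (kept, removed): (Vec<_>, Vec<_>) =
            candidates.into_iter().partition(|c| *c <= self.0);
        Ok(FilterResult { kept, removed })
    }

    fn name(&self) -> &'static str {
        "MaxValue"
    }
}
```

Running MaxValue(10) and then MaxValue(5) over [1, 7, 12, 3] keeps [1, 3]; the removed list accumulates in filter order, which is what makes per-filter metrics possible.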

Common Filter Types

Quality Filters

  • Remove low-quality or spam content
  • Filter by content safety scores
  • Remove candidates missing required fields

Business Logic Filters

  • Apply user preferences and settings
  • Enforce diversity constraints
  • Remove already-seen content
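
Removing already-seen content is a case where the decision depends on state, so an explicit loop can be clearer than a pure predicate. The sketch below assumes a hypothetical Query that carries the user's seen IDs and a hypothetical Post candidate; it is written as a plain function to stay self-contained.

```rust
use std::collections::HashSet;

pub struct FilterResult<C> {
    pub kept: Vec<C>,
    pub removed: Vec<C>,
}

/// Hypothetical query carrying the IDs the user has already been shown.
pub struct Query {
    pub seen_ids: HashSet<u64>,
}

/// Hypothetical candidate type.
#[derive(Clone, Debug, PartialEq)]
pub struct Post {
    pub id: u64,
}

/// Removes posts the user has seen, plus duplicates within the batch.
pub fn filter_seen(query: &Query, candidates: Vec<Post>) -> FilterResult<Post> {
    // Start from the user's history; IDs kept in this batch are added as we go.
    let mut seen = query.seen_ids.clone();
    let mut kept = Vec::new();
    let mut removed = Vec::new();
    for post in candidates {
        // `insert` returns false when the ID was already present.
        if seen.insert(post.id) {
            kept.push(post);
        } else {
            removed.push(post);
        }
    }
    FilterResult { kept, removed }
}
```

The first occurrence of each unseen ID is kept, so the relative order of surviving candidates is preserved.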

Performance Filters

  • Limit candidate pool size before expensive operations
  • Remove candidates that would fail downstream validation
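
A pool-size limit might look like the following sketch. It assumes the candidates already arrive in priority order, so trimming the tail is acceptable; limit_pool is a hypothetical helper name.

```rust
pub struct FilterResult<C> {
    pub kept: Vec<C>,
    pub removed: Vec<C>,
}

/// Keeps at most `limit` candidates, preserving their existing order.
pub fn limit_pool<C>(mut candidates: Vec<C>, limit: usize) -> FilterResult<C> {
    // `split_off` panics if the index exceeds the length, so clamp it first.
    let cut = limit.min(candidates.len());
    // Everything from `cut` onward moves into `removed`; no elements are cloned.
    let removed = candidates.split_off(cut);
    FilterResult { kept: candidates, removed }
}
```

Because the function owns the vector, the split moves elements instead of copying them.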

Best Practices

  1. Sequential Ordering: Place cheap filters before expensive ones
  2. Descriptive Names: Override name() to provide meaningful filter identifiers
  3. Track Removed: Log removed candidate counts for monitoring
  4. Error Handling: Return descriptive errors for debugging
  5. Conditional Execution: Use enable() to skip filters based on query parameters
  6. Efficient Partitioning: Use Iterator::partition for simple boolean criteria

Performance Considerations

  • Filters run sequentially, so order matters for performance
  • Place filters that remove many candidates early in the chain
  • Avoid expensive async operations inside filters where possible; defer that work to scorers
  • Consider batch operations when accessing external services
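
To illustrate the batching advice, the sketch below issues one lookup for the whole candidate set instead of one call per candidate. Everything service-related here is hypothetical: the BlockListClient trait, its blocked_among method, and the in-memory StaticBlockList stand-in. The code is synchronous for brevity; a real filter would await the call inside the async filter method.

```rust
use std::collections::HashSet;

pub struct FilterResult<C> {
    pub kept: Vec<C>,
    pub removed: Vec<C>,
}

/// Hypothetical candidate type.
#[derive(Clone, Debug, PartialEq)]
pub struct Post {
    pub id: u64,
    pub author_id: u64,
}

/// Hypothetical client for an external service.
pub trait BlockListClient {
    /// One round trip for the whole batch: returns the subset of IDs that are blocked.
    fn blocked_among(&self, author_ids: &[u64]) -> Result<HashSet<u64>, String>;
}

pub fn filter_blocked<S: BlockListClient>(
    client: &S,
    candidates: Vec<Post>,
) -> Result<FilterResult<Post>, String> {
    // Collect distinct author IDs so the service sees each one once.
    let mut author_ids: Vec<u64> = candidates.iter().map(|p| p.author_id).collect();
    author_ids.sort_unstable();
    author_ids.dedup();

    // A descriptive error makes failures easier to trace in logs.
    let blocked = client
        .blocked_among(&author_ids)
        .map_err(|e| format!("block list lookup failed: {}", e))?;

    let (kept, removed): (Vec<_>, Vec<_>) = candidates
        .into_iter()
        .partition(|p| !blocked.contains(&p.author_id));
    Ok(FilterResult { kept, removed })
}

/// In-memory stand-in used to exercise the sketch.
pub struct StaticBlockList(pub HashSet<u64>);

impl BlockListClient for StaticBlockList {
    fn blocked_among(&self, author_ids: &[u64]) -> Result<HashSet<u64>, String> {
        Ok(author_ids
            .iter()
            .copied()
            .filter(|id| self.0.contains(id))
            .collect())
    }
}
```

With this shape the number of service calls stays at one per filter run, regardless of how many candidates are in the pool.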
