Filtering Strategies

RTK achieves 60-90% token savings through four core strategies: filtering, grouping, truncation, and deduplication. Each command uses one or more strategies tailored to its output format.

The Four Strategies

1. Filtering

Remove noise (comments, whitespace, boilerplate) while preserving structure

2. Grouping

Aggregate similar items (files by directory, errors by type)

3. Truncation

Keep relevant context, cut redundancy (first/last lines, signatures only)

4. Deduplication

Collapse repeated patterns with counts (“[ERROR] … (×5)”)
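Truncation is the one strategy without a worked example later in this section. A minimal sketch of the "first/last lines" idea follows; the function name `truncate_middle` and the wording of the omission marker are assumptions made for illustration, not RTK's actual code.

```rust
/// Keeps the first `head` and last `tail` lines and replaces the middle
/// with a single marker line that records how many lines were dropped.
fn truncate_middle(text: &str, head: usize, tail: usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    // Nothing to cut when the text already fits in head + tail lines.
    if lines.len() <= head + tail {
        return lines.join("\n");
    }
    let omitted = lines.len() - head - tail;
    let mut kept: Vec<String> = lines[..head].iter().map(|l| l.to_string()).collect();
    kept.push(format!("... ({} lines omitted)", omitted));
    kept.extend(lines[lines.len() - tail..].iter().map(|l| l.to_string()));
    kept.join("\n")
}
```

Recording the number of omitted lines keeps the reader aware that the output is partial.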

Strategy Matrix

Different commands use different strategies:

Strategy	Used By	Technique	Reduction
Stats Extraction	git status, git log, pnpm	Count/aggregate, drop details	90-99%
Error Only	runner (err mode)	stderr only, drop stdout	60-80%
Grouping by Pattern	lint, tsc, grep	Group by rule/file/error code	80-90%
Deduplication	log_cmd	Unique + count	70-85%
Structure Only	json_cmd	Keys + types, strip values	80-95%
Code Filtering	read, smart	Filter by level (none/minimal/aggressive)	0-90%
Failure Focus	vitest, playwright, runner	Failures only, hide passing	94-99%
Tree Compression	ls	Hierarchy with counts	50-70%
Progress Filtering	wget, pnpm install	Strip ANSI, final result only	85-95%
JSON/Text Dual	ruff, pip	JSON when available, text fallback	80%+
State Machine	pytest	Track test state, extract failures	90%+
NDJSON Streaming	go test	Line-by-line JSON parse	90%+

Language-Aware Filtering

RTK’s filter.rs module provides language-aware code filtering with three levels:

Filter Levels

None (0% reduction)

Keep everything—raw file content.

// Example: filter.rs with FilterLevel::None
fn calculate_total(items: &[Item]) -> i32 {
    // Sum all items
    items.iter().map(|i| i.value).sum()
}

Minimal (20-40% reduction)

Strip comments and normalize whitespace. Keep structure and code.

// Example: filter.rs with FilterLevel::Minimal
fn calculate_total(items: &[Item]) -> i32 {
    items.iter().map(|i| i.value).sum()
}

Aggressive (60-90% reduction)

Strip comments and function bodies. Keep only signatures.

// Example: filter.rs with FilterLevel::Aggressive
fn calculate_total(items: &[Item]) -> i32 { ... }
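As a rough illustration of the minimal level, the sketch below drops blank lines and whole-line `//` comments. It is deliberately naive and is not taken from filter.rs: the name `filter_minimal` is hypothetical, it also removes `///` doc comments, and it ignores block comments and trailing comments.

```rust
/// Removes blank lines and lines that consist only of a `//` comment.
/// Trailing comments and `/* */` blocks are left untouched in this sketch.
fn filter_minimal(source: &str) -> String {
    source
        .lines()
        .filter(|line| {
            let trimmed = line.trim();
            !trimmed.is_empty() && !trimmed.starts_with("//")
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

A real filter would need to tokenize the source so that `//` inside a string literal is not mistaken for a comment.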

Language Support

RTK detects languages by file extension:

Language	Extensions	Comment Syntax
Rust	`.rs`	`//`, `/* */`, `///`
Python	`.py`, `.pyw`	`#`, `"""`
JavaScript	`.js`, `.mjs`, `.cjs`	`//`, `/* */`
TypeScript	`.ts`, `.tsx`	`//`, `/* */`
Go	`.go`	`//`, `/* */`
C/C++	`.c`, `.cpp`, `.h`, `.hpp`	`//`, `/* */`
Java	`.java`	`//`, `/* */`
Ruby	`.rb`	`#`, `=begin`/`=end`
Shell	`.sh`, `.bash`, `.zsh`	`#`
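Extension-based detection can be sketched as a simple lookup over the table above. The `Language` enum and `detect_language` function are hypothetical names chosen for this example; only the extension list comes from the table.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum Language {
    Rust,
    Python,
    JavaScript,
    TypeScript,
    Go,
    CFamily,
    Java,
    Ruby,
    Shell,
    Unknown,
}

/// Maps a file path to a language using only its extension.
fn detect_language(path: &str) -> Language {
    let ext = Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("");
    match ext {
        "rs" => Language::Rust,
        "py" | "pyw" => Language::Python,
        "js" | "mjs" | "cjs" => Language::JavaScript,
        "ts" | "tsx" => Language::TypeScript,
        "go" => Language::Go,
        "c" | "cpp" | "h" | "hpp" => Language::CFamily,
        "java" => Language::Java,
        "rb" => Language::Ruby,
        "sh" | "bash" | "zsh" => Language::Shell,
        // Files without a recognised extension fall through here.
        _ => Language::Unknown,
    }
}
```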

Usage Examples

# None: Full file content
rtk read src/main.rs -l none

# Minimal: Strip comments (default)
rtk read src/main.rs -l minimal

# Aggressive: Signatures only
rtk read src/main.rs -l aggressive

When to use aggressive? When LLMs need to understand code structure but not implementation details. Perfect for “what functions exist?” queries.

Command-Specific Strategies

Git Operations

git status (Stats Extraction)

Raw output (50 lines, ~800 tokens):

On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   src/filter.rs
	modified:   src/git.rs
	modified:   src/tracking.rs

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	src/discover/

RTK output (1 line, ~20 tokens):

3 modified, 1 untracked ✓

Strategy:

Count modified files: 3
Count untracked files: 1
Aggregate: “3 modified, 1 untracked”
Token savings: 97%
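The counting step can be sketched as follows. This example assumes the input is the output of `git status --porcelain` (two status characters, a space, then the path) rather than the long format shown above; whether RTK parses the porcelain or the long format is not stated here, and `summarize_status` is a hypothetical name.

```rust
/// Summarises `git status --porcelain` output as counts.
fn summarize_status(porcelain: &str) -> String {
    let (mut modified, mut untracked) = (0usize, 0usize);
    for line in porcelain.lines() {
        // Each porcelain v1 line starts with a two-character status code.
        let code = line.get(..2).unwrap_or("");
        if code == "??" {
            untracked += 1;
        } else if code.contains('M') {
            // 'M' in either column means modified (staged or unstaged).
            modified += 1;
        }
    }
    format!("{} modified, {} untracked", modified, untracked)
}
```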

git diff (Stats + Compact)

Raw output (200+ lines, ~3000 tokens):

diff --git a/src/filter.rs b/src/filter.rs
index abc1234..def5678 100644
--- a/src/filter.rs
+++ b/src/filter.rs
@@ -150,7 +150,12 @@ impl FilterStrategy for MinimalFilter {
-    let mut result = String::new();
+    let mut result = String::with_capacity(content.len());
     for line in content.lines() {
...

RTK output (~30 lines, ~500 tokens):

+142/-89, 3 files changed

src/filter.rs:
  +12/-7 (capacity optimization)
src/git.rs:
  +89/-45 (exit code fix)
src/tracking.rs:
  +41/-37 (refactor)

Strategy:

Extract stats: +142/-89
Group by file
Show summary per file
Token savings: 83%
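One way to obtain the per-file counts is `git diff --numstat`, which prints one tab-separated `added deleted path` line per file. The sketch below assumes that input; `summarize_numstat` is a hypothetical name, and the parenthesised descriptions in the example output above (such as "capacity optimization") cannot be derived from numstat data, so they are left out.

```rust
/// Builds a totals line plus one entry per file from `git diff --numstat`.
fn summarize_numstat(numstat: &str) -> String {
    let mut per_file = Vec::new();
    let (mut added, mut deleted) = (0u64, 0u64);
    for line in numstat.lines() {
        let mut fields = line.splitn(3, '\t');
        if let (Some(a), Some(d), Some(path)) = (fields.next(), fields.next(), fields.next()) {
            // Binary files report "-" for both counts; count them as zero.
            let a: u64 = a.parse().unwrap_or(0);
            let d: u64 = d.parse().unwrap_or(0);
            added += a;
            deleted += d;
            per_file.push(format!("{}:\n  +{}/-{}", path, a, d));
        }
    }
    format!(
        "+{}/-{}, {} files changed\n\n{}",
        added,
        deleted,
        per_file.len(),
        per_file.join("\n")
    )
}
```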

git log (One-line summaries)

Raw output (50 lines, ~1000 tokens):

commit abc1234def5678
Author: User <user@example.com>
Date:   Thu Jan 23 10:00:00 2025 -0800

    Add token tracking feature
    
    Implements SQLite-based tracking for token savings.
    Adds gain command for analytics.

commit 789ghijklm012
...

RTK output (5 lines, ~100 tokens):

5 commits, +142/-89

abc1234 Add token tracking feature
789ghij Fix git argument parsing
345mnop Add pnpm support

Strategy:

Count commits: 5
Extract total stats: +142/-89
Show first line of each commit message
Token savings: 90%

Testing

vitest/playwright (Failure Focus)

Raw output (200+ lines, ~4000 tokens):

✓ test/auth.test.ts (5)
  ✓ should login with valid credentials
  ✓ should reject invalid password
  ✓ should lock account after 3 failures
  ✓ should reset password
  ✓ should expire sessions

✓ test/api.test.ts (12)
  ✓ GET /users returns list
  ✓ POST /users creates user
  ...

✗ test/db.test.ts (2)
  ✗ should handle transaction rollback
    AssertionError: expected 0 to equal 1
  ✗ should validate constraints
    DatabaseError: NOT NULL constraint failed

RTK output (~10 lines, ~200 tokens):

FAILED: 2/19 tests

✗ test/db.test.ts:
  - should handle transaction rollback
    AssertionError: expected 0 to equal 1
  - should validate constraints
    DatabaseError: NOT NULL constraint failed

Strategy:

Hide passing tests (17 passed → omitted)
Show only failures (2 failed)
Include error messages for debugging
Token savings: 95%
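The failure-focus steps above can be sketched for a report that marks results with `✓` and `✗`, as in the raw output shown. This is an illustration only: `failures_only` is a hypothetical name, and the sketch treats indented marker lines as individual tests and unindented ones as files.

```rust
/// Keeps failing files, failing tests, and their error lines; drops the rest.
fn failures_only(report: &str) -> String {
    let mut kept: Vec<&str> = Vec::new();
    let (mut passed, mut failed) = (0usize, 0usize);
    let mut in_failure = false;

    for line in report.lines() {
        let trimmed = line.trim_start();
        let indented = trimmed.len() < line.len();
        if trimmed.starts_with('✗') {
            if indented {
                failed += 1;
            }
            in_failure = true;
            kept.push(line);
        } else if trimmed.starts_with('✓') {
            if indented {
                passed += 1;
            }
            in_failure = false;
        } else if in_failure && !trimmed.is_empty() {
            // Error details that follow a failing test.
            kept.push(line);
        }
    }
    format!("FAILED: {}/{} tests\n{}", failed, passed + failed, kept.join("\n"))
}
```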

cargo test / go test (Failure Focus + NDJSON)

Raw output (100+ lines, ~2000 tokens):

running 15 tests
test utils::test_parse ... ok
test utils::test_format ... ok
test core::test_init ... ok
test core::test_process ... FAILED
test filters::test_minimal ... ok
...

---- core::test_process stdout ----
thread 'core::test_process' panicked at 'assertion failed'

RTK output (~5 lines, ~100 tokens):

FAILED: 1/15 tests

✗ core::test_process
  assertion failed at src/core.rs:42

Strategy (Rust):

Hide passing tests (14 passed → omitted)
Extract failure details (panic message, file:line)
Token savings: 95%
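For the Rust case, the result lines have the form `test <name> ... ok` or `test <name> ... FAILED`, so a minimal sketch can work on those lines alone. The function name `summarize_cargo_test` is an assumption for this example, and extraction of the panic message and file:line is omitted.

```rust
/// Counts test result lines and lists the names of failed tests.
fn summarize_cargo_test(output: &str) -> String {
    let mut total = 0usize;
    let mut failed: Vec<&str> = Vec::new();
    for line in output.lines() {
        // Only `test <name> ... <outcome>` lines are counted; the final
        // `test result: ...` line has no " ... " separator and is skipped.
        if let Some(rest) = line.strip_prefix("test ") {
            if let Some((name, outcome)) = rest.split_once(" ... ") {
                total += 1;
                if outcome.trim() == "FAILED" {
                    failed.push(name);
                }
            }
        }
    }
    let mut summary = format!("FAILED: {}/{} tests", failed.len(), total);
    for name in &failed {
        summary.push_str(&format!("\n✗ {}", name));
    }
    summary
}
```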

Strategy (Go - NDJSON):

{"Action":"run","Package":"pkg1","Test":"TestAuth"}
{"Action":"fail","Package":"pkg1","Test":"TestAuth"}
{"Action":"pass","Package":"pkg2","Test":"TestDB"}

Parse line-by-line JSON events
Track test state per package
Aggregate failures only
Token savings: 90%

Linting

ESLint/TSC/Ruff (Grouping by Rule)

Raw output (150 lines, ~2500 tokens):

/src/auth.ts
  12:5  error  'user' is assigned a value but never used  no-unused-vars
  15:8  error  Missing semicolon  semi
  23:1  error  'password' is assigned a value but never used  no-unused-vars

/src/db.ts
  8:3   error  'conn' is assigned a value but never used  no-unused-vars
  14:9  error  Missing semicolon  semi
  ...

RTK output (~15 lines, ~300 tokens):

Errors by rule:
  no-unused-vars: 23 violations
  semi: 45 violations
  indent: 12 violations

Errors by file:
  src/auth.ts: 8 errors
  src/db.ts: 15 errors
  src/api.ts: 5 errors

Strategy:

Parse error lines (regex: file:line:col rule)
Group by rule (no-unused-vars: 23, semi: 45, …)
Group by file (auth.ts: 8, db.ts: 15, …)
Token savings: 88%
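The grouping steps above can be sketched without a regex for ESLint-style output, where an unindented line names the file and each indented line starts with `line:col` and ends with the rule name. `group_lint` is a hypothetical helper and the parsing rules are assumptions based on the sample output, not RTK's parser.

```rust
use std::collections::BTreeMap;

/// Returns (violations per rule, violations per file).
fn group_lint(output: &str) -> (BTreeMap<String, usize>, BTreeMap<String, usize>) {
    let mut by_rule = BTreeMap::new();
    let mut by_file = BTreeMap::new();
    let mut current_file = String::new();

    for line in output.lines() {
        if line.trim().is_empty() {
            continue;
        }
        // Unindented lines are file headers.
        if !line.starts_with(char::is_whitespace) {
            current_file = line.trim().to_string();
            continue;
        }
        let tokens: Vec<&str> = line.split_whitespace().collect();
        // A violation starts with `line:col`, both numeric.
        let is_position = tokens.first().map_or(false, |t| {
            t.split_once(':')
                .map_or(false, |(l, c)| l.parse::<u32>().is_ok() && c.parse::<u32>().is_ok())
        });
        if is_position && tokens.len() >= 4 {
            // The rule name is the last token on the line.
            let rule = tokens[tokens.len() - 1];
            *by_rule.entry(rule.to_string()).or_insert(0) += 1;
            *by_file.entry(current_file.clone()).or_insert(0) += 1;
        }
    }
    (by_rule, by_file)
}
```

A `BTreeMap` is used so the grouped output has a stable, sorted order.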

Logs & Data

Logs (Deduplication)

Raw output (1000+ lines, ~20000 tokens):

[INFO] Starting server on port 3000
[ERROR] Database connection failed: timeout
[ERROR] Database connection failed: timeout
[ERROR] Database connection failed: timeout
[ERROR] Database connection failed: timeout
[ERROR] Database connection failed: timeout
[INFO] Retrying connection...
[ERROR] Database connection failed: timeout
...

RTK output (~5 lines, ~100 tokens):

[INFO] Starting server on port 3000
[ERROR] Database connection failed: timeout (×127)
[INFO] Retrying connection... (×12)
[ERROR] Database connection failed: timeout (×58)

Strategy:

Identify consecutive repeated lines (exact match)
Collapse with counts: “(×127)”
Keep first occurrence + count
Token savings: 99%
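A minimal sketch of this collapse follows. It assumes that only consecutive identical lines are merged, which is what the example output suggests (the same error appears twice with separate counts); `collapse_repeats` is a hypothetical name.

```rust
/// Collapses runs of identical consecutive lines into one line with a count.
fn collapse_repeats(log: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut lines = log.lines().peekable();
    while let Some(line) = lines.next() {
        let mut count = 1usize;
        // Consume following lines while they match the current one exactly.
        while lines.peek() == Some(&line) {
            lines.next();
            count += 1;
        }
        if count > 1 {
            out.push(format!("{} (×{})", line, count));
        } else {
            out.push(line.to_string());
        }
    }
    out
}
```

Lines that differ only in a timestamp or request id would not match exactly, so a real implementation would likely normalise such fields first.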

JSON (Structure Extraction)

Raw output (500 lines, ~10000 tokens):

{
  "users": [
    {
      "id": 1,
      "name": "Alice",
      "email": "alice@example.com",
      "profile": {
        "bio": "Lorem ipsum dolor sit amet...",
        "avatar": "data:image/png;base64,iVBORw0KGgoAAAANS...",
        "preferences": { ... }
      }
    },
    // ... 99 more users
  ],
  "metadata": { ... }
}

RTK output (~10 lines, ~200 tokens):

{
  "users": [ { /* 100 items */ } ],
  "metadata": { /* object */ }
}

Schema:
  users: array of objects
    - id: number
    - name: string
    - email: string
    - profile: object

Strategy:

Parse JSON structure
Extract keys + types
Count array lengths
Strip values (especially large strings/base64)
Token savings: 98%
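The "keys + types" step can be sketched as a walk over an already-parsed JSON tree. To stay dependency-free the example defines its own small `Json` enum; both the enum and the `schema` function are illustrative assumptions, and a real implementation would more likely walk a parser's value type.

```rust
/// A small stand-in for a parsed JSON value.
enum Json {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

fn type_name(value: &Json) -> String {
    match value {
        Json::Null => "null".to_string(),
        Json::Bool(_) => "boolean".to_string(),
        Json::Number(_) => "number".to_string(),
        Json::Str(_) => "string".to_string(),
        Json::Array(items) => format!("array ({} items)", items.len()),
        Json::Object(_) => "object".to_string(),
    }
}

/// Appends one `- key: type` line per object key; values are never copied.
fn schema(value: &Json, depth: usize, out: &mut Vec<String>) {
    match value {
        Json::Object(fields) => {
            for (key, child) in fields {
                out.push(format!("{}- {}: {}", "  ".repeat(depth), key, type_name(child)));
                schema(child, depth + 1, out);
            }
        }
        // An array is described by its first element only.
        Json::Array(items) => {
            if let Some(first) = items.first() {
                schema(first, depth, out);
            }
        }
        _ => {}
    }
}
```

Describing an array by its first element keeps the output small, at the cost of missing keys that appear only in later elements.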

Advanced Patterns

State Machine Parsing (pytest)

Pytest has no built-in JSON output mode, so RTK uses a state machine to parse its text output:

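#[derive(PartialEq)]
enum TestState {
    Idle,
    TestStart,
    Passed,
    Failed,
    Summary,
}

// Returns the ids of failed tests found in `pytest -v` output.
fn failed_tests(output: &str) -> Vec<String> {
    let mut state = TestState::Idle;
    let mut current_test = String::new();
    let mut failures = Vec::new();

    for line in output.lines() {
        if line.contains("short test summary") {
            state = TestState::Summary;
        }
        if state == TestState::Summary {
            break; // the summary repeats failures already collected
        }
        // A verbose result line holds the test id and its outcome together,
        // e.g. `tests/test_db.py::test_rollback FAILED`, so read the id first.
        if line.contains("::test_") {
            state = TestState::TestStart;
            current_test = line.split_whitespace().next().unwrap_or("").to_string();
        }
        if line.contains("PASSED") {
            state = TestState::Passed; // omit passing tests
        } else if line.contains("FAILED") {
            state = TestState::Failed;
            failures.push(current_test.clone());
        }
    }
    failures
}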

Result: Only failed tests appear in output (90% reduction).

NDJSON Streaming (go test)

With the -json flag, Go’s test runner emits newline-delimited JSON in which events from different packages are interleaved:

{"Action":"run","Package":"pkg1","Test":"TestA"}
{"Action":"fail","Package":"pkg1","Test":"TestA"}
{"Action":"run","Package":"pkg2","Test":"TestB"}
{"Action":"pass","Package":"pkg2","Test":"TestB"}

RTK parses line-by-line and tracks state per package:

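use serde::Deserialize;
use std::collections::HashMap;

// Field names mirror the `go test -json` event format.
#[derive(Deserialize)]
#[serde(rename_all = "PascalCase")]
struct TestEvent {
    action: String,
    package: String,
    test: Option<String>, // absent on package-level events
}

fn failures_by_package(output: &str) -> serde_json::Result<HashMap<String, Vec<String>>> {
    let mut pkg_failures: HashMap<String, Vec<String>> = HashMap::new();
    for line in output.lines() {
        let event: TestEvent = serde_json::from_str(line)?;
        if let (true, Some(test)) = (event.action == "fail", event.test) {
            pkg_failures.entry(event.package).or_default().push(test);
        }
    }
    Ok(pkg_failures)
}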

Result: Aggregated failures per package (90% reduction).

Package Manager Detection (JS/TS)

Modern JS/TS commands auto-detect package managers:

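use std::path::Path;
use std::process::Command;

let is_pnpm = Path::new("pnpm-lock.yaml").exists();
let is_yarn = Path::new("yarn.lock").exists();

// Choose the runner and its prefix arguments first. Chaining `.arg()` onto
// `Command::new(..)` yields a `&mut Command` to a temporary, which cannot
// be stored in `cmd`.
let (runner, prefix) = if is_pnpm {
    ("pnpm", ["exec", "--"])
} else if is_yarn {
    ("yarn", ["exec", "--"])
} else {
    ("npx", ["--no-install", "--"])
};
let mut cmd = Command::new(runner);
cmd.args(prefix).arg("eslint");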

Why this matters:

CWD preservation: pnpm/yarn exec preserve working directory
Monorepo support: Works in nested package.json structures
No global installs: Uses project-local dependencies only

Choosing the Right Strategy

Identify output format

Is it structured (JSON, NDJSON) or unstructured (text, logs)?

Determine information density

High density (code) → filtering. Low density (test results) → failure focus.

Check for repetition

Repeated patterns (logs, errors) → deduplication. Unique items → grouping.

Measure effectiveness

Aim for 60%+ reduction. If <60%, consider combining strategies.
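The reduction figure used throughout this page is 1 minus the ratio of filtered to raw tokens, expressed as a percentage. A small sketch of that check follows; the function names are hypothetical, and how tokens are counted is left to the caller because this page does not specify it.

```rust
/// Percentage of tokens removed by filtering.
fn reduction_percent(raw_tokens: usize, filtered_tokens: usize) -> f64 {
    if raw_tokens == 0 {
        return 0.0; // avoid dividing by zero on empty output
    }
    (1.0 - filtered_tokens as f64 / raw_tokens as f64) * 100.0
}

/// True when the reduction reaches the 60% target.
fn meets_target(raw_tokens: usize, filtered_tokens: usize) -> bool {
    reduction_percent(raw_tokens, filtered_tokens) >= 60.0
}
```

For the git status example earlier (about 800 tokens reduced to about 20), this gives roughly 97.5%.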

Best Practices

Prioritize structure

Keep structure (function signatures, file paths) and drop details (implementations, values)

Focus on failures

LLMs need to see errors, not successes. Hide passing tests, show only failures.

Use JSON when available

Structured formats (JSON, NDJSON) are easier to parse and compress than text.

Preserve exit codes

Always propagate exit codes for CI/CD reliability. Filter output, not behavior.
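A minimal sketch of "filter output, not behavior": run the child process, filter what it printed, and hand the child's exit code back unchanged. `run_filtered` is a hypothetical helper, and the fallback code of 1 for a signal-terminated child is an arbitrary choice made for this example.

```rust
use std::process::Command;

/// Runs `program`, applies `filter` to its stdout, and returns the filtered
/// text together with the child's exit code so the caller can propagate it.
fn run_filtered(
    program: &str,
    args: &[&str],
    filter: impl Fn(&str) -> String,
) -> std::io::Result<(String, i32)> {
    let output = Command::new(program).args(args).output()?;
    let stdout = String::from_utf8_lossy(&output.stdout).into_owned();
    // `code()` is `None` when the child was terminated by a signal.
    let code = output.status.code().unwrap_or(1);
    Ok((filter(stdout.as_str()), code))
}
```

A caller would print the filtered text and then pass `code` to `std::process::exit`, so that CI sees the same status it would have seen without filtering.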
