RTK follows a Test-Driven Development (TDD) approach with multiple layers of testing to ensure correctness, performance, and maintainability.

Testing Strategy

Test Pyramid

RTK uses a three-tier testing approach:
           ┌─────────────┐
           │   Smoke     │  Manual validation (69 assertions)
           │   Tests     │  scripts/test-all.sh
           └─────────────┘
        ┌───────────────────┐
        │   Integration     │  End-to-end command testing
        │   Tests           │  Real command execution
        └───────────────────┘
     ┌────────────────────────┐
     │     Unit Tests         │  Filter functions (105+ tests)
     │                        │  Embedded in modules
     └────────────────────────┘

Test Coverage

  • Unit tests: 105+ tests across 25+ modules
  • Smoke tests: 69 assertions covering all commands
  • TDD workflow: Red-Green-Refactor mandatory for all new code
From CLAUDE.md: “All code follows Red-Green-Refactor. See .claude/skills/rtk-tdd/ for the full workflow and Rust-idiomatic patterns.”

TDD Workflow

Red-Green-Refactor Cycle

Step 1. RED: Write a failing test

Start by writing a test for the functionality you want to implement:
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_filter_git_log() {
        let raw = "abc123 Fix bug\ndef456 Add feature\n";
        let filtered = filter_git_log(raw, 0);
        assert!(filtered.contains("2 commits"));
    }
}
Run the test - it should fail because the function doesn’t exist yet:
cargo test test_filter_git_log
Step 2. GREEN: Implement minimal code to pass

Write the simplest code that makes the test pass:
fn filter_git_log(raw: &str, _verbose: u8) -> String {
    let count = raw.lines().count();
    format!("{} commits", count)
}
Run the test again - it should now pass:
cargo test test_filter_git_log
Step 3. REFACTOR: Improve the code

Refactor for clarity, performance, and maintainability:
fn filter_git_log(raw: &str, verbose: u8) -> String {
    let lines: Vec<&str> = raw.lines()
        .filter(|l| !l.is_empty())
        .collect();
    
    if verbose >= 3 {
        return raw.to_string(); // Show raw output
    }
    
    format!("{} commits", lines.len())
}
Run tests again to ensure refactoring didn’t break anything:
cargo test

Dominant Pattern

From CLAUDE.md (line 403):
Dominant pattern: raw string input → filter function → assert output contains/excludes
Example:
#[test]
fn test_lint_groups_by_rule() {
    let raw = r#"
        /path/to/file.js:10:5 - error no-unused-vars: 'x' is defined but never used
        /path/to/file.js:15:3 - error no-unused-vars: 'y' is defined but never used
        /path/to/other.js:20:1 - error semi: Missing semicolon
    "#;
    
    let filtered = filter_lint_output(raw);
    
    // Assert grouped output
    assert!(filtered.contains("no-unused-vars: 2"));
    assert!(filtered.contains("semi: 1"));
    
    // Assert reduction
    assert!(filtered.len() < raw.len() / 2);
}
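The test above references `filter_lint_output` without showing it. A minimal sketch that would satisfy these assertions (illustrative only, not RTK's actual implementation) groups lines by rule name and emits one count per rule:

```rust
use std::collections::BTreeMap;

/// Illustrative sketch: count lint errors per rule and emit "rule: N" lines.
fn filter_lint_output(raw: &str) -> String {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for line in raw.lines() {
        // Expect lines like: "path:10:5 - error <rule>: message"
        if let Some(rest) = line.trim().split(" - error ").nth(1) {
            if let Some(rule) = rest.split(':').next() {
                *counts.entry(rule).or_insert(0) += 1;
            }
        }
    }
    counts
        .iter()
        .map(|(rule, n)| format!("{}: {}", rule, n))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Because only counts survive, output length shrinks with input size, which is what the reduction assertion (`filtered.len() < raw.len() / 2`) checks.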

Unit Tests

Test Organization

Unit tests are embedded in each module:
// src/example_cmd.rs

use anyhow::Result;

pub fn run(args: &[String], verbose: u8) -> Result<()> {
    // Implementation
}

fn filter_output(raw: &str) -> String {
    // Filtering logic
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_filter_output_basic() {
        let raw = "line1\nline2\nline3";
        let result = filter_output(raw);
        assert_eq!(result, "3 lines");
    }

    #[test]
    fn test_filter_output_empty() {
        let result = filter_output("");
        assert_eq!(result, "0 lines");
    }

    #[test]
    fn test_token_savings() {
        let raw = "x".repeat(1000);
        let result = filter_output(&raw);
        
        // Verify ≥60% savings
        let savings = 1.0 - (result.len() as f64 / raw.len() as f64);
        assert!(savings >= 0.6, "Expected ≥60% savings, got {:.1}%", savings * 100.0);
    }
}
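When several inputs exercise the same filter, a table-driven test keeps the cases compact. A sketch using a stand-in `filter_output` matching the "N lines" example above (a general Rust pattern, not RTK-specific):

```rust
// Stand-in matching the example above: report a line count.
fn filter_output(raw: &str) -> String {
    format!("{} lines", raw.lines().count())
}

#[test]
fn test_filter_output_cases() {
    // Each case pairs an input with its expected condensed output.
    let cases = [
        ("line1\nline2\nline3", "3 lines"),
        ("", "0 lines"),
        ("single", "1 lines"),
    ];
    for (input, expected) in cases {
        assert_eq!(filter_output(input), expected, "input: {input:?}");
    }
}
```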

Running Unit Tests

# Run all tests
cargo test

# Run tests for specific module
cargo test git::tests::

# Run specific test
cargo test test_filter_git_log

# Run tests with output (see println! statements)
cargo test -- --nocapture

# Run tests in parallel (default)
cargo test

# Run tests sequentially (for debugging)
cargo test -- --test-threads=1

Test Fixtures

For complex outputs, use fixture files:
#[test]
fn test_filter_large_output() {
    let raw = include_str!("../tests/fixtures/git_log_raw.txt");
    let filtered = filter_git_log(raw, 0);
    
    // Test expectations
    assert!(filtered.contains("commits"));
    assert!(filtered.len() < raw.len() / 2);
}
Fixture location: tests/fixtures/<cmd>_raw.txt
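For fixtures too large or too numerous for `include_str!`, a small helper can resolve them relative to the crate root so tests work regardless of the working directory. A sketch (the helper names `fixture_path` and `load_fixture` are illustrative, not RTK API):

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Illustrative: build a path to a fixture relative to the crate root.
/// CARGO_MANIFEST_DIR is set by Cargo at compile time; fall back to "."
/// when compiled outside a Cargo build.
fn fixture_path(name: &str) -> PathBuf {
    Path::new(option_env!("CARGO_MANIFEST_DIR").unwrap_or("."))
        .join("tests/fixtures")
        .join(name)
}

/// Illustrative: read a fixture file at runtime.
fn load_fixture(name: &str) -> String {
    let path = fixture_path(name);
    fs::read_to_string(&path)
        .unwrap_or_else(|e| panic!("failed to read fixture {}: {}", path.display(), e))
}
```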

Integration Tests

Command Execution Tests

Integration tests execute real commands:
// tests/integration_test.rs

use std::process::Command;

#[test]
fn test_rtk_git_status() {
    let output = Command::new("target/debug/rtk")
        .args(["git", "status"])
        .output()
        .expect("Failed to execute rtk");
    
    assert!(output.status.success());
    
    let stdout = String::from_utf8_lossy(&output.stdout);
    // Verify compressed output format
    assert!(stdout.contains("modified") || stdout.contains("✓"));
}

#[test]
fn test_exit_code_preservation() {
    // Test command that fails
    let output = Command::new("target/debug/rtk")
        .args(["git", "invalid-command"])
        .output()
        .expect("Failed to execute rtk");
    
    // Should fail with git's exit code
    assert!(!output.status.success());
}
Integration tests require the RTK binary to be built first: cargo build
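Hard-coding `target/debug/rtk` breaks under `--release` builds or a custom target directory. Cargo sets a `CARGO_BIN_EXE_<name>` environment variable while compiling a package's integration tests, pointing at the freshly built binary. A sketch, assuming the binary target is named `rtk` (the helper names are illustrative):

```rust
use std::process::{Command, Output};

/// Path to the rtk binary under test. CARGO_BIN_EXE_rtk is set by Cargo
/// when compiling this package's own integration tests; fall back to the
/// debug path otherwise.
fn rtk_exe() -> &'static str {
    option_env!("CARGO_BIN_EXE_rtk").unwrap_or("target/debug/rtk")
}

/// Run rtk with the given arguments and capture its output.
fn run_rtk(args: &[&str]) -> Output {
    Command::new(rtk_exe())
        .args(args)
        .output()
        .expect("Failed to execute rtk")
}
```

Tests then call `run_rtk(&["git", "status"])` instead of constructing the path inline.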

Smoke Tests

scripts/test-all.sh

Smoke tests validate all commands on a real system:
#!/bin/bash
# scripts/test-all.sh

# 69 assertions covering all commands

# Git commands
rtk git status || fail "git status failed"
rtk git log -5 || fail "git log failed"
rtk git diff HEAD~1 || fail "git diff failed"

# Test runners
rtk err "false" && fail "err should fail on error"
rtk test "cargo test" || fail "test command failed"

# File operations
rtk ls /tmp || fail "ls failed"
rtk read Cargo.toml || fail "read failed"

# ... 60+ more assertions
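The `fail` helper referenced above is not shown in the excerpt. A minimal sketch of the assertion pattern (illustrative; the real script may count and report differently):

```shell
#!/bin/bash
# Illustrative assertion helpers for a smoke-test script.
PASS=0
FAIL=0

fail() {
    echo "FAIL: $1" >&2
    FAIL=$((FAIL + 1))
}

pass() {
    echo "PASS: $1"
    PASS=$((PASS + 1))
}

# Run a command; record PASS on success, FAIL on non-zero exit.
check() {
    local desc="$1"; shift
    if "$@"; then pass "$desc"; else fail "$desc"; fi
}

# Example usage:
check "true succeeds" true
check "echo runs" echo "hello rtk"

echo "Results: $PASS passed, $FAIL failed"
[ "$FAIL" -eq 0 ]  # script exits non-zero if anything failed
```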

Running Smoke Tests

# Requires installed binary
cargo install --path .

# Run smoke tests
bash scripts/test-all.sh

# Output: PASS/FAIL for each command
Smoke tests are run manually before releases. They require RTK to be installed system-wide.

Performance Benchmarks

Startup Time

Verify <10ms startup time requirement:
# Install hyperfine
cargo install hyperfine

# Benchmark RTK vs raw command
hyperfine 'rtk git status' 'git status' --warmup 3

# Expected output:
# Benchmark 1: rtk git status
#   Time (mean ± σ):       8.2 ms ±   0.5 ms    [User: 3.1 ms, System: 4.2 ms]
# 
# Benchmark 2: git status
#   Time (mean ± σ):       7.5 ms ±   0.4 ms    [User: 2.8 ms, System: 4.0 ms]
# 
# Summary
#   'git status' ran 1.09 ± 0.09 times faster than 'rtk git status'
Acceptable: RTK overhead <10ms (typically 5-15ms)

Memory Usage

# macOS
/usr/bin/time -l target/release/rtk git status
# Look for: maximum resident set size

# Linux
/usr/bin/time -v target/release/rtk git status
# Look for: Maximum resident set size (kbytes)
Target: <5MB resident memory

Token Savings Verification

Verify token savings in tests:
fn count_tokens(text: &str) -> usize {
    (text.len() as f64 / 4.0).ceil() as usize
}

#[test]
fn test_token_savings_git_status() {
    let raw = include_str!("../tests/fixtures/git_status_raw.txt");
    let filtered = filter_git_status(raw, 0);
    
    let input_tokens = count_tokens(raw);
    let output_tokens = count_tokens(&filtered);
    let savings_pct = (1.0 - output_tokens as f64 / input_tokens as f64) * 100.0;
    
    // Verify ≥60% savings
    assert!(savings_pct >= 60.0, 
            "Expected ≥60% savings, got {:.1}%", savings_pct);
}

Manual Testing Requirements

For Filter Changes

From CLAUDE.md (lines 477-481): Manual testing is REQUIRED for filter changes:
Step 1. Test with a real command

rtk git log -10
Inspect the output and verify it is condensed correctly.
Step 2. Verify critical info is preserved

Check that essential information is retained:
  • Commit hashes for git log
  • Error messages for linters
  • File names for file operations
Step 3. Check the format is readable

Output should be human-readable and consistently formatted
Step 4. Verify the exit code

rtk git invalid-command
echo $?  # Should match git's exit code
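Exit-code preservation itself is straightforward to implement: capture the child's status and exit with the same code. A sketch of the mechanism (not RTK's actual source):

```rust
use std::process::{exit, Command};

/// Illustrative: run the wrapped command and propagate its exit code,
/// so `rtk git <args>` fails exactly when `git <args>` fails.
fn run_and_propagate(program: &str, args: &[&str]) -> ! {
    let status = Command::new(program)
        .args(args)
        .status()
        .unwrap_or_else(|e| {
            eprintln!("rtk: failed to spawn {}: {}", program, e);
            exit(127); // conventional "command not found" code
        });
    // status.code() is None if the child was killed by a signal (Unix);
    // fall back to a generic failure code in that case.
    exit(status.code().unwrap_or(1));
}
```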

For Hook Changes

Test in real Claude Code session:
  1. Create test Claude Code session
  2. Type raw command (e.g., git status)
  3. Verify hook rewrites to rtk git status
  4. Check output is correct

For Performance Changes

Step 1. Benchmark the baseline

hyperfine 'rtk git status' --warmup 3 > /tmp/before.txt
Step 2. Make changes and rebuild

# Make changes
cargo build --release
Step 3. Benchmark again

hyperfine 'target/release/rtk git status' --warmup 3 > /tmp/after.txt
Step 4. Compare the results

diff /tmp/before.txt /tmp/after.txt
Startup time should remain <10ms

Cross-Platform Testing

From CLAUDE.md (lines 494-497):
  • macOS (zsh): Test locally
  • Linux (bash): Use Docker
    docker run --rm -v "$(pwd)":/rtk -w /rtk rust:latest cargo test
    
  • Windows (PowerShell): Trust CI/CD pipeline or test manually
Anti-pattern: Running only automated tests (cargo test, cargo clippy) without actually executing rtk <cmd> and inspecting output.

Test Commands Reference

# Unit tests
cargo test                    # All tests
cargo test filter::tests::    # Module-specific
cargo test -- --nocapture     # With stdout

# Integration tests
cargo test --test integration_test

# Smoke tests (requires installed binary)
bash scripts/test-all.sh

# Performance benchmarks
hyperfine 'rtk git status' 'git status' --warmup 3
/usr/bin/time -l rtk git status  # macOS memory
/usr/bin/time -v rtk git status  # Linux memory

# Pre-commit quality gate
cargo fmt --all --check && cargo clippy --all-targets && cargo test

Test Writing Guidelines

1. Test Names Should Be Descriptive

// ✅ GOOD
#[test]
fn test_filter_git_log_groups_by_date() { ... }

// ❌ BAD
#[test]
fn test1() { ... }

2. Use Assert Messages

// ✅ GOOD
assert!(savings >= 0.6, 
        "Expected ≥60% savings, got {:.1}%", savings * 100.0);

// ❌ BAD
assert!(savings >= 0.6);

3. Test Edge Cases

#[test]
fn test_filter_empty_input() {
    let result = filter_output("");
    assert_eq!(result, "0 items");
}

#[test]
fn test_filter_unicode() {
    let raw = "emoji 🚀 test";
    let result = filter_output(raw);
    assert!(result.contains("test"));
}

#[test]
fn test_filter_ansi_codes() {
    let raw = "\x1b[31merror\x1b[0m message";
    let result = filter_output(raw);
    assert!(result.contains("error"));
}
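Tests like the one above assume the filter strips ANSI escape sequences before matching. A minimal std-only sketch of such a helper (RTK may instead use a crate such as `strip-ansi-escapes`; `strip_ansi` here is illustrative):

```rust
/// Illustrative: remove CSI escape sequences (ESC [ ... final-byte)
/// so filters can match on plain text.
fn strip_ansi(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\x1b' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter bytes until the final byte (0x40..=0x7e).
            while let Some(&n) = chars.peek() {
                chars.next();
                if ('\x40'..='\x7e').contains(&n) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}
```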

4. Verify Token Savings

All filter tests should verify ≥60% token savings:
#[test]
fn test_achieves_target_savings() {
    let raw = generate_realistic_output();
    let filtered = filter_output(&raw);
    
    let savings = 1.0 - (filtered.len() as f64 / raw.len() as f64);
    assert!(savings >= 0.6, 
            "Failed to meet 60% savings target: {:.1}%", savings * 100.0);
}

Untested Modules Backlog

From CLAUDE.md (line 398):
See .claude/skills/rtk-tdd/references/testing-patterns.md for RTK-specific patterns and untested module backlog.
Modules that may need additional test coverage:
  • Container commands (docker/podman)
  • GitHub CLI commands (gh)
  • Environment commands (env)
  • Some error handling paths

Next Steps