Performance characteristics

Dirty is designed to be fast, even when scanning directories with many repositories. This guide explains how it works and how to optimize performance.

How Dirty works

Dirty’s performance comes from three key design decisions:

Parallel scanning with rayon

Dirty uses the rayon crate to inspect repositories in parallel. After finding all repository paths, it processes them concurrently across multiple CPU cores.

// From main.rs:119-123
let infos: Vec<_> = repos
    .par_iter()
    .filter_map(|p| inspect_repo(p, args.include_unpushed))
    .filter(|i| (!args.dirty || i.dirty) && (!args.local || i.local_only))
    .collect();
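To make the inspect-then-collect shape concrete without pulling in rayon, here is a minimal std-only sketch that splits the work across scoped threads. It is a stand-in, not what Dirty does: rayon balances work dynamically, while this version uses fixed chunks. The `Info` struct, `inspect`, and `inspect_all` are hypothetical names, and the "trailing `*` means dirty" convention exists only to keep the example free of git calls.

```rust
use std::thread;

/// Hypothetical stand-in for a per-repository inspection result.
#[derive(Debug, Clone, PartialEq)]
struct Info {
    name: String,
    dirty: bool,
}

/// Hypothetical inspection step. An empty name models "not a repository"
/// and yields None; a trailing '*' models a dirty working tree.
fn inspect(name: &str) -> Option<Info> {
    if name.is_empty() {
        return None;
    }
    Some(Info {
        name: name.trim_end_matches('*').to_string(),
        dirty: name.ends_with('*'),
    })
}

/// Inspect all names using up to `workers` scoped threads.
/// Results keep the input order because chunks are joined in order.
fn inspect_all(names: &[&str], workers: usize) -> Vec<Info> {
    let workers = workers.max(1);
    // Round up so every name lands in some chunk; chunks(0) would panic.
    let chunk_size = ((names.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = names
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .filter_map(|n| inspect(n))
                        .collect::<Vec<Info>>()
                })
            })
            .collect();

        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Calling `inspect_all(&["api*", "web"], 2)` inspects each name on its own thread and returns the results in input order.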

Direct git access with libgit2

Instead of spawning git processes, Dirty uses libgit2 via the git2 crate to access repository data directly. This eliminates process spawning overhead.

// From main.rs:89-96
let repo = Repository::open(path).ok()?;

let mut opts = StatusOptions::new();
opts.include_untracked(true)
    .recurse_untracked_dirs(false)
    .exclude_submodules(true);
let dirty = !repo.statuses(Some(&mut opts)).ok()?.is_empty();
let local_only = repo.remotes().ok().is_none_or(|r| r.is_empty());

Limited depth by default

By default, Dirty only searches 3 levels deep. This balances speed with coverage for typical directory structures.

Default depth of 3

The default depth of 3 is optimized for common project layouts:

~/code/               # depth 0
├── projects/         # depth 1
│   ├── app/          # depth 2 (repo)
│   └── lib/          # depth 2 (repo)
└── personal/         # depth 1
    └── tools/        # depth 2
        └── scripts/  # depth 3 (repo)

Depth is measured from the starting directory, which is depth 0. A repo at depth 3 sits three directory levels below the start path, like scripts/ in the tree above.
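The depth numbers in the tree can be reproduced by counting path components below the start directory. The helper below is a hypothetical illustration of that counting rule, not code from Dirty.

```rust
use std::path::Path;

/// Number of directory levels between `start` and `path`.
/// Returns None when `path` is not located under `start`.
fn depth_below(start: &Path, path: &Path) -> Option<usize> {
    path.strip_prefix(start)
        .ok()
        .map(|relative| relative.components().count())
}
```

With `~/code` as the start, `projects/app` has two components below it (depth 2) and `personal/tools/scripts` has three (depth 3), so both fall within the default limit.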

Why depth 3?

Fast scanning: Limits filesystem traversal
Typical coverage: Most developers organize code 1-3 levels deep
Predictable performance: Prevents accidentally scanning entire home directories

Adjusting depth with -L flag

Use the -L flag to control how deep Dirty searches:

# Faster: only immediate subdirectories
dirty -L 1 ~/code

# Deeper: search up to 5 levels
dirty -L 5 ~/code

# Very deep: search up to 10 levels (slower)
dirty -L 10 ~/code

Start with -L 1 or -L 2 for faster scans of well-organized directories. Increase depth only if repositories are being missed.

Finding the right depth

If you see “No git repos found”, try increasing the depth:

# Test different depths to find repositories
dirty -L 1 ~/code  # Fast but might miss nested repos
dirty -L 3 ~/code  # Default balance
dirty -L 5 ~/code  # Slower but more thorough

Performance note: --include-unpushed flag

The --include-unpushed flag shows how many commits ahead of upstream each repository is, but it’s significantly slower:

// From main.rs:26-28
/// Include unpushed commit info (ahead of upstream) in the output
///
/// Note: this requires resolving the upstream tracking branch, which is slower,
/// so it is only computed when this flag is set.

Why is it slower?

Checking unpushed commits requires:

Resolving the upstream tracking branch
Computing graph distance between HEAD and upstream
Handling edge cases (detached HEAD, no upstream, etc.)

# Fast: basic status check
dirty ~/code

# Slower: includes unpushed commit counts
dirty --include-unpushed ~/code

Only use --include-unpushed when you need unpushed commit information. For general scans, omit this flag for better performance.
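To see why the graph-distance step costs more than a status check, consider a toy model of a commit graph. The sketch below counts commits reachable from HEAD but not from upstream, which is what "ahead" means. It is an illustration only: the `Graph` alias and both functions are hypothetical, commits are plain string labels, and Dirty's actual computation goes through libgit2 rather than code like this.

```rust
use std::collections::{HashMap, HashSet};

/// Toy commit graph: each commit label maps to its parent labels.
type Graph<'a> = HashMap<&'a str, Vec<&'a str>>;

/// All commits reachable from `tip`, including `tip` itself.
fn ancestors<'a>(graph: &Graph<'a>, tip: &'a str) -> HashSet<&'a str> {
    let mut seen = HashSet::new();
    let mut stack = vec![tip];
    while let Some(commit) = stack.pop() {
        // Only expand a commit the first time it is visited.
        if seen.insert(commit) {
            if let Some(parents) = graph.get(commit) {
                stack.extend(parents.iter().copied());
            }
        }
    }
    seen
}

/// Commits reachable from `head` that are not reachable from `upstream`.
fn ahead_count<'a>(graph: &Graph<'a>, head: &'a str, upstream: &'a str) -> usize {
    let theirs = ancestors(graph, upstream);
    let ours = ancestors(graph, head);
    ours.difference(&theirs).count()
}
```

Even in this simplified form, the work grows with the amount of history that has to be walked, whereas the basic status check never touches the commit graph.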

Handling large monorepo directories

Dirty is optimized for scanning multiple repositories, not for inspecting large individual repositories.

Stopping at .git directories

Dirty stops descending into subdirectories once it finds a .git folder:

// From main.rs:55-58
if dir.join(".git").exists() {
    repos.push(dir.to_path_buf());
    return;  // Stop recursing
}

This means:

Large repos don’t slow down scanning: Dirty won’t traverse a monorepo’s entire file tree
Nested repos are skipped: Only the top-level repository is detected

monorepo/
├── .git/              # Dirty finds this
├── packages/
│   └── nested-repo/
│       └── .git/      # This is NOT scanned (inside a repo)
└── ...

If you have intentionally nested git repositories (not submodules), only the outermost repository will be detected.

Symbolic links are skipped

Dirty does not follow symbolic links. This avoids infinite loops and duplicate scanning:

// From main.rs:64
if path.is_dir() && !path.is_symlink() {
    collect_repos(&path, max_depth, depth + 1, repos);
}

If your repositories are behind symlinks, you’ll need to scan the actual directories:

# Won't find repos behind symlink
dirty ~/symlink-to-code

# Will find repos
dirty ~/actual-code-directory
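Putting the two excerpts from this section together, the traversal might look roughly like the sketch below. The function name and argument order come from the recursive call shown above; everything else is an assumption, in particular where the depth check sits and how unreadable directories are handled.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Sketch of a depth-limited repository search.
/// Assumes the caller starts with `depth` set to 0.
fn collect_repos(dir: &Path, max_depth: usize, depth: usize, repos: &mut Vec<PathBuf>) {
    // A directory containing .git is a repository: record it and stop,
    // so nothing inside it (including nested repos) is visited.
    if dir.join(".git").exists() {
        repos.push(dir.to_path_buf());
        return;
    }

    // Assumed placement of the limit: do not descend past max_depth.
    if depth >= max_depth {
        return;
    }

    // Assumption: directories that cannot be read are silently skipped.
    let Ok(entries) = fs::read_dir(dir) else {
        return;
    };

    for entry in entries.flatten() {
        let path = entry.path();
        // Skip symlinks to avoid loops and duplicate results.
        if path.is_dir() && !path.is_symlink() {
            collect_repos(&path, max_depth, depth + 1, repos);
        }
    }
}
```

Under these assumptions, a call such as `collect_repos(start, 3, 0, &mut repos)` finds repositories up to three levels below `start` and never looks inside a repository it has already found.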

Tips for optimal performance

Scan specific subdirectories

Instead of scanning your entire home directory, target specific code directories:

# Slow: scans everything
dirty ~ -L 5

# Fast: scans only code directory
dirty ~/code -L 3

Use filters to reduce output processing

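Filters are applied to each repository's result inside the parallel scan, right after the repository is inspected. They add negligible cost, but they do not skip inspection either, so filtered and unfiltered scans take about the same time: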

# Same speed: filtering is efficient
dirty --dirty ~/code
dirty --local ~/code
dirty --dirty --local ~/code
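The filter itself is a cheap boolean check on a result that has already been computed. The sketch below isolates the predicate from the main.rs excerpt quoted earlier; the `Filters` and `RepoInfo` structs and the `keep` function are hypothetical wrappers around the same expression.

```rust
/// Hypothetical mirror of the --dirty and --local flags.
#[derive(Debug, Clone, Copy)]
struct Filters {
    dirty: bool,
    local: bool,
}

/// Hypothetical per-repository result with the two fields the filter reads.
#[derive(Debug, Clone, Copy)]
struct RepoInfo {
    dirty: bool,
    local_only: bool,
}

/// A repository is kept when every enabled filter matches it.
/// A disabled filter (false) never excludes anything.
fn keep(filters: Filters, info: &RepoInfo) -> bool {
    (!filters.dirty || info.dirty) && (!filters.local || info.local_only)
}
```

With both flags set, only repositories that are dirty and have no remotes pass; with neither set, every repository passes.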

Avoid unnecessary --include-unpushed

Only use --include-unpushed when you specifically need unpushed commit counts:

# Fast: general status check
dirty ~/code

# Slow: includes upstream comparisons
dirty --include-unpushed ~/code

Lower depth for faster CI/automation

In CI environments or scripts where you know repository locations, use lower depth:

# Fast: depth 1 for flat structures
dirty -L 1 /workspace