Why git-filter-repo Exists
None of the existing repository filtering tools (git filter-branch, BFG Repo Cleaner, manual fast-export/fast-import) provided what was needed. No tool provided any of the first eight traits listed below, and no tool provided more than two of the last four. git-filter-repo was built from the ground up to address all 12 of these design goals.
The 12 Design Goals
1. Starting Report
Problem: Users often don’t know what to filter or how to begin. Solution: Provide an analysis of the repository to help users understand what to prune or rename.
How it works
Running git filter-repo --analyze generates reports showing:
- All paths that have ever existed in the repository
- File renames that have occurred
- Sizes of objects aggregated by path, directory, extension, and blob ID
- Largest files and directories in history
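To make the size aggregation concrete, here is a minimal Python sketch. It is not how --analyze is implemented; it assumes you already have a list of (path, blob ID, size) records gathered from history and simply totals them by path, directory, extension, and blob ID, the groupings the reports use. The function names and the "<toplevel>" label are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical input: one record per blob version seen anywhere in history,
# as (path, blob_id, size_in_bytes). Real analysis reads this from Git itself.
Record = tuple[str, str, int]

def aggregate_sizes(records: list[Record]) -> dict[str, dict[str, int]]:
    """Total blob sizes by path, directory, extension, and blob ID."""
    by_path: dict[str, int] = defaultdict(int)
    by_dir: dict[str, int] = defaultdict(int)
    by_ext: dict[str, int] = defaultdict(int)
    by_blob: dict[str, int] = {}
    for path, blob_id, size in records:
        p = PurePosixPath(path)
        by_path[path] += size
        # Credit every enclosing directory, so "a/b/c.txt" counts toward "a/b" and "a".
        for parent in p.parents:
            name = str(parent)
            by_dir["<toplevel>" if name == "." else name] += size
        by_ext[p.suffix or "<no extension>"] += size
        # A blob reachable from several paths is stored once, so record it once.
        by_blob[blob_id] = size
    return {"path": dict(by_path), "directory": dict(by_dir),
            "extension": dict(by_ext), "blob": by_blob}

def largest(sizes: dict[str, int], n: int = 3) -> list[tuple[str, int]]:
    """Return the n largest entries, biggest first."""
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Calling largest(aggregate_sizes(records)["directory"]) on such records lists the directories that contribute most to repository size across all of history, which is usually where pruning starts.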
2. Keep vs. Remove
Problem: Most tools only provide ways to remove paths, so keeping just a few paths means listing every other path that ever existed. Solution: Provide flags for selecting the paths to keep (--path, --path-glob, --path-regex) and a flag that inverts the selection so those paths are removed instead (--invert-paths).
With --path, you specify what to keep and everything else is removed automatically. This is much simpler than listing every path you want to exclude.
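Keeping and removing can be modeled as one selection predicate plus an optional inversion, which is the role --invert-paths plays. The Python sketch below is a simplified illustration, assuming a --path-style selector matches a path that is equal to it or lies inside it as a directory; it is not git-filter-repo's matching code, and the function names are hypothetical.

```python
def path_matches(path: str, selector: str) -> bool:
    """True if path equals selector or lies inside the selector directory."""
    selector = selector.rstrip("/")
    return path == selector or path.startswith(selector + "/")

def select_paths(paths: list[str], selectors: list[str], invert: bool = False) -> list[str]:
    """Keep paths matching any selector; with invert=True, remove them instead."""
    kept = []
    for path in paths:
        matched = any(path_matches(path, s) for s in selectors)
        # Keep mode: matched paths survive. Inverted mode: unmatched paths survive.
        if matched != invert:
            kept.append(path)
    return kept
```

Note the directory boundary check: a selector of "src" keeps "src/main.c" but not "srcfoo/x.c", which a naive string-prefix test would get wrong.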
3. Renaming
Problem: Renaming paths was difficult or impossible with existing tools. Solution: Make path renaming easy, with sanity checks.
Renaming capabilities
- Treat a subdirectory as the root: --subdirectory-filter
- Move the root into a subdirectory: --to-subdirectory-filter
- Rename paths: --path-rename
- Detect collisions when renames cause multiple files to have the same path
- Special handling for commits that merely copy oldname→newname without modification, so renaming oldname to newname does not trip the collision check
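A rename pass with a collision check might look like the following minimal Python sketch. It models one commit's tree as a dict from path to blob ID and applies a single prefix rename in the spirit of --path-rename OLD:NEW; RenameCollision, rename_path, and rename_tree are hypothetical names, not git-filter-repo's internals. Two paths landing on the same name count as a collision only if their contents differ, so a commit that simply copied oldname to newname does not abort the rewrite.

```python
class RenameCollision(Exception):
    """Raised when two different files would end up at the same path."""

def rename_path(path: str, old_prefix: str, new_prefix: str) -> str:
    """Apply one prefix rename, e.g. 'lib/' -> 'src/lib/'."""
    if path.startswith(old_prefix):
        return new_prefix + path[len(old_prefix):]
    return path

def rename_tree(tree: dict[str, str], old_prefix: str, new_prefix: str) -> dict[str, str]:
    """Rename paths in a {path: blob_id} snapshot, checking for collisions."""
    result: dict[str, str] = {}
    for path, blob in sorted(tree.items()):
        new_path = rename_path(path, old_prefix, new_prefix)
        existing = result.get(new_path)
        if existing is not None and existing != blob:
            # Two different files now claim the same path: refuse to continue.
            raise RenameCollision(f"{path!r} would overwrite {new_path!r}")
        # Identical content (e.g. an unmodified copy of oldname as newname) is harmless.
        result[new_path] = blob
    return result
```

Failing loudly on a real collision is the safer default here: silently keeping one of the two files would lose data without any sign that it happened.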
4. More Intelligent Safety
Problem: git filter-branch writes copies of original refs to a special namespace, which is not a user-friendly recovery mechanism. Solution: Detect and require a fresh clone, ensuring users have a good backup. See Fresh Clone Requirements for detailed information.
5. Auto Shrink
Problem: After filtering, users had to manually remove old cruft and repack, and the documented steps didn’t always work. Solution: Automatically clean up and repack the repository after filtering.
git-filter-repo automatically:
- Expires all reflogs
- Deletes the origin remote (to prevent accidental pushes of rewritten history)
- Repacks the repository
- Runs garbage collection
6. Clean Separation
Problem: Mixing old and rewritten repositories together causes confusion and accidental re-pushing of old data. Solution: Remove the origin remote and avoid mixing old and new refs.
After filtering, you need to explicitly add a new remote (for example, with git remote add) before pushing the rewritten repository.
7. Versatility
Problem: Shell-based filtering:
- Is OS-dependent
- Has poor string manipulation
- Requires forking processes
- Lacks rich data structures
Solution: Offer several layers of flexibility, from simple flags to a full Python library:
Command-Line Flags
Simple flags for common operations like --path, --replace-text, and --mailmap
Python Callbacks
Register functions to process specific data types or Git objects
Python Library
Import filter-repo as a module to build custom tools
Rich Data Structures
Use Python’s dicts, lists, and objects instead of shell variables
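As an illustration of the callback style: git-filter-repo's --message-callback takes the body of a Python function that receives a commit or tag message as bytes in a variable named message and returns the new message bytes. The sketch below wraps such a body in an ordinary function so it can be tested on its own; the rewrite it performs (dropping a hypothetical "Internal-Ticket:" trailer line) is only an example.

```python
# Sketch of a --message-callback body, wrapped in a function for testing.
# As we understand the interface, filter-repo supplies the message as bytes
# named `message` and uses the bytes the body returns as the new message.
def message_callback(message: bytes) -> bytes:
    # Drop any line carrying a hypothetical "Internal-Ticket:" trailer.
    lines = message.splitlines(keepends=True)
    return b"".join(line for line in lines
                    if not line.startswith(b"Internal-Ticket:"))
```

Only the two statements inside the function would be passed on the command line as the --message-callback argument. Working with a list of lines and a generator expression is exactly the kind of string handling that is awkward in shell-based filters.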
8. Old Commit References
Problem: After rewriting, old commit IDs in emails, issues, and documentation become invalid. Solution: Provide a mapping from old to new commit IDs via refs/replace/ references.
9. Commit Message Consistency
Problem: Commit messages often reference other commits by SHA-1 (“reverts commit abc123”, “fixes commit def456”). After rewriting, these references are invalid. Solution: Automatically rewrite commit message references to use new commit IDs.
git-filter-repo detects patterns like:
- “reverts commit abc123”
- “fixes def456”
- “see commit abc123def456”
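The rewriting step can be sketched in Python as follows. This is an illustrative model, not git-filter-repo's code: it assumes an old-to-new commit ID mapping is already known, treats any standalone run of 7 to 40 lowercase hex digits as a candidate reference (an assumed bound), and rewrites a candidate only when it is a prefix of exactly one old ID. The replacement is truncated to the original length so abbreviated IDs stay abbreviated.

```python
import re

# Assumed candidate pattern: 7 to 40 lowercase hex digits standing alone as a word.
HEX_ID = re.compile(r"\b[0-9a-f]{7,40}\b")

def rewrite_references(message: str, commit_map: dict[str, str]) -> str:
    """Replace old commit IDs (full or abbreviated) with their new IDs."""
    def replace(match: re.Match) -> str:
        short = match.group(0)
        # Linear scan keeps the sketch simple; a real tool would index the IDs.
        candidates = [old for old in commit_map if old.startswith(short)]
        if len(candidates) != 1:
            # Unknown or ambiguous abbreviation: leave the text untouched.
            return short
        # Keep the same abbreviation length as the original reference.
        return commit_map[candidates[0]][:len(short)]
    return HEX_ID.sub(replace, message)
```

Leaving ambiguous or unknown hex strings alone is the conservative choice: a hex-looking word that is not a commit ID is never altered, at the cost of occasionally missing a reference.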
10. Become-Empty Pruning
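Problem: Commits that become empty due to filtering should be pruned, but git filter-branch:
- Misses commits that should be pruned
- Prunes commits that started empty (which may be intentional)
Solution: Prune only commits that become empty because of filtering, and rewrite the parents of their descendants so the history stays connected.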
How empty commit pruning works
- If all of a commit’s file changes are filtered out, the commit becomes empty and is pruned
- If a commit’s parent was pruned, use that parent’s nearest non-pruned ancestor as the new parent
- If no non-pruned ancestor exists and the commit is not a merge, make it a new root commit
- If the commit is a merge and one of its parents has no non-pruned ancestor, drop that parent (possibly turning the merge into a non-merge)
- Preserve commits that were empty from the start (often used for versioning/releases)
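The rules above can be modeled in a short Python sketch over a toy history. It assumes commits arrive parents-first (as in a fast-export stream) and that each commit records whether it originally changed files and which changes survive filtering; the Commit class and prune_empty are hypothetical names, and the sketch is not git-filter-repo's implementation. A commit that became empty is pruned only if it is not (or is no longer) a merge; whether such merges should also go is the degenerate-merge question in the next goal.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Commit:
    id: str
    parents: list[str]
    had_changes: bool  # did the original commit modify any files?
    changes: set[str] = field(default_factory=set)  # changes left after filtering

def prune_empty(commits: list[Commit]) -> dict[str, list[str]]:
    """Return {kept commit id: rewritten parents}; commits must be parents-first."""
    # For each original commit, the surviving commit that stands in for it:
    # itself if kept, its nearest kept ancestor if pruned, None if there is none.
    stand_in: dict[str, Optional[str]] = {}
    kept: dict[str, list[str]] = {}
    for commit in commits:
        # Map parents to kept ancestors; parents with no kept ancestor are dropped,
        # so a non-merge can become a new root and a merge can lose a parent.
        new_parents = [q for p in commit.parents if (q := stand_in[p]) is not None]
        became_empty = commit.had_changes and not commit.changes
        if became_empty and len(new_parents) <= 1:
            # Pruned: descendants will use this commit's surviving parent instead.
            stand_in[commit.id] = new_parents[0] if new_parents else None
        else:
            # Kept, including commits that were empty from the start.
            stand_in[commit.id] = commit.id
            kept[commit.id] = new_parents
    return kept
```

Processing parents before children means each parent's stand-in is already settled when a child needs it, so one forward pass handles chains of consecutive pruned commits.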
11. Become-Degenerate Pruning
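Problem: Pruning commits can cause topology changes. Merge commits can become degenerate when:
- Both parents become the same commit (after ancestor pruning)
- One parent becomes an ancestor of the other
Solution: If a merge commit becomes degenerate and has no file changes of its own, prune it too. Merges that were degenerate from the start are preserved, since they may be intentional (such as --no-ff merges that started degenerate).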
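A minimal Python sketch of this rule, assuming parent links are available as plain dicts for both the original and the rewritten history (a simplification; Graph, is_ancestor, essential_parents, and should_prune_merge are hypothetical names). A merge is pruned only when it has no file changes of its own, was not degenerate originally, and collapses to fewer than two distinct, non-redundant parents after rewriting.

```python
Graph = dict[str, list[str]]  # commit id -> parent ids

def is_ancestor(graph: Graph, ancestor: str, commit: str) -> bool:
    """True if ancestor is reachable from commit by following parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return False

def essential_parents(graph: Graph, parents: list[str]) -> list[str]:
    """Drop duplicate parents and parents that are ancestors of another parent."""
    unique = list(dict.fromkeys(parents))
    return [p for p in unique
            if not any(q != p and is_ancestor(graph, p, q) for q in unique)]

def should_prune_merge(orig_graph: Graph, new_graph: Graph,
                       orig_parents: list[str], new_parents: list[str],
                       has_own_changes: bool) -> bool:
    """Prune a merge only if it became degenerate and changes nothing itself."""
    if len(orig_parents) < 2 or has_own_changes:
        return False
    # Degenerate from the start (e.g. a --no-ff merge): treat as intentional.
    if len(essential_parents(orig_graph, orig_parents)) < 2:
        return False
    return len(essential_parents(new_graph, new_parents)) < 2
```

For example, if a feature branch X forked from A touched only filtered-out paths, X is pruned and the merge of B and X is left with parents B and A; since A is an ancestor of B, the merge has become degenerate and, lacking changes of its own, can be pruned.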
12. Speed
Problem: git filter-branch ranges from extremely slow to unusably slow on non-trivial repositories. Solution: Use the fast-export/fast-import pipeline for maximum performance.
git-filter-repo is multiple orders of magnitude faster than git filter-branch. Operations that took hours with filter-branch often complete in minutes with filter-repo.
Comparison with Other Tools
vs. git filter-branch
- Speed: filter-branch is multiple orders of magnitude slower
- Safety: filter-branch has many gotchas that can silently corrupt history
- Usability: filter-branch is very onerous to use for non-trivial rewrites
- Maintenance: The Git project states that filter-branch’s problems cannot be fixed in a backward-compatible way
vs. BFG Repo Cleaner
- Scope: BFG is limited to a few kinds of rewrites
- Architecture: BFG’s architecture is not amenable to handling more types of rewrites
- Bugs: BFG has shortcomings and bugs even for its intended use case
- Extensibility: BFG cannot be extended with custom logic
For BFG users, there’s bfg-ish, a reimplementation of BFG based on filter-repo with several new features and bugfixes. See the contrib/filter-repo-demos/ directory.
vs. Manual fast-export/fast-import
- Complexity: Manual stream editing is error-prone
- Corruption risk: Regex replacements on the stream can corrupt commit messages or file contents
- Empty commits: No way to prune empty commits
- Commit references: No way to update commit message references
- Character encoding: Often breaks with non-ASCII filenames
Design Philosophy Summary
git-filter-repo was designed to be:
- Safe: Require fresh clones, validate state, provide clear errors
- Fast: Use optimal architecture, minimal overhead
- Powerful: Handle all types of history rewriting
- User-friendly: Good defaults, helpful analysis, clear documentation
- Extensible: Python callbacks and library usage
- Correct: Handle edge cases properly (empty commits, degenerate merges, etc.)
Next Steps
How It Works
Understand the fast-export | filter | fast-import pipeline
Fresh Clone Requirements
Learn why fresh clones are required and how to override
Quick Start
Start using git-filter-repo with practical examples
Use Cases
See real-world examples of history rewriting
