FilteringOptions
Handles command-line argument parsing and filtering configuration.
parse_args
Parse command-line arguments into a configuration object.
Parameters:
- List of command-line arguments (typically sys.argv[1:])
- Whether to raise an error if no arguments are provided
Returns: argparse.Namespace object with all filtering options.
Example
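A minimal sketch of calling parse_args from a script. It assumes the library is importable as git_filter_repo; build_argv is a hypothetical helper introduced here only to assemble the argument list, and --path and --force are flags of the git filter-repo command line.

```python
from typing import List


def build_argv(paths: List[str], force: bool = False) -> List[str]:
    """Assemble a command-line style argument list (hypothetical helper)."""
    argv: List[str] = []
    for path in paths:
        argv += ["--path", path]
    if force:
        argv.append("--force")
    return argv


def parse_filter_args(paths: List[str], force: bool = False):
    """Parse the assembled arguments into an argparse.Namespace."""
    # Imported lazily so build_argv works without the library installed.
    # Assumption: the library is importable under this module name.
    import git_filter_repo as fr

    return fr.FilteringOptions.parse_args(build_argv(paths, force))
```

Calling parse_filter_args(["src"], force=True) would then hand the same arguments to the parser that a command-line invocation with --path src --force would supply.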
default_options
Get default filtering options without parsing arguments.
Returns: argparse.Namespace with default values. Useful when you want to set options programmatically.
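Setting options in code might look like the sketch below. The attribute names force and quiet are assumptions that mirror the --force and --quiet command-line flags, and apply_overrides is a hypothetical helper rather than part of the library.

```python
import argparse


def apply_overrides(args: argparse.Namespace, **overrides) -> argparse.Namespace:
    """Set each keyword argument as an attribute on the namespace and return it."""
    for name, value in overrides.items():
        setattr(args, name, value)
    return args


def make_options() -> argparse.Namespace:
    """Start from the defaults, then adjust them in code."""
    # Assumption: the library is importable under this module name.
    import git_filter_repo as fr

    args = fr.FilteringOptions.default_options()
    # Assumed attribute names, mirroring --force and --quiet.
    return apply_overrides(args, force=True, quiet=True)
```

Because the result is a plain argparse.Namespace, direct assignment such as args.force = True works just as well; the helper only groups the overrides in one place.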
Common Options
Key attributes in the returned args object:
- Allow filtering on non-fresh clones
- Do partial history rewrite (keeps old and new history)
- Refs to filter (e.g., ['refs/heads/main'] or ['--all'])
- How to handle replace refs: 'delete-no-add', 'delete-and-add', 'update-no-add', 'update-or-add', 'update-and-add'
- Whether to prune empty commits: 'always', 'auto', 'never'
- List of path filtering/renaming operations. Each item is a tuple: (mod_type, match_type, value)
- Strip blobs larger than this size (in bytes)
- Text replacement rules from the --replace-text file
- Mailmap for name/email translation
- Source repository path (for --source)
- Target repository path (for --target)
- Suppress progress output
- Show debug information
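Two of the options above accept only a fixed set of string values. When options are set in code instead of being parsed, a small check can catch typos early. This is a sketch: the attribute names replace_refs and prune_empty are assumptions, and check_choice_options is a hypothetical helper.

```python
import argparse
from typing import List

# Allowed values, copied from the option descriptions above.
REPLACE_REFS_CHOICES = (
    "delete-no-add",
    "delete-and-add",
    "update-no-add",
    "update-or-add",
    "update-and-add",
)
PRUNE_EMPTY_CHOICES = ("always", "auto", "never")


def check_choice_options(args: argparse.Namespace) -> List[str]:
    """Return a list of problems; an empty list means the values look fine."""
    problems: List[str] = []
    # Assumed attribute names; unset or missing attributes are skipped.
    replace_refs = getattr(args, "replace_refs", None)
    if replace_refs is not None and replace_refs not in REPLACE_REFS_CHOICES:
        problems.append(f"unexpected replace-refs mode: {replace_refs!r}")
    prune_empty = getattr(args, "prune_empty", None)
    if prune_empty is not None and prune_empty not in PRUNE_EMPTY_CHOICES:
        problems.append(f"unexpected prune-empty mode: {prune_empty!r}")
    return problems
```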
RepoFilter
The main filtering engine that processes repository history.
Constructor
Parameters:
- Filtering options from FilteringOptions.parse_args() or default_options()
- Function called for each filename: def callback(filename: bytes) -> bytes | None
- Function called for commit/tag messages: def callback(message: bytes) -> bytes
- Function called for author/committer/tagger names: def callback(name: bytes) -> bytes
- Function called for email addresses: def callback(email: bytes) -> bytes
- Function called for branch/tag refs: def callback(refname: bytes) -> bytes
- Function called for each blob: def callback(blob: Blob, metadata: dict) -> None
- Function called for each commit: def callback(commit: Commit, metadata: dict) -> None
- Function called for each tag: def callback(tag: Tag, metadata: dict) -> None
- Function called for each reset: def callback(reset: Reset, metadata: dict) -> None
- Function called when processing completes: def callback() -> None
- Function called for file changes: def callback(filename: bytes, mode: bytes, blob_id: int|bytes, value: FileInfoValueHelper) -> tuple
Methods
run
Execute the filtering operation.
- Runs sanity checks on the repository
- Starts fast-export to read history
- Processes each Git object through callbacks
- Writes filtered objects to fast-import
- Updates refs and performs cleanup
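The callback step in the sequence above can be sketched as follows. The callback signatures follow the constructor description; the keyword names filename_callback, message_callback and commit_callback, the commit.message attribute, and the meaning of a None return value are assumptions. The demo function uses a stand-in object so the commit callback can be tried without a repository.

```python
from types import SimpleNamespace
from typing import Optional


def move_into_subdir(filename: bytes) -> Optional[bytes]:
    """Filename callback: drop *.tmp files, move everything else under app/."""
    if filename.endswith(b".tmp"):
        return None  # assumption: None removes the file from history
    return b"app/" + filename


def strip_trailing_whitespace(message: bytes) -> bytes:
    """Message callback: normalize the end of a commit or tag message."""
    return message.rstrip() + b"\n"


def tag_commit_message(commit, metadata: dict) -> None:
    """Commit callback: modify the commit in place; nothing is returned."""
    # Assumption: Commit objects expose a bytes `message` attribute.
    if not commit.message.startswith(b"[filtered] "):
        commit.message = b"[filtered] " + commit.message


def demo_commit_callback() -> bytes:
    """Exercise the commit callback on a stand-in object."""
    fake_commit = SimpleNamespace(message=b"Fix bug\n")
    tag_commit_message(fake_commit, {})
    return fake_commit.message


def run_filter() -> None:
    """Wire the callbacks into RepoFilter and start the rewrite."""
    # Assumption: the library is importable under this module name.
    import git_filter_repo as fr

    args = fr.FilteringOptions.default_options()
    repo_filter = fr.RepoFilter(
        args,
        filename_callback=move_into_subdir,       # assumed keyword name
        message_callback=strip_trailing_whitespace,  # assumed keyword name
        commit_callback=tag_commit_message,       # assumed keyword name
    )
    repo_filter.run()
```

run_filter is deliberately not called here: it rewrites history in the current repository, and with default options the sanity checks are expected to stop it on anything that is not a fresh clone.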
insert
Manually insert a Git object into the output stream.
Parameter: the object to insert.
Typical uses:
- Adding new commits
- Inserting modified blobs
- Creating new branches/tags
Class Methods
sanity_check
Check if repository is safe to filter (called automatically by run()).
Parameters:
- Dictionary of ref names to hashes
- Whether repository is bare
- Git config settings
Raises: SystemExit if the repository doesn’t appear to be a fresh clone.
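Because failure is signalled with SystemExit, a caller that wants a yes/no answer has to catch that exception explicitly. The wrapper below is a generic sketch; passes_check is a hypothetical helper, not part of the library.

```python
from typing import Any, Callable


def passes_check(check: Callable[..., Any], *check_args: Any) -> bool:
    """Run a check that signals failure via SystemExit and report a bool."""
    try:
        check(*check_args)
    except SystemExit:
        # SystemExit derives from BaseException, so a bare
        # `except Exception` would not catch it.
        return False
    return True
```

With the library, a call might look like passes_check(fr.RepoFilter.sanity_check, refs, is_bare, config_settings), assuming the three arguments are passed in the order of the parameter list above.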
GitUtils
Utility class for Git operations.
get_commit_count
Count commits in repository.
Parameters:
- Path to repository
- Arguments to git rev-list (default: ['--all'])
get_blob_sizes
Get sizes of all blobs in repository.
get_file_changes
Get file changes between two commits.
Parameters:
- Path to repository
- Parent commit hash
- Commit hash
Returns: FileChange objects.
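One way to consume the returned FileChange objects is to tally them by change type. The sketch below uses a stand-in namedtuple so it runs without a repository; the type and filename fields are assumptions about what FileChange exposes.

```python
from collections import Counter, namedtuple
from typing import Dict, Iterable

# Stand-in for the library's FileChange objects, for demonstration only.
# The `type` and `filename` fields are assumed attribute names.
FakeFileChange = namedtuple("FakeFileChange", ["type", "filename"])


def summarize_changes(changes: Iterable) -> Dict[bytes, int]:
    """Count changes per change type (for example b"M" or b"D")."""
    return dict(Counter(change.type for change in changes))


def demo_summary() -> Dict[bytes, int]:
    """Summarize a hand-built list of stand-in changes."""
    changes = [
        FakeFileChange(b"M", b"README.md"),
        FakeFileChange(b"M", b"src/main.py"),
        FakeFileChange(b"D", b"old.txt"),
    ]
    return summarize_changes(changes)
```

With the library, the input would come from a call along the lines of fr.GitUtils.get_file_changes(repo, parent_hash, commit_hash), assuming the arguments follow the order of the parameter list above.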
get_refs
Get all refs in repository.
FastExportParser
Low-level parser for git fast-export output. Most users won’t need this directly, as RepoFilter handles it internally.
It is used by RepoFilter to parse the output of git fast-export.
Methods
- parse_stream(): Parse fast-export output from stdin
- parse_file(filename): Parse fast-export output from a file
ProgressWriter
Handles progress output during filtering operations.
The ProgressWriter class writes progress updates to stderr in the format expected by git fast-import. It’s used internally by RepoFilter.
Methods
- write(progress_message): Write a progress message
record_id_rename
Utility function to record object ID translations.
Parameters:
- Original object ID (mark or hash)
- New object ID after filtering
