FilteringOptions

Handles command-line argument parsing and filtering configuration.

parse_args

Parse command-line arguments into a configuration object.
args = fr.FilteringOptions.parse_args(input_args, error_on_empty=True)
input_args (list[str], required)
  List of command-line arguments (typically sys.argv[1:]).
error_on_empty (bool, default: True)
  Whether to raise an error if no arguments are provided.
Returns an argparse.Namespace object with all filtering options.

Example

import git_filter_repo as fr

# Parse with custom options
args = fr.FilteringOptions.parse_args([
    '--path', 'src/',
    '--path', 'docs/',
    '--force',
    '--replace-text', 'expressions.txt'
])

# Create filter with these options
filter = fr.RepoFilter(args)
filter.run()

default_options

Get default filtering options without parsing arguments.
args = fr.FilteringOptions.default_options()
Returns an argparse.Namespace with default values. Useful when you want to programmatically set options:
args = fr.FilteringOptions.default_options()
args.force = True
args.refs = ['--all']
args.path_changes = [('filter', 'match', b'src/')]

Common Options

Key attributes in the returned args object:
force (bool, default: False)
  Allow filtering on non-fresh clones.
partial (bool, default: False)
  Do a partial history rewrite (keeps old and new history).
refs (list[str], default: ['--all'])
  Refs to filter (e.g., ['refs/heads/main'] or ['--all']).
replace_refs (str)
  How to handle replace refs: 'delete-no-add', 'delete-and-add', 'update-no-add', 'update-or-add', or 'update-and-add'.
prune_empty (str, default: 'auto')
  Whether to prune empty commits: 'always', 'auto', or 'never'.
path_changes (list)
  List of path filtering/renaming operations. Each item is a tuple: (mod_type, match_type, value).
max_blob_size (int, default: 0)
  Strip blobs larger than this size (in bytes).
replace_text (dict)
  Text replacement rules from a --replace-text file.
mailmap (MailmapInfo)
  Mailmap for name/email translation.
source (bytes)
  Source repository path (for --source).
target (bytes)
  Target repository path (for --target).
quiet (bool, default: False)
  Suppress progress output.
debug (bool, default: False)
  Show debug information.
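
The path_changes tuples can be built programmatically before constructing a filter. A minimal sketch: the ('filter', 'match', b'...') shape mirrors the default_options example above, while other mod_type/match_type values are assumptions to verify against your git_filter_repo version. The run_keep_only helper is illustrative, not part of the library.

```python
def keep_paths(prefixes):
    """Build path_changes tuples that keep only the given path prefixes.

    Mirrors the ('filter', 'match', b'...') tuple shape shown in the
    default_options example; check other mod_type/match_type values
    against your git_filter_repo version.
    """
    return [('filter', 'match', p if isinstance(p, bytes) else p.encode())
            for p in prefixes]

def run_keep_only(prefixes):
    """Apply the path filter (requires git_filter_repo and a fresh clone)."""
    import git_filter_repo as fr
    args = fr.FilteringOptions.default_options()
    args.force = True
    args.path_changes = keep_paths(prefixes)
    fr.RepoFilter(args).run()
```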

RepoFilter

The main filtering engine that processes repository history.

Constructor

filter = fr.RepoFilter(
    args,
    filename_callback=None,
    message_callback=None,
    name_callback=None,
    email_callback=None,
    refname_callback=None,
    blob_callback=None,
    commit_callback=None,
    tag_callback=None,
    reset_callback=None,
    done_callback=None,
    file_info_callback=None
)
args (argparse.Namespace, required)
  Filtering options from FilteringOptions.parse_args() or default_options().
filename_callback (callable)
  Function called for each filename: def callback(filename: bytes) -> bytes | None
message_callback (callable)
  Function called for commit/tag messages: def callback(message: bytes) -> bytes
name_callback (callable)
  Function called for author/committer/tagger names: def callback(name: bytes) -> bytes
email_callback (callable)
  Function called for email addresses: def callback(email: bytes) -> bytes
refname_callback (callable)
  Function called for branch/tag refs: def callback(refname: bytes) -> bytes
blob_callback (callable)
  Function called for each blob: def callback(blob: Blob, metadata: dict) -> None
commit_callback (callable)
  Function called for each commit: def callback(commit: Commit, metadata: dict) -> None
tag_callback (callable)
  Function called for each tag: def callback(tag: Tag, metadata: dict) -> None
reset_callback (callable)
  Function called for each reset: def callback(reset: Reset, metadata: dict) -> None
done_callback (callable)
  Function called when processing completes: def callback() -> None
file_info_callback (callable)
  Function called for file changes: def callback(filename: bytes, mode: bytes, blob_id: int | bytes, value: FileInfoValueHelper) -> tuple
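
As a sketch of how the bytes-oriented callbacks plug in (the redaction rule and domain rewrite here are illustrative, not part of the library):

```python
import re

def redact_message(message: bytes) -> bytes:
    """message_callback: mask internal ticket IDs in commit/tag messages."""
    return re.sub(rb'INTERNAL-\d+', b'[redacted]', message)

def normalize_email(email: bytes) -> bytes:
    """email_callback: move authors off a legacy domain."""
    return email.replace(b'@oldcompany.com', b'@example.com')

def build_filter(args):
    """Wire the callbacks into a RepoFilter (requires git_filter_repo)."""
    import git_filter_repo as fr
    return fr.RepoFilter(args,
                         message_callback=redact_message,
                         email_callback=normalize_email)
```

Note that every callback receives and returns bytes, not str; git history is not guaranteed to be valid UTF-8.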

Methods

run

Execute the filtering operation.
filter.run()
This method:
  1. Runs sanity checks on the repository
  2. Starts fast-export to read history
  3. Processes each Git object through callbacks
  4. Writes filtered objects to fast-import
  5. Updates refs and performs cleanup

insert

Manually insert a Git object into the output stream.
filter.insert(obj)
obj (Blob | Commit | Reset | Tag, required)
  The object to insert.
Useful for:
  • Adding new commits
  • Inserting modified blobs
  • Creating new branches/tags
def add_license_blob(commit, metadata):
    if len(commit.parents) == 0:  # Root commit
        # Create new blob with license text
        blob = fr.Blob(b'MIT License\n...')
        filter.insert(blob)
        
        # Add file change to commit
        commit.file_changes.append(
            fr.FileChange(b'M', b'LICENSE', blob.id, b'100644')
        )

Class Methods

sanity_check

Check if repository is safe to filter (called automatically by run()).
fr.RepoFilter.sanity_check(refs, is_bare, config_settings)
refs (dict, required)
  Dictionary of ref names to hashes.
is_bare (bool, required)
  Whether the repository is bare.
config_settings (dict, required)
  Git config settings.
Raises SystemExit if repository doesn’t appear to be a fresh clone.

GitUtils

Utility class for Git operations.

get_commit_count

Count commits in repository.
count = fr.GitUtils.get_commit_count(repo, *args)
repo (bytes, required)
  Path to the repository.
args (list)
  Arguments to git rev-list (default: ['--all']).

get_blob_sizes

Get sizes of all blobs in repository.
unpacked_size, packed_size = fr.GitUtils.get_blob_sizes(quiet=False)
Returns two dicts mapping blob hash to size in bytes.
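
The returned dicts are plain hash-to-size mappings, so a "largest blobs" report reduces to a sort (the helpers below are illustrative, not part of the library):

```python
def largest_blobs(sizes, n=10):
    """Top-n (hash, size) pairs from a hash->size dict, such as one of
    the dicts returned by GitUtils.get_blob_sizes()."""
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:n]

def report_largest(n=10):
    """Print the n largest unpacked blobs (requires git_filter_repo,
    run from inside a repository)."""
    import git_filter_repo as fr
    unpacked, _packed = fr.GitUtils.get_blob_sizes(quiet=True)
    for blob_hash, size in largest_blobs(unpacked, n):
        print(blob_hash, size)
```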

get_file_changes

Get file changes between two commits.
file_changes = fr.GitUtils.get_file_changes(
    repo, parent_hash, commit_hash
)
repo (bytes, required)
  Path to the repository.
parent_hash (bytes, required)
  Parent commit hash.
commit_hash (bytes, required)
  Commit hash.
Returns list of FileChange objects.

get_refs

Get all refs in repository.
refs = fr.GitUtils.get_refs(repo_working_dir)
Returns dict mapping ref names (bytes) to commit hashes (bytes).
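
Since both keys and values are bytes, narrowing the result to local branches is a simple dict comprehension (the helpers are illustrative):

```python
def branch_heads(refs):
    """Keep only refs/heads/* entries from a get_refs() result."""
    return {name: sha for name, sha in refs.items()
            if name.startswith(b'refs/heads/')}

def list_branches(repo_working_dir=b'.'):
    """Print local branch tips (requires git_filter_repo and a repo)."""
    import git_filter_repo as fr
    refs = fr.GitUtils.get_refs(repo_working_dir)
    for name, sha in sorted(branch_heads(refs).items()):
        print(name.decode(), sha.decode())
```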

FastExportParser

Low-level parser for git fast-export output. Most users won’t need this directly as RepoFilter handles it internally.
parser = fr.FastExportParser(
    blob_callback=None,
    commit_callback=None,
    tag_callback=None,
    reset_callback=None,
    checkpoint_callback=None,
    done_callback=None
)
The parser processes a fast-export stream and calls appropriate callbacks for each object type. It’s used internally by RepoFilter to parse the output of git fast-export.

Methods

  • parse_stream() - Parse fast-export output from stdin
  • parse_file(filename) - Parse fast-export output from a file
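
A minimal sketch of driving the parser over a saved export stream, tallying object types. It assumes each parser-level callback receives just the parsed object; verify the callback signatures against your git_filter_repo version.

```python
from collections import Counter

counts = Counter()

def make_counter(kind):
    """Build a callback that tallies parsed objects of the given kind."""
    def callback(obj):
        counts[kind] += 1
    return callback

def count_objects(export_file):
    """Tally object types in a saved `git fast-export` stream
    (requires git_filter_repo)."""
    import git_filter_repo as fr
    parser = fr.FastExportParser(
        blob_callback=make_counter('blob'),
        commit_callback=make_counter('commit'),
        tag_callback=make_counter('tag'),
        reset_callback=make_counter('reset'))
    parser.parse_file(export_file)
    return counts
```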

ProgressWriter

Handles progress output during filtering operations.
writer = fr.ProgressWriter()
The ProgressWriter class writes progress updates to stderr in the format expected by git fast-import. It’s used internally by RepoFilter.

Methods

  • write(progress_message) - Write a progress message

record_id_rename

Utility function to record object ID translations.
fr.record_id_rename(old_id, new_id)
old_id (bytes, required)
  Original object ID (mark or hash).
new_id (bytes, required)
  New object ID after filtering.
Used internally to track how object IDs change during filtering. This is particularly important for maintaining consistency when commit IDs are referenced in commit messages.

Complete Example

Comprehensive filtering script combining multiple features:
#!/usr/bin/env python3
import sys
import git_filter_repo as fr

# Track statistics
stats = {
    'commits_processed': 0,
    'blobs_modified': 0,
    'files_removed': 0
}

def blob_callback(blob, metadata):
    """Replace sensitive data in blobs."""
    # Only process text files
    if b"\0" not in blob.data[0:8192]:
        original = blob.data

        # Replace sensitive patterns
        blob.data = blob.data.replace(
            b'password = "secret"',
            b'password = "***"'
        )

        # Compare contents, not lengths: a same-length replacement
        # would otherwise go uncounted
        if blob.data != original:
            stats['blobs_modified'] += 1

def commit_callback(commit, metadata):
    """Filter commits and update metadata."""
    stats['commits_processed'] += 1
    
    # Remove files matching pattern
    original_count = len(commit.file_changes)
    commit.file_changes = [
        c for c in commit.file_changes
        if not c.filename.endswith(b'.log')
    ]
    stats['files_removed'] += original_count - len(commit.file_changes)
    
    # Update commit message
    if b'[skip ci]' not in commit.message:
        commit.message = commit.message.rstrip() + b'\n\n[filtered]'
    
    # Update author email domain
    if commit.author_email.endswith(b'@oldcompany.com'):
        commit.author_email = commit.author_email.replace(
            b'@oldcompany.com',
            b'@newcompany.com'
        )

def done_callback():
    """Print statistics when filtering completes."""
    print("\nFiltering Statistics:")
    print(f"  Commits processed: {stats['commits_processed']}")
    print(f"  Blobs modified: {stats['blobs_modified']}")
    print(f"  Files removed: {stats['files_removed']}")

# Parse arguments
args = fr.FilteringOptions.parse_args([
    '--force',
    '--path', 'src/',
    '--path', 'docs/',
    '--replace-text', 'expressions.txt'
])

# Create and run filter
filter = fr.RepoFilter(
    args,
    blob_callback=blob_callback,
    commit_callback=commit_callback,
    done_callback=done_callback
)
filter.run()

print("\nFiltering complete!")

Advanced Usage

Multiple Source/Target Repositories

import git_filter_repo as fr

# Set up source filter
source_args = fr.FilteringOptions.default_options()
source_args.source = b'/path/to/source/repo'
source_args.refs = ['--all']

# Set up target filter  
target_args = fr.FilteringOptions.default_options()
target_args.target = b'/path/to/target/repo'
target_args.force = True

# Create filters
target_filter = fr.RepoFilter(target_args)
target_filter.importer_only()

source_filter = fr.RepoFilter(source_args, commit_callback=my_callback)
source_filter.set_output(target_filter)

# Run the pipeline
source_filter.run()

State Branch for Incremental Filtering

args = fr.FilteringOptions.parse_args([
    '--state-branch', 'filter-state',
    '--force'
])

filter = fr.RepoFilter(args, commit_callback=my_callback)
filter.run()

# On subsequent runs, the state is preserved
# and only new commits are processed
