
Overview

Advanced filtering uses Python callbacks to give you complete control over the filtering process. This enables complex operations that can’t be achieved with simple command-line options.

Understanding Callbacks

Callbacks are Python functions that filter-repo calls for each git object. You provide the function body as a string.

Basic Callback Structure

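For a callback like --name-callback, filter-repo creates:
def name_callback(name):
  YOUR_CODE_HERE
You only provide the YOUR_CODE_HERE part. For callbacks that produce a value (name, email, refname, filename, message), your code must end with its own return statement.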
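To make the wrapping concrete, here is a simplified sketch of how a body string can be turned into a function with exec. The helper name make_callback and its details are illustrative assumptions for this page, not a copy of git-filter-repo's implementation.

```python
def make_callback(argname, body):
    """Wrap a callback body string in a function that takes `argname`.

    Illustrative model only: indent every body line and exec the result.
    """
    indented = "\n".join("  " + line for line in body.splitlines())
    source = "def callback({}):\n{}\n".format(argname, indented)
    namespace = {}
    exec(source, {}, namespace)
    return namespace["callback"]


# The body is exactly what you would pass on the command line.
name_callback = make_callback(
    "name", 'return name.replace(b"Wiliam", b"William")'
)
print(name_callback(b"Wiliam Smith"))
```

Because the body becomes the whole function, a body without a return statement yields None.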
Bytestrings Required: git-filter-repo uses bytestrings (bytes), not strings:
  • Use b"text" instead of "text"
  • Compare with b"value" not "value"
  • Use .replace(b"old", b"new")
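The following plain-Python snippet shows why the distinction matters: comparing bytes with str fails silently, while mixing them in an operation raises an error. The variable name branch is only an example.

```python
branch = b"main"

# A bytes value never equals a str value, so this check is silently False.
print(branch == "main")
print(branch == b"main")

# Convert explicitly when the two types have to meet.
print(branch.decode("utf-8") == "main")
print("main".encode("utf-8") == branch)

# Mixing the types in an operation raises TypeError instead.
try:
    branch.replace("main", "trunk")
    mixed_types_rejected = False
except TypeError:
    mixed_types_rejected = True
print(mixed_types_rejected)
```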

Simple Callbacks

Name Callback

Modify author, committer, and tagger names:
git filter-repo --name-callback '
  return name.replace(b"Wiliam", b"William")
'

Email Callback

Fix email addresses:
git filter-repo --email-callback '
  # Fix common typos
  email = email.replace(b".cm", b".com")
  email = email.replace(b"gmial.com", b"gmail.com")
  return email
'

Refname Callback

Modify branch and tag names:
git filter-repo --refname-callback '
  # Add prefix to all branches (refs/heads/main -> refs/heads/v2-main)
  if refname.startswith(b"refs/heads/"):
    branch = refname[11:]  # Remove "refs/heads/"
    return b"refs/heads/v2-" + branch
  return refname
'
Refnames must be fully qualified:
  • Use b"refs/heads/main" not b"main"
  • Use b"refs/tags/v1.0" not b"v1.0"

Filename Callback

Rename or remove files:
git filter-repo --filename-callback '
  # Remove files under any nested src/ directory (a toplevel src/ is kept)
  if b"/src/" in filename:
    return None  # Delete file
  
  # Rename tools/ -> scripts/misc/
  if filename.startswith(b"tools/"):
    return b"scripts/misc/" + filename[6:]
  
  # Keep all other files unchanged
  return filename
'
Return values:
  • filename - Keep file unchanged
  • Modified filename - Rename file
  • None - Remove file from history
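Since the body is ordinary Python, the three return cases can be checked as a regular function before touching a repository. This is a sketch that mirrors the example above; the sample paths are made up.

```python
def filename_callback(filename):
    # Nested src/ directories are removed; a toplevel src/ has no leading slash.
    if b"/src/" in filename:
        return None
    # tools/ is renamed to scripts/misc/
    if filename.startswith(b"tools/"):
        return b"scripts/misc/" + filename[len(b"tools/"):]
    # Everything else is kept unchanged.
    return filename


for path in [b"lib/src/a.c", b"src/main.c", b"tools/build.sh", b"README"]:
    print(path, "->", filename_callback(path))
```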

Message Callback

Modify commit and tag messages:
git filter-repo --message-callback '
  # Add Signed-off-by if missing
  if b"Signed-off-by:" not in message:
    message += b"\nSigned-off-by: Me Myself <[email protected]>"
  
  # Fix typos
  message = re.sub(b"[Ee]-?[Mm][Aa][Ii][Ll]", b"email", message)
  
  return message
'

Object Callbacks

More powerful callbacks that operate on complete git objects.

Blob Callback

Modify file contents:
git filter-repo --blob-callback '
  # Skip blobs over 25 bytes
  if len(blob.data) > 25:
    blob.skip()
  else:
    blob.data = blob.data.replace(b"Hello", b"Goodbye")
'
Blob properties:
  • blob.data - File contents (bytes)
  • blob.original_id - Original git hash
  • blob.id - New git object ID
  • blob.skip() - Remove this blob

Commit Callback

Modify commits:
git filter-repo --commit-callback '
  # Remove executable files with "666" in their name
  commit.file_changes = [
    change for change in commit.file_changes
    if not (change.mode == b"100755" and b"666" in change.filename)
  ]
  
  # Prevent deletion of specific file
  commit.file_changes = [
    change for change in commit.file_changes
    if not (change.type == b"D" and change.filename == b"important.txt")
  ]
  
  # Make all .sh files executable
  for change in commit.file_changes:
    if change.filename.endswith(b".sh"):
      change.mode = b"100755"
'
Commit properties:
  • commit.branch - Branch name (bytes)
  • commit.original_id - Original commit hash
  • commit.author_name, commit.author_email, commit.author_date
  • commit.committer_name, commit.committer_email, commit.committer_date
  • commit.message - Commit message (bytes)
  • commit.parents - List of parent commit IDs
  • commit.file_changes - List of FileChange objects
  • commit.skip(new_id) - Skip this commit
FileChange properties:
  • change.type - b"M" (modify), b"D" (delete), b"DELETEALL"
  • change.filename - Path (bytes)
  • change.mode - File mode: b"100644", b"100755", b"120000", b"160000"
  • change.blob_id - Git blob ID
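The filtering logic from the commit callback can be exercised without git by using a stand-in object. FakeFileChange below is a hypothetical test double that only mimics the fields listed above; it is not the real FileChange class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeFileChange:
    """Hypothetical stand-in exposing the FileChange fields listed above."""
    type: bytes
    filename: bytes
    mode: Optional[bytes] = None
    blob_id: Optional[bytes] = None


def keep_change(change):
    """Return False for deletions of important.txt, True otherwise."""
    return not (change.type == b"D" and change.filename == b"important.txt")


changes = [
    FakeFileChange(b"M", b"run.sh", b"100644", b"abc123"),
    FakeFileChange(b"D", b"important.txt"),
    FakeFileChange(b"D", b"old.log"),
]

kept = [c for c in changes if keep_change(c)]
for c in kept:
    if c.filename.endswith(b".sh"):
        c.mode = b"100755"  # mark shell scripts executable

print([(c.type, c.filename, c.mode) for c in kept])
```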

Tag Callback

Modify annotated tags:
git filter-repo --tag-callback '
  # Skip tags by specific author
  if tag.tagger_name == b"Jim Williams":
    tag.skip()
  else:
    # Add extra info to tag message
    tag.message += b"\n\nTag of %s by %s on %s" % (
      tag.ref, tag.tagger_email, tag.tagger_date
    )
'
Tag properties:
  • tag.ref - Tag name (without refs/tags/ prefix)
  • tag.from_ref - Commit being tagged
  • tag.original_id - Original tag hash
  • tag.tagger_name, tag.tagger_email, tag.tagger_date
  • tag.message - Tag message
  • tag.skip() - Remove this tag

Reset Callback

Modify reset (branch creation) events:
git filter-repo --reset-callback '
  # Rename master branch to main
  reset.ref = reset.ref.replace(b"master", b"main")
'
Reset properties:
  • reset.ref - Reference name
  • reset.from_ref - Commit hash or mark

Advanced Use Cases

Multi-Line Callbacks

Use multi-line Python code:
git filter-repo --filename-callback '
  # Define a mapping
  renames = {
    b"README": b"README.md",
    b"COPYING": b"LICENSE",
    b"AUTHORS": b"CONTRIBUTORS.md",
  }
  
  # Apply renames
  if filename in renames:
    return renames[filename]
  
  # Remove backup files
  if filename.endswith(b".bak") or filename.endswith(b"~"):
    return None
  
  return filename
'

Using Regular Expressions

The re module is available:
git filter-repo --message-callback '
  # Convert issue references: #123 -> JIRA-123
  message = re.sub(b"#(\\d+)", b"JIRA-\\1", message)
  
  # Remove trailing whitespace from each line
  lines = message.split(b"\n")
  lines = [re.sub(b"\\s+$", b"", line) for line in lines]
  message = b"\n".join(lines)
  
  return message
'
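Backslash escaping is easy to get wrong here: inside shell single quotes the text reaches Python unchanged, so b"\n" is a newline while b"\\n" is a literal backslash followed by n. A minimal standalone sketch of the same rewrite, using raw bytes literals (rb"...") for the patterns, can be tested in a plain Python file; the function name rewrite_message is an assumption for illustration.

```python
import re


def rewrite_message(message):
    # Convert issue references: #123 -> JIRA-123
    message = re.sub(rb"#(\d+)", rb"JIRA-\1", message)
    # Strip trailing whitespace from each line; b"\n" is a real newline.
    lines = [re.sub(rb"\s+$", b"", line) for line in message.split(b"\n")]
    return b"\n".join(lines)


print(rewrite_message(b"Fix #42   \nSee also #7\t\n"))
```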

Accessing Metadata

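The commit callback also receives a dictionary of additional metadata. The name it is bound to inside a command-line callback body may differ between git-filter-repo versions; the example below assumes it is available as aux_info, so confirm this against your installed version: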
git filter-repo --commit-callback '
  # aux_info contains:
  # - orig_parents: original parent commit IDs
  # - had_file_changes: whether commit had file changes
  
  # Example: Mark commits that lost all files
  if not commit.file_changes and aux_info["had_file_changes"]:
    commit.message += b"\n\n[Note: All file changes filtered out]"
'

Conditional Processing

git filter-repo --blob-callback '
  # Only process small text files
  if len(blob.data) > 1024 * 1024:  # > 1MB
    return
  
  if b"\0" in blob.data[0:8192]:  # NUL byte suggests a binary file
    return
  
  # Safe to process as text
  blob.data = blob.data.upper()
'
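The two guards can be factored into small helpers and checked on sample data. This is a sketch of the heuristic used above (a NUL byte in the first 8 KiB marks a blob as binary); the helper names and default limits are assumptions.

```python
def looks_binary(data, probe_size=8192):
    """Heuristic: treat data as binary if a NUL byte appears in the probe window."""
    return b"\0" in data[:probe_size]


def should_process(data, max_size=1024 * 1024):
    """Only small, text-like blobs are worth rewriting."""
    return len(data) <= max_size and not looks_binary(data)


print(should_process(b"hello world\n"))
print(should_process(b"\x89PNG\r\n\x1a\n\0\0\0\rIHDR"))
print(should_process(b"a" * (1024 * 1024 + 1)))
```

The heuristic only inspects the probe window, so a NUL byte that first appears later in the data is not detected.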

Combining Callbacks

Use multiple callbacks together:
git filter-repo \
  --name-callback 'return name.title()' \
  --email-callback 'return email.lower()' \
  --filename-callback '
    if filename.endswith(b".tmp"):
      return None
    return filename
  ' \
  --message-callback '
    return message.replace(b"TODO", b"DONE")
  '

Complex Examples

Enforce File Naming Convention

git filter-repo --filename-callback '
  # Lowercase the basename (directory names keep their case)
  parts = filename.split(b"/")
  parts[-1] = parts[-1].lower()
  filename = b"/".join(parts)
  
  # Replace spaces with hyphens
  filename = filename.replace(b" ", b"-")
  
  # Remove special characters (uppercase is allowed so directory names survive)
  filename = re.sub(b"[^A-Za-z0-9/_.-]", b"", filename)
  
  return filename
'

Add File Headers

git filter-repo --blob-callback '
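  # Skip binary files (a NUL byte in the first 8 KiB)
  if b"\0" in blob.data[0:8192]:
    return
  
  # Copyright header to prepend to every text blob.
  # Written on one line so the header text carries no stray indentation.
  header = b"# Copyright (C) 2024 Example Corp\n# Licensed under MIT License\n\n"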
  
  if not blob.data.startswith(b"# Copyright"):
    blob.data = header + blob.data
'

Squash Small Commits

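Skipping a commit reparents its children onto another commit; it does not fold the skipped commit's file changes into them. The example below therefore drops commits with very short messages rather than truly squashing them: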
git filter-repo --commit-callback '
  # Skip commits with tiny messages
  if len(commit.message) < 10:
    commit.skip(commit.first_parent())
'
commit.skip(new_id) marks the commit as skipped and maps its ID to new_id. Children of this commit will use new_id as their parent.
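The remapping can be pictured with a toy model: a dictionary from skipped IDs to their replacements, followed until a kept commit is reached. This is an illustration of the idea under simplified assumptions (string IDs, a hypothetical resolve helper), not how git-filter-repo stores its mappings.

```python
def resolve(commit_id, skipped):
    """Follow skip mappings until reaching a commit that was kept."""
    while commit_id in skipped:
        commit_id = skipped[commit_id]
    return commit_id


# History A <- B <- C <- D, where B and C were each skipped onto their parent.
skipped = {"B": "A", "C": "B"}
parents_of_d = ["C"]

new_parents = [resolve(p, skipped) for p in parents_of_d]
print(new_parents)
```

In this model D ends up with A as its parent, because both intermediate commits were skipped.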

Rewrite Dates

git filter-repo --commit-callback '
  # Make all commits appear to be from 2024
  from datetime import datetime
  
  # Parse existing date
  timestamp, timezone = commit.author_date.split()
  dt = datetime.fromtimestamp(int(timestamp))
  
  # Update year
  new_dt = dt.replace(year=2024)
  new_timestamp = int(new_dt.timestamp())
  
  # Update both author and committer dates
  commit.author_date = b"%d %s" % (new_timestamp, timezone)
  commit.committer_date = commit.author_date
'
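The date arithmetic can be tested in isolation. The sketch below differs from the callback above in two labeled ways: it computes in UTC so the result does not depend on the machine's local timezone, and it falls back to February 28 when the target year has no February 29. The function name set_year is an assumption.

```python
from datetime import datetime, timezone


def set_year(date, year):
    """Move a b"<timestamp> <tz>" date to the given year, computed in UTC."""
    timestamp, tz = date.split()
    moment = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    try:
        moved = moment.replace(year=year)
    except ValueError:
        # Feb 29 does not exist in the target year; use Feb 28 instead.
        moved = moment.replace(year=year, day=28)
    return b"%d %s" % (int(moved.timestamp()), tz)


# 1577836800 is 2020-01-01 00:00:00 UTC.
print(set_year(b"1577836800 +0100", 2024))
```

The timezone offset is carried through unchanged; only the timestamp moves.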

Remove Merge Commits

git filter-repo --commit-callback '
  # Skip merge commits (commits with multiple parents)
  if len(commit.parents) > 1:
    commit.skip(commit.first_parent())
'

Using External Scripts

For very complex logic, use external Python scripts:
git filter-repo --commit-callback "$(cat my_callback.py)"
my_callback.py:
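import json

# Load configuration (note: this runs again for every commit)
with open('filter-config.json', 'rb') as f:
  config = json.load(f)

# JSON strings are str while commit.branch is bytes, so decode before comparing.
# Entries in protected_branches should be full refnames such as "refs/heads/main".
if commit.branch.decode('utf-8') in config['protected_branches']:
  return

# ... more logic ...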
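Values loaded from JSON are str, while filter-repo hands callbacks bytes, so a direct membership test never matches. One way to handle this, sketched here with a hypothetical helper load_protected_branches, is to encode the configured names once and compare bytes to bytes.

```python
import json


def load_protected_branches(config_text):
    """Parse the JSON config and return the branch names as a set of bytes."""
    config = json.loads(config_text)
    return {name.encode("utf-8") for name in config["protected_branches"]}


protected = load_protected_branches(
    '{"protected_branches": ["refs/heads/main", "refs/heads/release"]}'
)

print(b"refs/heads/main" in protected)   # bytes against bytes
print("refs/heads/main" in protected)    # str against bytes never matches
```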

Performance Tips

Optimize Callbacks
  1. Avoid expensive operations in hot paths
  2. Cache results when possible
  3. Short-circuit early if possible
  4. Operate on bytestrings directly rather than decoding to str and re-encoding
# Good: Short-circuit early
if not filename.endswith(b".py"):
  return filename
# ... expensive processing ...

# Bad: Always processes
# ... expensive processing ...
if filename.endswith(b".py"):
  return modified_filename
return filename
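Short-circuiting and caching combine naturally when the callback is an ordinary Python function, for example when filter-repo is driven as a library. The sketch below uses functools.lru_cache from the standard library; expensive_rename is a made-up stand-in for costly work.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def expensive_rename(filename):
    """Stand-in for a costly computation; results are cached per filename."""
    return filename.replace(b"_", b"-")


def filename_callback(filename):
    # Short-circuit: most paths never reach the expensive step.
    if not filename.endswith(b".py"):
        return filename
    return expensive_rename(filename)


results = [filename_callback(p) for p in [b"a_b.py", b"README", b"a_b.py", b"a_b.py"]]
info = expensive_rename.cache_info()
print(results)
print(info.hits, info.misses)
```

The same path shows up in many commits, so the expensive step runs once per distinct filename rather than once per occurrence.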
Callback Errors: If a callback raises an exception, filter-repo will abort. Test thoroughly:
# Test on a small branch first (substitute the callback option you are using)
git filter-repo --refs test-branch --commit-callback '...'

Available Modules

These Python modules are available in callbacks:
  • argparse - Argument parsing
  • collections - Container datatypes
  • fnmatch - Filename pattern matching
  • io - I/O operations
  • os - Operating system interface
  • platform - Platform identification
  • re - Regular expressions
  • shutil - High-level file operations
  • subprocess - Subprocess management
  • sys - System-specific parameters
  • time - Time access
  • textwrap - Text wrapping
  • datetime - Date/time handling
Plus all filter-repo classes:
  • Blob, Commit, Tag, Reset, FileChange
  • FilteringOptions, RepoFilter

API Compatibility Warning

API May Change: The callback API is NOT guaranteed to be stable. If you write scripts that use callbacks:
  1. Pin to a specific git-filter-repo version
  2. Test after any upgrades
  3. Contribute test cases for APIs you rely on
See Library Usage for more stable APIs.

