Overview
Callbacks are functions you provide to RepoFilter that get called when processing different Git objects. They allow you to inspect and modify repository history programmatically.
Callback Types
git-filter-repo supports several types of callbacks, each called at different points during filtering.
Object Callbacks
Called for each Git object (with full object access):
blob_callback - Called for each blob (file content)
commit_callback - Called for each commit
tag_callback - Called for each annotated tag
reset_callback - Called for each branch reset
Field Callbacks
Called for specific fields (simpler, string-based):
filename_callback - Called for each filename
message_callback - Called for commit/tag messages
name_callback - Called for author/committer/tagger names
email_callback - Called for email addresses
refname_callback - Called for branch/tag names
Special Callbacks
file_info_callback - Advanced callback with access to file contents and metadata
done_callback - Called once when filtering completes
Callback Signatures
blob_callback
def blob_callback(blob: Blob, metadata: dict) -> None:
    """Called for each blob."""
    pass
blob: The blob object to process. Modify blob.data to change its contents.
metadata: Dictionary containing commit_rename_func, ancestry_graph, and original_ancestry_graph.
Example:
def blob_callback(blob, metadata):
    # Skip large files (checked first so large binaries are dropped too)
    if len(blob.data) > 5_000_000:
        blob.skip()
        return
    # Leave binary files untouched (NUL byte in the first 8 KiB)
    if b"\0" in blob.data[0:8192]:
        return
    # Replace text in all text files
    blob.data = blob.data.replace(b'TODO', b'DONE')
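Because callbacks are ordinary functions, one way to check a blob_callback before running it on a real repository is to call it with a minimal stand-in object. The FakeBlob class below is hypothetical: it only mimics the two members this callback uses (data and skip()), not git-filter-repo's real Blob class.

```python
class FakeBlob:
    """Hypothetical stand-in for git-filter-repo's Blob: only data and skip()."""

    def __init__(self, data: bytes):
        self.data = data
        self.skipped = False

    def skip(self):
        # Record the call instead of dropping anything from a real stream
        self.skipped = True


def blob_callback(blob, metadata):
    # Skip large files first, so large binaries are covered too
    if len(blob.data) > 5_000_000:
        blob.skip()
        return
    # Leave binary files (NUL byte in the first 8 KiB) untouched
    if b"\0" in blob.data[:8192]:
        return
    blob.data = blob.data.replace(b'TODO', b'DONE')


text = FakeBlob(b'# TODO: refactor\n')
blob_callback(text, {})
print(text.data)  # b'# DONE: refactor\n'

binary = FakeBlob(b'\x89PNG\0TODO')
blob_callback(binary, {})
print(binary.data)  # b'\x89PNG\x00TODO' (unchanged)

big = FakeBlob(b'x' * 6_000_000)
blob_callback(big, {})
print(big.skipped)  # True
```

The same approach works for the other callbacks: build the smallest object that has the attributes your callback reads, then assert on what it writes.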
commit_callback
def commit_callback(commit: Commit, metadata: dict) -> None:
    """Called for each commit."""
    pass
commit: The commit object to process. Modify any attribute.
metadata: Includes commit_rename_func, ancestry_graph, original_ancestry_graph, orig_parents, had_file_changes
Example:
def commit_callback(commit, metadata):
    # Add sign-off to all commits (once, even if other trailers follow it)
    author = f"{commit.author_name.decode()} <{commit.author_email.decode()}>"
    trailer = f"Signed-off-by: {author}".encode()
    if trailer not in commit.message:
        commit.message = commit.message.rstrip() + b"\n\n" + trailer + b"\n"
    # Filter file changes
    commit.file_changes = [
        c for c in commit.file_changes
        if c.filename.startswith(b'src/')
    ]
    # Skip if commit becomes empty
    if not commit.file_changes and commit.parents:
        commit.skip(commit.first_parent())
tag_callback
def tag_callback(tag: Tag, metadata: dict) -> None:
    """Called for each annotated tag."""
    pass
tag: The tag object to process.
Example:
def tag_callback(tag, metadata):
    # Rename version tags
    if tag.ref.startswith(b'v'):
        tag.ref = b'version-' + tag.ref[1:]
    # Update tagger email (placeholder addresses)
    if tag.tagger_email == b'old@example.com':
        tag.tagger_email = b'new@example.com'
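The startswith(b'v') test above also matches tag names such as vendor-import. A slightly stricter sketch, assuming your version tags are "v" followed by a digit (v1.2.3), renames only those; SimpleNamespace stands in for the real Tag object so the function can be tried in isolation.

```python
import re
from types import SimpleNamespace

# Assumption: version tags look like "v" followed by a digit (v1.2.3, v2.0-rc1)
VERSION_TAG = re.compile(br'v(\d.*)')


def tag_callback(tag, metadata):
    match = VERSION_TAG.fullmatch(tag.ref)
    if match:
        tag.ref = b'version-' + match.group(1)


renamed = {}
for name in (b'v1.2.3', b'vendor-import', b'release-1'):
    tag = SimpleNamespace(ref=name)  # stand-in for a real Tag object
    tag_callback(tag, {})
    renamed[name] = tag.ref
print(renamed)
```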
reset_callback
def reset_callback(reset: Reset, metadata: dict) -> None:
    """Called for each branch reset."""
    pass
Example:
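def reset_callback(reset, metadata):
    # Rename master to main in reset commands.
    # Commits carry their own branch name (commit.branch), so this alone
    # does not rename the branch everywhere; refname_callback covers both.
    if reset.ref == b'refs/heads/master':
        reset.ref = b'refs/heads/main'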
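A reset_callback only sees reset commands, while commits record their branch separately, so renaming a branch throughout history is usually easier with a refname_callback, which is called for branch and tag names. A minimal sketch with a hypothetical BRANCH_RENAMES mapping of full ref names:

```python
# Hypothetical mapping of full ref names to rewrite
BRANCH_RENAMES = {
    b'refs/heads/master': b'refs/heads/main',
    b'refs/heads/dev': b'refs/heads/develop',
}


def refname_callback(refname):
    # Unlisted refs (other branches, tags) pass through unchanged
    return BRANCH_RENAMES.get(refname, refname)


print(refname_callback(b'refs/heads/master'))  # b'refs/heads/main'
print(refname_callback(b'refs/tags/v1.0'))     # b'refs/tags/v1.0'
```

Pass it as refname_callback=refname_callback when constructing RepoFilter.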
filename_callback
def filename_callback(filename: bytes) -> bytes | None:
    """Called for each filename. Return None to exclude file."""
    pass
Returns: Modified filename (bytes) or None to exclude the file
Example:
def filename_callback(filename):
    # Exclude build artifacts
    if filename.endswith(b'.pyc') or filename.endswith(b'.o'):
        return None
    # Rename directory
    if filename.startswith(b'old_src/'):
        return b'src/' + filename[8:]
    return filename
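When there are several exclusions and renames, a table-driven variant keeps the rules in one place and avoids hard-coded slice offsets. The EXCLUDED_SUFFIXES and DIRECTORY_RENAMES values below are hypothetical examples; the first matching rename wins.

```python
# Hypothetical configuration: suffixes to drop and directory renames to apply
EXCLUDED_SUFFIXES = (b'.pyc', b'.o')
DIRECTORY_RENAMES = [
    (b'old_src/', b'src/'),
    (b'docs_old/', b'docs/'),
]


def filename_callback(filename):
    # bytes.endswith accepts a tuple, so one call covers every suffix
    if filename.endswith(EXCLUDED_SUFFIXES):
        return None
    for old_prefix, new_prefix in DIRECTORY_RENAMES:
        if filename.startswith(old_prefix):
            # Slice by len() instead of a hard-coded offset
            return new_prefix + filename[len(old_prefix):]
    return filename
```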
message_callback
def message_callback(message: bytes) -> bytes:
    """Called for commit and tag messages."""
    pass
Example:
import re
def message_callback(message):
    # Remove JIRA ticket references
    message = re.sub(br'\[?PROJ-\d+\]?:?\s*', b'', message)
    # Normalize line endings
    message = message.replace(b'\r\n', b'\n')
    return message
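One caveat with the pattern above: \s* also matches newlines, so a ticket at the end of a line (for example b'Fix bug PROJ-12\n\nDetails') takes the following blank line with it. A variant that only consumes spaces and tabs, sketched here with the same PROJ prefix, keeps the message layout intact and then trims spaces left at line ends.

```python
import re

# Horizontal whitespace only, so line breaks after a ticket survive
TICKET_RE = re.compile(br'\[?PROJ-\d+\]?:?[ \t]*')
TRAILING_WS_RE = re.compile(br'[ \t]+$', re.MULTILINE)


def message_callback(message):
    message = message.replace(b'\r\n', b'\n')
    message = TICKET_RE.sub(b'', message)
    # Tidy spaces left at line ends where a ticket was removed
    return TRAILING_WS_RE.sub(b'', message)


print(message_callback(b'Fix bug PROJ-12\n\nDetails here\n'))  # b'Fix bug\n\nDetails here\n'
print(message_callback(b'[PROJ-123]: Fix login\r\n'))          # b'Fix login\n'
```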
name_callback
def name_callback(name: bytes) -> bytes:
    """Called for author, committer, and tagger names."""
    pass
Example:
import re
def name_callback(name):
    # Normalize a misspelled first name; \b avoids touching "Jonathan" or "Jones"
    return re.sub(br'\bJon\b', b'John', name)
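Substring replacement on names is easy to get wrong, so an exact-match lookup is often safer when you know the specific spellings to fix. A sketch with a hypothetical NAME_FIXES table (for larger mappings, git-filter-repo's --mailmap option may be a better fit):

```python
# Hypothetical mapping from old author names to corrected ones
NAME_FIXES = {
    b'Jon Smith': b'John Smith',
    b'jdoe': b'Jane Doe',
}


def name_callback(name):
    # Exact-match lookup leaves every other name (e.g. b'Jonathan Lee') untouched
    return NAME_FIXES.get(name, name)
```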
email_callback
def email_callback(email: bytes) -> bytes:
    """Called for all email addresses."""
    pass
Example:
def email_callback(email):
    # Update company domain
    if email.endswith(b'@oldcompany.com'):
        return email.replace(b'@oldcompany.com', b'@newcompany.com')
    return email
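Email addresses in old history are not always consistently cased. A sketch that compares the domain case-insensitively and supports several domains at once (the domains in DOMAIN_MAP are placeholders):

```python
# Placeholder domains: old domain (lowercase) -> new domain
DOMAIN_MAP = {
    b'oldcompany.com': b'newcompany.com',
    b'legacy.example.org': b'example.org',
}


def email_callback(email):
    local, sep, domain = email.rpartition(b'@')
    if not sep:
        return email  # no '@' at all, leave as-is
    new_domain = DOMAIN_MAP.get(domain.lower())
    if new_domain is None:
        return email
    return local + b'@' + new_domain
```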
refname_callback
def refname_callback(refname: bytes) -> bytes:
    """Called for branch and tag references."""
    pass
Example:
def refname_callback(refname):
    # Add prefix to all branches
    if refname.startswith(b'refs/heads/'):
        branch = refname[11:]  # Remove 'refs/heads/'
        return b'refs/heads/team1-' + branch
    return refname
file_info_callback
Advanced callback with access to file contents and utilities.
def file_info_callback(
    filename: bytes,
    mode: bytes,
    blob_id: int | bytes,
    value: FileInfoValueHelper
) -> tuple[bytes, bytes, int | bytes]:
    """Process file with access to contents."""
    pass
mode: File mode (b'100644', b'100755', b'120000', b'160000')
value (FileInfoValueHelper, required): Helper object with utility methods (see below)
Returns: (filename, mode, blob_id) tuple, or (filename, None, None) to delete, or (None, ...) to exclude
FileInfoValueHelper Methods
get_contents_by_identifier(blob_id)
Retrieve blob contents by mark or hash. Returns bytes or None.
get_size_by_identifier(blob_id)
Get blob size without reading contents. Returns int.
insert_file_with_contents(contents)
Create new blob with given contents. Returns new blob_id.
is_binary(contents)
Check if contents appear to be binary. Returns bool.
apply_replace_text(contents)
Apply text replacements from --replace-text. Returns modified bytes.
Custom data storage for passing state between callbacks
Example:
import os
import subprocess
import tempfile
def file_info_callback(filename, mode, blob_id, value):
    # Only process regular Python files (skip symlinks and submodules)
    if not filename.endswith(b'.py') or mode not in (b'100644', b'100755'):
        return (filename, mode, blob_id)
    # Get file contents
    contents = value.get_contents_by_identifier(blob_id)
    if contents is None:
        return (filename, mode, blob_id)
    # Skip if binary
    if value.is_binary(contents):
        return (filename, mode, blob_id)
    # Format with black (example; requires black on PATH)
    with tempfile.NamedTemporaryFile(suffix='.py', delete=False) as f:
        f.write(contents)
        temp_path = f.name
    try:
        subprocess.run(['black', '--quiet', temp_path], check=True)
        with open(temp_path, 'rb') as f:
            new_contents = f.read()
    finally:
        os.unlink(temp_path)
    # Insert modified blob only if black changed something
    if new_contents != contents:
        new_blob_id = value.insert_file_with_contents(new_contents)
        return (filename, mode, new_blob_id)
    return (filename, mode, blob_id)
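To try file_info_callback logic without a real repository or an external tool, you can drive it with a stand-in helper. Everything here is hypothetical: FakeValueHelper only imitates the three FileInfoValueHelper methods the callback calls, its is_binary check is a simplified NUL-byte heuristic that may differ from the real one, and the transformation strips trailing whitespace instead of running black.

```python
import re

TRAILING_WS = re.compile(br'[ \t]+$', re.MULTILINE)


class FakeValueHelper:
    """Hypothetical stand-in for FileInfoValueHelper, backed by a dict."""

    def __init__(self, blobs):
        self.blobs = dict(blobs)
        self.next_id = 1000

    def get_contents_by_identifier(self, blob_id):
        return self.blobs.get(blob_id)

    def is_binary(self, contents):
        # Simplified heuristic for this sketch; the real helper may differ
        return b'\0' in contents[:8192]

    def insert_file_with_contents(self, contents):
        self.next_id += 1
        self.blobs[self.next_id] = contents
        return self.next_id


def file_info_callback(filename, mode, blob_id, value):
    # Only regular Python files; symlinks and submodules pass through
    if not filename.endswith(b'.py') or mode not in (b'100644', b'100755'):
        return (filename, mode, blob_id)
    contents = value.get_contents_by_identifier(blob_id)
    if contents is None or value.is_binary(contents):
        return (filename, mode, blob_id)
    new_contents = TRAILING_WS.sub(b'', contents)
    if new_contents == contents:
        return (filename, mode, blob_id)
    return (filename, mode, value.insert_file_with_contents(new_contents))


helper = FakeValueHelper({1: b'x = 1   \nprint(x)\n', 2: b'README\n'})
changed = file_info_callback(b'app.py', b'100644', 1, helper)
untouched = file_info_callback(b'README.md', b'100644', 2, helper)
print(changed)              # (b'app.py', b'100644', 1001)
print(helper.blobs[1001])   # b'x = 1\nprint(x)\n'
print(untouched)            # (b'README.md', b'100644', 2)
```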
done_callback
def done_callback() -> None:
    """Called once when filtering completes."""
    pass
Example:
import git_filter_repo as fr
stats = {'count': 0}
def commit_callback(commit, metadata):
    stats['count'] += 1
def done_callback():
    print(f"Processed {stats['count']} commits")
args = fr.FilteringOptions.parse_args(['--force'])
filter = fr.RepoFilter(
    args,
    commit_callback=commit_callback,
    done_callback=done_callback
)
filter.run()
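Instead of a module-level dictionary, the state can live in a closure so that each pair of callbacks gets its own counter. A sketch with a hypothetical make_commit_counter factory; SimpleNamespace stands in for real commits, and the return value of done_callback exists only to make the sketch easy to check.

```python
from types import SimpleNamespace


def make_commit_counter():
    """Return (commit_callback, done_callback) sharing a private counter."""
    count = 0

    def commit_callback(commit, metadata):
        nonlocal count
        count += 1

    def done_callback():
        print(f"Processed {count} commits")
        return count  # returned only so the sketch is easy to check

    return commit_callback, done_callback


commit_callback, done_callback = make_commit_counter()
for _ in range(3):
    commit_callback(SimpleNamespace(), {})
done_callback()  # prints "Processed 3 commits"
```

With a real filter, pass the two functions as commit_callback= and done_callback= to RepoFilter.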
Object callbacks (blob, commit, tag, reset) receive a metadata dictionary with the following entries:
commit_rename_func
Function to translate old commit hashes to new ones.
import re
# Matches the trailer written by `git cherry-pick -x`
CHERRY_PICK_RE = re.compile(br'cherry picked from commit ([0-9a-f]{40})')
def commit_callback(commit, metadata):
    # Get translation function
    translate = metadata['commit_rename_func']
    # Translate each cherry-picked hash in the commit message
    for old_hash in CHERRY_PICK_RE.findall(commit.message):
        new_hash = translate(old_hash)
        commit.message = commit.message.replace(old_hash, new_hash)
ancestry_graph
Graph of commit ancestry in the filtered repository.
def commit_callback(commit, metadata):
    graph = metadata['ancestry_graph']
    # Check ancestry relationships
    if commit.parents:
        parent_id = commit.parents[0]
        # graph has methods like is_ancestor(possible_ancestor, commit)
original_ancestry_graph
Graph of commit ancestry in the original repository.
def commit_callback(commit, metadata):
    orig_graph = metadata['original_ancestry_graph']
    # Get original parents
    orig_parents = metadata['orig_parents']
    # Check if was originally a merge
    if len(orig_parents) >= 2:
        print(f"Commit {commit.original_id} was a merge")
orig_parents
Original parent commits before filtering (commit_callback only).
def commit_callback(commit, metadata):
    orig_parents = metadata['orig_parents']
    current_parents = commit.parents
    if len(orig_parents) != len(current_parents):
        print("Parents were pruned")
had_file_changes
Whether commit originally had file changes (commit_callback only).
def commit_callback(commit, metadata):
    if metadata['had_file_changes'] and not commit.file_changes:
        print(f"Commit {commit.original_id} became empty")
Common Patterns
Lint History
Run a linter or formatter (black in this example) on every Python file in history:
import os
import subprocess
import tempfile
import git_filter_repo as fr
blobs_handled = {}  # original blob id -> blob id to use instead
def commit_callback(commit, metadata):
    for change in commit.file_changes:
        # Reuse the result if this blob was already processed
        if change.blob_id in blobs_handled:
            change.blob_id = blobs_handled[change.blob_id]
            continue
        if change.type == b'D':
            continue
        # Only process Python files
        if not change.filename.endswith(b'.py'):
            continue
        # Get contents via git cat-file
        cmd = ['git', 'cat-file', 'blob', change.blob_id]
        contents = subprocess.check_output(cmd)
        # Write to temp file
        with tempfile.NamedTemporaryFile(suffix='.py', delete=False) as f:
            f.write(contents)
            temp_path = f.name
        try:
            # Run linter
            subprocess.run(['black', '--quiet', temp_path], check=True)
            # Read modified contents
            with open(temp_path, 'rb') as f:
                new_contents = f.read()
        finally:
            os.unlink(temp_path)
        new_id = change.blob_id
        # Create a new blob only if the linter changed something
        if new_contents != contents:
            blob = fr.Blob(new_contents)
            filter.insert(blob)
            new_id = blob.id
        # Record the result so unchanged blobs are not linted again
        blobs_handled[change.blob_id] = new_id
        change.blob_id = new_id
args = fr.FilteringOptions.parse_args(['--force'])
filter = fr.RepoFilter(args, commit_callback=commit_callback)
filter.run()
Add File to Beginning
Insert a file into all root commits:
import subprocess
import git_filter_repo as fr
# Hash the file into git's object database
file_hash = subprocess.check_output(
    ['git', 'hash-object', '-w', 'LICENSE']
).strip()
def commit_callback(commit, metadata):
    if len(commit.parents) == 0:  # Root commit
        commit.file_changes.append(
            fr.FileChange(b'M', b'LICENSE', file_hash, b'100644')
        )
Remove Signed-off-by Lines
import re
def message_callback(message):
    # Remove all Signed-off-by lines
    message = re.sub(
        br'^\s*Signed-off-by:.*$',
        b'',
        message,
        flags=re.MULTILINE
    )
    # Clean up extra blank lines
    message = re.sub(br'\n\n+', b'\n\n', message)
    return message.strip() + b'\n'
Track Statistics
stats = {
    'commits': 0,
    'empty_commits_removed': 0,
    'blobs_modified': 0,
    'total_size_removed': 0
}
def blob_callback(blob, metadata):
    original_size = len(blob.data)
    # Replace sensitive data
    blob.data = blob.data.replace(b'SECRET_KEY', b'***')
    if len(blob.data) != original_size:
        stats['blobs_modified'] += 1
        stats['total_size_removed'] += original_size - len(blob.data)
def commit_callback(commit, metadata):
    stats['commits'] += 1
    if not commit.file_changes and commit.parents:
        stats['empty_commits_removed'] += 1
        commit.skip(commit.first_parent())
def done_callback():
    print("\n=== Statistics ===")
    print(f"Commits processed: {stats['commits']}")
    print(f"Empty commits removed: {stats['empty_commits_removed']}")
    print(f"Blobs modified: {stats['blobs_modified']}")
    print(f"Total size removed: {stats['total_size_removed']} bytes")
Combining Multiple Callbacks
You can use multiple callbacks together:
import git_filter_repo as fr
def my_filename_callback(filename):
    # Rename directories
    if filename.startswith(b'old_name/'):
        return b'new_name/' + filename[9:]
    return filename
def my_message_callback(message):
    # Add prefix to all messages
    return b'[Migrated] ' + message
def my_commit_callback(commit, metadata):
    # Update author emails
    if commit.author_email.endswith(b'@old.com'):
        commit.author_email = commit.author_email.replace(
            b'@old.com', b'@new.com'
        )
def my_done_callback():
    print("Filtering complete!")
args = fr.FilteringOptions.parse_args(['--force'])
filter = fr.RepoFilter(
    args,
    filename_callback=my_filename_callback,
    message_callback=my_message_callback,
    commit_callback=my_commit_callback,
    done_callback=my_done_callback
)
filter.run()
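RepoFilter takes one function per callback keyword, so independent rules of the same kind have to be combined into a single callable. One option is a small helper (the name chain_filename_callbacks is hypothetical) that runs filename rules in order and stops as soon as one of them excludes the file:

```python
def chain_filename_callbacks(*callbacks):
    """Compose filename callbacks; a None result from any of them excludes the file."""

    def combined(filename):
        for callback in callbacks:
            filename = callback(filename)
            if filename is None:
                return None
        return filename

    return combined


def drop_build_artifacts(filename):
    return None if filename.endswith((b'.pyc', b'.o')) else filename


def rename_old_dir(filename):
    if filename.startswith(b'old_name/'):
        return b'new_name/' + filename[len(b'old_name/'):]
    return filename


filename_callback = chain_filename_callbacks(drop_build_artifacts, rename_old_dir)
print(filename_callback(b'old_name/main.c'))  # b'new_name/main.c'
print(filename_callback(b'old_name/main.o'))  # None
```

The combined function can then be passed as filename_callback=filename_callback. Order matters: each rule sees the name produced by the previous one.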
Best Practices
- Start Simple: Begin with field callbacks (filename, message, name, email) before moving to object callbacks.
- Test on Small Repos: Test your callbacks on a small test repository first.
- Handle Encoding: Filenames, messages, names, emails, and ref names are all bytes, not str.
- Be Careful with skip(): Skipping a commit changes its children’s parents.
- Use file_info_callback for Content: When you need both a file's name and its contents, use file_info_callback instead of blob_callback, which never sees filenames.
- Track State: Use module-level or closure variables to track state across callbacks.
- Check for None: File operations can return None (e.g., when blobs are stripped).
- Preserve Metadata: Don’t forget to update commit messages, dates, and other metadata as needed.
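Because these values are bytes and are not guaranteed to be valid UTF-8 (older history sometimes uses other encodings), a bare .decode() can raise UnicodeDecodeError. A defensive sketch for logging or comparisons; the helper name as_text is hypothetical:

```python
def as_text(value):
    """Decode bytes for display; undecodable bytes become U+FFFD instead of raising."""
    if value is None:
        return ''
    return value.decode('utf-8', errors='replace')


print(as_text(b'Ren\xc3\xa9'))  # 'René'
print(as_text(b'Ren\xe9'))      # 'Ren\ufffd' (a Latin-1 byte is not valid UTF-8)
print(repr(as_text(None)))      # ''
```

Use it only for display or matching; keep writing bytes back into commits so the original data is not altered by the decoding step.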