Overview

The git-filter-repo library represents Git objects as Python classes. These objects are created when parsing fast-export output and can be modified by your callbacks before being written to fast-import.

Blob

Represents file content (a Git blob object).
blob = fr.Blob(data, original_id=None)
data
bytes
required
The file content as bytes
original_id
bytes
The original Git hash of the blob (from fast-export’s original-oid)

Attributes

id
int
The mark (internal ID) for this blob in the fast-import stream
old_id
int
The original mark from fast-export, if different from id
original_id
bytes
The Git object hash (40-character hex string)
data
bytes
The blob content. Modify this to change file contents.
dumped
int
Whether this object has been written (0=not yet, 1=written, 2=skipped)

Methods

dump(file_)
method
Writes the blob to the fast-import stream
skip(new_id=None)
method
Marks this blob to be skipped (not written to output)

Example

def process_blob(blob, metadata):
    # Modify text files only
    if b"\0" not in blob.data[0:8192]:
        # Replace sensitive data
        blob.data = blob.data.replace(b'SECRET_KEY', b'***REDACTED***')
    
    # Skip large blobs
    if len(blob.data) > 10_000_000:
        blob.skip()
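
The text-versus-binary check in the example above can be factored into small helpers that operate on plain bytes, which makes the logic easy to test without a repository. This is a minimal sketch: `looks_binary` and `redact` are hypothetical helper names, not part of git-filter-repo, and the NUL-byte probe is only a heuristic.

```python
def looks_binary(data: bytes, probe_size: int = 8192) -> bool:
    """Heuristic: a NUL byte in the first probe_size bytes suggests binary content."""
    return b"\0" in data[:probe_size]


def redact(data: bytes, secret: bytes, replacement: bytes = b"***REDACTED***") -> bytes:
    """Replace every occurrence of secret in text-like data; return binary data unchanged."""
    if looks_binary(data):
        return data
    return data.replace(secret, replacement)
```

Inside a blob callback, the result would be assigned back with something like `blob.data = redact(blob.data, b'SECRET_KEY')`.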

FileChange

Represents a file change within a commit (modify, delete, or delete-all).
change = fr.FileChange(type_, filename=None, id_=None, mode=None)
type_
bytes
required
The type of change: b'M' (modify), b'D' (delete), or b'DELETEALL'
filename
bytes
The file path (required for M and D, None for DELETEALL)
id_
int | bytes
The blob ID (mark or hash). Required for M type.
mode
bytes
File mode: b'100644' (regular), b'100755' (executable), b'120000' (symlink), or b'160000' (submodule)
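
The four mode values listed above can be kept in a lookup table so that callbacks compare against named constants rather than repeating byte literals. This is an illustrative sketch only: `MODE_DESCRIPTIONS`, `describe_mode`, and `is_submodule` are hypothetical names, not part of git-filter-repo.

```python
# File modes as listed in the parameter description above.
MODE_DESCRIPTIONS = {
    b"100644": "regular file",
    b"100755": "executable file",
    b"120000": "symbolic link",
    b"160000": "submodule",
}


def describe_mode(mode: bytes) -> str:
    """Return a human-readable label for a mode, or 'unknown mode' if it is not listed."""
    return MODE_DESCRIPTIONS.get(mode, "unknown mode")


def is_submodule(mode: bytes) -> bool:
    """True when the mode marks a submodule entry rather than file content."""
    return mode == b"160000"
```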

Attributes

type
bytes
The change type (b'M', b'D', or b'DELETEALL')
filename
bytes
The file path relative to repository root
mode
bytes
The file mode (file type and permission bits, e.g. b'100644')
blob_id
int | bytes
The blob mark or hash. None means the blob was filtered out.

Example

import git_filter_repo as fr

def add_file_to_commit(commit, metadata):
    # blob_id must already be defined by the caller: either the mark of a
    # blob previously added to the stream or the hash of an existing blob.

    # Add a new file
    commit.file_changes.append(
        fr.FileChange(b'M', b'README.md', blob_id, b'100644')
    )

    # Delete a file
    commit.file_changes.append(
        fr.FileChange(b'D', b'secrets.txt')
    )

    # Filter out changes to .pyc files
    commit.file_changes = [
        c for c in commit.file_changes
        if not c.filename.endswith(b'.pyc')
    ]

Commit

Represents a Git commit with all its metadata and file changes.
commit = fr.Commit(
    branch,
    author_name, author_email, author_date,
    committer_name, committer_email, committer_date,
    message,
    file_changes,
    parents,
    original_id=None,
    encoding=None
)
branch
bytes
required
The branch name (e.g., b'refs/heads/main')
author_name
bytes
required
Author’s name
author_email
bytes
required
Author’s email
author_date
bytes
required
Author date in format: b'1234567890 +0000'
committer_name
bytes
required
Committer’s name
committer_email
bytes
required
Committer’s email
committer_date
bytes
required
Committer date in format: b'1234567890 +0000'
message
bytes
required
The commit message
file_changes
list[FileChange]
required
List of FileChange objects
parents
list
required
List of parent commit IDs (marks or hashes)
original_id
bytes
Original Git commit hash
encoding
bytes
Commit message encoding (None implies UTF-8)

Attributes

All constructor parameters are available as attributes, plus:
id
int
The mark for this commit
old_id
int
The original mark from fast-export

Methods

first_parent()
method
Returns the first parent commit ID, or None if no parents
skip(new_id=None)
method
Marks this commit to be skipped (not written). When new_id is given, commits that listed this commit as a parent use new_id as their parent instead.
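
The parent rewriting described for skip() can be pictured with a small stand-alone model. This is a conceptual sketch, not git-filter-repo's internal code: the `skipped` mapping and both function names are hypothetical, and commits are represented by plain integer marks.

```python
from typing import Dict, List, Optional


def resolve_parent(parent: int, skipped: Dict[int, Optional[int]]) -> Optional[int]:
    """Follow skip records until reaching a commit that was kept (or None)."""
    seen = set()
    current: Optional[int] = parent
    while current is not None and current in skipped:
        if current in seen:
            raise ValueError("cycle in skip records")
        seen.add(current)
        current = skipped[current]
    return current


def rewrite_parents(parents: List[int], skipped: Dict[int, Optional[int]]) -> List[int]:
    """Remap each parent, dropping parents that resolve to nothing and duplicates."""
    result: List[int] = []
    for parent in parents:
        replacement = resolve_parent(parent, skipped)
        if replacement is not None and replacement not in result:
            result.append(replacement)
    return result
```

In this model, skipping commit 3 in favor of 2 and commit 2 in favor of 1 means a child that listed 3 as its parent ends up with 1.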

Example

def process_commit(commit, metadata):
    # Modify commit message
    commit.message = commit.message + b'\n\nProcessed by filter-repo'
    
    # Update author (placeholder addresses; the originals were obscured on the source page)
    if commit.author_email == b'old@example.com':
        commit.author_email = b'new@example.com'
    
    # Skip empty commits
    if not commit.file_changes and commit.parents:
        commit.skip(commit.first_parent())
    
    # Only keep changes to specific directory
    commit.file_changes = [
        c for c in commit.file_changes
        if c.filename.startswith(b'src/')
    ]

Tag

Represents an annotated tag.
tag = fr.Tag(
    ref, from_ref,
    tagger_name, tagger_email, tagger_date, tag_msg,
    original_id=None
)
ref
bytes
required
Tag name (without refs/tags/ prefix)
from_ref
int | bytes
required
The commit this tag points to (mark or hash)
tagger_name
bytes
Tagger’s name
tagger_email
bytes
Tagger’s email
tagger_date
bytes
Tag date in format: b'1234567890 +0000'
tag_msg
bytes
required
The tag message (read and modified through the message attribute, as in the example below)
original_id
bytes
Original Git tag hash

Example

def process_tag(tag, metadata):
    # Rename tags
    if tag.ref.startswith(b'v'):
        tag.ref = b'version-' + tag.ref[1:]
    
    # Update tag message
    tag.message = tag.message.replace(b'Release', b'Version')
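
The prefix test in the example above also matches tag names such as b'vendor-drop'. A stricter variant, sketched here on plain bytes, renames a tag only when a digit follows the leading v. `rename_version_tag` is a hypothetical helper, not part of git-filter-repo.

```python
def rename_version_tag(ref: bytes) -> bytes:
    """Turn b'v1.2.0' into b'version-1.2.0'; leave every other tag name unchanged."""
    # ref[1:2] is an empty bytes object when ref is exactly b'v', and
    # b''.isdigit() is False, so a bare b'v' is left alone as well.
    if ref.startswith(b"v") and ref[1:2].isdigit():
        return b"version-" + ref[1:]
    return ref
```

Inside a tag callback, this would be applied as `tag.ref = rename_version_tag(tag.ref)`.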

Reset

Represents a branch creation or reset.
reset = fr.Reset(ref, from_ref=None)
ref
bytes
required
The branch reference (e.g., b'refs/heads/main')
from_ref
int | bytes
The commit to reset/point the branch to (mark or hash)

Example

def process_reset(reset, metadata):
    # Rename branches
    if reset.ref == b'refs/heads/master':
        reset.ref = b'refs/heads/main'

Progress

Represents a progress message that fast-import can display.
progress = fr.Progress(message)
message
bytes
required
The progress message

Checkpoint

Represents a checkpoint directive for fast-import (forces writing current state).
checkpoint = fr.Checkpoint()
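
For orientation, the two directives correspond to simple one-line commands in the fast-import stream format (`progress <message>` and `checkpoint`). The sketch below writes them to any binary stream. It illustrates the stream syntax only and is not the library's own dump code; `write_progress` and `write_checkpoint` are hypothetical names.

```python
import io


def write_progress(stream, message: bytes) -> None:
    """Emit a fast-import progress command; the message must fit on one line."""
    if b"\n" in message:
        raise ValueError("progress messages must not contain newlines")
    stream.write(b"progress " + message + b"\n")


def write_checkpoint(stream) -> None:
    """Emit a fast-import checkpoint command followed by the optional blank line."""
    stream.write(b"checkpoint\n\n")


# Usage: collect the commands in an in-memory buffer.
buffer = io.BytesIO()
write_progress(buffer, b"halfway there")
write_checkpoint(buffer)
```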

Date Utilities

Helper functions for working with Git date formats.

string_to_date

Parse a Git date string into a Python datetime object.
date_obj = fr.string_to_date(b'1234567890 +0000')
datestring
bytes
required
Git date format: b'<unix_timestamp> <timezone_offset>'
Returns a datetime object with timezone information.

date_to_string

Convert a datetime object to Git date format.
date_str = fr.date_to_string(date_obj)
dateobj
datetime
required
A datetime object with timezone information
Returns bytes in format: b'<unix_timestamp> <timezone_offset>'
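
To make the `b'<unix_timestamp> <timezone_offset>'` format concrete, here is a standard-library-only sketch of the same round trip. It is not the library's implementation: `parse_git_date` and `format_git_date` are hypothetical names, and the sketch assumes a well-formed offset of the form +HHMM or -HHMM.

```python
from datetime import datetime, timedelta, timezone


def parse_git_date(datestring: bytes) -> datetime:
    """Parse b'<unix_timestamp> <+HHMM|-HHMM>' into a timezone-aware datetime."""
    timestamp, offset = datestring.split()
    sign = -1 if offset.startswith(b"-") else 1
    digits = offset.lstrip(b"+-")
    delta = timedelta(hours=int(digits[:2]), minutes=int(digits[2:4]))
    return datetime.fromtimestamp(int(timestamp), timezone(sign * delta))


def format_git_date(dateobj: datetime) -> bytes:
    """Format a timezone-aware datetime as b'<unix_timestamp> <+HHMM|-HHMM>'."""
    offset = dateobj.utcoffset()
    if offset is None:
        raise ValueError("a timezone-aware datetime is required")
    total_minutes = int(offset.total_seconds()) // 60
    sign = "-" if total_minutes < 0 else "+"
    hours, minutes = divmod(abs(total_minutes), 60)
    text = f"{int(dateobj.timestamp())} {sign}{hours:02d}{minutes:02d}"
    return text.encode("ascii")
```

The sign applies to the whole offset, so b'-0530' means five and a half hours behind UTC.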

Example

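import git_filter_repo as fr

def update_dates(commit, metadata):
    # Parse existing date
    author_date = fr.string_to_date(commit.author_date)

    # Modify (e.g., move to UTC). fr.FixedTimeZone is used on the assumption
    # that date_to_string needs a tzinfo whose tzname() returns an offset
    # such as b'+0000'; datetime.timezone.utc reports the string 'UTC' instead.
    utc_date = author_date.astimezone(fr.FixedTimeZone(b'+0000'))

    # Convert back
    commit.author_date = fr.date_to_string(utc_date)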
