Introduction

git-filter-repo can be used as a Python library for complex repository filtering operations. This allows you to programmatically manipulate Git history with full control over commits, blobs, tags, and other Git objects.
API backward compatibility caveat: Programs using git-filter-repo as a library can reach into its internals, but backward compatibility of all APIs is not guaranteed. Since repository filtering is typically a one-shot operation, this should not be a problem in practice. If you want to re-use a filtering program, either use the same version of git and git-filter-repo, or make sure to re-test it.

Installation

To use git-filter-repo as a library, you need to make it importable in Python:
import git_filter_repo as fr
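Python cannot import a file named git-filter-repo directly, so the script must be available under the name git_filter_repo.py. Use one of the following:
  • Creating a symlink named git_filter_repo.py that points to git-filter-repo
  • Renaming or copying git-filter-repo to git_filter_repo.py
In either case, the directory containing git_filter_repo.py must be on your PYTHONPATH (or otherwise on sys.path).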
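As a minimal sketch of the symlink option, the helper below creates a link named git_filter_repo.py inside a directory of your choice and points it at an existing git-filter-repo script. The name `link_as_module` and its parameters are hypothetical and not part of git-filter-repo. Only standard-library calls are used, and creating symlinks may require extra privileges on Windows.

```python
import os
from pathlib import Path


def link_as_module(script_path, target_dir):
    """Create target_dir/git_filter_repo.py as a symlink to script_path.

    Hypothetical helper for illustration; not part of git-filter-repo.
    """
    script = Path(script_path).resolve()
    if not script.is_file():
        raise FileNotFoundError(script)
    link = Path(target_dir) / "git_filter_repo.py"
    if link.is_symlink() or link.exists():
        return link  # leave any existing file or link untouched
    os.symlink(script, link)
    return link
```

After the link exists, adding `target_dir` to PYTHONPATH should let `import git_filter_repo as fr` resolve to the script.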

Basic Usage

The simplest program that behaves identically to the command-line tool:
import sys
import git_filter_repo as fr

args = fr.FilteringOptions.parse_args(sys.argv[1:])
if args.analyze:
    fr.RepoAnalyze.run(args)
else:
    filter = fr.RepoFilter(args)
    filter.run()

Core Components

The library exports several key classes and functions:

Data Structures

  • Blob - Represents file content
  • Commit - Represents a commit with metadata and file changes
  • Tag - Represents an annotated tag
  • Reset - Represents branch creation/reset
  • FileChange - Represents a file modification, deletion, or addition
  • Progress - Progress messages for fast-import
  • Checkpoint - Checkpointing directives for fast-import

Processing

  • FastExportParser - Parses git fast-export output
  • RepoFilter - Main filtering engine
  • FilteringOptions - Command-line argument parsing
  • ProgressWriter - Progress output handling

Utilities

  • GitUtils - Git repository utilities
  • string_to_date - Parse git date format
  • date_to_string - Convert to git date format
  • record_id_rename - Record mark translations
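
To make the date helpers less abstract, here is a stand-alone sketch of the raw date format used in fast-export streams, assumed here to be `<seconds since epoch> <+HHMM or -HHMM>` as bytes. The functions `parse_raw_git_date` and `format_raw_git_date` are hypothetical illustrations written with the standard library; they are not the library's `string_to_date` and `date_to_string`, whose exact behavior should be checked against the version you use.

```python
from datetime import datetime, timedelta, timezone


def parse_raw_git_date(raw):
    """Parse bytes such as b'1234567890 +0100' into an aware datetime."""
    seconds, tz = raw.split()
    sign = -1 if tz.startswith(b'-') else 1
    hours, minutes = int(tz[1:3]), int(tz[3:5])
    offset = timezone(sign * timedelta(hours=hours, minutes=minutes))
    return datetime.fromtimestamp(int(seconds), offset)


def format_raw_git_date(dt):
    """Convert an aware datetime back to b'<seconds> <+HHMM|-HHMM>'."""
    total_minutes = int(dt.utcoffset().total_seconds() // 60)
    sign = b'-' if total_minutes < 0 else b'+'
    total_minutes = abs(total_minutes)
    return b'%d %s%02d%02d' % (
        int(dt.timestamp()), sign, total_minutes // 60, total_minutes % 60)
```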

Common Use Cases

Simple Callback Example

Modify all commit messages:
import git_filter_repo as fr

def my_commit_callback(commit, metadata):
    # Modify commit message
    commit.message = commit.message.replace(b'old', b'new')

args = fr.FilteringOptions.parse_args(['--force'])
filter = fr.RepoFilter(args, commit_callback=my_commit_callback)
filter.run()
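
Because a commit callback only reads and assigns attributes, it can be tried out on a stand-in object before it touches a real repository. The sketch below assumes that commits expose `author_email` and `committer_email` as bytes; the addresses and the name `fix_email_callback` are hypothetical.

```python
from types import SimpleNamespace

OLD_EMAIL = b'old@example.com'  # hypothetical address to replace
NEW_EMAIL = b'new@example.com'  # hypothetical replacement


def fix_email_callback(commit, metadata):
    """Replace an outdated address in the author and committer fields."""
    if commit.author_email == OLD_EMAIL:
        commit.author_email = NEW_EMAIL
    if commit.committer_email == OLD_EMAIL:
        commit.committer_email = NEW_EMAIL


# Dry run against a stand-in object instead of a real commit.
fake = SimpleNamespace(author_email=OLD_EMAIL,
                       committer_email=b'ci@example.com')
fix_email_callback(fake, {})
```

Passing `commit_callback=fix_email_callback` to `fr.RepoFilter`, in the same way as the example above, would then apply it to every commit.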

Inserting New Content

Add a LICENSE file to each root commit (a commit with no parents):
import subprocess
import git_filter_repo as fr

# Create blob from file
fhash = subprocess.check_output(['git', 'hash-object', '-w', 'LICENSE']).strip()

def fixup_commits(commit, metadata):
    if len(commit.parents) == 0:
        # This is a root commit
        commit.file_changes.append(
            fr.FileChange(b'M', b'LICENSE', fhash, b'100644')
        )

args = fr.FilteringOptions.parse_args(['--force'])
filter = fr.RepoFilter(args, commit_callback=fixup_commits)
filter.run()
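
The same `file_changes` list can be filtered as well as appended to. The following sketch drops every change under a directory, assuming each change object exposes its path as a bytes `filename` attribute. The prefix and the function name are hypothetical, and the stand-in objects only mimic the attributes the callback reads.

```python
from types import SimpleNamespace

PREFIX = b'secrets/'  # hypothetical directory to purge from history


def drop_secret_files(commit, metadata):
    """Remove every file change that touches a path under PREFIX."""
    commit.file_changes = [
        change for change in commit.file_changes
        if not change.filename.startswith(PREFIX)
    ]


# Dry run with stand-ins for a commit and its file changes.
fake_commit = SimpleNamespace(file_changes=[
    SimpleNamespace(type=b'M', filename=b'README.md'),
    SimpleNamespace(type=b'M', filename=b'secrets/key.pem'),
])
drop_secret_files(fake_commit, {})
```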

Processing Blobs

Modify file contents:
import git_filter_repo as fr

def my_blob_callback(blob, metadata):
    # Skip binary files
    if b"\0" not in blob.data[0:8192]:
        # Modify text content
        blob.data = blob.data.replace(b'password', b'***')

args = fr.FilteringOptions.parse_args(['--force'])
filter = fr.RepoFilter(args, blob_callback=my_blob_callback)
filter.run()
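
A plain `replace` rewrites every occurrence of a word, which may be broader than intended. As a hedged variation, the sketch below redacts only the value in `password = ...` style assignments using a regular expression. The pattern, the helper `looks_binary`, and the callback name are hypothetical, and the stand-in objects only provide the `data` attribute the callback uses.

```python
import re
from types import SimpleNamespace

# Hypothetical pattern: an assignment whose key is "password".
SECRET = re.compile(rb'(password\s*=\s*)\S+')


def looks_binary(data, sample_size=8192):
    """Treat content with a NUL byte near the start as binary."""
    return b'\0' in data[:sample_size]


def redact_blob_callback(blob, metadata):
    """Replace password values in text blobs and leave binary blobs alone."""
    if not looks_binary(blob.data):
        blob.data = SECRET.sub(rb'\1***', blob.data)


# Dry run with stand-ins for a text blob and a binary blob.
text = SimpleNamespace(data=b'user=amy\npassword = hunter2\n')
binary = SimpleNamespace(data=b'\x00password=hunter2')
redact_blob_callback(text, {})
redact_blob_callback(binary, {})
```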

Callback Metadata

Callbacks receive a metadata dict containing:
  • commit_rename_func (function) - Function to translate old commit hashes to new ones
  • ancestry_graph (AncestryGraph) - Graph of new commit ancestry relationships
  • original_ancestry_graph (AncestryGraph) - Graph of original commit ancestry relationships
For commit callbacks, additional fields may include:
  • orig_parents (list) - Original parent commits before filtering
  • had_file_changes (bool) - Whether the commit originally had file changes
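
As a small sketch of how these fields might be read, the callback below counts commits that were originally merges and commits that originally had no file changes. It uses `dict.get` because the commit-specific keys are described as optional. The `stats` dict and the function name are hypothetical, and the dry run passes hand-built dicts rather than metadata produced by the library.

```python
from types import SimpleNamespace

stats = {'merges': 0, 'originally_empty': 0}


def collect_stats(commit, metadata):
    """Tally commits using the optional commit-specific metadata keys."""
    if len(metadata.get('orig_parents', [])) > 1:
        stats['merges'] += 1
    if metadata.get('had_file_changes') is False:
        stats['originally_empty'] += 1


# Dry run with stand-in commits and hand-built metadata dicts.
commit = SimpleNamespace()
collect_stats(commit, {'orig_parents': [1, 2], 'had_file_changes': True})
collect_stats(commit, {'orig_parents': [3], 'had_file_changes': False})
collect_stats(commit, {})  # keys absent: nothing is counted
```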

Next Steps

  • Data Structures - Learn about Blob, Commit, Tag, and other objects
  • Filtering APIs - Explore RepoFilter and FilteringOptions
  • Callbacks - Master the callback system
  • Examples - See real-world examples
