Introduction
git-filter-repo can be used as a Python library for complex repository filtering operations. This allows you to programmatically manipulate Git history with full control over commits, blobs, tags, and other Git objects.
Installation
To use git-filter-repo as a library, you need to make it importable in Python. Options include:
- Creating a symlink named git_filter_repo.py that points to git-filter-repo
- Adding the directory containing git-filter-repo to your PYTHONPATH
- Renaming or copying git-filter-repo to git_filter_repo.py
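The symlink option can also be scripted. The following is a minimal sketch, assuming a POSIX-like system where creating symlinks is permitted. `make_importable` and its parameters are hypothetical names introduced for illustration and are not part of git-filter-repo.

```python
import os
import sys


def make_importable(script_path, link_dir):
    """Create link_dir/git_filter_repo.py -> script_path and add link_dir to sys.path.

    Hypothetical helper for illustration; not part of git-filter-repo.
    """
    os.makedirs(link_dir, exist_ok=True)
    link_path = os.path.join(link_dir, "git_filter_repo.py")
    # lexists() is also true for a dangling symlink, so an existing link is kept.
    if not os.path.lexists(link_path):
        os.symlink(os.path.abspath(script_path), link_path)
    # Similar in effect to listing link_dir in PYTHONPATH, for this process only.
    if link_dir not in sys.path:
        sys.path.insert(0, link_dir)
    return link_path
```

After a call such as `make_importable(path_to_script, some_dir)`, where both arguments are placeholders for your own paths, `import git_filter_repo` should resolve within the same process.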
Basic Usage
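As a rough sketch of a minimal library-based program, modeled on the barebones example distributed with git-filter-repo, the code below parses the usual command-line arguments and hands them to `RepoFilter`. The wrapper function `run_filter` is a hypothetical name introduced here so that the import happens lazily, and the `RepoAnalyze` branch is an assumption about how the `--analyze` option is dispatched. It assumes `git_filter_repo` is importable as described under Installation.

```python
import sys


def run_filter(argv):
    """Parse git-filter-repo style arguments and run the filter."""
    try:
        import git_filter_repo as fr
    except ImportError:
        raise SystemExit("git_filter_repo is not importable; see Installation.")

    args = fr.FilteringOptions.parse_args(argv)
    if args.analyze:
        # --analyze produces reports instead of rewriting history.
        fr.RepoAnalyze.run(args)
    else:
        fr.RepoFilter(args).run()
```

Calling `run_filter(sys.argv[1:])` from inside a repository would then be expected to accept the same options as the command-line tool.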
The simplest library program behaves identically to the command-line tool.
Core Components
The library exports several key classes and functions:
Data Structures
- Blob: Represents file content
- Commit: Represents a commit with metadata and file changes
- Tag: Represents an annotated tag
- Reset: Represents branch creation/reset
- FileChange: Represents a file modification, deletion, or addition
- Progress: Progress messages for fast-import
- Checkpoint: Checkpointing directives for fast-import
Processing
- FastExportParser: Parses git fast-export output
- RepoFilter: Main filtering engine
- FilteringOptions: Command-line argument parsing
- ProgressWriter: Progress output handling
Utilities
- GitUtils: Git repository utilities
- string_to_date: Parse git date format
- date_to_string: Convert to git date format
- record_id_rename: Record mark translations
Common Use Cases
Simple Callback Example
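A minimal sketch of a commit callback that rewrites every commit message might look like the following. It assumes commit callbacks are called with `(commit, metadata)` and that `commit.message` is a bytestring. The trailer text and the names `append_trailer` and `run` are hypothetical.

```python
def append_trailer(commit, metadata):
    """Commit callback: add a trailer line to every commit message."""
    # Messages are assumed to be bytes, so bytes literals are used throughout.
    if not commit.message.endswith(b"\n"):
        commit.message += b"\n"
    commit.message += b"Rewritten-by: history-filter\n"


def run(argv):
    """Wire the callback into RepoFilter (needs git_filter_repo importable)."""
    import git_filter_repo as fr
    args = fr.FilteringOptions.parse_args(argv)
    fr.RepoFilter(args, commit_callback=append_trailer).run()
```

Keeping the callback as a plain function that only touches `commit.message` makes it easy to exercise with stand-in objects before running it against a real repository.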
A callback can modify all commit messages.
Inserting New Content
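One way to insert a file into history is to store it as a blob first and then attach a `FileChange` to each parentless commit. The sketch below assumes it is run inside the repository being filtered, that a `LICENSE` file exists at the given path, and that `FileChange` takes a change type, filename, blob id, and mode in that order. `make_root_callback` and `run` are hypothetical names.

```python
import subprocess


def make_root_callback(file_change):
    """Return a commit callback that adds file_change to every root commit."""
    def add_to_root(commit, metadata):
        # A root commit is one with no parents.
        if not commit.parents:
            commit.file_changes.append(file_change)
    return add_to_root


def run(argv, license_path="LICENSE"):
    """Store the file as a blob, then filter (needs git_filter_repo importable)."""
    import git_filter_repo as fr
    # `git hash-object -w` writes the file to the object store and prints its hash.
    blob_hash = subprocess.check_output(
        ["git", "hash-object", "-w", license_path]).strip()
    change = fr.FileChange(b"M", b"LICENSE", blob_hash, b"100644")
    args = fr.FilteringOptions.parse_args(argv)
    fr.RepoFilter(args, commit_callback=make_root_callback(change)).run()
```

If the history has several root commits, this sketch adds the file to each of them.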
New content can be inserted into history, for example by adding a LICENSE file to the root commit.
Processing Blobs
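A blob callback can rewrite file contents as they pass through the filter. The following is a minimal sketch, assuming blob callbacks are called with `(blob, metadata)` and that `blob.data` holds the raw content as bytes. The token strings and the names `redact_token` and `run` are hypothetical.

```python
def redact_token(blob, metadata):
    """Blob callback: replace a placeholder secret in every file's content."""
    # blob.data is assumed to be the raw file content as bytes.
    blob.data = blob.data.replace(b"SECRET_TOKEN", b"***REMOVED***")


def run(argv):
    """Wire the callback into RepoFilter (needs git_filter_repo importable)."""
    import git_filter_repo as fr
    args = fr.FilteringOptions.parse_args(argv)
    fr.RepoFilter(args, blob_callback=redact_token).run()
```

Because the replacement works on bytes, it applies equally to text and binary files, which may or may not be what you want; filtering by content type would need an extra check.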
A blob callback can modify file contents.
Callback Metadata
Callbacks receive a metadata dict containing:
- commit_rename_func: Function to translate old commit hashes to new ones
- ancestry_graph: Graph of new commit ancestry relationships
- original_ancestry_graph: Graph of original commit ancestry relationships
- orig_parents: Original parent commits before filtering
- had_file_changes: Whether the commit originally had file changes
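To illustrate how a callback might read the metadata dict, the sketch below annotates commits based on two of the entries described above. The key names `had_file_changes` and `orig_parents` are assumptions based on git-filter-repo's source and should be checked against the version you use. `annotate_commit` is a hypothetical name.

```python
def annotate_commit(commit, metadata):
    """Commit callback: note facts about the original commit in its message."""
    notes = []
    # .get() keeps the callback tolerant if a key is absent.
    if not metadata.get("had_file_changes", True):
        notes.append(b"originally had no file changes")
    if len(metadata.get("orig_parents") or []) > 1:
        notes.append(b"originally a merge")
    if notes:
        commit.message += b"\n[" + b"; ".join(notes) + b"]\n"
```

A callback like this would be passed to `RepoFilter` through its `commit_callback` argument, in the same way as any other commit callback.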
Next Steps
- Data Structures: Learn about Blob, Commit, Tag, and other objects
- Filtering APIs: Explore RepoFilter and FilteringOptions
- Callbacks: Master the callback system
- Examples: See real-world examples
