
Overview

Git uses pack files to efficiently store objects. Instead of storing each object as a separate file, Git can bundle many objects into a single “pack” file using delta compression.
Pack files dramatically reduce repository size by storing similar objects as differences (deltas) against one another instead of in full.

Why Pack Files?

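Initially, Git stores each object as a separate, individually zlib-compressed file (a “loose” object) in .git/objects/. This is simple but inefficient:
  • Space: Similar content, such as successive versions of a file, is stored in full each time (identical content is stored only once, because objects are named by their hash)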
  • Network: Fetching requires transferring many small files
  • Performance: Managing thousands of files is slower than one large file
Pack files solve these problems through:
  1. Delta compression: Store differences instead of full content
  2. Bundling: Combine many objects into one file
  3. Indexing: Fast lookup with accompanying index files
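Here is a minimal Python sketch of why loose storage wastes space. It follows Git's documented object encoding: the header "<type> <size>\0" is followed by the raw content, the result is hashed with SHA-1 to name the object, and the result is zlib-compressed into its own file. The function name `loose_object` is hypothetical. Two similar versions of a file become two unrelated, fully compressed files with nothing shared between them.

```python
import hashlib
import zlib


def loose_object(content: bytes, obj_type: str = "blob"):
    """Return (sha1_hex, path, compressed_bytes) for a loose object."""
    # Git hashes "<type> <size>\0" followed by the raw content.
    store = f"{obj_type} {len(content)}\0".encode() + content
    sha = hashlib.sha1(store).hexdigest()
    # The first two hex digits of the name become a subdirectory.
    path = f".git/objects/{sha[:2]}/{sha[2:]}"
    # Each loose file is compressed on its own; nothing is shared between versions.
    return sha, path, zlib.compress(store)


v1 = b"Hello World\n" * 200
v2 = b"Hello Git World\n" + v1  # a small edit to v1
for version in (v1, v2):
    sha, path, data = loose_object(version)
    print(path, len(data), "bytes on disk")
```

Both versions are stored at their full compressed size even though they differ by one line. This is the redundancy that delta compression in a pack removes.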

Pack File Structure

Pack File Format

PACK FILE
├── Header (12 bytes)
│   ├── Signature: "PACK" (4 bytes)
│   ├── Version: 2 or 3 (4 bytes)
│   └── Number of objects (4 bytes)
├── Object 1 (type/size header + zlib-compressed data)
├── Object 2
├── ...
├── Object N
└── SHA-1 checksum of all preceding bytes (20 bytes)
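The header and trailer described above can be checked with a short Python sketch. It assumes the whole pack is already in memory as `bytes` (for example read with `open(path, "rb").read()`). The helper names `parse_pack_header` and `trailer_ok` are hypothetical. All header fields are big-endian 4-byte values.

```python
import hashlib
import struct


def parse_pack_header(pack: bytes):
    """Return (version, object_count) from a pack's 12-byte header."""
    if len(pack) < 12 + 20:
        raise ValueError("too short to be a pack file")
    # Signature, version, and object count, all in network byte order.
    signature, version, count = struct.unpack(">4sII", pack[:12])
    if signature != b"PACK":
        raise ValueError("missing PACK signature")
    if version not in (2, 3):
        raise ValueError(f"unsupported pack version {version}")
    return version, count


def trailer_ok(pack: bytes) -> bool:
    """Check that the last 20 bytes are the SHA-1 of everything before them."""
    return hashlib.sha1(pack[:-20]).digest() == pack[-20:]


# A valid empty pack: header with zero objects, followed by its checksum.
body = b"PACK" + struct.pack(">II", 2, 0)
empty_pack = body + hashlib.sha1(body).digest()
```

`parse_pack_header(empty_pack)` returns `(2, 0)`. Corrupting any byte makes `trailer_ok` fail.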

Index File Format

Each pack file has a corresponding .idx index file for fast lookup:
INDEX FILE
├── Header
│   ├── Magic number: 0xff744f63 (4 bytes)
│   └── Version: 2 (4 bytes)
├── Fanout table (256 x 4 bytes)
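├── Object names (sorted 20-byte SHA-1s)
├── CRC32 checksums (4 bytes per object)
├── Pack offsets (4 bytes per object)
├── Large offsets (8 bytes each, only needed for packs over 2 GiB)
├── Pack file checksum (20 bytes)
└── Index checksum (20 bytes)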
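The fanout table is what makes lookup fast: entry `b` holds the number of objects whose first name byte is at most `b`. A search therefore only has to binary-search a small slice of the sorted names. The following is a minimal Python sketch over already-decoded tables (Python lists rather than raw index bytes). The function names are hypothetical. It also shows how a 4-byte offset with its most significant bit set redirects to the 8-byte large-offset table.

```python
import bisect


def build_fanout(names):
    """fanout[b] = number of objects whose first name byte is <= b."""
    counts = [0] * 256
    for name in names:
        counts[name[0]] += 1
    fanout, running = [], 0
    for count in counts:
        running += count
        fanout.append(running)
    return fanout


def find_offset(names, fanout, offsets, large_offsets, sha):
    """Look up a 20-byte object name; return its pack offset or None."""
    first = sha[0]
    lo = fanout[first - 1] if first > 0 else 0
    hi = fanout[first]
    # Binary-search only the names that share the first byte.
    i = bisect.bisect_left(names, sha, lo, hi)
    if i == hi or names[i] != sha:
        return None
    small = offsets[i]
    # MSB set: the low 31 bits index the large-offset table.
    if small & 0x80000000:
        return large_offsets[small & 0x7FFFFFFF]
    return small
```

With 256 buckets, a lookup in a pack of a million objects touches only about 4,000 names on average before the binary search even starts.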

Object Types in Pack Files

Pack files can contain different object representations:

1. Undeltified Objects

Full object content:
OBJ_COMMIT
OBJ_TREE
OBJ_BLOB
OBJ_TAG

2. Deltified Objects

Stored as differences from a base object:
OBJ_OFS_DELTA  - Base is located by a backward offset from this entry, within the same pack
OBJ_REF_DELTA  - Base is identified by its object name (SHA-1)
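Every entry in a pack starts with a variable-length header. Bits 4-6 of the first byte hold the type number (1-4 for the undeltified types, 6 for OBJ_OFS_DELTA, 7 for OBJ_REF_DELTA). The size is assembled from the low 4 bits plus 7 more bits per continuation byte. An OBJ_OFS_DELTA header is followed by a second varint giving the distance back to the base. Below is a Python sketch of both decoders; the function names are hypothetical.

```python
# Pack object type numbers (0 is invalid, 5 is reserved).
TYPE_NAMES = {
    1: "OBJ_COMMIT",
    2: "OBJ_TREE",
    3: "OBJ_BLOB",
    4: "OBJ_TAG",
    6: "OBJ_OFS_DELTA",
    7: "OBJ_REF_DELTA",
}


def read_entry_header(data: bytes, pos: int):
    """Decode an entry header at pos; return (type_name, size, next_pos)."""
    byte = data[pos]
    pos += 1
    type_id = (byte >> 4) & 0x07      # bits 4-6 hold the type
    size = byte & 0x0F                # low 4 bits start the size
    shift = 4
    while byte & 0x80:                # MSB set: another size byte follows
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    if type_id not in TYPE_NAMES:
        raise ValueError(f"invalid object type {type_id}")
    return TYPE_NAMES[type_id], size, pos


def read_ofs_delta_offset(data: bytes, pos: int):
    """Decode the distance back to an OBJ_OFS_DELTA's base entry."""
    byte = data[pos]
    pos += 1
    distance = byte & 0x7F
    while byte & 0x80:
        byte = data[pos]
        pos += 1
        # Adding 1 before each shift means no two encodings overlap.
        distance = ((distance + 1) << 7) | (byte & 0x7F)
    return distance, pos
```

The base of an OBJ_OFS_DELTA lives at `entry_offset - distance`. For OBJ_REF_DELTA, the header is followed by the base's 20-byte name instead. In both cases the decoded size is the size of the delta data, not of the reconstructed object.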

Delta Compression

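Git computes deltas with its own delta encoder (derived from LibXDiff, not the xdelta tool). A delta is a sequence of "copy bytes from the base" and "insert new bytes" instructions: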
# Example: File versions
Version 1: "Hello World"
Version 2: "Hello Git World"

# Delta representation
- Copy bytes 0-5 from the base: "Hello "
- Insert: "Git "
- Copy bytes 6-10 from the base: "World"
Git typically creates deltas for similar files, but it can also delta across different filenames if content is similar.
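The example above can be encoded and replayed with a Python sketch of a Git-style delta applier. The layout follows Git's documented delta format:
- Two base-128 varints give the base and result sizes.
- A byte with its MSB set is a copy; bits 0-3 say which offset bytes follow and bits 4-6 say which size bytes follow.
- A byte from 1 to 127 inserts that many literal bytes.

The function names and the hand-built `DELTA` are illustrative, not taken from Git's source.

```python
def read_size(delta: bytes, pos: int):
    """Read a little-endian base-128 size (7 bits per byte, MSB = more)."""
    value = shift = 0
    while True:
        byte = delta[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base: bytes, delta: bytes) -> bytes:
    src_size, pos = read_size(delta, 0)
    dst_size, pos = read_size(delta, pos)
    if src_size != len(base):
        raise ValueError("delta was computed against a different base")
    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:
            # Copy: bits 0-3 flag offset bytes, bits 4-6 flag size bytes.
            offset = size = 0
            for i in range(4):
                if cmd & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if cmd & (0x10 << i):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            size = size or 0x10000  # an encoded size of 0 means 64 KiB
            if offset + size > len(base):
                raise ValueError("copy past end of base")
            out += base[offset:offset + size]
        elif cmd:
            # Insert: the next `cmd` bytes (1-127) are literal data.
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("opcode 0 is reserved")
    if len(out) != dst_size:
        raise ValueError("result has the wrong size")
    return bytes(out)


# Hand-built delta for "Hello World" -> "Hello Git World".
DELTA = (bytes([11, 15])            # base size, result size
         + bytes([0x90, 6])         # copy 6 bytes from offset 0
         + bytes([4]) + b"Git "     # insert 4 literal bytes
         + bytes([0x91, 6, 5]))     # copy 5 bytes from offset 6
```

The whole delta is 12 bytes, smaller than the 15-byte result. The saving grows quickly for real files where only a few lines change.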

Creating Pack Files

Manual Packing

# Pack all loose objects
$ git repack

# Aggressive repacking (slower, better compression)
$ git repack -a -d -f --depth=250 --window=250

# Garbage-collect with a much more thorough (and slower) delta search
$ git gc --aggressive

Pack Options

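--depth=<n>: Maximum delta chain depth (default: 50). Deeper chains save more space but are slower to read.
git repack --depth=250
--window=<n>: Number of objects to compare against when searching for a delta base (default: 10). Larger windows find better deltas but take longer.
git repack --window=250
-a: Pack all reachable objects into a single pack, not just loose ones.
git repack -a
-d: Remove packs (and loose objects) made redundant by the new pack.
git repack -d
-f: Recompute all deltas instead of reusing existing ones (passes --no-reuse-delta to git pack-objects).
git repack -a -d -f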

Inspecting Pack Files

View Pack Contents

# List pack files
$ ls -lh .git/objects/pack/
-r--r--r-- 1 user group 12M pack-8f1e7cf.pack
-r--r--r-- 1 user group 89K pack-8f1e7cf.idx

# Verify pack integrity
$ git verify-pack -v .git/objects/pack/pack-8f1e7cf.idx

# Show pack statistics
$ git count-objects -v
count: 0
size: 0
in-pack: 5234
packs: 1
size-pack: 12000
prune-packable: 0
garbage: 0
size-garbage: 0

Understanding verify-pack Output

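$ git verify-pack -v .git/objects/pack/pack-*.idx | head
8f1e7cf0... commit 234 156 12
3b18e512... blob   1024 512 168 1 7c4e0a91...
9d1a2e3f... tree   156 98 680
Format: SHA-1 type size packed-size offset [depth base-SHA-1]. Depth and base appear only for deltified objects, and a delta's base is always an object of the same type.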
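To analyze a large pack, the `verify-pack -v` line format above can be parsed into records. The following is a minimal Python sketch; `PackEntry` and `parse_verify_pack` are hypothetical names. It skips the summary lines that `verify-pack -v` also prints (such as "non delta: ..." and "chain length = ...").

```python
from dataclasses import dataclass
from typing import Optional

OBJECT_TYPES = {"commit", "tree", "blob", "tag"}


@dataclass
class PackEntry:
    sha: str
    type: str
    size: int
    packed_size: int
    offset: int
    depth: int = 0               # 0 for undeltified objects
    base: Optional[str] = None   # base object name for deltified objects


def parse_verify_pack(text: str):
    """Turn `git verify-pack -v` output into a list of PackEntry records."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        # Object lines have 5 fields, or 7 when deltified; skip everything else.
        if len(parts) not in (5, 7) or parts[1] not in OBJECT_TYPES:
            continue
        entry = PackEntry(parts[0], parts[1], int(parts[2]),
                          int(parts[3]), int(parts[4]))
        if len(parts) == 7:
            entry.depth = int(parts[5])
            entry.base = parts[6]
        entries.append(entry)
    return entries
```

The text can come from `subprocess.run(["git", "verify-pack", "-v", idx_path], capture_output=True, text=True, check=True).stdout`, where `idx_path` is a placeholder for a real index file. From the records you can, for example, find the objects with the deepest chains.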

Pack File Strategies

Pack Generation Strategy

Git considers several factors when creating deltas:
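  1. File similarity: Similar content is likely to delta well
  2. Recency: Newer objects are accessed more often, so Git tends to keep them whole and store older versions as deltas
  3. Object type: Deltas are only computed between objects of the same type (blob against blob, tree against tree)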

Delta Base Selection

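Candidate ordering before the delta search:
1. Object type (only objects of the same type are compared)
2. Pathname hash (versions of the same file end up next to each other)
3. Size, largest first
4. Recency, newest first
Within this order, each object is tried against the preceding --window objects, and the smallest delta whose chain stays within --depth is kept.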
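The interaction between `--window` and `--depth` can be modeled in a few lines of Python. This is a simplified sketch, not Git's implementation. It assumes the objects are already sorted, and it uses a hypothetical cost model based on Python's `difflib` (bytes that cannot be copied from the base) as a stand-in for Git's real delta encoder. `choose_bases` and `delta_cost` are illustrative names.

```python
from difflib import SequenceMatcher


def delta_cost(base: bytes, target: bytes) -> int:
    """Hypothetical cost: bytes of target that must be inserted literally."""
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    copied = sum(block.size for block in matcher.get_matching_blocks())
    return len(target) - copied


def choose_bases(objects, window=10, max_depth=50):
    """objects: pre-sorted list of (name, data). Returns (base_of, depth_of)."""
    base_of, depth_of = {}, {}
    for i, (name, data) in enumerate(objects):
        best, best_cost = None, len(data)  # default: store in full
        # Only the previous `window` objects are considered as bases.
        for base_name, base_data in objects[max(0, i - window):i]:
            if depth_of[base_name] >= max_depth:
                continue  # using this base would exceed the depth limit
            cost = delta_cost(base_data, data)
            if cost < best_cost:
                best, best_cost = base_name, cost
        base_of[name] = best
        depth_of[name] = 0 if best is None else depth_of[best] + 1
    return base_of, depth_of


versions = [("v3", b"Hello Git World!"), ("v2", b"Hello Git World"),
            ("v1", b"Hello World")]
```

With `window=1`, each version can only delta against its neighbor, which builds a chain. Lowering `max_depth` to 1 forces `v1` to be stored whole. This is the space-versus-speed trade-off that `--depth` controls.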

Performance Considerations

Deep delta chains can slow down object access, because reading an object at depth N means applying N deltas on top of its base. Balance compression against access speed based on your use case.

Optimization Tips

# Quick repack (fast, good compression)
$ git repack -a -d

# Aggressive repack (slow, best compression)
$ git repack -a -d -f --depth=250 --window=250

# Repack only if loose objects or packs exceed the gc.auto / gc.autoPackLimit thresholds
$ git gc --auto

Multi-Pack Index

For repositories with many pack files, a multi-pack index lets Git look up objects across all packs through a single index:
# Create multi-pack index
$ git multi-pack-index write

# Repack into fewer packs
$ git multi-pack-index repack --batch-size=1g

Network Packs

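During fetch and push, Git generates a pack containing only the objects the other side is missing:
# Fetch typically receives a thin pack from the server
$ git fetch origin

# Complete a thin pack by appending its missing base objects
$ git index-pack --stdin --fix-thin < thin.pack

# Or explode it into loose objects instead
$ git unpack-objects < thin.pack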
Thin packs can reference objects not in the pack itself, assuming they exist in the destination repository.
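The core of completing a thin pack is finding which delta bases the pack refers to but does not contain. Such deltas must be OBJ_REF_DELTA, since an OBJ_OFS_DELTA can only point inside its own pack. Those bases are then copied in from the local repository. The Python model below works on simplified records rather than real pack bytes. The names `Entry`, `external_bases`, and `fix_thin` are hypothetical.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    sha: str
    kind: str                   # e.g. "OBJ_BLOB" or "OBJ_REF_DELTA"
    base: Optional[str] = None  # base object name for OBJ_REF_DELTA


def external_bases(entries):
    """Base names referenced by REF_DELTA entries but missing from the pack."""
    in_pack = {e.sha for e in entries}
    return sorted({e.base for e in entries
                   if e.kind == "OBJ_REF_DELTA" and e.base not in in_pack})


def fix_thin(entries, local_objects):
    """Model of completing a thin pack: append missing bases from local storage."""
    missing = external_bases(entries)
    absent = [sha for sha in missing if sha not in local_objects]
    if absent:
        raise KeyError(f"bases not available locally: {absent}")
    return entries + [Entry(sha, local_objects[sha]) for sha in missing]
```

If the receiver lacks a base, the pack cannot be completed. For that reason a sender only omits objects it knows the other side already has.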
