What is a Git Repository?

A Git repository is a database that stores the complete history of your project, including all files, directories, commits, branches, and metadata. It’s the foundation of Git’s distributed version control system, enabling you to track changes, collaborate with others, and maintain a full record of your project’s evolution.
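A standard clone is self-contained: it stores the entire project history locally, so commands such as git log, git diff, and git commit work without contacting a server. This is what makes Git a truly distributed version control system. (Shallow and partial clones deliberately omit some history or objects.)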

Repository Types

Git supports two main types of repositories:
Working Repository

A standard (non-bare) repository has a .git directory at the root of your working tree. This is what you typically create when starting a new project:

$ git init
Initialized empty Git repository in /path/to/my-project/.git/

This sets up:
  • A .git directory containing all Git internals
  • A working tree (the current directory) where you edit files
  • An index (staging area) to prepare commits; the .git/index file itself is written the first time you run git add

Bare Repository

A bare repository (typically named <project>.git) contains only the Git data, with no working tree. Bare repositories are commonly used as central repositories for collaboration:

$ git init --bare project.git

Bare repositories are ideal for:
  • Central servers where developers push and pull
  • Repositories that only serve as remote endpoints
  • Situations where no one directly edits files

    Repository Structure

    Inside the .git directory, Git maintains a well-defined structure:

    objects/

    The object database stores all content: commits, trees (directories), blobs (files), and tag objects. Objects are identified by their SHA-1 hash.
    objects/
    ├── [0-9a-f][0-9a-f]/  # First 2 chars of SHA-1
    │   └── [38 chars]      # Remaining 38 chars
    ├── pack/               # Compressed object packs
    └── info/               # Additional metadata
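    The fan-out scheme above can be sketched in a few lines of Python. This is an illustrative helper (the name loose_object_path is hypothetical, not part of Git). It maps a full hex object ID to the path where Git would store it as a loose object. It only computes the path and does not check that the file exists.

```python
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def loose_object_path(git_dir: Path, oid: str) -> Path:
    """Return where a loose object with this ID would live under git_dir."""
    oid = oid.lower()
    # SHA-1 IDs are 40 hex characters; SHA-256 repositories use 64
    if len(oid) not in (40, 64) or not set(oid) <= HEX_DIGITS:
        raise ValueError(f"not a full hex object ID: {oid!r}")
    # The first two hex characters name the subdirectory; the rest is the filename
    return git_dir / "objects" / oid[:2] / oid[2:]


# POSIX output: .git/objects/1b/61de420a21a2f1aaef93e38ecd0e45e8bc9f0a
print(loose_object_path(Path(".git"), "1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a"))
```

    Splitting on the first two characters spreads loose objects across at most 256 subdirectories, which keeps any single directory from growing very large.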
    

    refs/

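    Stores references (pointers to objects, usually commits). With the default files backend, each ref is a small file under refs/, although Git may also pack many refs into a single .git/packed-refs file: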
    • refs/heads/ - Local branches
    • refs/tags/ - Tags
    • refs/remotes/ - Remote-tracking branches

    HEAD

    A symbolic reference pointing to your current branch:
    ref: refs/heads/main
    
    In detached HEAD state, it contains a commit SHA directly.
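    The Python sketch below makes the two forms of HEAD concrete; parse_head is a hypothetical helper, not a Git API. It classifies the contents of .git/HEAD and assumes the default files reference backend. Repositories using the reftable backend keep a placeholder in .git/HEAD and store the real value elsewhere.

```python
import re

# A full SHA-1 (40 hex chars) or SHA-256 (64 hex chars) object ID
FULL_OID = re.compile(r"[0-9a-f]{40}|[0-9a-f]{64}")


def parse_head(content: str) -> tuple[str, str]:
    """Classify .git/HEAD as ('symbolic', refname) or ('detached', oid)."""
    line = content.strip()
    if line.startswith("ref: "):
        # Symbolic ref: HEAD names the branch that is checked out
        return ("symbolic", line[len("ref: "):])
    if FULL_OID.fullmatch(line):
        # Detached HEAD: the file holds a commit ID directly
        return ("detached", line)
    raise ValueError(f"unrecognized HEAD contents: {line!r}")


print(parse_head("ref: refs/heads/main\n"))  # ('symbolic', 'refs/heads/main')
```

    In practice, git symbolic-ref HEAD prints the branch a symbolic HEAD points to (and fails when HEAD is detached), and git rev-parse HEAD prints the commit ID in either case.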

    index

    A binary file that records what will go into your next commit (the staging area, covered in detail in the Staging Area concept).
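    Git's index format documentation specifies a 12-byte header at the start of the index file. The header holds the signature DIRC, a format version (2, 3, or 4), and the number of entries, all in network byte order. The Python sketch below (read_index_header is a hypothetical helper) decodes only that header and does not parse the entries that follow.

```python
import struct


def read_index_header(data: bytes) -> tuple[int, int]:
    """Return (version, entry_count) from the start of a .git/index file."""
    if len(data) < 12:
        raise ValueError("index header is 12 bytes")
    # '>' = network (big-endian) byte order: 4-byte signature, two 32-bit unsigned ints
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("missing DIRC signature; not an index file")
    if version not in (2, 3, 4):
        raise ValueError(f"unsupported index version {version}")
    return version, entry_count


# Usage, from the top of a repository that has staged at least one file:
#   with open(".git/index", "rb") as f:
#       print(read_index_header(f.read(12)))
```

    To see the entries themselves, git ls-files --stage lists each staged path with its mode, object ID, and stage number.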

    config

    Repository-specific configuration settings, including:
    • Remote repository URLs
    • Branch tracking information
    • User preferences for this repository
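    The .git/config file uses Git's INI-like syntax, with optional quoted subsections such as [remote "origin"]. The simplified Python reader below (parse_git_config is hypothetical) flattens a config file into Git's section.subsection.key names. It keeps every value, because some keys, such as remote.<name>.fetch, can repeat. It assumes simple key = value lines and ignores quoting, escape sequences, inline comments, and include directives, so use git config itself for real work.

```python
import re

# [section] or [section "subsection"]
SECTION_RE = re.compile(r'^\[([A-Za-z0-9.-]+)(?:\s+"(.*)")?\]$')


def parse_git_config(text: str) -> dict[str, list[str]]:
    """Flatten simple git-config text into {'section.sub.key': [values]}."""
    entries: dict[str, list[str]] = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        header = SECTION_RE.match(line)
        if header:
            name, sub = header.groups()
            # Section names are case-insensitive; subsection names are case-sensitive
            section = name.lower() if sub is None else f"{name.lower()}.{sub}"
            continue
        if section is None:
            raise ValueError(f"key outside any section: {line!r}")
        key, sep, value = line.partition("=")
        # Key names are case-insensitive; a bare key with no '=' means true
        full_key = f"{section}.{key.strip().lower()}"
        entries.setdefault(full_key, []).append(value.strip() if sep else "true")
    return entries


SAMPLE = """\
[core]
\tbare = false
[remote "origin"]
\turl = https://github.com/user/repo.git
\tfetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
\tremote = origin
\tmerge = refs/heads/main
"""

config = parse_git_config(SAMPLE)
print(config["remote.origin.url"])  # ['https://github.com/user/repo.git']
print(config["branch.main.merge"])  # ['refs/heads/main']
```

    The [branch "main"] section is the branch tracking information mentioned above: it tells git pull and git push which remote and remote branch main corresponds to.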

    hooks/

    Customization scripts that run at specific points in Git’s execution (e.g., pre-commit, post-merge).
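    A hook is an executable file in .git/hooks/ named after the event it handles. git init installs inactive examples with a .sample suffix. As an illustration, the Python script below could be saved as .git/hooks/pre-commit and made executable. The blocked filename patterns are a hypothetical team policy. Git aborts the commit when pre-commit exits with a non-zero status, and git commit --no-verify skips the hook.

```python
#!/usr/bin/env python3
"""Example .git/hooks/pre-commit: refuse to commit files that look like secrets."""
import fnmatch
import subprocess
import sys

# Hypothetical policy: filename patterns this team never wants committed
BLOCKED_PATTERNS = [".env", "*.pem", "id_rsa"]


def blocked(paths: list[str]) -> list[str]:
    """Return the paths whose file name matches any blocked pattern."""
    return [
        p for p in paths
        if any(fnmatch.fnmatch(p.rsplit("/", 1)[-1], pat) for pat in BLOCKED_PATTERNS)
    ]


def main() -> int:
    # Staged paths (Added, Copied, Modified, Renamed) for the commit being created
    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    bad = blocked(staged)
    if bad:
        print("pre-commit: refusing to commit: " + ", ".join(bad), file=sys.stderr)
        return 1  # a non-zero exit status aborts the commit
    return 0


if __name__ == "__main__":
    sys.exit(main())
```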

    The Object Database

    Git’s object database is implemented in object-file.c and uses a content-addressable storage system. Every object has:
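    1. An ID - A 40-character hexadecimal SHA-1 hash of a short header (the type, a space, the content size in bytes, and a NUL byte) followed by the contents. Repositories created with git init --object-format=sha256 use 64-character SHA-256 IDs instead.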
    2. A type - One of: commit, tree, blob, or tag
    3. Contents - The actual data
    Because objects are identified by their content hash, identical files share the same blob object, saving disk space across your entire repository history.
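    Because an object's ID depends only on its type and bytes, it can be reproduced outside Git. The Python sketch below (object_id is a hypothetical helper) applies the header-plus-contents rule to compute SHA-1 object IDs. It does not handle SHA-256 repositories. It also assumes the caller already has the exact bytes Git would store, which for a file means after any configured line-ending or filter conversion.

```python
import hashlib


def object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 object ID: hash of '<type> <size>\\0' followed by the raw content."""
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


# Same bytes give the same ID no matter which file (or commit) they came from
print(object_id("blob", b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

    For comparison, echo 'test content' | git hash-object --stdin prints the same ID. The file name is not part of the hash, which is why identical files in different places share one blob.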

    Object Storage Formats

    Loose objects: Newly created objects are each stored in their own zlib-compressed file:
    .git/objects/1b/61de420a21a2f1aaef93e38ecd0e45e8bc9f0a
    
    Packed objects: Git periodically (for example, during git gc) combines many objects into pack files, storing similar objects as deltas against one another to save space:
    .git/objects/pack/pack-<hash>.pack
    .git/objects/pack/pack-<hash>.idx  # Index for fast lookup
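    A loose object file is the object's header and contents compressed with zlib, so Python's standard library can inspect it. The sketch below (decode_loose_object is a hypothetical helper) decompresses one and checks the declared size. Pack files use a different, delta-based format and are not handled here. For everyday use, git cat-file -t <oid> and git cat-file -p <oid> show the same information.

```python
import zlib


def decode_loose_object(data: bytes) -> tuple[str, bytes]:
    """Decompress the bytes of a loose object file and return (type, content)."""
    raw = zlib.decompress(data)
    # Layout after decompression: b"<type> <size>\0<content>"
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError(f"size mismatch: header says {size}, found {len(content)}")
    return obj_type, content


# Usage with any file under .git/objects/<2 hex chars>/:
#   from pathlib import Path
#   print(decode_loose_object(Path(".git/objects/1b/61de...").read_bytes()))
```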
    

    Repository Layout Example

    Here’s what a typical repository structure looks like:
    my-project/
    ├── .git/
    │   ├── HEAD                  # Current branch pointer
    │   ├── config                # Repository configuration
    │   ├── description           # Repository description
    │   ├── hooks/                # Git hooks
    │   ├── index                 # Staging area
    │   ├── objects/              # Object database
    │   │   ├── 1b/               # Object subdirectory
    │   │   │   └── 61de420...    # Actual object file
    │   │   ├── pack/             # Packed objects
    │   │   └── info/             # Object metadata
    │   ├── refs/                 # References
    │   │   ├── heads/            # Local branches
    │   │   │   └── main
    │   │   ├── remotes/          # Remote-tracking branches
    │   │   │   └── origin/
    │   │   │       └── main
    │   │   └── tags/             # Tags
    │   └── logs/                 # Reflogs
    │       ├── HEAD
    │       └── refs/
    ├── src/                      # Working tree files (outside .git/)
    ├── README.md
    └── .gitignore
    

    Repository Operations

    Creating a Repository

    1. Initialize a new repository

    $ git init
    
    2. Clone an existing repository

    $ git clone https://github.com/user/repo.git
    
    This copies the complete history, records the source as a remote named origin, and checks out the default branch.

    Repository Discovery

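    Git searches for a repository by checking the current directory for a .git entry, then each parent directory in turn. At each level it also checks whether the directory is itself a bare repository. This is implemented in setup.c:
    // setup.c (simplified): walk up from the current directory looking for .git
    struct strbuf commondir = STRBUF_INIT, gitdir = STRBUF_INIT;
    if (discover_git_directory(&commondir, &gitdir) < 0)
        die("not a git repository");
    
    The search stops at the filesystem root, at any directory listed in GIT_CEILING_DIRECTORIES, and (unless GIT_DISCOVERY_ACROSS_FILESYSTEM is set) at a filesystem boundary. If no repository is found, commands fail with “fatal: not a git repository (or any of the parent directories): .git”.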
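    The walk itself is easy to sketch. The Python below (discover_git_dir and is_git_dir are hypothetical names, not Git functions) mirrors the loop described above under several simplifying assumptions. It only recognizes a .git directory and ignores gitfiles, which are described in the next section. It uses a rough layout check in place of Git's stricter validation. It also skips GIT_DIR, GIT_CEILING_DIRECTORIES, filesystem boundaries, and safe.directory ownership checks.

```python
from pathlib import Path
from typing import Optional


def is_git_dir(path: Path) -> bool:
    """Rough layout check: a repository directory has HEAD, objects/ and refs/."""
    return (path / "HEAD").is_file() and (path / "objects").is_dir() and (path / "refs").is_dir()


def discover_git_dir(start: Path) -> Optional[Path]:
    """Walk from start toward the filesystem root and return the first repository found."""
    current = start.resolve()
    while True:
        if is_git_dir(current / ".git"):
            return current / ".git"  # working tree with a .git directory
        if is_git_dir(current):
            return current  # inside a bare repository
        if current.parent == current:  # reached the root without a match
            return None
        current = current.parent


# Usage: print(discover_git_dir(Path.cwd()))
```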

    Gitfiles and Worktrees

    Git supports a special mechanism called gitfiles where .git is a plain text file instead of a directory:
    gitdir: /path/to/real/repository
    
    This is used by:
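    • Submodules - The submodule's repository lives in the superproject's .git/modules/ directory, so the parent repository can remove a submodule's working tree without losing its history
    • Worktrees - Each linked worktree created with git worktree add has a .git file pointing into the main repository's .git/worktrees/ directory, so multiple working directories share one repository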
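    Following a gitfile is a small extension of discovery. The Python sketch below (read_gitfile is a hypothetical helper, loosely modeled on Git's behavior) reads a .git file and returns the repository directory it points to. Like Git, it treats a relative gitdir path as relative to the directory containing the gitfile. It does not verify that the target is a valid repository.

```python
from pathlib import Path


def read_gitfile(dot_git: Path) -> Path:
    """Resolve a gitfile ('gitdir: <path>') to the repository directory it names."""
    text = dot_git.read_text().rstrip("\r\n")
    prefix = "gitdir: "
    if not text.startswith(prefix):
        raise ValueError(f"{dot_git} is not a gitfile")
    target = Path(text[len(prefix):])
    # Relative paths are interpreted relative to the directory containing the gitfile
    if not target.is_absolute():
        target = dot_git.parent / target
    return target.resolve()
```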

    Object Reachability and Garbage Collection

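    Garbage collection treats an object as live if it is reachable from:
    • References (HEAD, branches, tags, remote-tracking branches)
    • The reflog
    • The index
    Unreachable objects may be deleted by git gc (garbage collection). By default, unreachable loose objects are pruned only once they are older than two weeks (gc.pruneExpire), which leaves time to recover from recent mistakes: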
    $ git gc
    Counting objects: 2857, done.
    Delta compression using up to 8 threads.
    Compressing objects: 100% (1234/1234), done.
    
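    You rarely need to run it by hand. Commands such as git commit, git merge, and git fetch trigger automatic maintenance, which by default runs git gc --auto. That only does work once loose objects or packs exceed configured thresholds (gc.auto, gc.autoPackLimit). Running git gc manually is still useful when you want to optimize storage immediately.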
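    Reachability is a graph walk: start from every root (refs, reflog entries, index entries) and follow the links each object contains. In the Python sketch below, the object graph is a hypothetical in-memory dict with short made-up IDs. Commits point to their parents and tree, and trees point to their entries. The sketch only identifies unreachable objects. git gc additionally applies an expiry grace period (gc.pruneExpire) before deleting anything.

```python
from collections import deque


def reachable(graph: dict[str, list[str]], roots: list[str]) -> set[str]:
    """Return every object ID reachable from roots by following graph links."""
    seen: set[str] = set()
    queue = deque(roots)
    while queue:
        oid = queue.popleft()
        if oid in seen or oid not in graph:
            continue  # already visited, or not in this object set
        seen.add(oid)
        queue.extend(graph[oid])  # commits -> parents + tree, trees -> entries
    return seen


# Hypothetical history: c3 was committed and then discarded with a hard reset
graph = {
    "c1": ["t1"],          # root commit -> its tree
    "c2": ["c1", "t2"],    # commit -> parent, tree
    "c3": ["c2", "t3"],    # no branch points here any more
    "t1": ["b1"],
    "t2": ["b1", "b2"],    # b1 is shared: identical content, one object
    "t3": ["b1", "b3"],
    "b1": [], "b2": [], "b3": [],
    "b4": [],              # staged in the index but never committed
    "b5": [],              # dangling blob, e.g. a file staged and later unstaged
}

refs, index = ["c2"], ["b4"]
print(sorted(set(graph) - reachable(graph, refs + index)))  # ['b3', 'b5', 'c3', 't3']
reflog = ["c3"]
print(sorted(set(graph) - reachable(graph, refs + index + reflog)))  # ['b5']
```

    The second call shows why the reflog matters: while it still records c3, the discarded commit and everything it references stay protected from pruning.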

    Repository Configuration

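    Git reads configuration from three main levels:
    1. System (/etc/gitconfig on most installations) - Applies to all users
    2. Global (~/.gitconfig or ~/.config/git/config) - User-specific settings
    3. Local (.git/config) - Repository-specific settings
    Local settings override global, which override system. For example, to use a different email address in one repository:
    $ git config --local user.email "work@example.com"
    $ git config --global user.email "personal@example.com"
    Inside this repository, commits use work@example.com; other repositories fall back to the global address.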
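    For a single-valued key, resolution amounts to "the last level read wins". The Python sketch below models each level as a hypothetical dict and records which level supplied each value, similar in spirit to git config --show-origin. It simplifies real Git behavior in two ways. Multi-valued keys such as remote.origin.fetch accumulate across files rather than replacing each other. The worktree and command-line (-c) levels are ignored.

```python
def resolve_config(levels: list[tuple[str, dict[str, str]]]) -> dict[str, tuple[str, str]]:
    """Merge config levels in order; later levels override earlier ones.

    Returns {key: (value, level_name)} so you can see where each value came from.
    """
    effective: dict[str, tuple[str, str]] = {}
    for level_name, settings in levels:  # pass levels lowest-precedence first
        for key, value in settings.items():
            effective[key] = (value, level_name)
    return effective


levels = [
    ("system", {"core.editor": "vi", "user.email": "admin@example.com"}),
    ("global", {"user.email": "personal@example.com"}),
    ("local", {"user.email": "work@example.com"}),
]
effective = resolve_config(levels)
print(effective["user.email"])   # ('work@example.com', 'local')
print(effective["core.editor"])  # ('vi', 'system')
```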
    

    Key Implementation Details

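    From repository.h and repository.c:
    • Repository struct - Core data structure managing repository state
    • Object database - Content-addressable storage addressed by SHA-1 (or SHA-256) object IDs
    • Reference storage - Multiple backends (files, reftable) for storing refs
    • Work tree - Association between repository and working directory
    // Simplified excerpt; see repository.h for the full definition
    struct repository {
        char *gitdir;                      // Path to the repository's git directory
        char *commondir;                   // Shared directory (differs from gitdir in linked worktrees)
        struct raw_object_store *objects;  // Object database (type name varies across Git versions)
        struct ref_store *refs_private;    // Reference storage; access via get_main_ref_store()
        struct index_state *index;         // Staging area
        char *worktree;                    // Working tree path; NULL for bare repositories
    };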
    

    Best Practices

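    1. Keep repositories focused

    One repository per project or logical unit. Avoid creating mega-repositories unless you plan to use features such as sparse checkout to keep them manageable.

    2. Don't commit build artifacts

    Use .gitignore to exclude generated files, dependencies, and build outputs from the repository.

    3. Use bare repositories for sharing

    When setting up a central repository, use --bare so there is no checked-out working tree to conflict with incoming pushes. By default, Git refuses a push that would update the branch currently checked out in a non-bare repository (receive.denyCurrentBranch).

    4. Regular maintenance

    Periodically run git gc to optimize storage and git fsck to verify repository integrity, or run git maintenance start to schedule background maintenance.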

    Further Reading

    • git help repository-layout - Complete repository structure reference
    • git help config - Configuration system documentation
    • git help gc - Garbage collection and repository maintenance
