Architecture Overview
Gitaly provides a Git RPC layer that enables GitLab to scale Git operations horizontally by addressing the fundamental challenges of Git’s architecture.
High-Level Architecture
Gitaly sits between GitLab components and Git repositories, providing RPC-based access to Git operations. View the complete architecture diagram for a detailed overview.
Why Git Needs Gitaly
Git’s Scaling Challenges
Git’s architectural characteristics make it difficult to scale horizontally, much like relational databases.
Problematic Workloads
These Git workloads are particularly challenging:
- Large, busy monorepos - High commit volume with very large packs for full clones
- High commit volume repositories - Frequent pack creation
- Binaries in Git - Dense content that intensifies resource usage
- Full history cloning - Expensive packfile creation
Impact on Infrastructure
These characteristics create infrastructure challenges:
- Memory burstiness - Makes containerization difficult due to strict container memory limits
- Disk IO burstiness - Makes remote file systems (NFS) unreliable and prone to integrity issues
- CPU burstiness - Complicates resource allocation in container environments
On GitLab.com, the P99 access time to create a Rugged::Repository object spiked to over 30 seconds due to filesystem latency. This made Git operations essentially unusable, demonstrating the need for Gitaly.
Design Decisions
Core Architecture Choices
gRPC over HTTP+JSON
Gitaly uses gRPC as its RPC framework instead of HTTP+JSON. gRPC provides a complete set of conventions and patterns, which speeds up development, and it relies on protocol buffers for efficient serialization.
High-Level Operations
Gitaly exposes high-level Git operations, not low-level Git object/ref storage lookups. Many Git operations involve an unbounded number of Git object lookups (e.g., the lookups needed to generate a diff depend on which files changed). Making each lookup a remote procedure call is not feasible.
Minimize gRPC Calls
GitLab requests should use as few Gitaly gRPC calls as possible. It’s better to move GitLab application logic into Gitaly to save gRPC round trips. Defining new gRPC calls is cheap when it saves round trips.
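The round-trip argument above can be illustrated with a minimal sketch. `FakeGitalyClient`, `get_object`, and `get_objects` are hypothetical names invented for this example, not part of any real Gitaly client; the class only counts how many remote calls each style would issue.

```python
from dataclasses import dataclass


@dataclass
class FakeGitalyClient:
    """Hypothetical stand-in for a gRPC client; it only counts round trips."""
    objects: dict
    round_trips: int = 0

    def get_object(self, oid):
        # Low-level style: one remote call per object lookup.
        self.round_trips += 1
        return self.objects[oid]

    def get_objects(self, oids):
        # High-level style: the server resolves every lookup locally
        # and returns the combined result in a single call.
        self.round_trips += 1
        return [self.objects[oid] for oid in oids]


objects = {f"oid{i}": f"blob {i}" for i in range(100)}

chatty = FakeGitalyClient(objects)
chatty_result = [chatty.get_object(oid) for oid in objects]

batched = FakeGitalyClient(objects)
batched_result = batched.get_objects(list(objects))

print(chatty.round_trips, batched.round_trips)  # prints "100 1"
```

Both clients return the same data, but the low-level style pays network latency once per object, while the high-level call pays it once in total.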
Protocol and Communication
Protocol Buffers
Gitaly uses Google Protocol Buffers to define:
- Available requests and required data
- Response formats for each request
- Service definitions
These definitions live in the proto/*.proto files.
Client Libraries
All protocol definitions and auto-generated gRPC client code live in the Gitaly repository. Client code is distributed as:
- Ruby gem for gitlab-rails
- Go package for gitlab-shell, gitlab-workhorse, gitaly-ssh
- Client executables as needed
Storage and Caching
Metadata Cache
Cached metadata is stored in Redis with LRU eviction, for fast access to frequently used data.
Payload Cache
Cached payloads are stored in files because Redis can’t efficiently store large objects.
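The split between a small metadata cache and a file-backed payload cache can be sketched as follows. This is an illustration only: `LRUMetadataCache` is an in-memory stand-in for an LRU-evicting store such as Redis, and both class names, the key format, and the hashing scheme are assumptions made for this example.

```python
import hashlib
import tempfile
from collections import OrderedDict
from pathlib import Path


class LRUMetadataCache:
    """In-memory stand-in for an LRU-evicting metadata store (assumed design)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


class FilePayloadCache:
    """Stores large payloads as files named by a hash of the cache key."""

    def __init__(self, directory):
        self.directory = Path(directory)

    def _path(self, key):
        return self.directory / hashlib.sha256(key.encode()).hexdigest()

    def get(self, key):
        path = self._path(key)
        return path.read_bytes() if path.exists() else None

    def put(self, key, payload):
        self._path(key).write_bytes(payload)


metadata = LRUMetadataCache(capacity=2)
metadata.put("repo-a", {"default_branch": "main"})
metadata.put("repo-b", {"default_branch": "trunk"})
metadata.get("repo-a")  # touch repo-a so repo-b becomes the oldest entry
metadata.put("repo-c", {"default_branch": "main"})  # evicts repo-b

payloads = FilePayloadCache(tempfile.mkdtemp())
payloads.put("repo-a:pack", b"x" * 1_000_000)
```

Small, frequently read values stay in the bounded LRU store, while the megabyte-sized payload goes to disk and never competes with metadata for memory.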
Networking
- Unix socket for Git operations (eliminates need for early authentication)
- TCP for monitoring and metrics
- No router needed - GitLab already has logic for which storage server contains which repository
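Why no router is needed can be shown with a small sketch: if the caller already records which storage holds each repository, finding the server is a plain lookup. The storage names, addresses, and the `address_for` helper below are hypothetical and chosen only for illustration.

```python
# Hypothetical storage names and server addresses, for illustration only.
STORAGE_ADDRESSES = {
    "default": "gitaly-1.internal.example",
    "storage-2": "gitaly-2.internal.example",
}


def address_for(repository):
    """Return the server address for a repository record.

    The caller already knows the repository's storage name, so this is a
    dictionary access on the client side rather than an extra routing hop.
    """
    storage = repository["storage"]
    try:
        return STORAGE_ADDRESSES[storage]
    except KeyError:
        raise LookupError(f"unknown storage {storage!r}") from None


repo = {"path": "group/project.git", "storage": "storage-2"}
print(address_for(repo))  # prints "gitaly-2.internal.example"
```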
Gitaly Components
Gitaly Server
The main Go server that:
- Receives gRPC requests from GitLab components
- Executes Git commands on local repositories
- Controls access to the git binary
- Manages resource limits and concurrency
- Exports Prometheus metrics
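One way to picture the "resource limits and concurrency" responsibility is a per-repository cap on simultaneous Git commands. The sketch below is not Gitaly's implementation (Gitaly is written in Go); `ConcurrencyLimiter`, `max_per_repo`, and `fake_git_command` are hypothetical names, and the sleep stands in for a spawned git process.

```python
import threading
import time


class ConcurrencyLimiter:
    """Caps how many operations may run at once for each repository."""

    def __init__(self, max_per_repo):
        self._max = max_per_repo
        self._lock = threading.Lock()
        self._semaphores = {}

    def _semaphore(self, repo):
        # Create the repository's semaphore lazily, under a lock.
        with self._lock:
            if repo not in self._semaphores:
                self._semaphores[repo] = threading.BoundedSemaphore(self._max)
            return self._semaphores[repo]

    def run(self, repo, operation):
        # Blocks until a slot for this repository is free.
        with self._semaphore(repo):
            return operation()


limiter = ConcurrencyLimiter(max_per_repo=2)
stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def fake_git_command():
    # Stand-in for spawning a git process; it only tracks concurrency.
    with stats_lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)
    with stats_lock:
        stats["active"] -= 1


threads = [
    threading.Thread(target=limiter.run, args=("group/project.git", fake_git_command))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(stats["peak"])  # never exceeds max_per_repo
```

Eight requests arrive together, but at most two run at a time. The rest wait for a slot, which smooths out the memory and CPU bursts described earlier.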
Gitaly-Ruby
A pool of Ruby helper processes used for:
- Legacy GitLab application code too complex to rewrite in Go
- Prototyping new features
- RPCs that interact with multiple repositories (e.g., merging branches)
High Availability with Praefect
Praefect provides high availability through asynchronous replication.
Praefect Responsibilities
Praefect State Management
Praefect uses PostgreSQL 9.6+ to store internal state:
- Which repositories need replication
- Which Gitaly server is the primary
- Replication job queues
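The three pieces of state listed above fit together roughly as in the following sketch of asynchronous replication. It uses an in-memory queue in place of PostgreSQL, and `ReplicationState`, `record_write`, `drain`, and the server names are hypothetical; the actual schema and job handling are not described in this document.

```python
from collections import deque


class ReplicationState:
    """In-memory stand-in for the state the text says is kept in PostgreSQL."""

    def __init__(self, primary, secondaries):
        self.primary = primary                # which server is the primary
        self.secondaries = list(secondaries)
        self.queue = deque()                  # pending (repository, target) jobs

    def record_write(self, repository):
        # The primary serves the write; each secondary gets a queued job.
        for target in self.secondaries:
            self.queue.append((repository, target))

    def pending_repositories(self):
        # Repositories that still need replication.
        return {repository for repository, _ in self.queue}


def drain(state, replicate):
    """Process queued jobs in FIFO order, some time after the original write."""
    while state.queue:
        repository, target = state.queue.popleft()
        replicate(state.primary, target, repository)


state = ReplicationState("gitaly-1", ["gitaly-2", "gitaly-3"])
state.record_write("group/project.git")
pending_before = state.pending_repositories()

copied = []
drain(state, lambda source, target, repo: copied.append((source, target, repo)))
```

Because the write returns before `drain` runs, secondaries can briefly lag behind the primary. That lag is the trade-off of replicating asynchronously.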
Performance Improvements
Gitaly improves GitLab’s Git performance through:
- Centralized monitoring - One place to observe and optimize Git operations
- Intelligent caching - Reduces expensive Git operations
- Proximity to storage - Moves Git operations close to disk to reduce latency
- Elimination of NFS - Removes unreliable remote filesystem layer