Replication Mechanisms
Gitaly Cluster uses replication to keep repository data synchronized across multiple Gitaly nodes. This ensures data redundancy and enables failover when nodes become unavailable.

When Replication Occurs
Praefect relies on replication in two scenarios:
- After non-transactional operations: When a Gitaly RPC doesn’t support transactions, changes are applied to the primary and then replicated to secondaries.
- For replica repair: When a transaction fails on some nodes but succeeds on a quorum, the unsuccessful replicas are repaired via replication.
Transactional Replication
For transaction-aware mutator RPCs, Praefect attempts to apply changes to all replicas simultaneously. If a quorum of replicas successfully applies the RPC, replication is scheduled only for the unsuccessful replicas. This minimizes replication overhead while maintaining consistency.

Transactional replication uses Git’s reference-transaction hook to coordinate writes across nodes. See Strong Consistency for details.
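The scheduling decision can be sketched as a small Go function. This is an illustrative model, not Praefect's actual API: it counts successful replicas, and if a quorum applied the write, returns only the failed replicas as replication targets.

```go
package main

import "fmt"

// replicationTargets models the decision described above: count the
// replicas that applied a transactional write. If a quorum succeeded,
// only the failed replicas need replication jobs; without a quorum the
// write fails as a whole, so nothing is scheduled.
// Types and names are illustrative, not Praefect's real interface.
func replicationTargets(results map[string]bool) []string {
	succeeded := 0
	for _, ok := range results {
		if ok {
			succeeded++
		}
	}
	if succeeded < len(results)/2+1 {
		// No quorum: there is no authoritative state to replicate from.
		return nil
	}
	var targets []string
	for node, ok := range results {
		if !ok {
			targets = append(targets, node)
		}
	}
	return targets
}

func main() {
	fmt.Println(replicationTargets(map[string]bool{
		"gitaly-1": true, "gitaly-2": true, "gitaly-3": false,
	})) // only gitaly-3 needs a repair job
}
```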
Non-Transactional Replication
For mutator RPCs that don’t support transactions, Praefect routes the request to the primary only. Once the primary completes the operation, Praefect schedules replication jobs to update all secondary nodes.

Replication Process
The replication process varies depending on whether the target repository already exists.

New Repository Creation
When replicating to a node that doesn’t have the repository:
- Snapshot: Create a compressed archive of the source repository
- Transfer: Send the snapshot to the target Gitaly node
- Extract: Decompress and extract the repository on target
- Fetch: Perform a Git fetch to get any changes that occurred during transfer
- Sync files: Copy additional files (e.g., info/attributes)
Existing Repository Update
When the target repository already exists:
- Fetch: Perform a Git fetch from the source repository
- Update references: Apply reference changes from source
- Sync files: Update auxiliary files if changed
Object Pool Handling
If the source repository uses an object pool:
- Get object pool information from source
- Replicate or link to the object pool on target
- Link the target repository to the appropriate object pool
Object pools allow multiple repositories to share common objects, reducing storage requirements. Praefect ensures object pool membership is maintained across replicas.
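The pool-handling steps above can be sketched as a small decision function. This is a simplified model under assumed names, not Gitaly's real API: if the target node already has the pool, only a link is needed; otherwise the pool is replicated first.

```go
package main

import "fmt"

// poolActions models the object-pool handling described above.
// If the target node already has the pool, the repository only needs to
// be linked; otherwise the pool is replicated first, then linked.
// The action names are illustrative placeholders.
func poolActions(sourceHasPool, targetHasPool bool) []string {
	if !sourceHasPool {
		return nil // source repository is not pooled; nothing to do
	}
	if targetHasPool {
		return []string{"link"}
	}
	return []string{"replicate-pool", "link"}
}

func main() {
	fmt.Println(poolActions(true, false)) // [replicate-pool link]
	fmt.Println(poolActions(true, true))  // [link]
}
```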
Replication Job Queue
Praefect uses a PostgreSQL-backed queue to manage replication jobs.

Job Lifecycle
- Scheduled: Praefect creates a job when a repository is modified
- Dequeued: A replication worker picks up the job
- In Progress: The worker replicates data from source to target
- Completed: Job is removed from the queue on success
- Failed: Job is retried with exponential backoff on failure
Monitoring the Queue
Check queue depth with the gitaly_praefect_replication_queue_depth metric.
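For example, a PromQL expression could alert when the queue stays deep. The threshold is an illustrative value to tune for your environment:

```promql
# Fire when the replication queue is persistently deep, which suggests
# Praefect cannot keep up with incoming writes (threshold is an example).
sum(gitaly_praefect_replication_queue_depth) > 100
```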
Strong Consistency via Reference Transactions
Gitaly Cluster achieves strong consistency by coordinating reference updates across nodes using Git’s reference-transaction hook.

Transaction Flow
How It Works
- RPC Broadcast: Praefect sends the mutator RPC to all replica nodes simultaneously
- Git Execution: Each Gitaly node executes the Git command
- Hook Trigger: Git calls the reference-transaction hook before updating references
- Vote Submission: Each hook sends a hash of the proposed changes to Praefect
- Quorum Check: Praefect waits for all votes and verifies they match
- Commit/Abort: If quorum is reached, Praefect tells hooks to commit; otherwise abort
- Result: Git updates references only if the hook succeeded
The reference-transaction hook requires Git 2.28.0 or newer. Older Git versions fall back to eventual consistency.
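The vote-and-quorum steps above can be sketched in Go. This is similar in spirit to the hook's vote, not the real wire format: each node hashes its proposed reference updates, and the transaction commits once enough identical votes arrive.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// voteFor condenses a set of proposed reference updates (ref -> new OID)
// into a single digest. Nodes applying identical updates produce
// identical votes; the real vote format differs in detail.
func voteFor(refUpdates map[string]string) string {
	lines := make([]string, 0, len(refUpdates))
	for ref, newOID := range refUpdates {
		lines = append(lines, ref+" "+newOID)
	}
	sort.Strings(lines) // deterministic order on every node
	sum := sha256.Sum256([]byte(strings.Join(lines, "\n")))
	return hex.EncodeToString(sum[:])
}

// committed reports whether enough identical votes arrived for quorum.
func committed(votes []string, threshold int) bool {
	counts := make(map[string]int)
	for _, v := range votes {
		counts[v]++
		if counts[v] >= threshold {
			return true
		}
	}
	return false
}

func main() {
	good := voteFor(map[string]string{"refs/heads/main": "a1b2c3d4"})
	bad := voteFor(map[string]string{"refs/heads/main": "deadbeef"})
	fmt.Println(committed([]string{good, good, bad}, 2)) // true
}
```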
Voting Strategies
Praefect supports multiple voting strategies:
- Strong (default): All nodes must agree; the vote threshold equals the number of voters.
- Majority-based: A transaction commits once a majority of voters agree; the threshold is ceil((votes+1)/2).
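The two thresholds work out as follows; a minimal sketch with illustrative function names:

```go
package main

import "fmt"

// strongThreshold: every voter must agree.
func strongThreshold(voters int) int { return voters }

// majorityThreshold: ceil((voters+1)/2), written with integer arithmetic.
func majorityThreshold(voters int) int {
	return (voters + 2) / 2
}

func main() {
	fmt.Println(strongThreshold(3), majorityThreshold(3)) // 3 2
	fmt.Println(strongThreshold(4), majorityThreshold(4)) // 4 3
}
```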
Transaction Metrics
Monitor transaction performance with these metrics:
- gitaly_praefect_transactions_total: Total transactions created
- gitaly_praefect_transactions_delay_seconds: Time waiting for quorum
- gitaly_praefect_subtransactions_per_transaction_total: Subtransactions per transaction
- gitaly_praefect_voters_per_transaction_total: Nodes voting per transaction
Automatic Reconciliation
Praefect can automatically detect and repair inconsistencies:
- Scans repositories for inconsistencies
- Compares replicas to identify outdated copies
- Schedules replication jobs to repair outdated replicas
- Reports metrics on reconciliation progress
Reconciliation runs periodically and is independent of normal replication. It acts as a safety net to catch and fix any consistency issues.
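The comparison step can be sketched as follows. Checksums are treated as opaque strings here; in reality Gitaly derives a repository checksum from the repository's references, and the function name is illustrative:

```go
package main

import (
	"fmt"
	"sort"
)

// staleReplicas models the reconciliation comparison above: replicas
// whose repository checksum differs from the authoritative one are
// outdated and need a repair replication job.
func staleReplicas(authoritative string, checksums map[string]string) []string {
	var stale []string
	for node, sum := range checksums {
		if sum != authoritative {
			stale = append(stale, node)
		}
	}
	sort.Strings(stale) // deterministic output for reporting
	return stale
}

func main() {
	fmt.Println(staleReplicas("abc123", map[string]string{
		"gitaly-1": "abc123", "gitaly-2": "abc123", "gitaly-3": "000000",
	})) // [gitaly-3]
}
```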
Replication Performance Considerations
Large Repositories
Replication can be resource-intensive for large repositories:
- Snapshot overhead: Creating compressed archives consumes CPU and I/O. For very large repositories, this can take significant time.
- Network transfer: Large snapshots must be transferred over the network. Ensure sufficient bandwidth between Gitaly nodes.
- Storage pressure: During snapshot extraction, you temporarily need space for both the compressed and uncompressed data.

Fork Networks
Forked repositories can be particularly challenging:
- Without object pools, each fork creates a full copy during replication
- Storage quotas may be exceeded temporarily until housekeeping runs
- Read distribution can show inconsistent disk usage depending on which replica is queried
Replication Best Practices
- Monitor queue depth: A growing replication queue indicates capacity issues
- Size batches appropriately: Balance throughput and resource usage with batch_size
- Enable reconciliation: Catch and repair any missed replication jobs
- Use object pools: Reduce storage and replication overhead for fork networks
- Watch metrics: Track replication delay and latency to detect problems early
Next Steps
Failover
Learn how failover works when nodes fail
Praefect Configuration
Configure replication settings