This guide covers essential maintenance operations for running a Copr instance, including backups, monitoring, upgrades, and routine tasks.
Backup and Recovery
Backend Storage Backups
The backend uses RAID for redundancy and rsnapshot for incremental backups to storinator01.
Backup Schedule
Backups run via cron on the backend server:
crontab -l -u copr
# Typically runs weekly (Fridays)
0 3 * * 5 ionice --class=idle /usr/local/bin/rsnapshot_copr_backend > /dev/null
Verifying Backend Backups
Check the most recent backup start time:
ssh copr-be
xz -d < /var/log/cron-20241101.xz | grep '(copr) CMD'
# Look for:
# Nov 1 03:00:02 copr-be CROND[3482216]: (copr) CMD (ionice --class=idle /usr/local/bin/rsnapshot_copr_backend >/dev/null)
Find a build that completed just before that time (e.g., build 8185411)
Verify it exists on storinator01:
ssh [email protected]
find /srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/results/@copr/copr-pull-requests:pr:3473 | grep 8185411 | grep 'rpm$'
Check available disk space:
Backups typically take several days to complete. Don’t verify a build if the backup is still in progress.
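The log check above can be scripted. The following is a minimal sketch, assuming the cron log lines look like the sample shown earlier; the function name `last_backup_start` is hypothetical and not part of any Copr tooling.

```shell
# Hypothetical helper: print the timestamp (month, day, time) of the most
# recent rsnapshot cron start found in cron log text supplied on stdin.
last_backup_start() {
    grep -F '(copr) CMD' \
        | grep -F 'rsnapshot_copr_backend' \
        | tail -n 1 \
        | awk '{ print $1, $2, $3 }'
}
```

Possible usage on the backend: `xz -d < /var/log/cron-20241101.xz | last_backup_start`.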
Backend Recovery Procedure
Recovery from backups is a multi-day operation. Plan carefully and don’t rush.
The rsync from storinator01 runs at about 110 MB/s. For 20 TB of data, expect about 5 days of sync time.
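As a sanity check on the planning figure, the raw transfer time can be estimated from the size and the throughput. This sketch uses decimal units and integer arithmetic; the function name is hypothetical. The result is only a lower bound for the data transfer itself, so the 5-day figure above remains the one to plan with.

```shell
# Lower-bound transfer time in whole hours, from size in TB and rate in MB/s
# (decimal units: 1 TB = 1,000,000 MB). Integer arithmetic, rounds down.
estimate_sync_hours() {
    size_tb=$1
    rate_mb_s=$2
    echo $(( size_tb * 1000000 / rate_mb_s / 3600 ))
}

estimate_sync_hours 20 110   # prints 50
```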
Step 1: Prepare a new RAID array
Spawn a temporary instance:
git clone [email protected]:fedora-copr/ansible-fedora-copr.git
cd ansible-fedora-copr
./run-playbook pb-backup-recovery-01.yml
Run the configuration playbook:
ansible-playbook ./pb-backup-recovery-02.yml -i 54.81.xxx.xx, -u fedora
Step 2: Create RAID array
SSH to the instance and partition disks:
for i in /dev/nvme[1-4]n1; do
    ( echo gpt; echo n; echo; echo; echo; echo; echo w ) | sudo fdisk "$i"
done
Create RAID 10:
mdadm --create /dev/md0 --level raid10 \
--name copr-backend-data --raid-disks 4 /dev/nvme[1-4]n1p1
Format and mount:
mkfs.ext4 /dev/md0 -L copr-repo
tune2fs -m0 /dev/md0
mkdir /mnt/data
chown copr:copr /mnt/data
mount /dev/disk/by-label/copr-repo /mnt/data/
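Before copying data onto the new array, it may be worth confirming that the kernel reports the expected RAID level. A minimal sketch, assuming the usual `/proc/mdstat` layout (`md0 : active raid10 ...`); the helper name `md_level` is hypothetical.

```shell
# Hypothetical helper: print the RAID level of one md device, reading
# /proc/mdstat-formatted text from stdin. It scans the device line for the
# first word starting with "raid", so extra state words are tolerated.
md_level() {
    awk -v dev="$1" '
        $1 == dev && $2 == ":" {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^raid/) { print $i; exit }
        }'
}
```

Possible usage: `md_level md0 < /proc/mdstat`, which should report `raid10` for the array created above.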
Step 3: Work around kernel bug
There’s a kernel bug causing IO operations to hang. Apply this workaround:
echo frozen > /sys/block/md0/md/sync_action
After data is copied (about a week), unfreeze:
echo idle > /sys/block/md0/md/sync_action
Step 4: Set up SSH keys
Run in tmux as copr user:
tmux
su - copr
ssh-keygen -t rsa
Copy ~/.ssh/id_rsa.pub to storinator01:
Step 5: Sync the data
From the temporary instance:
time until rsync -av -H --info=progress2 --rsh=ssh \
--max-alloc=4G \
[email protected]:/srv/nfs/copr-be/copr-be-copr-user/backup/.sync/var/lib/copr/public_html/ \
/mnt/data ; \
do true ; done
This command will retry on failure and run for approximately 5 days.
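The `until ...; do true; done` wrapper simply reruns rsync until it exits with status 0. The same idiom can be written as a reusable function that pauses between attempts and reports how many were needed. This is an illustrative sketch, not part of the Copr tooling; the function name and the delay parameter are assumptions.

```shell
# Re-run a command until it exits 0, pausing between attempts.
# Prints the number of attempts that were needed.
# Usage: retry_until_success DELAY_SECONDS COMMAND [ARGS...]
# The body runs in a subshell so its variables stay local.
retry_until_success() (
    delay=$1
    shift
    attempt=1
    until "$@"; do
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    echo "$attempt"
)
```

For the sync above this could be invoked as `retry_until_success 60 rsync -av -H ...` with the same rsync arguments. Note that it retries forever, like the original loop.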
Step 6: Attach volumes to production
Unmount from temporary instance:
umount /mnt/data/
mdadm --stop /dev/md0
In AWS EC2 console:
Detach all copr-backend-backup-test-raid-10 volumes from temporary instance
Stop the backend service: systemctl stop copr-backend.target
Detach old volumes from production instance
Attach recovery volumes to production instance
Assemble RAID and mount
Step 7: Fix permissions
Temporarily disable SELinux:
setenforce 0
Start services:
systemctl start lighttpd.service copr-backend.target
Relabel filesystem:
time copr-selinux-relabel
setenforce 1
Database Backups
Private (Complete) Backups
Complete dumps with sensitive data are stored in /backups/ on the frontend:
ssh copr-fe
su - postgres
/usr/local/bin/backup-database coprdb
This script sleeps initially and takes 20+ minutes due to XZ compression. The backup contains sensitive data such as API tokens; never download or publish it.
Backups are automatically pulled by rdiff-backup configured via Ansible:
https://pagure.io/fedora-infra/ansible/blob/main/f/playbooks/rdiff-backup.yml
Verify backups exist:
ls -alh /backups/
# Should show recent timestamp:
# -rw-r--r--. 1 postgres postgres 662M Nov 5 01:21 coprdb-2024-11-05.dump.xz
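The freshness check can also be automated. A minimal sketch, assuming the dumps follow the `coprdb-*.dump.xz` naming shown above; the function name `has_recent_dump` is hypothetical.

```shell
# Hypothetical helper: succeed if DIR contains a coprdb dump modified
# within the last DAYS days (find's -mtime counts whole 24-hour periods).
# Usage: has_recent_dump DIR DAYS
has_recent_dump() {
    dir=$1
    days=$2
    [ -n "$(find "$dir" -name 'coprdb-*.dump.xz' -mtime "-$days" -print)" ]
}
```

Possible usage: `has_recent_dump /backups 2 || echo "no fresh coprdb dump" >&2`.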
Public Database Dumps
Sanitized dumps (without private tables) are available at:
https://copr.fedorainfracloud.org/db_dumps/
Generated by:
cat /etc/cron.d/cron-backup-database-coprdb
These dumps are suitable for:
Testing and debugging
Development environments
Public experimentation
Keygen and DistGit Backups
Keygen Volume Snapshots
GPG keypairs on /var/lib/copr-keygen are protected by EC2 volume snapshots.
Verify in AWS Console:
Go to EC2 > Volumes > vol-0108e05e229bf7eaf
Check snapshots are being created in Ohio (us-east-2)
Filter with tag: FedoraGroup=copr
Snapshots are stored in us-east-2 (Ohio), not us-east-1 (Virginia).
DistGit Snapshots
DistGit data is extensive (terabytes) but not critical:
Periodic EC2 volume snapshots are taken
In case of failure, restore from snapshot or initialize empty volume
No formal backup process due to data being reproducible
System Upgrades
Upgrading Persistent Instances
Upgrading Copr infrastructure to new Fedora versions involves creating fresh VMs and migrating data.
Pre-Upgrade Preparation
1. Announce the outage (see Announcing Outages)
2. Check for hotfixes
On the old instance:
rpm -Va | grep -v -e /etc/ -e /boot/
Review files marked with S.5....T. - these have been modified.
Also check: https://github.com/fedora-copr/copr/issues?q=label%3Ahot-fixed+is%3Aclosed
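To narrow the `rpm -Va` output down to files whose content actually changed, the flags column can be filtered. A sketch, assuming the standard `rpm -V` output where the third character of the flags field is `5` when the file digest differs; the helper name is hypothetical, and paths containing spaces are not handled.

```shell
# Hypothetical helper: from `rpm -Va` output on stdin, print the paths whose
# content digest differs (a "5" in the third position of the flags field).
# Lines such as "missing /path" have a shorter first field and are skipped.
changed_content_paths() {
    awk 'length($1) >= 9 && substr($1, 3, 1) == "5" { print $NF }'
}
```

Possible usage: `rpm -Va | grep -v -e /etc/ -e /boot/ | changed_content_paths`.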
3. Clone helper repository
git clone [email protected]:fedora-copr/ansible-fedora-copr.git
cd ansible-fedora-copr
Review and update group_vars/{dev,prod}.yml with:
4. Back up Let’s Encrypt certificates
sudo rbac-playbook -l copr-keygen.aws.fedoraproject.org \
groups/copr-keygen.yml -t certbot
Do this for all instances (frontend, backend, distgit, keygen).
Launch New Instances
Spawn new VM:
opts=( -e copr_instance=dev -e server_id=keygen )
ansible-playbook play-vm-migration-01-new-box.yml "${opts[@]}"
Note the output:
ElasticIP: not specified
Instance ID: i-04ba36eb360187572
Network ID: eni-048189f432f068270
Private IP: 172.30.2.94
Update group_vars/{dev,prod}.yml with new instance and network IDs.
Backend Pre-Preparation
For backend only: Run the playbook against a temporary hostname before the outage to minimize downtime.
Ensure copr-be-dev-temp.aws.fedoraproject.org is in inventory:
[copr_back_dev_aws]
copr-be-dev.aws.fedoraproject.org
copr-be-dev-temp.aws.fedoraproject.org birthday=yes
Run playbook:
sudo rbac-playbook -l copr-be-dev-temp.aws.fedoraproject.org \
groups/copr-backend.yml
Outage Window
1. Announce ongoing outage
2. Migrate IPs and volumes
For backend:
ansible-playbook play-vm-migration-02-migrate-backend-box.yml "${opts[@]}"
Follow manual instructions during playbook execution for DB backups and consistency checks.
For other services:
ansible-playbook play-vm-migration-02-migrate-non-backend-box.yml "${opts[@]}"
3. Provision new instances
In fedora-infra/ansible, set birthday=yes:
[copr_front_dev_aws]
copr.stg.fedoraproject.org birthday=yes
Run upgrade playbook:
sudo rbac-playbook -l copr-fe-dev.aws.fedoraproject.org \
manual/copr/copr-frontend-upgrade.yml
sudo rbac-playbook -l copr-fe-dev.aws.fedoraproject.org \
groups/copr-frontend.yml
4. Upgrade PostgreSQL (Frontend only)
Stop httpd:
systemctl stop httpd
Upgrade database:
dnf install postgresql-upgrade
postgresql-setup --upgrade
systemctl start postgresql
Rebuild indexes:
su - postgres
reindexdb --all
Restart httpd:
systemctl start httpd
5. Apply hotfixes and finalize
Revert birthday=yes and set services_disabled: false.
Rerun playbooks until all services are operational.
Post-Upgrade
1. Test reboot
Debug any boot issues now rather than during a future emergency.
2. Rename instances
Remove -new suffix from new instances, add -old to old ones:
opts=( -e copr_instance=dev )
ansible-playbook play-vm-migration-03-rename-instances.yml "${opts[@]}"
3. Terminate old instances
In AWS EC2:
Disable termination protection: Actions → Instance settings → Change termination protection
Terminate instances
Keep old VMs for a few days if you want to retain DB /backups.
4. Announce resolution
Upgrading Builders
Builder VMs are ephemeral and automatically use the latest packages from infra repos.
If copr-rpmbuild is updated, terminate resalloc VMs to force recreation with new version.
Monitoring
Monitoring Services
Copr uses multiple monitoring systems:
Nagios - Primary monitoring for Fedora Infrastructure
Nagios External - External availability checks
Prometheus - Metrics and Grafana dashboards (internal to Red Hat)
UptimeRobot - Geographic CDN availability (AWS CloudFront)
Health Checks
copr-ping Test
Periodic end-to-end test that submits a build through the entire stack:
# Configured on backend
cat /etc/cron.d/copr-ping
Monitor results: https://copr.fedorainfracloud.org/coprs/g/copr/copr-ping/builds/
Storage Analysis
Weekly storage analysis generates usage statistics:
/usr/bin/copr-backend-analyze-results
View statistics: https://copr-be.cloud.fedoraproject.org/stats/index.html
Manual Health Checks
Verify all services:
# Check Copr package versions
./releng/run-on-all-infra 'rpm -qa | grep copr'
# List enabled Copr repositories
./releng/run-on-all-infra 'dnf copr list'
# Check service status
systemctl status copr-backend.target
systemctl status httpd
systemctl status postgresql
Log Locations
# Frontend
/var/log/httpd/error_log
/var/log/copr-frontend/
# Backend
/var/log/copr-backend/
/var/log/lighttpd/
# Database
/var/lib/pgsql/data/log/
# DistGit
/var/log/copr-dist-git/
Routine Maintenance Tasks
Managing Chroots
Enable New Fedora Release
Run this BEFORE Fedora branching happens to copy builds with correct dist tags.
ssh copr-fe
su - copr-fe
copr-frontend branch-fedora 31
This creates fedora-31-* chroots and forks latest successful Rawhide builds.
Once actions are processed (check https://copr.fedorainfracloud.org/status/stats/), activate:
copr-frontend alter-chroot --action activate \
fedora-31-x86_64 fedora-31-i386 \
fedora-31-ppc64le fedora-31-aarch64 \
fedora-31-armhfp fedora-31-s390x
Disable EOL Chroots
Check that other services (Fedora Review Service) don’t depend on the chroot before disabling.
fv=34
copr-frontend alter-chroot --action eol \
fedora-$fv-x86_64 fedora-$fv-i386 \
fedora-$fv-ppc64le fedora-$fv-aarch64 \
fedora-$fv-armhfp fedora-$fv-s390x
This disables builds but preserves all repositories and data.
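The chroot names follow a fixed `fedora-VERSION-ARCH` pattern, so the argument list can be generated instead of typed out. A small sketch; the function name `fedora_chroots` is hypothetical, and the architecture list should match what is actually enabled on the instance.

```shell
# Print one chroot name per line for a Fedora version and a list of
# architectures. Usage: fedora_chroots VERSION ARCH...
fedora_chroots() {
    version=$1
    shift
    for arch in "$@"; do
        echo "fedora-$version-$arch"
    done
}
```

Possible usage, relying on word splitting of the unquoted substitution: `copr-frontend alter-chroot --action eol $(fedora_chroots 34 x86_64 i386 ppc64le aarch64 armhfp s390x)`.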
EOL Lifeless Rolling Chroots
Automatically mark inactive rolling chroots (Rawhide, CentOS Stream):
copr-frontend eol-lifeless-rolling-chroots
Add to cron for daily execution:
# /etc/cron.d/copr-frontend-optional
0 2 * * * copr-fe /usr/bin/copr-frontend eol-lifeless-rolling-chroots
Database Maintenance
Manual Backup
su - postgres
/usr/local/bin/backup-database coprdb
Vacuum and Analyze
su - postgres
vacuumdb --all --analyze
Check Database Size
su - postgres
psql -c "SELECT pg_database.datname, pg_size_pretty(pg_database_size(pg_database.datname)) AS size FROM pg_database;"
Announcing Outages
Follow this workflow for planned maintenance:
1. Schedule outage
Create ticket: https://pagure.io/fedora-infrastructure/new_issue
2. Announce planned outage
Update status.fedoraproject.org
Email: [email protected]
Twitter/Mastodon via Fedora Infrastructure
3. Start maintenance - announce ongoing
Update status page to “Ongoing outage”
4. Complete maintenance - announce resolution
Update status page to “Resolved”
Email copr-devel with changes summary
Close infrastructure ticket
Emergency Procedures
Backend Down - Builds Failing
Check backend services:
systemctl status copr-backend.target
systemctl status lighttpd
Check RAID status:
cat /proc/mdstat
mdadm --detail /dev/md0
Check disk space:
df -h /var/lib/copr/public_html/results
Review logs:
journalctl -u copr-backend -n 100
tail -100 /var/log/copr-backend/backend.log
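For the disk space check, the used percentage can be extracted from `df -P` so it can be compared against a threshold in a script. A minimal sketch, assuming POSIX `df -P` output for a single path; the helper name `usage_percent` is hypothetical.

```shell
# Hypothetical helper: read `df -P` output for a single path on stdin and
# print the used-capacity percentage as a bare number (e.g. 87).
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

Possible usage: `df -P /var/lib/copr/public_html/results | usage_percent`.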
Frontend Down - Website Inaccessible
Check httpd:
systemctl status httpd
journalctl -u httpd -n 50
Check database:
systemctl status postgresql
su - postgres -c "psql -c 'SELECT 1'"
Check disk space:
Database Issues
Check connections:
su - postgres
psql -c "SELECT count(*) FROM pg_stat_activity;"
Check for long queries:
psql -c "SELECT pid, now() - query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC;"
Check locks:
psql -c "SELECT * FROM pg_locks WHERE NOT granted;"
Additional Resources
Deployment Options: learn how to deploy Copr in different environments
Release Process: understand the Copr release workflow
Fedora Infra Copr SOP: official Fedora Infrastructure procedures
Architecture: understand Copr’s system architecture