Overview
copyparty can detect duplicate files based on their content and avoid storing multiple copies. This saves significant disk space, especially when multiple users upload the same files.
How Deduplication Works
When deduplication is enabled:
- File Upload: User uploads a file
- Hash Calculation: copyparty calculates the file’s hash (checksum)
- Duplicate Check: Compares hash against indexed files
- Link Creation: If duplicate found, creates a link instead of copying
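The steps above can be sketched in a few lines of Python. This is only an illustration of the hash-then-link idea, not copyparty's implementation: the SHA-256 digest, the in-memory `index` dictionary, and the function names `file_digest` and `store_upload` are all assumptions made for the example.

```python
import hashlib
import os
import shutil


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def store_upload(index, upload_path, dest_path):
    """Place an uploaded file at dest_path, linking to a known copy if possible.

    index is a hypothetical stand-in for the file index: it maps a digest to
    the path of the first stored copy. Returns "linked" or "stored".
    """
    digest = file_digest(upload_path)
    existing = index.get(digest)
    if existing is not None and os.path.exists(existing):
        # Duplicate content: create a symlink instead of a second copy.
        os.symlink(os.path.abspath(existing), dest_path)
        os.remove(upload_path)
        return "linked"
    # New content: keep the bytes and remember where they live.
    shutil.move(upload_path, dest_path)
    index[digest] = dest_path
    return "stored"
```

Calling `store_upload` twice with identical content stores the first file and turns the second into a symlink pointing at it.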
Deduplication Methods
copyparty supports three types of deduplication:
| Method | Safety | Compatibility | OS Support |
|---|---|---|---|
| Symlinks | Medium | Good | All |
| Hardlinks | Medium | Excellent | All |
| Reflinks | High | Good | Linux 5.3+, limited FS |
Symlinks (Default)
Symbolic links point to the original file.
✅ Advantages:
- Each link can have its own timestamp
- Clearly visible that it’s not a regular file
- Works on all systems
❌ Disadvantages:
- If you delete the original, symlinks break
- Some software doesn’t handle symlinks well
- Renaming files requires copyparty to update links
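The "symlinks break" behaviour is easy to reproduce outside copyparty. The sketch below uses only the Python standard library; the helper names `link_state` and `demo` are made up for this example.

```python
import os
import tempfile


def link_state(path):
    """Return 'symlink', 'broken symlink', 'regular' or 'missing' for path."""
    if os.path.islink(path):
        # os.path.exists follows the link, so it is False once the target is gone.
        return "symlink" if os.path.exists(path) else "broken symlink"
    return "regular" if os.path.exists(path) else "missing"


def demo(workdir):
    """Create a file plus a symlink to it, then delete the original."""
    original = os.path.join(workdir, "original.bin")
    dupe = os.path.join(workdir, "dupe.bin")
    with open(original, "wb") as f:
        f.write(b"payload")
    os.symlink(original, dupe)
    before = link_state(dupe)
    os.remove(original)  # simulates deleting the original outside copyparty
    after = link_state(dupe)
    return before, after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo(d))
```

A check like `link_state` can also be used to scan a volume for links that were broken by external tools.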
Hardlinks
Hard links are indistinguishable from regular files.
✅ Advantages:
- Compatible with all software
- Deleting one copy doesn’t affect others
- Can be moved/renamed safely with any tool
❌ Disadvantages:
- All copies share the same timestamp
- Editing one copy modifies all copies
- Less obvious that deduplication is happening
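The trade-offs listed for hardlinks follow directly from how the filesystem treats them: both names refer to the same inode. A minimal standard-library sketch (the function name `hardlink_demo` and the file names are illustrative):

```python
import os
import tempfile


def hardlink_demo(workdir):
    """Show that hardlinked names share data and survive deletion of one name."""
    original = os.path.join(workdir, "original.txt")
    copy = os.path.join(workdir, "copy.txt")
    with open(original, "w") as f:
        f.write("v1")
    os.link(original, copy)  # second name for the same inode

    same_file = os.path.samefile(original, copy)
    link_count = os.stat(original).st_nlink

    # An in-place edit through one name is visible through the other.
    with open(copy, "w") as f:
        f.write("v2")
    with open(original) as f:
        seen_via_original = f.read()

    # Deleting one name leaves the data reachable through the other.
    os.remove(original)
    with open(copy) as f:
        after_delete = f.read()

    return same_file, link_count, seen_via_original, after_delete


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(hardlink_demo(d))
```

Note that editors which save by writing a new file and renaming it over the old one replace the name rather than the shared data, so whether an edit propagates depends on how the tool writes.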
Reflinks (Copy-on-Write)
Reflinks use filesystem-level copy-on-write.
✅ Advantages:
- Safest option: editing one copy doesn’t affect others
- Each copy is fully independent
- Automatic copy-on-write when modified
- Most space-efficient
❌ Disadvantages:
- Requires Python 3.14+ and Linux kernel 5.3+
- Limited filesystem support (Btrfs, maybe XFS)
- Not available on ZFS yet
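To make "filesystem-level copy-on-write" concrete, the sketch below asks Linux for a reflink clone through the `FICLONE` ioctl and falls back to an ordinary copy when the filesystem refuses. This is not how copyparty creates reflinks; it is a standalone illustration, the function name `clone_or_copy` is made up, and the request number shown is the one used on x86-64 and arm64 (some other architectures encode it differently).

```python
import fcntl
import shutil

# FICLONE ioctl request number on x86-64 and arm64 Linux (assumption: other
# architectures may use a different encoding).
FICLONE = 0x40049409


def clone_or_copy(src, dst):
    """Try to reflink src to dst; fall back to a regular copy.

    Returns "reflink" if the filesystem shared the data blocks, else "copy".
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError:
            # Filesystem (or kernel) does not support cloning these files.
            pass
    shutil.copyfile(src, dst)
    return "copy"
```

Either way the result is an independent file: writing to `dst` afterwards never changes `src`. The difference is only whether the unchanged blocks are shared on disk.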
Basic Deduplication Setup
[global]
  e2dsa  # scan and index all files
  dedup  # enable deduplication
[/uploads]
  /mnt/uploads
  accs:
    w: *
    r: admin
Choosing Deduplication Method
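- Symlinks (default): best for most use cases. Enabled with `--dedup` on the command line, or `dedup` in the config.
- Hardlinks: selected with `--hardlink-only`
- Reflinks: selected with `--reflink`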
Per-Volume Deduplication
Enable deduplication for specific volumes only by setting the `dedup` volflag on those volumes.
Cross-Volume Deduplication
Deduplicate files across different volumes:
- `--xlink` - Enable cross-volume linking
Deduplication Statistics
View disk space saved:
- `cpp_dupe_bytes` - Disk space saved
- `cpp_dupe_files` - Number of duplicate files
- `cpp_vol_bytes` - Total volume size
- `cpp_vol_files` - Total file count
Advanced Configuration
Safe Deduplication Mode
If other software modifies files in the volume, use this:
- `--safe-dedup=1` - Verify file integrity before deduplicating
Disable Clone Detection
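If using S3 or similar storage where reading is expensive, clone detection can be disabled.
Database Location
Move the deduplication database to faster storage, for example with `--hist /mnt/ssd/cpp`.
Skip Hashing for Large Files
Exclude large files from deduplication, for example with `nohash: \.(mkv|iso)$` in the config:
- Saves indexing time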
- Disables deduplication for matched files
- Files are still indexed by path/size/date
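A pattern such as `\.(mkv|iso)$` is a regular expression matched against file names. The sketch below shows which names it would exclude, using Python's `re.search` semantics. Treat the details as assumptions: whether copyparty matches against the full path or only the name, and whether it ignores case, is not stated here, and `is_hashed` is a made-up helper.

```python
import re

# Example exclusion pattern: names ending in .mkv or .iso
NOHASH = re.compile(r"\.(mkv|iso)$")


def is_hashed(filename):
    """Return True if the file would still be hashed (pattern does not match)."""
    return NOHASH.search(filename) is None


if __name__ == "__main__":
    for name in ("movie.mkv", "disk.iso", "notes.txt", "movie.mkv.txt"):
        print(name, is_hashed(name))
```

The trailing `$` matters: without it, a name like `movie.mkv.txt` would also be excluded.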
Deduplication with Uploads
Reject Duplicate Uploads
Prevent users from uploading duplicates. Unlike `dedup`, the upload fails if the file already exists.
Allow Duplicates with Links
Randomize Duplicate Filenames
Combine with filename randomization.
Filesystem Compatibility
Symlinks and Hardlinks
Supported on all major filesystems:
- ext4, ext3, ext2 ✅
- Btrfs ✅
- XFS ✅
- ZFS ✅
- NTFS ✅
- exFAT ⚠️ (symlinks may not work)
- FAT32 ❌ (no symlinks or hardlinks)
Reflinks
Only works on:
- Btrfs ✅ (fully supported)
- XFS ⚠️ (maybe, needs testing)
- ZFS ❌ (not yet, known bugs)
- ext4/NTFS/others ❌
Example: Complete Deduplication Setup
complete-dedup.conf
Monitoring Deduplication
Check Symlink Destinations
Calculate Space Savings
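One way to estimate the savings directly from the filesystem is to walk a volume and add up the sizes of files that are symlinks or additional hardlinks to data already counted. This is a rough standard-library sketch, not a copyparty tool; the function name `dedup_savings` is made up, and reflinked copies cannot be detected this way because they look like ordinary files.

```python
import os


def dedup_savings(root):
    """Estimate bytes saved by symlinks and hardlinks under root."""
    saved = 0
    seen_inodes = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # A working symlink stands in for a full copy of its target.
                if os.path.exists(path):
                    saved += os.stat(path).st_size
                continue
            st = os.lstat(path)
            key = (st.st_dev, st.st_ino)
            if key in seen_inodes:
                # Another name for an inode that was already counted.
                saved += st.st_size
            else:
                seen_inodes.add(key)
    return saved


if __name__ == "__main__":
    import sys

    print(dedup_savings(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Broken symlinks are skipped, so the number only reflects links that currently resolve.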
Use Prometheus Metrics
Enable metrics and monitor:
- `cpp_dupe_bytes{vol="/"}` - Space saved
- `cpp_dupe_files{vol="/"}` - Duplicate count
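Once the metrics text has been fetched, the two dedup series can be picked out with a small parser. The sketch below assumes each series appears as a single line with only a `vol` label, in the Prometheus text format; the sample values are made up for illustration and `parse_dupe_metrics` is a hypothetical helper.

```python
import re

# Matches lines like: cpp_dupe_bytes{vol="/"} 1048576
LINE = re.compile(
    r'^(cpp_dupe_bytes|cpp_dupe_files)\{vol="([^"]*)"\}\s+([0-9.eE+-]+)'
)


def parse_dupe_metrics(text):
    """Return {volume: {metric_name: value}} for the dedup metrics in text."""
    result = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            name, vol, value = m.groups()
            result.setdefault(vol, {})[name] = float(value)
    return result


if __name__ == "__main__":
    # Made-up sample, not real copyparty output.
    sample = (
        "# TYPE cpp_dupe_bytes gauge\n"
        'cpp_dupe_bytes{vol="/"} 1048576\n'
        'cpp_dupe_files{vol="/"} 3\n'
    )
    print(parse_dupe_metrics(sample))
```

In a real setup a Prometheus server would scrape and store these values; a parser like this is mainly useful for quick checks or scripts.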
Important Warnings
Troubleshooting
Deduplication not working
Check these requirements:
- Indexing enabled: `-e2dsa` or `-e2d`
- Dedup flag set: `--dedup` or volflag `dedup`
- Database exists: `.hist/up2k.db` in volume
- Files are actually identical (same hash)
Broken symlinks after moving files
Symlinks break if you move/rename the original file outside of copyparty.
Solutions:
- Use hardlinks instead: `--hardlink-only`
- Only move files through copyparty’s web UI
- Rebuild the database: restart with `-e2dsa`
Hardlinks editing all copies
This is how hardlinks work. All hardlinks point to the same data.
Solutions:
- Delete before editing (removes the hardlink)
- Copy the file first, then edit the copy
- Use reflinks if possible: `--reflink`
Reflinks not working
Reflinks require:
- Python 3.14 or newer
- Linux kernel 5.3 or newer
- Btrfs filesystem (XFS maybe)
Database growing too large
The `up2k.db` database can grow large with many files.
Solutions:
- Move to SSD: `--hist /mnt/ssd/cpp`
- Exclude large files: `nohash: \.(mkv|iso)$`
- Compress: use XZ filesystem compression
- Clean old entries (no built-in tool yet)
Next Steps
- Set up write-only folders for uploads
- Configure file sharing with deduplication
- Learn about media server with space-efficient storage
- Set up authentication to track uploaders