Overview

copyparty can detect duplicate files based on their content and avoid storing multiple copies. This saves significant disk space, especially when multiple users upload the same files.

How Deduplication Works

When deduplication is enabled:
  1. File Upload: User uploads a file
  2. Hash Calculation: copyparty calculates the file’s hash (checksum)
  3. Duplicate Check: Compares hash against indexed files
  4. Link Creation: If duplicate found, creates a link instead of copying
The result: only one physical copy exists on disk, with multiple references to it.
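The flow above can be sketched in a few lines of Python — a simplified illustration only (copyparty's real implementation uses chunked hashing and the up2k database, not an in-memory dict):

```python
import hashlib
import os

def store(name: str, data: bytes, index: dict, root: str) -> str:
    """Write data under root; if identical content was already
    stored, create a symlink instead of a second physical copy."""
    digest = hashlib.sha512(data).hexdigest()  # 2. hash calculation
    path = os.path.join(root, name)
    if digest in index:                        # 3. duplicate check
        os.symlink(index[digest], path)        # 4. link creation
    else:
        with open(path, "wb") as f:            # first copy: a real file
            f.write(data)
        index[digest] = path
    return path
```

Uploading the same bytes twice under different names yields one regular file and one symlink pointing at it.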

Deduplication Methods

copyparty supports three types of deduplication:
| Method    | Safety | Compatibility | OS Support              |
|-----------|--------|---------------|-------------------------|
| Symlinks  | Medium | Good          | All                     |
| Hardlinks | Medium | Excellent     | All                     |
| Reflinks  | High   | Good          | Linux 5.3+, limited FS  |
Symlinks

Symbolic links point to the original file.
✅ Advantages:
  • Each link can have its own timestamp
  • Clearly visible that it’s not a regular file
  • Works on all systems
⚠️ Disadvantages:
  • If you delete the original, symlinks break
  • Some software doesn’t handle symlinks well
  • Renaming files requires copyparty to update links
Hardlinks

Hard links are indistinguishable from regular files.
✅ Advantages:
  • Compatible with all software
  • Deleting one copy doesn’t affect others
  • Can be moved/renamed safely with any tool
⚠️ Disadvantages:
  • All copies share the same timestamp
  • Editing one copy modifies all copies
  • Less obvious that deduplication is happening
Reflinks

Reflinks use filesystem-level copy-on-write.
✅ Advantages:
  • Safest option: editing one copy doesn’t affect others
  • Each copy is fully independent
  • Automatic copy-on-write when modified
  • Most space-efficient
⚠️ Disadvantages:
  • Requires Python 3.14+ and Linux kernel 5.3+
  • Limited filesystem support (btrfs, maybe XFS)
  • Not available on ZFS yet
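The three mechanisms can be illustrated with plain coreutils — this is what happens at the filesystem level when copyparty deduplicates (`cp --reflink=auto` falls back to a normal copy on filesystems without reflink support):

```shell
# create an original file, then each kind of link/copy
echo "original content" > file.bin

ln -s "$PWD/file.bin" sym.bin       # symlink: breaks if file.bin is deleted
ln file.bin hard.bin                # hardlink: same inode as file.bin
cp --reflink=auto file.bin ref.bin  # reflink where the FS supports it

# hardlinks share one inode and show a link count of 2
stat -c '%i %h %n' file.bin hard.bin
```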

Basic Deduplication Setup

1. Enable file indexing

Deduplication requires the file database:

python copyparty-sfx.py -e2dsa --dedup \
  -v /mnt/uploads:/uploads:w

  • -e2dsa - Scan and index all files
  • --dedup - Enable symlink-based deduplication

2. Using a configuration file

The same setup as a config file:

[global]
  e2dsa    # scan and index all files
  dedup    # enable deduplication

[/uploads]
  /mnt/uploads
  accs:
    w: *
    r: admin

3. Verify deduplication

After uploading duplicate files:

# Check for symlinks
ls -la /mnt/uploads/

# Count physical vs logical size
du -sh /mnt/uploads/                  # physical size
du -sh --apparent-size /mnt/uploads/  # logical size
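If you switch to hardlink-based dedup (see below), duplicates will not show up as symlinks; compare inodes instead. A self-contained illustration with two manually hardlinked files:

```shell
# simulate two deduplicated copies sharing one inode
mkdir -p demo
echo "same bytes" > demo/one.txt
ln demo/one.txt demo/two.txt

# a link count (%h) of 2 and identical inode (%i) = deduplicated
stat -c '%h %i %n' demo/one.txt demo/two.txt

# list every path that shares the same data
find demo -samefile demo/one.txt
```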

Choosing Deduplication Method

Per-Volume Deduplication

Enable deduplication for specific volumes only:

[global]
  e2dsa    # global indexing

[/uploads]
  /mnt/uploads
  accs:
    w: *
  flags:
    dedup           # enable for this volume
    hardlinkonly    # use hardlinks instead of symlinks

[/media]
  /mnt/media
  accs:
    r: *
  # no dedup flag = no deduplication
    

Cross-Volume Deduplication

Deduplicate files across different volumes:

python copyparty-sfx.py -e2dsa --dedup --xlink \
  -v /mnt/uploads1:/up1:w \
  -v /mnt/uploads2:/up2:w

  • --xlink - Enable cross-volume linking
⚠️ Warning: Cross-volume deduplication is experimental and may have bugs.

Deduplication Statistics

View disk space saved:

python copyparty-sfx.py --stats -e2dsa --dedup \
  -a admin:password \
  -v /mnt/uploads:/uploads:A,admin

Access metrics at:

http://your-server:3923/.cpr/metrics

Metrics include:
  • cpp_dupe_bytes - Disk space saved
  • cpp_dupe_files - Number of duplicate files
  • cpp_vol_bytes - Total volume size
  • cpp_vol_files - Total file count

Advanced Configuration

Safe Deduplication Mode

If other software also modifies files on disk, enable integrity verification:

python copyparty-sfx.py -e2dsa --dedup --safe-dedup=1 \
  -v /mnt/uploads:/uploads:w

  • --safe-dedup=1 - Verify file integrity before deduplicating
This is slower but prevents deduplication against files that have changed on disk.

Disable Clone Detection

If using S3 or similar storage where reading is expensive:

[/uploads]
  /mnt/uploads
  accs:
    w: *
  flags:
    noclone    # disable duplicate detection entirely

Database Location

Move the deduplication database to faster storage:

[global]
  e2dsa
  dedup
  hist: /mnt/ssd/copyparty-db  # put database on SSD

[/uploads]
  /mnt/hdd/uploads  # data stays on HDD
  accs:
    w: *

Skip Hashing for Large Files

Exclude large files from deduplication:

[/videos]
  /mnt/videos
  accs:
    w: *
  flags:
    e2dsa
    dedup
    nohash: \.(mkv|mp4|avi)$  # don't hash video files

  • Saves indexing time
  • Disables deduplication for matched files
  • Files are still indexed by path/size/date
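The nohash value is a regular expression matched against file paths; you can sanity-check a pattern with the same regex syntax in plain Python before deploying it (this checks only the pattern itself, not copyparty's matching behavior):

```python
import re

# the same pattern as the nohash volflag above
nohash = re.compile(r"\.(mkv|mp4|avi)$")

files = ["movie.mkv", "clip.mp4", "notes.txt", "movie.mkv.bak"]
skipped = [f for f in files if nohash.search(f)]
print(skipped)  # only names ending in .mkv/.mp4/.avi match
```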

Deduplication with Uploads

Reject Duplicate Uploads

Prevent users from uploading duplicates:

[/uploads]
  /mnt/uploads
  accs:
    w: *
  flags:
    e2dsa
    # no dedup flag = reject duplicate uploads

Without dedup, the upload fails if the file already exists.

[/uploads]
  /mnt/uploads
  accs:
    w: *
  flags:
    e2dsa
    dedup    # create link instead of rejecting

With dedup, the upload succeeds but creates a link instead of a new copy.

Randomize Duplicate Filenames

Combine with filename randomization:

python copyparty-sfx.py -e2dsa --dedup \
  -v /mnt/uploads:/uploads:wG:c,fk=8

Uploaders get unique filenames even for duplicate content.

Filesystem Compatibility

Symlinks and hardlinks are supported on all major filesystems:
  • ext4, ext3, ext2 ✅
  • Btrfs ✅
  • XFS ✅
  • ZFS ✅
  • NTFS ✅
  • exFAT ⚠️ (symlinks may not work)
  • FAT32 ❌ (no symlinks or hardlinks)

Reflinks only work on:
  • Btrfs ✅ (fully supported)
  • XFS ⚠️ (maybe, needs testing)
  • ZFS ❌ (not yet, known bugs)
  • ext4/NTFS/others ❌

Example: Complete Deduplication Setup

complete-dedup.conf

[global]
  e2dsa              # scan all files on startup
  dedup              # enable deduplication
  hardlink-only      # use hardlinks instead of symlinks
  hist: /mnt/ssd/cpp # database on SSD for performance
  stats              # enable prometheus metrics

[accounts]
  uploader: pass1
  admin: admin-secret

[/uploads]
  /mnt/hdd/uploads
  accs:
    w: uploader
    A: admin
  flags:
    fk: 8              # filekeys for duplicate links
    sz: 1k-1g          # limit file sizes
    vmaxb: 500g        # max 500GB total
    maxn: 100,3600     # rate limit uploads

[/videos]
  /mnt/hdd/videos
  accs:
    w: uploader
    r: admin
  flags:
    dedup
    nohash: \.(mkv|mp4|avi)$  # skip hashing large videos

Monitoring Deduplication

Inspect links on disk:

# Find all symlinks
find /mnt/uploads -type l

# Show symlink targets
find /mnt/uploads -type l -ls

# Find broken symlinks
find /mnt/uploads -type l ! -exec test -e {} \; -print

Calculate Space Savings

# Physical disk usage
du -sh /mnt/uploads

# Logical size (if all files were real)
du -sh --apparent-size /mnt/uploads

# Difference = space saved
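The same calculation can be done in Python, counting each hardlinked inode only once and following symlinks for the logical size (a standalone sketch, not part of copyparty):

```python
import os

def dedup_savings(root: str):
    """Return (logical, physical) byte counts under root.
    logical: every visible file counted at its full (target) size.
    physical: each unique inode counted once; symlinks cost ~0."""
    logical = physical = 0
    seen = set()  # (device, inode) pairs already counted
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            p = os.path.join(dirpath, name)
            try:
                logical += os.stat(p).st_size   # follows symlinks
            except OSError:
                continue                        # broken symlink
            if not os.path.islink(p):
                st = os.lstat(p)
                key = (st.st_dev, st.st_ino)
                if key not in seen:             # hardlinks: count once
                    seen.add(key)
                    physical += st.st_size
    return logical, physical
```

The difference between the two values is the space saved by deduplication.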

Use Prometheus Metrics

Enable metrics and monitor:

[global]
  e2dsa
  dedup
  stats

[accounts]
  monitoring: metrics-secret

[/]
  /mnt/uploads
  accs:
    w: *
    a: monitoring  # admin access for metrics

Query metrics:

curl -u :metrics-secret http://your-server:3923/.cpr/metrics

Look for:
  • cpp_dupe_bytes{vol="/"} - Space saved
  • cpp_dupe_files{vol="/"} - Duplicate count
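Since the endpoint returns plain Prometheus text format, the gauges are easy to extract; a small parser sketch (the metric names come from the list above, the sample string is made up):

```python
import re

def parse_metric(text: str, name: str) -> dict:
    """Extract {label-set: value} for one metric from Prometheus text format."""
    out = {}
    for m in re.finditer(rf'^{name}(\{{[^}}]*\}})?\s+(\S+)$', text, re.M):
        out[m.group(1) or ""] = float(m.group(2))
    return out

sample = 'cpp_dupe_bytes{vol="/"} 1048576\ncpp_dupe_files{vol="/"} 3\n'
print(parse_metric(sample, "cpp_dupe_bytes"))  # space saved, per volume
```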

    Important Warnings

    Do not edit deduplicated files in-place!With symlinks or hardlinks, editing one file edits ALL copies.Safe editing methods:
    • Delete and re-upload
    • Copy the file first, then edit the copy
    • Use an editor that creates a new file (like vim’s “backup” mode)
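The hazard and the safe workaround are easy to demonstrate with a hardlinked pair (plain shell, independent of copyparty):

```shell
# two hardlinked "copies" of one file
echo "v1" > a.txt
ln a.txt b.txt

# editing through one name changes both (same inode!)
echo "v2" > a.txt
cat b.txt            # also v2 now

# safe: copy first (new inode), then edit the copy
cp b.txt c.txt
echo "v3" > c.txt
cat b.txt            # still v2
```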
Database corruption = broken symlinks

If the .hist/up2k.db database becomes corrupted or deleted:
  • Symlinks may point to wrong files
  • Some files may become inaccessible
Prevention:
  • Regular database backups
  • Use --hist to store the DB on reliable storage
  • Consider reflinks if your filesystem supports them

Cross-volume deduplication is experimental

The --xlink option may have bugs. Use at your own risk:
  • Test thoroughly before production use
  • Keep backups
  • Monitor for broken links

Troubleshooting

Deduplication not working?

Check these requirements:
  1. Indexing enabled: -e2dsa or -e2d
  2. Dedup flag set: --dedup or volflag dedup
  3. Database exists: .hist/up2k.db in volume
  4. Files are actually identical (same hash)
Verify with:

python copyparty-sfx.py -e2dsa --dedup -v /test::w
# Upload same file twice and check with ls -la

Database growing too large?

The up2k.db database can grow large with many files. Solutions:
  1. Move it to an SSD: --hist /mnt/ssd/cpp
  2. Exclude large files: nohash: \.(mkv|iso)$
  3. Use filesystem-level compression
  4. Clean old entries (no built-in tool yet)
