Overview
The checksum package provides fast file integrity verification using xxHash64, a high-performance non-cryptographic hash function. It includes checksum calculation and duplicate tracking for copy operations.
Interfaces
Hasher
Defines the behavior needed for checksum calculation.
- Calculate: Computes the checksum for a file path
- CalculateReader: Computes the checksum while reading from a reader
Types
XXHash64Hasher
Implementation of Hasher using the xxHash64 algorithm.
Tracker
Tracks checksums during a conversion run to detect duplicates. Maps each checksum to the file paths that produced it.
Functions
NewXXHash64Hasher
Creates a new xxHash64 hasher instance. Returns a ready-to-use xxHash64 hasher.
Example
NewTracker
Creates a new empty tracker for duplicate detection. Returns a new tracker instance.
Example
FormatChecksum
Formats a checksum as a human-friendly hexadecimal string.
- Parameter: Checksum value to format
- Returns: 16-character hexadecimal string (e.g., “a1b2c3d4e5f67890”)
Example
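The formatting described above can be sketched with the standard library alone. This is an illustration, not the package's own code: it assumes the checksum is a `uint64` and that the output is lowercase and zero-padded, and `formatChecksum` is a hypothetical stand-in name.

```go
package main

import "fmt"

// formatChecksum is a hypothetical stand-in for FormatChecksum. It renders a
// 64-bit checksum as exactly 16 lowercase hex digits, zero-padded on the left.
// Assumption: the checksum type is uint64.
func formatChecksum(sum uint64) string {
	return fmt.Sprintf("%016x", sum)
}

func main() {
	fmt.Println(formatChecksum(0xa1b2c3d4e5f67890)) // a1b2c3d4e5f67890
	fmt.Println(formatChecksum(255))                // 00000000000000ff
}
```

Zero-padding keeps every formatted checksum the same width, which makes log lines and reports easier to scan and compare.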
Hasher Methods
Calculate
Computes the checksum for a file at the given path.
- Parameter: Path to the file to checksum
- Returns: xxHash64 checksum value
- Error: Returned if the file cannot be opened or read
Example
CalculateReader
Computes the checksum while reading from an io.Reader.
- Parameter: Reader to compute the checksum from
- Returns: xxHash64 checksum value
- Error: Returned if reading fails
Buffer Size
Uses a 32KB buffer for efficient reading while keeping memory usage low.
Example
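A minimal sketch of a buffered read loop of the kind described above. To stay free of third-party dependencies it swaps in FNV-1a from the standard library in place of xxHash64, so the checksum values differ from the real package; only the streaming pattern is the point. The names `calculateReader`, `sumBytes`, and `streamString` are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"io"
	"strings"
)

// bufferSize mirrors the 32KB read buffer described in the text.
const bufferSize = 32 * 1024

// calculateReader is a hypothetical stand-in for CalculateReader. It uses
// FNV-1a instead of xxHash64 (an assumption made to avoid external imports).
func calculateReader(r io.Reader) (uint64, error) {
	h := fnv.New64a()
	buf := make([]byte, bufferSize) // one fixed buffer, reused for every read
	for {
		n, err := r.Read(buf)
		if n > 0 {
			h.Write(buf[:n]) // writes to a hash.Hash never return an error
		}
		if err == io.EOF {
			return h.Sum64(), nil
		}
		if err != nil {
			return 0, fmt.Errorf("read failed: %w", err)
		}
	}
}

// sumBytes hashes data in one shot, for comparison with the streamed result.
func sumBytes(data []byte) uint64 {
	h := fnv.New64a()
	h.Write(data)
	return h.Sum64()
}

// streamString streams s through calculateReader.
func streamString(s string) (uint64, error) {
	return calculateReader(strings.NewReader(s))
}

func main() {
	// 100,000 bytes is larger than one buffer, so the loop runs several times.
	input := strings.Repeat("x", 100000)
	streamed, err := streamString(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(streamed == sumBytes([]byte(input))) // true
}
```

Because the hash state is updated chunk by chunk, memory use stays at one buffer regardless of input size, and the result matches hashing the whole input at once.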
Tracker Methods
Register
Records that a checksum has been seen for a given file path.
- Parameter: Checksum value
- Parameter: File path that produced this checksum
Thread Safety
This method is thread-safe and can be called from multiple goroutines.
Example
IsDuplicate
Checks whether a checksum has been registered before.
- Parameter: Checksum value to check
- Returns: true if the checksum was previously registered
Example
FirstPath
Returns the first file path associated with a checksum.
- Parameter: Checksum value to look up
- Returns: First file path registered with this checksum
- Returns: true if the checksum was found
Example
Stats
Returns statistics about tracked checksums.
- Returns: Number of unique checksums tracked
- Returns: Number of duplicate files detected
Example
Usage Example: Copy with Deduplication
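The following is a hedged sketch of a copy-with-deduplication flow, not the package's own example. It uses only the standard library: FNV-1a stands in for xxHash64, a plain map stands in for the Tracker, and `hashFile`, `copyFile`, `copyUnique`, and `runDemo` are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"io"
	"os"
	"path/filepath"
)

// hashFile stands in for Hasher.Calculate, using FNV-1a instead of xxHash64.
func hashFile(path string) (uint64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	h := fnv.New64a()
	if _, err := io.Copy(h, f); err != nil {
		return 0, err
	}
	return h.Sum64(), nil
}

// copyFile copies the contents of src into a new file at dst.
func copyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}

// copyUnique copies each source into dstDir unless a file with the same
// checksum was already copied. It returns the copied and skipped counts.
func copyUnique(srcs []string, dstDir string) (int, int, error) {
	seen := make(map[uint64]string) // checksum -> first path that produced it
	copied, skipped := 0, 0
	for _, src := range srcs {
		sum, err := hashFile(src)
		if err != nil {
			return copied, skipped, err
		}
		if first, ok := seen[sum]; ok {
			fmt.Printf("skip %s (duplicate of %s)\n", filepath.Base(src), filepath.Base(first))
			skipped++
			continue
		}
		seen[sum] = src
		if err := copyFile(src, filepath.Join(dstDir, filepath.Base(src))); err != nil {
			return copied, skipped, err
		}
		copied++
	}
	return copied, skipped, nil
}

// runDemo builds a temporary source tree containing one duplicate and copies it.
func runDemo() (int, int, error) {
	root, err := os.MkdirTemp("", "dedup-demo")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(root)
	dst := filepath.Join(root, "out")
	if err := os.Mkdir(dst, 0o755); err != nil {
		return 0, 0, err
	}
	files := []struct{ name, data string }{{"a.txt", "alpha"}, {"b.txt", "beta"}, {"c.txt", "alpha"}}
	var srcs []string
	for _, f := range files {
		p := filepath.Join(root, f.name)
		if err := os.WriteFile(p, []byte(f.data), 0o644); err != nil {
			return 0, 0, err
		}
		srcs = append(srcs, p)
	}
	return copyUnique(srcs, dst)
}

func main() {
	copied, skipped, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("copied=%d skipped=%d\n", copied, skipped)
}
```

With a non-cryptographic hash, matching checksums indicate a very likely duplicate rather than a proven one; where that distinction matters, a byte-for-byte comparison of the two files can confirm the match before skipping.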
Performance Characteristics
xxHash64 Benefits
- Fast: 10+ GB/s on modern CPUs
- Low memory: 32KB buffer for streaming calculation
- Low collision rate: Suitable for duplicate detection, though not resistant to deliberately crafted collisions
- Non-cryptographic: Optimized for speed over security
Memory Usage
- Hasher: Minimal overhead (stateless)
- Tracker: O(n) where n = number of unique files
- Calculation: 32KB buffer per operation
Thread Safety
The Tracker type is thread-safe:
- Uses sync.RWMutex for concurrent access
- Safe to call Register, IsDuplicate, FirstPath, and Stats from multiple goroutines
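To illustrate how a `sync.RWMutex` can guard this kind of tracker, here is a minimal sketch under stated assumptions. The lowercase `tracker` type and its methods are hypothetical stand-ins, the checksum is assumed to be a `uint64`, and the duplicate count is assumed to mean every path registered beyond the first for a given checksum; the real implementation may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// tracker is a hypothetical stand-in for Tracker.
type tracker struct {
	mu    sync.RWMutex
	paths map[uint64][]string // checksum -> every path registered with it
}

func newTracker() *tracker {
	return &tracker{paths: make(map[uint64][]string)}
}

// register mutates the map, so it takes the exclusive write lock.
func (t *tracker) register(sum uint64, path string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.paths[sum] = append(t.paths[sum], path)
}

// isDuplicate only reads, so the shared read lock is enough.
func (t *tracker) isDuplicate(sum uint64) bool {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return len(t.paths[sum]) > 0
}

// firstPath returns the earliest path registered for sum, if any.
func (t *tracker) firstPath(sum uint64) (string, bool) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	p := t.paths[sum]
	if len(p) == 0 {
		return "", false
	}
	return p[0], true
}

// stats counts unique checksums and, as an assumption, treats every path
// beyond the first per checksum as a duplicate.
func (t *tracker) stats() (unique, duplicates int) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	for _, p := range t.paths {
		unique++
		duplicates += len(p) - 1
	}
	return unique, duplicates
}

// registerConcurrently registers n paths from n goroutines, spread evenly
// over `distinct` checksum values.
func registerConcurrently(t *tracker, n, distinct int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			t.register(uint64(i%distinct), fmt.Sprintf("file-%d", i))
		}(i)
	}
	wg.Wait()
}

func main() {
	tr := newTracker()
	registerConcurrently(tr, 100, 10)
	unique, dups := tr.stats()
	fmt.Println(unique, dups) // 10 90
}
```

A read-write mutex lets many goroutines run lookups at the same time while serializing the writes, which suits a workload where duplicate checks are at least as frequent as registrations.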
When to Use Checksums
Enable checksum verification when:
- Using --copy-only mode (automatic)
- Need to detect duplicate files
- Want to verify file integrity after copy
- Working with critical data that requires validation