Supported Hash Algorithms
IPED supports multiple hash algorithms, computed in parallel for performance:
- MD5 - Fast legacy algorithm, still widely used in forensics
- SHA-1 - Standard algorithm for many hash databases
- SHA-256 - Modern secure hash, recommended for integrity verification
- SHA-512 - High-security hash for sensitive cases
- eDonkey - Peer-to-peer network hash for file sharing investigations
- PhotoDNA - Perceptual hash for similar image detection (law enforcement only)
Configuration
Hash algorithms are configured in HashTaskConfig.txt. By default, MD5 is enabled as the primary hash.
Implementation Details
The hash calculation is implemented in /iped-engine/src/main/java/iped/engine/task/HashTask.java.
Parallel Processing
IPED uses a multi-threaded approach with ExecutorService to compute multiple hashes simultaneously:
- Reads file data once from disk
- Updates all enabled hash algorithms in parallel threads
- Minimizes I/O overhead by processing in 1MB buffers
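
The read-once, update-in-parallel pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in HashTask.java: the class name ParallelHashSketch and the method hashAll are hypothetical, and only the standard JDK digests (MD5, SHA-1, SHA-256, SHA-512) are covered.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration class; not IPED's HashTask.
final class ParallelHashSketch {
    static final int BUFFER_SIZE = 1024 * 1024; // 1 MB read buffer

    /** Reads the stream once and feeds every buffer to all digests in parallel. */
    static Map<String, String> hashAll(InputStream in, List<String> algorithms) {
        Map<String, MessageDigest> digests = new LinkedHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, algorithms.size()));
        try {
            for (String alg : algorithms) {
                digests.put(alg, MessageDigest.getInstance(alg));
            }
            byte[] buffer = new byte[BUFFER_SIZE];
            int n;
            while ((n = in.read(buffer)) != -1) {
                final int len = n;
                List<Callable<Void>> updates = new ArrayList<>();
                for (MessageDigest md : digests.values()) {
                    updates.add(() -> {
                        md.update(buffer, 0, len); // each digest has its own task
                        return null;
                    });
                }
                // Wait for every digest before the buffer is overwritten by the next read.
                for (Future<Void> f : pool.invokeAll(updates)) {
                    f.get();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException | ExecutionException e) {
            throw new IllegalStateException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, MessageDigest> entry : digests.entrySet()) {
            result.put(entry.getKey(), toHex(entry.getValue().digest()));
        }
        return result;
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(hashAll(in, List.of("MD5", "SHA-1", "SHA-256")));
    }
}
```

The sketch waits for all digests after each buffer, which keeps a single shared buffer safe at the cost of running at the speed of the slowest algorithm. Whether IPED synchronizes per buffer in this way is not stated here.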
eDonkey Hash
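The eDonkey (ed2k) hash is computed differently from standard hashes:
- Files are divided into chunks of 9,728,000 bytes (9,500 KiB, often rounded to 9.5MB)
- MD4 hash computed for each chunk
- Final hash is MD4 of all chunk hashes concatenated; a file smaller than one chunk uses its plain MD4 directly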
- Used to identify files shared on eDonkey/eMule networks
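
The chunk-then-hash-the-hashes scheme above can be sketched as follows. This is an illustration under assumptions, not IPED's implementation: the class Ed2kSketch and its methods are hypothetical, MD4 is written out following RFC 1320 because the JDK does not ship a public MD4 digest, and the chunk size is the 9,728,000 bytes commonly documented for ed2k (the "9.5MB" figure above). The whole input is held in a byte array for brevity; a real implementation would stream it. Files whose size is an exact multiple of the chunk size are handled differently across ed2k implementations; this sketch does not append an empty-chunk hash, and that choice has not been checked against IPED.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not IPED's implementation.
final class Ed2kSketch {
    static final int ED2K_CHUNK_SIZE = 9_728_000; // bytes per ed2k chunk

    private static final int[][] SHIFTS = {{3, 7, 11, 19}, {3, 5, 9, 13}, {3, 9, 11, 15}};
    private static final int[] ROUND_CONSTANTS = {0, 0x5a827999, 0x6ed9eba1};
    private static final int[] ROUND3_ORDER = {0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15};

    /** MD4 following RFC 1320. */
    static byte[] md4(byte[] msg) {
        // Pad with 0x80, zeros, and the 64-bit little-endian bit length.
        int paddedLen = ((msg.length + 8) / 64 + 1) * 64;
        ByteBuffer buf = ByteBuffer.allocate(paddedLen).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(msg).put((byte) 0x80);
        buf.putLong(paddedLen - 8, (long) msg.length * 8);

        int[] h = {0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476};
        int[] x = new int[16];
        for (int off = 0; off < paddedLen; off += 64) {
            for (int j = 0; j < 16; j++) {
                x[j] = buf.getInt(off + 4 * j);
            }
            int a = h[0], b = h[1], c = h[2], d = h[3];
            for (int i = 0; i < 48; i++) {
                int round = i / 16;
                int step = i % 16;
                int f;
                int k;
                if (round == 0) {
                    f = (b & c) | (~b & d);
                    k = step;
                } else if (round == 1) {
                    f = (b & c) | (b & d) | (c & d);
                    k = (step % 4) * 4 + step / 4;
                } else {
                    f = b ^ c ^ d;
                    k = ROUND3_ORDER[step];
                }
                int t = Integer.rotateLeft(a + f + x[k] + ROUND_CONSTANTS[round], SHIFTS[round][step % 4]);
                a = d; d = c; c = b; b = t; // rotate the four registers
            }
            h[0] += a;
            h[1] += b;
            h[2] += c;
            h[3] += d;
        }
        ByteBuffer out = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        for (int v : h) {
            out.putInt(v);
        }
        return out.array();
    }

    /** ed2k-style hash: MD4 per chunk, then MD4 over the concatenated chunk hashes. */
    static byte[] ed2k(byte[] data, int chunkSize) {
        if (data.length < chunkSize) {
            return md4(data); // less than one chunk: plain MD4
        }
        ByteArrayOutputStream chunkHashes = new ByteArrayOutputStream();
        for (long off = 0; off < data.length; off += chunkSize) {
            int end = (int) Math.min(data.length, off + chunkSize);
            byte[] hash = md4(Arrays.copyOfRange(data, (int) off, end));
            chunkHashes.write(hash, 0, hash.length);
        }
        return md4(chunkHashes.toByteArray());
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toHex(ed2k(data, ED2K_CHUNK_SIZE)));
    }
}
```

The chunkSize parameter exists only so the two-level structure can be exercised on small inputs; pass ED2K_CHUNK_SIZE for real data.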
Hash Database Lookup
IPED supports multiple hash database formats:
- NIST NSRL - National Software Reference Library; identifies known legitimate software files to exclude from analysis
- NIST CAID - Child Abuse Image Database; identifies known CSAM content (law enforcement only)
- Project VIC - Victim Identification database for child exploitation investigations (law enforcement only)
- Interpol ICSE - International Child Sexual Exploitation database (law enforcement only)
- Custom CSV - Standard comma-separated format
Deduplication
IPED performs fast hash deduplication during processing:
- First occurrence of a hash is fully processed
- Subsequent files with same hash skip expensive operations:
  - Parsing
  - Text extraction
  - OCR
  - Signature analysis
- Metadata still extracted for all instances
- Significantly reduces processing time for large datasets
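
The first-occurrence rule above can be sketched with a thread-safe set of seen hashes. This is a minimal illustration, not IPED's code: DedupSketch, Plan, and plan are hypothetical names, and treating hashes case-insensitively is an assumption.

```java
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration class; names are not taken from IPED.
final class DedupSketch {
    enum Plan { FULL, METADATA_ONLY }

    // Thread-safe set of hashes already seen in this processing run.
    private final Set<String> seenHashes = ConcurrentHashMap.newKeySet();

    /** Decides how much work an item needs, based on whether its hash was seen before. */
    Plan plan(String hexHash) {
        if (hexHash == null || hexHash.isEmpty()) {
            return Plan.FULL; // no usable hash, so the item cannot be deduplicated
        }
        // add() returns true only for the first caller that inserts this value.
        boolean first = seenHashes.add(hexHash.toLowerCase(Locale.ROOT));
        return first ? Plan.FULL : Plan.METADATA_ONLY;
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch();
        System.out.println(dedup.plan("d41d8cd98f00b204e9800998ecf8427e"));
        System.out.println(dedup.plan("D41D8CD98F00B204E9800998ECF8427E"));
    }
}
```

Because the set's add operation is atomic, two worker threads that hash identical files at the same moment cannot both be told they hold the first occurrence.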
Hash Storage
Computed hashes are stored as item properties:
- Primary hash in the hash field (first configured algorithm)
- Additional hashes in extra attributes: md5, sha-1, sha-256, etc.
- Indexed for fast searching
- Available in analysis interface and reports
PhotoDNA
PhotoDNA is a perceptual hashing technology developed by Microsoft:
- Generates robust hash resistant to image modifications
- Detects cropped, resized, or color-adjusted images
- Used globally by law enforcement for CSAM detection
- Only available to law enforcement agencies
- Contact [email protected] for access
Performance
Hash calculation performance:
- Single MD5: ~400-600 MB/s on modern hardware
- Multiple algorithms in parallel: Minimal overhead (~10-15%)
- Hash lookups: Optimized with in-memory hash maps
- Negligible impact on overall processing speed
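
An in-memory lookup of the kind mentioned above can be sketched with a plain hash map keyed by the hex digest. This is only an illustration: HashLookupSketch, its methods, and the "known-good" label are hypothetical, and IPED's actual database layout is not described here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration class; not IPED's hash database code.
final class HashLookupSketch {
    // Maps a lowercase hex digest to a label describing the hash set it came from.
    private final Map<String, String> knownHashes = new HashMap<>();

    void add(String hexHash, String label) {
        knownHashes.put(normalize(hexHash), label);
    }

    /** Average constant-time lookup; empty when the hash is not in any loaded set. */
    Optional<String> lookup(String hexHash) {
        return Optional.ofNullable(knownHashes.get(normalize(hexHash)));
    }

    private static String normalize(String hexHash) {
        return hexHash.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        HashLookupSketch db = new HashLookupSketch();
        db.add("D41D8CD98F00B204E9800998ECF8427E", "known-good");
        System.out.println(db.lookup("d41d8cd98f00b204e9800998ecf8427e").orElse("unknown"));
    }
}
```

Normalizing case and whitespace on both insert and lookup avoids misses when hash sets from different sources format their digests differently.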
Error Handling
The HashTask handles I/O errors gracefully:
- Marked with ioError attribute
- Empty hash value assigned
- Logged for investigator review
- Processing continues for other items
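
The error path above can be sketched as a try/catch around the read loop. This is a minimal illustration, not the code in HashTask.java: HashErrorSketch and hashItem are hypothetical, item properties are modeled as a plain map, and the property names hash and ioError follow the wording of this page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical illustration class; not IPED's HashTask.
final class HashErrorSketch {
    private static final Logger LOG = Logger.getLogger(HashErrorSketch.class.getName());

    /** Hashes the stream; on an I/O failure flags the item and stores an empty hash. */
    static Map<String, Object> hashItem(String name, InputStream in) {
        Map<String, Object> props = new HashMap<>();
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
            props.put("hash", toHex(md.digest()));
        } catch (IOException e) {
            // A partial digest would be misleading, so no hash value is kept.
            props.put("ioError", Boolean.TRUE);
            props.put("hash", "");
            LOG.warning("I/O error hashing " + name + ": " + e.getMessage());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        return props; // the caller moves on to the next item either way
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hashItem("empty.bin", new ByteArrayInputStream(new byte[0])));
    }
}
```

Returning the property map instead of throwing lets a processing loop continue with the remaining items while the flagged one stays visible for review.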