Overview
Imghash provides 19 perceptual hashing algorithms, each optimized for different scenarios. This guide helps you select the right algorithm based on your requirements.
Quick Recommendation
If you’re unsure which algorithm to choose, start with PDQ. It provides excellent robustness to common transformations while remaining fast enough for most applications.
Decision Tree
Consider Your Primary Use Case
Duplicate Detection & Content Moderation
- Use PDQ for general-purpose deduplication
- Use Average or Difference for simple, fast duplicate detection
- Use RASH for rotation-invariant duplicate detection
Similarity Search
- Use GIST for scene-level similarity
- Use ColorMoment when color is important
- Use BoVW for feature-based similarity
Content Matching & Research
- Use PDQ (developed by Meta for content matching)
- Use PHash for academic/research applications
Texture & Patterns
- Use LBP for texture analysis
- Use BlockMean for block-based patterns
Evaluate Performance Requirements
Speed Priority (Simple & Fast)
- Average: Fastest, good for basic duplicate detection
- Difference: Very fast, detects horizontal changes
- Median: Fast, more robust than Average
Balanced (Speed & Robustness)
- PDQ: Best balance of speed and robustness
- WHash: Wavelet-based, good middle ground
- BlockMean: Block-based approach, efficient
Descriptor Priority (Rich & Configurable)
- GIST: Rich scene descriptor
- BoVW: Feature-based, highly configurable
- Zernike: Moment-based, rotation invariant
Consider Image Characteristics
Color is Important
- ColorMoment: Specifically designed for color images
- CLD (Color Layout Descriptor): MPEG-7 standard
Structure is Important
- Average, Difference, Median: Simple structural hashes
- PHash: DCT-based structural hash
- PDQ: Advanced structural hash
Rotation May Occur
- RASH: Rotation and Scale Hash
- RadialVariance: Radial-based approach
- Zernike: Moment-based rotation invariance
Edges and Texture are Important
- MarrHildreth: Edge-based hashing
- EHD (Edge Histogram Descriptor): MPEG-7 standard
- LBP: Local Binary Patterns for texture
- HOGHash: Histogram of Oriented Gradients
Algorithm Comparison Table
All binary hash algorithms use Hamming distance by default. Float64 algorithms typically use L2 (Euclidean) or Cosine distance.
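To make the default metric concrete, here is a minimal Python sketch of Hamming distance between two equal-length binary hashes. It is independent of Imghash's own API; the function name `hamming_distance` and the byte-string representation of a hash are assumptions made for illustration only.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length binary hashes differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 wherever the two bits differ; count those 1s byte by byte.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


h1 = bytes([0b10110000, 0b00001111])
h2 = bytes([0b10010000, 0b00001011])
print(hamming_distance(h1, h2))  # 2
```

A distance of 0 means the hashes are identical; the maximum possible distance equals the hash length in bits.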
Binary Hash Algorithms
Binary hashes are compact, fast to compare, and work well for duplicate detection.

| Algorithm | Hash Type | Use Case | Speed | Robustness |
|---|---|---|---|---|
| Average | Binary (64-bit) | Basic duplicate detection | ⚡️⚡️⚡️ Very Fast | ⭐️⭐️ Moderate |
| Difference | Binary (64-bit) | Horizontal gradient changes | ⚡️⚡️⚡️ Very Fast | ⭐️⭐️ Moderate |
| Median | Binary (64-bit) | Improved duplicate detection | ⚡️⚡️⚡️ Very Fast | ⭐️⭐️⭐️ Good |
| PHash | Binary (64-bit) | Academic/research standard | ⚡️⚡️ Fast | ⭐️⭐️⭐️⭐️ Excellent |
| WHash | Binary (64-bit) | Wavelet-based matching | ⚡️⚡️ Fast | ⭐️⭐️⭐️ Good |
| MarrHildreth | Binary (576-bit) | Edge-based detection | ⚡️ Moderate | ⭐️⭐️⭐️ Good |
| BlockMean | Binary (256-bit) | Block-based patterns | ⚡️⚡️ Fast | ⭐️⭐️⭐️ Good |
| PDQ | Binary (256-bit) | Production duplicate detection | ⚡️⚡️ Fast | ⭐️⭐️⭐️⭐️⭐️ Best |
| RASH | Binary | Rotation + scale invariant | ⚡️ Moderate | ⭐️⭐️⭐️⭐️ Excellent |
| BoVW (SimHash) | Binary | Feature-based vocabulary | ⚡️ Slow | ⭐️⭐️⭐️⭐️ Excellent |
Float64 Hash Algorithms
Float64 hashes provide richer representations, suitable for similarity search.

| Algorithm | Hash Type | Default Metric | Use Case |
|---|---|---|---|
| ColorMoment | Float64 | L2 (Euclidean) | Color-aware similarity |
| Zernike | Float64 | L2 (Euclidean) | Rotation-invariant matching |
| GIST | Float64 | Cosine | Scene-level similarity |
| BoVW (Histogram) | Float64 | Cosine | Feature vocabulary |
| BoVW (MinHash) | Float64 | Jaccard | Feature signatures |
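
The two most common metrics in the table above, L2 and Cosine, can be sketched in a few lines of Python. This is an illustration of the standard definitions, not Imghash's implementation; the function names and the plain-list representation of a Float64 hash are assumptions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus the cosine of the angle between the two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


u = [1.0, 2.0, 2.0]
v = [2.0, 4.0, 4.0]  # same direction as u, twice the magnitude
print(l2_distance(u, v))      # 3.0
print(cosine_distance(u, v))  # 0.0
```

The example shows the practical difference: L2 is sensitive to magnitude, while Cosine compares direction only, so a uniformly scaled descriptor stays at distance 0.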
UInt8 Hash Algorithms
UInt8 hashes are compact histograms, balancing size and expressiveness.

| Algorithm | Hash Type | Default Metric | Use Case |
|---|---|---|---|
| CLD | UInt8 | L2 (Euclidean) | Color layout (MPEG-7) |
| EHD | UInt8 | L1 (Manhattan) | Edge histogram (MPEG-7) |
| LBP | UInt8 | Chi-Square | Texture analysis |
| HOGHash | UInt8 | Cosine | Gradient-based matching |
| RadialVariance | UInt8 | L1 (Manhattan) | Radial patterns |
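
For histogram-style hashes, the L1 and Chi-Square metrics listed above can be sketched as follows. Chi-Square has several variants in the literature; the symmetric form used here is one common choice and is an assumption, not necessarily the form Imghash uses. Function names are hypothetical.

```python
def l1_distance(a, b):
    """Manhattan distance: sum of absolute per-bin differences."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def chi_square_distance(a, b):
    """Symmetric chi-square: sum of (x - y)^2 / (x + y), skipping empty bins."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same length")
    total = 0.0
    for x, y in zip(a, b):
        if x + y > 0:  # a bin that is empty in both histograms contributes nothing
            total += (x - y) ** 2 / (x + y)
    return total


p = [10, 0, 5, 5]
q = [6, 4, 5, 5]
print(l1_distance(p, q))          # 8
print(chi_square_distance(p, q))  # 5.0
```

Both histograms differ by 4 in the first two bins, so L1 weights them equally, whereas Chi-Square penalizes the difference in the sparsely populated second bin more heavily.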
Use Case Examples
Detecting Near-Duplicate Images
Best Choices: PDQ, Average, Median, Difference
For production systems handling user uploads, JPEG compression, and minor edits, PDQ is the recommended choice. For simpler cases with less variation, Average, Median, or Difference is usually sufficient.
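Whichever binary algorithm is chosen, near-duplicate detection comes down to comparing hashes against a distance threshold. The Python sketch below shows that step on precomputed 64-bit hashes stored as integers. The hash values, file names, and the threshold of 5 bits are made up for illustration, and the functions are hypothetical helpers rather than Imghash calls.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hashes."""
    return bin(a ^ b).count("1")


def near_duplicates(hashes, max_distance):
    """Return name pairs whose hashes differ in at most max_distance bits."""
    return [
        (n1, n2)
        for (n1, h1), (n2, h2) in combinations(hashes.items(), 2)
        if hamming(h1, h2) <= max_distance
    ]


hashes = {
    "original.jpg": 0xF0F0F0F0F0F0F0F0,
    "recompressed.jpg": 0xF0F0F0F0F0F0F0F1,  # one bit flipped
    "unrelated.jpg": 0x0F0F0F0F0F0F0F0F,     # every bit differs from the original
}
print(near_duplicates(hashes, max_distance=5))
# [('original.jpg', 'recompressed.jpg')]
```

Comparing all pairs is quadratic in the number of images, so it only suits small collections; a suitable threshold depends on the algorithm and hash length and should be tuned on your own data.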
Finding Similar Images by Content
Best Choices: GIST, ColorMoment, BoVW
For finding visually similar images (not exact duplicates), use GIST for scene-level similarity or BoVW for feature-based similarity. When color matters, use ColorMoment.
Detecting Rotated Images
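Best Choices: RASH, Zernike, RadialVariance
For images that may be rotated, RASH (binary, rotation and scale invariant) and Zernike (Float64, rotation invariant) are the main options, with RadialVariance as a radial-based alternative.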
Texture and Pattern Matching
Best Choices: LBP, HOGHash, BlockMean
For texture analysis, use LBP. HOGHash suits gradient-based matching, and BlockMean suits block-based patterns.
Content Moderation at Scale
Best Choice: PDQ
Facebook/Meta developed PDQ specifically for large-scale content moderation.
Algorithm Characteristics
Simple Threshold-Based Algorithms
Average
Compares each pixel to the image mean. Fast and simple, good baseline.
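The idea can be sketched in Python as follows. The sketch assumes the image has already been converted to grayscale and downscaled to a small thumbnail (commonly 8x8 for a 64-bit hash); the function name, the row-major bit order, and the strict `>` comparison are illustrative assumptions, not a description of Imghash's implementation.

```python
def average_hash(pixels):
    """pixels: flat list of grayscale values from a small thumbnail.

    Each pixel contributes one bit: 1 if it is brighter than the mean, else 0.
    """
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


thumb = [10, 200, 30, 220]  # toy 2x2 thumbnail, mean = 115
print(format(average_hash(thumb), "04b"))  # 0101
```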
Difference
Compares adjacent pixels horizontally. Detects gradient changes.
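A minimal Python sketch of the horizontal comparison is shown below. It assumes a grayscale thumbnail given as a list of rows, where each row of n pixels yields n - 1 bits; whether a bit is set for an increasing or a decreasing gradient is a convention, and the choice here (left darker than right gives 1) is an assumption.

```python
def difference_hash(rows):
    """rows: list of rows of grayscale values from a small thumbnail.

    Each pair of horizontally adjacent pixels contributes one bit:
    1 if brightness increases from left to right, else 0.
    """
    bits = 0
    for row in rows:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left < right else 0)
    return bits


rows = [
    [10, 20, 15],  # up, then down  -> bits 1, 0
    [30, 30, 40],  # flat, then up  -> bits 0, 1
]
print(format(difference_hash(rows), "04b"))  # 1001
```

Because only the direction of change is recorded, adding a constant brightness offset to every pixel leaves the hash unchanged.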
Median
Uses median instead of mean. More robust to outliers than Average.
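The effect of an outlier can be seen in a small Python sketch that thresholds the same pixels against the mean and against the median. The pixel values and the helper name `threshold_hash` are made up for illustration.

```python
import statistics


def threshold_hash(pixels, threshold):
    """One bit per pixel: 1 if the pixel is above the threshold, else 0."""
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > threshold else 0)
    return bits


# Seven similar dark pixels and one very bright outlier.
pixels = [10, 12, 14, 16, 18, 20, 22, 255]

mean_bits = threshold_hash(pixels, sum(pixels) / len(pixels))    # mean = 45.875
median_bits = threshold_hash(pixels, statistics.median(pixels))  # median = 17.0

print(format(mean_bits, "08b"))    # 00000001
print(format(median_bits, "08b"))  # 00001111
```

The single bright pixel drags the mean above every other value, so the mean-based hash carries almost no information, while the median still splits the pixels evenly.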
Transform-Based Algorithms
PHash
Discrete Cosine Transform (DCT) based. Academic standard for perceptual hashing.
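The sketch below follows the commonly described pHash recipe: take the 2D DCT of a small grayscale image, keep the low-frequency top-left block, and set one bit per coefficient by comparing it to the block's median. It is written in plain Python for readability and is an assumption about the general technique, not Imghash's code; variants differ in image size, normalization, and whether the DC term is excluded.

```python
import math
import statistics


def dct_1d(values):
    """Unnormalized DCT-II of a sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(matrix):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in matrix]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def dct_hash(matrix, keep=8):
    """Hash the keep x keep low-frequency corner of the DCT against its median."""
    coeffs = dct_2d(matrix)
    low = [coeffs[r][c] for r in range(keep) for c in range(keep)]
    med = statistics.median(low)
    bits = 0
    for c in low:
        bits = (bits << 1) | (1 if c > med else 0)
    return bits


# Synthetic 32x32 grayscale image standing in for a resized photo.
image = [[(x * x + 3 * x * y + 7 * y) % 256 for x in range(32)] for y in range(32)]
print(format(dct_hash(image), "016x"))
```

Keeping only low frequencies is what makes this family of hashes tolerant of compression noise and small edits, which mostly affect high-frequency detail.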
PDQ
Advanced DCT with Jarosz filtering. Industry standard for content moderation.
WHash
Haar wavelet transform. Good frequency domain representation.
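One level of the Haar transform replaces each pair of neighbouring values with their average and their difference. The Python sketch below shows that step in one and two dimensions, using the unnormalized averaging form; how many levels Imghash applies and how it turns coefficients into bits is not covered here, and the function names are hypothetical.

```python
def haar_step(values):
    """One Haar level on an even-length list: pair averages, then pair details."""
    if len(values) % 2 != 0:
        raise ValueError("length must be even")
    pairs = list(zip(values[0::2], values[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages + details


def haar_2d(matrix):
    """One 2D Haar level: apply haar_step to every row, then every column."""
    rows = [haar_step(r) for r in matrix]
    cols = [haar_step(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


print(haar_step([4, 6, 10, 12]))   # [5.0, 11.0, -1.0, -1.0]
print(haar_2d([[1, 3], [5, 7]]))   # [[4.0, -1.0], [-2.0, 0.0]]
```

In the 2D result, the top-left value (4.0) is the average of all four inputs, which is the coarse, low-frequency part a wavelet hash would typically build on.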
Feature-Based Algorithms
GIST
Scene-level descriptor. Captures spatial structure and frequency information.
BoVW
Bag of Visual Words. Uses SIFT features and visual vocabulary.
HOGHash
Histogram of Oriented Gradients. Shape and appearance descriptor.
Specialized Algorithms
RASH
Rotation and Scale Hash. Invariant to rotation and scaling.
ColorMoment
Color-aware hash built from Hu moments computed in the YCrCb and HSV color spaces.
LBP
Local Binary Patterns. Excellent for texture classification.
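The basic 3x3 Local Binary Pattern can be sketched in Python as follows: each of the eight neighbours of a pixel contributes one bit depending on whether it is at least as bright as the center, and the resulting 8-bit codes are counted in a 256-bin histogram. The neighbour order and the `>=` comparison are conventions chosen for this illustration, and the function names are hypothetical.

```python
# Neighbour offsets, clockwise starting at the top-left corner.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit LBP code of the interior pixel at row r, column c."""
    center = img[r][c]
    code = 0
    for dr, dc in OFFSETS:
        code = (code << 1) | (1 if img[r + dr][c + dc] >= center else 0)
    return code


def lbp_histogram(img):
    """Count LBP codes over all interior pixels (border pixels are skipped)."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


patch = [
    [9, 1, 9],
    [1, 5, 1],
    [9, 1, 9],
]
print(lbp_code(patch, 1, 1))  # 170, i.e. 0b10101010: bright corners, dark edges
```

Because the codes depend only on brightness ordering within each neighbourhood, the histogram is largely unaffected by monotonic lighting changes, which is why LBP suits texture comparison.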
Performance Considerations
Computation Speed (Relative)
Memory Usage
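As a rough guide to storage cost, the bit lengths listed in the binary hash table can be converted into bytes per hash. The Python sketch below does this arithmetic for three of the listed sizes. It counts raw hash bits only and assumes no per-hash overhead, which any real index or database would add; the function name is hypothetical.

```python
def storage_bytes(bits_per_hash, count):
    """Raw storage for `count` hashes, rounding each hash up to whole bytes."""
    return (bits_per_hash + 7) // 8 * count


for name, bits in [("Average", 64), ("PDQ", 256), ("MarrHildreth", 576)]:
    print(name, storage_bytes(bits, 1_000_000))
# Average 8000000
# PDQ 32000000
# MarrHildreth 72000000
```

For one million images this works out to roughly 8 MB, 32 MB, and 72 MB of raw hash data respectively.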
Next Steps
Comparing Hashes
Learn how to compare hashes and interpret distance values
Practical Examples
See real-world examples and use cases