Overview
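imghash provides 9 distinct similarity metrics to measure how alike two image hashes are. Each metric is optimized for specific hash types and use cases. For most metrics, lower values indicate more similar images; PCC is the exception, since it returns a correlation where higher values mean a closer match. The metrics live in the similarity package, and most of them can be used with the WithDistance option.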
Hamming Distance
Counts the number of differing bits between two binary hashes. The classic metric for bit-based perceptual hashes.
Signature
Requirements
- Binary hashes only
Algorithm
Example
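A minimal standalone sketch of the bit-counting idea described above, assuming hashes are plain byte slices. `hammingDistance` and `errLengthMismatch` are hypothetical names for illustration, not the imghash API.

```go
package main

import (
	"errors"
	"math/bits"
)

// errLengthMismatch is a hypothetical error for this sketch.
var errLengthMismatch = errors.New("hash lengths differ")

// hammingDistance counts the bits that differ between two equal-length
// byte slices. Illustrative helper only; not the library's function.
func hammingDistance(a, b []byte) (int, error) {
	if len(a) != len(b) {
		return 0, errLengthMismatch
	}
	dist := 0
	for i := range a {
		// XOR leaves a 1 wherever the two bytes disagree;
		// OnesCount8 counts those set bits.
		dist += bits.OnesCount8(a[i] ^ b[i])
	}
	return dist, nil
}
```

Calling `hammingDistance([]byte{0xFF, 0x0F}, []byte{0x0F, 0x0F})` compares two 16-bit hashes that differ only in the upper four bits of the first byte.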
Use Cases
- Average, Difference, Median hashes - Fast near-duplicate detection
- PHash - Robust image similarity
- Large-scale search - Optimized for speed with bit operations
Weighted Hamming
Like Hamming distance but applies per-byte weights to emphasize important hash regions.
Signature
Requirements
- Binary hashes only
- Weights slice length must match hash byte length
Algorithm
Example
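A rough sketch of per-byte weighting, assuming byte-slice hashes and `float64` weights (the weight type is an assumption). `weightedHamming` is a hypothetical helper, not the library's function.

```go
package main

import (
	"errors"
	"math/bits"
)

// weightedHamming multiplies the number of differing bits in each byte
// by that byte's weight and sums the results.
// Hypothetical helper; the weight type (float64) is an assumption.
func weightedHamming(a, b []byte, weights []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, errors.New("hash lengths differ")
	}
	if len(weights) != len(a) {
		return 0, errors.New("weights length must match hash byte length")
	}
	var dist float64
	for i := range a {
		diffBits := bits.OnesCount8(a[i] ^ b[i])
		dist += weights[i] * float64(diffBits)
	}
	return dist, nil
}
```

With weights `{2, 0.5}`, a mismatch in the first byte costs four times as much per bit as a mismatch in the second.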
Use Cases
- PDQ hashes - Weight perceptually important DCT coefficients
- Custom applications - When certain hash regions are more discriminative
- Object detection - Weight center pixels higher than edges
WeightedHamming cannot be set as the default distance for an algorithm via WithDistance. You must call it directly.
L1 Distance (Manhattan)
Sum of absolute differences between corresponding elements. Works with all hash types.
Signature
Algorithm
Example
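A minimal sketch of the sum-of-absolute-differences computation, assuming hashes are represented as `[]float64`. `l1Distance` is a hypothetical name, not the library's function.

```go
package main

import (
	"errors"
	"math"
)

// l1Distance sums |a[i] - b[i]| over all positions.
// Hypothetical helper for illustration only.
func l1Distance(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, errors.New("hash lengths differ")
	}
	var dist float64
	for i := range a {
		dist += math.Abs(a[i] - b[i])
	}
	return dist, nil
}
```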
Characteristics
Hash Types
Binary, UInt8, Float64 - works with all types
Complexity
O(n) where n is hash length - very fast
Sensitivity
Linear - proportional to element differences
Range
[0, ∞) - unbounded maximum
Use Cases
- ColorMoment - Color distribution comparison
- EHD - Edge histogram matching
- When outliers should not dominate - Unlike L2, differences are not squared, so a single large difference has less influence
L2 Distance (Euclidean)
Square root of sum of squared differences. The straight-line distance in n-dimensional space.
Signature
Algorithm
Example
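A minimal sketch of the Euclidean computation, assuming `[]float64` hashes. `l2Distance` is a hypothetical name, not the library's function.

```go
package main

import (
	"errors"
	"math"
)

// l2Distance returns sqrt(sum((a[i] - b[i])^2)).
// Hypothetical helper for illustration only.
func l2Distance(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, errors.New("hash lengths differ")
	}
	var sumSq float64
	for i := range a {
		diff := a[i] - b[i]
		sumSq += diff * diff // squaring amplifies large differences
	}
	return math.Sqrt(sumSq), nil
}
```

For the vectors `{0, 0}` and `{3, 4}` this is the familiar 3-4-5 triangle, where the L1 distance would be 7.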
Characteristics
- Penalizes large differences - Squaring amplifies outliers
- Geometric interpretation - True Euclidean distance in feature space
- Smooth gradients - Better for optimization tasks
Use Cases
- GIST - Global scene descriptor comparison
- Zernike - Shape matching with moment invariants
- Feature vectors - When semantic distance matters
Cosine Distance
Measures the angle between two vectors, independent of magnitude. Returns 1 - cosine similarity.
Signature
Algorithm
Example
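A minimal sketch of 1 - cosine similarity, assuming `[]float64` hashes. `cosineDistance` is a hypothetical name. Returning 0 for two zero vectors follows the behavior this page describes; returning 1 when only one vector is zero is an assumption of this sketch.

```go
package main

import (
	"errors"
	"math"
)

// cosineDistance returns 1 - (a . b) / (|a| * |b|).
// Hypothetical helper for illustration only.
func cosineDistance(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, errors.New("hash lengths differ")
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 && normB == 0 {
		return 0, nil // both zero vectors: treated as identical
	}
	if normA == 0 || normB == 0 {
		return 1, nil // assumption: one zero vector is treated as orthogonal
	}
	return 1 - dot/(math.Sqrt(normA)*math.Sqrt(normB)), nil
}
```

Scaling either input by a positive constant leaves the result unchanged, which is the magnitude independence noted under Characteristics.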
Characteristics
- Range: [0, 2] where 0 = identical direction, 2 = opposite direction
- Magnitude independent - Only measures angle, not scale
- Normalized - Good for vectors with different scales
Use Cases
- High-dimensional features - When magnitude is less important than direction
- Normalized histograms - Comparing distributions
- Text-like features - Similar to TF-IDF comparison
Cosine returns 0 when both hashes are zero vectors, avoiding division by zero.
Chi-Square Distance
Statistical measure comparing probability distributions. Common for histogram comparison.
Signature
Algorithm
Example
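A sketch of one common form of the chi-square distance, sum((a - b)² / (a + b)), assuming `[]float64` histograms. Some definitions add a factor of 1/2, and the variant the library uses is not confirmed here. `chiSquareDistance` is a hypothetical name.

```go
package main

import "errors"

// chiSquareDistance computes sum((a[i]-b[i])^2 / (a[i]+b[i])),
// skipping bins where both values are zero.
// Hypothetical helper; the exact formula used by the library is an assumption.
func chiSquareDistance(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, errors.New("hash lengths differ")
	}
	var dist float64
	for i := range a {
		if a[i] < 0 || b[i] < 0 {
			return 0, errors.New("values must be non-negative")
		}
		sum := a[i] + b[i]
		if sum == 0 {
			continue // empty bin in both histograms contributes nothing
		}
		diff := a[i] - b[i]
		dist += diff * diff / sum
	}
	return dist, nil
}
```

Dividing by the bin total is why the same absolute difference weighs more in sparsely populated bins.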
Characteristics
- Histogram optimized - Designed for comparing distributions
- Non-negative - Values must be >= 0
- Bin-weighted penalty - The same difference counts for more in bins with small values
Use Cases
- CLD - Color layout histograms
- LBP - Texture pattern distributions
- Any histogram-based feature - Standard choice for distribution comparison
PCC (Peak Cross-Correlation)
Finds maximum correlation across circular rotations. Useful for rotation-invariant matching.
Signature
Requirements
- Hashes must have the same length
Algorithm
Example
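A sketch of the search over circular shifts, assuming `[]float64` hashes. This version scores each shift as a Pearson correlation, so its values fall in [-1, 1]; the library's normalization may differ. `peakCrossCorrelation` is a hypothetical name, and returning 0 for constant inputs is an assumption.

```go
package main

import (
	"errors"
	"math"
)

// peakCrossCorrelation returns the highest Pearson correlation between a
// and every circular rotation of b. Hypothetical helper for illustration.
func peakCrossCorrelation(a, b []float64) (float64, error) {
	n := len(a)
	if n == 0 || n != len(b) {
		return 0, errors.New("hashes must be non-empty and the same length")
	}
	var meanA, meanB float64
	for i := 0; i < n; i++ {
		meanA += a[i]
		meanB += b[i]
	}
	meanA /= float64(n)
	meanB /= float64(n)

	// Mean and variance do not change under rotation, so compute them once.
	var varA, varB float64
	for i := 0; i < n; i++ {
		varA += (a[i] - meanA) * (a[i] - meanA)
		varB += (b[i] - meanB) * (b[i] - meanB)
	}
	denom := math.Sqrt(varA * varB)
	if denom == 0 {
		return 0, nil // assumption: a constant vector has no defined correlation
	}

	best := math.Inf(-1)
	for shift := 0; shift < n; shift++ { // O(n) shifts, O(n) work each
		var sum float64
		for i := 0; i < n; i++ {
			sum += (a[i] - meanA) * (b[(i+shift)%n] - meanB)
		}
		if r := sum / denom; r > best {
			best = r
		}
	}
	return best, nil
}
```

The nested loops are the source of the O(n²) cost listed under Characteristics.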
Characteristics
- Rotation invariant - Tests all circular shifts
- Computationally expensive - O(n²) where n is hash length
- Returns correlation - Higher values mean more similar
- Range: (-∞, 1] where 1 = perfect match
Use Cases
- RadialVariance - Radial projection features
- Rotated images - When rotation is expected
- Circular features - Angular histograms, polar transforms
Jaccard Distance
Set-based similarity with three modes depending on hash type.
Signature
Binary Mode (Bitset)
Compares set bits using intersection over union.
UInt8/Float64 Mode (MinHash)
Treats hashes as MinHash signatures, counting matching positions.
Example
- Binary (Bitset)
- UInt8 (MinHash)
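A sketch of both modes described above, assuming byte-slice hashes. `jaccardBitset` and `jaccardMinHash` are hypothetical names, not the library's API. Returning 0 when neither bitset has any set bits is an assumption of this sketch.

```go
package main

import (
	"errors"
	"math/bits"
)

// jaccardBitset returns 1 - |A and B| / |A or B| over the set bits.
// Hypothetical helper for illustration only.
func jaccardBitset(a, b []byte) (float64, error) {
	if len(a) != len(b) {
		return 0, errors.New("hash lengths differ")
	}
	inter, union := 0, 0
	for i := range a {
		inter += bits.OnesCount8(a[i] & b[i])
		union += bits.OnesCount8(a[i] | b[i])
	}
	if union == 0 {
		return 0, nil // assumption: two empty sets count as identical
	}
	return 1 - float64(inter)/float64(union), nil
}

// jaccardMinHash treats each slice as a MinHash signature and returns
// 1 - (matching positions / signature length).
func jaccardMinHash(a, b []uint8) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, errors.New("signatures must be non-empty and the same length")
	}
	matches := 0
	for i := range a {
		if a[i] == b[i] {
			matches++
		}
	}
	return 1 - float64(matches)/float64(len(a)), nil
}
```

The fraction of matching MinHash positions estimates the Jaccard index of the underlying sets, which is why the second mode approximates the first.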
Characteristics
Binary Mode
Bitset intersection/union - standard Jaccard index
MinHash Mode
Signature matching - approximates set similarity
Range
[0, 1] where 0 = identical, 1 = completely different
Symmetric
Jaccard(A,B) = Jaccard(B,A)
Use Cases
- Set-based features - When images are represented as feature sets
- BoVW with MinHash - Bag of Visual Words similarity
- Sparse binary features - When most bits are 0
Using Custom Metrics
Override an algorithm’s default distance function using WithDistance:
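The library's actual WithDistance signature is not reproduced here. As a rough illustration of how an option of this kind can swap a default metric, the following standalone sketch uses the functional-options pattern. Every name in it (`comparer`, `option`, `withDistance`, `newComparer`, `l1`, `l2`) is hypothetical and not part of imghash.

```go
package main

import "math"

// distanceFunc compares two feature vectors. Hypothetical type.
type distanceFunc func(a, b []float64) float64

// comparer stands in for an algorithm that carries a default metric.
type comparer struct {
	distance distanceFunc
}

// option mutates a comparer during construction.
type option func(*comparer)

// withDistance overrides the default metric (hypothetical analogue of WithDistance).
func withDistance(fn distanceFunc) option {
	return func(c *comparer) { c.distance = fn }
}

// newComparer applies options on top of an L2 default.
func newComparer(opts ...option) *comparer {
	c := &comparer{distance: l2}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func (c *comparer) compare(a, b []float64) float64 { return c.distance(a, b) }

// l1 and l2 assume equal-length inputs to keep the sketch short.
func l1(a, b []float64) float64 {
	var d float64
	for i := range a {
		d += math.Abs(a[i] - b[i])
	}
	return d
}

func l2(a, b []float64) float64 {
	var d float64
	for i := range a {
		d += (a[i] - b[i]) * (a[i] - b[i])
	}
	return math.Sqrt(d)
}
```

In this sketch `newComparer()` compares with L2, while `newComparer(withDistance(l1))` compares the same inputs with L1.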
Metric Selection Guide
Binary Hashes (Average, Difference, PHash)
- Primary: Hamming - Fast, standard choice
- Alternative: Jaccard - Better for sparse binary vectors
- Advanced: WeightedHamming - When spatial importance varies
Histogram Features (ColorMoment, CLD, EHD)
- Primary: ChiSquare - Statistical standard for histograms
- Alternative: L1 - Simpler, faster
- Alternative: L2 - Penalizes large differences more
High-Dimensional Vectors (GIST, Zernike, HOGHash)
- Primary: L2 - True geometric distance
- Alternative: Cosine - Magnitude-independent
- Alternative: L1 - Less sensitive to outliers
Rotation-Variant Features (RadialVariance)
- Primary: PCC - Handles circular rotations
- Note: Much slower but rotation-invariant
Distance Comparison
| Metric | Hash Types | Complexity | Range | Best For |
|---|---|---|---|---|
| Hamming | Binary | O(n) | [0, number of bits] | Bit-based hashes, speed |
| WeightedHamming | Binary | O(n) | [0, ∞) | Spatial importance |
| L1 | All | O(n) | [0, ∞) | Histograms, robustness |
| L2 | All | O(n) | [0, ∞) | Feature vectors, geometry |
| Cosine | All | O(n) | [0, 2] | Direction, normalized data |
| ChiSquare | All | O(n) | [0, ∞) | Probability distributions |
| PCC | All (same len) | O(n²) | (-∞, 1] | Rotation invariance |
| Jaccard | All | O(n) | [0, 1] | Sets, MinHash |
Error Handling
Performance Tips
Use Hamming for Binary
It’s highly optimized with bit operations - fastest option for Binary hashes.
Related
- Hash Types - Understand Binary, UInt8, and Float64 representations
- Choosing an Algorithm - Choose the right algorithm and metric combination
- Similarity API - Complete similarity package documentation