
Overview

Imghash provides 19 different perceptual hashing algorithms, each optimized for different scenarios. This guide helps you select the right algorithm based on your requirements.

Quick Recommendation

If you’re unsure which algorithm to choose, start with PDQ. It provides excellent robustness to common transformations while remaining fast enough for most applications.
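pdq, err := imghash.NewPDQ()
if err != nil {
    panic(err)
}

// Hash an image file from disk with the PDQ hasher.
hash, err := imghash.HashFile(pdq, "image.jpg")
if err != nil {
    panic(err)
}
fmt.Println(hash)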

Decision Tree

Step 1: Consider Your Primary Use Case

Duplicate Detection & Content Moderation
  • Use PDQ for general-purpose deduplication
  • Use Average or Difference for simple, fast duplicate detection
  • Use RASH for rotation-invariant duplicate detection
Visual Similarity Search
  • Use GIST for scene-level similarity
  • Use ColorMoment when color is important
  • Use BoVW for feature-based similarity
Copyright & Forensics
  • Use PDQ (developed by Meta for content matching)
  • Use PHash for academic/research applications
Texture & Pattern Matching
  • Use LBP for texture analysis
  • Use BlockMean for block-based patterns

Step 2: Evaluate Performance Requirements

Speed Priority (Simple & Fast)
  • Average: Fastest, good for basic duplicate detection
  • Difference: Very fast, detects horizontal gradient changes
  • Median: Fast, more robust than Average
Balance (Robust & Reasonably Fast)
  • PDQ: Best balance of speed and robustness
  • WHash: Wavelet-based, good middle ground
  • BlockMean: Block-based approach, efficient
Quality Priority (More Computation)
  • GIST: Rich scene descriptor
  • BoVW: Feature-based, highly configurable
  • Zernike: Moment-based, rotation invariant

Step 3: Consider Image Characteristics

Color is Important
  • ColorMoment: Specifically designed for color images
  • CLD (Color Layout Descriptor): MPEG-7 standard
Grayscale/Structure Only
  • Average, Difference, Median: Simple structural hashes
  • PHash: DCT-based structural hash
  • PDQ: Advanced structural hash
Rotation Invariance Needed
  • RASH: Rotation and Scale Hash
  • RadialVariance: Radial-based approach
  • Zernike: Moment-based rotation invariance
Edge/Texture Focus
  • MarrHildreth: Edge-based hashing
  • EHD (Edge Histogram Descriptor): MPEG-7 standard
  • LBP: Local Binary Patterns for texture
  • HOGHash: Histogram of Oriented Gradients
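
The three steps above can be condensed into a small lookup. The sketch below is only an illustration: `Requirements` and `Recommend` are hypothetical names that are not part of imghash, and the order in which the questions are checked is an assumption. It returns the first algorithm this guide lists for each case.

```go
package main

import "fmt"

// Requirements is a hypothetical summary of the questions in the decision tree.
type Requirements struct {
	NeedsRotationInvariance bool
	ColorMatters            bool
	TextureFocus            bool
	SpeedCritical           bool
}

// Recommend maps requirements to an algorithm name from this guide.
// The priority order of the cases is an assumption made for this sketch.
func Recommend(r Requirements) string {
	switch {
	case r.NeedsRotationInvariance:
		return "RASH"
	case r.ColorMatters:
		return "ColorMoment"
	case r.TextureFocus:
		return "LBP"
	case r.SpeedCritical:
		return "Average"
	default:
		// The guide's quick recommendation when nothing special is required.
		return "PDQ"
	}
}

func main() {
	fmt.Println(Recommend(Requirements{}))
	fmt.Println(Recommend(Requirements{ColorMatters: true}))
	fmt.Println(Recommend(Requirements{SpeedCritical: true}))
}
```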

Algorithm Comparison Table

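All binary hash algorithms use Hamming distance by default. Float64 algorithms typically use L2 (Euclidean) or Cosine distance, and UInt8 algorithms each have their own default metric; the tables below list the default for every algorithm.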

Binary Hash Algorithms

Binary hashes are compact, fast to compare, and work well for duplicate detection.
| Algorithm | Hash Type | Use Case | Speed | Robustness |
| --- | --- | --- | --- | --- |
| Average | Binary (64-bit) | Basic duplicate detection | ⚡️⚡️⚡️ Very Fast | ⭐️⭐️ Moderate |
| Difference | Binary (64-bit) | Horizontal gradient changes | ⚡️⚡️⚡️ Very Fast | ⭐️⭐️ Moderate |
| Median | Binary (64-bit) | Improved duplicate detection | ⚡️⚡️⚡️ Very Fast | ⭐️⭐️⭐️ Good |
| PHash | Binary (64-bit) | Academic/research standard | ⚡️⚡️ Fast | ⭐️⭐️⭐️⭐️ Excellent |
| WHash | Binary (64-bit) | Wavelet-based matching | ⚡️⚡️ Fast | ⭐️⭐️⭐️ Good |
| MarrHildreth | Binary (576-bit) | Edge-based detection | ⚡️ Moderate | ⭐️⭐️⭐️ Good |
| BlockMean | Binary (256-bit) | Block-based patterns | ⚡️⚡️ Fast | ⭐️⭐️⭐️ Good |
| PDQ | Binary (256-bit) | Production duplicate detection | ⚡️⚡️ Fast | ⭐️⭐️⭐️⭐️⭐️ Best |
| RASH | Binary | Rotation + scale invariant | ⚡️ Moderate | ⭐️⭐️⭐️⭐️ Excellent |
| BoVW (SimHash) | Binary | Feature-based vocabulary | ⚡️ Slow | ⭐️⭐️⭐️⭐️ Excellent |
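
Hamming distance, the default metric for the binary hashes, is the number of bit positions in which two hashes differ. The sketch below uses only the Go standard library and assumes hashes are stored as slices of 64-bit words; `hammingDistance` is an illustrative helper, not the library's `Compare` implementation.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hammingDistance counts the differing bits between two hashes of equal
// length, each stored as a slice of 64-bit words.
func hammingDistance(a, b []uint64) (int, error) {
	if len(a) != len(b) {
		return 0, fmt.Errorf("hash length mismatch: %d vs %d words", len(a), len(b))
	}
	dist := 0
	for i := range a {
		// XOR leaves a 1 wherever the bits differ; OnesCount64 counts them.
		dist += bits.OnesCount64(a[i] ^ b[i])
	}
	return dist, nil
}

func main() {
	h1 := []uint64{0xFF00FF00FF00FF00}
	h2 := []uint64{0xFF00FF00FF00FF0F}
	d, err := hammingDistance(h1, h2)
	if err != nil {
		panic(err)
	}
	fmt.Println(d) // prints 4: only the lowest four bits differ
}
```

Because the comparison is one XOR and one population count per word, a 64-bit hash costs a single word operation and a 256-bit hash costs four.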

Float64 Hash Algorithms

Float64 hashes provide richer representations, suitable for similarity search.
| Algorithm | Hash Type | Default Metric | Use Case |
| --- | --- | --- | --- |
| ColorMoment | Float64 | L2 (Euclidean) | Color-aware similarity |
| Zernike | Float64 | L2 (Euclidean) | Rotation-invariant matching |
| GIST | Float64 | Cosine | Scene-level similarity |
| BoVW (Histogram) | Float64 | Cosine | Feature vocabulary |
| BoVW (MinHash) | Float64 | Jaccard | Feature signatures |

UInt8 Hash Algorithms

UInt8 hashes are compact histograms, balancing size and expressiveness.
| Algorithm | Hash Type | Default Metric | Use Case |
| --- | --- | --- | --- |
| CLD | UInt8 | L2 (Euclidean) | Color layout (MPEG-7) |
| EHD | UInt8 | L1 (Manhattan) | Edge histogram (MPEG-7) |
| LBP | UInt8 | Chi-Square | Texture analysis |
| HOGHash | UInt8 | Cosine | Gradient-based matching |
| RadialVariance | UInt8 | L1 (Manhattan) | Radial patterns |
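
The L1 and Chi-Square metrics in the table compare histograms bin by bin. The sketch below shows one common formulation of each using only the standard library; the helper names are illustrative, and the exact Chi-Square variant used by the library (for example, whether it includes a factor of 1/2) is not stated here, so treat the formula as an assumption.

```go
package main

import "fmt"

// l1Distance sums the absolute per-bin differences (Manhattan distance).
// Both histograms are assumed to have the same number of bins.
func l1Distance(a, b []uint8) int {
	sum := 0
	for i := range a {
		d := int(a[i]) - int(b[i])
		if d < 0 {
			d = -d
		}
		sum += d
	}
	return sum
}

// chiSquareDistance computes sum((a-b)^2 / (a+b)) over all bins,
// skipping bins that are empty in both histograms to avoid dividing by zero.
func chiSquareDistance(a, b []uint8) float64 {
	var sum float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		if x+y == 0 {
			continue
		}
		sum += (x - y) * (x - y) / (x + y)
	}
	return sum
}

func main() {
	h1 := []uint8{1, 2, 3}
	h2 := []uint8{3, 2, 0}
	fmt.Println(l1Distance(h1, h2))        // prints 5
	fmt.Println(chiSquareDistance(h1, h2)) // prints 4
}
```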

Use Case Examples

Best Choices: PDQ, Average, Median, Difference
For production systems handling user uploads, JPEG compression, and minor edits:
pdq, _ := imghash.NewPDQ()
h1, _ := imghash.HashFile(pdq, "original.jpg")
h2, _ := imghash.HashFile(pdq, "compressed.jpg")

dist, _ := pdq.Compare(h1, h2)

// PDQ is robust to JPEG compression, light cropping, and minor edits
if dist < 10 {
    fmt.Println("Likely duplicate")
}
For simpler cases with less variation:
avg, _ := imghash.NewAverage()
h1, _ := imghash.HashFile(avg, "img1.png")
h2, _ := imghash.HashFile(avg, "img2.png")

dist, _ := avg.Compare(h1, h2)
if dist < 5 {
    fmt.Println("Duplicate detected")
}

Best Choices: GIST, ColorMoment, BoVW
For finding visually similar images (not exact duplicates):
gist, _ := imghash.NewGIST()
h1, _ := imghash.HashFile(gist, "beach1.jpg")
h2, _ := imghash.HashFile(gist, "beach2.jpg")

dist, _ := gist.Compare(h1, h2)

// GIST captures scene-level features
// Lower cosine distance = more similar scenes
if dist < 0.3 {
    fmt.Println("Similar scenes")
}
When color matters:
cm, _ := imghash.NewColorMoment()
h1, _ := imghash.HashFile(cm, "sunset1.jpg")
h2, _ := imghash.HashFile(cm, "sunset2.jpg")

dist, _ := cm.Compare(h1, h2)
if dist < 15.0 {
    fmt.Println("Similar color distribution")
}

Best Choices: RASH, Zernike, RadialVariance
For images that may be rotated:
rash, _ := imghash.NewRASH()
h1, _ := imghash.HashFile(rash, "original.jpg")
h2, _ := imghash.HashFile(rash, "rotated.jpg")

dist, _ := rash.Compare(h1, h2)

// RASH is designed to be rotation and scale invariant
if dist < 20 {
    fmt.Println("Same image despite rotation")
}

Best Choices: LBP, HOGHash, BlockMean
For texture analysis:
lbp, _ := imghash.NewLBP()
h1, _ := imghash.HashFile(lbp, "fabric1.jpg")
h2, _ := imghash.HashFile(lbp, "fabric2.jpg")

dist, _ := lbp.Compare(h1, h2)

// LBP excels at texture patterns
// Uses Chi-Square distance by default; lower values mean more similar textures
fmt.Println("Texture distance:", dist)

Best Choice: PDQ
Meta (formerly Facebook) developed PDQ specifically for large-scale content moderation:
pdq, _ := imghash.NewPDQ()

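// Hash known harmful content. knownBadImages is a slice of file paths.
blocklist := make(map[string]imghash.Binary)
for _, path := range knownBadImages {
    hash, err := imghash.HashFile(pdq, path)
    if err != nil {
        continue // skip files that fail to hash
    }
    // Use a checked type assertion so an unexpected hash type cannot panic.
    if bin, ok := hash.(imghash.Binary); ok {
        blocklist[path] = bin
    }
}

// Check a new upload against every blocklisted hash.
newHash, err := imghash.HashFile(pdq, "upload.jpg")
if err != nil {
    panic(err)
}

for path, knownHash := range blocklist {
    dist, err := pdq.Compare(newHash, knownHash)
    if err == nil && dist < 10 {
        // Flag for review
        fmt.Println("Upload matches blocklisted image:", path)
        break
    }
}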

Algorithm Characteristics

Simple Threshold-Based Algorithms

Average

Compares each pixel to the image mean. Fast and simple, good baseline.
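
To make the idea concrete, here is a minimal standard-library sketch of an average hash. It assumes the image has already been converted to grayscale and resized to 8x8, and it is an illustration of the technique rather than the library's implementation; `averageHash` and the bit ordering are choices made for this example.

```go
package main

import "fmt"

// averageHash sets bit i when pixel i is brighter than the mean of all
// 64 pixels. The input is an 8x8 grayscale image in row-major order.
func averageHash(pixels [64]uint8) uint64 {
	sum := 0
	for _, p := range pixels {
		sum += int(p)
	}
	var hash uint64
	for i, p := range pixels {
		// p > mean is equivalent to p*64 > sum, which avoids floating point.
		if int(p)*len(pixels) > sum {
			hash |= 1 << uint(i)
		}
	}
	return hash
}

func main() {
	var img [64]uint8
	for i := 0; i < 32; i++ {
		img[i] = 200 // bright top half, dark bottom half
	}
	fmt.Printf("%016x\n", averageHash(img)) // prints 00000000ffffffff
}
```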

Difference

Compares adjacent pixels horizontally. Detects gradient changes.
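
A minimal sketch of this comparison, assuming a grayscale image already resized to 9 columns by 8 rows so that each row yields 8 left/right comparisons. `differenceHash` is an illustrative helper, and setting a bit when the left pixel is brighter is one of two common conventions; the library's choice is not stated here.

```go
package main

import "fmt"

// differenceHash sets one bit per horizontal neighbor pair: the bit is 1
// when a pixel is brighter than the pixel to its right.
func differenceHash(pixels [8][9]uint8) uint64 {
	var hash uint64
	bit := uint(0)
	for _, row := range pixels {
		for x := 0; x < 8; x++ {
			if row[x] > row[x+1] {
				hash |= 1 << bit
			}
			bit++
		}
	}
	return hash
}

func main() {
	var img [8][9]uint8
	for y := range img {
		for x := range img[y] {
			img[y][x] = uint8((8 - x) * 10) // brightness falls from left to right
		}
	}
	fmt.Printf("%016x\n", differenceHash(img)) // prints ffffffffffffffff
}
```

Because only the relative order of neighbors matters, adding a constant brightness offset to every pixel leaves the hash unchanged as long as no value saturates.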

Median

Uses median instead of mean. More robust to outliers than Average.
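
The effect of an outlier is easy to see in a small sketch. The code below assumes an 8x8 grayscale input and is an illustration only, not the library's implementation. In the example image a single very bright pixel pulls the mean up to about 14.8, so a mean threshold would set only that pixel's bit, while the median (11) still separates the two main brightness levels.

```go
package main

import (
	"fmt"
	"sort"
)

// medianHash sets bit i when pixel i is brighter than the median of all
// 64 pixels. The input is an 8x8 grayscale image in row-major order.
func medianHash(pixels [64]uint8) uint64 {
	sorted := make([]int, len(pixels))
	for i, p := range pixels {
		sorted[i] = int(p)
	}
	sort.Ints(sorted)
	// With 64 values the median is the mean of the two middle ones;
	// comparing 2*p against their sum avoids floating point.
	twiceMedian := sorted[31] + sorted[32]
	var hash uint64
	for i, p := range pixels {
		if 2*int(p) > twiceMedian {
			hash |= 1 << uint(i)
		}
	}
	return hash
}

func main() {
	var img [64]uint8
	for i := range img {
		switch {
		case i < 32:
			img[i] = 10
		case i < 63:
			img[i] = 12
		default:
			img[i] = 255 // single bright outlier
		}
	}
	fmt.Printf("%016x\n", medianHash(img)) // prints ffffffff00000000
}
```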

Transform-Based Algorithms

PHash

Discrete Cosine Transform (DCT) based. Academic standard for perceptual hashing.

PDQ

Advanced DCT with Jarosz filtering. Industry standard for content moderation.

WHash

Haar wavelet transform. Good frequency domain representation.

Feature-Based Algorithms

GIST

Scene-level descriptor. Captures spatial structure and frequency information.

BoVW

Bag of Visual Words. Uses SIFT features and visual vocabulary.

HOGHash

Histogram of Oriented Gradients. Shape and appearance descriptor.
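
The core building block of HOG is a histogram of gradient orientations weighted by gradient magnitude. The sketch below computes one such histogram over a whole grayscale image with central differences and 9 unsigned orientation bins. It leaves out the cell and block normalization of full HOG, and the bin count and helper name are assumptions for illustration, not the library's HOGHash.

```go
package main

import (
	"fmt"
	"math"
)

const bins = 9 // unsigned orientations, 20 degrees per bin

// orientationHistogram accumulates gradient magnitude into orientation bins
// for every interior pixel of a rectangular grayscale image.
func orientationHistogram(img [][]float64) [bins]float64 {
	var hist [bins]float64
	for y := 1; y < len(img)-1; y++ {
		for x := 1; x < len(img[y])-1; x++ {
			gx := img[y][x+1] - img[y][x-1] // horizontal central difference
			gy := img[y+1][x] - img[y-1][x] // vertical central difference
			mag := math.Hypot(gx, gy)
			if mag == 0 {
				continue // flat region, no orientation
			}
			angle := math.Atan2(gy, gx) * 180 / math.Pi
			if angle < 0 {
				angle += 180 // fold into the unsigned range [0, 180]
			}
			bin := int(angle/(180.0/bins)) % bins
			hist[bin] += mag
		}
	}
	return hist
}

func main() {
	// A 5x5 image whose brightness increases from left to right.
	img := make([][]float64, 5)
	for y := range img {
		img[y] = []float64{0, 1, 2, 3, 4}
	}
	fmt.Println(orientationHistogram(img))
}
```

For the left-to-right ramp in `main`, every gradient points horizontally, so all of the magnitude lands in the first bin.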

Specialized Algorithms

RASH

Rotation and Scale Hash. Invariant to rotation and scaling.

ColorMoment

Color-aware using Hu moments in YCrCb and HSV spaces.

LBP

Local Binary Patterns. Excellent for texture classification.
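
As a sketch of the basic 3x3 operator: each interior pixel gets an 8-bit code with one bit per neighbor, set when that neighbor is at least as bright as the center, and the texture descriptor is the histogram of those codes. The neighbor ordering and helper names below are choices made for this example, and the library's LBP may use a different variant or neighborhood.

```go
package main

import "fmt"

// lbpCode returns the 8-bit Local Binary Pattern for the interior pixel (x, y).
// Neighbors are visited clockwise starting at the top-left corner.
func lbpCode(img [][]uint8, x, y int) uint8 {
	offsets := [8][2]int{ // {dx, dy}
		{-1, -1}, {0, -1}, {1, -1}, {1, 0},
		{1, 1}, {0, 1}, {-1, 1}, {-1, 0},
	}
	center := img[y][x]
	var code uint8
	for i, o := range offsets {
		if img[y+o[1]][x+o[0]] >= center {
			code |= 1 << uint(i)
		}
	}
	return code
}

// lbpHistogram counts how often each code occurs over all interior pixels
// of a rectangular grayscale image.
func lbpHistogram(img [][]uint8) [256]int {
	var hist [256]int
	for y := 1; y < len(img)-1; y++ {
		for x := 1; x < len(img[y])-1; x++ {
			hist[lbpCode(img, x, y)]++
		}
	}
	return hist
}

func main() {
	img := [][]uint8{
		{9, 9, 9},
		{0, 5, 0},
		{0, 0, 0},
	}
	fmt.Println(lbpCode(img, 1, 1)) // prints 7: only the three top neighbors are >= 5
}
```

Because each code depends only on comparisons with the center pixel, the histogram is unaffected by uniform brightness shifts, which is part of what makes LBP useful for texture.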

Performance Considerations

Hash Size vs. Accuracy Trade-off
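  • Smaller hashes (64-bit): Faster comparison and less storage, but fewer bits to tell images apart, so accuracy is generally lower
  • Larger hashes (256-bit+): Slower comparison and more storage, but generally higher accuracy
  • Float64 hashes: Most expressive, but they need floating-point distance metrics such as L2 or Cosine rather than Hamming distance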
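
One practical consequence of hash size is that a raw Hamming threshold means different things at different lengths. The sketch below expresses the thresholds used in this guide's examples (10 for 256-bit PDQ, 5 for 64-bit Average) as a fraction of the hash length; `normalizedDistance` is a hypothetical helper, not an imghash function.

```go
package main

import "fmt"

// normalizedDistance expresses a Hamming distance as a fraction of the
// total number of bits in the hash.
func normalizedDistance(dist, hashBits int) float64 {
	return float64(dist) / float64(hashBits)
}

func main() {
	// Prints "PDQ: 3.9% of bits may differ".
	fmt.Printf("PDQ: %.1f%% of bits may differ\n", 100*normalizedDistance(10, 256))
	// Prints "Average: 7.8% of bits may differ".
	fmt.Printf("Average: %.1f%% of bits may differ\n", 100*normalizedDistance(5, 64))
}
```

Comparing fractions rather than raw counts makes it easier to reason about how strict a threshold is when switching between algorithms with different hash sizes.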

Computation Speed (Relative)

Fastest:     Average, Difference, Median
Fast:        PHash, PDQ, WHash, BlockMean
Moderate:    MarrHildreth, RASH, ColorMoment, CLD, EHD, LBP, HOGHash
Slow:        GIST, BoVW, Zernike, RadialVariance

Memory Usage

Binary (8 bytes):       Average, Difference, Median, PHash, WHash
Binary (32+ bytes):     BlockMean, PDQ, MarrHildreth, RASH
UInt8 (80-256 bytes):   CLD, EHD, LBP, HOGHash, RadialVariance
Float64 (128+ bytes):   ColorMoment, Zernike, GIST, BoVW

Next Steps

Comparing Hashes

Learn how to compare hashes and interpret distance values

Practical Examples

See real-world examples and use cases
