Overview
PDQ is a perceptual hash developed by Facebook (now Meta); the name refers to a Perceptual hash that uses a Discrete cosine transform and also outputs a Quality metric. It produces a 256-bit hash that is robust to JPEG compression, rescaling, and minor edits while remaining fast enough for large-scale deduplication. The algorithm applies a Discrete Cosine Transform (DCT) to a 64×64 grayscale image, extracts a 16×16 coefficient block, and thresholds each coefficient against the block's median to produce a binary hash.
When to Use
Use PDQ when you need:
- High robustness to JPEG compression and minor edits
- Large-scale deduplication of images (designed for production use)
- 256-bit hashes for better discrimination than 64-bit alternatives
- Industry-standard hash used by content moderation systems
PDQ is particularly effective for detecting duplicate photos, memes, and content variations at scale.
Constructor
Available Options
- WithInterpolation(interp Interpolation): Sets the resize interpolation method
- WithDistance(fn DistanceFunc): Overrides the default Hamming distance function
Supported Interpolation Methods
- NearestNeighbor
- Bilinear (default)
- Bicubic
- MitchellNetravali
- Lanczos2
- Lanczos3
- BilinearExact
Usage Example
With Custom Options
Default Settings
- Hash size: 256 bits (32 bytes)
- Resize dimensions: 64×64 pixels (fixed)
- Interpolation: Bilinear
- Distance metric: Hamming distance
- Processing: DCT-based with Jarosz filter
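As a rough illustration of the default distance metric, the sketch below computes the Hamming distance between two 256-bit hashes stored as 32-byte arrays. The function name `hammingDistance` and the `[32]byte` representation are assumptions made for this example, not the library's actual types or API.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hammingDistance counts the bit positions at which two 256-bit hashes differ.
// The name and the [32]byte representation are hypothetical, chosen for illustration.
func hammingDistance(a, b [32]byte) int {
	d := 0
	for i := range a {
		// XOR leaves a 1 in every position where the two bytes differ.
		d += bits.OnesCount8(a[i] ^ b[i])
	}
	return d
}

func main() {
	var a, b [32]byte
	b[0] = 0b00000111  // differs from a in 3 bits
	b[31] = 0b10000000 // differs from a in 1 more bit

	fmt.Println(hammingDistance(a, a)) // 0
	fmt.Println(hammingDistance(a, b)) // 4
}
```

The result ranges from 0 (identical hashes) to 256 (every bit differs).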
Technical Details
The PDQ algorithm:
- Resizes the image to 64×64 pixels
- Converts to grayscale
- Applies Jarosz box filter (window=2, reps=2)
- Computes DCT on the filtered image
- Extracts top-left 16×16 DCT coefficients
- Thresholds coefficients against their median
- Produces a 256-bit binary hash
Comparison
PDQ hashes are compared using Hamming distance by default. Lower distances indicate more similar images.
References
- PDQ and TMK+PDQF by Facebook
- Source: pdq.go:27-32