Overview
GIST is a holistic scene descriptor that captures the overall spatial structure and dominant orientations of an image. It applies a bank of oriented Gabor filters at multiple scales and pools the responses over a coarse spatial grid to create a global image representation. This algorithm is ideal for:
- Scene recognition and classification
- Finding images with similar spatial layouts
- Matching images with similar overall structure
- Content-based image retrieval for scenes
GIST returns a Float64 hash type (not Binary). Use Cosine distance for comparison.
How It Works
- Resize: Image is resized to the specified dimensions (default: 64×64)
- Grayscale: Converts to grayscale
- Normalization: Normalizes pixel values (zero mean, unit variance)
- Gabor Filtering: Applies oriented Gabor filters at 3 scales with multiple orientations:
  - Scale 1 (wavelength 4): 8 orientations
  - Scale 2 (wavelength 8): 8 orientations
  - Scale 3 (wavelength 12): 4 orientations
- Spatial Pooling: Divides image into grid cells and averages filter responses
- L2 Normalization: Normalizes the final descriptor vector
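The two normalization steps in the list above can be sketched in plain Go. This is a minimal illustration under stated assumptions, not the library's implementation: `standardize` and `l2Normalize` are hypothetical helper names, and the handling of constant images and zero vectors is a choice made for this sketch.

```go
package main

import (
	"fmt"
	"math"
)

// standardize shifts pix to zero mean and scales it to unit variance.
// Assumption: for a constant image (zero variance) only the mean is removed.
func standardize(pix []float64) []float64 {
	out := make([]float64, len(pix))
	if len(pix) == 0 {
		return out
	}
	var mean float64
	for _, v := range pix {
		mean += v
	}
	mean /= float64(len(pix))

	var variance float64
	for _, v := range pix {
		d := v - mean
		variance += d * d
	}
	variance /= float64(len(pix))
	std := math.Sqrt(variance)

	for i, v := range pix {
		out[i] = v - mean
		if std > 0 {
			out[i] /= std
		}
	}
	return out
}

// l2Normalize scales vec to unit Euclidean length.
// Assumption: an all-zero vector is returned unchanged.
func l2Normalize(vec []float64) []float64 {
	out := make([]float64, len(vec))
	var sumSq float64
	for _, v := range vec {
		sumSq += v * v
	}
	norm := math.Sqrt(sumSq)
	for i, v := range vec {
		if norm > 0 {
			out[i] = v / norm
		} else {
			out[i] = v
		}
	}
	return out
}

func main() {
	fmt.Println(standardize([]float64{1, 2, 3, 4}))
	fmt.Println(l2Normalize([]float64{3, 4}))
}
```

Standardizing the pixels removes brightness and contrast differences before filtering, while the final L2 step puts every descriptor on the unit sphere.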
Constructor
Available Options
- WithSize(width, height uint) - Set resize dimensions (default: 64×64)
- WithInterpolation(interp Interpolation) - Set interpolation method (default: Bilinear)
- WithGridSize(x, y uint) - Set spatial grid dimensions (default: 4×4)
- WithDistance(fn DistanceFunc) - Override default Cosine distance function
Usage Example
Default Settings
- Image resize width: 64
- Image resize height: 64
- Resize interpolation method: Bilinear
- Number of grid cells in X direction (must be > 0): 4
- Number of grid cells in Y direction (must be > 0): 4
- Distance comparison function: Cosine
Hash Type
Returns hashtype.Float64, a normalized descriptor vector.
Hash size = gridX × gridY × totalOrientations
- With default 4×4 grid: 320 values (4 × 4 × 20 orientations)
- With 8×8 grid: 1280 values
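The size formula above can be checked with a few lines of Go. `descriptorLen` and `orientationsPerScale` are hypothetical names used only for this sketch; the per-scale orientation counts are the ones listed in this document.

```go
package main

import "fmt"

// Orientations per scale as listed in this document: 8 + 8 + 4 = 20.
var orientationsPerScale = []uint{8, 8, 4}

// descriptorLen returns gridX × gridY × totalOrientations.
func descriptorLen(gridX, gridY uint) uint {
	var total uint
	for _, n := range orientationsPerScale {
		total += n
	}
	return gridX * gridY * total
}

func main() {
	fmt.Println(descriptorLen(4, 4)) // 320
	fmt.Println(descriptorLen(8, 8)) // 1280
}
```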
Distance Metric
Default comparison uses Cosine distance. Lower values indicate more similar scene structure. You can override with:
- similarity.L2 - Euclidean distance
- similarity.L1 - Manhattan distance
- Custom distance function
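For reference, the three metrics named above can be written out in plain Go. This is a sketch of the standard definitions, not the library's code: the function names are hypothetical, cosine distance is assumed to mean 1 minus cosine similarity, and both vectors are assumed to have the same length.

```go
package main

import (
	"fmt"
	"math"
)

// cosineDistance returns 1 - cos(angle between a and b).
// Assumptions: len(a) == len(b); a zero vector yields distance 1.
func cosineDistance(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 1
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// l2Distance returns the Euclidean distance between a and b.
func l2Distance(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// l1Distance returns the Manhattan distance between a and b.
func l1Distance(a, b []float64) float64 {
	var sum float64
	for i := range a {
		sum += math.Abs(a[i] - b[i])
	}
	return sum
}

func main() {
	a := []float64{1, 0}
	b := []float64{0, 1}
	fmt.Println(cosineDistance(a, b), l2Distance(a, b), l1Distance(a, b))
}
```

Because the descriptor is already L2-normalized, cosine and Euclidean distance rank candidates in the same order; the choice mainly affects the numeric scale of thresholds.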
Grid Size Parameter
The grid size controls spatial granularity:
- Smaller grids (4×4): Captures global scene structure, smaller hash
- Larger grids (8×8, 16×16): More spatial detail, larger hash, more discriminative
- Trade-off between descriptor size and spatial precision
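To make the effect of the grid size concrete, here is a minimal sketch of spatial pooling over a single filter response map. `poolGrid` is a hypothetical helper, not the library's function; the row-major cell order and the integer cell boundaries are assumptions of this sketch.

```go
package main

import "fmt"

// poolGrid averages resp (row-major, width × height) over a gridX × gridY grid
// and returns one mean per cell, cells ordered row by row.
// Assumption: cell boundaries use integer division, so cells may differ in size
// by one pixel when the image size is not a multiple of the grid size.
func poolGrid(resp []float64, width, height, gridX, gridY int) []float64 {
	out := make([]float64, 0, gridX*gridY)
	for gy := 0; gy < gridY; gy++ {
		y0, y1 := gy*height/gridY, (gy+1)*height/gridY
		for gx := 0; gx < gridX; gx++ {
			x0, x1 := gx*width/gridX, (gx+1)*width/gridX
			var sum float64
			for y := y0; y < y1; y++ {
				for x := x0; x < x1; x++ {
					sum += resp[y*width+x]
				}
			}
			if n := (y1 - y0) * (x1 - x0); n > 0 {
				sum /= float64(n)
			}
			out = append(out, sum)
		}
	}
	return out
}

func main() {
	// A 4×4 response map holding the values 0..15.
	resp := make([]float64, 16)
	for i := range resp {
		resp[i] = float64(i)
	}
	fmt.Println(poolGrid(resp, 4, 4, 2, 2))
}
```

Each filter contributes gridX × gridY values, so doubling the grid in both directions quadruples the descriptor length.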
Gabor Filter Bank
Fixed configuration (3 scales, 20 total orientations):
- Wavelength 4: 8 orientations (fine details)
- Wavelength 8: 8 orientations (medium scale)
- Wavelength 12: 4 orientations (coarse structure)
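The bank described above can be enumerated and a single kernel generated as follows. This is an illustrative sketch only: the names `gaborSpec`, `filterBank` and `gaborKernel` are hypothetical, the orientations are assumed to be evenly spaced over [0, π), and the Gaussian width `sigma` and the kernel `radius` are free parameters chosen for the example rather than values documented for the library.

```go
package main

import (
	"fmt"
	"math"
)

// gaborSpec describes one filter: a wavelength in pixels and an orientation in radians.
type gaborSpec struct {
	Wavelength float64
	Theta      float64
}

// filterBank lists the 20 scale/orientation pairs from this document.
// Assumption: orientations are evenly spaced over [0, π) within each scale.
func filterBank() []gaborSpec {
	scales := []struct {
		wavelength   float64
		orientations int
	}{{4, 8}, {8, 8}, {12, 4}}

	var bank []gaborSpec
	for _, s := range scales {
		for k := 0; k < s.orientations; k++ {
			theta := float64(k) * math.Pi / float64(s.orientations)
			bank = append(bank, gaborSpec{Wavelength: s.wavelength, Theta: theta})
		}
	}
	return bank
}

// gaborKernel builds the real (cosine) part of an isotropic Gabor kernel of
// size (2*radius+1)²: a Gaussian envelope times a cosine wave along Theta.
func gaborKernel(spec gaborSpec, sigma float64, radius int) [][]float64 {
	size := 2*radius + 1
	kernel := make([][]float64, size)
	sinT, cosT := math.Sincos(spec.Theta)
	for j := range kernel {
		kernel[j] = make([]float64, size)
		y := float64(j - radius)
		for i := range kernel[j] {
			x := float64(i - radius)
			xr := x*cosT + y*sinT  // coordinate along the wave direction
			yr := -x*sinT + y*cosT // coordinate across the wave direction
			envelope := math.Exp(-(xr*xr + yr*yr) / (2 * sigma * sigma))
			kernel[j][i] = envelope * math.Cos(2*math.Pi*xr/spec.Wavelength)
		}
	}
	return kernel
}

func main() {
	bank := filterBank()
	fmt.Println("filters:", len(bank))
	k := gaborKernel(bank[0], 2, 4)
	fmt.Println("kernel size:", len(k), "center:", k[4][4])
}
```

Using fewer orientations at the coarsest wavelength keeps the descriptor compact, since large-scale structure varies more slowly with angle.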