
Overview

The Minimum Distance Classifier (MDC) is a simple nearest-neighbor (1-NN) classifier: it assigns a new instance the label of its closest training sample, measured by Euclidean distance.

Namespace: Global (no namespace)

Note: This is a lightweight implementation for 2D feature vectors. For more general k-NN classification, consider using a dedicated k-NN library.

Data structures

instance

Represents a single data point with features and label.
struct instance {
    std::array<double, 2> features; // Feature vector (2D)
    int label;                       // Class label
};
Fields:

  • features (std::array<double, 2>): Feature vector containing exactly two dimensions
  • label (int): Integer class label associated with the instance
#include "MDC.h"

instance sample;
sample.features = {3.5, 2.1};
sample.label = 1;

Functions

calculate_distance

Compute the Euclidean distance between two instances.
double calculate_distance(const instance& instance_1, 
                         const instance& instance_2) noexcept;
Parameters:

  • instance_1 (const instance&): First instance
  • instance_2 (const instance&): Second instance

Returns:

  • double: Euclidean distance between the feature vectors of the two instances
The distance is calculated as:
d = sqrt((x1 - x2)² + (y1 - y2)²)
instance p1 = {{1.0, 2.0}, 0};
instance p2 = {{4.0, 6.0}, 1};

double dist = calculate_distance(p1, p2);
// dist = sqrt((1-4)² + (2-6)²) = 5.0
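The header does not show the function body. A minimal sketch consistent with the formula above, assuming C++17 (the instance struct is redeclared here only to make the snippet self-contained):

```cpp
#include <array>
#include <cmath>

struct instance {
    std::array<double, 2> features; // Feature vector (2D)
    int label;                       // Class label
};

double calculate_distance(const instance& instance_1,
                          const instance& instance_2) noexcept {
    // std::hypot computes sqrt(dx*dx + dy*dy) while avoiding
    // intermediate overflow/underflow for extreme coordinates.
    return std::hypot(instance_1.features[0] - instance_2.features[0],
                      instance_1.features[1] - instance_2.features[1]);
}
```

Using std::hypot rather than a hand-rolled sqrt keeps the computation robust for very large or very small feature values; a plain `std::sqrt(dx*dx + dy*dy)` would behave identically for typical inputs.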

classify

Classify a new instance using the minimum distance classifier.
int classify(const std::vector<instance>& training_data, 
            const instance& new_instance) noexcept;
Parameters:

  • training_data (const std::vector<instance>&): Vector of training instances with known labels
  • new_instance (const instance&): New instance to classify (label is ignored)

Returns:

  • int: Predicted class label (label of the nearest training instance)
This function finds the training instance with the smallest Euclidean distance to the new instance and returns its label.
#include "MDC.h"
#include <vector>
#include <iostream>

int main() {
    // Create training data
    std::vector<instance> training_data = {
        {{1.0, 1.0}, 0},
        {{1.5, 2.0}, 0},
        {{5.0, 5.0}, 1},
        {{6.0, 6.5}, 1},
        {{2.0, 8.0}, 2},
        {{2.5, 9.0}, 2}
    };
    
    // Classify new instance
    instance test = {{5.5, 5.5}, -1};  // Label is ignored
    int predicted_label = classify(training_data, test);
    
    std::cout << "Predicted class: " << predicted_label << std::endl;
    // Expected output: 1 (closest to class 1 samples)
    
    return 0;
}
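The documented behavior (find the training instance with the smallest Euclidean distance and return its label) amounts to a linear scan. A possible implementation sketch, assuming C++17; the actual body in MDC.h may differ, and the -1 fallback for an empty training set is an assumption, not documented behavior:

```cpp
#include <array>
#include <cmath>
#include <limits>
#include <vector>

struct instance {
    std::array<double, 2> features; // Feature vector (2D)
    int label;                       // Class label
};

double calculate_distance(const instance& a, const instance& b) noexcept {
    return std::hypot(a.features[0] - b.features[0],
                      a.features[1] - b.features[1]);
}

int classify(const std::vector<instance>& training_data,
             const instance& new_instance) noexcept {
    // Linear scan: track the smallest distance seen so far and its label.
    double best_dist = std::numeric_limits<double>::infinity();
    int best_label = -1; // assumed fallback for empty training data
    for (const auto& sample : training_data) {
        double d = calculate_distance(sample, new_instance);
        if (d < best_dist) {
            best_dist = d;
            best_label = sample.label;
        }
    }
    return best_label;
}
```

The scan is O(n) per query in the number of training instances, which is fine for small datasets but motivates the spatial-indexing structures mentioned under Limitations.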

Example usage

#include "MDC.h"
#include <vector>
#include <iostream>
#include <iomanip>

int main() {
    // Create a simple 3-class dataset
    std::vector<instance> training_data;
    
    // Class 0: bottom-left region
    training_data.push_back({{1.0, 1.0}, 0});
    training_data.push_back({{1.5, 1.5}, 0});
    training_data.push_back({{2.0, 1.0}, 0});
    
    // Class 1: top-right region
    training_data.push_back({{8.0, 8.0}, 1});
    training_data.push_back({{8.5, 9.0}, 1});
    training_data.push_back({{9.0, 8.5}, 1});
    
    // Class 2: top-left region
    training_data.push_back({{1.0, 9.0}, 2});
    training_data.push_back({{2.0, 8.5}, 2});
    training_data.push_back({{1.5, 8.0}, 2});
    
    // Test samples
    std::vector<instance> test_samples = {
        {{1.2, 1.2}, -1},  // Should be class 0
        {{8.2, 8.7}, -1},  // Should be class 1
        {{1.8, 8.8}, -1},  // Should be class 2
        {{5.0, 5.0}, -1}   // Ambiguous
    };
    
    std::cout << "Classification results:\n";
    std::cout << std::fixed << std::setprecision(1);
    
    for (size_t i = 0; i < test_samples.size(); ++i) {
        const auto& test = test_samples[i];
        int prediction = classify(training_data, test);
        
        std::cout << "Sample (" << test.features[0] << ", " 
                  << test.features[1] << ") -> Class " 
                  << prediction << std::endl;
    }
    
    // Calculate distances manually
    instance query = {{5.0, 5.0}, -1};
    std::cout << "\nDistances from (5.0, 5.0):\n";
    
    for (const auto& train : training_data) {
        double dist = calculate_distance(query, train);
        std::cout << "  To (" << train.features[0] << ", " 
                  << train.features[1] << ") [class " 
                  << train.label << "]: " << dist << std::endl;
    }
    
    return 0;
}

Limitations

  • Fixed to 2D feature vectors only
  • No support for weighted voting or k-nearest neighbors
  • No distance metric customization (Euclidean only)
  • For production use, consider more robust k-NN implementations with:
    • Arbitrary feature dimensions
    • Multiple neighbor voting (k > 1)
    • Distance weighting
    • Efficient spatial indexing (k-d trees, ball trees)
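The first limitation is the easiest to lift. A hypothetical generalization to N dimensions (not part of MDC.h; names instance_nd and distance_nd are illustrative only) could template the struct on the feature count:

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical N-dimensional variants of the 2D types above.
template <std::size_t N>
struct instance_nd {
    std::array<double, N> features; // Feature vector (N-dimensional)
    int label;                       // Class label
};

template <std::size_t N>
double distance_nd(const instance_nd<N>& a,
                   const instance_nd<N>& b) noexcept {
    // Sum of squared per-dimension differences, then square root.
    double sum = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        const double diff = a.features[i] - b.features[i];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}
```

The k > 1 voting, distance weighting, and spatial indexing listed above are more involved and are exactly what dedicated k-NN libraries provide.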
