Overview
The Minimum Distance Classifier (MDC) is a simple nearest-neighbor classifier that assigns a new instance the label of its closest training sample (equivalently, a 1-nearest-neighbor rule) based on Euclidean distance.
Namespace: Global (no namespace)
Note: This is a lightweight implementation for 2D feature vectors. For more general k-NN classification, consider using a dedicated k-NN library.
Data structures
instance
Represents a single data point with features and label.
struct instance {
    std::array<double, 2> features; // Feature vector (2D)
    int label;                      // Class label
};
features
Feature vector containing exactly two dimensions
label
Integer class label associated with the instance
#include "MDC.h"
instance sample;
sample.features = {3.5, 2.1};
sample.label = 1;
Functions
calculate_distance
Compute the Euclidean distance between two instances.
double calculate_distance(const instance& instance_1,
                          const instance& instance_2) noexcept;
Euclidean distance between the feature vectors of the two instances
The distance is calculated as:
d = sqrt((x1 - x2)² + (y1 - y2)²)
instance p1 = {{1.0, 2.0}, 0};
instance p2 = {{4.0, 6.0}, 1};
double dist = calculate_distance(p1, p2);
// dist = sqrt((1-4)² + (2-6)²) = 5.0
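A minimal implementation consistent with this signature and formula might look like the following sketch (the `instance` struct is repeated here so the snippet is self-contained; the actual MDC.h implementation may differ):

```cpp
#include <array>
#include <cmath>

struct instance {
    std::array<double, 2> features; // Feature vector (2D)
    int label;                      // Class label
};

// Euclidean distance over the two feature dimensions.
double calculate_distance(const instance& instance_1,
                          const instance& instance_2) noexcept {
    const double dx = instance_1.features[0] - instance_2.features[0];
    const double dy = instance_1.features[1] - instance_2.features[1];
    return std::sqrt(dx * dx + dy * dy);
}
```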
classify
Classify a new instance using the minimum distance classifier.
int classify(const std::vector<instance>& training_data,
             const instance& new_instance) noexcept;
training_data
const std::vector<instance>&
Vector of training instances with known labels
new_instance
const instance&
New instance to classify (label is ignored)
Predicted class label (label of the nearest training instance)
This function finds the training instance with the smallest Euclidean distance to the new instance and returns its label.
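The linear scan this describes can be sketched as follows (a self-contained illustration, not necessarily the exact MDC.h code; note the sketch assumes `training_data` is non-empty):

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

struct instance {
    std::array<double, 2> features; // Feature vector (2D)
    int label;                      // Class label
};

double calculate_distance(const instance& a, const instance& b) noexcept {
    const double dx = a.features[0] - b.features[0];
    const double dy = a.features[1] - b.features[1];
    return std::sqrt(dx * dx + dy * dy);
}

// Linear scan for the nearest training instance; returns its label.
// Assumes training_data is non-empty (the empty case is not handled here).
int classify(const std::vector<instance>& training_data,
             const instance& new_instance) noexcept {
    int best_label = training_data.front().label;
    double best_dist = calculate_distance(training_data.front(), new_instance);
    for (std::size_t i = 1; i < training_data.size(); ++i) {
        const double d = calculate_distance(training_data[i], new_instance);
        if (d < best_dist) {
            best_dist = d;
            best_label = training_data[i].label;
        }
    }
    return best_label;
}
```

Each query costs one distance computation per training instance, i.e. O(n) per classification.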
#include "MDC.h"
#include <vector>
#include <iostream>
int main() {
    // Create training data
    std::vector<instance> training_data = {
        {{1.0, 1.0}, 0},
        {{1.5, 2.0}, 0},
        {{5.0, 5.0}, 1},
        {{6.0, 6.5}, 1},
        {{2.0, 8.0}, 2},
        {{2.5, 9.0}, 2}
    };

    // Classify new instance
    instance test = {{5.5, 5.5}, -1}; // Label is ignored
    int predicted_label = classify(training_data, test);
    std::cout << "Predicted class: " << predicted_label << std::endl;
    // Expected output: 1 (closest to class 1 samples)

    return 0;
}
Example usage
#include "MDC.h"
#include <vector>
#include <iostream>
#include <iomanip>
int main() {
    // Create a simple 3-class dataset
    std::vector<instance> training_data;

    // Class 0: bottom-left region
    training_data.push_back({{1.0, 1.0}, 0});
    training_data.push_back({{1.5, 1.5}, 0});
    training_data.push_back({{2.0, 1.0}, 0});

    // Class 1: top-right region
    training_data.push_back({{8.0, 8.0}, 1});
    training_data.push_back({{8.5, 9.0}, 1});
    training_data.push_back({{9.0, 8.5}, 1});

    // Class 2: top-left region
    training_data.push_back({{1.0, 9.0}, 2});
    training_data.push_back({{2.0, 8.5}, 2});
    training_data.push_back({{1.5, 8.0}, 2});

    // Test samples
    std::vector<instance> test_samples = {
        {{1.2, 1.2}, -1}, // Should be class 0
        {{8.2, 8.7}, -1}, // Should be class 1
        {{1.8, 8.8}, -1}, // Should be class 2
        {{5.0, 5.0}, -1}  // Ambiguous
    };

    std::cout << "Classification results:\n";
    std::cout << std::fixed << std::setprecision(1);
    for (size_t i = 0; i < test_samples.size(); ++i) {
        const auto& test = test_samples[i];
        int prediction = classify(training_data, test);
        std::cout << "Sample (" << test.features[0] << ", "
                  << test.features[1] << ") -> Class "
                  << prediction << std::endl;
    }

    // Calculate distances manually
    instance query = {{5.0, 5.0}, -1};
    std::cout << "\nDistances from (5.0, 5.0):\n";
    for (const auto& train : training_data) {
        double dist = calculate_distance(query, train);
        std::cout << "  To (" << train.features[0] << ", "
                  << train.features[1] << ") [class "
                  << train.label << "]: " << dist << std::endl;
    }

    return 0;
}
Limitations
- Fixed to 2D feature vectors only
- No support for weighted voting or k-nearest neighbors
- No distance metric customization (Euclidean only)
- For production use, consider more robust k-NN implementations with:
- Arbitrary feature dimensions
- Multiple neighbor voting (k > 1)
- Distance weighting
- Efficient spatial indexing (k-d trees, ball trees)
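As one illustration of lifting the first limitation, the same logic can be templated on the feature dimension. The names below (`instance_nd`, `distance_nd`, `classify_nd`) are hypothetical and not part of MDC.h; this is only a sketch of the direction, still using a linear scan and k = 1:

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical N-dimensional variants of the MDC.h types (not part of MDC.h).
template <std::size_t N>
struct instance_nd {
    std::array<double, N> features; // Feature vector (N-dimensional)
    int label;                      // Class label
};

// Euclidean distance over N feature dimensions.
template <std::size_t N>
double distance_nd(const instance_nd<N>& a, const instance_nd<N>& b) noexcept {
    double sum = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        const double d = a.features[i] - b.features[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Same 1-NN rule as classify(); assumes training is non-empty.
template <std::size_t N>
int classify_nd(const std::vector<instance_nd<N>>& training,
                const instance_nd<N>& query) noexcept {
    int best_label = training.front().label;
    double best_dist = distance_nd(training.front(), query);
    for (const auto& t : training) {
        const double d = distance_nd(t, query);
        if (d < best_dist) {
            best_dist = d;
            best_label = t.label;
        }
    }
    return best_label;
}
```

Voting over k > 1 neighbors, distance weighting, and spatial indexing would still require a dedicated k-NN library or further work on top of this.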