Dataset
Template class for managing collections of records with a common schema in clustering algorithms.
Template parameters
T: Numeric type constrained by the Numeric concept (requires std::is_arithmetic_v<T>)
Constructors
Constructs a dataset with the specified schema.
Parameters:
schema_ptr (std::shared_ptr<Schema<T>>): Shared pointer to the schema

Copy constructor.
Parameters:
other (const Dataset<T>&): Dataset to copy from
Methods
Returns the number of attributes in the dataset.
Returns: Number of attributes defined in the schema

Returns the dataset schema.
Returns: Const reference to the shared pointer to the schema

Accesses the attribute value at record i, attribute j.
Parameters:
i (std::size_t): Record index
j (std::size_t): Attribute index

Accesses the attribute value at record i, attribute j (const version).
Parameters:
i (std::size_t): Record index
j (std::size_t): Attribute index

Checks if all attributes are continuous (numeric).
Returns: True if the dataset contains only continuous attributes

Checks if all attributes are discrete (categorical).
Returns: True if the dataset contains only discrete attributes

Saves the dataset to a file.
Parameters:
filename (const std::string&): Path to the output file

Generates a confusion matrix from labelled and clustered data. Requires the dataset to have labels. Used for evaluating clustering quality.
Returns: Confusion matrix comparing true labels to cluster assignments

Assignment operator.
Parameters:
other (const Dataset<T>&): Dataset to assign from
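To make the confusion-matrix entry concrete, here is a minimal sketch of the underlying computation. It is not the library's implementation: the free function `confusion_matrix` and its parameters are hypothetical, and it assumes true labels and cluster assignments are already available as dense integer codes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the documented API).
// counts[t][c] is the number of records whose true label is t
// and whose assigned cluster is c.
inline std::vector<std::vector<std::size_t>> confusion_matrix(
    const std::vector<std::size_t>& labels,
    const std::vector<std::size_t>& clusters,
    std::size_t num_labels,
    std::size_t num_clusters) {
    // Every record needs both a true label and a cluster assignment.
    assert(labels.size() == clusters.size());

    std::vector<std::vector<std::size_t>> counts(
        num_labels, std::vector<std::size_t>(num_clusters, 0));

    for (std::size_t r = 0; r < labels.size(); ++r) {
        // at() throws std::out_of_range if a code exceeds the stated sizes.
        counts.at(labels[r]).at(clusters[r]) += 1;
    }
    return counts;
}
```

Rows index true labels and columns index clusters, so a clustering that matches the labels well concentrates each row's counts in a single column.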
Example
Schema
Template class defining the structure and metadata for dataset attributes.
Template parameters
T: Numeric type constrained by the Numeric concept
Methods
Creates a deep copy of the schema.
Returns: Pointer to the cloned schema

Returns a mutable reference to the label attribute metadata.
Returns: Shared pointer to discrete attribute info for labels

Returns a const reference to the label attribute metadata.
Returns: Const shared pointer to discrete attribute info for labels

Returns a mutable reference to the ID attribute metadata.
Returns: Shared pointer to discrete attribute info for record IDs

Returns a const reference to the ID attribute metadata.
Returns: Const shared pointer to discrete attribute info for record IDs

Sets the label value for a record.
Parameters:
r (std::shared_ptr<Record<T>>&): Record to modify
val (const std::string&): String value to set as label

Sets the ID value for a record.
Parameters:
r (std::shared_ptr<Record<T>>&): Record to modify
val (const std::string&): String value to set as ID

Checks if the schema includes label information.
Returns: True if the schema has a label attribute defined

Checks if two schemas are equal (including labels).
Parameters:
o (const Schema<T>&): Other schema to compare

Checks if two schemas are equal, excluding labels.
Parameters:
o (const Schema<T>&): Other schema to compare

Checks if an attribute is part of this schema.
Parameters:
info (const AttrInfo<T>&): Attribute to check
Example
Record
Template class representing a single data record in a dataset.
Template parameters
T: Numeric type constrained by the Numeric concept
Constructor
Constructs a record with the specified schema.
Parameters:
schema_ptr (std::shared_ptr<Schema<T>>): Shared pointer to the schema
Methods
Returns the record’s schema.
Returns: Const reference to the shared pointer to the schema

Returns a mutable reference to the label value.
Returns: Mutable reference to the label attribute value

Returns a const reference to the label value.
Returns: Const reference to the label attribute value

Returns a mutable reference to the ID value.
Returns: Mutable reference to the ID attribute value

Returns a const reference to the ID value.
Returns: Const reference to the ID attribute value

Gets the record’s ID as an integer.
Returns: Record ID

Gets the record’s label as an integer.
Returns: Record label (class/category)
Private members
Shared pointer to the record’s schema
Label attribute value (for supervised evaluation)
ID attribute value (cluster assignment)
Vector of feature attribute values
Example
Namespace
All dataset-related classes are defined in the mlpp::unsupervised::clustering namespace.
Type constraints
The Numeric concept requires:
T is an arithmetic type (int, float, double, etc.).