PiiClassification namespace reference
Utility functions and type definitions for common classification tasks.
Classes
| struct |
A function object that reads an input socket and converts the incoming object to a PiiMatrix<T>, if possible. |
Enumerations

enum BoostingAlgorithm { AdaBoost, RealBoost, FloatBoost, SammeBoost }
Implemented boosting algorithms.

enum DistanceCombinationMode { DistanceSum, DistanceProduct, DistanceMin, DistanceMax }
Different ways of combining sub-vector distances in PiiMultiFeatureDistance.

enum FullBufferBehavior { OverwriteRandomSample, OverwriteOldestSample, DiscardNewSample }
Possible actions when a sample buffer is full.

enum LearnerCapability { NonSupervisedLearner = 0x1, OnlineLearner = 0x2, WeightedLearner = 0x4 }
Learning algorithm capabilities.

enum SomInitMode { SomRandomInit, SomSampleInit }
Initialization modes for a SOM code book.

enum SomLearningAlgorithm { SomSequentialAlgorithm, SomBalancedAlgorithm, SomQErrAlgorithm }
Learning algorithms for training a SOM.

enum SomNeighborhood { SomBubble, SomGaussian, SomCutGaussian }
Different types of SOM neighborhoods.

enum SomRateFunction { SomLinearAlpha, SomInverseAlpha }
SOM learning rate functions.

enum SomTopology { SomHexagonal, SomSquare }
Different topology types for the arrangement of neighboring nodes in a SOM.
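Functions

template<class FeatureIterator, class ConstFeatureIterator>
void adaptVector(FeatureIterator code, ConstFeatureIterator sample, int length, double alpha)
Adapt a code vector towards sample with the given strength alpha.

template<class SampleSet, class DistanceMeasure>
PiiMatrix< double > calculateDistanceMatrix(const SampleSet & samples, const DistanceMeasure & measure, bool symmetric = true, bool calculateDiagonal = false)
Generate a distance matrix.

double PII_CLASSIFICATION_EXPORT calculateError
Calculates classification error.

countLabels
Counts the number of distinct labels in labels.

QVector< int > PII_CLASSIFICATION_EXPORT countLabelsInt
Counts the number of distinct integer labels in labels.

PiiMatrix< int > PII_CLASSIFICATION_EXPORT createConfusionMatrix
Create a confusion matrix.

void PII_CLASSIFICATION_EXPORT createDartBoard
Creates a non-linearly separable binary sample set so that the samples in class one are surrounded by those in the other one.

void PII_CLASSIFICATION_EXPORT createDoubleSpiral
Creates a non-linearly separable sample set in which two classes spiral around each other on a plane.

template<class SampleSet>
SampleSet createRandomSampleSet(int samples, int features, double minimum, double maximum)
Create a random sample set.

template<class T, class DistanceMeasure>
PiiMatrix< int > fillMissingLabels
Go through the row matrix labels and replace each -1 with the label of the closest code vector in codeBook.

template<class SampleSet, class DistanceMeasure>
int findClosestMatch(typename PiiSampleSet::Traits< SampleSet >::ConstFeatureIterator sample, const SampleSet & modelSet, const DistanceMeasure & measure, double * distance = 0)
Find the closest match for sample in modelSet.

template<class SampleSet, class DistanceMeasure>
MatchList findClosestMatches(typename PiiSampleSet::Traits< SampleSet >::ConstFeatureIterator sample, const SampleSet & modelSet, const DistanceMeasure & measure, int n)
Find the n closest matches for sample in modelSet.

template<class SampleSet, class DistanceMeasure>
SampleSet kMeans(const SampleSet & samples, unsigned int k, const DistanceMeasure & measure, unsigned int maxIterations = 0)
K-means clustering algorithm.

template<class SampleSet, class DistanceMeasure>
double knnClassify(typename PiiSampleSet::Traits< SampleSet >::ConstFeatureIterator sample, const SampleSet & modelSet, const QVector< double > & labels, const DistanceMeasure & measure, int k, double * distance = 0, int * closestIndex = 0)
Classify a sample using the k nearest neighbors rule.

Q_DECLARE_FLAGS

Q_DECLARE_OPERATORS_FOR_FLAGS(LearnerCapabilities)

PII_CLASSIFICATION_EXPORT double somHexagonalDistance(int bx, int by, int tx, int ty)
Calculate the squared distance between two nodes in a SOM with a hexagonal topology.

PII_CLASSIFICATION_EXPORT double somSquareDistance(int bx, int by, int tx, int ty)
Calculate the squared distance between two nodes in a SOM with a square topology.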
Enumeration details
-
enum BoostingAlgorithm
Implemented boosting algorithms.
-
- AdaBoost: the original (discrete) AdaBoost as introduced by Schapire & Freund. This algorithm has mostly historical value; use RealBoost instead.
- RealBoost: AdaBoost with confidence-rated predictions (a.k.a. RealAdaBoost). Usually more accurate than AdaBoost.
- FloatBoost: RealBoost supplemented by ideas from floating search methods (after Stan Z. Li et al.).
- SammeBoost: Stagewise Additive Modeling using a Multi-class Exponential loss function (after Ji Zhu et al.). A multi-class generalization of AdaBoost.
-
-
enum DistanceCombinationMode
Different ways of combining sub-vector distances in PiiMultiFeatureDistance.
-
- DistanceSum: sub-vector distances are summed up
- DistanceProduct: sub-vector distances are multiplied by each other
- DistanceMin: the minimum sub-vector distance is returned
- DistanceMax: the maximum sub-vector distance is returned
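As a rough illustration of the four modes, the sketch below combines a list of already-computed sub-vector distances. It is standalone C++ and does not use PiiMultiFeatureDistance; the enum `CombinationModeSketch` and the function `combineDistances` are hypothetical stand-ins.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical local stand-in for PiiClassification::DistanceCombinationMode.
enum CombinationModeSketch { CombineSum, CombineProduct, CombineMin, CombineMax };

// Combines precomputed sub-vector distances according to the given mode.
double combineDistances(const std::vector<double>& distances, CombinationModeSketch mode)
{
  assert(!distances.empty());
  switch (mode)
    {
    case CombineSum:
      return std::accumulate(distances.begin(), distances.end(), 0.0);
    case CombineProduct:
      return std::accumulate(distances.begin(), distances.end(), 1.0,
                             std::multiplies<double>());
    case CombineMin:
      return *std::min_element(distances.begin(), distances.end());
    case CombineMax:
      return *std::max_element(distances.begin(), distances.end());
    }
  return 0.0; // not reached for valid modes
}
```

With sub-vector distances 2, 3 and 4, the four modes give 9, 24, 2 and 4 respectively.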
-
-
enum FullBufferBehavior
Possible actions when a sample buffer is full.
-
- OverwriteRandomSample: the sample to be overwritten will be picked at random.
- OverwriteOldestSample: the oldest sample currently in the buffer will be overwritten.
- DiscardNewSample: perform no action. Once the buffer is full, new samples will no longer be buffered.
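The three behaviors can be sketched with a small fixed-capacity buffer. This is a standalone illustration, not the library's buffer class: `SampleBufferSketch`, `BufferPolicySketch` and the fixed random seed are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical local stand-in for PiiClassification::FullBufferBehavior.
enum BufferPolicySketch { PolicyOverwriteRandom, PolicyOverwriteOldest, PolicyDiscardNew };

class SampleBufferSketch
{
public:
  SampleBufferSketch(std::size_t capacity, BufferPolicySketch policy) :
    m_capacity(capacity), m_policy(policy), m_oldest(0), m_engine(0)
  {
    assert(capacity > 0);
  }

  // Stores a sample. Returns false if the sample was discarded.
  bool add(double sample)
  {
    if (m_data.size() < m_capacity)
      {
        m_data.push_back(sample);
        return true;
      }
    switch (m_policy)
      {
      case PolicyOverwriteRandom:
        {
          std::uniform_int_distribution<std::size_t> pick(0, m_capacity - 1);
          m_data[pick(m_engine)] = sample;
          return true;
        }
      case PolicyOverwriteOldest:
        // Slots were filled in order, so the oldest slot advances cyclically.
        m_data[m_oldest] = sample;
        m_oldest = (m_oldest + 1) % m_capacity;
        return true;
      case PolicyDiscardNew:
        break;
      }
    return false;
  }

  const std::vector<double>& data() const { return m_data; }

private:
  std::size_t m_capacity;
  BufferPolicySketch m_policy;
  std::size_t m_oldest;
  std::mt19937 m_engine;
  std::vector<double> m_data;
};
```

The oldest-first policy behaves like a ring buffer, while the discard policy freezes the buffer contents once it is full.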
-
-
enum LearnerCapability
Learning algorithm capabilities.
-
- NonSupervisedLearner: the algorithm can be trained with no a priori knowledge of sample labels.
- OnlineLearner: the classifier is capable of learning on-line, one sample at a time.
- WeightedLearner: the classifier is able to learn weighted samples.
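Because the enumerator values are distinct powers of two (0x1, 0x2, 0x4), capabilities can be combined into a bit mask and tested with bitwise operations. The sketch below redeclares the enum locally and uses a plain int instead of the Qt flags type; the helper `hasCapabilities` is hypothetical.

```cpp
#include <cassert>

// Local copy of the enumerator values listed above, for illustration only.
enum LearnerCapabilitySketch
{
  NonSupervisedLearner = 0x1,
  OnlineLearner = 0x2,
  WeightedLearner = 0x4
};

// Returns true if every bit in 'required' is set in 'capabilities'.
inline bool hasCapabilities(int capabilities, int required)
{
  assert(required != 0);
  return (capabilities & required) == required;
}
```

For example, a mask built as `OnlineLearner | WeightedLearner` passes a check for `OnlineLearner` but fails one for `NonSupervisedLearner`.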
-
-
enum SomInitMode
Initialization modes for a SOM code book.
-
- SomRandomInit: initialize the code book randomly. The limits of the random values are taken from the first incoming feature vector.
- SomSampleInit: initialize the code book by selecting incoming samples as initial code vectors. In on-line learning, the first w*h samples will be used (w and h denote SOM width and height). In batch learning, initial code vectors will be randomly selected from the training samples.
-
-
enum SomLearningAlgorithm
Learning algorithms for training a SOM.
-
- SomSequentialAlgorithm: the traditional sequential learning algorithm, with a monotonically decreasing learning constant and neighborhood size.
- SomBalancedAlgorithm: the balanced SOM algorithm. Each input sample is weighted based on its disparity. This algorithm better captures small clusters in the input space while maintaining the topographic properties of the original SOM algorithm.
- SomQErrAlgorithm: a modification of the "parameterless" SOM algorithm. Each input sample is weighted based on its quantization error. This algorithm is the most "elastic" of the three. It tries to cover the whole input space independent of data density.
-
-
enum SomNeighborhood
Different types of SOM neighborhoods.
When updating nodes in a SOM, the amount of vector movement is determined by the neighborhood function.
-
- SomBubble: each node within the current radius is updated with a weight of one. Others are not updated.
- SomGaussian: the neighbors are weighted according to a Gaussian function that decreases with distance.
- SomCutGaussian: the neighbors are weighted according to a Gaussian function that decreases with distance, if they fall within the radius. This is practically a combination of the two other modes.
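The three neighborhood types can be sketched as a single weight function of the squared grid distance. This is an illustration under stated assumptions, not the library's code: the enum and function names are hypothetical, and the Gaussian's standard deviation is assumed to equal the current radius.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical local stand-in for PiiClassification::SomNeighborhood.
enum NeighborhoodSketch { BubbleKernel, GaussianKernel, CutGaussianKernel };

// Returns the update weight of a node whose squared grid distance to the
// winning node is sqDistance, given the current neighborhood radius.
double neighborhoodWeight(NeighborhoodSketch type, double sqDistance, double radius)
{
  assert(radius > 0.0 && sqDistance >= 0.0);
  const bool inside = sqDistance <= radius * radius;
  // Assumption: the Gaussian's standard deviation equals the radius.
  const double gaussian = std::exp(-sqDistance / (2.0 * radius * radius));
  switch (type)
    {
    case BubbleKernel:      return inside ? 1.0 : 0.0;
    case GaussianKernel:    return gaussian;
    case CutGaussianKernel: return inside ? gaussian : 0.0;
    }
  return 0.0; // not reached for valid types
}
```

The cut Gaussian equals the Gaussian inside the radius and the bubble's zero outside it, which is why it reads as a combination of the other two.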
-
-
enum SomRateFunction
SOM learning rate functions.
-
- SomLinearAlpha: learning rate decreases linearly
- SomInverseAlpha: learning rate is inversely proportional to the training iteration index
-
-
enum SomTopology
Different topology types for the arrangement of neighboring nodes in a SOM.
-
- SomHexagonal: each node has six closest neighbors at a distance of one
- SomSquare: each node has four closest neighbors at a distance of one

     ___             ___ ___ ___
 ___/   \___        |   |   |   |
/   \___/   \       |___|___|___|
\___/   \___/       |   |   |   |
/   \___/   \       |___|___|___|
\___/   \___/       |   |   |   |
    \___/           |___|___|___|
-
Function details
-
template<class FeatureIterator, class ConstFeatureIterator>
void adaptVector
(- FeatureIterator code
- ConstFeatureIterator sample
- int length
- double alpha
#include <PiiClassification.h>
Adapt a code vector towards sample with the given strength alpha.
The code vector will be modified in place. The function calculates the weighted average of code vector C and sample S as $C \leftarrow (1 - \alpha) C + \alpha S$.
Parameters
- code
-
an iterator to the beginning of the code vector.
- sample
-
an iterator to the beginning of the vector to tune the code vector towards.
- length
- alpha
-
the strength of the tuning. 0 means no change, 1 means that code vector will be replaced with
sample.
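A minimal standalone sketch of the update follows. It assumes the weighted average takes the form C ← (1 − alpha)·C + alpha·S, which matches the documented behavior at alpha = 0 and alpha = 1. The name `adaptVectorSketch` is hypothetical, and this is not the library's implementation.

```cpp
#include <cassert>
#include <vector>

// Moves 'length' elements of the code vector towards the sample in place:
// code[i] = (1 - alpha) * code[i] + alpha * sample[i]
template <class FeatureIterator, class ConstFeatureIterator>
void adaptVectorSketch(FeatureIterator code, ConstFeatureIterator sample,
                       int length, double alpha)
{
  assert(length >= 0);
  for (int i = 0; i < length; ++i, ++code, ++sample)
    *code = (1.0 - alpha) * *code + alpha * *sample;
}
```

With code = (0, 4), sample = (2, 8) and alpha = 0.5, the code vector becomes (1, 6), halfway between the two.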
-
template<class SampleSet, class DistanceMeasure>
PiiMatrix< double > calculateDistanceMatrix
(- const SampleSet & samples
- const DistanceMeasure & measure
- bool symmetric = true
- bool calculateDiagonal = false
#include <PiiClassification.h>
Generate a distance matrix.
Let N denote the number of vectors (rows in samples). The size of the distance matrix is N-by-N, and each element (r,c) stores the distance between vector r and vector c, calculated with measure(samples[r], samples[c]). Since distance measures and kernels share the same interface, this function can also be used to calculate a kernel matrix.
Parameters
- samples
- measure
- symmetric
-
if true, the upper triangle will be filled by copying the lower triangle.
- calculateDiagonal
-
if true, each vector's distance to itself will also be calculated.
Returns
the pairwise distances between input vectors as a matrix.
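The following sketch shows one way such a matrix could be filled, using nested std::vector instead of PiiMatrix. Two details are assumptions not stated above: when symmetric is false the upper triangle is evaluated separately with swapped arguments, and when calculateDiagonal is false the diagonal is left at zero. `distanceMatrixSketch` and `SquaredEuclidean` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

template <class DistanceMeasure>
Matrix distanceMatrixSketch(const Matrix& samples, const DistanceMeasure& measure,
                            bool symmetric = true, bool calculateDiagonal = false)
{
  const std::size_t n = samples.size();
  Matrix result(n, std::vector<double>(n, 0.0));
  for (std::size_t r = 0; r < n; ++r)
    {
      for (std::size_t c = 0; c < r; ++c)
        {
          // Lower triangle is always evaluated.
          result[r][c] = measure(samples[r], samples[c]);
          // Assumption: without the symmetric shortcut the upper triangle
          // is evaluated separately with swapped arguments.
          result[c][r] = symmetric ? result[r][c] : measure(samples[c], samples[r]);
        }
      if (calculateDiagonal)
        result[r][r] = measure(samples[r], samples[r]);
    }
  return result;
}

// Example measure: squared Euclidean distance between equal-length vectors.
struct SquaredEuclidean
{
  double operator()(const std::vector<double>& a, const std::vector<double>& b) const
  {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
      sum += (a[i] - b[i]) * (a[i] - b[i]);
    return sum;
  }
};
```

Copying the lower triangle halves the number of measure evaluations, which is only valid when the measure itself is symmetric.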
-
double PII_CLASSIFICATION_EXPORT calculateError
#include <PiiClassification.h>
Calculates classification error.
This function returns the ratio of misclassified samples.
Parameters
- knownLabels
-
the ground truth. N labels.
- hypothesis
-
the classification result. N labels. If a hypothesis is NaN, it will be ignored.
- weights
-
a weight for each sample. The weights should sum up to one. Can be omitted.
Returns
the (weighted) classification error, in [0,1].
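A standalone sketch of the error calculation is shown below. It assumes that a NaN hypothesis is left out of both the error count and the denominator in the unweighted case, and that the weighted error is simply the sum of the weights of the misclassified samples. The function name and the use of std::vector are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double classificationErrorSketch(const std::vector<double>& knownLabels,
                                 const std::vector<double>& hypothesis,
                                 const std::vector<double>& weights = std::vector<double>())
{
  assert(knownLabels.size() == hypothesis.size());
  assert(weights.empty() || weights.size() == knownLabels.size());
  double error = 0.0;
  std::size_t counted = 0;
  for (std::size_t i = 0; i < knownLabels.size(); ++i)
    {
      if (std::isnan(hypothesis[i]))
        continue; // NaN hypotheses are ignored
      ++counted;
      if (hypothesis[i] != knownLabels[i])
        error += weights.empty() ? 1.0 : weights[i];
    }
  if (!weights.empty())
    return error; // weights are expected to sum up to one
  // Assumption: ignored samples do not count towards the denominator.
  return counted > 0 ? error / counted : 0.0;
}
```

With one misclassified sample out of four, the unweighted error is 0.25. If that sample carries a weight of 0.5, the weighted error is 0.5.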
-
#include <PiiClassification.h>
Counts the number of distinct labels in labels.
Returns the found labels as a list of pairs storing the class label (pair.first) and the number of entries (pair.second).
  QVector<double> labels = QVector<double>() << 0 << 1 << 2 << 1 << 4 << 0;
  QList<QPair<double,int> > counts = PiiClassification::countLabels(labels);
  // counts = ((0.0, 2), (1.0, 2), (2.0, 1), (4.0, 1))
The label list must not contain NaNs.
-
#include <PiiClassification.h>
Counts the number of distinct integer labels in labels.
This function ignores the decimal part of the class labels. The nth element in the returned list contains the number of labels whose value (truncated to an integer) equals n. All negative labels will be collected to the zero bin in the returned histogram.
  QVector<double> labels = QVector<double>() << 0.9 << 1.1 << 2.5 << 1.3 << 4.05 << 0.01;
  QVector<int> counts = PiiClassification::countLabelsInt(labels);
  // counts = (2, 2, 1, 0, 1)
The label list must not contain NaNs.
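The histogram logic described above can be sketched without Qt as follows. The function name `countLabelsIntSketch` and the use of std::vector are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Builds a histogram of labels truncated to integers.
std::vector<int> countLabelsIntSketch(const std::vector<double>& labels)
{
  std::vector<int> counts;
  for (std::size_t i = 0; i < labels.size(); ++i)
    {
      assert(!std::isnan(labels[i]));
      // Truncate towards zero; all negative labels share bin zero.
      const int bin = labels[i] < 0 ? 0 : static_cast<int>(labels[i]);
      if (static_cast<std::size_t>(bin) >= counts.size())
        counts.resize(bin + 1, 0);
      ++counts[bin];
    }
  return counts;
}
```

For the labels (0.9, 1.1, 2.5, 1.3, 4.05, 0.01) the sketch returns (2, 2, 1, 0, 1), matching the example above.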
-
PiiMatrix< int > PII_CLASSIFICATION_EXPORT createConfusionMatrix
#include <PiiClassification.h>
Create a confusion matrix.
Parameters
- knownLabels
-
the ground truth. N known class indices for the samples.
- hypothesis
-
N class labels produced by a classifier. If any of the hypotheses is -1, an extra "discard" class will be added as the last column of the returned matrix.
Returns
a matrix in which row indices correspond to the ground truth and column indices to the hypotheses. The values are hit counts. For example, if the value at (1,2) is 9, nine samples of class one were incorrectly classified to class two.
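A standalone sketch of the counting is given below. The matrix size is an assumption: the number of classes is taken as the largest label seen in either input plus one, with one extra column appended when any hypothesis is -1. `confusionMatrixSketch` is a hypothetical name and nested std::vector replaces PiiMatrix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<int> > IntMatrix;

IntMatrix confusionMatrixSketch(const std::vector<int>& knownLabels,
                                const std::vector<int>& hypothesis)
{
  assert(knownLabels.size() == hypothesis.size());
  int maxLabel = -1;
  bool discarded = false;
  for (std::size_t i = 0; i < knownLabels.size(); ++i)
    {
      assert(knownLabels[i] >= 0 && hypothesis[i] >= -1);
      maxLabel = std::max(maxLabel, std::max(knownLabels[i], hypothesis[i]));
      if (hypothesis[i] == -1)
        discarded = true;
    }
  // Assumption: class count is derived from the largest label seen.
  const int classes = maxLabel + 1;
  IntMatrix result(classes, std::vector<int>(classes + (discarded ? 1 : 0), 0));
  for (std::size_t i = 0; i < knownLabels.size(); ++i)
    {
      // Rows: ground truth. Columns: hypotheses, "discard" in the last column.
      const int column = hypothesis[i] == -1 ? classes : hypothesis[i];
      ++result[knownLabels[i]][column];
    }
  return result;
}
```

Correct classifications accumulate on the diagonal, so the trace divided by the number of samples gives the overall accuracy.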
-
void PII_CLASSIFICATION_EXPORT createDartBoard
#include <PiiClassification.h>
Creates a non-linearly separable binary sample set so that the samples in class one are surrounded by those in the other one.
(Figure: example output with samples1 = samples2 = 200. Samples with label 0 are shown in red, and samples with label 1 in blue.)
Parameters
- samples1
-
the number of samples in the set at the center.
- samples2
-
the number of samples in the surrounding set.
- samples
-
this matrix will be filled with the two-dimensional feature vectors. The first samples1 rows will represent class 0 and the rest class 1. The size of the matrix will be samples1 + samples2 by 2.
- labels
-
this vector will be filled with the corresponding class labels (0 for the samples1 entries, 1 for the rest).
-
void PII_CLASSIFICATION_EXPORT createDoubleSpiral
#include <PiiClassification.h>
Creates a non-linearly separable sample set in which two classes spiral around each other on a plane.
(Figure: example output with samplesPerSet = 2000 and rounds = 3.0. Samples with label 0 are shown in red, and samples with label 1 in blue.)
Parameters
- samplesPerSet
-
the number of samples in each of the two sets
- rounds
-
the number of times the spirals will turn around the origin.
- samples
-
this matrix will be filled with the two-dimensional feature vectors. The first samplesPerSet rows will represent class 0 and the rest class 1. The size of the matrix will be 2 * samplesPerSet by 2.
- labels
-
this vector will be filled with the corresponding class labels (0 for the first half, 1 for the rest).
-
template<class SampleSet>
SampleSet createRandomSampleSet
(- int samples
- int features
- double minimum
- double maximum
#include <PiiClassification.h>
Create a random sample set.
Each element in the returned sample set is a random number uniformly distributed in the range [minimum, maximum].
  using namespace PiiClassification;
  PiiMatrix<double> matSamples = createRandomSampleSet<PiiMatrix<double> >(10, 16, -1, 1);
Parameters
- samples
-
the number of samples to create
- features
-
the number of columns (i.e. the length of the feature vector)
- minimum
-
smallest possible feature value
- maximum
-
largest possible feature value
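For readers without the library at hand, the same idea can be sketched with the standard random facilities. The function name and the `seed` parameter are assumptions. Note that std::uniform_real_distribution draws from the half-open range [minimum, maximum), which differs slightly from the closed range documented above.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Returns a samples-by-features matrix of uniformly distributed values.
std::vector<std::vector<double> > randomSampleSetSketch(int samples, int features,
                                                        double minimum, double maximum,
                                                        unsigned int seed = 0)
{
  assert(samples >= 0 && features >= 0 && minimum < maximum);
  std::mt19937 engine(seed); // fixed seed keeps the sketch reproducible
  std::uniform_real_distribution<double> uniform(minimum, maximum);
  std::vector<std::vector<double> > result(samples, std::vector<double>(features));
  for (auto& row : result)
    for (auto& value : row)
      value = uniform(engine);
  return result;
}
```

Calling `randomSampleSetSketch(10, 16, -1.0, 1.0)` mirrors the example above: ten feature vectors with sixteen features each.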
-
template<class T, class DistanceMeasure>
PiiMatrix< int > fillMissingLabels
#include <PiiClassification.h>
Go through the row matrix labels and replace each -1 with the label of the closest code vector in codeBook.
Parameters
- labels
-
labels for the vectors in codeBook. Labels with no associated code vector will not be changed. The label matrix may be either a column or a row matrix.
- codeBook
-
code vectors. The number of rows in this matrix should be greater than or equal to the number of columns in labels.
- measure
-
the measure used for distance estimation.
Returns
the new labels
-
template<class SampleSet, class DistanceMeasure>
int findClosestMatch
(- typename PiiSampleSet::Traits< SampleSet >::ConstFeatureIterator sample
- const SampleSet & modelSet
- const DistanceMeasure & measure
- double * distance = 0
#include <PiiClassification.h>
Find the closest match for sample in modelSet.
  PiiSquaredGeometricDistance<const float*> dist;
  PiiMatrix<float> matSamples(50,2); // each row is a feature vector
  PiiMatrix<float> matObserved(1,2); // observed sample
  int iMatch = PiiClassification::findClosestMatch(matObserved[0], matSamples, dist);
Parameters
- sample
-
an iterator to the beginning of feature data. Must be valid through modelSet.featureCount() elements.
- modelSet
-
the model samples to compare sample against.
- measure
-
the distance measure used to calculate the difference between sample and each model.
- distance
-
an optional output-value parameter that will store the distance to the closest code book vector.
Returns
the index of the closest model sample, or -1 if modelSet is empty.
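The search performed by findClosestMatch can be sketched as a linear scan over the model set. The version below uses std::vector in place of the library's sample set and iterator types; `findClosestMatchSketch` and `SquaredDistance` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > SampleMatrix;

// Returns the index of the model closest to sample, or -1 for an empty set.
template <class DistanceMeasure>
int findClosestMatchSketch(const std::vector<double>& sample,
                           const SampleMatrix& modelSet,
                           const DistanceMeasure& measure,
                           double* distance = 0)
{
  int best = -1;
  double bestDistance = 0.0;
  for (std::size_t i = 0; i < modelSet.size(); ++i)
    {
      const double d = measure(sample, modelSet[i]);
      if (best == -1 || d < bestDistance)
        {
          best = static_cast<int>(i);
          bestDistance = d;
        }
    }
  // The optional output is written only when a match was found.
  if (distance != 0 && best != -1)
    *distance = bestDistance;
  return best;
}

// Example measure: squared Euclidean distance.
struct SquaredDistance
{
  double operator()(const std::vector<double>& a, const std::vector<double>& b) const
  {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
      sum += (a[i] - b[i]) * (a[i] - b[i]);
    return sum;
  }
};
```

Using a squared distance avoids a square root per comparison and does not change which model is closest.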
template<class SampleSet, class DistanceMeasure>
MatchList findClosestMatches
(- typename PiiSampleSet::Traits< SampleSet >::ConstFeatureIterator sample
- const SampleSet & modelSet
- const DistanceMeasure & measure
- int n
#include <PiiClassification.h>
Find the n closest matches for sample in modelSet.
  PiiSquaredGeometricDistance<const float*> dist;
  PiiMatrix<float> matSamples(50,2); // each row is a feature vector
  PiiMatrix<float> matObserved(1,2); // observed sample
  PiiClassification::MatchList lstMatches = PiiClassification::findClosestMatches(matObserved[0], matSamples, dist, 5);
Parameters
- sample
-
an iterator to the beginning of feature data. Must be valid through modelSet.featureCount() elements.
- modelSet
-
the model samples to compare sample against.
- measure
-
the distance measure used to calculate the difference between sample and each model.
- n
-
the number of closest matches to return.
Returns
a list in which each element contains the distance to a model sample and its index in the sample set. The list is sorted in ascending order according to the distance. The length of the list is the minimum of n and the number of samples in modelSet.
-
template<class SampleSet, class DistanceMeasure>
SampleSet kMeans
(- const SampleSet & samples
- unsigned int k
- const DistanceMeasure & measure
- unsigned int maxIterations = 0
#include <PiiClassification.h>
K-means clustering algorithm.
The k-means algorithm clusters n objects into k partitions based on their attributes, k < n. It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that both attempt to find the centers of natural clusters in the data. It assumes that the object attributes form a vector space. The objective is to minimize the total intra-cluster variance, or the squared error function
$$V = \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$$
where there are k clusters $S_i$, $i = 1, 2, \ldots, k$, and $\mu_i$ is the centroid or mean point of all the points $x_j \in S_i$. This implementation uses an iterative refinement heuristic known as Lloyd's algorithm to solve the optimization problem.
Parameters
- samples
-
a set of feature vectors to run the algorithm on. Each row of this matrix represents a feature vector. The number of samples must be greater than k.
- k
-
the number of centroids
- measure
-
a measure used to calculate the distance between samples and centroids.
- maxIterations
-
if this value is zero (the default), the algorithm runs until convergence. To stop earlier, set it to a positive value.
Returns
the centroids
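Lloyd's algorithm alternates between assigning each sample to its nearest centroid and moving each centroid to the mean of its assigned samples. The sketch below is a standalone illustration with std::vector. The seeding rule (the first k samples become the initial centroids) and the handling of empty clusters (the centroid stays where it was) are assumptions, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<std::vector<double> > Samples;

template <class DistanceMeasure>
Samples kMeansSketch(const Samples& samples, unsigned int k,
                     const DistanceMeasure& measure, unsigned int maxIterations = 0)
{
  assert(k > 0 && samples.size() > k);
  const std::size_t dim = samples[0].size();
  // Assumption: the first k samples seed the centroids.
  Samples centroids(samples.begin(), samples.begin() + k);
  std::vector<std::size_t> assignment(samples.size(), k); // k = "unassigned"
  for (unsigned int iter = 0; maxIterations == 0 || iter < maxIterations; ++iter)
    {
      // Assignment step: attach each sample to its nearest centroid.
      bool changed = false;
      for (std::size_t i = 0; i < samples.size(); ++i)
        {
          std::size_t best = 0;
          double bestDistance = std::numeric_limits<double>::max();
          for (std::size_t c = 0; c < k; ++c)
            {
              const double d = measure(samples[i], centroids[c]);
              if (d < bestDistance) { best = c; bestDistance = d; }
            }
          if (assignment[i] != best) { assignment[i] = best; changed = true; }
        }
      if (!changed)
        break; // converged: no sample switched clusters
      // Update step: move each centroid to the mean of its samples.
      Samples sums(k, std::vector<double>(dim, 0.0));
      std::vector<std::size_t> counts(k, 0);
      for (std::size_t i = 0; i < samples.size(); ++i)
        {
          ++counts[assignment[i]];
          for (std::size_t f = 0; f < dim; ++f)
            sums[assignment[i]][f] += samples[i][f];
        }
      for (std::size_t c = 0; c < k; ++c)
        if (counts[c] > 0) // an empty cluster keeps its previous centroid
          for (std::size_t f = 0; f < dim; ++f)
            centroids[c][f] = sums[c][f] / counts[c];
    }
  return centroids;
}

// Example measure: squared Euclidean distance.
struct SquaredEuclidean
{
  double operator()(const std::vector<double>& a, const std::vector<double>& b) const
  {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
      sum += (a[i] - b[i]) * (a[i] - b[i]);
    return sum;
  }
};
```

For the one-dimensional samples 0, 0.1, 10 and 10.1 with k = 2, the sketch converges to centroids near 0.05 and 10.05. The result of Lloyd's algorithm depends on the initial centroids, so a different seeding rule may end in a different local optimum.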
-
template<class SampleSet, class DistanceMeasure>
double knnClassify
(- typename PiiSampleSet::Traits< SampleSet >::ConstFeatureIterator sample
- const SampleSet & modelSet
- const QVector< double > & labels
- const DistanceMeasure & measure
- int k
- double * distance = 0
- int * closestIndex = 0
#include <PiiClassification.h>
Classify a sample using the k nearest neighbors rule.
This function compares sample to each model in modelSet, to find the k closest ones. Then, it uses labels to find out the class label that has the most occurrences within the k closest models. In the case of a tie, the class with the closest neighbor wins.
Parameters
- sample
-
an iterator to the beginning of feature data. Must be valid through modelSet.featureCount() elements.
- modelSet
-
the model samples to compare sample against.
- labels
-
a label for each sample in modelSet. The length of this list must match the number of samples in modelSet.
- measure
-
the distance measure used to calculate the difference between sample and each model.
- k
-
the number of nearest neighbors to consider.
- distance
-
an optional output value that, if non-zero, will store the distance to the closest sample representing the winning class.
- closestIndex
-
an optional output value that, if non-zero, will store the index of the closest model sample of the winning class. Note the closest sample in the winning class may not be the closest of all samples.
Returns
the class label with the most representatives among the k nearest neighbors of sample.
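The voting rule, including the tie-break and the two optional outputs, can be sketched as follows. This standalone version uses std::vector and std::map instead of the library's types; the names are hypothetical and it is not the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

typedef std::vector<std::vector<double> > Models;

template <class DistanceMeasure>
double knnClassifySketch(const std::vector<double>& sample, const Models& modelSet,
                         const std::vector<double>& labels,
                         const DistanceMeasure& measure, int k,
                         double* distance = 0, int* closestIndex = 0)
{
  assert(!modelSet.empty() && labels.size() == modelSet.size() && k > 0);
  // Collect (distance, index) pairs and sort the k smallest to the front.
  std::vector<std::pair<double, int> > matches;
  for (std::size_t i = 0; i < modelSet.size(); ++i)
    matches.push_back(std::make_pair(measure(sample, modelSet[i]), static_cast<int>(i)));
  const std::size_t n = std::min<std::size_t>(k, matches.size());
  std::partial_sort(matches.begin(), matches.begin() + n, matches.end());

  // Count votes. Neighbors are visited in ascending distance order, so the
  // first neighbor seen for a label is that label's closest representative.
  std::map<double, int> votes;
  std::map<double, std::size_t> closest;
  for (std::size_t i = 0; i < n; ++i)
    {
      const double label = labels[matches[i].second];
      if (votes[label]++ == 0)
        closest[label] = i;
    }

  // Most votes wins; on a tie, the class with the closest neighbor wins.
  double winner = 0.0;
  int bestVotes = 0;
  std::size_t bestPosition = n;
  for (std::map<double, int>::const_iterator it = votes.begin(); it != votes.end(); ++it)
    {
      const std::size_t position = closest[it->first];
      if (it->second > bestVotes || (it->second == bestVotes && position < bestPosition))
        {
          winner = it->first;
          bestVotes = it->second;
          bestPosition = position;
        }
    }
  if (distance != 0) *distance = matches[bestPosition].first;
  if (closestIndex != 0) *closestIndex = matches[bestPosition].second;
  return winner;
}

// Example measure: squared Euclidean distance.
struct SquaredEuclidean
{
  double operator()(const std::vector<double>& a, const std::vector<double>& b) const
  {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
      sum += (a[i] - b[i]) * (a[i] - b[i]);
    return sum;
  }
};
```

With models at 0, 1, 2 and 10 labeled 0, 0, 1 and 1, a sample at 1.6 and k = 3, class 0 wins two votes to one even though the single closest model (index 2) belongs to class 1, so closestIndex is 1. With k = 2 the vote is tied and class 1 wins because it owns the nearest neighbor.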
-
Q_DECLARE_FLAGS
#include <PiiClassificationGlobal.h> -
Q_DECLARE_OPERATORS_FOR_FLAGS
(- LearnerCapabilities
#include <PiiClassificationGlobal.h> -
PII_CLASSIFICATION_EXPORT double somHexagonalDistance
(- int bx
- int by
- int tx
- int ty
#include <PiiSom.h>
Calculate the squared distance between two nodes in a SOM with a hexagonal topology.
-
PII_CLASSIFICATION_EXPORT double somSquareDistance
(- int bx
- int by
- int tx
- int ty
#include <PiiSom.h>
Calculate the squared distance between two nodes in a SOM with a square topology.
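The two distance functions can be sketched as below. The square case is the plain squared Euclidean grid distance. The hexagonal case is an assumption about the layout: odd rows are shifted half a cell to the right and rows are sqrt(3)/2 apart, which makes all six surrounding nodes lie at distance one. The `Sketch` names and the `countUnitNeighbors` helper are hypothetical, and coordinates are assumed to be non-negative.

```cpp
#include <cassert>

// Squared distance between nodes (bx, by) and (tx, ty) on a square grid.
double somSquareDistanceSketch(int bx, int by, int tx, int ty)
{
  const double dx = bx - tx;
  const double dy = by - ty;
  return dx * dx + dy * dy;
}

// Squared distance on a hexagonal grid.
// Assumption: odd rows are shifted half a cell to the right and rows are
// sqrt(3)/2 apart, hence the factor 0.75 on the squared row difference.
double somHexagonalDistanceSketch(int bx, int by, int tx, int ty)
{
  double dx = bx - tx;
  if ((by - ty) % 2 != 0)
    dx += (by % 2 == 0) ? -0.5 : 0.5;
  const double dy = by - ty;
  return dx * dx + 0.75 * dy * dy;
}

// Counts the nodes of a width-by-height grid whose squared distance to
// node (x, y) is exactly one.
template <class Distance>
int countUnitNeighbors(Distance distance, int x, int y, int width, int height)
{
  int count = 0;
  for (int ty = 0; ty < height; ++ty)
    for (int tx = 0; tx < width; ++tx)
      if (distance(x, y, tx, ty) == 1.0)
        ++count;
  return count;
}
```

Counting unit-distance nodes around an interior node gives six for the hexagonal sketch and four for the square one, in line with the SomTopology description.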