PiiClassification namespace reference

Utility functions and type definitions for common classification tasks.

Classes

struct

A function object that reads an input socket and converts the incoming object to a PiiMatrix<T>, if possible.

Enumerations

enum BoostingAlgorithm
{ AdaBoost, RealBoost, FloatBoost, SammeBoost }

Implemented boosting algorithms.

enum DistanceCombinationMode
{ DistanceSum, DistanceProduct, DistanceMin, DistanceMax }

Different ways of combining sub-vector distances in PiiMultiFeatureDistance.

enum FullBufferBehavior
{ OverwriteRandomSample, OverwriteOldestSample, DiscardNewSample }

Possible actions when a sample buffer is full.

enum LearnerCapability
{ NonSupervisedLearner = 0x1, OnlineLearner = 0x2, WeightedLearner = 0x4 }

Learning algorithm capabilities.

enum SomInitMode
{ SomRandomInit, SomSampleInit }

Initialization modes for a SOM code book.

enum SomLearningAlgorithm
{ SomSequentialAlgorithm, SomBalancedAlgorithm, SomQErrAlgorithm }

Learning algorithms for training a SOM.

enum SomNeighborhood
{ SomBubble, SomGaussian, SomCutGaussian }

Different types of SOM neighborhoods.

enum SomRateFunction
{ SomLinearAlpha, SomInverseAlpha }

SOM learning rate functions.

enum SomTopology
{ SomHexagonal, SomSquare }

Different topology types for the arrangement of neighboring nodes in a SOM.

Functions

template<class FeatureIterator, class ConstFeatureIterator>
void adaptVector
(
  • FeatureIterator code
  • ConstFeatureIterator sample
  • int length
  • double alpha
)

Adapt a code vector towards sample with the given strength alpha.

template<class SampleSet, class DistanceMeasure>
PiiMatrix< double > calculateDistanceMatrix
(
  • const SampleSet & samples
  • const DistanceMeasure & measure
  • bool symmetric = true
  • bool calculateDiagonal = false
)

Generate a distance matrix.

double PII_CLASSIFICATION_EXPORT calculateError
(
  • const QVector< double > & knownLabels
  • const QVector< double > & hypothesis
  • const QVector< double > & weights = QVector< double >()
)

Calculates classification error.

QList< QPair< double, int > > PII_CLASSIFICATION_EXPORT countLabels
( )

Counts the number of distinct labels in labels.

QVector< int > PII_CLASSIFICATION_EXPORT countLabelsInt
( )

Counts the number of distinct integer labels in labels.

PiiMatrix< int > PII_CLASSIFICATION_EXPORT createConfusionMatrix
(
  • const QVector< double > & knownLabels
  • const QVector< double > & hypothesis
)

Create a confusion matrix.

void PII_CLASSIFICATION_EXPORT createDartBoard
(
  • int samples1
  • int samples2
  • PiiMatrix< double > & samples
  • QVector< double > & labels
)

Creates a non-linearly separable binary sample set so that the samples in class one are surrounded by those in the other one.

void PII_CLASSIFICATION_EXPORT createDoubleSpiral
(
  • int samplesPerSet
  • double rounds
  • PiiMatrix< double > & samples
  • QVector< double > & labels
)

Creates a non-linearly separable sample set in which two classes spiral around each other on a plane.

template<class SampleSet>
SampleSet createRandomSampleSet
(
  • int samples
  • int features
  • double minimum
  • double maximum
)

Create a random sample set.

template<class T, class DistanceMeasure>
PiiMatrix< int > fillMissingLabels
( )

Go through the row matrix labels and replace each -1 with the label of the closest code vector in codeBook.

template<class SampleSet, class DistanceMeasure>
int findClosestMatch
(
  • typename PiiSampleSet::Traits< SampleSet >::ConstFeatureIterator sample
  • const SampleSet & modelSet
  • const DistanceMeasure & measure
  • double * distance = 0
)

Find the closest match for sample in modelSet.

template<class SampleSet, class DistanceMeasure>
MatchList findClosestMatches
(
  • typename PiiSampleSet::Traits< SampleSet >::ConstFeatureIterator sample
  • const SampleSet & modelSet
  • const DistanceMeasure & measure
  • int n
)

Find the n closest matches for sample in modelSet.

template<class SampleSet, class DistanceMeasure>
SampleSet kMeans
(
  • const SampleSet & samples
  • unsigned int k
  • const DistanceMeasure & measure
  • unsigned int maxIterations = 0
)

K-means clustering algorithm.

template<class SampleSet, class DistanceMeasure>
double knnClassify
(
  • typename PiiSampleSet::Traits< SampleSet >::ConstFeatureIterator sample
  • const SampleSet & modelSet
  • const QVector< double > & labels
  • const DistanceMeasure & measure
  • int k
  • double * distance = 0
  • int * closestIndex = 0
)

Classify a sample using the k nearest neighbors rule.

Q_DECLARE_FLAGS
( )

Q_DECLARE_OPERATORS_FOR_FLAGS
(
  • LearnerCapabilities
)

PII_CLASSIFICATION_EXPORT double somHexagonalDistance
(
  • int bx
  • int by
  • int tx
  • int ty
)

Calculate the squared distance between two nodes in a SOM with a hexagonal topology.

PII_CLASSIFICATION_EXPORT double somSquareDistance
(
  • int bx
  • int by
  • int tx
  • int ty
)

Calculate the squared distance between two nodes in a SOM with a square topology.

Enumeration details

  • enum BoostingAlgorithm

    Implemented boosting algorithms.

    • AdaBoost - the original (discrete) AdaBoost as introduced by Schapire & Freund. This algorithm has mostly historical value; use RealBoost instead.

    • RealBoost - AdaBoost with confidence-rated predictions (a.k.a. RealAdaBoost). Usually more accurate than AdaBoost.

    • FloatBoost - RealBoost supplemented by ideas from floating search methods (after Stan Z. Li et al.).

    • SammeBoost - Stagewise Adaptive Modeling using a Multi-class Exponential loss function (after Ji Zhu et al.). A multi-class generalization of AdaBoost.

  • enum DistanceCombinationMode

    Different ways of combining sub-vector distances in PiiMultiFeatureDistance.

    • DistanceSum - sub-vector distances are summed up

    • DistanceProduct - sub-vector distances are multiplied by each other

    • DistanceMin - the minimum sub-vector distance is returned

    • DistanceMax - the maximum sub-vector distance is returned
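    The four combination modes can be sketched as a small reduction over already computed sub-vector distances. This is a standalone illustration, not the code of PiiMultiFeatureDistance: the enumeration is redeclared locally, and combineDistances is a hypothetical helper that assumes a non-empty input.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Local copy of the enumeration, for illustration only.
enum DistanceCombinationMode { DistanceSum, DistanceProduct, DistanceMin, DistanceMax };

// Hypothetical helper: combines a non-empty list of sub-vector distances.
inline double combineDistances(const std::vector<double>& distances,
                               DistanceCombinationMode mode)
{
  double result = distances[0];
  for (std::size_t i = 1; i < distances.size(); ++i)
    {
      switch (mode)
        {
        case DistanceSum:     result += distances[i]; break;
        case DistanceProduct: result *= distances[i]; break;
        case DistanceMin:     result = std::min(result, distances[i]); break;
        case DistanceMax:     result = std::max(result, distances[i]); break;
        }
    }
  return result;
}
```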

  • enum FullBufferBehavior

    Possible actions when a sample buffer is full.

    • OverwriteRandomSample - the sample to be overwritten will be picked at random

    • OverwriteOldestSample - the oldest sample currently in the buffer will be overwritten.

    • DiscardNewSample - perform no action. Once the buffer is full, new samples will no longer be buffered.

  • enum LearnerCapability

    Learning algorithm capabilities.

    • NonSupervisedLearner - the algorithm can be trained with no a priori knowledge of sample labels.

    • OnlineLearner - the classifier is capable of learning on-line, one sample at a time.

    • WeightedLearner - the classifier is able to learn weighted samples.
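    Because the enumerators are distinct bits (0x1, 0x2, 0x4), capabilities can be combined with bitwise OR and queried with bitwise AND. The sketch below uses plain integers rather than the Qt flags type; the enumeration is redeclared locally and hasCapability is a hypothetical helper.

```cpp
// Local copy of the enumeration; values are taken from the list above.
enum LearnerCapability
{
  NonSupervisedLearner = 0x1,
  OnlineLearner = 0x2,
  WeightedLearner = 0x4
};

// Hypothetical helper: true if every bit of `wanted` is set in `capabilities`.
inline bool hasCapability(int capabilities, LearnerCapability wanted)
{
  return (capabilities & wanted) == wanted;
}
```

    For example, a learner that reports OnlineLearner | WeightedLearner supports on-line training with sample weights but still needs labels.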

  • enum SomInitMode

    Initialization modes for a SOM code book.

    • SomRandomInit - initialize the code book randomly. The limits of the random values are taken from the first incoming feature vector.

    • SomSampleInit - initialize the code book by selecting incoming samples as initial code vectors. In on-line learning, the first w*h samples will be used (w and h denote SOM width and height). In batch learning, initial code vectors will be randomly selected from the training samples.

  • enum SomLearningAlgorithm

    Learning algorithms for training a SOM.

    • SomSequentialAlgorithm - the traditional sequential learning algorithm. Monotonically decreasing learning constant and neighborhood size.

    • SomBalancedAlgorithm - the balanced SOM algorithm. Each input sample is weighted based on its disparity. This algorithm better captures small clusters in the input space while maintaining the topographic properties of the original SOM algorithm.

    • SomQErrAlgorithm - a modification of the "parameterless" SOM algorithm. Each input sample is weighted based on its quantization error. This algorithm is the most "elastic" of the three. It tries to cover the whole input space independent of data density.

  • enum SomNeighborhood

    Different types of SOM neighborhoods.

    When updating nodes in a SOM, the amount of vector movement is determined by the neighborhood function.

    • SomBubble - each node within the current radius is updated with a weight of one. Others are not updated.

    • SomGaussian - the neighbors are weighted according to a Gaussian function that decreases with distance.

    • SomCutGaussian - the neighbors are weighted according to a Gaussian function that decreases with distance, if they fall within the radius. This is practically a combination of the two other modes.
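    The difference between the three neighborhoods can be illustrated with a weight function of the squared grid distance. This is only a sketch: neighborhoodWeight is a hypothetical helper, and the Gaussian width (sigma equal to the radius) is an assumption, not something this page specifies.

```cpp
#include <cmath>

// Local copy of the enumeration, for illustration only.
enum SomNeighborhood { SomBubble, SomGaussian, SomCutGaussian };

// Hypothetical helper. `radius` must be positive; sigma = radius is assumed.
inline double neighborhoodWeight(SomNeighborhood type,
                                 double squaredDistance,
                                 double radius)
{
  const bool inside = squaredDistance <= radius * radius;
  const double gaussian = std::exp(-squaredDistance / (2.0 * radius * radius));
  switch (type)
    {
    case SomBubble:      return inside ? 1.0 : 0.0;      // hard cut-off
    case SomGaussian:    return gaussian;                 // never exactly zero
    case SomCutGaussian: return inside ? gaussian : 0.0;  // Gaussian inside radius
    }
  return 0.0;
}
```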

  • enum SomRateFunction

    SOM learning rate functions.

    • SomLinearAlpha - learning rate decreases linearly

    • SomInverseAlpha - learning rate is inversely proportional to the training iteration index

  • enum SomTopology

    Different topology types for the arrangement of neighboring nodes in a SOM.

    • SomHexagonal - each node has six closest neighbors at a distance of one

    • SomSquare - each node has four closest neighbors at a distance of one

    The following picture illustrates the arrangement of neighbors with different topologies. With hexagonal arrangement, distance to the six closest neighbors is one. With squares, the corners have a distance of sqrt(2).
          ___        ___ ___ ___
      ___/   \___   |   |   |   |
     /   \___/   \  |___|___|___|
     \___/   \___/  |   |   |   |
     /   \___/   \  |___|___|___|
     \___/   \___/  |   |   |   |
         \___/      |___|___|___|
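
    The two arrangements can be sketched as squared grid distances. The function names below are hypothetical, and the hexagonal layout (odd rows shifted right by half a cell, rows sqrt(3)/2 apart) is an assumption; the library's somHexagonalDistance may use a different but equivalent convention.

```cpp
#include <cstdlib>

// Square grid: plain squared Euclidean distance between (bx,by) and (tx,ty).
inline double squareGridDistance2(int bx, int by, int tx, int ty)
{
  const double dx = bx - tx, dy = by - ty;
  return dx * dx + dy * dy;
}

// Hexagonal grid (assumed layout): odd rows are shifted right by 0.5 and
// rows are sqrt(3)/2 apart, so the squared row offset is scaled by 0.75.
inline double hexGridDistance2(int bx, int by, int tx, int ty)
{
  const double bShift = (std::abs(by) % 2) ? 0.5 : 0.0;
  const double tShift = (std::abs(ty) % 2) ? 0.5 : 0.0;
  const double dx = (bx + bShift) - (tx + tShift);
  const double dy = by - ty;
  return dx * dx + 0.75 * dy * dy;
}
```

    With this layout, the six nodes around (0,0) all lie at a squared distance of one, while the diagonal neighbor on a square grid lies at a squared distance of two.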
    

Function details

  • template<class FeatureIterator, class ConstFeatureIterator>

    void adaptVector

    (
    • FeatureIterator code
    • ConstFeatureIterator sample
    • int length
    • double alpha
    )

    #include <PiiClassification.h>

    Adapt a code vector towards sample with the given strength alpha.

    The code vector will be modified in place. The function will calculate the weighted average of code vector C and sample S as $C \leftarrow (1 - \alpha) C + \alpha S$.

    Parameters
    code

    an iterator to the beginning of the code vector.

    sample

    an iterator to the beginning of the vector to tune the code vector towards.

    length
    alpha

    the strength of the tuning. 0 means no change, 1 means that code vector will be replaced with sample.

  • template<class SampleSet, class DistanceMeasure>

    PiiMatrix< double > calculateDistanceMatrix

    (
    • const SampleSet & samples
    • const DistanceMeasure & measure
    • bool symmetric = true
    • bool calculateDiagonal = false
    )

    #include <PiiClassification.h>

    Generate a distance matrix.

    Let us denote the number of sample vectors (rows in samples) by N. The size of the distance matrix is N-by-N, and each element (r,c) stores the distance between sample r and sample c, calculated with measure(samples[r], samples[c]). Note that since distance measures and kernels share the same interface, this function can be used to calculate a kernel matrix as well.

    Parameters
    samples
    measure
    symmetric

    if true, the upper triangle will be filled by copying the lower triangle.

    calculateDiagonal

    if true, each vector's distance to itself will also be calculated.

    Returns

    the pairwise distances between input vectors as a matrix.
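
    The roles of symmetric and calculateDiagonal can be sketched with standard containers. The Matrix type, SquaredEuclidean and distanceMatrixSketch are illustrative names, and evaluating the upper triangle separately when symmetric is false is an assumption about the behavior.

```cpp
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix; // one sample per row

// Example measure: squared Euclidean distance between two rows.
struct SquaredEuclidean
{
  double operator()(const std::vector<double>& a,
                    const std::vector<double>& b) const
  {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
      sum += (a[i] - b[i]) * (a[i] - b[i]);
    return sum;
  }
};

template <class DistanceMeasure>
Matrix distanceMatrixSketch(const Matrix& samples,
                            const DistanceMeasure& measure,
                            bool symmetric = true,
                            bool calculateDiagonal = false)
{
  const std::size_t n = samples.size();
  Matrix result(n, std::vector<double>(n, 0.0));
  for (std::size_t r = 0; r < n; ++r)
    {
      for (std::size_t c = 0; c < r; ++c)
        {
          result[r][c] = measure(samples[r], samples[c]);
          // Copy the lower triangle, or evaluate the measure again if it
          // is not symmetric (assumed behavior).
          result[c][r] = symmetric ? result[r][c] : measure(samples[c], samples[r]);
        }
      if (calculateDiagonal)
        result[r][r] = measure(samples[r], samples[r]);
    }
  return result;
}
```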

  • double PII_CLASSIFICATION_EXPORT calculateError

    (
    • const QVector< double > & knownLabels
    • const QVector< double > & hypothesis
    • const QVector< double > & weights = QVector< double >()
    )

    #include <PiiClassification.h>

    Calculates classification error.

    This function returns the ratio of misclassified samples.

    Parameters
    knownLabels

    the ground truth. N labels.

    hypothesis

    the classification result. N labels. If a hypothesis is NaN, it will be ignored.

    weights

    a weight for each sample. The weights should sum up to one. Can be omitted.

    Returns

    the (weighted) classification error, in [0,1].
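
    A minimal sketch of the error calculation with standard containers follows. Two points are assumptions rather than documented behavior: samples with a NaN hypothesis are left out of both the numerator and the denominator, and an empty weight vector means uniform weights.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative stand-in for the documented function.
double classificationErrorSketch(const std::vector<double>& knownLabels,
                                 const std::vector<double>& hypothesis,
                                 const std::vector<double>& weights = std::vector<double>())
{
  double wrong = 0.0, total = 0.0;
  for (std::size_t i = 0; i < knownLabels.size(); ++i)
    {
      if (std::isnan(hypothesis[i]))
        continue; // ignored sample (assumed to be excluded entirely)
      const double w = weights.empty() ? 1.0 : weights[i];
      total += w;
      if (hypothesis[i] != knownLabels[i])
        wrong += w;
    }
  return total > 0.0 ? wrong / total : 0.0;
}
```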

  • QList< QPair< double, int > > PII_CLASSIFICATION_EXPORT countLabels

    ( )

    #include <PiiClassification.h>

    Counts the number of distinct labels in labels.

    Returns the found labels as a list of pairs storing the class label (pair.first) and the number of entries (pair.second).

     QVector<double> labels = QVector<double>() << 0 << 1 << 2 << 1 << 4 << 0;
     QList<QPair<double,int> > counts = PiiClassification::countLabels(labels);
     // counts = ((0.0, 2), (1.0, 2), (2.0, 1), (4.0, 1))
    

    The label list must not contain NaNs.

  • QVector< int > PII_CLASSIFICATION_EXPORT countLabelsInt

    ( )

    #include <PiiClassification.h>

    Counts the number of distinct integer labels in labels.

    This function ignores the decimal part of the class labels. The nth element in the returned list contains the number of labels whose value (truncated to an integer) equals n. All negative labels will be collected to the zero bin in the returned histogram.

     QVector<double> labels = QVector<double>() << 0.9 << 1.1 << 2.5 << 1.3 << 4.05 << 0.01;
     QVector<int> counts = PiiClassification::countLabelsInt(labels);
     // counts = (2, 2, 1, 0, 1)
    

    The label list must not contain NaNs.
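
    The binning rule described above (truncate to an integer, collect negative labels into bin zero) can be sketched with standard containers. countLabelsIntSketch is an illustrative stand-in, not the library function.

```cpp
#include <cstddef>
#include <vector>

std::vector<int> countLabelsIntSketch(const std::vector<double>& labels)
{
  std::vector<int> counts;
  for (std::size_t i = 0; i < labels.size(); ++i)
    {
      // Truncate towards zero; negative labels go to bin 0.
      const std::size_t bin =
        labels[i] > 0.0 ? static_cast<std::size_t>(labels[i]) : 0;
      if (bin >= counts.size())
        counts.resize(bin + 1, 0); // grow the histogram on demand
      ++counts[bin];
    }
  return counts;
}
```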

  • PiiMatrix< int > PII_CLASSIFICATION_EXPORT createConfusionMatrix

    (
    • const QVector< double > & knownLabels
    • const QVector< double > & hypothesis
    )

    #include <PiiClassification.h>

    Create a confusion matrix.

    Parameters
    knownLabels

    the ground truth. N known class indices for the samples.

    hypothesis

    N class labels produced by a classifier. If any of the hypotheses is -1, an extra "discard" class will be added as the last column of the returned matrix.

    Returns

    a matrix in which row indices correspond to the ground truth and column indices to the hypotheses. The values are hit counts. For example, if the value at (1,2) is 9, nine samples of class one were incorrectly classified as class two.
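
    The layout of the returned matrix, including the extra "discard" column, can be sketched as follows. confusionMatrixSketch is an illustrative stand-in; it assumes both vectors have the same length, that known labels are non-negative class indices, and that the number of classes is the largest label plus one.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<int> > IntMatrix;

IntMatrix confusionMatrixSketch(const std::vector<double>& knownLabels,
                                const std::vector<double>& hypothesis)
{
  int classes = 0;
  bool discarded = false;
  for (std::size_t i = 0; i < knownLabels.size(); ++i)
    {
      classes = std::max(classes, static_cast<int>(knownLabels[i]) + 1);
      if (hypothesis[i] == -1)
        discarded = true;
      else
        classes = std::max(classes, static_cast<int>(hypothesis[i]) + 1);
    }
  // One row per true class; one column per class plus an optional
  // "discard" column at the end.
  IntMatrix result(classes, std::vector<int>(classes + (discarded ? 1 : 0), 0));
  for (std::size_t i = 0; i < knownLabels.size(); ++i)
    {
      const int row = static_cast<int>(knownLabels[i]);
      const int column =
        hypothesis[i] == -1 ? classes : static_cast<int>(hypothesis[i]);
      ++result[row][column];
    }
  return result;
}
```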

  • void PII_CLASSIFICATION_EXPORT createDartBoard

    (
    • int samples1
    • int samples2
    • PiiMatrix< double > & samples
    • QVector< double > & labels
    )

    #include <PiiClassification.h>

    Creates a non-linearly separable binary sample set so that the samples in class one are surrounded by those in the other one.

    [Figure: example output with samples1 = samples2 = 200. Samples with label 0 are shown in red, and samples with label 1 in blue.]

    Parameters
    samples1

    the number of samples in the set at the center.

    samples2

    the number of samples in the surrounding set.

    samples

    this matrix will be filled with the two-dimensional feature vectors. The first samples1 rows will represent class 0 and the rest class 1. The size of the matrix will be samples1 + samples2 by 2.

    labels

    this vector will be filled with the corresponding class labels (0 for the samples1 entries, 1 for the rest).

  • void PII_CLASSIFICATION_EXPORT createDoubleSpiral

    (
    • int samplesPerSet
    • double rounds
    • PiiMatrix< double > & samples
    • QVector< double > & labels
    )

    #include <PiiClassification.h>

    Creates a non-linearly separable sample set in which two classes spiral around each other on a plane.

    [Figure: example output with samplesPerSet = 2000 and rounds = 3.0. Samples with label 0 are shown in red, and samples with label 1 in blue.]

    Parameters
    samplesPerSet

    the number of samples in each of the two sets

    rounds

    the number of times the spirals will turn around the origin.

    samples

    this matrix will be filled with the two-dimensional feature vectors. The first samplesPerSet rows will represent class 0 and the rest class 1. The size of the matrix will be 2 * samplesPerSet by 2.

    labels

    this vector will be filled with the corresponding class labels (0 for the first half, 1 for the rest).

  • template<class SampleSet>

    SampleSet createRandomSampleSet

    (
    • int samples
    • int features
    • double minimum
    • double maximum
    )

    #include <PiiClassification.h>

    Create a random sample set.

    Each element in the returned sample set is a random number uniformly distributed in the range [minimum, maximum].

     using namespace PiiClassification;
     PiiMatrix<double> matSamples = createRandomSampleSet<PiiMatrix<double> >(10, 16, -1, 1);
    
    Parameters
    samples

    the number of samples to create

    features

    the number of columns (i.e. the length of the feature vector)

    minimum

    smallest possible feature value

    maximum

    largest possible feature value

  • template<class T, class DistanceMeasure>

    PiiMatrix< int > fillMissingLabels

    ( )

    #include <PiiClassification.h>

    Go through the row matrix labels and replace each -1 with the label of the closest code vector in codeBook.

    Parameters
    labels

    labels for the vectors in codeBook. Labels with no associated code vector will not be changed. The label matrix may be either a column or a row matrix.

    codeBook

    code vectors. The number of rows in this matrix should be greater than or equal to the number of columns in labels.

    measure

    the measure used for distance estimation.

    Returns

    the new labels

  • template<class SampleSet, class DistanceMeasure>

    int findClosestMatch

    (
    • typename PiiSampleSet::Traits< SampleSet >::ConstFeatureIterator sample
    • const SampleSet & modelSet
    • const DistanceMeasure & measure
    • double * distance = 0
    )

    #include <PiiClassification.h>

    Find the closest match for sample in modelSet.

     PiiSquaredGeometricDistance<const float*> dist;
     PiiMatrix<float> matSamples(50,2); // each row is a feature vector
     PiiMatrix<float> matObserved(1,2); // observed sample
     // Index of the row in matSamples that is closest to the observed sample
     int iMatch = PiiClassification::findClosestMatch(matObserved[0],
                                                      matSamples,
                                                      dist);
    
    Parameters
    sample

    an iterator to the beginning of feature data. Must be valid through modelSet.featureCount() elements.

    modelSet

    the model samples to compare sample against.

    measure

    the distance measure used to calculate the difference between sample and each model.

    distance

    an optional output-value parameter that will store the distance to the closest model sample.

    Returns

    the index of the closest model sample, or -1 if modelSet is empty.

  • template<class SampleSet, class DistanceMeasure>

    MatchList findClosestMatches

    (
    • typename PiiSampleSet::Traits< SampleSet >::ConstFeatureIterator sample
    • const SampleSet & modelSet
    • const DistanceMeasure & measure
    • int n
    )

    #include <PiiClassification.h>

    Find the n closest matches for sample in modelSet.

     PiiSquaredGeometricDistance<const float*> dist;
     PiiMatrix<float> matSamples(50,2); // each row is a feature vector
     PiiMatrix<float> matObserved(1,2); // observed sample
     // The five rows of matSamples closest to the observed sample
     PiiClassification::MatchList lstMatches =
       PiiClassification::findClosestMatches(matObserved[0],
                                             matSamples,
                                             dist,
                                             5);
    
    Parameters
    sample

    an iterator to the beginning of feature data. Must be valid through modelSet.featureCount() elements.

    modelSet

    the model samples to compare sample against.

    measure

    the distance measure used to calculate the difference between sample and each model.

    n

    the number of closest matches to return.

    Returns

    a list of matches. Each element in the returned list contains the distance to a model sample and its index in the sample set. The list is sorted in ascending order according to the distance. The length of the list is the minimum of n and the number of samples in modelSet.

  • template<class SampleSet, class DistanceMeasure>

    SampleSet kMeans

    (
    • const SampleSet & samples
    • unsigned int k
    • const DistanceMeasure & measure
    • unsigned int maxIterations = 0
    )

    #include <PiiClassification.h>

    K-means clustering algorithm.

    The k-means algorithm clusters n objects into k partitions, k < n, based on their attributes. It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that they both attempt to find the centers of natural clusters in the data. It assumes that the object attributes form a vector space. The objective it tries to achieve is to minimize total intra-cluster variance, or, the squared error function

    $V = \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$

    where there are k clusters $S_i$, $i = 1, 2, \ldots, k$, and $\mu_i$ is the centroid or mean point of all the points $x_j \in S_i$. This implementation uses an iterative refinement heuristic known as Lloyd's algorithm to solve the optimization problem.

    Parameters
    samples

    a set of feature vectors to run the algorithm on. Each row of this matrix represents a feature vector. The number of samples must be greater than k.

    k

    the number of centroids

    measure

    a measure used to calculate the distance between samples and centroids.

    maxIterations

    if this value is zero (the default), the algorithm will be run until convergence. To stop earlier, set this to a positive value.

    Returns

    the centroids
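
    Lloyd's algorithm alternates between assigning each sample to its closest centroid and moving each centroid to the mean of its samples. The sketch below uses standard containers and is not the library implementation; in particular, taking the first k samples as initial centroids is an assumption, and it requires 0 < k <= number of samples.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<std::vector<double> > Samples; // one sample per row

template <class DistanceMeasure>
Samples kMeansSketch(const Samples& samples, unsigned int k,
                     const DistanceMeasure& measure,
                     unsigned int maxIterations = 0)
{
  Samples centroids(samples.begin(), samples.begin() + k); // assumed init
  std::vector<unsigned int> assignment(samples.size(), k); // k = unassigned
  for (unsigned int iteration = 0;
       maxIterations == 0 || iteration < maxIterations; ++iteration)
    {
      // Assignment step: attach each sample to its closest centroid.
      bool changed = false;
      for (std::size_t i = 0; i < samples.size(); ++i)
        {
          unsigned int best = 0;
          double bestDistance = std::numeric_limits<double>::max();
          for (unsigned int c = 0; c < k; ++c)
            {
              const double d = measure(samples[i], centroids[c]);
              if (d < bestDistance) { bestDistance = d; best = c; }
            }
          if (assignment[i] != best) { assignment[i] = best; changed = true; }
        }
      if (!changed)
        break; // converged: no sample switched clusters

      // Update step: move each centroid to the mean of its samples.
      Samples sums(k, std::vector<double>(samples[0].size(), 0.0));
      std::vector<int> counts(k, 0);
      for (std::size_t i = 0; i < samples.size(); ++i)
        {
          ++counts[assignment[i]];
          for (std::size_t f = 0; f < samples[i].size(); ++f)
            sums[assignment[i]][f] += samples[i][f];
        }
      for (unsigned int c = 0; c < k; ++c)
        if (counts[c] > 0) // an empty cluster keeps its old centroid
          for (std::size_t f = 0; f < sums[c].size(); ++f)
            centroids[c][f] = sums[c][f] / counts[c];
    }
  return centroids;
}
```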

  • template<class SampleSet, class DistanceMeasure>

    double knnClassify

    (
    • typename PiiSampleSet::Traits< SampleSet >::ConstFeatureIterator sample
    • const SampleSet & modelSet
    • const QVector< double > & labels
    • const DistanceMeasure & measure
    • int k
    • double * distance = 0
    • int * closestIndex = 0
    )

    #include <PiiClassification.h>

    Classify a sample using the k nearest neighbors rule.

    This function compares sample to each model in modelSet, to find the k closest ones. Then, it uses labels to find out the class label that has the most occurrences within the k closest models. In the case of a tie, the class with the closest neighbor wins.

    Parameters
    sample

    an iterator to the beginning of feature data. Must be valid through modelSet.featureCount() elements.

    modelSet

    the model samples to compare sample against.

    labels

    a label for each sample in modelSet. The length of this list must match the number of samples in modelSet.

    measure

    the distance measure used to calculate the difference between sample and each model.

    k

    the number of nearest neighbors to consider.

    distance

    an optional output value that, if non-zero, will store the distance to the closest sample representing the winning class.

    closestIndex

    an optional output value that, if non-zero, will store the index of the closest model sample of the winning class. Note the closest sample in the winning class may not be the closest of all samples.

    Returns

    the class label with the most representatives among the k nearest neighbors of sample.

  • Q_DECLARE_FLAGS

    ( )

    #include <PiiClassificationGlobal.h>

  • Q_DECLARE_OPERATORS_FOR_FLAGS

    (
    • LearnerCapabilities
    )

    #include <PiiClassificationGlobal.h>

  • PII_CLASSIFICATION_EXPORT double somHexagonalDistance

    (
    • int bx
    • int by
    • int tx
    • int ty
    )

    #include <PiiSom.h>

    Calculate the squared distance between two nodes in a SOM with a hexagonal topology.

  • PII_CLASSIFICATION_EXPORT double somSquareDistance

    (
    • int bx
    • int by
    • int tx
    • int ty
    )

    #include <PiiSom.h>

    Calculate the squared distance between two nodes in a SOM with a square topology.
