class PiiSampleBalancer

#include <PiiSampleBalancer.h>

An operation that balances training sets by giving more weight to rare samples.

Inherits PiiDefaultOperation

Description

The weighting is based on the distribution of individual feature values. The balancer works in two modes: ProbabilitySelection and WeightCalculation. In the former mode, the operation either passes feature vectors to the features output or does nothing, based on the estimated weight of the sample. In the latter mode, all features will be passed, and the weight of the sample will be sent to the weight output.

The graph below illustrates sample weighting on one-dimensional Gaussian data. The (normalized) distribution of a feature value is shown in blue. Its inverse (green) is used as a weight. The red curve illustrates the effect of setting emphasis to three.

For multi-dimensional features, PiiSampleBalancer uses marginal distributions, which amounts to assuming that all features are independent. The assumption often does not hold, but it gives a reasonable approximation without huge memory requirements.
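The weighting described above can be sketched in plain C++. This is an illustration under stated assumptions, not the library's implementation: `marginalWeight` and `sampleWeight` are hypothetical names, the "inverse" of the distribution is read here as the frequency of the rarest observed bin divided by the frequency of the sample's bin (so the weight stays within 0.0-1.0), and the per-feature weights are combined by multiplication, which is what the independence assumption suggests.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One histogram of (possibly fractional) counts per feature.
using Histogram = std::vector<double>;

// Hypothetical helper: weight of one quantized feature value.
// Assumption: weight = count of the rarest observed bin / count of this bin.
double marginalWeight(const Histogram& histogram, int bin)
{
  assert(bin >= 0 && static_cast<std::size_t>(bin) < histogram.size());
  double rarest = 0.0;
  for (double count : histogram)
    if (count > 0.0 && (rarest == 0.0 || count < rarest))
      rarest = count;
  const double count = histogram[static_cast<std::size_t>(bin)];
  // Bins that have never been observed are treated as maximally rare.
  return count > 0.0 ? rarest / count : 1.0;
}

// Hypothetical helper: weight of a whole quantized feature vector.
// Marginal weights are multiplied (independence assumption) and the
// product is raised to the power given by `emphasis`.
double sampleWeight(const std::vector<Histogram>& marginals,
                    const std::vector<int>& features,
                    int emphasis)
{
  assert(marginals.size() == features.size());
  double weight = 1.0;
  for (std::size_t i = 0; i < features.size(); ++i)
    weight *= marginalWeight(marginals[i], features[i]);
  return std::pow(weight, emphasis);
}
```

With a histogram of counts {10, 40, 10}, the common middle bin gets a marginal weight of 0.25 and the rare bins get 1.0. Storing one histogram per feature keeps memory use proportional to the sum of the quantization levels rather than their product.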

Inputs

features - feature vector. Each component must be quantized to the number of quantization levels determined by levels.
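The quantization requirement on the input can be sketched as follows. This is a minimal illustration, not code from the library: `levelsForFeature` and `quantize` are hypothetical helpers, uniform binning over a known value range is an assumption, and so is the reading that a feature without an entry in the levels list falls back to defaultLevels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of quantization levels for feature `index`.
// Assumption: features without an explicit entry use `defaultLevels`.
int levelsForFeature(const std::vector<int>& levels,
                     std::size_t index,
                     int defaultLevels)
{
  return index < levels.size() ? levels[index] : defaultLevels;
}

// Hypothetical helper: map a raw value in [minValue, maxValue] to an
// integer bin index in [0, levels - 1] using uniform quantization.
int quantize(double value, double minValue, double maxValue, int levels)
{
  assert(levels >= 1 && maxValue > minValue);
  const double normalized = (value - minValue) / (maxValue - minValue);
  const int bin = static_cast<int>(normalized * levels);
  // Clamp so that out-of-range values and value == maxValue stay valid.
  return std::max(0, std::min(bin, levels - 1));
}
```

For example, with 256 levels over the range 0.0-1.0, the value 0.0 maps to bin 0 and the value 1.0 maps to bin 255.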

Outputs

features - the features. If mode is ProbabilitySelection, the features are emitted only if a generated random number is less than weight. The select output indicates whether the sample was selected or not. In WeightCalculation mode, this output always passes the incoming features.
weight - the weight of the sample, 0.0-1.0 (double). 0.0 means not selected and 1.0 means definitely selected.
select - a boolean value indicating whether the sample was randomly selected or not. In WeightCalculation mode, this output always emits true.
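The behaviour of the three outputs in the two modes can be summarized with a small sketch. It only mirrors the description above and is not taken from the library: `BalancerResult` and `decide` are hypothetical names, the `Mode` enumeration is redeclared locally for self-containment, and drawing the random number from a uniform distribution on [0, 1) is an assumption.

```cpp
#include <cassert>
#include <random>

// Local stand-in for the operation's Mode enumeration.
enum Mode { ProbabilitySelection, WeightCalculation };

// Hypothetical summary of what one processing round would produce.
struct BalancerResult
{
  bool emitFeatures; // whether the features output passes the vector
  double weight;     // estimated weight of the sample, 0.0-1.0
  bool select;       // value of the select output
};

// Hypothetical helper: decide what to emit for a sample with a given weight.
template <class Generator>
BalancerResult decide(Mode mode, double weight, Generator& generator)
{
  assert(weight >= 0.0 && weight <= 1.0);
  if (mode == WeightCalculation)
    return BalancerResult{true, weight, true}; // always pass, select is true
  // ProbabilitySelection: pass only if a random number is less than weight.
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  const bool selected = uniform(generator) < weight;
  return BalancerResult{selected, weight, selected};
}
```

Called with a `std::mt19937` generator, a weight of 0.0 is never selected in ProbabilitySelection mode, while WeightCalculation mode passes every sample regardless of its weight.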

Properties

double adaptationRatio

The speed of adaptation to changing conditions.

int defaultLevels

The default number of quantization levels.

int emphasis

By default, the operation tries to flatten out the variations in feature distribution.

int learningBatchSize

The number of features required for a reliable estimate.

QVariantList levels

A list of quantization levels for each feature value.

Mode mode

Operation mode.

Public types

enum Mode { ProbabilitySelection, WeightCalculation }

Operation modes.

Constructors and destructor

PiiSampleBalancer ()
~PiiSampleBalancer ()

Public member functions

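double adaptationRatio ()
virtual void check (bool reset)

Checks the operation for execution.

int defaultLevels ()
int emphasis ()
int learningBatchSize ()
QVariantList levels ()
Mode mode ()
void setAdaptationRatio (double adaptationRatio)
void setDefaultLevels (int defaultLevels)
void setEmphasis (int emphasis)
void setLearningBatchSize (int learningBatchSize)
void setLevels (const QVariantList & levels)
void setMode ( )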

Protected member functions

virtual void process ()

Executes one round of processing.

Property details

  • double adaptationRatio

    [read, write]

    The speed of adaptation to changing conditions.

    The operation initially assumes a uniform feature distribution. The estimate of the distribution is updated once every learningBatchSize samples. The adaptation ratio tells how much the new measurements affect the learnt model. 0 means that the initial uniform approximation will never be changed. 1 means that the new estimate will fully replace the old one. The default value is 0.1.

  • int defaultLevels

    [read, write]

    The default number of quantization levels.

    This value is used for all features whose quantization levels have not been explicitly set by levels. The default value is 256.

  • int emphasis

    [read, write]

    By default, the operation tries to flatten out the variations in feature distribution.

    If the common samples need to be given even less weight, emphasis can be set to a larger value. The operation will raise the weight estimate to this power.

  • int learningBatchSize

    [read, write]

    The number of features required for a reliable estimate.

    The estimate is updated every learningBatchSize samples. The default value is 25600, which corresponds to 100 samples per histogram bin at the default 256 quantization levels.

  • QVariantList levels

    [read, write]

    A list of quantization levels for each feature value.

    For three-dimensional feature vectors, the default can be changed as follows:

     balancer->setProperty("levels", QVariantList() << 128 << 256 << 64);
    

    The minimum number of quantization levels is one.

  • Mode mode

    [read, write]

    Operation mode.
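The interplay of adaptationRatio and learningBatchSize described above can be sketched as a blending step applied once per batch. This is an assumption-laden illustration rather than the library's code: `uniformModel` and `adapt` are hypothetical names, and linear blending is only one update rule that matches the documented end points (0 keeps the old model, 1 replaces it).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the initial model, a uniform distribution over
// `levels` quantization levels.
std::vector<double> uniformModel(int levels)
{
  assert(levels >= 1);
  return std::vector<double>(static_cast<std::size_t>(levels), 1.0 / levels);
}

// Hypothetical helper: blend the distribution measured from the latest
// batch of learningBatchSize samples into the learnt model.
// Assumption: model = (1 - ratio) * model + ratio * batchEstimate.
void adapt(std::vector<double>& model,
           const std::vector<double>& batchEstimate,
           double adaptationRatio)
{
  assert(model.size() == batchEstimate.size());
  assert(adaptationRatio >= 0.0 && adaptationRatio <= 1.0);
  for (std::size_t i = 0; i < model.size(); ++i)
    model[i] = (1.0 - adaptationRatio) * model[i]
             + adaptationRatio * batchEstimate[i];
}
```

Under this rule, a small ratio such as the default 0.1 makes the model follow slow drifts in the feature distribution while smoothing out batch-to-batch noise.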

Enumeration details

  • enum Mode

    Operation modes.

    • ProbabilitySelection - pass those feature vectors that are likely to be important with a higher probability than the others.

    • WeightCalculation - pass every incoming vector, accompanied by its selection probability.

Function details

  • PiiSampleBalancer

    ()
  • ~PiiSampleBalancer

    ()
  • double adaptationRatio

    ()
  • virtual void check

    (
    • bool reset
    )
    [virtual]

    Checks the operation for execution.

    This function creates a suitable flow controller by calling createFlowController(). It then sets the flow controller to the active processor and sets the processor as the input controller for all inputs.

    If you change socket groupings in your overridden implementation, please call PiiDefaultOperation::check() after that. Otherwise, your new groupings will not be in effect.

    Reimplemented from PiiDefaultOperation.

  • int defaultLevels

    ()
  • int emphasis

    ()
  • int learningBatchSize

    ()
  • QVariantList levels

    ()
  • Mode mode

    ()
  • void setAdaptationRatio

    (
    • double adaptationRatio
    )
  • void setDefaultLevels

    (
    • int defaultLevels
    )
  • void setEmphasis

    (
    • int emphasis
    )
  • void setLearningBatchSize

    (
    • int learningBatchSize
    )
  • void setLevels

    (
    • const QVariantList & levels
    )
  • void setMode

    ( )
  • virtual void process

    ()
    [protected, virtual]

    Executes one round of processing.

    This function is invoked by the processor if the necessary preconditions for a new processing round are met. This function does all the necessary calculations to create output objects and sends them to output sockets.

    Calls to process(), syncEvent(), and setProperty() are synchronized and cannot occur simultaneously. PiiDefaultOperation ensures this by locking processLock() for reading before calling process().

    Note: With time-consuming operations, one should occasionally check that the operation hasn't been interrupted, i.e. that state() returns Running.

    Exceptions
    PiiExecutionException - whenever an unrecoverable error occurs during a processing round, the operation is interrupted, or it finishes execution due to end of input data.

    Reimplemented from PiiDefaultOperation.
