classifier module

class bregman.classifier.Classifier(num_classes, max_iter=200, error_thresh=0.001, dist_fun='Euc')
Base class for supervised and unsupervised classifiers
class bregman.classifier.GaussianMulti(num_classes, max_iter=200, error_thresh=0.001, dist_fun='Euc')
Supervised classification using a multivariate Gaussian model per class.
Also known as a quadratic classifier (Therrien 1989).

gm = GaussianMulti(num_classes, max_iter, error_thresh, dist_fun)
    num_classes - number of classes to model (required)
    max_iter - maximum number of iterations for training [200]
    error_thresh - threshold for sum-square-differences of old/new means [.001]
    dist_fun - future parameter, to allow alternate metrics [bregman.distance.euc]
    returns a new GaussianMulti instance

training:    
  assigns = train(X, labels)
   X - numpy ndarray observation matrix, n x d, n observations, d dimensions
   labels - per-row labels for the data in X; length must equal the number of rows of X

after training:
  assigns = classify(X)
   X - numpy ndarray observation matrix, n x d, n observations, d dimensions
   returns labels for class assignments to rows in X

self.M - the trained means
self.C - the trained covariances
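The train/classify workflow above can be sketched in plain NumPy. The helper names (`train_gaussian_multi`, `classify_gaussian_multi`) are illustrative, not part of bregman; they mirror what train() and classify() are described to compute: per-class means (self.M) and covariances (self.C), then assignment of each row to the class with the highest Gaussian log-density.

```python
import numpy as np

def train_gaussian_multi(X, labels):
    """Estimate one mean and covariance per unique label (cf. self.M, self.C)."""
    classes = np.unique(labels)
    M = np.array([X[labels == k].mean(axis=0) for k in classes])
    C = np.array([np.cov(X[labels == k], rowvar=False) for k in classes])
    return classes, M, C

def classify_gaussian_multi(X, classes, M, C):
    """Assign each row of X to the class with the highest Gaussian log-density."""
    logp = np.empty((X.shape[0], len(classes)))
    for k, (m, c) in enumerate(zip(M, C)):
        diff = X - m
        cinv = np.linalg.inv(c)
        _, logdet = np.linalg.slogdet(c)
        # log N(x; m, C) up to a constant: -(Mahalanobis^2 + log|C|) / 2
        logp[:, k] = -0.5 * (np.einsum('ij,jk,ik->i', diff, cinv, diff) + logdet)
    return classes[np.argmax(logp, axis=1)]
```

On two well-separated clusters this recovers the generating labels almost exactly; the real class also normalizes by class priors and handles the supervised bookkeeping.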
classify(data, labels=None)
labels = myGM.classify(data)
classify_range(data, upper_bounds)
Classify data in ranges with the given upper_bounds.
Assignments within each range are decided by majority vote among the classes.

Returns:
    a - assignments per upper_bound region
    c - counts of assignments per class            
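A standalone sketch of the range-wise majority vote described above. The function name and the explicit num_classes argument are assumptions for illustration, not the bregman signature; it takes per-observation assignments (such as classify() returns) and aggregates them per range.

```python
import numpy as np

def majority_vote_ranges(assigns, upper_bounds, num_classes):
    """Majority vote over per-observation assignments within each range.

    assigns      - integer class assignment per observation
    upper_bounds - exclusive upper index of each range, in increasing order
    Returns (a, c): winning class per range, and per-class counts per range.
    """
    a, c = [], []
    lo = 0
    for hi in upper_bounds:
        counts = np.bincount(assigns[lo:hi], minlength=num_classes)
        c.append(counts)
        a.append(int(np.argmax(counts)))  # note: ties resolve to the lowest index here
        lo = hi
    return np.array(a), np.array(c)
```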
evaluate(data, labels)
Estimate predicted labels from data and compare with the true labels.
Returns:
    a - accuracy as a proportion: 0.0 - 1.0
evaluate_range(data, true_labels, upper_bounds)
Perform assignment aggregation within data ranges by majority vote.
The maximum count among the K classes wins per range.
In case of a tie, randomly select among tied classes.

Returns:
    a - accuracy as a proportion: 0.0 - 1.0
    e - vector of True/False per range
train(data, labels=None, reset=True)
myGM.train(data, labels)
   Supervised classification for each unique label in labels using data.
   Employs a multivariate Gaussian model per class.
   self.M - per-class Gaussian means
   self.C - per-class Gaussian covariance matrices
bregman.classifier.GaussianPDF(data, m, C)
Gaussian PDF lookup for row-wise data
data - n-dimensional observation matrix (or vector)
m    - Gaussian mean vector
C    - Gaussian covariance matrix
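The density that GaussianPDF evaluates can be written directly in NumPy. This is an illustrative re-implementation of the standard multivariate normal density, not the library code itself:

```python
import numpy as np

def gaussian_pdf(data, m, C):
    """Row-wise multivariate Gaussian density N(x; m, C).

    data - n x d observation matrix (a single d-vector is also accepted)
    m    - length-d mean vector
    C    - d x d covariance matrix
    """
    data = np.atleast_2d(data)
    d = data.shape[1]
    diff = data - m
    Cinv = np.linalg.inv(C)
    maha = np.einsum('ij,jk,ik->i', diff, Cinv, diff)  # squared Mahalanobis distance
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(C))
    return norm * np.exp(-0.5 * maha)
```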
class bregman.classifier.KMeans(num_classes, max_iter=200, error_thresh=0.001, dist_fun='Euc')
Unsupervised classification using k-means and random initialization

km = KMeans(num_classes, max_iter, error_thresh, dist_fun)
    num_classes - number of clusters to estimate
    max_iter - maximum number of iterations for training
    error_thresh - threshold for sum-square-differences of old/new means
    dist_fun - future parameter, to allow alternate metrics
    returns a new KMeans instance

training:    
  assigns = train(X)
   X - numpy ndarray observation matrix, n x d, n observations, d dimensions

after training:
  assigns = classify(X)
   X - numpy ndarray observation matrix, n x d, n observations, d dimensions

self.M - the trained means
classify(Y, labels=None)
Given a trained classifier, return the assignments to classes for matrix Y.
train(X, labels=None, reset=True)
Train the classifier using the data passed in X. 
X is a row-wise observation matrix with variates in the columns
and observations in the rows.
If reset=True (the default), the means are re-initialized from the data.
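The train/classify loop described above amounts to Lloyd's algorithm. A minimal sketch, assuming squared-Euclidean distance and initialization from randomly chosen data rows; the function name and the seed argument (added for reproducibility) are illustrative, not the bregman API:

```python
import numpy as np

def kmeans_train(X, num_classes, max_iter=200, error_thresh=0.001, seed=0):
    """Lloyd's k-means with random initialization from the data rows."""
    rng = np.random.default_rng(seed)
    M = X[rng.choice(len(X), num_classes, replace=False)]  # init means from data
    for _ in range(max_iter):
        # squared Euclidean distance of every row to every mean
        dists = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
        assigns = dists.argmin(axis=1)
        # recompute means; keep the old mean if a cluster went empty
        new_M = np.array([X[assigns == k].mean(axis=0) if np.any(assigns == k)
                          else M[k] for k in range(num_classes)])
        # stop on sum-square-difference of old/new means (error_thresh)
        if ((new_M - M) ** 2).sum() < error_thresh:
            M = new_M
            break
        M = new_M
    return M, assigns
```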
class bregman.classifier.SoftKMeans(num_classes, max_iter=200, error_thresh=0.001, beta=2.0, dist_fun='Euc')
Employ the soft k-means algorithm for unsupervised clustering

David MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003, Chapter 22

Parameters:
   beta - softness/stiffness [2.0]
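One iteration of MacKay's soft k-means assigns each point a responsibility under every mean, then updates the means as responsibility-weighted averages. The sketch below is illustrative (soft_kmeans_step is not a bregman method); it shows the role of beta: larger beta makes the responsibilities harder, and as beta goes to infinity the update approaches ordinary k-means.

```python
import numpy as np

def soft_kmeans_step(X, M, beta=2.0):
    """One soft k-means update (MacKay 2003, ch. 22).

    Responsibilities: r_nk = exp(-beta * d(x_n, m_k)) / sum_k' exp(-beta * d(x_n, m_k'))
    Means:            m_k  = sum_n r_nk x_n / sum_n r_nk
    """
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)  # squared distances
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    r = np.exp(logits)
    r /= r.sum(axis=1, keepdims=True)                        # rows sum to 1
    new_M = (r.T @ X) / r.sum(axis=0)[:, None]               # weighted means
    return r, new_M
```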
bregman.classifier.random()
random() -> x in the interval [0, 1).