This is the code that I use for my research projects.
Where can I get it?
pip install milk
if you use these tools.
Here is how to test how well you can classify some (features, labels) data, measured by cross-validation:

    import numpy as np
    import milk

    features = np.random.rand(100, 10) # 2d array of features: 100 examples of 10 features each
    labels = np.zeros(100)
    features[50:] += .5
    labels[50:] = 1
    confusion_matrix, names = milk.nfoldcrossvalidation(features, labels)
    print('Accuracy: {0}'.format(confusion_matrix.trace() / float(confusion_matrix.sum())))

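The idea behind the cross-validation call above can be sketched in plain NumPy. This is not milk's implementation: it is a minimal sketch, assuming a hypothetical nearest-centroid classifier as the learner and plain random (non-stratified) folds; the function names are illustrative only. Each example is held out exactly once, a model trained on the other folds predicts it, and the confusion matrix counts true class (rows) against predicted class (columns), so accuracy is the diagonal divided by the total.

```python
import numpy as np

def nearest_centroid_train(features, labels):
    # Hypothetical stand-in learner (not milk's): one mean vector per class
    classes = np.unique(labels)
    centroids = np.array([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_apply(model, features):
    classes, centroids = model
    # Squared Euclidean distance from every example to every centroid
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

def nfold_confusion(features, labels, nfolds=10, seed=0):
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    index = {c: i for i, c in enumerate(classes)}
    # Randomly assign each example to one of nfolds equally sized folds
    folds = rng.permutation(len(labels)) % nfolds
    cmatrix = np.zeros((len(classes), len(classes)), dtype=int)
    for f in range(nfolds):
        test = folds == f
        # Train on every fold except f, then predict the held-out fold
        model = nearest_centroid_train(features[~test], labels[~test])
        predicted = nearest_centroid_apply(model, features[test])
        for true, pred in zip(labels[test], predicted):
            cmatrix[index[true], index[pred]] += 1
    return cmatrix, classes

# Same toy data as the milk example above
rng = np.random.default_rng(1)
features = rng.random((100, 10))
labels = np.zeros(100)
features[50:] += .5
labels[50:] = 1
cmatrix, names = nfold_confusion(features, labels)
print('Accuracy: {0:.2f}'.format(cmatrix.trace() / float(cmatrix.sum())))
```

Because every example lands in exactly one test fold, the entries of the confusion matrix always sum to the number of examples.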
If you want to use a classifier, instantiate a learner object and call its train() method:

    import numpy as np
    import milk

    features = np.random.rand(100, 10)
    labels = np.zeros(100)
    features[50:] += .5
    labels[50:] = 1
    learner = milk.defaultclassifier()
    model = learner.train(features, labels)

    # Now you can use the model on new examples:
    example = np.random.rand(10)
    print(model.apply(example))
    example2 = np.random.rand(10)
    example2 += .5
    print(model.apply(example2))

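milk separates a learner (the training procedure and its settings) from the model that train() returns, and it is the model that has apply(). The sketch below imitates that split with a hypothetical nearest-centroid learner written from scratch; the class names are illustrative, are not part of milk, and this is not what milk.defaultclassifier() builds.

```python
import numpy as np

class NearestCentroidModel:
    """Hypothetical trained model: knows how to label new examples."""
    def __init__(self, classes, centroids):
        self.classes = classes
        self.centroids = centroids

    def apply(self, example):
        # Label a single 1-D example by its closest class mean
        dists = ((self.centroids - example) ** 2).sum(axis=1)
        return self.classes[dists.argmin()]

class NearestCentroidLearner:
    """Hypothetical learner: holds no trained state; train() returns a new model."""
    def train(self, features, labels):
        classes = np.unique(labels)
        centroids = np.array([features[labels == c].mean(axis=0) for c in classes])
        return NearestCentroidModel(classes, centroids)

rng = np.random.default_rng(0)
features = rng.random((100, 10))
labels = np.zeros(100)
features[50:] += .5
labels[50:] = 1
model = NearestCentroidLearner().train(features, labels)
print(model.apply(np.full(10, 0.4)))  # point near the class-0 mean
print(model.apply(np.full(10, 1.1)))  # point near the class-1 mean
```

Since the learner keeps no trained state, the same learner object can be trained repeatedly on different data (for example once per cross-validation fold) without the resulting models interfering with each other.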
API Documentation: http://packages.python.org/milk/
Mailing list: http://groups.google.com/group/milk-users
- Pythonic interface to libSVM. I took their code and wrote a wrapper by hand instead of using the auto-generated ones.
- Stepwise Discriminant Analysis for feature selection.
- K-means clustering. A simple implementation, but it works well with very large datasets.
- Models can be pickle()d and unpickle()d.
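The K-means feature listed above can be illustrated with plain Lloyd's algorithm. This is a minimal sketch, not milk's K-means implementation; the function name kmeans and its parameters are hypothetical. Distances use the expansion ||x - c||² = ||x||² - 2x·c + ||c||², which needs only an n × k array rather than an n × k × d one, the kind of choice that matters on large datasets.

```python
import numpy as np

def kmeans(features, k, max_iter=100, seed=0):
    """Hypothetical sketch of Lloyd's algorithm (not milk's code)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct examples chosen at random
    centroids = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    assignments = np.full(len(features), -1)
    for _ in range(max_iter):
        # Assignment step: squared distances via ||x||^2 - 2 x.c + ||c||^2 (n x k)
        dists = ((features ** 2).sum(axis=1)[:, None]
                 - 2 * features @ centroids.T
                 + (centroids ** 2).sum(axis=1)[None, :])
        new_assignments = dists.argmin(axis=1)
        if np.array_equal(new_assignments, assignments):
            break  # converged: no example changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = features[assignments == c]
            if len(members):  # leave a centroid in place if its cluster empties
                centroids[c] = members.mean(axis=0)
    return assignments, centroids

# Two well-separated 2-D blobs of 50 points each
rng = np.random.default_rng(2)
features = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(3, 0.1, (50, 2))])
assignments, centroids = kmeans(features, 2)
print(centroids)
```

Lloyd's algorithm only reaches a local optimum that depends on the initial centroids, so in practice it is usually run from several seeds, keeping the result with the lowest within-cluster sum of squares.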