Luis Pedro Coelho (EMBL) — Main developer of mahotas
NumPy provides basic data types (arrays, matrices).
Packages built on top of it, such as mahotas, provide the intelligence.
Original goal:
to support the Murphy Lab pipeline with Python.
Has since grown.
I'm going to use a subcellular location determination problem (nuclear vs. cytoplasmic) as an example.
from glob import glob
import numpy as np
import mahotas as mh
images = glob('nuclear/*dna.tiff') + glob('cytoplasmic/*dna.tiff')
labels = []
features = []
for im in images:
    protein = mh.imread(im.replace('dna', 'protein'))
    features.append(mh.features.haralick(protein).mean(0))
    labels.append('nuclear' in im)
features = np.array(features)
labels = np.array(labels)
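A note on the shapes in the loop above: for a 2D image, mh.features.haralick returns one row per direction (4 rows of 13 features by default), and .mean(0) averages over the directions to give a single 13-element vector per image. The sketch below only illustrates that bookkeeping. It uses random numbers as a stand-in for real Haralick output and hypothetical file names in place of the glob() results, so it runs without any image files.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical file names standing in for the glob() results.
images = ['nuclear/img0-dna.tiff', 'nuclear/img1-dna.tiff',
          'cytoplasmic/img0-dna.tiff']

features = []
labels = []
for im in images:
    # Stand-in for mh.features.haralick(protein): 4 directions x 13 features.
    per_direction = rng.random((4, 13))
    features.append(per_direction.mean(0))  # average over the 4 directions
    labels.append('nuclear' in im)          # True for nuclear, False otherwise

features = np.array(features)  # shape: (n_images, 13)
labels = np.array(labels)      # boolean vector, shape: (n_images,)
print(features.shape, labels)
```

Each row of features then describes one image, which is the layout the classifiers expect.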
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
scores = cross_val_score(
    LogisticRegression(), features, labels, cv=5)
print("Logistic regression accuracy: {:%}".format(scores.mean()))
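To see what the cross-validation call returns without needing the image data, here is a minimal sketch on made-up data. The two Gaussian classes are an assumption chosen only to give the classifier something separable; they are not the real features. It uses the sklearn.model_selection import path of current scikit-learn releases.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the (n_images, 13) feature matrix: two classes
# drawn from Gaussians with different means (made-up data).
n_per_class = 50
features = np.vstack([rng.normal(0.0, 1.0, (n_per_class, 13)),
                      rng.normal(2.0, 1.0, (n_per_class, 13))])
labels = np.array([False] * n_per_class + [True] * n_per_class)

# cv=5 yields one accuracy value per fold.
scores = cross_val_score(LogisticRegression(), features, labels, cv=5)
print(scores)
print("Mean accuracy: {:%}".format(scores.mean()))
```

The result is an array with one score per fold, which is why the slide reports scores.mean(). For a classifier, an integer cv uses stratified folds, so each fold keeps roughly the same class balance.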
import milk
cmat,_ = milk.nfoldcrossvalidation(features, labels)
acc = cmat.trace()/float(cmat.sum())
print("SVM accuracy: {:%}".format(acc))
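The accuracy computation from the confusion matrix can be checked by hand. The matrix below is made up for illustration and is not a result from the data set. The diagonal holds the correctly classified samples, so the trace divided by the total is the fraction correct, whichever axis holds the true labels.

```python
import numpy as np

# Made-up 2x2 confusion matrix for illustration only.
cmat = np.array([[40, 10],
                 [5, 45]])

correct = cmat.trace()             # 40 + 45 correctly classified samples
acc = correct / float(cmat.sum())  # divide by the total number of samples
print("Accuracy: {:%}".format(acc))
```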
This presentation is available at http://bit.ly/eubias-mahotas