Illustration of cross validation modes

This is a companion to the paper

Luis Pedro Coelho, Joshua D. Kangas, Armaghan Naik, Elvira Osuna-Highley, Estelle Glory-Afshar, Margaret Fuhrman, Ramanuja Simha, Peter B. Berget, Jonathan W. Jarvik, and Robert F. Murphy. "Determining the subcellular location of new proteins from microscope images using local features." Bioinformatics, 2013. [DOI]

The paper concerns the automatic determination of subcellular location from images. To evaluate such a system, one uses cross validation, repeatedly splitting the data into training and testing subsets.
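A minimal sketch of plain k-fold cross validation follows. It assumes the data are simply indexed 0 to n-1 and treats every item independently. The function name `k_fold_splits`, the seeded shuffle and the round-robin fold assignment are illustrative choices, not the paper's code.

```python
import random

def k_fold_splits(n_items, k, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross validation.

    Illustrative sketch: every index lands in exactly one test fold, and the
    remaining k - 1 folds form the training set for that round.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)

if __name__ == "__main__":
    for train, test in k_fold_splits(10, k=5):
        print("train:", train, "test:", test)
```

Each of the five rounds tests on two items and trains on the other eight, and across all rounds every item is tested exactly once.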

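The animation below illustrates the difference between cross validation "per image" and "per protein". In per-image cross validation, images are assigned to folds individually, so images of the same protein can end up in both the training and the testing sets. In per-protein cross validation, all images of a protein are kept in the same fold, so each test fold contains only proteins that were absent from training. The distinction only matters when there are multiple proteins (or other markers) for each location of interest and multiple images of each marker.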

Consider the simple problem of distinguishing nuclear from Golgi patterns. The binary setting is easier to demonstrate, but the same logic applies when there are more locations.
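The nuclear vs. Golgi example can be sketched in code. The data below are hypothetical: three made-up proteins per location (N1 to N3, G1 to G3) with four images each. The helper names `per_image_folds`, `per_protein_folds` and `leaked_proteins` are also illustrative, not from the paper. The sketch assigns folds round-robin with no shuffling so the result is easy to follow, and it reports which proteins appear on both sides of each train/test split.

```python
from collections import defaultdict

# Hypothetical toy data: three proteins per location, four images per protein.
# Each image is a tuple (image_id, protein, location).
LOCATIONS = {"nuclear": ["N1", "N2", "N3"], "golgi": ["G1", "G2", "G3"]}
IMAGES = [
    (f"{protein}-img{i}", protein, location)
    for location, proteins in LOCATIONS.items()
    for protein in proteins
    for i in range(4)
]

def per_image_folds(images, k):
    """Assign images to folds one by one, ignoring which protein they show."""
    folds = defaultdict(list)
    for n, image in enumerate(images):
        folds[n % k].append(image)
    return [folds[i] for i in range(k)]

def per_protein_folds(images, k):
    """Assign whole proteins to folds, so all images of a protein share a fold."""
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    proteins = list(dict.fromkeys(protein for _, protein, _ in images))
    fold_of = {protein: n % k for n, protein in enumerate(proteins)}
    folds = defaultdict(list)
    for image in images:
        folds[fold_of[image[1]]].append(image)
    return [folds[i] for i in range(k)]

def leaked_proteins(folds, test_index):
    """Proteins that appear in both the test fold and the training folds."""
    test = {protein for _, protein, _ in folds[test_index]}
    train = {protein
             for i, fold in enumerate(folds) if i != test_index
             for _, protein, _ in fold}
    return sorted(test & train)

if __name__ == "__main__":
    k = 3
    for name, make_folds in [("per image", per_image_folds),
                             ("per protein", per_protein_folds)]:
        folds = make_folds(IMAGES, k)
        for i in range(k):
            print(f"{name}, fold {i}: proteins in both train and test = "
                  f"{leaked_proteins(folds, i)}")
```

With this toy data, every per-image fold shares all six proteins between training and testing. Every per-protein fold shares none, and each of its test folds still holds one nuclear and one Golgi protein.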
