I enjoyed this paper. However, the unsupervised categorization would have been more compelling to me if they'd used a dataset other than Caltech 101. The reason: Caltech 101 consists of exemplar object types posed against neutral backgrounds. That encourages a strong prior on the objects' edges, essentially solving the figure-ground problem for the whole class of objects. By posing the object against a neutral background, the photographer is "segmenting" the foreground for the learning algorithm; isn't there still a pseudo-labeling process going on here? The same could be said of the objects' common sizes and positions. My point, I guess, is that they're actually solving a much more restricted learning problem than their super-flexible, zillion-parameter generative model might lead us to believe.
Edited 04-09-2006 07:25 PM