Edited by author 11-30-2004 12:39 PM
An earlier version of this paper was presented at ECCV 02:
http://l2r.cs.uiuc.edu/~danr/Papers/vrelations.pdf
There's an interesting observation in the earlier paper that's been omitted from this version (at least, I didn't see it). They found that omitting singleton vocabulary items - anything oddball enough that it didn't cluster - degraded the performance of their detector.
This suggests something fundamental about the appropriate model for an object category (such as car, motorbike, face). Looking for averaged prototypes, or even for a few elements that are common to all examples, may not be the best approach. It may be that some elements are shared within one subset of the category and different elements are shared within other subsets. For example, if your training set had just one example of a car from the 50s with fins, that car might have few detectable features in common with a contemporary Hyundai or Nissan. But the fins (and maybe some other characteristics) might create features that let you detect more cars from that era than you would if you dropped those features just because they're uncommon in your training set.
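
To make the fins example concrete, here is a toy sketch in Python. It is not the paper's method: the part labels, the build_vocabulary and detect helpers, and the min_count and min_hits thresholds are all made up for illustration. The only idea it borrows is keeping or dropping vocabulary items that occur in a single training example.

```python
from collections import Counter

# Hypothetical toy data: each training car is a set of part labels.
train = [
    {"round_wheel", "sloped_hood", "hatchback"},       # contemporary compact
    {"round_wheel", "sloped_hood", "sedan_trunk"},     # contemporary sedan
    {"whitewall_wheel", "tail_fin", "chrome_grille"},  # the lone 1950s car
]


def build_vocabulary(examples, min_count=1):
    """Keep every part seen in at least min_count training examples."""
    counts = Counter(part for example in examples for part in example)
    return {part for part, n in counts.items() if n >= min_count}


def detect(parts, vocabulary, min_hits=2):
    """Toy detector: fire if enough of the image's parts are in the vocabulary."""
    return len(parts & vocabulary) >= min_hits


full_vocab = build_vocabulary(train, min_count=1)    # singletons kept
pruned_vocab = build_vocabulary(train, min_count=2)  # singletons dropped

# Two unseen test cars (also hypothetical).
test_fifties = {"tail_fin", "chrome_grille", "bench_seat"}
test_modern = {"round_wheel", "sloped_hood", "spoiler"}

for name, car in [("fifties", test_fifties), ("modern", test_modern)]:
    print(name, "full:", detect(car, full_vocab),
          "pruned:", detect(car, pruned_vocab))
```

In this toy setup the modern car is found with either vocabulary, but the 50s car is only found when the singleton items are kept, since everything it shares with the training set came from one example.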