David Thompson
4
04-26-2006 09:54 AM ET (US)

This sort of "object reconstruction," where you define some physical constraints and then solve the inverse problem to get 3D objects out of their image projections, seems like a nice idea that is extremely difficult to do in practice.
In particular, how do you arrive at appropriate costs for each of the specialists' operations? Adelson & Pentland use arbitrary prices that conveniently prove their point, but costs in the physical world are elusive. Moreover, they are probably very context-dependent. Fancy lighting arrangements are far less likely outdoors than indoors. Reflectance variations are more likely than structural protrusions when you're looking at a newspaper, and so on.
But if local and scene context is so important and so tricky, then much of the "object recognition work" is being done by whatever learning method you use to fix the costs, and not by the optimization over lighting, reflectance, and structure configurations.
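To make the point concrete, here is a toy sketch of how the cheapest explanation flips when only the price table changes. Everything in it is made up for illustration: the context names, the prices, the candidate interpretations, and the operation counts are my own assumptions, not numbers from Adelson & Pentland.

```python
# Toy illustration only: every price and operation count below is invented.
# Each context assigns a hypothetical price to one unit of each kind of
# explanation (reflectance change, structural change, lighting change).
COSTS = {
    "indoors":   {"reflectance": 5.0, "structure": 2.0,  "lighting": 3.0},
    "outdoors":  {"reflectance": 5.0, "structure": 2.0,  "lighting": 20.0},
    "newspaper": {"reflectance": 1.0, "structure": 10.0, "lighting": 5.0},
}

# Three hypothetical explanations of the same image, expressed as how many
# operations of each kind they require.
INTERPRETATIONS = {
    "painted flat surface": {"reflectance": 2},
    "folded surface":       {"structure": 2, "lighting": 1},
    "special lighting":     {"lighting": 3},
}


def total_cost(ops, prices):
    """Sum the price of every operation an interpretation uses."""
    return sum(prices[op] * count for op, count in ops.items())


def cheapest(context):
    """Return the name of the lowest-cost interpretation in a context."""
    prices = COSTS[context]
    return min(
        INTERPRETATIONS,
        key=lambda name: total_cost(INTERPRETATIONS[name], prices),
    )


if __name__ == "__main__":
    for context in COSTS:
        print(context, "->", cheapest(context))
```

With these invented prices, the same image is explained as a folded surface indoors and as a painted flat surface outdoors. The optimization step is identical in both cases; the price table alone decides the answer.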
The Cavanagh papers from the beginning of class are a great rebuttal to this paper. They convincingly argued that object recognition doesn't follow the process of reconstructing a model according to physical laws. Instead, a few simple, canonical views and structure cues compete to explain the scene. It may be more computationally efficient to use physics as a tie-breaker between a few competing alternatives than to rely on solving the enormous underconstrained inverse projection problem.