To reply to Eric's question:
A distinction has to be made between two tasks: "clustering" and "perceptual grouping" (image segmentation).
Clustering by itself is just the task of detecting significant changes in a dataset and cutting it into pieces that are internally coherent and significantly different from one another. The operation of clustering by itself does not guarantee any sort of perceptually meaningful output.
The task of perceptual grouping (image segmentation) is much harder than clustering. In fact, it consists of two tasks: first, defining measures of similarity between two parts of a scene, and second, actually using this information to divide the scene into pieces.
So you are right when you say that all clustering presupposes a good affinity matrix, but that is not all that surprising, since the clustering algorithm only interprets what you give it. Hence, "garbage in, garbage out" holds for image segmentation too.
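To make the "garbage in, garbage out" point concrete, here is a minimal sketch in Python. Everything in it is an assumption chosen for illustration: the function names, the Gaussian affinity on a single intensity value, and the very simple clustering rule (connected components of the thresholded affinity graph), which stands in for whatever clustering algorithm is actually used. The same clustering routine is run twice, and only the affinity matrix changes.

```python
import math

def gaussian_affinity(values, sigma):
    """Hypothetical affinity: W[i][j] = exp(-(v_i - v_j)^2 / (2 * sigma^2))."""
    n = len(values)
    return [[math.exp(-((values[i] - values[j]) ** 2) / (2.0 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def cluster_by_affinity(affinity, threshold):
    """Label the connected components of the graph that keeps only
    edges whose affinity is at least `threshold`."""
    n = len(affinity)
    labels = [-1] * n          # -1 means "not yet assigned"
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:           # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and affinity[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Six made-up pixel intensities: three dark, three bright.
intensities = [0.10, 0.12, 0.11, 0.90, 0.88, 0.91]

# A well-scaled affinity separates the two groups ...
good_labels = cluster_by_affinity(gaussian_affinity(intensities, sigma=0.1), 0.5)
# ... a badly scaled one makes everything look alike.
bad_labels = cluster_by_affinity(gaussian_affinity(intensities, sigma=10.0), 0.5)

print(good_labels)  # [0, 0, 0, 1, 1, 1]
print(bad_labels)   # [0, 0, 0, 0, 0, 0]
```

The clustering code is identical in both runs; only the scale of the affinity differs, and that alone decides whether the output is useful.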
The task of objectively evaluating segmentation has seen some recent activity and is the focus of David Martin's research at Berkeley. He has collected a benchmark database of human segmentations of a collection of images and has defined metrics for comparing an arbitrary segmentation to these benchmark segmentations. You can find more information at
http://www.cs.berkeley.edu/~dmartin/research.html
The question of choosing cues or features which will result in good affinity schemes is a topic of a fair amount of research in psychology and vision science.
Some of the seminal work on this was done by members of the Gestalt psychology movement.
An excellent reference on this topic is Stephen Palmer's book Vision Science.
Finally, there is the issue of choosing a distance metric that combines the various features into a single scalar representing the affinity between two points in the scene. An interesting study on this topic is
http://www-dbv.informatik.uni-bonn.de/abst...puzicha.iccv99.html
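As a rough illustration of what "combining features into a single scalar" can look like, here is a small Python sketch. It is only one possible choice and is not taken from the study linked above: the cue names, the per-cue scales, and the rule itself (a sum of squared, scale-normalized differences passed through an exponential) are all assumptions made for the example.

```python
import math

def combined_affinity(feat_a, feat_b, scales):
    """Map per-cue differences between two points to one affinity in (0, 1].

    Each cue difference is divided by its own scale so that cues measured
    in different units become comparable before they are summed.
    """
    total = 0.0
    for cue, sigma in scales.items():
        diff = feat_a[cue] - feat_b[cue]
        total += (diff / sigma) ** 2
    return math.exp(-0.5 * total)

# Hypothetical cues and scales; in practice these would have to be tuned.
scales = {"brightness": 0.1, "texture": 0.2, "position": 20.0}

p = {"brightness": 0.50, "texture": 0.30, "position": 10.0}
q = {"brightness": 0.52, "texture": 0.28, "position": 12.0}  # similar to p
r = {"brightness": 0.90, "texture": 0.80, "position": 60.0}  # very different

print(combined_affinity(p, q, scales))  # close to 1
print(combined_affinity(p, r, scales))  # close to 0
```

The per-cue scales are where most of the difficulty hides: they decide how much each feature counts, which is exactly the choice the metric question above is about.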