My main complaint about this paper is that parts of it are very vague, nondescript, and narrative, while other parts, though perhaps specific, lack critical details and prose explanation. I'm sure much of this comes from my unfamiliarity with the subject matter, but I suspect I am often misinterpreting the authors' aims. Some questions I had while reading through:

1) What is a similarity transform? Image-plane translations and rotations are described as similarity transformations, but aren't these affine transformations? Are affine and similarity transformations disjoint sets?

2) What makes the Belongie et al. [10] approach heuristic?

3) I don't see that the intensity profiles are related by only a scale change, or at least not by a constant scale change. Consider a ladder viewed straight on, so that the distance between rungs is constant, compared with the same ladder viewed upward from close to the bottom rung, where the projected 2D distance between rungs gets smaller the farther up the ladder you go. The intensity profile perpendicular to the rungs is not simply uniformly scaled from one image to the other.

4) What is the covariance matrix C a covariance of? The profile feature vectors? What is the intuition behind equation (1)? Are (1) and (2) voting functions that improve on the original algorithm even without the cyclic string matching? How do these two functions compare, and which is better?

Edited 10032002 01:38 AM
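On question 1, for reference: under the usual textbook definition (which the paper may or may not follow), a 2D similarity transform is a rotation, a uniform scale, and a translation, so every similarity is a special case of an affine transform and the two sets are nested rather than disjoint. The minimal Python sketch below illustrates that relationship; the helper names are hypothetical, and the check only covers similarities without reflection.

```python
import math

# Illustrative sketch only: helper names are hypothetical, not from the paper.
# Matrices are 2x3 [A | t], mapping (x, y) to A @ (x, y) + t.

def similarity_matrix(scale, theta, tx, ty):
    """2D similarity: uniform scale, rotation by theta (radians), translation."""
    c, s = scale * math.cos(theta), scale * math.sin(theta)
    return [[c, -s, tx],
            [s,  c, ty]]

def is_similarity(m, tol=1e-9):
    """True if the linear part of an affine map has the form
    [[k cos t, -k sin t], [k sin t, k cos t]] with k > 0 (no reflection)."""
    a, b = m[0][0], m[0][1]
    c, d = m[1][0], m[1][1]
    return abs(a - d) < tol and abs(b + c) < tol and a * a + c * c > tol

def apply_affine(m, x, y):
    """Apply a 2x3 affine matrix to the point (x, y)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# Image-plane rotation + translation: a similarity with scale 1, hence also affine.
rot_trans = similarity_matrix(1.0, math.pi / 6, 5.0, -2.0)
# A shear is affine but not a similarity.
shear = [[1.0, 0.5, 0.0],
         [0.0, 1.0, 0.0]]
print(is_similarity(rot_trans))  # True
print(is_similarity(shear))      # False

# A similarity scales every distance by the same factor (here 2).
m2 = similarity_matrix(2.0, 1.0, 7.0, 7.0)
print(round(math.dist(apply_affine(m2, 0.0, 0.0), apply_affine(m2, 3.0, 4.0)), 9))  # 10.0
```

The shear case shows the converse fails: an affine map can distort angles and length ratios, which a similarity never does.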

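The ladder argument in question 3 can be checked numerically. The sketch below assumes an idealized pinhole camera (image y = f * Y / Z) and models the ladder as a straight line of equally spaced rungs at angle alpha to the optical axis; the geometry, parameter values, and function names are hypothetical choices for illustration, not anything taken from the paper.

```python
import math

# Hypothetical model: pinhole camera at the origin looking along +z, focal length f.
# The ladder starts at (y0, z0) in camera coordinates and makes angle alpha with
# the optical axis: alpha = 90 deg means parallel to the image plane (straight on);
# smaller alpha means the ladder recedes in depth (viewed from near the bottom rung).

def projected_rung_positions(n_rungs, spacing, alpha_deg, y0=0.0, z0=2.0, f=1.0):
    """Image y-coordinates of n_rungs equally spaced rungs."""
    a = math.radians(alpha_deg)
    positions = []
    for i in range(n_rungs):
        y = y0 + i * spacing * math.sin(a)
        z = z0 + i * spacing * math.cos(a)
        positions.append(f * y / z)  # perspective projection: y_img = f * Y / Z
    return positions

def gaps(ps):
    """Distances between consecutive projected rungs."""
    return [b - a for a, b in zip(ps, ps[1:])]

frontal = gaps(projected_rung_positions(6, 0.3, 90.0))
oblique = gaps(projected_rung_positions(6, 0.3, 20.0))
print([round(g, 3) for g in frontal])  # all gaps equal
print([round(g, 3) for g in oblique])  # gaps shrink farther up the ladder
# Ratio of corresponding gaps: a single scale change would make these all equal.
print([round(o / fr, 3) for o, fr in zip(oblique, frontal)])
```

Because the ratio between corresponding gaps in the two views varies along the ladder, no single scale factor maps one intensity profile onto the other under this model. Whether the paper assumes the profile is short enough for this variation to be negligible is not something the sketch can settle.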