Serge Belongie | 7 | 09-27-2001 02:45 AM ET (US)
Regarding Andrew's question: Figure 2 is hard to interpret because it shows only the contour lines and not the contour values. In the final version of the article (currently linked from the course homepage), the authors clarify in the text that in all three nonlinear cases the first principal component varies monotonically along the parabola.

The way to think about it is to consider the value each point in the data set picks off from each principal component. Those values are the point's "coordinates" in the feature space. In the linear case, these coordinates simply reproduce the original 2D coordinates (up to centering and a rotation of the axes). In the quadratic case, by contrast, the first two feature-space coordinates are very similar for points that are near one another along the curve: since neither of the first two components has contour lines parallel to the underlying parabola, there is no axis along which the noise points can distinguish themselves. That doesn't happen until the 3rd component.

So denoising happens when the 3rd component is discarded, in the sense that the first 2 coordinates are essentially ignorant of variation perpendicular to the curve but vary smoothly along it. This is clearer in the toy example with 3 clusters.
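To make the "coordinates in feature space" idea concrete, here is a minimal NumPy sketch of kernel PCA on a noisy parabola. Everything specific in it is my assumption rather than something taken from the article: the homogeneous polynomial kernel k(x, y) = (x . y)^d, the sample size, the noise level, the random seed, and the helper name kernel_pca are all just illustrative choices.

```python
import numpy as np


def kernel_pca(X, degree, n_components):
    """Hypothetical helper: coordinates of the rows of X along the leading
    kernel principal components, for the kernel k(x, y) = (x . y)**degree.

    Returns (coords, eigvals); coords has shape (n_samples, n_components)
    and eigvals holds all eigenvalues of the centered kernel matrix,
    sorted in descending order.
    """
    n = X.shape[0]
    K = (X @ X.T) ** degree  # elementwise power of the Gram matrix

    # Center the kernel matrix, i.e. center the data in feature space.
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H

    # eigh returns eigenvalues in ascending order; flip to descending.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    # Coordinate of training point i along component k is
    # sqrt(lambda_k) * v_k[i]; clip guards against tiny negative round-off.
    top = np.clip(eigvals[:n_components], 0.0, None)
    coords = eigvecs[:, :n_components] * np.sqrt(top)
    return coords, eigvals


# Assumed toy data: x uniform on [-1, 1], y = x^2 plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = x**2 + rng.normal(scale=0.1, size=200)
X = np.column_stack([x, y])

# Linear case (degree 1): two coordinates, a rotation of the centered data.
lin_coords, lin_eigvals = kernel_pca(X, degree=1, n_components=2)

# Quadratic case (degree 2): keep three coordinates, then drop the third.
quad_coords, quad_eigvals = kernel_pca(X, degree=2, n_components=3)
denoised_coords = quad_coords[:, :2]
```

With this particular kernel, the degree-2 feature space for 2D inputs has only three dimensions (x1^2, sqrt(2) x1 x2, x2^2), so at most three eigenvalues are nonzero and "discarding the 3rd component" means keeping quad_coords[:, :2]. Plotting each column of quad_coords against x is one way to check which components vary along the curve and which one picks up the perpendicular noise.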