Robin Hewitt | 9 | 10-12-2004 02:56 PM ET (US)
Sanjeev,
This general topic is an interesting one. Ward's error term is a compactness measure, equivalent to the moment of inertia of a physical object. It carries some implicit assumptions, one being that the feature dimensions are orthogonal, which is often not true in practice.
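To make the moment-of-inertia analogy concrete, here is a minimal sketch, assuming unit-mass points and plain Euclidean features. The function names `sse` and `ward_merge_cost` are just illustrative, not from any library. It computes a cluster's error as the sum of squared distances to its centroid, and the cost of a merge as the increase in that total.

```python
import numpy as np

def sse(points):
    """Sum of squared distances to the centroid: the moment of inertia
    of a cluster of unit-mass points about its centre of mass."""
    points = np.asarray(points, dtype=float)
    return float(((points - points.mean(axis=0)) ** 2).sum())

def ward_merge_cost(a, b):
    """Increase in total error if clusters a and b are merged."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return sse(np.vstack([a, b])) - sse(a) - sse(b)

# Illustrative data (assumed, not from the thread).
a = [[0.0, 0.0], [2.0, 0.0]]
b = [[10.0, 0.0]]
cost = ward_merge_cost(a, b)

# Same value from the closed form n_a*n_b/(n_a+n_b) * ||c_a - c_b||^2.
closed_form = (2 * 1) / (2 + 1) * (1.0 - 10.0) ** 2
```

Both `cost` and `closed_form` should agree, which is why the merge cost only depends on cluster sizes and centroids in the Euclidean case.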
You can, however, use other error measures, including non-metric ones, with many clustering methods. For example, you can use Shannon entropy within clusters as the error measure; the goal in that case is to minimize surprisal within your clusters. Although non-Euclidean, this error measure will work with Ward's method, since all you need is a way to measure the increase in total error at each agglomeration step. Like moment of inertia, information content is a multiple-linkage similarity measure, so it tends to encourage compact (rather than stringy) clusters.
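As a rough sketch of the idea, assuming each item is a single categorical symbol and taking the cluster error to be size-weighted entropy (total bits), which is my choice for illustration: the agglomeration loop just picks the merge with the smallest increase in total error. The names `entropy_error`, `merge_cost` and `agglomerate` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2

def entropy_error(items):
    """Size-weighted Shannon entropy (total bits) of the symbols in a cluster."""
    n = len(items)
    counts = Counter(items)
    return -sum(c * log2(c / n) for c in counts.values())

def merge_cost(a, b):
    """Increase in total error if clusters a and b are merged."""
    return entropy_error(a + b) - entropy_error(a) - entropy_error(b)

def agglomerate(clusters, k):
    """Greedy Ward-style agglomeration: repeatedly apply the cheapest merge
    until only k clusters remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Illustrative data (assumed): four starting clusters of symbols.
start = [["a", "a"], ["a"], ["b", "b"], ["b"]]
result = agglomerate(start, 2)
```

Merging two clusters with the same symbol mix adds no error, while mixing different symbols does, so the pure clusters get merged first. For real feature vectors you would need some way to estimate a distribution per cluster (binning, for instance), which this sketch does not cover.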
- Robin