Melanie Dumas
05-09-2001 12:46 AM ET (US)

I really like the notion of "word perplexity reduction". That phrase cracks me up, particularly since the mathematical foundation it rests on is sound. :) In fact, the maximum likelihood measure seems to be a modified version of the term frequency-inverse document frequency (TF-IDF) measure. The author repeatedly uses new names for established concepts (such as "Folding In" = "holdout set", "Vector space model" = "context vectors"), which makes the paper difficult to read.
However, I like the novelty of the approach: minimizing perplexity for automated indexing. Yahoo is still indexed primarily by hand, and this approach offers a possible way to streamline that entire process.
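For anyone who wants to see the two quantities side by side, here is a minimal sketch, not the paper's method. It assumes raw-count TF times natural-log IDF (one common TF-IDF variant among many), and it uses a simple add-alpha smoothed unigram model as a stand-in for the paper's model when computing perplexity on a held-out document. The function names `tfidf` and `unigram_perplexity` and the toy documents are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: weight = raw term count * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


def unigram_perplexity(train_docs, test_doc, alpha=1.0):
    """Perplexity of a held-out document under a unigram model.

    Word probabilities are maximum likelihood counts from train_docs with
    add-alpha smoothing, so unseen test words still get nonzero probability.
    Lower perplexity means the model predicts the held-out words better.
    """
    counts = Counter(term for doc in train_docs for term in doc)
    vocab = set(counts) | set(test_doc)
    total = sum(counts.values())
    log_lik = sum(
        math.log((counts[t] + alpha) / (total + alpha * len(vocab)))
        for t in test_doc
    )
    # Perplexity = exp(-average log-likelihood per word).
    return math.exp(-log_lik / len(test_doc))


if __name__ == "__main__":
    docs = [
        ["latent", "semantic", "indexing"],
        ["semantic", "web"],
        ["latent", "topic", "model"],
    ]
    print(tfidf(docs)[0])
    # Train on the first two documents, evaluate on the third.
    print(unigram_perplexity(docs[:2], docs[2]))
```

In the toy run, "semantic" appears in two of the three documents, so its IDF factor is log(3/2), while "indexing" appears in only one and gets log(3). The perplexity printout is the kind of number a "perplexity reduction" claim would be measured against.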