| Who | When | Messages |
| Paul Prueitt | 5 | 01-13-2002 07:03 PM ET (US) |
| Gyozo Gidofalvi | 4 | 05-09-2001 11:31 AM ET (US) |
I agree with the previous postings about the overly complicated language used in the paper. However, I found that the paper was well designed and achieved its well-defined goal: showing the superiority of the PLSA method over several existing methods, which can mainly be attributed to the solid statistical foundations that PLSA is based on.
Even though the ability of tempered EM to avoid over-fitting was really appealing, I felt that the control parameter Beta unnecessarily increased complexity. I would have really liked to see an entry in the table comparing the different variants for a version that used standard EM for the maximization of the "predictive power of the model."
I really liked figure 2, which nicely demonstrated the dynamics of the model, showing both the posterior and mixture probabilities for a sample query.
Finally, although not all the variants of PLSA are crystal clear to me, the reported results suggest that the method presented is clearly superior in terms of both precision and recall.
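For anyone trying to picture what the control parameter Beta does, here is a minimal sketch of a tempered E-step for a single (document, word) pair. It assumes the usual PLSA aspect-model form, in which the posterior over latent classes is proportional to P(z) * [P(d|z) * P(w|z)]^Beta. The function name and argument layout are hypothetical and are not taken from the paper.

```python
def tempered_posterior(p_z, p_d_given_z, p_w_given_z, beta=1.0):
    """Posterior P(z | d, w) for one (d, w) pair under a tempered E-step.

    p_z[k]          -- P(z_k), the mixture weight of latent class k
    p_d_given_z[k]  -- P(d | z_k) for the document in question
    p_w_given_z[k]  -- P(w | z_k) for the word in question
    beta            -- control parameter; beta = 1.0 is the standard E-step

    Assumes at least one class has a non-zero score, so the
    normalizer is not zero.
    """
    # Unnormalized score per latent class: the likelihood term is
    # raised to the power beta, the prior P(z) is left untouched.
    scores = [pz * (pd * pw) ** beta
              for pz, pd, pw in zip(p_z, p_d_given_z, p_w_given_z)]
    total = sum(scores)
    # Normalize so the posterior sums to one.
    return [s / total for s in scores]


if __name__ == "__main__":
    prior = [0.5, 0.5]
    p_d = [0.2, 0.1]
    p_w = [0.3, 0.1]
    for beta in (1.0, 0.5, 0.0):
        print(beta, tempered_posterior(prior, p_d, p_w, beta))
```

With Beta = 1 this is the ordinary E-step; as Beta shrinks toward 0 the posterior is pulled back toward the prior P(z), which is the damping effect that is supposed to help against over-fitting. So a standard-EM variant is just the Beta = 1 case of the same update.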
| Dave Kauchak | 3 | 05-09-2001 02:26 AM ET (US) |
My biggest qualm with this paper is similar to what others have mentioned: the language. The paper is written in a way that is extremely difficult to read. To start with, the author tends to use words like "ergonomic" and "ambivalence" in contexts that I do not find appropriate. These words lead to ambiguity and confusion and, in my opinion, should not be used in a technical paper unless absolutely necessary.
The paper also borrows terms and phrases from a wide variety of papers and introduces new phrases of its own (Melanie mentions a few of these great phrases). This is not necessarily bad, but the paper often introduces these concepts without a detailed explanation. Along the same lines, the paper could have been made much more readable by rephrasing some of these concepts or simply omitting them.
If you can get past the difficulty of the language, however, some of the ideas are interesting. I found the results presented in Table 1, based on the TDT-1 examples, quite interesting. The results seem to clearly show the two different usages. I found it interesting that the system could distinguish between somewhat similar concepts, as in the flight example. I was surprised, however, that they did not do word stemming in this experiment. I wonder why they chose not to stem the words and what effect this had.
| Melanie Dumas | 2 | 05-09-2001 12:46 AM ET (US) |
I really like the notion of "word perplexity reduction". That phrase cracks me up, particularly since the mathematical foundation upon which it is built is sound. :) In fact, the maximum likelihood measurement seems to be a modified version of the term-frequency inverse document frequency (TF-IDF) measure. The author repeatedly uses new names for established concepts (such as "Folding In" = "holdout set", "Vector space model" = "context vectors"), making the paper difficult to read.
However, I like the novelty of the approach: minimizing perplexity for automated indexing. Yahoo is still primarily indexed by hand, and this provides a possible technique to streamline that entire process.
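To make "perplexity" a little more concrete, here is a minimal sketch using the standard definition: the exponential of the negative average log-probability the model assigns to each observed word token. The data layout (dictionaries keyed by (document, word) pairs) and the function name are assumptions for illustration only, not the paper's notation.

```python
import math


def perplexity(counts, p_w_given_d):
    """Word perplexity of a model on observed counts.

    counts[(d, w)]       -- number of times word w occurs in document d
    p_w_given_d[(d, w)]  -- model probability P(w | d); assumed to be
                            strictly positive for every observed pair
    """
    log_lik = 0.0
    n_tokens = 0
    for (d, w), n in counts.items():
        # Each of the n occurrences contributes log P(w | d).
        log_lik += n * math.log(p_w_given_d[(d, w)])
        n_tokens += n
    # Exponentiate the negative average log-likelihood per token.
    return math.exp(-log_lik / n_tokens)


if __name__ == "__main__":
    counts = {("d1", "flight"): 2, ("d1", "plane"): 1}
    uniform = {("d1", "flight"): 0.25, ("d1", "plane"): 0.25}
    print(perplexity(counts, uniform))
```

A model that spreads its probability uniformly over four words comes out at a perplexity of about 4, so "perplexity reduction" roughly means shrinking the number of words the model is effectively choosing between.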
| Jonathan Ultis | 1 | 05-08-2001 01:33 AM ET (US) |
Let me offer my humble apologies to everyone for making you read this paper. It's very difficult to parse even though the ideas are good. The pictures and charts are also pretty poor.
Basically, this paper deals with using a particular type of Bayesian network and a version of the EM algorithm to cluster terms that occur together frequently. I'm not going to talk about the details of the EM algorithm that the author uses, so don't worry about trying to figure that out in too much detail.
Similarly, don't worry about Figure 4. It is meant to show the benefit of annealed EM in preventing overfitting, but it isn't explained well at all in this paper.
Good luck with it.
|