QuickTopic (SM) free message boards
Topic: Probabilistic latent semantic indexing
Paul Prueitt  5
01-13-2002 07:03 PM ET (US)
Edited by author 01-13-2002 07:03 PM
Regarding full-text concept "extraction":

Might you briefly review the following short paper?

http://www.ontologystream.com/aSLIP/files/verbMaps.htm


A short overview of the fundamental theory is at:

http://www.ontologystream.com/aSLIP/files/stratification.htm


And contact me for additional discussion.


Thank you

Dr. Paul Prueitt
Founder BCNGroup (1997)
bcngroup@erols.com
Gyozo Gidofalvi  4
05-09-2001 11:31 AM ET (US)
I agree with the previous postings about the overly complicated language and words used in the paper. However, I found that the paper was well designed and achieved its well-defined goal: showing the superiority of the PLSA method over several existing methods, which can mainly be attributed to the solid statistical foundations that PLSA is based on.

Even though the ability of tempered EM to avoid over-fitting was really appealing, I somehow felt that the control parameter beta added unnecessary complexity. I would have really liked to see an entry in the table comparing the different variants for a version that used standard EM for the maximization of the "predictive power of the model."
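
For readers who want to try the standard-EM comparison themselves, here is a minimal sketch of PLSA fitting with a tempered E-step. It is not the paper's code: the function name `plsa_tempered_em`, the dense list-of-lists count matrix, and the random initialization are all assumptions made for illustration. Setting `beta=1.0` gives the standard E-step, while `beta < 1` damps the posterior, which is the mechanism tempered EM relies on.

```python
import random


def plsa_tempered_em(counts, n_topics, beta=1.0, n_iter=50, seed=0):
    """Fit P(z), P(d|z), P(w|z) to counts[d][w] = n(d, w).

    Hypothetical helper for illustration; beta=1.0 is standard EM.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    if sum(map(sum, counts)) == 0:
        raise ValueError("count matrix must contain at least one nonzero entry")

    def random_dist(n):
        # Strictly positive random distribution over n outcomes.
        xs = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(xs)
        return [x / total for x in xs]

    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [random_dist(n_docs) for _ in range(n_topics)]   # p_d_z[z][d]
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]  # p_w_z[z][w]

    for _ in range(n_iter):
        acc_z = [0.0] * n_topics
        acc_d = [[0.0] * n_docs for _ in range(n_topics)]
        acc_w = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # Tempered E-step: P(z|d,w) proportional to
                # P(z) * [P(d|z) * P(w|z)] ** beta
                post = [p_z[z] * (p_d_z[z][d] * p_w_z[z][w]) ** beta
                        for z in range(n_topics)]
                total = sum(post)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    r = n * post[z] / total  # expected count for topic z
                    acc_z[z] += r
                    acc_d[z][d] += r
                    acc_w[z][w] += r
        # M-step: renormalize the expected counts.
        grand = sum(acc_z)
        for z in range(n_topics):
            if acc_z[z] > 0.0:
                p_d_z[z] = [x / acc_z[z] for x in acc_d[z]]
                p_w_z[z] = [x / acc_z[z] for x in acc_w[z]]
            p_z[z] = acc_z[z] / grand
    return p_z, p_d_z, p_w_z


if __name__ == "__main__":
    toy = [[4, 3, 0, 0], [3, 5, 0, 1], [0, 0, 4, 4], [0, 1, 5, 3]]
    for beta in (1.0, 0.8):
        p_z, _, _ = plsa_tempered_em(toy, n_topics=2, beta=beta)
        print(beta, p_z)
```

Running both settings on the same held-out data and comparing the results would give the missing table entry. The toy matrix above is made up and says nothing about the paper's actual numbers.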

I really liked figure 2, which nicely demonstrated the dynamics of the model, showing both the posterior and mixture probabilities for a sample query.

Finally, although not all the variants of PLSA are crystal clear to me, the results reported suggest that the method presented is clearly superior both in terms of precision and recall.
Dave Kauchak  3
05-09-2001 02:26 AM ET (US)
I think my biggest qualm with this paper is similar to what others have mentioned: the language. The paper is written in a way that is extremely difficult to read. To start with, the author tends to use a variety of words like "ergonomic" and "ambivalence" in a context that I do not find appropriate. These words lead to ambiguities and confusion and, in my opinion, should not be used in a technical paper unless totally necessary.

The paper also uses terms and phrases from a wide variety of papers and introduces new phrases of its own (Melanie mentions a few of these great phrases). This is not necessarily bad, but often the paper introduces these concepts without a detailed explanation. Along these same lines, the paper could have been made much more readable by rephrasing some of these concepts or simply omitting them.

If you can get past the difficulty of the language, however, some of the ideas are interesting. I found the results presented in Table 1 based on the TDT-1 examples quite interesting. The results seem to clearly show the two different usages. I found it interesting that the system could distinguish between somewhat similar concepts, as in the flight example. I was surprised, however, that in this experiment they did not do word stemming. I wonder why they did not choose to stem the words and what effect this had.
Melanie Dumas  2
05-09-2001 12:46 AM ET (US)
I really like the notion of "word perplexity reduction". That phrase cracks me up, particularly since the mathematical foundation upon which it is built is sound. :)
In fact, the maximum likelihood measurement seems to be a modified version of the term-frequency inverse document frequency (TF-IDF) measure. The author repeatedly uses new names for established concepts (such as "Folding In" = "holdout set", "Vector space model" = "context vectors"), making the paper difficult to read.

However, I like the novelty of the approach: minimizing perplexity for automated indexing. Yahoo is still primarily indexed by hand, and this provides a possible technique to streamline that entire process.
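
As a rough illustration of what "perplexity" measures here, the sketch below computes it from a count matrix and a table of predicted word probabilities. The function name `perplexity` and the list-of-lists layout are assumptions for this example, not anything taken from the paper. Lower values mean the model is less "surprised" by the words it sees.

```python
import math


def perplexity(counts, p_w_given_d):
    """exp of the negative average log-probability per observed token.

    counts[d][w] is n(d, w); p_w_given_d[d][w] is the model's P(w|d).
    Hypothetical helper for illustration only.
    """
    log_lik = 0.0
    n_tokens = 0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                log_lik += n * math.log(p_w_given_d[d][w])
                n_tokens += n
    return math.exp(-log_lik / n_tokens)


if __name__ == "__main__":
    # A model that spreads probability evenly over 4 words.
    print(perplexity([[1, 2, 0, 1]], [[0.25, 0.25, 0.25, 0.25]]))
```

A uniform model over V words has perplexity V, so the number can be read as an effective vocabulary size the model is choosing among.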
Jonathan Ultis  1
05-08-2001 01:33 AM ET (US)
Let me offer my humble apologies to everyone for making you read this paper. It's very difficult to parse even though the ideas are good. The pictures and charts are also pretty poor.

Basically, this paper deals with using a particular type of Bayesian network and a version of the EM algorithm to cluster terms that occur together frequently. I'm not going to talk about the details of the EM algorithm that the author uses, so don't worry about trying to figure that out in too much detail.
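
For orientation, the model behind that description can be written down compactly. This is the usual statement of the PLSA "aspect model" as I understand it, with z ranging over the latent classes, d over documents, w over terms, and n(d, w) the number of times term w occurs in document d. Check it against the paper's own notation before relying on it.

```latex
% Joint probability of a document-term pair, mixing over latent classes z
P(d, w) = \sum_{z} P(z)\, P(d \mid z)\, P(w \mid z)

% Log-likelihood that EM maximizes
\mathcal{L} = \sum_{d} \sum_{w} n(d, w)\, \log P(d, w)
```

Terms that tend to occur together end up with high P(w | z) for the same z, which is the sense in which the method clusters them.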

Similarly, don't worry about Figure 4. It is meant to show the benefit of annealed EM in preventing overfitting, but it isn't explained well at all in this paper.

Good luck with it.
Copyright ©1999-2008 Internicity Inc. All rights reserved.