QuickTopic (SM) free message boards
Topic: Document clustering using word clusters via the information bottleneck
Kristin Branson (message 2)
05-14-2002 02:29 PM ET (US)
I agree that this paper presented an interesting idea, and the information bottleneck method seems very reasonable. My biggest qualm with the paper is the lack of an argument for double clustering. The only reason I can see for it is that the high dimensionality of the data makes the data noisy. I don't understand why, when doing the word clustering, we would want to maximize the mutual information about the documents. It seems to me that the obvious choice would be to maximize the information about the words themselves, since a lot of information about the words is lost in this step, whereas the information about the documents is not thrown away. Perhaps something like PCA on the words could replace the word clustering?
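
To make the objective I am questioning concrete, here is a minimal sketch of what "word clusters that keep mutual information about the documents" would mean. It is not the paper's algorithm: the count matrix is invented, and the helper names (`mutual_information`, `normalize`, `merge_rows`) are my own. It only compares I(word clusters; documents) for two hand-picked groupings, assuming the joint distribution is estimated from raw word-by-document counts.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits, for a joint distribution given as a list of rows."""
    row_sums = [sum(row) for row in joint]
    col_sums = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (row_sums[i] * col_sums[j]))
    return mi

def normalize(counts):
    """Turn a table of counts into a joint probability table."""
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]

def merge_rows(joint, clusters):
    """Sum the rows (words) in each cluster to get p(cluster, document)."""
    n_docs = len(joint[0])
    return [[sum(joint[w][d] for w in cluster) for d in range(n_docs)]
            for cluster in clusters]

# Invented toy data: rows are words, columns are documents.
counts = [
    [4, 4, 0, 0],  # word 0: appears only in documents 0 and 1
    [2, 2, 0, 0],  # word 1: same document profile as word 0
    [0, 0, 3, 3],  # word 2: appears only in documents 2 and 3
    [0, 0, 1, 1],  # word 3: same document profile as word 2
]
joint = normalize(counts)

full = mutual_information(joint)
# Grouping words that share a document profile.
good = mutual_information(merge_rows(joint, [[0, 1], [2, 3]]))
# Grouping words with different document profiles.
bad = mutual_information(merge_rows(joint, [[0, 2], [1, 3]]))

print(f"all words: {full:.3f} bits")
print(f"similar words merged: {good:.3f} bits")
print(f"dissimilar words merged: {bad:.3f} bits")
```

In this toy case, merging words with identical document profiles leaves the information about the documents unchanged, while the other grouping destroys most of it. That may be the intended rationale for the objective, though the paper does not spell it out.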

Since the paper puts little emphasis on actual reasons why the double IB clustering algorithm should be better than other algorithms, it is possible that the other algorithms tested in the experimental section would perform better than double IB clustering on some test beds. The authors did do a thorough job of comparing many algorithms and presenting their results. Their choice of test bed seemed a little strange to me, since these unsupervised methods might not be aimed at text classification (see the footnote on how much better NB did at the text classification task examined). However, the authors seem to have thought about this problem a lot and made a valiant effort to explain their choices.

In terms of writing style, this paper could have benefited from more structure. It dives right into the details of the algorithm before giving a general overview; the introduction, for example, goes into extreme detail about the algorithm and experiments. There seem to be only two levels in this paper: the most general level, which doesn't give much information, and the detailed level, which was a little frustrating to me because I couldn't tell, at the start, where the description was going. However, the paper managed to get its ideas across in the end.