Kristin Branson | 8 | 05-07-2002 05:59 PM ET (US)
Edited by author 05-07-2002 06:00 PM
I think this paper presents an interesting and well-founded alternative to the standard HMM, one that allows heuristics to be added to the standard model. I think that the vocabulary feature used in HMMs can also be used in MEMMs, with each feature being a word. MEMMs may not work ideally with these features, though, because of the huge number of probabilities to learn: in an HMM you must learn |S|*(|S| + |O|) probabilities, whereas in an MEMM you must learn |S|*|S|*|O| probabilities. I think this is also why MaxEnt must be used: there is not enough training data to estimate the transition probabilities accurately, so some assumption about the distribution of the probabilities must be made. This is my guess, anyway. Clarification on this major point would have improved the paper.
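To get a feel for the gap between those two counts, here is a minimal sketch that just evaluates the two formulas quoted in this post. The function names and the example sizes (10 states, a 5,000-word vocabulary) are made up for illustration and are not from the paper. It counts table entries, not free parameters after normalization, and it assumes the MEMM is stored as a full table with one entry per (previous state, observation, next state) triple.

```python
def hmm_table_entries(num_states: int, num_observations: int) -> int:
    """Entries in an HMM: |S|*|S| transitions plus |S|*|O| emissions."""
    transitions = num_states * num_states        # P(s' | s)
    emissions = num_states * num_observations    # P(o | s)
    return transitions + emissions               # = |S| * (|S| + |O|)


def memm_table_entries(num_states: int, num_observations: int) -> int:
    """Entries in a fully tabular MEMM: one P(s' | s, o) per (s, o, s') triple."""
    return num_states * num_states * num_observations


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to illustrate the scaling.
    S, O = 10, 5000
    print("HMM entries: ", hmm_table_entries(S, O))
    print("MEMM entries:", memm_table_entries(S, O))
```

With these made-up sizes the tabular MEMM needs roughly ten times as many entries as the HMM, and the ratio grows with the number of states.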
HMMs cannot be used to model dependent observations because, as the introduction says, HMMs are not parameterized by the observations. This is because an HMM keeps track of P(o | s) rather than P(s | o).
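Written out, the difference looks like this. This is my own sketch of the standard first-order factorizations, with s_0 taken to be a fixed start state; the notation is mine and may not match the paper's exactly.

```latex
\begin{align*}
\text{HMM:}\quad  & P(s_{1:T}, o_{1:T}) = \prod_{t=1}^{T} P(s_t \mid s_{t-1})\, P(o_t \mid s_t) \\
\text{MEMM:}\quad & P(s_{1:T} \mid o_{1:T}) = \prod_{t=1}^{T} P(s_t \mid s_{t-1}, o_t)
\end{align*}
```

In the HMM each observation is generated from the current state alone, while in the MEMM the observation appears only on the conditioning side, so nothing has to be assumed about how the observations themselves are distributed.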
My main complaint with this paper is that it presents the use of MaxEnt as a good thing. Requiring MaxEnt to train this model means that a big assumption about the data must hold. I don't like that Maximum Entropy is in the name of the model, as this attribute is not what distinguishes it from the standard HMM. I am looking forward to today's presentation, as my understanding of Maximum Entropy is not very good, and I think an explanation of MaxEnt would help me understand the paper.