| Degui Zhi
|
6
|
 |
|
05-09-2002 05:17 PM ET (US)
|
|
Some comments on the experiments: I think the idea of testing the models on data simulated from an HMM is nice (maybe it is routine in this field; however, the authors tactically didn't do this in the last paper, since they were comparing MEMM vs. HMM).
In section 5.1, I am surprised to see that the MEMM can have an error rate of 42%. Given its structure, the MEMM can only learn the frequency of each word, as the transition probabilities out of state 0. How can it be better than a random guess? There must be a correlation between the word frequencies of the training set and the test set. Also in section 5.1, the CRF has an error rate of 4.6%. This result seems close to the Bayes error, which is O(1/32).
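To make the point about state 0 concrete, here is a minimal sketch. It assumes a two-branch "rib"/"rob" state machine in which every state after state 0 has exactly one successor. The state numbering, the function names, and the `p_rib` parameter (standing in for the learned frequency of "rib" in the training set) are my own assumptions for illustration, not the authors' code.

```python
# Hypothetical MEMM for a rib/rob toy model: P(next_state | state, observation).
# States: 0 = start, 1 and 2 = "rib" branch, 4 and 5 = "rob" branch, 3 = end.
SINGLE_SUCCESSOR = {1: 2, 2: 3, 4: 5, 5: 3}


def memm_transition(state, obs, p_rib=0.5):
    """Locally normalized transition distribution out of `state`."""
    if state == 0:
        # Both words start with 'r', so the observation cannot separate the
        # branches; only the training frequency of each word is left.
        return {1: p_rib, 4: 1.0 - p_rib}
    # A state with a single successor must give it probability 1 after
    # per-state normalization, whatever `obs` is. The observation is ignored.
    return {SINGLE_SUCCESSOR[state]: 1.0}


def path_prob(path, word, p_rib=0.5):
    """Probability of a state path given the observed word."""
    prob, state = 1.0, 0
    for next_state, obs in zip(path, word):
        prob *= memm_transition(state, obs, p_rib).get(next_state, 0.0)
        state = next_state
    return prob


RIB_PATH, ROB_PATH = [1, 2, 3], [4, 5, 3]
# The "rib" path scores the same whether the word is "rib" or "rob".
same = path_prob(RIB_PATH, "rib", 0.6) == path_prob(RIB_PATH, "rob", 0.6)
```

Under these assumptions the decoded word depends only on `p_rib`, so the error can drop below 50% only if one word is more frequent in both the training set and the test set.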
Why can the CRF be better? Intuitively, I think the CRF removes the direction arrows in the graph so that belief messages can be transmitted in both directions during inference. Seeing an "i" as the second symbol, the model may be able to propagate this message backward to "reconsider" the decision made when seeing the previous symbol. Anyway, this is only intuition. I fail to understand many of the derivations, so I cannot check the mathematical proof. I hope I can follow the derivation during the talk.
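The intuition can be sketched with a toy globally normalized model over the two candidate paths. It is scored by brute-force enumeration instead of message passing, which gives the same posterior for a model this small. The feature definition, the function names, and the weight of 3.0 are assumptions chosen for illustration, not values from the paper.

```python
import math

# Symbol each candidate path expects at each position (hypothetical labels).
PATHS = {"rib": "rib", "rob": "rob"}


def path_score(path_name, word, weight=3.0):
    """Unnormalized score: one feature per position, firing on a match."""
    expected = PATHS[path_name]
    return weight * sum(e == o for e, o in zip(expected, word))


def crf_posterior(word, weight=3.0):
    """Normalize once over whole paths instead of once per state."""
    scores = {name: path_score(name, word, weight) for name in PATHS}
    z = sum(math.exp(s) for s in scores.values())
    return {name: math.exp(s) / z for name, s in scores.items()}


posterior_rob = crf_posterior("rob")   # second symbol favours the "rob" path
posterior_noisy = crf_posterior("rxb")  # second symbol is uninformative
```

Because normalization happens over complete paths, the evidence from the second symbol changes the probability of the branch chosen at the first step. With an uninformative second symbol the two paths tie.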
Generally, I think this paper is better than the previous one in both the theory and experiment sections, though I still share the same worry as Dave: "What exactly is it about generative (or hidden) models that makes it difficult to deal with non-independent features?"
|