Can anyone explain why the ``label bias problem'' is a
problem? The authors' explanation is lost on me: ``the
transitions leaving a given state compete only against each other
[sic], rather than against all other transitions in the model.''
Why is this a problem? In a given state, only transitions leaving
that state can be taken, right?
I hoped the example in Figure 1 would help, but it just seems like
a useless model for distinguishing between ``rib'' and ``rob'': it
forces a commitment to one or the other on the first letter, when
the two words can't yet be disambiguated. Why use such a
model? What is the point here? What is this ``score mass'' they
invoke? (Has anyone read Bottou 1991?)
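
To make the question concrete, here is a minimal sketch of how I read
the Figure 1 model. Everything in it is my own assumption rather than
anything from the paper: the state numbering, the invented scores, and
the helper names (SCORES, PATHS, raw, local_prob, global_prob). It
compares normalizing the outgoing transitions of each state separately
with normalizing over whole paths. If I have this right, the middle
states have a single outgoing transition, so under per-state
normalization they pass on all of their probability whatever the second
letter is, and the choice made on ``r'' is never revised.

```python
import math

# Hypothetical raw scores, invented for illustration only.
# States: 0 = start, 1-2 = upper ("rib") branch, 4-5 = lower ("rob") branch, 3 = final.
SCORES = {
    (0, 1): {"r": 1.0},             # upper branch slightly preferred on "r"
    (0, 4): {"r": 0.5},
    (1, 2): {"i": 2.0, "o": -2.0},  # upper branch likes "i", dislikes "o"
    (4, 5): {"o": 2.0, "i": -2.0},  # lower branch likes "o", dislikes "i"
    (2, 3): {"b": 1.0},
    (5, 3): {"b": 1.0},
}
PATHS = {"rib": [0, 1, 2, 3], "rob": [0, 4, 5, 3]}


def raw(s, t, obs):
    """Raw score of taking transition s -> t while observing obs."""
    return SCORES[(s, t)].get(obs, 0.0)


def steps(path, word):
    """Pair each transition on the path with the letter observed on it."""
    return zip(zip(path, path[1:]), word)


def local_prob(path, word):
    """Per-state normalization: each state's outgoing scores sum to 1."""
    p = 1.0
    for (s, t), obs in steps(path, word):
        successors = [t2 for (s2, t2) in SCORES if s2 == s]
        z = sum(math.exp(raw(s, t2, obs)) for t2 in successors)
        p *= math.exp(raw(s, t, obs)) / z
    return p


def global_prob(path, word):
    """Whole-path normalization: path scores compete against all paths."""
    def total(pth):
        return sum(raw(s, t, obs) for (s, t), obs in steps(pth, word))
    z = sum(math.exp(total(pth)) for pth in PATHS.values())
    return math.exp(total(path)) / z


if __name__ == "__main__":
    for word in ("rib", "rob"):
        for name, path in PATHS.items():
            print(word, name, local_prob(path, word), global_prob(path, word))
```

With these made-up numbers, the per-state version gives the upper
branch the same probability for ``rib'' and for ``rob'', while the
whole-path version prefers the lower branch once it sees the ``o''. Is
that the effect the authors mean?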