TOPIC:

Information extraction with HMMs and shrinkage

8
Dave Kauchak
05-14-2001
06:45 PM ET (US)
Oops... me too!
7
Hector Jasso
05-14-2001
02:31 PM ET (US)
My comments below will make more sense if you consider that I
read the wrong paper! The one I read is "Information Extraction with
HMM Structures Learned by Stochastic Optimization" by the same
authors.

The paper describes how to learn the HMM structures through a hill-climbing algorithm...
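The hill-climbing search described here can be sketched generically: start from a simple structure, score every candidate modification, move to the best one, and stop when nothing improves. This is a minimal sketch, not the paper's procedure; `neighbors` and `score` are hypothetical placeholders, and the toy problem below simply climbs over integers standing in for "number of prefix states".

```python
from typing import Callable, Iterable, Optional, TypeVar

S = TypeVar("S")


def hill_climb(
    start: S,
    neighbors: Callable[[S], Iterable[S]],
    score: Callable[[S], float],
    max_steps: int = 100,
) -> S:
    """Greedy hill climbing: move to the best-scoring neighbor until none improves."""
    current, current_score = start, score(start)
    for _ in range(max_steps):
        best: Optional[S] = None
        best_score = current_score
        for candidate in neighbors(current):
            candidate_score = score(candidate)
            if candidate_score > best_score:  # accept strict improvements only
                best, best_score = candidate, candidate_score
        if best is None:  # local optimum: no neighbor scores higher
            break
        current, current_score = best, best_score
    return current


# Hypothetical toy problem: the "structure" is just a count of prefix states,
# and the made-up score peaks at 4.
def toy_score(n: int) -> float:
    return -float((n - 4) ** 2)


def toy_neighbors(n: int) -> list:
    # add or remove one state, keeping at least one
    return [m for m in (n - 1, n + 1) if m >= 1]


print(hill_climb(1, toy_neighbors, toy_score))  # 4
```

Each step scores every neighbor, so the cost is roughly (number of candidate modifications) × (cost of one evaluation, presumably training and testing an HMM in the paper's setting), which is the computational concern Hector raises about larger domains.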
6
Gyozo Gidofalvi
05-14-2001
01:06 PM ET (US)
Although the purpose of the shrinkage technique was clear after the first pass through the paper, like Sameer, I had problems understanding the actual process of shrinkage. Now it is clear that the task was to learn the appropriate lambda values used in equation (2) for a fixed hierarchy.
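As a rough illustration of what those lambda values control (a sketch, not the paper's code): a state's word distribution is replaced by a λ-weighted mixture of the estimates at the state and at each of its ancestors in the fixed hierarchy. The assumptions here are a three-level path (the state itself, a hypothetical parent pooling related states, and a uniform distribution at the root) and made-up numbers; the weights are fixed by hand rather than learned.

```python
from typing import Dict, List

Dist = Dict[str, float]


def shrink(path: List[Dist], lambdas: List[float], vocab: List[str]) -> Dist:
    """Mix the estimates along a state's path to the root: P(w) = sum_i lambda_i * P_i(w)."""
    assert len(path) == len(lambdas)
    assert abs(sum(lambdas) - 1.0) < 1e-9, "mixture weights must sum to 1"
    return {w: sum(lam * p.get(w, 0.0) for lam, p in zip(lambdas, path)) for w in vocab}


vocab = ["room", "hall", "the", "pm"]
state = {"room": 1.0}                           # sparse estimate from one state's training words
parent = {"room": 0.5, "hall": 0.5}             # hypothetical pooled estimate one level up
uniform = {w: 1.0 / len(vocab) for w in vocab}  # root of the hierarchy
smoothed = shrink([state, parent, uniform], [0.6, 0.3, 0.1], vocab)
print(smoothed)
```

Words the state never saw (`"the"`, `"pm"`) now receive nonzero probability, while `"room"` keeps most of the mass; choosing the λ's from data is the part the paper's EM step handles.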

I'm also looking forward to the explanation of the set H_j in eqn (3); that was a bit fuzzy.

Although one could deduce the symbols used in the representation of the HMMs (figures 1 and 2), a single definition for each symbol would have made reading more enjoyable.

Although taking the harmonic mean of the precision and recall values gave a clearer view, this evaluation metric may not be appropriate or commonly used in similar work. Furthermore, I agree with Greg's comment that the results were not conclusive and did not identify a general structure/hierarchy that performed best across all data sets.
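The harmonic mean of precision and recall is usually called the F1 score, F1 = 2PR / (P + R). A minimal sketch computing all three from extraction counts; the counts are invented for illustration.

```python
from typing import Tuple


def precision_recall_f1(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float, float]:
    """Precision, recall, and their harmonic mean (F1) from extraction counts."""
    predicted = true_pos + false_pos  # fields the extractor returned
    actual = true_pos + false_neg     # fields present in the gold annotations
    p = true_pos / predicted if predicted else 0.0
    r = true_pos / actual if actual else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Hypothetical counts: 60 correct extractions, 20 spurious, 40 missed.
p, r, f1 = precision_recall_f1(60, 20, 40)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.750 R=0.600 F1=0.667
```

Because the harmonic mean is pulled toward the smaller of the two values, it is lower here than the arithmetic mean (0.675), so a system cannot score well by maximizing only precision or only recall.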

Although I'm not familiar with the HMM literature, I found the comment in the conclusion section about the availability of the Viterbi algorithm for HMMs quite amusing. I hardly believe that is a new technique that needs to be pointed out in any way.
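For readers less familiar with HMMs: the Viterbi algorithm is the standard dynamic program for finding the single most likely state sequence for an observed token sequence; in extraction, the tokens assigned to target states are the extracted field. A minimal log-space sketch; the two-state model and all of its probabilities are invented for illustration and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple


def viterbi(
    tokens: List[str],
    states: List[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Most likely state sequence for `tokens` and its log probability."""

    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log prob of the best path that ends in state s at position t
    best = [{s: log(start.get(s, 0.0)) + log(emit[s].get(tokens[0], 0.0)) for s in states}]
    back: List[Dict[str, str]] = []  # back[t-1][s]: predecessor of s on that best path
    for tok in tokens[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + log(trans[r].get(s, 0.0)))
            scores[s] = best[-1][prev] + log(trans[prev].get(s, 0.0)) + log(emit[s].get(tok, 0.0))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the pointers back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1], best[-1][last]


# Hypothetical two-state model: background words vs. a location field.
states = ["background", "target"]
start = {"background": 0.9, "target": 0.1}
trans = {
    "background": {"background": 0.7, "target": 0.3},
    "target": {"background": 0.4, "target": 0.6},
}
emit = {
    "background": {"the": 0.5, "seminar": 0.4, "wean": 0.05, "hall": 0.05},
    "target": {"the": 0.05, "seminar": 0.05, "wean": 0.45, "hall": 0.45},
}
path, logp = viterbi(["the", "seminar", "wean", "hall"], states, start, trans, emit)
print(path)  # ['background', 'background', 'target', 'target']
```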
5
sameer agarwal
05-14-2001
11:10 AM ET (US)
Hello,
Even after reading the paper a number of times, I find large holes in my understanding of what is really going on.
My principal issue is with the greedy search used to construct the model structure. I am not sure if a greedy search is the proper way to construct these models.

Also, I do not understand how the "shrinkage" works. I understand the smoothing a bit, but the authors talk about hierarchical Bayes. Does it simply mean smoothing based on the parent node, or is it more?


sameer
4
Dave Kauchak
05-14-2001
04:08 AM ET (US)
The paper specifies a couple of restrictive characteristics for some of its models. I would be curious how things would change if these restrictions were weakened or removed. For example, the paper presents HMMs that have only four types of states in a specific configuration. Although this seems like an intuitive system/configuration, I would be curious to see the performance of other document models. My intuition is that the model defined is fairly robust; however, some complex documents, fields, or domains may require a more complex model.
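To make this kind of restriction concrete, here is a sketch of one possible topology, assuming the four state types are background, prefix, target, and suffix, with prefix and suffix states chained in order and a return to background afterwards. The chain lengths and the fully connected target states are illustrative assumptions, not the paper's exact configuration; the code builds only the set of allowed transitions (the structure), not the learned probabilities.

```python
from typing import Dict, List, Set


def build_topology(n_prefix: int, n_target: int, n_suffix: int) -> Dict[str, Set[str]]:
    """Allowed transitions: background -> prefix chain -> targets -> suffix chain -> background."""
    prefix = [f"prefix{i}" for i in range(n_prefix)]
    target = [f"target{i}" for i in range(n_target)]
    suffix = [f"suffix{i}" for i in range(n_suffix)]
    edges: Dict[str, Set[str]] = {s: set() for s in ["background", *prefix, *target, *suffix]}

    def chain(states: List[str]) -> None:
        for a, b in zip(states, states[1:]):
            edges[a].add(b)

    edges["background"].update({"background", prefix[0]})  # stay in background or start a prefix
    chain(prefix)
    edges[prefix[-1]].update(target)     # the last prefix state may enter any target state
    for t in target:
        edges[t].update(target)          # target states fully connected (assumption)
        edges[t].add(suffix[0])          # ...and exit into the suffix chain
    chain(suffix)
    edges[suffix[-1]].add("background")  # return to background after the suffix
    return edges


topology = build_topology(n_prefix=2, n_target=2, n_suffix=2)
for state, successors in topology.items():
    print(state, "->", sorted(successors))
```

Loosening the restriction would mean permitting edges outside this pattern (or more state types), which enlarges the space of structures that has to be searched or hand-designed.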
3
Greg Hamerly
05-14-2001
02:23 AM ET (US)
Hector, the paper describes a learning algorithm for smoothing probability estimates of vocabularies within HMM states. The structures, then, are fixed and not learned. What is learned are the HMM state vocabularies, transition probabilities, and mixture parameters for shrinkage.
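When every training token carries its state label, the vocabulary (emission) and transition estimates are just normalized counts. A minimal sketch under that simplifying assumption (one state per label; with several unlabeled states per label, EM would be needed instead); the two tagged sentences are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Tagged = List[Tuple[str, str]]  # (token, state) pairs


def estimate(sentences: List[Tagged]):
    """Maximum-likelihood emission and transition estimates from fully labeled sequences."""
    emit_counts: Dict[str, Counter] = defaultdict(Counter)
    trans_counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        for token, state in sent:
            emit_counts[state][token] += 1
        for (_, a), (_, b) in zip(sent, sent[1:]):
            trans_counts[a][b] += 1

    def normalize(counts: Dict[str, Counter]) -> Dict[str, Dict[str, float]]:
        out = {}
        for s, c in counts.items():
            total = sum(c.values())
            out[s] = {k: v / total for k, v in c.items()}
        return out

    return normalize(emit_counts), normalize(trans_counts)


data = [
    [("in", "prefix"), ("wean", "target"), ("hall", "target"), ("at", "suffix")],
    [("in", "prefix"), ("baker", "target"), ("hall", "target"), ("on", "suffix")],
]
emit, trans = estimate(data)
print(emit["target"])   # {'wean': 0.25, 'hall': 0.5, 'baker': 0.25}
print(trans["target"])  # {'target': 0.5, 'suffix': 0.5}
```

Any word a state never saw gets probability zero in these raw estimates, which is the sparsity the shrinkage mixture parameters are meant to smooth over.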

While you're right that simple HMM structures may not be good at extracting from very differently structured sources, they can still learn a large vocabulary from different sources whose documents are similarly structured.

I hope to make clear some of the confusions tomorrow in my talk. Something I'm still a bit unclear on is what, exactly, is in the holdout set H_j they discuss in the section on EM. More detail would be useful here. However, I believe H_j is simply the words that have been annotated as having label "j" in the training documents (i.e. j = target, non-target, etc.).
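The λ-fitting step can be sketched as the usual EM update for mixture weights over fixed component distributions: each held-out word's probability is split among the hierarchy levels in proportion to λ_i P_i(w) (E-step), and each λ_i is reset to its level's share of the total (M-step). Following the reading above, the sketch treats H_j as a plain list of held-out words for state j; the components and words are invented, and this is a generic illustration rather than the paper's code.

```python
from typing import Dict, List

Dist = Dict[str, float]


def fit_lambdas(components: List[Dist], heldout: List[str], iters: int = 50) -> List[float]:
    """EM for mixture weights over fixed component distributions, using held-out words only."""
    k = len(components)
    lambdas = [1.0 / k] * k  # start from uniform weights
    for _ in range(iters):
        resp_totals = [0.0] * k
        for w in heldout:
            weighted = [lam * comp.get(w, 0.0) for lam, comp in zip(lambdas, components)]
            z = sum(weighted)
            if z == 0.0:
                continue  # no component covers w; a uniform root level normally prevents this
            for i in range(k):
                resp_totals[i] += weighted[i] / z  # E-step: responsibility of level i for w
        total = sum(resp_totals)
        if total == 0.0:
            break
        lambdas = [r / total for r in resp_totals]  # M-step: normalize expected counts
    return lambdas


vocab = ["room", "hall", "wean", "baker"]
state = {"room": 1.0}  # sparse estimate from the state's own training words
parent = {"room": 0.4, "hall": 0.3, "wean": 0.2, "baker": 0.1}  # hypothetical parent level
uniform = {w: 1.0 / len(vocab) for w in vocab}
heldout = ["room", "hall", "wean", "room"]
print(fit_lambdas([state, parent, uniform], heldout))
```

Held-out words matter here: on the state's own training words, its unsmoothed estimate already maximizes the likelihood, so EM would push nearly all the weight onto it.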

Overall I think the method is interesting and fairly clear; what is really needed though is more conclusive empirical results. All the results leave one wondering what is really the best choice for HMM information extraction.
2
Joe Drish
05-14-2001
12:06 AM ET (US)
In general I thought this was a good paper that was well-written. Their results confirm what they were trying to show, which was that shrinkage improves information extraction performance generally. Also, they provide a sufficient level of detail such that if one wanted to replicate their work it would be easy to do so.

Even though they motivate the usage of shrinkage well, one issue that I had is that they mention twice that shrinkage is "provably optimal" under appropriate conditions, but do not (as far as I can tell) provide a reference or explanation (much less a proof) of its optimality.
Edited 05-14-2001 12:06 AM
1
Hector Jasso
05-13-2001
09:49 PM ET (US)
Looking at figure 2, it is not clear to me exactly what kind of
structures the hill-climbing algorithm is constructing. I understand
they are only part of the learned structures, but

a) They seem too "specialized" to work for other kinds of announcements:
the training and testing sets are all from a single university.

b) If the algorithm was trained in a less restricted environment
where larger structures are needed (for example, hundreds of
different hall names, thousands of numbers), would the algorithm
be able to properly learn the corresponding structures? Or would it
be too computationally expensive to do hill climbing?

Since there is structure in the seminar announcements, then the learned
structures should be concise and understandable. But if the learned
structures grow indiscriminately as the variety of test cases grows,
then I would not be sure whether this approach is correct.