QuickTopic (SM) free message boards QuickTopic (SM) free message boards
Skip to Messages
  Sign In to access your topic list  |New Topic |My Topics|Profile
Upgrade to Pro   Customize, show pictures, add an intro, and more:   QuickTopic Pro...and check out QuickThreadSM
Topic: CSE 250B Fall 2007
Printer-Friendly Page
All messages    << 145-160  129-144 of 161  113-128 >>
About these ads
Who | When
Messagessort recent-top    (not accepting new messages)
Charles Elkan  129
12-03-2007 10:58 PM ET (US)
<i>
By placing hyphens in novel english words correctly do you mean
breaking the word into its syllables?
<\i>
Generally a hyphen is only allowed between syllables, but some syllable breaks are not desirable hyphen places. For example, you don't want a hyphen after the first letter of "asymptomatic."

The task here is shallow: just to learn to predict hyphen locations as they are found in the training data. Syllabification is a deeper, related, task.
Matt Jacobsen  130
12-04-2007 01:57 AM ET (US)
I'm a bit confused about what function to use with Carl Rasmussen's non-linear optimizer. The example shows it being used with the Rosenbrock function which returns the function value, the partial derivatives, and the Hessian of the (general dimension) function. I'm guessing however that we only need to supply it with a function that returns the function value and the partial derivatives, right?

Then what might this function be? Would the Resenbrock function be appropriate? Or should we write one? Perhaps one that calculates the log likelihood? But what would the partial derivatives be?

I'm a bit lost on this. Any help would be appreciated.

Thanks,
Matt
Charles Elkan  131
12-04-2007 01:40 PM ET (US)
Edited by author 12-04-2007 01:41 PM
I'm a bit confused about what function to use with Carl Rasmussen's non-linear optimizer. The example shows it being used with the Rosenbrock function which returns the function value, the partial derivatives, and the Hessian of the (general dimension) function. I'm guessing however that we only need to supply it with a function that returns the function value and the partial derivatives, right?

Correct. Use checkgrad.m to verify that the partial derivatives you supply are correct.


Then what might this function be? ... should we write one? Perhaps one that calculates the log likelihood? But what would the partial derivatives be?

Yes, you should supply the log-likelihood as your function to maximize. The partial derivatives are given in my lecture notes on page 7.
Stefano  132
12-05-2007 03:14 PM ET (US)
I have a question about computing the partition function.

The matrix multiplications of M_t(v,w), are for n different m-x-m matrices. m is "... the cardinality of the set of labels." So, for this project it should be 2 since the each label of a character is binary, correct?. This creates 2x2 matrices for the multiplications, which seem to work out for the definition as g_t(y_{i-1},y_i).

I don't quite think my understanding of this is correct since for a feature function with different arity like g_t(y_{i-1}, y_i, y_{i+1}), this doesn't work. How are different arity feature functions dealt with, and what is the correct value for m?
Kristen Jaskie  133
12-05-2007 05:32 PM ET (US)
Deleted by author 12-05-2007 08:19 PM
Charles Elkan  134
12-05-2007 10:12 PM ET (US)

m is "... the cardinality of the set of labels." So, for this project it should be 2 since the each label of a character is binary, correct?. This creates 2x2 matrices for the multiplications

Yes, because there are just two labels: 0 for no hyphen, and 1 for hyphen. If you also have labels "begin" and "end" then m=4.


I don't quite think my understanding of this is correct since for a feature function with different arity like g_t(y_{i-1}, y_i, y_{i+1}), this doesn't work. How are different arity feature functions dealt with, and what is the correct value for m?

The arity has nothing to do with m, the number of different label values possible at a single position. The restriction to arity = 2 is fundamental to the CRFs we are studying. The reason is precisely that the Viterbi and matrix algorithms only apply when arity = 2.
Charles Elkan  135
12-05-2007 10:24 PM ET (US)
Edited by author 12-05-2007 10:25 PM
how to represent feature-functions in matlab

Given fixed xbar and ybar, each F_i value is a number. So you can represent all F_i as one vector for xbar and ybar. This vector is sparse since F_i(xbar,ybar) = 0 for most i. The F_i vector can be computed once and for all for each training example, before you start training.

How you compute the F_i vectors, i.e. how you go from definitions like f(y_i-1, y_i, xbar, i) = I(y_i = 1)*I(<x_i-1 x_i x_i+1 x_i+1> = vowel-consonant-vowel-consonant) to the numerical values, is a separate question. You may want to do this preprocessing in a scripting language.
Charles Elkan  136
12-07-2007 01:38 AM ET (US)
I have updated the online lecture notes on CRFs, which are here: http://www-cse.ucsd.edu/users/elkan/250B/loglinear.pdf

Also, for tutorials on CRFs see
http://www-cse.ucsd.edu/~elkan/254/papers.html
Jostein Oeysaed  137
12-08-2007 04:57 PM ET (US)
Hi Charles,

I'm feeling kind of frustrated because this last project seems much more difficult than the previous ones, and you didn't go over the stuff we needed to know until Thursday. My partner and I started early and were trying to do it before, but haven't made any progress till now since you hadn't gone over what we needed to know. We still have a LONG way to go (it's pretty confusing stuff) and now we only have a couple of days to do the entire thing in. The problem is that I have several other finals and projects next week, so I just don't know if we'll have time to do a good job AND to study for my other classes.

I think the week we had off due to the fires messed up your schedule, but I don't think it's fair for you to make us work like this during finals week. We really didn't have the material that we needed to do the project until the very last day of class.

It would help if you could be a little bit easier in grading this project, due to the lack of time we have to do it (like how you "graded" the first project :)
Charles Elkan  138
12-08-2007 06:33 PM ET (US)
I appreciate that the last project is the most complex one, and that many people have multiple other assignments due also. I will certainly be generous in scoring. Let me repeat also that scoring is not grading, so lowish scores can still lead to a good grade. The projects and exams are challenging!

I have been covering log-linear models and CRFs since November 27. I understand that many people may not have been able to focus on the project until now. I will try to be very available to answer questions on this message board.
Charles Elkan  139
12-09-2007 12:05 PM ET (US)
Project simplifications

Here are some suggestions and simplifications that should make the project easier.

(1) Implement perceptron training first. With this method, you do not need to compute the partition function, and you do not need to compute derivatives. All you need is the argmax.

(2) If you assume that no hyphen is allowed after the first letter or before the last letter, then you don't need START and STOP tags, so the cardinality of the tag set is just 2. With this cardinality, you can compute the argmax, Z, and derivatives all by brute force (i.e. enumerating all possible tag sequences). Implement this first and use it to do the project. Then, if you have time, use it to check the correctness of your code that uses sophisticated algorithms.

(3) Your priorities should be clearness, correctness, and usefulness. A clear, correct, useful method is far more desirable than a fancy method whose correctness is not clear and whose results are questionable.

(4) There is an 80-20 rule in everything. Best performance might come from 10,000 feature-functions but you can get almost as good performance with many fewer. Start with few enough so that your code is fast enough that you can debug it easily.

(5) Perception is reality. First impressions matter. Presentation counts. A good report about a good method is better than a bad report about a method that might be very good.

(6) Many problems are too hard to solve, except with simplifications. Having the judgment to invent potential simplifications, and then to select reasonable ones, is a very important skill.

In summary, I will be happy to give full marks to a good report about a method that clearly does solve the problem, even if the method itself is simplified. Specifically, you can get full marks based on just just one training method.
Charles Elkan  140
12-09-2007 05:25 PM ET (US)
Reminder: The final exam will be on Thursday at 3pm, in WLH 2208, which is the same room that lectures were in.
Jerry Fu  141
12-10-2007 11:30 PM ET (US)
I have a questions about feature functions, specifically how the lecture notes example pertains to our project. the function U(k,v) maximizes over the tags y at each step, which are either 1 or 0 in our project. However, a feature function that examines only vowel consonant patterns and hyphenates a particular letter in that sequence does not change based on the previous tag. so when performing the Viterbi algorithm, the value of this feature function would be multiplied by its weight and added in , correct?

If we have a feature function that examines the previous tag, we can precalculate all possible tag combinations in preprocessing, and then operate on these vectors instead of calculating these on the fly.
Charles Elkan  142
12-11-2007 12:15 AM ET (US)
a feature function that examines only vowel consonant patterns and hyphenates a particular letter in that sequence does not change based on the previous tag. so when performing the Viterbi algorithm, the value of this feature function would be multiplied by its weight and added in , correct?

Yes.

If we have a feature function that examines the previous tag, we can precalculate all possible tag combinations in preprocessing, and then operate on these vectors instead of calculating these on the fly.
Yes.
Charles Elkan  143
12-11-2007 03:11 PM ET (US)
Some additional notes on what to put in your report

Make sure you yourself, and the reader, are convinced that your algorithms and your implementation are correct. We want to be able to believe and trust your experimental results.

In the report, make it very clear exactly what feature-functins you use. For each different type of feature-function, explain it explicitly and say how many feature-functions there are of this type.

Examine and discuss your final trained CRF model. Tabulate which feature-functions have the largest learned weights (positive or negative). Use your intuition to explain, in retrospect, why these feature-functions have the most predictive power.
Kristen Jaskie  144
12-11-2007 11:45 PM ET (US)
My partner and I have been working super hard to finish this project early because we both have other deadlines and exams this week. We finished the code (using the Viterbi algorithm) this weekend but have spent the two days since trying to run the experiments. Our problem is that we keep running out of memory (even on my partner's brand new laptop optimized for performance with enhanced virtual memory in the control panel). We tried cutting down our number of feature functions to a minimum and performing garbage collection wherever possible. But while we can run the code just fine if we use 1% of the data for training, attempting to use even 5% causes the program to fail due to lack of memory. We have spent WAY too much time on this problem and are unsure how to proceed. We obveously cannot do 5-way crossvalidation. Should we just attempt 100-way crossvalidation (using 1% at a time to train and the rest to test?) We started early but we're down to the nub now and don't know what to do. Any suggestions?
RSS link What's this?
All messages    << 145-160  129-144 of 161  113-128 >>
QuickTopicSM message boards
Over 200,000 topics served
Learn more Frequently asked questions  Acknowledgements
What they're saying about QuickTopic
 Questions, comments, or suggestions? Contact Us
Read our use policy before beginning. We value your privacy; please read our privacy statement.
Copyright ©1999-2008 Internicity Inc. All rights reserved.