QuickTopic (SM) free message boards
Topic: Levenberg-Marquardt
Views: 1292, Unique: 621 
Subscribers: 3
All messages            1-8 of 8        
Robin Hewitt  1
10-02-2004 04:48 PM ET (US)
These papers are very clear presentations of the LM method; thanks for that! They did leave me with a question, though: how dependent is the success of this method on the quadratic-approximation assumption? Put differently, what happens when that assumption isn't a good one? As a follow-on, is this ever an issue in practice?
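As a rough illustration of how the method reacts when the quadratic model is poor: each trial step comes from the damped quadratic (Gauss-Newton) model, and the *10 or /10 rule mentioned in the papers decides whether to trust it. If the error goes up, the step is rejected and λ grows, pushing the update toward a short, gradient-like step; if the error goes down, the step is kept and λ shrinks toward the Gauss-Newton step. This is a minimal sketch, not the papers' code: the test problem (Rosenbrock's function written as two residuals), the starting point, the initial λ, the stopping tolerances and the function names are all assumptions chosen for illustration.

```python
import numpy as np


def residuals(w):
    # Rosenbrock's function as least squares: E(w) = 0.5 * ||r(w)||^2, minimum at (1, 1)
    return np.array([10.0 * (w[1] - w[0] ** 2), 1.0 - w[0]])


def jacobian(w):
    # Partial derivatives of each residual with respect to w[0] and w[1]
    return np.array([[-20.0 * w[0], 10.0], [-1.0, 0.0]])


def levenberg_marquardt(w, lam=1e-3, max_iter=500, tol=1e-10):
    w = np.asarray(w, dtype=float)
    r = residuals(w)
    err = 0.5 * r @ r
    for _ in range(max_iter):
        J = jacobian(w)
        H = J.T @ J   # Gauss-Newton approximation to the Hessian of E
        g = J.T @ r   # gradient of E
        if np.linalg.norm(g) < tol or lam > 1e15:
            break     # converged, or the damping has blown up
        # Step that minimizes the damped quadratic model (Marquardt's diag(H) form)
        step = -np.linalg.solve(H + lam * np.diag(np.diag(H)), g)
        r_new = residuals(w + step)
        err_new = 0.5 * r_new @ r_new
        if err_new < err:
            # Error went down: accept the step and move toward Gauss-Newton
            w, r, err = w + step, r_new, err_new
            lam /= 10.0
        else:
            # Error went up: reject the step and move toward (scaled) gradient descent
            lam *= 10.0
    return w, err


w_opt, err = levenberg_marquardt([-1.2, 1.0])
print(w_opt, err)
```

From this starting point the undamped Gauss-Newton step overshoots badly and increases the error, so the first few trial steps are rejected until λ is large enough; after that λ shrinks again and the iterates settle on (1, 1).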
Louka Dlagnekov  2
10-04-2004 02:36 AM ET (US)
These papers contain quite a mouthful of math!

Both papers claim that LM optimization outperforms gradient descent on medium-sized problems. What exactly does this mean? Would an ADALINE trained with the LM method instead of steepest descent perform better at approximating the function y=4x_1*x_2?

Also, does "medium-sized problems" mean ones with a relatively small number of weights?
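A small sketch of the kind of comparison asked about, assuming an ADALINE is a linear unit (weights on x_1, x_2 and a bias) trained on squared error. For such a model the error is exactly quadratic in the weights, so a single Gauss-Newton step (LM with λ near 0) reaches the least-squares optimum, while steepest descent only approaches it gradually. Because the unit is linear in its inputs, neither optimizer can represent y=4x_1*x_2 exactly; the sketch only compares how fast each reaches the best linear fit. The data, learning rate and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = 4.0 * X[:, 0] * X[:, 1]                # target from the question
A = np.column_stack([X, np.ones(len(X))])  # ADALINE inputs plus a bias input


def error(w):
    r = A @ w - y
    return 0.5 * r @ r


w0 = np.zeros(3)
H = A.T @ A  # exact Hessian: a linear model makes the error exactly quadratic

# One Gauss-Newton step (LM with lambda -> 0) lands on the least-squares optimum
w_gn = w0 - np.linalg.solve(H, A.T @ (A @ w0 - y))

# Ten steps of steepest descent with a fixed learning rate that is stable for this H
lr = 1.0 / np.linalg.eigvalsh(H).max()
w_gd = w0.copy()
for _ in range(10):
    w_gd -= lr * (A.T @ (A @ w_gd - y))

print(error(w_gn), error(w_gd))
```

The price of the faster step is forming and solving the n-by-n system in H: with a dense solver that costs on the order of n³ operations per iteration and n² memory for n weights, which is presumably part of why the papers describe LM as suited to hundreds of weights rather than thousands.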
Gary Tedeschi  3
10-04-2004 05:16 PM ET (US)
On the last page of his paper, Roweis discusses Marquardt's improvement to Levenberg's algorithm. The goal of the improvement is that "we should move further in the directions in which the gradient is smaller in order to get around the classic 'error valley' problem." I understand the goal, but I am not clear on how introducing diag(H) achieves it.
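One way to see the effect numerically, assuming the damped update uses (H + λ diag(H)) as discussed in this thread: for large λ the step in coordinate i is roughly -g_i / (λ H_ii), so coordinates with low curvature (small H_ii, as along the floor of an error valley) get longer steps. Levenberg's original H + λI gives roughly -g/λ regardless of curvature. The curvature matrix, gradient and λ below are made-up numbers.

```python
import numpy as np

# Hypothetical 2-weight problem: steep curvature along w1, a flat valley along w2
H = np.diag([100.0, 1.0])
g = np.array([1.0, 1.0])  # equal gradient components, for illustration
lam = 1e3                 # large damping, far from the Gauss-Newton regime

levenberg = -np.linalg.solve(H + lam * np.eye(2), g)
marquardt = -np.linalg.solve(H + lam * np.diag(np.diag(H)), g)

print(levenberg)  # roughly -g / lam: both coordinates move about equally
print(marquardt)  # the low-curvature coordinate moves 100x further
```

With a diagonal H like this one, H + λ diag(H) is just (1 + λ)H, so Marquardt's step is a shrunken Newton step; with a non-diagonal H it sits between that and the rescaled gradient.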
Sanjeev Kumar  4
10-05-2004 05:20 AM ET (US)
On the 4th page of the paper, Roweis says the linear approximation of f(w) is only valid near a minimum. I don't understand why (at least if we interpret "near" in the Euclidean-distance sense). I tried to reason it through along the following lines:

   Linear approximation of f(w) is valid in current neighborhood (1)
=> Quadratic approximation of E(w) is valid in current neighborhood (2)
=> we can reach minimum in 1 step (assuming exact validity) (3)
=> we are near minimum (4)

But implication (3) need not hold: it requires the additional condition that the neighborhood in (1) is large enough to contain the minimum point. Furthermore, implication (4) would require a different interpretation of "near".
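For reference, the chain from (1) to (3) can be written out for a sum-of-squares error. The notation (residual r, Jacobian J of f, a factor of 1/2 in E) is an assumption and may not match the paper exactly.

```latex
\begin{aligned}
&\text{(1)}\quad f(\mathbf{w}+\boldsymbol{\delta}) \approx f(\mathbf{w}) + J\boldsymbol{\delta},
  \qquad E(\mathbf{w}) = \tfrac{1}{2}\lVert \mathbf{y} - f(\mathbf{w}) \rVert^2 \\
&\text{(2)}\quad E(\mathbf{w}+\boldsymbol{\delta}) \approx \tfrac{1}{2}\lVert \mathbf{r} - J\boldsymbol{\delta} \rVert^2,
  \qquad \mathbf{r} = \mathbf{y} - f(\mathbf{w}) \\
&\text{(3)}\quad J^{\top}J\,\boldsymbol{\delta} = J^{\top}\mathbf{r}
  \;\Longrightarrow\; \boldsymbol{\delta} = -H^{-1}\nabla E(\mathbf{w}),
  \qquad H \approx J^{\top}J,\;\; \nabla E(\mathbf{w}) = -J^{\top}\mathbf{r}
\end{aligned}
```

The step in (3) is the exact minimizer of the quadratic model in (2); it minimizes E itself only if the linearization in (1) still holds at w + δ.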


One more (unrelated) question: some methods (e.g. Davidon-Fletcher-Powell) update the inverse of the Hessian matrix using the secant equation instead of recomputing it at every iteration, which can be very useful for large problems. Is there an equivalent way to update the inverse of (H + \lambda diag(H)) so that it can be used in the Levenberg-Marquardt algorithm?
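For reference, here is a sketch of the plain DFP inverse-Hessian update mentioned above; it does not by itself answer how to update the inverse of the damped matrix H + λ diag(H). Here s is the change in weights and y the change in gradient between two iterations; the function name and the quadratic test problem are hypothetical.

```python
import numpy as np


def dfp_inverse_update(H_inv, s, y):
    """Davidon-Fletcher-Powell update of a symmetric inverse-Hessian estimate.

    s = w_new - w_old (step), y = grad_new - grad_old (gradient change).
    The result satisfies the secant equation H_inv_new @ y == s.
    """
    Hy = H_inv @ y
    return H_inv + np.outer(s, s) / (s @ y) - np.outer(Hy, Hy) / (y @ Hy)


# Example on a quadratic E(w) = 0.5 * w^T A w, whose gradient change is exactly A @ s
A = np.array([[4.0, 1.0], [1.0, 3.0]])
s = np.array([0.5, -0.2])
y = A @ s
H_inv = dfp_inverse_update(np.eye(2), s, y)
print(H_inv @ y)  # matches s up to rounding: the secant equation holds
```

The update only needs matrix-vector products and outer products, so each iteration costs O(n²) instead of the O(n³) of re-inverting.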
Steve Scher  5
10-05-2004 08:41 AM ET (US)
I'd like to understand a bit better how to choose between the directions indicated by the gradient and by the Hessian. The papers are fairly silent on this, focusing on step size, except for stating that LM's *10 or /10 rule gives a good mix.

I'm a step behind Gary in message 3: of course second-order information can be a helpful addition to the gradient, but I don't understand why, in general, we want to move "in the directions in which the gradient is smaller". If we did that completely, we'd be moving orthogonally to the gradient. So it must mean deviating a bit from the gradient, in the direction indicated by the Hessian. But LM favors the modified-gradient direction only where the linearization breaks down; why is that a good region in which to modify the gradient according to the Hessian?

Regarding Louka's question in message 2 about what "medium-sized" means: both papers describe the limit as "hundreds of weights", with "thousands" being prohibitive.
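A small numeric sketch of how the single parameter λ trades the two directions off, using Levenberg's H + λI for simplicity: at λ = 0 the step is the Newton (Gauss-Newton) step, and as λ grows it turns toward the negative gradient while shrinking. The H and g values are made up; with Marquardt's diag(H) damping the large-λ limit is a per-coordinate rescaled gradient rather than the gradient itself.

```python
import numpy as np

H = np.array([[10.0, 3.0], [3.0, 1.0]])  # hypothetical positive-definite curvature
g = np.array([1.0, -1.0])                # hypothetical gradient


def lm_step(lam):
    # Levenberg's damping H + lam * I: lam = 0 gives the Newton / Gauss-Newton step
    return -np.linalg.solve(H + lam * np.eye(len(g)), g)


def cos_with_neg_gradient(step):
    # 1.0 means the step points exactly down the gradient
    return step @ (-g) / (np.linalg.norm(step) * np.linalg.norm(g))


for lam in [0.0, 0.1, 1.0, 10.0, 1000.0]:
    step = lm_step(lam)
    print(f"lambda={lam:g}  |step|={np.linalg.norm(step):.4f}  cos={cos_with_neg_gradient(step):.3f}")
```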
Rasit  6
10-05-2004 04:35 PM ET (US)
Edited by author 10-05-2004 08:53 PM
There was a slide about the perturbation near the end (in the applications section). Could you make your point about that slide again?

Also, some minor questions:
I don't quite understand how the Hessian is computed in practice; the paper says it is an approximation.

Also, wouldn't LM get stuck in local minima most of the time?
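On how the Hessian is computed: in the usual sum-of-squares setting, LM uses H ≈ JᵀJ, built from first derivatives of the model only. The exact Hessian of E = ½Σ(y_i − f_i(w))² is JᵀJ − Σ r_i ∇²f_i(w) with r_i = y_i − f_i(w), so the dropped term vanishes when the residuals are zero and is small when they are small. The sketch below checks this on a hypothetical exponential model with noiseless data, comparing JᵀJ against a finite-difference Hessian at the true parameters; the model, data and step size h are assumptions.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 20)
w_true = np.array([2.0, -1.5])
y = w_true[0] * np.exp(w_true[1] * x)  # noiseless data: residuals vanish at w_true


def model(w):
    # Hypothetical model f(x; w) = w0 * exp(w1 * x)
    return w[0] * np.exp(w[1] * x)


def jacobian(w):
    # Columns: df/dw0 and df/dw1 at every data point (first derivatives only)
    e = np.exp(w[1] * x)
    return np.column_stack([e, w[0] * x * e])


def grad(w):
    # Gradient of E(w) = 0.5 * sum((y - f(w))**2)
    return jacobian(w).T @ (model(w) - y)


def fd_hessian(w, h=1e-5):
    # Full Hessian of E by central differences of the analytic gradient
    cols = []
    for j in range(len(w)):
        dw = np.zeros_like(w)
        dw[j] = h
        cols.append((grad(w + dw) - grad(w - dw)) / (2.0 * h))
    return np.column_stack(cols)


J = jacobian(w_true)
H_gn = J.T @ J  # the approximation LM uses
H_fd = fd_hessian(w_true)
print(np.max(np.abs(H_gn - H_fd)))
```

JᵀJ is also symmetric positive semidefinite by construction, which is one practical reason to prefer it over the exact Hessian away from the minimum.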
 
Messages 7-8 deleted by topic administrator 07-21-2006 09:00 AM
Copyright ©1999-2008 Internicity Inc. All rights reserved.