Topic: Regularization theory and neural networks architectures
Dave Kauchak  4
10-04-2001 03:54 PM ET (US)
I have a quick question about the smoothing functions used. We've seen a number of different smoothing functions, both in the paper and in the presentation. Does anyone know whether much investigation has gone into the actual form of these functions? My intuition is that the choice of smoothing function will greatly affect the performance of the overall system. So far, it seems the functions have been selected because experience suggests they should work well.
Gyozo Gidofalvi (Victor)  3
10-04-2001 02:03 PM ET (US)
At first I found the paper very theoretical, and I did not clearly understand the intention behind deriving different networks and approximation schemes from the variational principle defined in equation (1). The later experimental results on two different types of problems (a 2-dimensional additive function and a 2-dimensional Gabor function) clearly showed that different networks and approximation schemes work better on one type of problem than on the other. The difference between these methods lies in the stabilizer used in equation (1), which represents different a priori assumptions about smoothness.

One useful lesson to take home from this paper, I think, is that trying just one of these methods on a particular problem may not be enough; careful thinking about the problem (a priori knowledge), however, may lead to a wise choice among them.
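
The point about stabilizers encoding different smoothness assumptions can be tried out numerically. The sketch below is only an illustration under stated assumptions, not the paper's experimental setup: it fits a regularization network of the form f(x) = sum_i c_i G(x, x_i) with either a radial Gaussian basis function or an additive one (a sum of 1-D Gaussians), on an additive target and on a Gabor-like target. The target functions, `beta`, `lam`, the sample sizes, and all function names are hypothetical choices made for this example.

```python
import numpy as np

def radial_gram(X, Z, beta=4.0):
    # Radial Gaussian basis function: G(x, z) = exp(-beta * ||x - z||^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-beta * d2)

def additive_gram(X, Z, beta=4.0):
    # Additive basis function: a sum of 1-D Gaussians, one per input coordinate
    d2 = (X[:, None, :] - Z[None, :, :]) ** 2   # shape (n, m, d)
    return np.exp(-beta * d2).sum(axis=-1)

def fit_predict(gram, X_train, y_train, X_test, lam=1e-3):
    # Solve (G + lam * I) c = y, then evaluate f(x) = sum_i c_i G(x, x_i)
    G = gram(X_train, X_train)
    c = np.linalg.solve(G + lam * np.eye(len(X_train)), y_train)
    return gram(X_test, X_train) @ c

def additive_target(X):
    # Illustrative additive function (assumed for this sketch)
    return np.sin(2 * np.pi * X[:, 0]) + 4 * (X[:, 1] - 0.5) ** 2

def gabor_target(X):
    # Illustrative Gabor-like function (assumed for this sketch)
    return np.exp(-(X ** 2).sum(axis=1)) * np.cos(0.75 * np.pi * (X[:, 0] + X[:, 1]))

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1, size=(30, 2))
X_test = rng.uniform(0, 1, size=(500, 2))

targets = {"additive": additive_target, "gabor": gabor_target}
kernels = {"radial": radial_gram, "additive": additive_gram}

errors = {}
for tname, target in targets.items():
    for kname, gram in kernels.items():
        pred = fit_predict(gram, X_train, target(X_train), X_test)
        errors[(tname, kname)] = float(np.mean((pred - target(X_test)) ** 2))

for key, mse in errors.items():
    print(key, mse)
```

Comparing the four test errors shows how much the match between basis function and target matters on a given sample; the numbers depend on the assumed settings above.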
sameer agarwal  2
10-04-2001 02:01 PM ET (US)
Edited by author 10-04-2001 02:03 PM
Apart from the fact that the paper makes fairly heavy reading, I have just one comment and one question.

The idea that we can unify various kinds of regression through an appropriate choice of regularization functional is very elegant indeed. I especially like the characterization of the solution in terms of the basis function G and its null space.

Which brings up the question: what is a semi-norm?
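
For reference, the standard textbook definition (not quoted from the paper; the symbols V, f, g, and the second-derivative example are chosen here for illustration):

```latex
A semi-norm on a vector space $V$ is a map $\|\cdot\| : V \to [0, \infty)$ such that,
for all $f, g \in V$ and all scalars $\alpha$,
\[
  \|\alpha f\| = |\alpha| \, \|f\| ,
  \qquad
  \|f + g\| \le \|f\| + \|g\| .
\]
A norm additionally requires $\|f\| = 0 \Rightarrow f = 0$; a semi-norm may vanish
on nonzero elements, and the set $\{ f : \|f\| = 0 \}$ is its null space.

Example: for functions on an interval $[a, b]$,
\[
  \|f\|_{\phi} = \left( \int_a^b \bigl( f''(x) \bigr)^2 \, dx \right)^{1/2}
\]
is a semi-norm whose null space is the set of linear functions $f(x) = c_0 + c_1 x$.
```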

Also, it's interesting that the authors describe the radial basis functions as having a basis function that results in a proper norm instead of a semi-norm, so that the null space contains only the zero element. But this comes at the cost of adding another adjustable parameter, "beta". I am curious about something the authors do not address:

Is it a general phenomenon that basis functions that result in a norm always end up adding one or more adjustable parameters to the choice of functions?

Also, how much of a saving is it anyway, since they talk about choosing the appropriate beta with a technique like cross-validation? That is surprising, since the whole point of the exercise is to have a theoretical basis for choosing good regression estimators rather than relying on empirical techniques like cross-validation.
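
To make the cross-validation step concrete, here is a minimal sketch of choosing beta by leave-one-out cross-validation for a Gaussian regularization network. It assumes a basis function of the form G(x, z) = exp(-beta * (x - z)^2), a fixed regularization weight `lam`, a synthetic 1-D data set, and a hand-picked grid of candidate values; none of these settings come from the paper, and all names are hypothetical.

```python
import numpy as np

def gaussian_gram(X, Z, beta):
    # Gaussian basis function on 1-D inputs: G(x, z) = exp(-beta * (x - z)^2)
    d2 = (X[:, None] - Z[None, :]) ** 2
    return np.exp(-beta * d2)

def fit(X, y, beta, lam):
    # Coefficients of f(x) = sum_i c_i G(x, x_i), from (G + lam * I) c = y
    G = gaussian_gram(X, X, beta)
    return np.linalg.solve(G + lam * np.eye(len(X)), y)

def loo_mse(X, y, beta, lam):
    # Leave-one-out: refit without point i, then predict point i
    sq_errs = []
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        c = fit(X[keep], y[keep], beta, lam)
        pred = gaussian_gram(X[i:i + 1], X[keep], beta) @ c
        sq_errs.append((pred[0] - y[i]) ** 2)
    return float(np.mean(sq_errs))

# Synthetic noisy samples of a smooth function (assumed for illustration)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 40))
y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(40)

betas = [0.1, 1.0, 10.0, 100.0, 1000.0]   # candidate widths (assumed grid)
lam = 1e-2                                # fixed regularization weight (assumed)

scores = {beta: loo_mse(X, y, beta, lam) for beta in betas}
best_beta = min(scores, key=scores.get)

print(scores)
print("selected beta:", best_beta)
```

The explicit refit per held-out point keeps the sketch easy to check; for larger data sets one would normally use a closed-form leave-one-out shortcut instead.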

Also, isn't the choice of the form of the prior P[f], which explicitly uses the smoothness functional in its expression, a bit forced?
Anand  1
10-04-2001 12:50 AM ET (US)
I am curious to explore the connection between this paper and regularized versions of the EM algorithm. The EM algorithm is employed to obtain the maximum likelihood estimate of the mixture model parameters that best explain the unknown probability density of the given data. Loosely speaking, a Gaussian mixture model approximates an unknown probability density function that is available to us only in the form of examples.

Sometimes we have a priori knowledge about the pdf of the source. In such cases, a prior is usually assumed on the mixture model parameters, and the maximum a posteriori estimate is obtained recursively using EM. This connects to the correspondence between Bayesian networks and regularization networks pointed out in the paper.
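
The correspondence can be written out in a few lines. This is a sketch under assumed notation, not a quotation of the paper's equations: Gaussian noise of variance sigma^2 on the targets, a prior of the form P[f] proportional to exp(-alpha * phi[f]) with phi the smoothness functional, and theta standing for the mixture parameters in the EM setting.

```latex
% MAP estimation as penalized maximum likelihood (mixture-model setting)
\[
  \hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta} \bigl[ \log p(X \mid \theta) + \log p(\theta) \bigr]
\]

% The same structure for function estimation from data D = \{(x_i, y_i)\}_{i=1}^{N}
\[
  P[f \mid D] \propto P[D \mid f] \, P[f],
  \qquad
  P[D \mid f] \propto \exp\!\Bigl( -\tfrac{1}{2\sigma^2} \sum_{i=1}^{N} \bigl( y_i - f(x_i) \bigr)^2 \Bigr),
  \qquad
  P[f] \propto \exp\bigl( -\alpha \, \phi[f] \bigr)
\]

% Taking -2\sigma^2 \log of the posterior gives a regularization functional
\[
  \hat{f}_{\mathrm{MAP}}
  = \arg\min_{f} \Bigl[ \sum_{i=1}^{N} \bigl( y_i - f(x_i) \bigr)^2 + \lambda \, \phi[f] \Bigr],
  \qquad
  \lambda = 2 \sigma^2 \alpha
\]
```

In both cases the log-prior plays the role of the penalty term, which is why the two views look equivalent.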

In light of this paper, we may obtain alternative versions of EM by setting it in a regularization framework. However, I feel there is no specific advantage to doing so, since both approaches seem equivalent.

Nonetheless, the geometric interpretation in terms of kernel basis functions can shed more light on how assuming a prior influences the estimation.