

Regularization theory and neural networks architectures

Dave KauchakPerson was signed in when posted
03:54 PM ET (US)
I have a quick question about the smoothing functions used. We've seen a number of different smoothing functions, both in the paper and in the presentation. My question is: does anyone know whether much investigation has gone into the actual form of these functions? My intuition is that the choice of smoothing function will greatly affect the performance of the overall system. It seems that, so far, the functions selected have been ones that should work well based on experience.
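To make this concrete, here is a minimal sketch (my own toy example, not from the paper) showing that swapping the basis function in the regularization-network solution c = (G + lam*I)^{-1} y changes the fit even on identical data:

```python
import numpy as np

# Toy illustration: the same regularization-network solve with two
# different basis functions G gives noticeably different interpolants,
# because each kernel encodes a different smoothness assumption.

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)                     # sample locations
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)

def fit_predict(kernel, lam=1e-2):
    """Solve (G + lam*I) c = y, then predict on a dense grid."""
    G = kernel(x[:, None], x[None, :])        # Gram matrix G(x_i, x_j)
    c = np.linalg.solve(G + lam * np.eye(len(x)), y)
    xs = np.linspace(0, 1, 200)
    return kernel(xs[:, None], x[None, :]) @ c

gauss = lambda a, b: np.exp(-(a - b) ** 2 / (2 * 0.05 ** 2))   # smooth prior
laplace = lambda a, b: np.exp(-np.abs(a - b) / 0.05)           # rougher prior

f_gauss, f_laplace = fit_predict(gauss), fit_predict(laplace)
# The two stabilizers disagree between the data points:
print("max disagreement:", float(np.max(np.abs(f_gauss - f_laplace))))
```

The kernel widths and the value of lam here are arbitrary choices for the sketch, which is exactly the point of the question: nothing in the toy setup tells you which smoother to prefer.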
Gyozo Gidofalvi (Victor)
02:03 PM ET (US)
At first I found the paper very theoretical, and did not clearly understand the intention behind deriving different networks and approximation schemes from the variational principle defined in equation (1). The later experimental results on two different types of problems (a 2-dimensional additive function and a 2-dimensional Gabor function) clearly showed that different networks and approximation schemes work better on one type of problem than on the other, and that the difference between these methods lies in the stabilizer used in equation (1), which represents different a priori assumptions about smoothness.

One useful lesson to take home from this paper, I think, is that trying one of these methods on a particular problem may not be enough; careful thinking about the problem (a priori knowledge) may lead to a wise choice among these methods.
sameer agarwal
02:01 PM ET (US)
Apart from the fact that the paper makes for fairly heavy reading, I have a comment and a few questions.

The idea that we can unify various kinds of regressions through an appropriate choice of a regularization functional is very elegant indeed. I especially like the characterization of the solution in terms of the basis function G and its null space.

Which brings up the question: what is a semi-norm?
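For what it's worth, a semi-norm has all the properties of a norm except that a nonzero element may have zero length; a tiny sketch (my own toy example, not from the paper) of one on R^2:

```python
import numpy as np

# A semi-norm satisfies absolute homogeneity and the triangle inequality,
# but p(v) = 0 need not imply v = 0.  Toy example: p(v) = |v[0]| on R^2.

def p(v):
    return abs(v[0])

u, w = np.array([3.0, 1.0]), np.array([-1.0, 4.0])

assert p(2 * u) == 2 * p(u)                 # absolute homogeneity
assert p(u + w) <= p(u) + p(w)              # triangle inequality
assert p(np.array([0.0, 7.0])) == 0.0       # nontrivial null space
# Every vector (0, t) has "length" zero, analogous to the way the paper's
# smoothness stabilizers assign zero cost to the functions in the null
# space of the smoothness functional (e.g. low-order polynomials).
```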

Also, it's interesting that the authors talk about the radial basis functions as having a basis function which results in a proper norm instead of a semi-norm, resulting in a null space which only has zero entries. But this comes at the cost of adding another adjustable parameter, "beta". I am curious, and this is something the authors do not address:

is it a general phenomenon that, for basis functions that result in a proper norm, we will always end up adding one or more adjustable parameters to the choice of functions?

Also, how much of a saving is it anyway, since they talk about choosing the appropriate beta using a technique like cross-validation? This is surprising, since the whole point of the exercise is to have a theoretical basis for choosing good regression estimators, and not to have to rely on empirical techniques like cross-validation.
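A sketch of what that empirical step looks like in practice (toy 1-D data and a hypothetical grid of beta values, not from the paper):

```python
import numpy as np

# Choosing the Gaussian width "beta" by leave-one-out cross-validation:
# exactly the kind of empirical selection the theory was supposed to avoid.

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 25)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(25)

def loo_error(beta, lam=1e-2):
    """Mean squared leave-one-out error of the RBF network for one beta."""
    errs = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        xt, yt = x[keep], y[keep]
        G = np.exp(-(xt[:, None] - xt[None, :]) ** 2 / (2 * beta ** 2))
        c = np.linalg.solve(G + lam * np.eye(len(xt)), yt)
        pred = np.exp(-(x[i] - xt) ** 2 / (2 * beta ** 2)) @ c
        errs.append((pred - y[i]) ** 2)
    return float(np.mean(errs))

betas = [0.01, 0.05, 0.1, 0.3, 1.0]
best = min(betas, key=loo_error)
print("beta chosen by LOO-CV:", best)
```

Nothing theoretical picks the winner here; the data do, which is the point of the complaint.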

Also, isn't the choice of the form of the prior P[f], which explicitly uses the smoothness functional in its expression, a bit forced?
Edited 10-04-2001 02:03 PM
12:50 AM ET (US)
I am curious to explore the connection between this paper and regularized versions of the EM algorithm. The EM algorithm is employed to obtain the maximum likelihood estimate of the mixture model parameters that best explain the unknown probability density of the given data. Loosely speaking, a Gaussian mixture model approximates an unknown probability density function that is available to us only in the form of examples.

Sometimes it is possible that we have a priori knowledge about the pdf of the source. In such cases, a prior is usually assumed on the mixture model parameters, and the maximum a posteriori estimate is obtained recursively using EM. This connects to the correspondence between Bayesian networks and regularization networks pointed out in the paper.
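That correspondence is easy to check numerically in the simplest linear-Gaussian case (my own illustration, not the paper's setup):

```python
import numpy as np

# For a linear-Gaussian model, the MAP estimate under a zero-mean Gaussian
# prior on the weights is exactly the ridge (regularized least-squares)
# solution, with lam = noise_var / prior_var.

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 3))              # design matrix
w_true = np.array([1.0, -2.0, 0.5])
noise_var, prior_var = 0.1, 1.0
y = X @ w_true + np.sqrt(noise_var) * rng.standard_normal(30)

lam = noise_var / prior_var

# Regularization view: argmin ||y - Xw||^2 + lam * ||w||^2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Bayesian view: posterior mean of w given y
posterior_cov = np.linalg.inv(X.T @ X / noise_var + np.eye(3) / prior_var)
w_map = posterior_cov @ (X.T @ y) / noise_var

print(np.allclose(w_ridge, w_map))            # → True: the two views coincide
```

The same algebra is what lets a smoothness prior on the mixture parameters turn MAP-EM into a regularized M-step.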

In the light of this paper, we may obtain alternate versions of EM by setting it in a regularization framework. However, I feel there is no specific advantage to doing so, since both approaches seem equivalent.

Nonetheless, the geometric interpretation in terms of kernel basis functions can shed more light on how assuming a prior influences the estimation.
