Edited by author 10-01-2002 05:49 AM
Okay, I will try to answer some of the questions here and the rest in the talk (after all, I want an audience ;))
I should be able to explain the GLM stuff a little more clearly in the class, but here is the short answer. When we perform normal linear regression, we assume that the model is of the form
y = Ax + \epsilon
where \epsilon (the noise) is assumed to be zero-mean Gaussian with a fixed variance.
But as the paper states, there are cases where you would want a different noise model. Suppose, for example, you know that the observations are always non-negative counts; then you might want to use the Poisson distribution. This extension of regression to various kinds of noise models is the subject of Generalized Linear Models (GLMs).
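To see why the Gaussian assumption leads to ordinary least squares, here is a short derivation sketch. It assumes n independent observations (x_i, y_i) and treats A as the unknown; the symbols n, \sigma^2 (the fixed noise variance) and \hat{A} are notation introduced here for illustration, not taken from the paper.

```latex
\begin{align*}
p(y_i \mid x_i) &\propto \exp\!\left( -\frac{\lVert y_i - A x_i \rVert^2}{2\sigma^2} \right) \\
\log \prod_{i=1}^{n} p(y_i \mid x_i) &= -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \lVert y_i - A x_i \rVert^2 + \text{const} \\
\hat{A} &= \arg\max_{A} \log \prod_{i=1}^{n} p(y_i \mid x_i)
         = \arg\min_{A} \sum_{i=1}^{n} \lVert y_i - A x_i \rVert^2
\end{align*}
```

So maximizing the likelihood under this noise model is the same as minimizing squared error, and swapping in a different noise model changes the loss being minimized.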
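To make the Poisson case concrete, here is a minimal sketch of a one-feature Poisson regression fitted by Newton's method. Everything in it is an assumption made for illustration: the model y ~ Poisson(exp(a + b*x)), the log link, the function names, and the synthetic data. It is not the procedure from the paper.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson sample using Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def fit_poisson_regression(xs, ys, steps=25):
    """Fit y ~ Poisson(exp(a + b*x)) by Newton's method on the log-likelihood."""
    # Start from the intercept-only solution: exp(a) equal to the mean count.
    a, b = math.log(max(sum(ys) / len(ys), 1e-8)), 0.0
    for _ in range(steps):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(a + b * x)   # predicted mean under the log link
            g_a += y - mu              # gradient of the log-likelihood
            g_b += (y - mu) * x
            h_aa += mu                 # entries of the negated Hessian
            h_ab += mu * x
            h_bb += mu * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system (negated Hessian) * delta = gradient.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Synthetic data with hypothetical true parameters a = 0.5, b = 0.8.
rng = random.Random(0)
xs = [rng.uniform(0.0, 2.0) for _ in range(2000)]
ys = [sample_poisson(math.exp(0.5 + 0.8 * x), rng) for x in xs]
a_hat, b_hat = fit_poisson_regression(xs, ys)
print(a_hat, b_hat)
```

The only change from least squares is the likelihood being maximized: the linear predictor a + b*x is kept, but it is passed through exp and scored against a Poisson instead of a Gaussian.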
The following link should help.
http://www.isds.duke.edu/computing/S/Snotes/node81.html
The Bregman divergence is actually a meta-object, parameterized by the function F. The divergence itself is a generalization of the concept of relative entropy, where the exact form of the generalization depends on the choice of F.
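For the exponential family, choosing F to be the log-partition function and calculating the Bregman divergence over the natural parameters gives the KL divergence between the two distributions. (Over probability vectors, F(x) = \sum_i x_i log x_i gives the KL divergence directly.)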
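As a small check of how the choice of F changes the divergence, here is a sketch using the standard definition B_F(p || q) = F(p) - F(q) - <grad F(q), p - q>. The function names and the example vectors are made up for illustration. With F(x) = ||x||^2 / 2 it gives half the squared Euclidean distance, and with F(x) = sum_i x_i log x_i it matches the KL divergence on probability vectors.

```python
import math


def bregman_divergence(F, grad_F, p, q):
    """B_F(p || q) = F(p) - F(q) - <grad F(q), p - q> for equal-length vectors."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))
    return F(p) - F(q) - inner


def half_sq_norm(x):
    return 0.5 * sum(v * v for v in x)


def half_sq_norm_grad(x):
    return list(x)


def neg_entropy(x):
    # Requires strictly positive entries.
    return sum(v * math.log(v) for v in x)


def neg_entropy_grad(x):
    return [math.log(v) + 1.0 for v in x]


def kl_divergence(p, q):
    """KL(p || q) for probability vectors with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical example vectors.
p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]

euclid = bregman_divergence(half_sq_norm, half_sq_norm_grad, [1.0, 2.0], [3.0, 5.0])
kl_via_bregman = bregman_divergence(neg_entropy, neg_entropy_grad, p, q)
print(euclid, kl_via_bregman, kl_divergence(p, q))
```

Note that the divergence is not symmetric in general: swapping p and q in the negative-entropy case gives a different number.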
As for the actual algorithm assuming orthogonality of the V_i: that is not true. The algorithm by itself does not assume anything like that.
In the case of standard PCA, orthonormality of the vectors v_i has to be stated explicitly as part of the algorithm.
The only thing stopping us from getting the same V_i every time is this: if you look carefully at the algorithm, there are multiple random restarts for each v_c, and a representation in which all the V_i are the same would be quite bad in terms of reconstruction error/Bregman divergence, so the optimization pushes the vectors away from each other.