Kristin Branson | 1 | 10-01-2002 02:49 AM ET (US)
I think Sameer has quite a task in presenting all the background information necessary to understand this paper. The basic idea of the paper is clear, simple, and very nice, but I am getting very confused by the math. It would be helpful to know the meanings of all the parts of equation (2). I can see what the functions G(theta) and P_0(x) represent in specific examples, but I don't know what they mean in general. I feel like they have more meaning than just being a way to split log P(x|theta) into parts by their dependencies, but maybe I am wrong. Knowing this might help me understand why it can be easily verified that g(theta) = E[x|theta], as I do not understand the relationship between P_0(x) and the expected value of x.
I am totally confused by the last paragraph of 2.2 (as I have not looked at the Generalized Linear Models paper), and I think it would be helpful to hear something about that.
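For readers following along, here is a minimal numerical sketch of the g(theta) = E[x|theta] identity in one specific case. It assumes equation (2) has the usual exponential-family form log P(x|theta) = log P_0(x) + x*theta - G(theta). Under that assumption G(theta) = log sum_x P_0(x) exp(x*theta) is the log of the normalizing constant, and P_0(x) is the factor that does not depend on theta. The Bernoulli choice, the function names, and the finite-difference check are illustrative assumptions, not taken from the paper.

```python
import math

# Bernoulli in the assumed form of equation (2):
#   log P(x|theta) = log P0(x) + x*theta - G(theta)
# with x in {0, 1}, P0(x) = 1, and G(theta) = log(1 + e^theta).

def G(theta):
    # Log of the normalizer: log(sum over x of P0(x) * exp(x * theta)).
    return math.log(1.0 + math.exp(theta))

def log_p(x, theta):
    # log P0(x) = 0 for the Bernoulli case, so only x*theta - G(theta) remains.
    return x * theta - G(theta)

def expected_x(theta):
    # E[x|theta] computed directly from the two probabilities.
    return sum(x * math.exp(log_p(x, theta)) for x in (0, 1))

def g_numeric(theta, h=1e-6):
    # Central-difference estimate of g(theta) = G'(theta).
    return (G(theta + h) - G(theta - h)) / (2.0 * h)

theta = 0.7
print(expected_x(theta), g_numeric(theta))
```

The two printed numbers should agree closely, because differentiating the log normalizer with respect to theta weights each x by its probability, which is the definition of the expectation.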
I was also wondering what the meaning of the Bregman distance is. It seems to fit conveniently into the minimization problem, but I am interested to hear its standard context.
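As a small illustration of the standard definition from convex analysis, the Bregman distance generated by a convex function F with derivative f is B_F(p || q) = F(p) - F(q) - f(q)(p - q), that is, the gap between F(p) and the tangent line to F at q, evaluated at p. The sketch below assumes the scalar case and uses hypothetical helper names. It checks two well-known special cases and is not meant to reproduce the paper's notation.

```python
import math

def bregman(F, f, p, q):
    # B_F(p || q) = F(p) - F(q) - f(q) * (p - q), where f is the derivative of F.
    return F(p) - F(q) - f(q) * (p - q)

# Case 1: F(x) = x^2 / 2 gives half the squared difference.
def half_square(x):
    return 0.5 * x * x

def half_square_grad(x):
    return x

# Case 2: F(x) = x log x + (1 - x) log(1 - x) on (0, 1) gives the
# KL divergence between Bernoulli(p) and Bernoulli(q).
def neg_entropy(x):
    return x * math.log(x) + (1.0 - x) * math.log(1.0 - x)

def neg_entropy_grad(x):
    return math.log(x) - math.log(1.0 - x)

def bernoulli_kl(p, q):
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

print(bregman(half_square, half_square_grad, 3.0, 1.0))   # compare with 0.5 * (3 - 1)^2
print(bregman(neg_entropy, neg_entropy_grad, 0.3, 0.6), bernoulli_kl(0.3, 0.6))
```

Different choices of F therefore recover squared error and KL divergence as instances of the same construction.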
Finally, does the actual algorithm for minimizing the loss function assume that the components v_i are mutually orthogonal? If it does not, then how is each minimization done separately -- it seems like we want the collection of l vectors v_i (and corresponding coefficients a_i) that minimize the loss function, but somehow we are able to divide this problem into finding each v_i separately. What stops you from getting the same v_i from each of these minimizations? Even if orthogonality is assumed, then how is it enforced?
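One way to picture a component-by-component minimization that does not enforce orthogonality is sketched below. It rests on loud assumptions: squared loss (the Gaussian case) instead of a general exponential-family loss, and a cyclic scheme in which each component is refit while the others are held fixed. Whether the paper's procedure has exactly this shape is not established here, and all names are hypothetical. In this scheme component c is fit to the data minus the contribution of every other component, so a second component sees a different target than the first.

```python
import math

def loss(X, A, V):
    # Squared loss between the data and theta = A V.
    total = 0.0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            theta = sum(A[i][c] * V[c][j] for c in range(len(V)))
            total += (x - theta) ** 2
    return total

def residual(X, A, V, c):
    # Data minus the contribution of every component except c.
    return [[x - sum(A[i][k] * V[k][j] for k in range(len(V)) if k != c)
             for j, x in enumerate(row)]
            for i, row in enumerate(X)]

def sweep(X, A, V):
    # One cycle over components; each update is an exact least-squares step.
    n, d = len(X), len(X[0])
    for c in range(len(V)):
        R = residual(X, A, V, c)
        vv = sum(v * v for v in V[c])
        if vv > 0:  # update coefficients a_ic with v_c fixed
            for i in range(n):
                A[i][c] = sum(R[i][j] * V[c][j] for j in range(d)) / vv
        aa = sum(A[i][c] ** 2 for i in range(n))
        if aa > 0:  # update direction v_c with coefficients fixed
            for j in range(d):
                V[c][j] = sum(A[i][c] * R[i][j] for i in range(n)) / aa

# Toy data and starting point, chosen only for illustration.
X = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0], [1.0, 0.0, -1.0]]
A = [[0.0, 0.0] for _ in X]
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

losses = [loss(X, A, V)]
for _ in range(50):
    sweep(X, A, V)
    losses.append(loss(X, A, V))

dot = sum(p * q for p, q in zip(V[0], V[1]))
norms = math.sqrt(sum(p * p for p in V[0])) * math.sqrt(sum(q * q for q in V[1]))
print(losses[0], losses[-1], dot / norms)  # last value: cosine between v_1 and v_2
```

Nothing in this sketch forces the two directions to be orthogonal. They differ because each one is fit to what the other leaves unexplained, and the loss cannot increase from one sweep to the next because every update is an exact minimization over the variables it touches.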