Jonathan Ultis
04-13-2001 03:55 PM ET (US)
Edited by author 04-13-2001 04:03 PM
Oops: I swapped _r and _t relative to the paper. Sorry about that. In the work below, _r denotes the labelled training data and _t denotes all the data (labelled and unlabelled).
The error functions behave in a way I find unintuitive.
For example, take 3 collinear points, specifically these (add a little noise to make the problem harder if you like):
(0, 0), (0.1, 0.1), (1, 1)
Assume (0, 0) and (1, 1) are labelled but (0.1, 0.1) is not. Let PHI (which I'll write as O(x)) be the mean of the labels of the labelled points, so O(x) = 0.5 for every x.
Now consider the additive error
error = sum_r((h(x_r) - y_r)^2)/r + d_r - d_t which is
sum_r((h(x_r) - y_r)^2)/r + sum_r((h(x_r) - O(x_r))^2)/r - sum_t((h(x_t) - O(x_t))^2)/t
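To make the definition concrete, here is a minimal Python sketch of the additive error for this three-point example. It assumes h is passed in as an ordinary function and that O(x) is the constant mean label; the names `LABELLED`, `ALL_X`, `phi`, and `additive_error` are made up for illustration.

```python
# Hypothetical helper names; the data are the three points from the example.
LABELLED = [(0.0, 0.0), (1.0, 1.0)]  # the _r points as (x, y)
ALL_X = [0.0, 0.1, 1.0]              # the _t points: every x, labelled or not


def phi(x):
    """O(x): the mean of the labelled y values (a constant 0.5 here, so x is unused)."""
    return sum(y for _, y in LABELLED) / len(LABELLED)


def additive_error(h):
    """sum_r((h(x_r) - y_r)^2)/r + d_r - d_t for a model h given as a function."""
    r, t = len(LABELLED), len(ALL_X)
    fit = sum((h(x) - y) ** 2 for x, y in LABELLED) / r      # labelled squared error
    d_r = sum((h(x) - phi(x)) ** 2 for x, _ in LABELLED) / r  # distance to PHI on _r
    d_t = sum((h(x) - phi(x)) ** 2 for x in ALL_X) / t        # distance to PHI on _t
    return fit + d_r - d_t


print(additive_error(lambda x: x))  # the "true" line y = x
```

For h(x) = x this gives about 0.03, i.e. 0 + 0.25 - (0.25 + 0.16 + 0.25)/3.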
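Take the derivative with respect to a constant shift in h (for the straight line h(x) = ax + b used below, that is the derivative with respect to b), combine the two sums over r, and set the result equal to 0 to find local minima or maxima. I have also factored out a 2, and I'm putting in r = 2 and t = 3 to simplify the math.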
0 = sum_r(2h(x_r) - y_r - O(x_r))/2 - sum_t(h(x_t) - O(x_t))/3
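One way to sanity-check this step numerically is to shift h by a small constant and compare the finite-difference slope of the error with twice the right-hand side above. The sketch assumes a straight line h(x) = ax + b with arbitrary test values for a and b, so a constant shift of h is just a change in b; `error` and `rhs` are hypothetical names.

```python
LABELLED = [(0.0, 0.0), (1.0, 1.0)]  # _r points as (x, y)
ALL_X = [0.0, 0.1, 1.0]              # _t points
O = 0.5                              # constant PHI


def error(a, b):
    """Additive error for the straight line h(x) = a*x + b, with r = 2 and t = 3."""
    def h(x):
        return a * x + b
    return (sum((h(x) - y) ** 2 for x, y in LABELLED) / 2
            + sum((h(x) - O) ** 2 for x, _ in LABELLED) / 2
            - sum((h(x) - O) ** 2 for x in ALL_X) / 3)


def rhs(a, b):
    """sum_r(2h(x_r) - y_r - O(x_r))/2 - sum_t(h(x_t) - O(x_t))/3."""
    def h(x):
        return a * x + b
    return (sum(2 * h(x) - y - O for x, y in LABELLED) / 2
            - sum(h(x) - O for x in ALL_X) / 3)


a, b, eps = 0.7, 0.2, 1e-6  # arbitrary test line and step size
slope = (error(a, b + eps) - error(a, b - eps)) / (2 * eps)  # central difference in b
print(slope, 2 * rhs(a, b))  # the two should agree up to rounding
```

Because the error is quadratic in b, the central difference matches the analytic derivative up to floating-point rounding.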
Doing the sums by hand:
sum_t(h(x_t) - O(x_t)) = h(0) - O(0) + h(0.1) - O(0.1) + h(1) - O(1)
sum_r(2h(x_r) - y_r - O(x_r)) = 2(h(0) + h(1)) - (0 + 1) - (O(0) + O(1))
Multiplying through by 6 and putting the two together (this step is fiddly, and I could have messed it up):
0 = 4h(0) + 4h(1) - 2h(0.1) - O(0) - O(1) + 2O(0.1) - 3
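Since the combining step is easy to get wrong, here is a small numerical check. Multiplying the derivative equation (the one with /2 and /3) through by 6 and substituting the two hand-computed sums should reproduce the combined line exactly, for any values of h and O at the three points. The random values below are arbitrary test inputs, not tied to any particular model.

```python
import random

random.seed(0)
xs = (0, 0.1, 1)
# Arbitrary values for h and O at the three points; the identity should hold for any of them.
h = {x: random.uniform(-2, 2) for x in xs}
O = {x: random.uniform(-2, 2) for x in xs}

sum_t = sum(h[x] - O[x] for x in xs)
sum_r = 2 * (h[0] + h[1]) - (0 + 1) - (O[0] + O[1])  # labels y = 0 and y = 1

six_times_derivative = 6 * (sum_r / 2 - sum_t / 3)
combined = 4 * h[0] + 4 * h[1] - 2 * h[0.1] - O[0] - O[1] + 2 * O[0.1] - 3
print(six_times_derivative, combined)  # should match up to rounding
```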
Since O(x) = 0.5 everywhere, the O terms cancel (-0.5 - 0.5 + 2(0.5) = 0), leaving
0 = 4h(0) + 4h(1) - 2h(0.1) - 3
Let h(x) be the best-fit straight line, the one that minimizes the error:
h(x) = ax + b
0 = 4(0a + b) + 4(1a + b) - 2(0.1a + b) - 3
0 = 3.8a + 6b - 3
a = (3 - 6b)/3.8, i.e. about 0.789 - 1.579b
So if we plug in the correct slope (a = 1), we get b = -0.8/6, about -0.133, which puts the line through (0, -0.133) and (1, 0.867).
If we plug in the correct intercept (b = 0), we get a = 3/3.8, about 0.789, which puts the line through (0, 0) and (1, 0.789).
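To double-check the line arithmetic, the sketch below recomputes the intercept condition directly from the two sums, using exact fractions, for h(x) = ax + b. The condition is affine in a and b, so its coefficients can be read off by evaluating it at three points; the helper name `condition` is hypothetical.

```python
from fractions import Fraction as F

LABELLED = [(F(0), F(0)), (F(1), F(1))]  # _r points as (x, y)
ALL_X = [F(0), F(1, 10), F(1)]           # _t points
O = F(1, 2)                              # constant PHI


def condition(a, b):
    """6 * [sum_r(2h - y - O)/2 - sum_t(h - O)/3] for h(x) = a*x + b."""
    s_r = sum(2 * (a * x + b) - y - O for x, y in LABELLED)
    s_t = sum(a * x + b - O for x in ALL_X)
    return 3 * s_r - 2 * s_t


# The condition has the form c_a*a + c_b*b + c_0 = 0.
c_0 = condition(F(0), F(0))
c_a = condition(F(1), F(0)) - c_0
c_b = condition(F(0), F(1)) - c_0
print(f"{c_a}*a + {c_b}*b + ({c_0}) = 0")

# Intercept forced by the correct slope a = 1, and slope forced by the correct b = 0.
print("a = 1 -> b =", -(c_a + c_0) / c_b)
print("b = 0 -> a =", -c_0 / c_a)
print("residual of the true line y = x:", condition(F(1), F(0)))
```

Using exact fractions avoids any rounding in the comparison; a nonzero residual for a = 1, b = 0 means y = x is not a stationary point of the error.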
Neither of these is correct (neither recovers the true line y = x), which suggests that the additive error will not work well in practice.
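The argument above fixes one of a or b and solves the intercept condition for the other. A complementary sketch, assuming the same three points and constant O(x) = 0.5, minimizes the additive error over a and b together: the error is quadratic in (a, b), so setting both partial derivatives to zero gives a 2x2 linear system, solved here with exact fractions and Cramer's rule. The variable names are hypothetical.

```python
from fractions import Fraction as F

LABELLED = [(F(0), F(0)), (F(1), F(1))]  # _r points as (x, y)
ALL_X = [F(0), F(1, 10), F(1)]           # _t points
O = F(1, 2)                              # constant PHI
r, t = len(LABELLED), len(ALL_X)

# Every term of the error has the form weight * (a*x + b - target)^2.
terms = ([(F(1, r), x, y) for x, y in LABELLED]    # labelled fit
         + [(F(1, r), x, O) for x, _ in LABELLED]  # + d_r
         + [(-F(1, t), x, O) for x in ALL_X])      # - d_t

# Normal equations M [a, b] = v from dE/da = dE/db = 0 (common factor 2 dropped).
m11 = sum(w * x * x for w, x, _ in terms)
m12 = sum(w * x for w, x, _ in terms)
m22 = sum(w for w, _, _ in terms)
v1 = sum(w * x * c for w, x, c in terms)
v2 = sum(w * c for w, _, c in terms)

det = m11 * m22 - m12 * m12
if not (m11 > 0 and det > 0):
    raise ValueError("error is not convex in (a, b) for this data")

a = (v1 * m22 - m12 * v2) / det
b = (m11 * v2 - m12 * v1) / det
print(f"minimizer: a = {float(a):.4f}, b = {float(b):.4f}")
```

For this data the quadratic part is positive definite, so the solution is the unique minimizer; it comes out at about a = 0.953, b = -0.104, close to but not equal to the true line y = x.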
The first time I worked this, I made a mistake in my math that made the line come out correctly for (0, 0), (0.5, 0.5), and (1, 1). I'm not sure that what I have now is entirely correct either, so if anyone finds a mistake, please let me know.