Greg Hamerly | 1 | 04-10-2001 05:18 PM ET (US)
I enjoyed reading this paper. Something I have been interested in is how to exploit all the kinds of data that don't come with labels.
I'm still confused about the notation d() versus d_hat(). The latter (d_hat), I can see, is the training error. The former (d) is posed as an integral; I can only assume that actually means a sum over the unlabeled data points. Can anyone else put the meaning of d() and its ratio with d_hat() into plain English?
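To make the question concrete, here is a minimal sketch of how I am currently reading the two quantities. It is a guess, not the paper's definition. It assumes the distance between two hypotheses is the root-mean-square difference of their outputs, that d_hat() averages over the labeled training inputs, and that d() (the integral) is approximated by averaging over a large pool of unlabeled inputs. The hypotheses f and g, the uniform input distribution, and the name rms_distance are all made up for illustration.

```python
import math
import random


def rms_distance(f, g, xs):
    """Root-mean-square difference between f and g over the points xs.

    Assumed distance measure; the paper may use a different one.
    """
    return math.sqrt(sum((f(x) - g(x)) ** 2 for x in xs) / len(xs))


# Two hypothetical hypotheses to compare.
def f(x):
    return x


def g(x):
    return x ** 2


random.seed(0)

# A handful of labeled training inputs (labels are not needed here).
x_train = [random.uniform(0.0, 1.0) for _ in range(5)]

# A large pool of unlabeled inputs drawn from the same distribution.
x_unlabeled = [random.uniform(0.0, 1.0) for _ in range(5000)]

# d_hat: distance as seen on the training inputs only.
d_hat = rms_distance(f, g, x_train)

# d: the integral over the input distribution, approximated by
# an average over the unlabeled pool.
d_est = rms_distance(f, g, x_unlabeled)

# The ratio compares how far apart the hypotheses are overall
# with how far apart they look on the training inputs alone.
ratio = d_est / d_hat

print(d_hat, d_est, ratio)
```

Under this reading, the average over unlabeled points is just a Monte Carlo stand-in for the integral: for inputs uniform on [0, 1], the exact value for these f and g is sqrt(1/30), and the unlabeled estimate should land close to it.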
Working from intuition, this technique would seem to work for discriminating between widely different hypotheses (e.g. a 2nd-order polynomial versus a 10th-order polynomial), but I don't see that it has much power to discriminate among fairly similar hypotheses. So it seems to me to be a technique for only very coarse model selection.
- greg