Dave Kauchak
|
3
|
 |
|
10-23-2001 03:20 AM ET (US)
|
|
To try to answer one of your questions, Junwen: 1. You can think of the distribution D as a weighting of the importance of the individual training examples. Our weak learner will return some rule that classifies examples (i.e. a function from X to Y). That rule will presumably predict some of the training examples correctly and some incorrectly. The weak learner is trying to choose a rule that optimizes some function of the correctly and incorrectly classified examples, taking the distribution D (the weighting) into account. A simple way to pick the best rule is to sum the weights of the correctly classified training examples, sum the weights of the incorrectly classified ones, and take the difference of the two. The rule that maximizes this value is the best rule. You can think of more elaborate algorithms for using these weights, but the key idea is to think of the weights as a ranking of importance: you want to pick a rule that gets the higher-weighted examples right in preference to the lower-weighted ones.
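If it helps, here is a rough sketch of that simple approach in Python. Everything in it is made up for illustration: the toy examples, the two candidate rules (rule_a, rule_b), and the helper names weighted_score and best_rule are my own assumptions, not part of any particular boosting implementation. It only shows how changing D can change which rule looks best.

```python
def weighted_score(rule, examples, weights):
    """Weight of correctly classified examples minus weight of misclassified ones."""
    score = 0.0
    for (x, y), w in zip(examples, weights):
        score += w if rule(x) == y else -w
    return score


def best_rule(rules, examples, weights):
    """Pick the candidate rule with the highest weighted score."""
    return max(rules, key=lambda r: weighted_score(r, examples, weights))


# Toy training set (hypothetical): x is a number, y is a label in {-1, +1}.
examples = [(1, -1), (2, -1), (3, +1), (4, +1), (5, -1)]


def rule_a(x):
    # Threshold rule: gets the first four examples right, the last one wrong.
    return +1 if x > 2 else -1


def rule_b(x):
    # Constant rule: gets examples 1, 2 and 5 right, examples 3 and 4 wrong.
    return -1


uniform = [0.2, 0.2, 0.2, 0.2, 0.2]  # every example equally important
skewed = [0.1, 0.1, 0.1, 0.1, 0.6]   # the last example matters most

chosen_uniform = best_rule([rule_a, rule_b], examples, uniform)
chosen_skewed = best_rule([rule_a, rule_b], examples, skewed)
print(chosen_uniform.__name__, chosen_skewed.__name__)
```

With uniform weights rule_a wins because it gets more examples right. Once most of the weight sits on the last example, rule_b wins even though it makes more mistakes, because it gets the heavily weighted example right.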
|