Ian Fasel
10-25-2001 03:26 AM ET (US)
I've been working on an implementation of this very face detector in my lab. Here are a few comments that might be illuminating:
1) Finding the best threshold for each feature is slow overall. The search for a single feature's threshold can be done very quickly (using, e.g., golden section search), but there are over 160,000 possible features in a 24x24 box, so one round of AdaBoost takes 2.7 hours on a 1GHz Athlon. To get around this, Viola and Jones sampled the features. Here's Michael Jones' explanation: "To get down to 43,000 features we subsampled by only taking every other feature (I think we only took features whose top, left corner was on an even numbered pixel). We did this solely for speed reasons - if you can use all 160,000 features your resulting detector might be better. Also, during each iteration of Adaboost, the algorithm had a 10% chance of considering any particular feature. In other words we only looked at about 10% of the 43,000 features on each iteration of Adaboost. Again, this is solely for speed reasons."
2) Deciding when to branch to the next level of the cascade is also very tricky and is not explained well in the paper. The algorithm they give is fine; however, choosing the final required performance for each cascade level is not obvious. There are tradeoffs: having fewer features in the early levels gives you excellent speed benefits. However, each time you branch, you take a hit in detection rate on faces, and these hits accumulate quickly. This pushes you toward more features per level and fewer branches. Viola and Jones' method is very ad hoc and makes the work difficult to reproduce (even for Paul Viola!), so there is much room for improvement here.
3) AdaBoost tries to minimize classification error, whereas the cascade algorithm tries to minimize false detections while keeping the correct detection rate high. The cascade's goal makes sense, since faces are very rare compared to the set of all natural images. Thus, it seems that AdaBoost should be modified to make this the explicit goal.
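To put numbers on point 1, here is a short Python sketch that counts the Haar-like rectangle features in a 24x24 window. It assumes the five feature shapes from the Viola-Jones paper (two-rectangle horizontal and vertical, three-rectangle horizontal and vertical, four-rectangle), each scaled to every width and height that is a multiple of its base shape. It also reads Jones' "every other feature" as "top-left corner on an even pixel in both x and y". That reading is an assumption, and so is the function name `count_features`.

```python
# Count Haar-like rectangle features in a square detection window.
# Assumption: the five base shapes from the Viola-Jones paper, scaled to every
# multiple of their base size and placed at every position in the window.

WINDOW = 24

# (base width, base height) in pixels for each feature shape.
SHAPES = {
    "two_horizontal": (2, 1),
    "two_vertical": (1, 2),
    "three_horizontal": (3, 1),
    "three_vertical": (1, 3),
    "four_checker": (2, 2),
}


def count_features(window=WINDOW, even_corner_only=False):
    """Total number of features over all shapes, sizes and positions."""
    total = 0
    for base_w, base_h in SHAPES.values():
        for w in range(base_w, window + 1, base_w):
            for h in range(base_h, window + 1, base_h):
                positions_x = window - w + 1  # top-left x in 0..window-w
                positions_y = window - h + 1  # top-left y in 0..window-h
                if even_corner_only:
                    # Hypothetical subsampling: keep only even x and y (0, 2, 4, ...).
                    positions_x = (positions_x + 1) // 2
                    positions_y = (positions_y + 1) // 2
                total += positions_x * positions_y
    return total


full = count_features()
subsampled = count_features(even_corner_only=True)
per_round = round(0.10 * subsampled)  # about 10% of features looked at per round

print(full)        # 162336
print(subsampled)  # 45396
print(per_round)   # 4540
```

Under these assumptions the full set is 162,336 features and the even-corner subset is 45,396. That is in the same ballpark as the 43,000 Jones quotes, so his actual subsampling probably differed in some detail. Looking at 10% per round leaves roughly 4,500 features per AdaBoost iteration.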
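Point 1 mentions golden section search for the per-feature threshold. Another way to do this, which is exact and not necessarily what Viola and Jones used, is to sort the training examples by feature value and scan every split with cumulative weight sums. The weighted error only changes between neighbouring sorted values, so this finds the global minimum in O(n log n) per feature. Below is a minimal NumPy sketch. The name `best_stump` and its return convention are hypothetical, and the classifier form h(x) = 1 if p*x < p*theta follows the paper's weak classifier.

```python
import numpy as np


def best_stump(values, labels, weights):
    """Threshold and polarity minimizing weighted error for one feature.

    values  : feature value for each training example
    labels  : 1 for face, 0 for non-face
    weights : current AdaBoost weights (non-negative, normalized here)
    Classifier: h(x) = 1 if polarity * x < polarity * threshold, else 0.
    Returns (error, threshold, polarity).
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()

    order = np.argsort(values)
    v, y, w = values[order], labels[order], weights[order]

    total_pos = w[y == 1].sum()
    total_neg = w[y == 0].sum()
    # Entry k = weight of positives / negatives among the k smallest values.
    pos_below = np.concatenate(([0.0], np.cumsum(w * (y == 1))))
    neg_below = np.concatenate(([0.0], np.cumsum(w * (y == 0))))

    # Polarity +1 ("below => face"): positives above + negatives below are errors.
    err_plus = (total_pos - pos_below) + neg_below
    # Polarity -1 ("above => face"): positives below + negatives above are errors.
    err_minus = pos_below + (total_neg - neg_below)

    # Candidate thresholds: below the minimum, midpoints, above the maximum.
    thresholds = np.concatenate(([v[0] - 1.0], (v[:-1] + v[1:]) / 2, [v[-1] + 1.0]))
    # A split between two equal values cannot be realized by any threshold.
    valid = np.ones(len(v) + 1, dtype=bool)
    valid[1:-1] = v[:-1] < v[1:]
    err_plus = np.where(valid, err_plus, np.inf)
    err_minus = np.where(valid, err_minus, np.inf)

    k_plus, k_minus = int(np.argmin(err_plus)), int(np.argmin(err_minus))
    if err_plus[k_plus] <= err_minus[k_minus]:
        return float(err_plus[k_plus]), float(thresholds[k_plus]), 1
    return float(err_minus[k_minus]), float(thresholds[k_minus]), -1


print(best_stump([1, 2, 3, 4], [1, 1, 0, 0], [1, 1, 1, 1]))  # (0.0, 2.5, 1)
```

Only the weights change between AdaBoost rounds, so each feature's sort order can be computed once and reused. That leaves a linear scan per feature per round, which is where the 160,000-feature cost from point 1 comes in.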
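The accumulation in point 2 follows from the cascade structure. A window is reported as a face only if every level accepts it. When each level's rates are measured on the windows that reach that level, the overall detection and false-positive rates are the products of the per-level rates. The sketch below uses hypothetical per-level rates of 0.99 and 0.30; these are illustrative numbers, not values from the paper.

```python
from math import prod


def cascade_rates(stage_detection, stage_false_positive):
    """Overall (detection rate, false-positive rate) of a cascade.

    Each stage's rates are measured on the windows that reach that stage,
    so the overall rates are the products of the per-stage rates.
    """
    return prod(stage_detection), prod(stage_false_positive)


def required_stage_detection(target_overall, n_stages):
    """Per-stage detection rate that n equal stages need to hit a target."""
    return target_overall ** (1.0 / n_stages)


# Hypothetical cascades: each stage keeps 99% of faces, passes 30% of non-faces.
for n in (10, 20):
    d, f = cascade_rates([0.99] * n, [0.30] * n)
    print(f"{n} stages: detection {d:.3f}, false positives {f:.2e}")
# 10 stages: detection 0.904, false positives 5.90e-06
# 20 stages: detection 0.818, false positives 3.49e-11

print(f"{required_stage_detection(0.90, 20):.4f}")  # 0.9947
```

Doubling the number of levels from 10 to 20 drives false positives down by several orders of magnitude but drops the overall detection rate from about 90% to about 82%. To keep 90% overall with 20 levels, every level must keep roughly 99.47% of faces.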
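In practice the mismatch in point 3 is handled after boosting. The cascade training loop in the paper lowers each level's boosted-classifier threshold until the level reaches its detection target, and accepts whatever extra false positives that brings. The sketch below shows that threshold choice on one level. The function name `stage_threshold` and the scores are hypothetical.

```python
import numpy as np


def stage_threshold(pos_scores, neg_scores, min_detection):
    """Highest threshold whose detection rate on faces is >= min_detection.

    pos_scores / neg_scores: boosted scores (sum of alpha_t * h_t(x)) on face
    and non-face validation windows; a window passes when score >= threshold.
    Returns (threshold, detection_rate, false_positive_rate).
    """
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    # Detection rate can only fall as the threshold rises, so scan the
    # positive scores from highest to lowest and stop at the first good one.
    for t in np.sort(pos)[::-1]:
        detection = float(np.mean(pos >= t))
        if detection >= min_detection:
            return float(t), detection, float(np.mean(neg >= t))
    raise ValueError("no threshold reaches min_detection (is it above 1.0?)")


# Hypothetical validation scores for one stage.
faces = [0.9, 0.8, 0.7, 0.6, 0.2]
non_faces = [0.75, 0.5, 0.3, 0.1, 0.0]
print(stage_threshold(faces, non_faces, 0.8))  # (0.6, 0.8, 0.2)
print(stage_threshold(faces, non_faces, 1.0))  # (0.2, 1.0, 0.6)
```

On this toy data, insisting that every face passes moves the threshold from 0.6 to 0.2 and triples the false-positive rate. An AdaBoost variant with an explicitly asymmetric objective would try to make that tradeoff during training instead of after it.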