

Robust Real Time Object Detection

Junwen WU
02:06 PM ET (US)
Has anyone tried to use the features they are using in other object detection problems? I remember that in Poggio's work using Haar wavelets, part of the justification was also that face images have a relatively stable intensity contrast; for example, the cheek will be brighter than the nose, etc. I don't think other objects have such stable features, so I'm wondering if the rectangle features can still work well when dealing with other objects.
Dave Kauchak
12:48 PM ET (US)
I was also a bit annoyed about the reference [2] thing. I think the paper they refer to is Tieu and Viola (2000), Boosting Image Retrieval. I'm not exactly sure why they don't explicitly cite it. Tieu also has a master's thesis on this material (Boosting Sparse Representations for Image Retrieval, 2000), which is another option.
Markus Herrgard
12:37 PM ET (US)
As far as I can see, the major difference in this implementation of AdaBoost compared to the standard ones is that instead of training the same classifier on each training round (with differently weighted data), the Viola & Jones version trains an individual classifier for each feature and then selects the one with the smallest error (with respect to the weight distribution). I'm wondering whether this modification changes in any way the theoretical performance guarantees presented in the AdaBoost papers we read earlier. On second thought, it probably doesn't, since the weak classifier in AdaBoost can be any non-random classifier, including the really weak single-feature-based classifier used here. A second comment is on redundancy in the feature calculations. Since the actual test images are scanned by moving a window one pixel at a time, it would seem like there is a lot of overlap in calculating the integral image features. Is this true, and would there be a way to speed up the calculation even further by storing some of the features calculated in the previous window?
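In Python, the per-feature weak learner described above might look like the following sketch (the function name, the brute-force threshold search, and the data layout are all illustrative, not the paper's code):

```python
import numpy as np

def best_single_feature_stump(feature_values, labels, weights):
    """For each candidate feature, fit a one-feature threshold classifier
    ("decision stump") and keep the one with the lowest weighted error.

    feature_values : (n_features, n_samples) precomputed feature responses
    labels         : (n_samples,) +1 / -1 face / non-face labels
    weights        : (n_samples,) AdaBoost weight distribution (sums to 1)
    """
    best = (None, None, None, np.inf)  # (feature index, threshold, polarity, error)
    for j, values in enumerate(feature_values):
        # Try midpoints between consecutive sorted feature values as thresholds.
        s = np.sort(values)
        for theta in (s[:-1] + s[1:]) / 2:
            for polarity in (+1, -1):
                pred = np.where(polarity * values < polarity * theta, 1, -1)
                err = weights[pred != labels].sum()
                if err < best[3]:
                    best = (j, theta, polarity, err)
    return best
```

This is exhaustive over features and thresholds, which is exactly why the training times Ian mentions below get so long.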

So much for semi-intelligent comments, and on to ranting about the paper. Because there was quite a bit of stuff included in the paper, I had a really hard time putting together how the whole classifier (with the cascades and all) would actually work. The authors didn't seem to make much effort to make it easy for other people to implement their method (good work, Ian!). Also, what's up with reference [2] in the paper? And as a final comment, judging from Fig. 10 in the paper, a much easier task for object recognition would be recognizing soccer balls in cluttered images. I'm not so sure of the commercial potential of a rapid soccer ball detector, though ...
Edited 10-25-2001 12:39 PM
Joe Drish
08:06 AM ET (US)
I thought the simple modification of AdaBoost to select a small number of essential features was clever, as was the cascading. Even though simplicity of explanation is in vogue these days, I thought they could have taken more time to elaborate on precisely how much faster the features were being evaluated. Overall, the paper was well organized; the authors structured it well in line with their three contributions, which they thankfully reiterated several times.
Edited 10-25-2001 08:06 AM
Hsin-Hao Yu
06:57 AM ET (US)
In this paper, cascading works as follows: if a less powerful classifier says a patch is not a face, it is almost certainly not a face. However, if it says it is a face, it could be a false positive, and a more powerful classifier needs to be used. Can you train the network to do the opposite? Train a first-stage classifier to have a low false-positive rate, but if it says a patch is not a face, pass it on to a higher classifier to decide if it is a false negative? Seems OK to me, but maybe I am missing something. Is this a dumb question? --- Wait! OK, it's a dumb question, after reading Ian's comment. Doh!

The second question is: OK, this model is cool. It selects the most important features from a huge set of features. But maybe that's because the huge set of features is largely junk features to begin with? What happens if you start with a (large) set of good features (steerable)? Will the AdaBoost procedure still be useful? Also, the greediness of AdaBoost might have a problem in the following situation: it's possible that the features are not independent of each other. That is, feature A might make no sense unless feature B is also included. It seems to me that AdaBoost will not be smart enough for this situation.
Edited 10-25-2001 07:09 AM
andrew cosand
05:00 AM ET (US)
I also thought their approach was kinda cool. The features seemed strange for a bit, but they should work as edge/gradient detectors (a and b), center-surround detectors (c), etc. While they do mention that the rectangular features are more limited than 'steerable' features, they point out that rectangular ones are rich enough to get the job done. The important point, though, is that with the integral image technique, the rectangular features can be computed FAST. If you have to use a couple of rectangular features to get the same information as a single more complex feature that takes longer to calculate, you still come out ahead.
Edited 10-25-2001 05:00 AM
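A minimal sketch of the integral image trick (illustrative code, not the authors'): the integral image is a running 2-D cumulative sum, after which any rectangle sum, and hence any rectangular feature, costs a handful of array lookups regardless of the rectangle's size.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of all pixels above and to the left of (y, x), inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, height, width):
    """Sum of pixels inside a rectangle using at most four lookups.
    Assumes the rectangle lies fully inside the image."""
    bottom, right = top + height - 1, left + width - 1
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

def two_rect_feature(ii, top, left, h, w):
    """A two-rectangle (edge-type) feature: difference between the sums of
    two adjacent same-size rectangles. The left/right layout is illustrative."""
    return rect_sum(ii, top, left, h, w) - rect_sum(ii, top, left + w, h, w)
```

Two-rectangle features need six lookups in total (the rectangles share a border), which is where the speed comes from.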
Ian Fasel
03:26 AM ET (US)
I've been working on an implementation of this very face detector in my lab. Here are a few comments that might be illuminating:

1) Finding the best threshold for each feature takes a long time, because even though the search for a good threshold can be done very quickly (using, e.g., golden section search), there are over 160,000 possible features in a 24x24 box. On a 1GHz Athlon this takes 2.7 hours per round of AdaBoost. To overcome this, some sampling was done. Here's Michael Jones' explanation:
"To get down to 43,000 features we subsampled by only
taking every other feature (I think we only took
features whose top, left corner was on an even
numbered pixel). We did this solely for speed reasons
- if you can use all 160,000 features your resulting
detector might be better. Also, during each iteration
of Adaboost, the algorithm had a 10% chance of
considering any particular feature. In other words we
only looked at about 10% of the 43,000 features on
each iteration of Adaboost. Again, this is solely for
speed reasons."
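A rough sketch of that subsampling (the `Feature` record and function name are hypothetical; only the even-pixel filter and the ~10% per-round sampling come from the quote above):

```python
import random
from collections import namedtuple

# Hypothetical feature record: just the top-left corner matters here.
Feature = namedtuple("Feature", ["top", "left"])

def sample_features(all_features, per_round_fraction=0.10, seed=None):
    """Keep only features whose top-left corner sits on an even pixel,
    then consider a random ~10% of those on this boosting round."""
    rng = random.Random(seed)
    kept = [f for f in all_features if f.top % 2 == 0 and f.left % 2 == 0]
    return [f for f in kept if rng.random() < per_round_fraction]
```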

2) Deciding when to branch to the next level of the cascade is also very tricky, and is not explained well in the paper. The algorithm they give is fine; however, choosing the final required performance for each cascade level is not obvious. There are tradeoffs: having fewer features early on gives you excellent speed benefits. However, each time you branch, you take a hit in accuracy on positive detections, and these hits accumulate quickly. Thus, this is a force pushing you to have more features and fewer branches. Viola and Jones' method is very ad hoc and makes the work difficult to reproduce (even for Paul Viola!), so there is much room for improvement here.
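A toy sketch of the cascade's early-reject structure (the stage representation and score/threshold convention are illustrative, not the paper's exact formulation):

```python
def cascade_classify(window, stages):
    """stages: list of (classifier, threshold) pairs, cheapest first.
    A window is rejected the first time a stage's score falls below its
    threshold, so most non-face windows exit after only a few cheap features;
    only windows that survive every stage are reported as detections."""
    for classify, threshold in stages:
        if classify(window) < threshold:
            return False   # almost certainly not a face; stop early
    return True            # passed every stage: report a detection
```

The per-stage thresholds are exactly the "final required performance for each cascade level" knobs discussed above.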

3) AdaBoost tries to minimize classification error; however, the cascade algorithm tries to minimize false detections while attempting to keep the correct detection rate high. This makes sense, since, compared to the set of all natural images, faces are very rare. Thus, it seems that AdaBoost should be modified to make this the explicit goal.
sean o'rourke
02:43 AM ET (US)
Curses. It appears that "<p>" is not one of the "selected HTML tags"...

sean o'rourke
02:41 AM ET (US)
Sameer -

For the features, I just got that they were a poor man's (or efficient man's) analogue of the edge-sensitive cells in the visual cortex. Instead of having a wide variety of angles, they only have two. It looks like they have features for vertical (a) and horizontal (b) edges, plus thin vertical stripes (c). I'm not quite sure about (d), since it looks like it picks up horizontal stripes or stippling.

Or maybe I'm missing your question.

Dave Kauchak
02:06 AM ET (US)
I actually really liked this paper :) I've read a number of image retrieval/object detection papers, and I found the methods presented in this paper to be some of the most interesting.

In particular, I really like the idea of using boosting for feature selection (even though this is actually presented in the earlier paper). Being a big machine learning fan, I think it's great to learn what the best features are instead of trying to decide statically.

I also thought the idea of cascading classifiers was good. It had a number of nice effects. First, I think it allowed the detection performance to be increased. It also hits on another key point of the paper, which is temporal performance. The paper made specific compromises and design decisions so that the system could run fairly well in real time. This is quite different from many other systems.

As for the features, they are simply those 24x24 rectangular things seen at the top right of page two. Since the paper only deals with grayscale images (i.e., intensities), a simple difference of sums is taken between the light and dark regions. I think the motivation is similar to that of earlier work in Tieu and Viola (2000), Boosting Image Retrieval, which I know some of us read for cs254 in the spring.
samer agarwal
11:44 PM ET (US)
To begin with, a rather badly written paper. Their explanation of the learning system is rather confusing and could have been made much clearer. And I am still wondering what their featurization is. Anyone have any insights into what is going on with their features?
