Matt Clothier
11-12-2003 05:06 PM ET (US)

I find the idea of doing activity recognition before 3D reconstruction interesting (since 3D reconstruction is probably the most common means of tracking humans). By using selected correspondences between the key frame and the actual frame, the locations of chosen body parts (such as hands and feet) can be found. I do have a few concerns about this technique, though.

The first is the coarse head and body tracking that they use. Their likelihood function is based on a sum-of-squares distance between a color template and the image data. Although this may give them basic head and body localization, it can easily break if the image contains an object with a similar color pattern (or if the person is wearing a shirt that matches the background).

I am also concerned about the use of hand-drawn key frames. If their technique were extended to recognize everyday activities such as walking, running, jumping, or playing tennis, then many, many key frames would need to be available for matching, which would take a lot of effort. In addition, notice that the key frames in Figure 10 mostly show the person facing forward. What if the person turns around? Will the right hand become the left hand and vice versa? Anyway, the technique has some promise, but at this point they make many assumptions about the data in order to get it to work. I'd be interested to hear other people's arguments in defense of their work.
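To make the first concern concrete, here is a toy sketch of sum-of-squared-differences template matching on a color image. It is not the authors' tracker: the image, the colors, and the helper names (`ssd`, `all_scores`) are all hypothetical, and it computes only the raw distance, not whatever likelihood they derive from it. It shows that a background object with the same colors as the template scores exactly as well as the person.

```python
# Toy illustration (hypothetical data, not the paper's code):
# SSD template matching on an RGB image stored as lists of rows,
# where each pixel is an (R, G, B) tuple.

def ssd(image, template, top, left):
    """Sum of squared color differences between the template and the
    image window whose top-left corner is at (top, left)."""
    total = 0
    for dy, row in enumerate(template):
        for dx, pixel in enumerate(row):
            window_pixel = image[top + dy][left + dx]
            total += sum((a - b) ** 2 for a, b in zip(window_pixel, pixel))
    return total


def all_scores(image, template):
    """SSD score for every valid template placement, keyed by (top, left)."""
    th, tw = len(template), len(template[0])
    return {
        (y, x): ssd(image, template, y, x)
        for y in range(len(image) - th + 1)
        for x in range(len(image[0]) - tw + 1)
    }


RED = (200, 30, 30)     # hypothetical shirt color
GRAY = (120, 120, 120)  # hypothetical background color

# 2x2 template of the person's red shirt.
template = [[RED, RED], [RED, RED]]

# 4x6 gray image with the shirt at (0, 0) and a red background
# object (say, a poster) at (2, 4).
image = [[GRAY] * 6 for _ in range(4)]
for y, x in [(0, 0), (0, 1), (1, 0), (1, 1), (2, 4), (2, 5), (3, 4), (3, 5)]:
    image[y][x] = RED

scores = all_scores(image, template)
best = min(scores.values())
# Every placement that achieves the lowest distance.
ties = sorted(pos for pos, s in scores.items() if s == best)
print(ties)  # [(0, 0), (2, 4)]
```

Both placements get a distance of 0, so a tracker that relies on this color distance alone has no basis for preferring the person over the poster.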