| Who | When |
Messages | |
|
|
|
| Iman
|
12
|
 |
|
10-10-2006 04:49 PM ET (US)
|
|
The algorithms compared in the paper were all "general purpose behavior recognition algorithms."
Besides demonstrating that CUBOIDS outperformed the other algorithms in 3 different domains, how can the authors convince the reader that it is a better algorithm in general (or can they?)--especially when all of the algorithms need hand tuning of parameters.
This is a question which would apply to any study of this type where comparisons are being made to other algorithms.
|
| Matt
|
11
|
 |
|
10-10-2006 04:29 PM ET (US)
|
|
I really like the general approach of Cuboids, but given the very large difference in performance when 3d-Harris was used instead of temporal-Gabors, I'd be interested in seeing more work on how temporal feature selection works and how different methods (other than the two examined here) affect performance.
Doing a thought experiment on these temporal-Gabors, it seems like an edge moving back and forth at the right frequency (or for that matter, a flickering patch) would elicit a large response (if I'm understanding this correctly). It seems like these may not be all that stable. Is there some way to combine 2d-Harris with temporal-Gabors, to find 'corners' with 'interesting' movement? Empirically, do the cuboids selected by either of the methods presented here correspond well to the features we most want?
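The thought experiment above can be tried on a single pixel's intensity over time. This is only a rough sketch: it assumes a response of the form (I * h_even)^2 + (I * h_odd)^2 with a 1D temporal Gabor pair, and the function names, omega = 0.1 cycles/frame, and tau = 8 frames are illustrative choices, not values taken from the paper. Spatial smoothing is left out entirely.

```python
import numpy as np

def temporal_gabor_pair(omega, tau):
    """Even (cosine) and odd (sine) temporal Gabor filters with a Gaussian envelope.

    omega is in cycles per frame, tau is the envelope width in frames.
    Both values are illustrative assumptions.
    """
    half_width = int(np.ceil(4 * tau))
    t = np.arange(-half_width, half_width + 1, dtype=float)
    envelope = np.exp(-t**2 / tau**2)
    h_even = -np.cos(2 * np.pi * omega * t) * envelope
    h_odd = -np.sin(2 * np.pi * omega * t) * envelope
    return h_even, h_odd

def peak_response(signal, omega=0.1, tau=8.0):
    """Largest quadrature energy over time for one pixel's intensity trace."""
    h_even, h_odd = temporal_gabor_pair(omega, tau)
    r_even = np.convolve(signal, h_even, mode="valid")
    r_odd = np.convolve(signal, h_odd, mode="valid")
    return float(np.max(r_even**2 + r_odd**2))

n_frames = 200
frames = np.arange(n_frames)

# A patch flickering with a 10-frame period, i.e. at the tuned 0.1 cycles/frame.
flicker = (frames % 10 < 5).astype(float)
# An edge that passes over the pixel once and stays there.
step = (frames >= n_frames // 2).astype(float)
# A pixel that never changes.
static = np.ones(n_frames)

peaks = {
    "flicker": peak_response(flicker),
    "step": peak_response(step),
    "static": peak_response(static),
}
for name, value in peaks.items():
    print(f"{name:8s} peak response = {value:.4f}")
```

Under these assumptions the flickering pixel gives the largest peak, the one-time step a smaller one, and the static pixel almost none, which is consistent with the concern that pure flicker at the tuned frequency would be picked up as an interest point.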
|
| Deborah
|
10
|
 |
|
10-10-2006 04:14 PM ET (US)
|
|
Edited by author 10-10-2006 04:15 PM
Could you please give some more insight into why the quadrature pair works (fires) so well for the spatio-temporal (cuboid) keypoints? Thank you
(And thanks, Matt for the nice tutorial link!)
|
| Marius
|
9
|
 |
|
10-10-2006 03:48 PM ET (US)
|
|
I found the faces videos in the supplemental materials very amusing. They seem to be quite cartoonish, in the sense that they are very exaggerated expressions. I wonder how the classification algorithms would work with more subtle expressions of emotions, particularly eye-area movements.
It might be interesting to train and classify specific actors in movies or TV shows, maybe as a way to grade their performances in terms of 'range' of emotions. How varied are their emotions? Do they always get angry the same way, or does it vary from scene to scene? Could be an interesting tool to train actors...
|
| Tingfan
|
8
|
 |
|
10-10-2006 01:39 PM ET (US)
|
|
1. The starting point and ending point for each behavior descriptor seem to be manually selected in this experiment. In other words, the whole video (7.5 mins) is first segmented into several shots with manually labeled behavior. But in real-life applications, the data is always continuous. Would this algorithm work well on indefinite or inaccurate boundaries? Or can we simply use a window to go through the video stream?
2. If the illumination is fluorescent light, there will be a 60 Hz blinking. Are there any false edges in the time domain due to this blinking?
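The windowing idea in point 1 could look roughly like the sketch below. It assumes each detected cuboid has already been reduced to a frame index and a cluster (cuboid type) label, and it builds one normalized histogram of cuboid types per window. The function name, the window and stride values, and the toy data are all hypothetical; how each window's histogram would then be classified is left open.

```python
import numpy as np

def sliding_window_histograms(cuboid_frames, cuboid_types, n_types,
                              n_frames, window, stride):
    """Normalized histogram of cuboid types for each temporal window.

    cuboid_frames: frame index of each detected cuboid
    cuboid_types:  cluster label (0 .. n_types - 1) of each cuboid
    Returns the window start frames and one histogram row per window.
    """
    cuboid_frames = np.asarray(cuboid_frames, dtype=int)
    cuboid_types = np.asarray(cuboid_types, dtype=int)
    starts = np.arange(0, max(n_frames - window, 0) + 1, stride)
    hists = np.zeros((len(starts), n_types))
    for i, start in enumerate(starts):
        in_window = (cuboid_frames >= start) & (cuboid_frames < start + window)
        counts = np.bincount(cuboid_types[in_window], minlength=n_types)
        total = counts.sum()
        if total > 0:  # leave empty windows as all zeros
            hists[i] = counts / total
    return starts, hists

# Toy data: seven cuboids in a 60-frame clip, three cuboid types.
starts, hists = sliding_window_histograms(
    cuboid_frames=[2, 5, 8, 40, 45, 47, 49],
    cuboid_types=[0, 0, 1, 2, 2, 2, 1],
    n_types=3, n_frames=60, window=20, stride=20,
)
for start, hist in zip(starts, hists):
    print(start, hist)
```

In the toy data the middle window contains no cuboids, so its histogram stays all zeros; a real system would need some rule for such windows, and an overlapping stride would soften the effect of inaccurate boundaries.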
|
| Adam
|
7
|
 |
|
10-10-2006 01:36 PM ET (US)
|
|
It's interesting that it is able to perform so well without using any information as to the relative positions of the cuboids. Temporally, it seems this matters less because of the periodicity of much of the behavior being recognized. Though the results for facial expression recognition look good, I would think lack of positional information could cause problems in this area, because of the complex interactions of all features in the face when displaying emotion. I wonder how much incorporating this in the model would change the results for the data in the paper.
|
| Joshua
|
6
|
 |
|
10-10-2006 03:31 AM ET (US)
|
|
"The size of the cuboid is set to contain most of the volume of data that contributed to the response function at that interest point." Is this size variable for each cuboid (interest point)? If so, are similar cuboid types going to be guaranteed to have similar cuboid sizes to make the comparisons? How does the spatial scale factor (sigma) affect the size? On average, how many different features (cuboid types) are needed to identify unique behavior? Is there an expectation that using the temporal positions of the features will yield much improved performance?
|
| Carolina
|
5
|
 |
|
10-10-2006 02:43 AM ET (US)
|
|
I wonder if the cuboids can overlap each other? If yes, it would be interesting to find a method to make each cuboid unique, so there is no redundancy in the information. I also think it would be interesting to see if it is possible to keep information about each cuboid's neighbors to add information to the behavior.
|
| Boris
|
4
|
 |
|
10-10-2006 12:01 AM ET (US)
|
|
It would be interesting to see how spatial information could be incorporated into the descriptors. For object recognition with local features, RANSAC is often used to remove outlier matches - is there a spatio-temporal structure to certain behaviors that can be found and somehow exploited?
Secondly, I wonder if this could somehow be combined with the Lepetit et al. method that I presented earlier. Instead of running k-means on a bunch of data in the domain, construct a separate multi-class classifier for each action/behavior that will take a novel cuboid and find its correspondence in this action, or return "no matches". You could then classify novel video sequences based on how many correspondences were found between the original action/behavior sequence and the novel sequence.
|
| Paul
|
3
|
 |
|
10-09-2006 11:07 PM ET (US)
|
|
I probably missed this in the paper, but how does this algorithm account for multiple mice in the same video sequence? Is there a segmentation preprocessing step applied to the raw video data?
|
| Matt
|
2
|
 |
|
10-09-2006 08:20 PM ET (US)
|
|
Edited by author 10-09-2006 08:35 PM
Just in case someone stumbles on the term "quadrature pair" and looks here first, here's an excerpt from Freeman, 1991 (just what I happened on first when looking to refresh my memory - a pretty good tutorial on Gabor filters in general can be found at our own http://mplab.ucsd.edu/tutorials/pdfs/gabor.pdf). "Consider two filters, identical except shifted in phase from each other by 90 degrees. The filters are called a quadrature pair..." <IMG SRC=" http://les.ucsd.edu/quadPair.png">
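A small numerical check of what the quoted definition buys you, as a sketch only: the filter form (cosine and sine under a shared Gaussian envelope) and the values omega = 0.1 and tau = 8 are illustrative assumptions, not parameters from the paper or from Freeman's text. A sinusoid at the tuned frequency is presented at several phases; the even filter's squared output swings with phase, while the sum of squares of the pair stays essentially constant.

```python
import numpy as np

def gabor_pair(omega, tau, half_width):
    """Even and odd 1D Gabor filters: same envelope, 90 degrees apart in phase."""
    t = np.arange(-half_width, half_width + 1, dtype=float)
    envelope = np.exp(-t**2 / tau**2)
    even = np.cos(2 * np.pi * omega * t) * envelope
    odd = np.sin(2 * np.pi * omega * t) * envelope
    return even, odd

def responses(phase, omega=0.1, tau=8.0, half_width=32):
    """Even response, odd response, and quadrature energy at the filter centre."""
    even, odd = gabor_pair(omega, tau, half_width)
    t = np.arange(-half_width, half_width + 1, dtype=float)
    signal = np.cos(2 * np.pi * omega * t + phase)
    r_even = float(np.dot(signal, even))
    r_odd = float(np.dot(signal, odd))
    return r_even, r_odd, r_even**2 + r_odd**2

phases = np.linspace(0, 2 * np.pi, 8, endpoint=False)
results = [responses(p) for p in phases]
even_sq = np.array([r[0] ** 2 for r in results])
energies = np.array([r[2] for r in results])

for p, e_sq, energy in zip(phases, even_sq, energies):
    print(f"phase={p:5.2f}  even^2={e_sq:8.3f}  even^2+odd^2={energy:8.3f}")
```

So a single filter can miss a periodic signal that happens to arrive at the wrong phase, whereas the pair responds to the presence of the frequency regardless of phase.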
|
| Tom Duerig
|
1
|
 |
|
10-09-2006 06:37 PM ET (US)
|
|
Does a Harris corner detector in the temporal domain fire only on massive derivative changes temporally (i.e., second derivatives/max accelerations) or on heavy temporal derivatives (i.e., first derivatives/max speed)?
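As a point of reference, and assuming the usual space-time extension of the Harris detector due to Laptev and Lindeberg (it is not restated in this thread, so treat the notation as an assumption): L is the video smoothed at spatial scale sigma and temporal scale tau, L_x, L_y, L_t are its first partial derivatives, g is a Gaussian integration window, and k is a small constant.

```latex
\mu = g(\cdot;\, \sigma^2, \tau^2) *
\begin{pmatrix}
L_x^2   & L_x L_y & L_x L_t \\
L_x L_y & L_y^2   & L_y L_t \\
L_x L_t & L_y L_t & L_t^2
\end{pmatrix},
\qquad
H = \det(\mu) - k\,\operatorname{trace}^3(\mu)
```

Under this formulation only first derivatives appear, but H is large only when the gradient direction varies along x, y, and t within the window. A corner moving at constant velocity gives a large L_t yet leaves mu close to rank-deficient, so the detector would tend to fire where the motion changes rather than where the speed is merely high.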
|