| Who | When |
Messages | |
|
|
|
| Matt Clothier
|
1
|
 |
|
09-29-2003 03:09 AM ET (US)
|
|
Interesting results, though not without flaws. I found three in figure 3:
1- frame four comparison: the shoe of the "back person" is barely visible between the shoes of the "front person"; this creates what looks like a shadow in the final result
2- frame five comparison: in the final image, a lot of the "front person" ends up being replaced by the background (look at his front leg)
3- frame ten comparison (last frame): notice that a large part of the top of the baby buggy has disappeared in the final image
You may need to zoom in using Acrobat to see these, but they are definitely there.
|
| Diem Vu
|
2
|
 |
|
09-29-2003 01:08 PM ET (US)
|
|
I'm trying to verify the equation for F in section 6 ... with no hope. However, I have a feeling that in the last bracket, it should be eta instead of mu ((gamma - eta)^2...)
|
| Piotr
|
3
|
 |
|
09-29-2003 03:11 PM ET (US)
|
|
I will try to avoid going into too much detail on graphical models in Tuesday's presentation, since this is a course in itself and many of you have taken it. Also, although I will give some intuition behind variational methods, this is again a topic in itself, so I will not be covering it in detail (some of those incomprehensible equations will remain vague).
For those of you who have had no exposure to graphical models, I suggest spending at least some time at the following link:
http://www.cs.berkeley.edu/~paskin/gm-short-course/
For those of you who have taken Sanjoy's course or have done graphical models stuff before, variational inference techniques are very powerful and interesting approaches. I recommend one of the following:
http://www.cis.upenn.edu/~mkearns/papers/barbados/jgjs-var.pdf
http://research.microsoft.com/users/jojic/tut.pdf
|
| Jing Shiau
|
4
|
 |
|
09-29-2003 03:51 PM ET (US)
|
|
This might be too late to ask... but the title of the paper listed on the schedule page says "Learning flexible sprites in video layers", but the link gives the paper "Learning Appearance and Transparency manifolds of occluded objects in layers." Which is the one being presented?
Also, where can I find more information about mean appearance and mean transparency (maps)? Thanks.
|
| Piotr
|
5
|
 |
|
09-29-2003 04:52 PM ET (US)
|
|
Oops, the wrong paper is linked. The correct paper is "Learning flexible sprites in video layers"; it can be found at: http://www.psi.toronto.edu/~frey/papers/lay-cvpr01.pdf
"Learning Appearance and Transparency manifolds of occluded objects in layers" is follow-up work by Frey & Jojic, but the fundamental ideas are essentially the same (the work is more sophisticated).
|
| Serge Belongie
|
6
|
 |
|
09-29-2003 05:20 PM ET (US)
|
|
Oops! I corrected the link. Sorry about that.
Serge
|
| Matt Clothier
|
7
|
 |
|
09-29-2003 08:19 PM ET (US)
|
|
Oops... Ignore my comments then (wrong paper).
|
| Jing Shiau
|
8
|
 |
|
09-29-2003 10:53 PM ET (US)
|
|
Can a sprite be calculated from just one given point? The authors say that their technique allows for point-and-click video stabilization and point-and-click object removal.
In 6.1, when they talk about the E-Step updates for the posterior probability of transformation T, there is a term that penalizes transformations under which the transformed sprite looks similar to the background; but what if the object is similar to the background to start with? (Maybe something like a chameleon, whose texture blends in with the background?) Can their method still work?
|
| Piotr
|
9
|
 |
|
09-30-2003 01:32 AM ET (US)
|
|
Jing --
One first figures out where the sprites are based on a sequence (and all the points involved), and only afterwards can you select a sprite (by clicking any point belonging to the sprite in the given image).
As to the second part of your question -- that's exactly right: if the foreground is similar to the background, things get more difficult. This is an inherent problem for all segmentation algorithms.
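To make the low-contrast point concrete, here is a toy sketch. It is not the model from the paper: it assumes a 1-D frame, a flat background at 0, a box-shaped object, cyclic shifts, a uniform prior over positions, and Gaussian pixel noise. The function name `shift_posterior` and all parameter values are made up for illustration.

```python
import numpy as np

def shift_posterior(contrast, sigma=0.5, length=32, width=5, true_pos=10):
    """Posterior over cyclic object positions in a toy 1-D frame (uniform prior)."""
    # Hypothetical setup: flat background at 0, box object of height `contrast`.
    template = np.zeros(length)
    template[:width] = contrast
    frame = np.roll(template, true_pos)  # noise-free observation of the object

    # Squared error between the frame and the object placed at each position.
    sse = np.array([np.sum((frame - np.roll(template, d)) ** 2)
                    for d in range(length)])

    # Gaussian likelihood with pixel noise `sigma`, normalised over positions.
    log_post = -sse / (2.0 * sigma ** 2)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

high = shift_posterior(contrast=1.0)   # object clearly differs from background
low = shift_posterior(contrast=0.1)    # object nearly matches the background

print("high contrast: peak at", int(high.argmax()), "mass", round(float(high.max()), 3))
print("low contrast:  peak at", int(low.argmax()), "mass", round(float(low.max()), 3))
```

In both cases the most likely position is the true one, but with low contrast the posterior is nearly flat, so any noise or model mismatch can easily move the estimate.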
|
| Neil Alldrin
|
10
|
 |
|
09-30-2003 02:19 AM ET (US)
|
|
Well, I have to say the results are impressive. Unfortunately I don't really have the background in variational techniques or graphical models to follow the details very well.
One question I have is whether this will ever be capable of running in real time. Since it's doing unsupervised learning and such, does it need to look ahead in the video sequence?
|
| Matt Clothier
|
11
|
 |
|
09-30-2003 04:47 AM ET (US)
|
|
Neil, it depends on your definition of "real time". Many vision papers I have seen talk about image processing at "interactive framerates", which can mean 8-15 Hz. This particular paper claims a 1 Hz update rate:
"Through the use of fast Fourier transforms during variational inference, our algorithm is fast, able to process one 320x240 frame per second." (from Summary)
Their technique *might* do a few Hz more today (with all the latest technology). So if you define "real time" as 30 fps, this probably won't happen soon unless they can optimize their algorithm (which would most likely take some clever tricks). The question does come to mind, though: are there applications that require this to be "real time"?
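As a rough illustration of why FFTs help here: the sketch below scores a sprite against a frame at every cyclic translation in one pass. It is only a guess at the flavor of the trick, not the paper's actual update equations. It assumes plain squared error, cyclic shifts, and no masks or per-pixel variances; the function names are made up.

```python
import numpy as np

def sse_all_shifts(frame, sprite):
    """Squared error between `frame` and every cyclic shift of `sprite`, via FFT."""
    F = np.fft.fft2(frame)
    S = np.fft.fft2(sprite)
    # Circular cross-correlation:
    # corr[dy, dx] = sum over pixels of frame * (sprite rolled by (dy, dx)).
    corr = np.real(np.fft.ifft2(F * np.conj(S)))
    # ||f - s_d||^2 = ||f||^2 + ||s||^2 - 2 <f, s_d>; a cyclic shift keeps ||s||^2 fixed.
    return np.sum(frame ** 2) + np.sum(sprite ** 2) - 2.0 * corr

def sse_all_shifts_brute(frame, sprite):
    """Same quantity by explicit looping, for checking only."""
    H, W = frame.shape
    out = np.empty((H, W))
    for dy in range(H):
        for dx in range(W):
            shifted = np.roll(sprite, (dy, dx), axis=(0, 1))
            out[dy, dx] = np.sum((frame - shifted) ** 2)
    return out

rng = np.random.default_rng(0)
sprite = rng.random((24, 32))
frame = np.roll(sprite, (3, 5), axis=(0, 1))  # sprite moved down 3, right 5

fast = sse_all_shifts(frame, sprite)
best = tuple(int(i) for i in np.unravel_index(np.argmin(fast), fast.shape))
print("best shift:", best)
print("matches brute force:", np.allclose(fast, sse_all_shifts_brute(frame, sprite)))
```

For an H x W frame the explicit loop costs on the order of (HW)^2 operations, while the FFT version costs on the order of HW log(HW), which is where the speedup comes from.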
|
| Mei-fang
|
12
|
 |
|
09-30-2003 05:28 AM ET (US)
|
|
Question: Does the frame rate or the interval of the input video sequence influence the accuracy of the probability model that defines the flexible sprite? If we have more consecutive video frames, will we derive a better model?
|
| John Rapp
|
13
|
 |
|
09-30-2003 07:50 AM ET (US)
|
|
It's also worth noting that they're talking about Matlab numbers, and Matlab generally runs like a pig. Converting it to a faster language may have potential, or possibly customized hardware could do it more quickly. The fact that they're within an order of magnitude of real-time processing is very promising.
Personally I found the paper mostly interesting as a possible front end to do image segmentation of video. There are probably quite a few problems with it that would make it not work in real-world situations. I was wondering how well it adapts to a non-static background, since adding layers is apparently expensive (either in time or accuracy, I'm not sure which). But it definitely seems like it's dead on in certain situations.
|
| Sunny Chow
|
14
|
 |
|
09-30-2003 12:03 PM ET (US)
|
|
I'm sorry if the answer to my question appears obvious (since I'm unable to follow the details myself), but I thought that the M step tries to get the unobserved variables that are used in the E step to somehow agree with each other over the set of all the images in the video sequence? Doesn't that mean this sort of operation happens "after the fact", which reduces the importance of real time processing to a convenience?
Sunny.
On a wild tangent, does this technique remind anyone of the movie "Rising Sun" w/ Sean Connery? :D
|