Discuss Zhang readings

Tomoki Tsuchida
04:42 AM ET (US)
Before the presentation on Friday, I was wondering what the difference between Infomax and the surprise framework was. After listening to the discussion, I'm still not sure if there is any fundamental difference between them. Like Nick was pointing out, it seems like maximizing surprise is dual to (or perhaps a subset of) maximizing information over time. (Dr. Itti seemed to be saying that the result in the paper was not about maximizing surprise, but to me that seems to be the point of the paper. I wonder if I'm missing something important?) Since surprise is "how much new data D modifies my assumption about the world M," maximizing surprise is like gradient ascent in the model probability space. However, as Dr. Itti himself was saying, the choice of model space itself is a priori (and also static). So this framework does not learn - it doesn't create new assumptions about the world (although the probabilities of existing assumptions change), which is why I suspect it's a subset of a more general information-gathering framework.
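To make this concrete: the surprise of data D under a model space M is the KL divergence from the prior over models to the Bayesian posterior after observing D. A minimal sketch in Python, with a made-up two-model space and likelihoods chosen purely for illustration:

```python
import math

def surprise(prior, likelihood):
    """Bayesian surprise: KL divergence KL(posterior || prior) over a
    discrete model space. `prior` and `likelihood` map model -> probability."""
    # Posterior via Bayes' rule: P(M|D) = P(D|M) P(M) / P(D)
    evidence = sum(prior[m] * likelihood[m] for m in prior)
    posterior = {m: prior[m] * likelihood[m] / evidence for m in prior}
    # KL(posterior || prior), in nats
    return sum(p * math.log(p / prior[m]) for m, p in posterior.items() if p > 0)

# Hypothetical model space: before the data, I mostly believe "boring".
prior = {"boring": 0.9, "interesting": 0.1}
# Data that strongly favors "interesting" yields large surprise...
print(surprise(prior, {"boring": 0.1, "interesting": 0.9}))
# ...while data equally likely under both models yields zero surprise.
print(surprise(prior, {"boring": 0.5, "interesting": 0.5}))
```

Note that the data itself never enters except through the likelihoods: data that leaves the posterior where the prior was is unsurprising no matter how much raw Shannon information it carries.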

The suspicion above came about while I was thinking about the "snow paradox" discussed in the paper. The claims that caught my attention were (a) the Shannon information of data (e.g., the video of the snow) is a widely used (yet paradoxical) measure of information, and (b) the saliency framework "best captures an approximation to surprise" because of this. I think the difference between the data we observe and the information contained in the data itself is not well considered in the argument. We can and do only care about data that we observe: eyes have physical limitations, feature extraction has limitations, and higher-level processing has even more limitations. At each level, the information that we receive is processed, so the Shannon information at each level changes too (because the sample space is different). In other words, snow might be boring not only because it's not "surprising." It might be boring for the same reason that a black screen is boring: maybe we can't see the pixels (if we sit too far away) or we can't parse the pixels fast enough (if we are not Matrix operators). Information should be an attribute of a particular representation, not of the world itself - and the video is not a fundamentally better representation than what's in the various feature maps in the brain. Since most saliency frameworks use some feature extraction anyway, I don't think the Shannon information of the pixels of a video is very relevant to the information discussed in saliency frameworks. Wouldn't most models consider all-black and all-snow as equally salient (or non-salient)?
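The snow-vs-black contrast is easy to quantify: at the pixel level, snow has near-maximal Shannon entropy while a black screen has zero, yet both are equally boring. A quick sketch (frame size and pixel range are arbitrary illustrative choices):

```python
import math
import random
from collections import Counter

def pixel_entropy(frame):
    """Shannon entropy (bits per pixel) of the empirical pixel-value
    distribution of a flattened frame."""
    counts = Counter(frame)
    n = len(frame)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

random.seed(0)
snow = [random.randint(0, 255) for _ in range(10000)]  # TV-static "snow"
black = [0] * 10000                                    # all-black screen

print(pixel_entropy(snow))   # near the 8-bit maximum
print(pixel_entropy(black))  # exactly 0
```

So by the raw pixel measure, snow is the most "informative" stimulus possible and black the least, even though neither changes an observer's beliefs - which is exactly the paradox the surprise framework is meant to dissolve.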

From the information-gathering perspective, I think this makes the surprise model miss an alternative strategy: instead of increasing surprise, the model space can be modified to increase information. For example, in an advanced math class, my model space might be M = {lecture is boring; there's nothing happening outside}. (As above, the class might be just as boring even if a lot is happening - if I don't understand the class, it makes no difference to my brain.) If something happens outside, I might look outside to maximize the surprise. Alternatively, though, I can alter the model space (M = {professor writes an equation I understand; professor writes an equation I don't understand; there's nothing happening outside}) and take a different action (find the equations I understand). For this to occur, the surprise framework requires at least one higher level of surprise - one that can modify the lower surprise model space. Even though this might indeed be a good model of what happens neurologically (i.e., changing the lower model space requires modulation from higher up), to me information maximization seems to describe the ultimate goal in a more straightforward way (maximizing the information I get during the class). I can't readily think of anything that the surprise model explains that information maximization wouldn't, though (except perhaps as a mechanism...)
Tomoki Tsuchida
05:33 AM ET (US)
It appears that a large part of the question I posted was answered by the presentation in class... at least now I know that there's research going on with respect to peripheral motion detection. (I guess that's what happens when you post a question before the presentation.) The meta-level question still stands, but I'm not sure if it still makes sense...
Tomoki Tsuchida
04:11 AM ET (US)
I have another "peripheral" question / observation about the eye movement papers. I think these are the first papers we've read this term in which the temporal aspect of eye movement and the visibility maps are explicitly considered. If I understand them correctly, the "visibility map" is represented by a spike- or Gaussian-shaped FPOC in both models. I believe this comes from the retinal cone cell density, which makes sense for modeling regular visual inputs.

However, the distribution of the rod cells is quite different - they are concentrated more in the periphery than in the center. Since rod cells are better for low-light observation and motion detection, I wonder if saccadic movement patterns are quite different under low-light conditions? Moreover, I wonder how this affects saliency that results from observed motion. Unlike other primitive features (like color and intensity), the motion feature depends on the time component, so I would think only models with temporal consideration can properly capture this feature. Would the "prescriptive" approach predict this design - i.e., is there a computational reason why we would want to detect motion most sensitively off center? Could the "off center" policy that resulted from the I-POMDP model somehow lead to this?
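As a toy illustration of the contrast (not fit to anatomical data - the widths and the peak eccentricity below are invented values), a rod-driven "visibility map" would peak off center rather than at the fovea, unlike the Gaussian-shaped FPOC in the papers:

```python
import math

def cone_visibility(ecc, sigma=2.0):
    """Toy cone-like visibility: Gaussian falloff from the fovea
    (ecc = eccentricity in degrees; sigma is an invented width)."""
    return math.exp(-ecc**2 / (2 * sigma**2))

def rod_visibility(ecc, peak=18.0, sigma=8.0):
    """Toy rod-like visibility: near zero at the fovea, peaking in the
    periphery (peak and sigma are invented for illustration)."""
    return math.exp(-(ecc - peak)**2 / (2 * sigma**2))

for ecc in [0, 5, 10, 18, 30]:
    print(ecc, round(cone_visibility(ecc), 3), round(rod_visibility(ecc), 3))
```

Feeding a rod-like profile (instead of the foveal Gaussian) into either model's policy derivation would be one way to test whether the optimal saccade pattern changes under low-light assumptions.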

This may be a meta-level question from the perspective of the two papers: both models start from an a priori FPOC and derive the best policies based on that, and the FPOC distribution itself is not predicted. However, if the cone cells evolved before the rod cells, I imagine it may be possible to explain the motion detection / rod cell distribution *given* the cone cell FPOC. If there's no such reason, then I suppose the design arose from some other design trade-off. But it seems like most biological design decisions in the sensory machinery should have some information-maximizing purpose...
Tomoki Tsuchida
04:11 PM ET (US)
In the last session, there was a discussion about using facial images as the natural statistics in order to build a saliency map for faces. However, since there is such a thing as face blindness, perhaps this simple model does not hold, at least for adult brains? That is, a person with prosopagnosia will not be able to recognize faces, however many faces she is exposed to. Eventually she may learn to recognize faces to some extent, but this is done using much higher-level and more indirect inferences.

Recognition and saliency might be different functions, though. Recognition would require categorization and recall after initial feature detection. I'm not sure if prosopagnosia is caused by a low-level saliency problem (can't find faces in general) or a memory problem (can see it's a face, but can't tell faces apart). Perhaps both kinds exist?

I was also wondering how innate the facial saliency function is. There are apparently many different kinds of prosopagnosia – some genetic, some developmental. Some involve more generic object detection problems, whereas others are limited to face detection issues. Assuming the facial saliency issue goes along with all this, that seems to imply that facial saliency as a whole is a fairly high-level function (like language and music). That is, the function may require both lower-level, genetically controlled functions (like extracting certain features necessary for object recognition) and higher-level learning (like focusing on faces specifically). If this is the case, then I suspect genetic prosopagnosia would have much wider effects than the developmental kind.

I also imagine that, like language and music, there's some developmental window when the learning occurs, after which you can no longer learn to do so. Facial recognition also occurs very quickly and unconsciously. I wonder if those facts are somehow related – perhaps there's some neural feedback loop to lower-level neural networks that gets cut off after that developmental window?
Gary Cottrell
01:08 PM ET (US)
Here is a place to discuss *any* of the readings.
Edited 10-20-2009 12:13 AM
