|
|
| Who | When |
Messages | |
|
|
|
| Sameer Agarwal
|
1
|
 |
|
04-03-2001 09:09 PM ET (US)
|
|
Hi, here are my thoughts and points of confusion.
The idea presented in the paper is a neat one indeed. But I am struck by the lack of any real results in the paper. The authors concede that there are no known metrics or standard data sets for this problem domain, but a comparison with the existing state of the art on the same dataset could have been provided. They dismiss the existing methods based on wavelets a bit too quickly, I think.
Also, the claim that AdaBoost does not overfit is not exactly true. It is known that if you run AdaBoost long enough, it will overfit.
Finally, the authors talk about refining the query to increase the accuracy of the images returned. But they are rather vague about how exactly they go about doing it. Did they just iteratively improve the existing classifier, or construct a new one?
sameer
|
| Jonathan Ultis
|
2
|
 |
|
04-03-2001 11:30 PM ET (US)
|
|
Edited by author 04-04-2001 12:25 AM
I agree, this seems like a neat and reasonable idea. The downfall of the paper is definitely the lack of results.
The main points of the paper seem to be that highly selective features can be built from combinations of basic features, and that using a subset of highly selective features leads to better query performance than using basic features.
The authors do provide some evidence that highly selective features can be built from basic features. At least, they show that the new features are highly selective. It would have been nice to know how selective basic features are since we can't know that the new features are more selective without that information.
In order to demonstrate that query performance improved, the authors should have published the performance of basic features compared to second order features compared to third order highly selective features. This would have shown the benefit of using highly selective features.
However, my gut feeling is that the authors are almost certainly right about both points, so this feels like a good idea, even if we don't have the evidence to back it up.
|
| Dave Kauchak
|
3
|
 |
|
04-04-2001 02:22 AM ET (US)
|
|
I agree with both the comments below in that the experimental data provided on the results was minimal.
Along these same lines, I am curious how well their algorithm fits the actual usage of an image retrieval system. For example, the paper presents applications of the algorithm to extract classes of images from a set of diverse images. They seem to brush under the rug the idea of searching a more homogeneous database, or with a more homogeneous set of input (e.g. a specific type of car, a specific flower or the Eiffel Tower). I would be curious to see how well the algorithm responds to changes in the database and queries.
This seems to get at the larger point that the paper should have laid out more definitive goals. Even though the paper states that there is currently not a definitive set of tests, the paper could attempt to formulate a set of experiments that would confirm that the goals the algorithm was designed for were accomplished.
On a side note, I found the layout of the last few figures a bit interesting. Was I the only one that was slightly confused to find the last two figures after the references?
|
| Aldebaro Klautau
|
4
|
 |
|
04-04-2001 06:45 AM ET (US)
|
|
The proposed features are interesting, but I could not see how "over one million images can be scanned per second." The primitive filters are regular enough to allow tricks that compute the convolutions efficiently, as mentioned. But still, the number of operations seems to be high, unless one assumes small images. Instead of the two paragraphs after equation (4), mentioning kurtosis, etc., I would prefer some discussion about the computational cost.
I also think the paragraph about wavelets is too vague. The features proposed by the authors are not as sensitive to shifts as conventional wavelet decompositions because of the summation in equation (1). In image retrieval, as opposed to image coding, there is no concern with image reconstruction based on the feature representation, so the filters and operations are not required to have the perfect reconstruction (PR) property, etc. The authors used this flexibility to adopt some sort of over-complete representation and then sum up all "coefficients" or filter outputs, which avoids spatial localization. If one ignores some of the requirements imposed on wavelets (or PR filter banks), similar results could eventually be obtained with a "wavelet-like" decomposition followed by the summation in (1). In my view, the particular choice of primitive filters should be determined mainly by computational cost. Again, this aspect was not explored in the paper.
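To make the shift argument concrete, here is a minimal 1-D sketch. It is not the paper's code: the signal, the kernel `edge_kernel`, and the circular boundary handling are all assumptions chosen so the effect is exact. The per-position filter output moves when the input is shifted, but the rectified output summed over all positions does not.

```python
def circular_filter(signal, kernel):
    """Circularly correlate a 1-D signal with a small kernel."""
    n = len(signal)
    return [sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
            for i in range(n)]


def summed_feature(signal, kernel):
    """Rectify the filter output, then sum over all positions."""
    return sum(abs(v) for v in circular_filter(signal, kernel))


def shift(signal, k):
    """Circularly shift a signal by k samples."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


# Hypothetical toy input and primitive filter (assumptions for illustration).
signal = [0, 0, 1, 3, 1, 0, 0, 2, 0, 0]
edge_kernel = [-1, 0, 1]

original_map = circular_filter(signal, edge_kernel)
shifted_map = circular_filter(shift(signal, 3), edge_kernel)

# The localized responses differ, but the summed feature is identical.
print(original_map != shifted_map)
print(summed_feature(signal, edge_kernel) ==
      summed_feature(shift(signal, 3), edge_kernel))
```

With real images, borders and any downsampling between filtering stages would make this only approximate, which is consistent with "not as sensitive" rather than "invariant."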
Besides the features, there is the boosting strategy. I have a question about the adaptation made by the authors to the original AdaBoost. Is this the first paper to associate each feature with one weak learner? I liked the paper, but if the answer is yes, I would consider it a very good one, because the idea of using boosting to select features is simple but very powerful.
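As a rough illustration of boosting as feature selection, here is a small discrete AdaBoost sketch in which every weak learner looks at exactly one feature. The threshold stump is an assumption made for brevity and is not necessarily the weak learner used in the paper; the toy data and all names are hypothetical.

```python
import math


def stump_predict(row, f, t, polarity):
    """Predict +1/-1 from a single feature f and threshold t."""
    return polarity if row[f] > t else -polarity


def best_stump(X, y, w):
    """Return (weighted error, feature, threshold, polarity) of the best stump."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, f, t, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    return best


def adaboost(X, y, rounds):
    """Each round selects one feature; the ensemble is a weighted vote."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        err, f, t, p = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, p))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, f, t, p))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    score = sum(a * stump_predict(row, f, t, p) for a, f, t, p in ensemble)
    return 1 if score > 0 else -1


# Hypothetical toy data: only feature 1 separates the two classes.
X = [[0.2, 0.9, 0.5], [0.7, 0.8, 0.1], [0.4, 0.7, 0.9],
     [0.9, 0.2, 0.4], [0.1, 0.1, 0.6], [0.6, 0.3, 0.2]]
y = [1, 1, 1, -1, -1, -1]

ensemble = adaboost(X, y, rounds=3)
selected_features = [f for _, f, _, _ in ensemble]
print(selected_features)
```

The list of features picked across rounds is the "selected" subset; with thousands of candidate features and only a few rounds, most features are never touched at query time.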
|
| Kristin Branson
|
5
|
 |
|
04-04-2001 12:24 PM ET (US)
|
|
I think the idea of a program that can recognize high level features in images is an excellent idea. The authors' example of the third level feature map of a tiger's stripes versus the third level feature map of a waterfall was also pretty convincing that the authors were headed in the right direction. However, I cannot tell whether this idea generalizes well to all classes of images.
I was a little perplexed by the brief discussion of how this technique differs from (and is better than) wavelet filters in terms of phase insensitivity, as I have seen that combining a sine and a cosine Gabor wavelet filter has the same result.
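For reference, the sine/cosine combination mentioned above can be sketched as follows. This is a 1-D toy with assumed parameters (`sigma`, `period`, `half_width`), not anything taken from the paper: each filter alone responds differently as the phase of the input grating changes, while the magnitude of the pair stays essentially constant.

```python
import math


def gabor_pair(sigma, period, half_width):
    """Return even (cosine) and odd (sine) 1-D Gabor filters."""
    xs = range(-half_width, half_width + 1)
    w = 2 * math.pi / period
    envelope = [math.exp(-x * x / (2 * sigma * sigma)) for x in xs]
    even = [e * math.cos(w * x) for e, x in zip(envelope, xs)]
    odd = [e * math.sin(w * x) for e, x in zip(envelope, xs)]
    return even, odd


def responses(phase, sigma=4.0, period=8.0, half_width=20):
    """Filter a grating cos(w*x + phase); return (even, odd, energy)."""
    even, odd = gabor_pair(sigma, period, half_width)
    w = 2 * math.pi / period
    xs = range(-half_width, half_width + 1)
    grating = [math.cos(w * x + phase) for x in xs]
    r_even = sum(f * s for f, s in zip(even, grating))
    r_odd = sum(f * s for f, s in zip(odd, grating))
    # Energy of the quadrature pair: sqrt(even^2 + odd^2).
    return r_even, r_odd, math.hypot(r_even, r_odd)


for phase in (0.0, math.pi / 4, math.pi / 2, math.pi):
    r_even, r_odd, energy = responses(phase)
    print(f"phase={phase:.2f} even={r_even:+.3f} odd={r_odd:+.3f} energy={energy:.3f}")
```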
I agree with the other posts that this paper lacked supporting results. While I'm not familiar with how well other image retrieval systems have performed, and while this system seemed to perform well on at least one type of image (mountains), I feel this system performed fairly poorly on a few classes of images (sunsets and fields), in which it seemed to get things right about 30% of the time, which is not all that much better than guessing.
|
| Bianca Zadrozny
|
6
|
 |
|
04-04-2001 01:04 PM ET (US)
|
|
I was also a bit disappointed by the lack of clear goals for the image retrieval system, and of results that make it possible to compare it with other approaches.
The only quantitative performance results that the authors show are precision and recall curves. Although this gives an idea of how well the algorithm ranks the images as belonging to a certain class, the authors don't give any information about how many images users are likely to scan in order to find what they are looking for. In other words, they don't say what the operating point on these curves is for real-world applications. Also, they say the curves show the average precision and recall for each class, but they don't put error bars indicating the standard deviation of the precision and recall within each class. The error bars would show how their algorithm varies depending on the initial positive training set selected by the user.
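A small sketch of what such a report could look like, under assumed data: precision and recall at a fixed cutoff `k` (one possible operating point), with mean and standard deviation across several queries for the same class. The relevance lists below are made up for illustration and are not results from the paper.

```python
from statistics import mean, stdev


def precision_recall_at_k(ranked_relevance, k, total_relevant):
    """Precision and recall after the user scans the top k results.

    ranked_relevance: list of 0/1 flags, best-ranked image first.
    """
    hits = sum(ranked_relevance[:k])
    return hits / k, hits / total_relevant


# Hypothetical rankings for three queries on one class, each started from
# a different set of positive examples (1 = image belongs to the class).
runs = [
    [1, 1, 0, 1, 0, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
]
total_relevant = 4  # class members in this toy database
k = 5               # assumed operating point: user scans five images

results = [precision_recall_at_k(r, k, total_relevant) for r in runs]
precisions = [p for p, _ in results]
recalls = [r for _, r in results]

# Mean gives the curve value at k; the standard deviation is the error bar.
print(f"precision@{k}: {mean(precisions):.2f} +/- {stdev(precisions):.2f}")
print(f"recall@{k}:    {mean(recalls):.2f} +/- {stdev(recalls):.2f}")
```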
Like Sameer, I am also confused by how the learning proceeds after the user labels additional examples.
|
| Hector Jasso
|
7
|
 |
|
04-04-2001 01:11 PM ET (US)
|
|
(A "church feature"? Now, that is an interesting idea...)
To me, the results are convincing, especially if you consider that the intended use of this is for image retrieval, not for image recognition. So speed is very important, and filtering false positives might not be much of a problem.
But some of the theoretical foundations of the paper appear to be loose:
- Details are likely to be lost for the next level of filtering. This means that looking for global structures that are composed of distributed details is not possible: tiger stripes can be detected, but only because stripes are contiguous.
- Although the filtering process is described as non-linear, they pass three filters in order to get the final feature map.
So I am not sure if the theoretical foundation they offer is good enough for all kinds of images.
|