Kristin Branson | 6 | 04-25-2001 11:57 AM ET (US)

My biggest problem with this paper is that I don't understand the motivation behind many of the decisions the author made in designing his algorithm. First, I do not completely understand why a set of distributed, local searches is helpful. Since each search is more or less independent of the others, approximately the same number of documents will be visited in each search. Doesn't this type of search go against ARACHNID's purpose of exploiting "semantic topology"? That is to say, if many relevant documents are in one location, shouldn't the search concentrate on that area more heavily than a single local search can?
Of course, I am also not convinced of the extreme advantages of exploiting semantic topology that the author suggests. If I enter a query into a search engine, I would rather get a more distributed set of sites returned. If a site is relevant and I choose to examine it, I will see the links it contains, so it does not help me if the only results the search engine returns are the links on that page. That's kind of an extreme example, and I don't really understand the distribution of sites finally returned by the ARACHNID algorithm. Perhaps the combination of a distributed search and exploiting semantic topology finds some happy midpoint between these two problems ...
I also felt that not much support was given for the more detailed choices in the algorithms described. I do not see, for example, why it would be important for gamma to "evolve" in the naive representation. I agree that the paper would have been clearer if, instead of going through the whole genetic algorithm thing, it had just mentioned that some weighted randomness should be used in choosing which link(s) to follow. I was especially concerned by the inclusion of section 5.2, as the author seemed to be saying that experimental results were necessary to show that all the details of the algorithm were actually doing what they were supposed to. As I do not have any understanding of genetic algorithms or their advantages, I was hoping that there was some mathematical basis for the details that was not presented in the paper.
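To make the "weighted randomness" alternative concrete, here is a minimal sketch of what it could look like. It is not the paper's method: the function name `pick_link`, the idea of a per-link relevance score, and the example scores are all assumptions made for illustration.

```python
import random


def pick_link(link_scores, rng=random):
    """Pick one outgoing link with probability proportional to its score.

    link_scores: dict mapping a link to a non-negative score
                 (hypothetical relevance estimates, not from the paper).
    rng: the random module or a random.Random instance.
    """
    links = list(link_scores)
    weights = [link_scores[link] for link in links]
    # random.choices samples with replacement, weighted by `weights`.
    return rng.choices(links, weights=weights, k=1)[0]


# Hypothetical scores for the links on one page.
scores = {"a.html": 0.7, "b.html": 0.2, "c.html": 0.1}
next_link = pick_link(scores, random.Random(0))
```

Higher-scored links are followed more often, but low-scored links still get an occasional visit, which is the exploration the evolving parameter seems meant to control.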
One final problem I had was with figure 4. I do not understand why, when BFS has visited all of the documents, it has returned only about 20% of the relevant ones. The author does not clarify what selection criterion the BFS search uses to determine relevant documents, nor does he mention the number of documents returned by each of the two algorithms. It seems that if BFS and ARACHNID are to be compared, their recalls should be similar once 100% of the documents have been visited.
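The expectation about recall can be checked with a small sketch. It assumes that both crawlers judge relevance by the same criterion and that every visited relevant document counts as retrieved; the paper may define things differently, which is exactly what is unclear. The document names, the relevant set, and both visit orders are made up.

```python
def recall_curve(visit_order, relevant):
    """Recall after each visit, counting every visited relevant doc as retrieved."""
    found = 0
    curve = []
    for doc in visit_order:
        if doc in relevant:
            found += 1
        curve.append(found / len(relevant))
    return curve


# Hypothetical collection of 8 documents, 3 of them relevant.
docs = ["d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7"]
relevant = {"d2", "d5", "d7"}

bfs_order = docs  # stand-in for a breadth-first visit order
focused_order = ["d2", "d5", "d7", "d0", "d1", "d3", "d4", "d6"]  # stand-in for a focused crawl

bfs_curve = recall_curve(bfs_order, relevant)
focused_curve = recall_curve(focused_order, relevant)
```

Under these assumptions the two curves differ early on but must meet at full recall once every document has been visited, so a BFS curve that stops near 20% suggests a different selection criterion or a cap on the number of documents returned.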