Messages

#7 | Hector Jasso | 04-25-2001 01:04 PM ET (US)

I also agree that the author does not give enough justification for searching the web in the proposed way. If I were looking for a specific piece of information that I knew existed on the web (someone's home page, a specific tutorial, etc.), this approach might be helpful, and an analogy with "energy", "fitness", etc. could be appropriate. In other cases, though, a Google-style approach might be more useful, where we work out what we are looking for as we gather hints from the results.
I would also like to know exactly what the difference would be between these so-called agents and a best-first search with a good heuristic similar to the fitness value.
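To make the comparison concrete, here is a minimal sketch of a single best-first crawler, assuming a toy in-memory link graph and a hypothetical `score` heuristic (the fraction of query terms on a page) standing in for the fitness value. Because a crawler cannot score a page before fetching it, each newly found link inherits the score of the page it was found on. This illustrates the baseline being asked about; it is not the paper's algorithm.

```python
import heapq

def score(text, query):
    """Hypothetical heuristic: fraction of query terms that appear in the page text."""
    words = set(text.lower().split())
    terms = [t.lower() for t in query]
    return sum(t in words for t in terms) / len(terms)

def best_first_crawl(graph, pages, start, query, budget):
    """Visit up to `budget` pages, always expanding the most promising frontier link.

    graph: dict mapping page -> list of linked pages (a hypothetical toy web)
    pages: dict mapping page -> page text
    """
    visited = []
    seen = {start}
    frontier = [(0.0, start)]  # entries are (negated priority, page)
    while frontier and len(visited) < budget:
        _, page = heapq.heappop(frontier)
        visited.append(page)
        # Only now is the page "fetched", so only now can its text be scored
        s = score(pages[page], query)
        for link in graph.get(page, []):
            if link not in seen:
                seen.add(link)
                # Outlinks inherit the parent's score; heapq is a min-heap, so negate it
                heapq.heappush(frontier, (-s, link))
    return visited
```

For example, with links a → c, a → b, b → d, c → e and the query ["genetic", "crawler"], where only page b contains the query terms, a budget of 3 visits a, b, d: d jumps ahead of c because its parent matched. One global priority queue like this is the natural baseline against which a population of agents would need to show an advantage.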

#6 | Kristin Branson | 04-25-2001 11:57 AM ET (US)

My biggest problem with this paper is that I don't understand the author's motivation for the many decisions he made in designing his algorithm. First, I do not completely understand why a set of distributed, local searches is helpful. Because each search is more or less independent of the others, roughly the same number of documents will be visited in each search. Doesn't this kind of search go against ARACHNID's purpose of exploiting "semantic topology"? That is, if many relevant documents are clustered in one location, shouldn't the search concentrate on that area more heavily than a single local search can?
Of course, I am also not convinced of the large advantages of exploiting semantic topology that the author claims. If I enter a query into a search engine, I would rather get a more distributed set of sites back. If a site is relevant and I choose to examine it, I will see the links it contains anyway; it would not help me if the search engine returned only the links on that page. That is an extreme example, and I do not fully understand the distribution of sites that ARACHNID finally returns. Perhaps the combination of a distributed search and exploiting semantic topology finds some happy midpoint between these two problems...
I also felt that little justification was given for the finer design choices in the algorithm. I do not see, for example, why it would be important for gamma to "evolve" in the naive representation. I agree that the paper would have been clearer if, instead of going through the whole genetic-algorithm machinery, it had simply said that some weighted randomness should be used in choosing which link(s) to follow. I was especially concerned by the inclusion of Section 5.2, as the author seemed to be saying that experimental results were needed just to show that all the details of the algorithm were actually doing what they were supposed to. Since I have no background in genetic algorithms or their advantages, I was hoping there was some mathematical basis for these details that was simply not presented in the paper.
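One simple form of the "weighted randomness" suggested above is softmax (Boltzmann) selection over per-link relevance estimates. The sketch below is a hypothetical illustration: the estimates and the `beta` greediness parameter are assumptions, and it does not reproduce the paper's own link-selection rule or its evolved parameters.

```python
import math
import random

def softmax_weights(estimates, beta):
    """Turn per-link relevance estimates into selection probabilities.

    beta (hypothetical parameter) sets greediness: beta = 0 gives uniform random
    choice, and a large beta approaches always following the best-looking link.
    """
    m = max(estimates)
    # Subtract the maximum before exponentiating to avoid overflow
    exps = [math.exp(beta * (e - m)) for e in estimates]
    total = sum(exps)
    return [x / total for x in exps]

def choose_link(links, estimates, beta, rng=random):
    """Pick one outgoing link at random, weighted by the softmax of its estimate."""
    weights = softmax_weights(estimates, beta)
    return rng.choices(links, weights=weights, k=1)[0]
```

A single knob like `beta` trades exploration against exploitation directly, which is roughly the behavior the genetic-algorithm framing is meant to provide.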
One final problem I had was with Figure 4. I do not understand why, after BFS has visited all of the documents, it has returned only about 20% of the relevant ones. The author does not say what selection criterion BFS uses to decide which documents are relevant, nor does he give the number of documents returned by each of the two algorithms. It seems that if BFS and ARACHNID are to be compared fairly, their recall should be similar once 100% of the documents have been visited.
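The point about Figure 4 follows from the usual definition of recall: the fraction of relevant documents that appear in the returned set. A minimal sketch (the document IDs are made up for illustration):

```python
def recall(returned, relevant):
    """Fraction of the relevant documents that appear in the returned set."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(set(returned) & relevant) / len(relevant)
```

If a crawler visits the whole collection and returns everything it visits, recall is 1.0; a recall of about 20% after a full crawl therefore implies that BFS returns only a subset chosen by some selection criterion, which is exactly what the paper leaves unstated.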

#5 | Melanie Dumas | 04-25-2001 02:30 AM ET (US)

I would like to share links to two papers that I will bring up in my presentation. They clarify the finer points of learning and the latest performance of the ARACHNID algorithm:
http://citeseer.nj.nec.com/menczer99adaptive.html
http://dollar.biz.uiowa.edu/~fil/Papers/sigir-01.pdf
Greg, match(k,Q) makes a binary determination of whether the words of the query occur in the document. Think of it as asking, for each word k of the document, "is k a query term?"
Thanks for the comments, everyone. See you tomorrow!
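Following that reading, match(k, Q) can be sketched as an indicator function over document words k. How the paper combines these values is not spelled out in this thread, so the `count_matches` helper that sums them over a document is a hypothetical illustration only.

```python
def match(k, query):
    """Indicator: 1 if document word k is one of the query terms Q, else 0."""
    return 1 if k in query else 0

def count_matches(document_words, query):
    """Hypothetical use: sum match(k, Q) over every word k of the document."""
    q = set(query)
    return sum(match(k, q) for k in document_words)
```

For the document "the web crawler crawls the web" and Q = {web, crawler}, the sum is 3: "web" matches twice and "crawler" once, while "crawls" does not match.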

#4 | Melanie Dumas | 04-25-2001 02:22 AM ET (US)

#3 | Yang Yu | 04-24-2001 05:23 PM ET (US)

Hi, I am wondering whether the "small piece of the Web" that was tested experimentally is a representative sample. As the author says, the Web is distributed, heterogeneous, and dynamic, while the "EB" does not have these features.

#2 | Greg Hamerly | 04-24-2001 05:13 PM ET (US)

Clarity, clarity is what this paper needs. I agree with Dave: terms like "energy", "agent", and "semantic topology" only serve to confuse the reader. Overall, this paper has too many variables, metrics, and parameters.
One equation I couldn't figure out was "match(k, Q)" on the third page. Q is the list of user-supplied keywords and k is a keyword in the document, but where does k come from?
I particularly liked Section 3.2, because the agent can determine relevant keywords from documents as it crawls. I wonder how this is controlled, however, and how well the agent stays on the correct, original topic.

#1 | Dave Kauchak | 04-24-2001 03:50 PM ET (US)

First, what I liked about the paper. I like the idea of a local search strategy rather than a global one, and I would be interested to see further work on how well the use of local information suits the web as it is today. I also like the idea of a simulated web environment, which lets an experimenter hold the environment constant (download times, network load and latency, page availability) across runs of different algorithms. However, I did not get a strong sense of what the simulation environment entailed or how closely it matched the real web.
I had a few problems with the general agent representation and with the experiments. The idea of exploring the web using local information instead of a large global representation is definitely interesting. However, I found the concepts of agents, energy, and mutation somewhat confusing. I think the same algorithm could have been presented much more clearly without actual agents, framed instead as a basic randomized algorithm (possibly multithreaded).
Finally, I think much stronger experimental results should have been presented. Although BFS and randomized local search are reasonable baselines, it is hard to imagine any semi-decent algorithm doing worse than these two. A better comparison would be against an intelligent crawler that uses information found on pages (similar to those designed and studied for 250A).