Topic: Arachnid: Adaptive retrieval agents choosing heuristic neighborhoods
Dave Kauchak  1
04-24-2001 03:50 PM ET (US)
First, what I liked about the paper. I like the idea of the local search strategy vs. a global search strategy. I would be interested to see further development of how well use of local information suits the web environment as it is today. I also like the idea of a simulated web environment. A simulated web environment allows an experimenter to keep the environment constant among runs of different algorithms (such as download times, network load and latency and page availability). However, I don’t think that I got a very strong feeling for what was entailed in the simulation environment and how well the environment actually fit the web.

I had a few problems with the general agent representation and also the experimentation. The idea of examining the web based on local information instead of a large global representation is definitely interesting. However, I found the concepts of agents, energy and mutation slightly confusing. I think that the same algorithm could have been presented much more clearly without the concepts of actual agents and more in the framework of a basic randomized algorithm (possibly multithreaded).

Finally, I think that much better experimental results should have been presented. Although BFS and randomized local search are decent baselines to measure against, it is difficult to conceive of any semi-decent algorithm doing worse than these two algorithms. I think a better comparison would be some intelligent crawler that uses information found on pages (similar to those designed and researched for 250A).
Greg Hamerly  2
04-24-2001 05:13 PM ET (US)
Clarity, clarity is what is needed in this paper. I agree with Dave -- terms like "energy", "agent", and "semantic topology" only serve to confuse the reader. Overall, this paper has too many variables, metrics, and parameters.

One equation I couldn't figure out was "match(k, Q)", on the third page. Q is the list of user-supplied keywords, k is a keyword in the document, but where does k come from?

I particularly liked section 3.2, because the agent has the ability to determine relevant keywords from documents as it crawls. I wonder how this is controlled, however, and how well it is able to stay on the correct, original topic.
Yang Yu  3
04-24-2001 05:23 PM ET (US)
Hi:
  I am wondering whether the "small piece of Web" that was tested experimentally is a good sample or not. As the author said, the web is distributed, heterogeneous, and dynamic, while the "EB" doesn't have these features.
Melanie Dumas  4
04-25-2001 02:22 AM ET (US)
I would like to present two links to papers that I will bring up in my presentation. They clarify the finer points about learning and the latest performance of the ARACHNID algorithm:
   http://citeseer.nj.nec.com/menczer99adaptive.html
Melanie Dumas  5
04-25-2001 02:30 AM ET (US)
I would like to present two links to papers that I will bring up in my presentation. They clarify the finer points about learning and the latest performance of the ARACHNID algorithm:
   http://citeseer.nj.nec.com/menczer99adaptive.html
   http://dollar.biz.uiowa.edu/~fil/Papers/sigir-01.pdf

Greg, match(k,Q) makes a binary determination deciding if the words of the query occur in the document. Think of it as "for each word of the document, is it a query term?"
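
A minimal sketch of that reading, assuming case-insensitive exact string matching (the paper may define the comparison differently). The helper `count_query_hits` is hypothetical and only illustrates the "for each word of the document" loop:

```python
def match(k, query_terms):
    """Binary test: 1 if document keyword k is one of the query terms, else 0."""
    return 1 if k.lower() in query_terms else 0


def count_query_hits(document_words, query_words):
    """Hypothetical helper: apply match() to every word of a document."""
    # Normalize the query once so each lookup is a cheap set membership test.
    query_terms = {q.lower() for q in query_words}
    return sum(match(k, query_terms) for k in document_words)


if __name__ == "__main__":
    words = "the web agent crawls the web".split()
    print(count_query_hits(words, ["Web", "agent"]))
```

So k ranges over the words of the document being examined, and Q stays fixed for the whole search.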

Thanks for the comments, everyone. See you tomorrow!
Kristin Branson  6
04-25-2001 11:57 AM ET (US)
  My biggest problem with this paper is that I don't understand the motivation the author had for the many decisions he made in designing his algorithm. First, I do not completely understand why a set of distributed, local searches is helpful. As each search is more or less independent from the others, this means that approximately the same number of documents will be visited in each search. Doesn't this type of search go against ARACHNID's purpose of exploiting "semantic topology"? That is to say, if many relevant documents are in one location, then shouldn't the search concentrate on that area in a greater amount than one local search can?

  Of course, I am also not convinced of the extreme advantages of exploiting semantic topology that the author suggests. I mean, if I enter a query into a search engine I would rather get a more distributed set of sites returned. If a site is relevant and I choose to examine it, I will see the links it contains. It will not be helpful to me if the only links returned by the search engine are the links in the page. That's kind of an extreme example and I don't really understand the distribution of sites finally returned by the ARACHNID algorithm. Perhaps it is the case that the combination of a distributed search and exploiting semantic topology finds some happy midpoint between these two problems ...

  I also felt that there was not a lot of support given for the more detailed choices of the algorithms described. I do not see, for example, why it would be important for gamma to "evolve" in the naive representation. I agree that the paper would have been more clear if it had, instead of going through the whole genetic algorithm thing, just mentioned that some weighted randomness should be used in choosing which link(s) to follow. I was especially concerned by the inclusion of section 5.2, as it seemed like the author was saying it was necessary to have some experimental results to show that all the details of the algorithm were actually doing what they were supposed to. As I do not have any understanding of genetic algorithms or their advantages, I was hoping that there was some mathematical basis for the details that was not presented in the paper.
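
To make the "weighted randomness" alternative concrete, here is a rough sketch of picking a link with probability that grows with its estimated relevance. The exponential weighting and the `sharpness` parameter are assumptions chosen for illustration, not necessarily the paper's exact rule, and `link_scores` is a hypothetical mapping from link to score:

```python
import math
import random


def choose_link(link_scores, sharpness=1.0, rng=random):
    """Pick one link at random, favoring higher-scoring links.

    link_scores: dict mapping a link to a numeric relevance estimate (assumed).
    sharpness:   0 gives a uniform choice; large values approach a greedy choice.
    """
    links = list(link_scores)
    # Exponential weights keep every link possible while favoring better scores.
    weights = [math.exp(sharpness * link_scores[link]) for link in links]
    return rng.choices(links, weights=weights, k=1)[0]


if __name__ == "__main__":
    scores = {"page_a": 0.9, "page_b": 0.2, "page_c": 0.1}
    print(choose_link(scores, sharpness=2.0))
```

In this framing, a single fixed `sharpness` is what a plain randomized algorithm would use; letting it vary per searcher is roughly what an evolving parameter would add.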

  One final problem I had was with figure 4. I do not understand why, when BFS has visited all of the documents, it has only returned about 20% of the relevant ones. The author does not clarify what selection criterion the BFS search is using to determine relevant documents, nor does he mention the number of documents returned by each of the two algorithms. It seems that if BFS and ARACHNID were to be compared, when 100% of documents had been visited, their recalls should be similar.
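
The recall argument can be spelled out with the standard definition (relevant documents returned divided by all relevant documents). The document ids below are made up for illustration and are not data from the paper:

```python
def recall(returned, relevant):
    """Fraction of the relevant documents that appear in the returned set."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(returned) & relevant) / len(relevant)


if __name__ == "__main__":
    relevant = ["d1", "d2", "d3", "d4", "d5"]
    visited = ["d%d" % i for i in range(1, 11)]  # every document was visited

    # If everything visited is returned, recall must be 1.0.
    print(recall(visited, relevant))
    # Recall stays low only if a selection step discards relevant documents.
    print(recall(["d1", "d7"], relevant))
```

Under this definition, a crawler that has visited every document can only end up near 20% recall if its selection criterion drops most relevant pages, which is why the missing criterion matters for reading the figure.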
Hector Jasso  7
04-25-2001 01:04 PM ET (US)
I also agree that there is not enough justification given for searching
the web as proposed by the author. If I was looking for a specific
piece of information that I knew existed on the web (somebody's home
page, a specific tutorial, etc), then this approach might be helpful, and
an analogy with "energy", "fitness" etc could be appropriate. But in
certain cases, a google-type approach might be more useful, where
we understand what we are looking for as we gather hints from the
results.

I would also like to know what exactly would be the difference
between so-called agents and a best-first approach with a good
heuristic similar to the fitness value.
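
For comparison, a best-first crawl over a toy link graph might look like the sketch below. The `graph` and `scores` dictionaries are hypothetical stand-ins for fetched pages and a fitness-like heuristic; nothing here is taken from the paper:

```python
import heapq


def best_first_crawl(graph, scores, start, budget):
    """Visit up to `budget` pages, always expanding the best-scoring frontier page.

    graph:  dict mapping a page to the list of pages it links to (assumed).
    scores: dict mapping a page to a heuristic relevance estimate (assumed).
    """
    # heapq is a min-heap, so scores are negated to pop the highest first.
    frontier = [(-scores.get(start, 0.0), start)]
    seen = {start}
    order = []
    while frontier and len(order) < budget:
        _, page = heapq.heappop(frontier)
        order.append(page)
        for nxt in graph.get(page, ()):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-scores.get(nxt, 0.0), nxt))
    return order


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}
    scores = {"a": 0.1, "b": 0.2, "c": 0.9, "d": 0.5, "e": 0.8}
    print(best_first_crawl(graph, scores, "a", budget=4))
```

The main structural difference from the agent framing is the single global priority queue: every candidate link competes with every other, whereas each agent only chooses among the links on its current page.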