Topic: Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks
Gyozo Gidofalvi  3
06-04-2002 06:15 PM ET (US)
I found the paper very interesting. The algorithm it presents for learning to make sequential decisions using selective attention and short-term memory has many potential applications.

I also found that, although the notation was clearly defined, in places it was unnecessary.

I agree that the simulation environment was highly simplified, but even under these circumstances it was difficult to compare and analyze the learned behavior.
Eric Wiewiora  2
06-04-2002 04:13 PM ET (US)
McCallum seems to specialize in large, baroque algorithms, and this paper also presents a large, baroque testing environment. I found the algorithm to have particularly high overhead. Not only does the agent have to maintain a record of every experience in its history, but it also has to routinely retrain itself to determine whether it can form a better policy by changing its state representation. Perhaps the complexity of the algorithm is necessary to suit the particular testing environment, but I have a strong feeling that there are much simpler algorithms that can learn the task as well.

I feel this is a pertinent topic in RL at this point. There has been a lot of work in learning MDPs efficiently, given a certain state-space framework, but relatively little work has been done on changing the state space in order to aid learning. There is some work on function approximators and clustering techniques in order to reduce the size of the state space, but there are probably better ways of generalizing the states that are more sensitive to the learning task.
Dave Kauchak  1
06-03-2002 09:29 PM ET (US)
Edited by author 06-04-2002 02:31 AM
I found this to be a fairly interesting paper. Some of the details surrounding the U-tree algorithm could have been worded better and more concisely to help understanding, but overall, I think the paper did a good job of presenting the ideas.

I thought the idea of including information about history was good; however, I wish a few more applications had been discussed. I would have liked the author to compare this method against other function approximators for the state space. Experimental evaluation beyond the hand-designed heuristics would have been interesting. Even if just at a high level, examining the distinctions, advantages, and disadvantages of this method versus other commonly used systems would have provided more motivation for using it.

I found the behaviors learned by the system to be particularly amusing :) I'm glad that the author took the time to examine the behavior of the system, not just the empirical results (this can be a very time-consuming task in some circumstances).

What advantage does temporal information add to an agent? I would be curious to see what would happen if this history was left out. I'm sure performance would degrade, but by how much?

I think some of the writing tended to be too notation-oriented. The notation section could have been simplified to mostly plain English, since most of the notation it defines is not reused later in the paper. The same goes for the calculations of R(s,a) and p(s'|s,a) in section 3, which could simply have been stated as averages.
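To make the "averages" remark concrete, here is a minimal sketch of estimating R(s,a) and p(s'|s,a) as plain empirical averages over recorded transitions. It is only an illustration of the plain-English reading, not the paper's own formulation: the function name `estimate_model`, the tuple layout, and the toy state and action labels are all assumptions made up for this example.

```python
from collections import defaultdict


def estimate_model(transitions):
    """Estimate R(s,a) and p(s'|s,a) from (s, a, r, s_next) tuples.

    Hypothetical helper: R is the mean observed reward for each (s, a),
    and p is the fraction of those observations that ended in s_next.
    """
    counts = defaultdict(int)                            # visits to (s, a)
    reward_sums = defaultdict(float)                     # total reward per (s, a)
    next_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> s_next -> count

    for s, a, r, s_next in transitions:
        counts[(s, a)] += 1
        reward_sums[(s, a)] += r
        next_counts[(s, a)][s_next] += 1

    # Average reward per state-action pair.
    R = {sa: reward_sums[sa] / n for sa, n in counts.items()}
    # Relative frequency of each successor state.
    P = {
        sa: {s2: c / counts[sa] for s2, c in successors.items()}
        for sa, successors in next_counts.items()
    }
    return R, P


# Toy data with made-up state and action labels (not from the paper).
transitions = [
    ("clear", "forward", 1.0, "clear"),
    ("clear", "forward", 1.0, "blocked"),
    ("blocked", "forward", -10.0, "blocked"),
    ("blocked", "shift", 0.0, "clear"),
]
R, P = estimate_model(transitions)
```

Here `R[("clear", "forward")]` is the mean of the two observed rewards, and `P[("clear", "forward")]` splits its probability evenly between the two observed successor states.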

I wonder how this model would do with a more realistic driving environment. I think the technique is fairly sound and would do well if the simulation parameters were made more realistic (such as a more even distribution of truck speeds, allowing the car to speed up and slow down, etc.).

I found the test environment to be somewhat tailored to the method. I would be curious to see how well this method works in environments that are Markovian (or at least can be approximated as such).