Dave Kauchak | 1 | 06-03-2002 09:29 PM ET (US)
Edited by author 06-04-2002 02:31 AM
I found this to be a fairly interesting paper. Some of the details surrounding the U-tree algorithm could have been worded better and more concisely to help understanding, but overall, I think the paper did a good job of presenting the ideas.
I thought the idea of including information about history was good; however, I wish a few more applications had been discussed. I would have liked the author to compare this method against other function approximators for the state space. Experimental evaluation beyond the hand-designed heuristics would have been interesting. Even if just at a high level, examining the distinctions, advantages, and disadvantages of this versus other commonly used systems would have provided more motivation for using this method.
I found the behaviors learned by the system to be particularly amusing :) I'm glad that the author took the time to examine the behavior of the system, not just the empirical results (this can be a very time-consuming task in some circumstances).
What advantage does temporal information add to an agent? I would be curious to see what would happen if this history was left out. I'm sure performance would degrade, but by how much?
I think some of the writing tended to be too notation-oriented. The notation section could have been simplified to mostly plain English, since most of the notation described is not reused later in the paper. This is also the case for the calculations of R(s,a) and p(s'|s,a) in section 3. These could simply have been stated as averages.
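To illustrate what I mean by "stated as averages", here is a minimal sketch. It assumes experience is stored as (state, action, reward, next_state) tuples, with the state labels standing in for U-tree leaves. The function name, data layout, and example data are my own assumptions, not the paper's. R(s,a) is just the mean reward seen after taking a in s, and p(s'|s,a) is the fraction of those transitions that ended in s'.

```python
from collections import Counter, defaultdict


def estimate_model(transitions):
    """Estimate R(s,a) and p(s'|s,a) as plain averages over recorded transitions.

    transitions: iterable of (state, action, reward, next_state) tuples.
    Returns (R, P) where R[(s, a)] is a mean reward and
    P[(s, a)][s_next] is an empirical transition frequency.
    """
    reward_sum = defaultdict(float)
    visits = Counter()
    next_counts = defaultdict(Counter)

    for s, a, r, s_next in transitions:
        reward_sum[(s, a)] += r
        visits[(s, a)] += 1
        next_counts[(s, a)][s_next] += 1

    # Mean reward per (state, action) pair.
    R = {sa: reward_sum[sa] / n for sa, n in visits.items()}
    # Fraction of transitions from (state, action) that reached each successor.
    P = {
        sa: {s2: c / visits[sa] for s2, c in successors.items()}
        for sa, successors in next_counts.items()
    }
    return R, P


# Hypothetical experience record, made up for illustration only.
history = [
    ("clear", "stay", 1.0, "clear"),
    ("clear", "stay", 0.0, "blocked"),
    ("clear", "stay", 1.0, "clear"),
    ("clear", "stay", 0.0, "blocked"),
    ("blocked", "shift", 1.0, "clear"),
]
R, P = estimate_model(history)
print(R[("clear", "stay")])  # mean of the four rewards recorded for this pair
print(P[("clear", "stay")])  # empirical successor frequencies
```

Nothing beyond counting and dividing is needed, which is why I think a sentence of plain English would have served as well as the notation.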
I wonder how this model would do with a more realistic driving environment. I think the technique is fairly sound and would do well if the simulation parameters were made more realistic (such as a more even distribution of truck speeds, allowing the car to speed up and slow down, etc.).
I found the test environment to be somewhat tailored to the method. I would be curious to see how well this method works in environments that are Markovian (or at least can be approximated as such).

Eric Wiewiora | 2 | 06-04-2002 04:13 PM ET (US)
McCallum seems to specialize in large, baroque algorithms. In addition, this paper presents a large, baroque testing environment. I found this algorithm to be particularly high-overhead. Not only does the agent have to maintain a record of every experience in its history, but it also has to routinely retrain itself to determine whether it can form a better policy by changing its state representation. Perhaps the complexity of the algorithm is necessary to suit the particular testing environment, but I have a strong feeling that there are much simpler algorithms that can learn the task as well.
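To make the overhead concern concrete, here is a rough sketch of the bookkeeping pattern I am describing: every experience is kept, and at intervals the whole record is rescanned once per candidate change to the state representation. It only counts work. It is not McCallum's algorithm, and the function name and all three parameters are hypothetical choices for illustration.

```python
def rescan_cost(num_steps, recheck_every, num_candidates):
    """Count stored-experience visits when the full history is rescanned periodically.

    Assumes (hypothetically) that each recheck makes one pass over the
    entire history for every candidate distinction under consideration.
    """
    history_len = 0
    visits = 0
    for step in range(1, num_steps + 1):
        history_len += 1  # every experience is stored, none is discarded
        if step % recheck_every == 0:
            # One full pass over the history per candidate distinction.
            visits += history_len * num_candidates
    return visits


small = rescan_cost(num_steps=1000, recheck_every=100, num_candidates=10)
large = rescan_cost(num_steps=2000, recheck_every=100, num_candidates=10)
print(small, large, large / small)
```

Under these assumptions, doubling the length of the run roughly quadruples the rescanning work, since both the number of rechecks and the size of the history grow with time.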
I feel this is a pertinent topic in RL at this point. There has been a lot of work on learning MDPs efficiently given a fixed state-space framework, but relatively little on changing the state space itself to aid learning. There is some work on function approximators and clustering techniques that reduce the size of the state space, but there are probably better ways of generalizing over states that are more sensitive to the learning task.
|