Eric Wiewiora | 2 | 06-04-2002 04:13 PM ET (US)
McCallum seems to specialize in large, baroque algorithms, and this paper also presents a large, baroque testing environment. I found this algorithm to have particularly high overhead. Not only does the agent have to maintain a record of every experience in its history, but it also has to routinely retrain itself to determine whether it can form a better policy by changing its state representation. Perhaps the complexity of the algorithm is necessary to suit the particular testing environment, but I have a strong feeling that there are much simpler algorithms that can learn the task as well.
I feel this is a pertinent topic in RL at this point. There has been a lot of work on learning MDPs efficiently given a fixed state-space framework, but relatively little work has been done on changing the state space in order to aid learning. There is some work on using function approximators and clustering techniques to reduce the size of the state space, but there are probably better ways of generalizing over states that are more sensitive to the learning task.