Eric Wiewiora
04-25-2002 01:56 PM ET (US)
Response to Dave:
The notation in this paper is a little odd because it is closer to the notation used by game theorists than to that used by ML researchers.
The two examples in the paper were chosen because they show off the two main benefits of the algorithm.
The first example shows that the policy of the agent adapts immediately to changes in reward sources.
The second example shows how the agent adopts a policy in which the reward sources compete over a single state rather than over all of them. This behavior leads to a more consistent policy, and it would be hard to capture with a simpler algorithm.