Dana Dahlstrom
04-04-2002 03:24 PM ET (US)
[follow-up to previous post]
Actually, looking ahead, my proposed patch would also require modifying Equations 5 and 7 (the Q-learning and Sarsa update rules). It's probably better to revise the notation the other way:
1. Change the descriptions of the state, action, and reward sequences to begin with element zero ($s_0$, $a_0$, $r_0$).
2. Change the summations in Equations 1 and 2 to begin with $t=0$.
3. Change $\gamma^{t-1}$ to $\gamma^t$ in Equation 1.
This way action $a_t$ is performed in state $s_t$ and reward $r_t$ results; I believe this is conventional.
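A minimal sketch of what the revised convention means in practice, assuming Equation 1 is the discounted sum of rewards and Equations 5 and 7 are the usual one-step Q-learning and Sarsa updates (their exact forms are not quoted in this post). The function names and the dictionary-based Q table are hypothetical and for illustration only. It checks that summing $\gamma^t r_t$ from $t=0$ gives the same return as summing $\gamma^{t-1} r_t$ from $t=1$ on the same rewards, and shows the updates using $r_t$ as the reward that follows taking $a_t$ in $s_t$.

```python
from collections import defaultdict


def discounted_return_zero_based(rewards, gamma):
    # Revised convention (assumed form of Equation 1):
    # R = sum_{t=0}^{T-1} gamma^t * r_t, where rewards[0] is r_0.
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def discounted_return_one_based(rewards, gamma):
    # Original-style convention: R = sum_{t=1}^{T} gamma^(t-1) * r_t,
    # where rewards[0] is r_1. Same value, shifted indices.
    return sum(gamma ** (t - 1) * r for t, r in enumerate(rewards, start=1))


def q_learning_update(Q, s_t, a_t, r_t, s_next, actions, alpha, gamma):
    # Q(s_t, a_t) += alpha * (r_t + gamma * max_a Q(s_{t+1}, a) - Q(s_t, a_t)).
    # r_t is the reward that results from taking a_t in s_t.
    # Terminal-state handling is omitted in this sketch.
    best_next = max(Q[(s_next, a)] for a in actions)
    Q[(s_t, a_t)] += alpha * (r_t + gamma * best_next - Q[(s_t, a_t)])


def sarsa_update(Q, s_t, a_t, r_t, s_next, a_next, alpha, gamma):
    # Q(s_t, a_t) += alpha * (r_t + gamma * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)).
    Q[(s_t, a_t)] += alpha * (r_t + gamma * Q[(s_next, a_next)] - Q[(s_t, a_t)])


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    print(discounted_return_zero_based(rewards, 0.5))  # 1.5
    print(discounted_return_one_based(rewards, 0.5))   # 1.5

    Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    q_learning_update(Q, "s0", "a0", 1.0, "s1", ["a0", "a1"], alpha=0.5, gamma=0.9)
    print(Q[("s0", "a0")])  # 0.5
    sarsa_update(Q, "s0", "a0", 1.0, "s1", "a1", alpha=0.5, gamma=0.9)
    print(Q[("s0", "a0")])  # 0.75
```

Only the indexing changes: the value of the return is identical under either convention, but with the zero-based sequences the reward in each update carries the same subscript as the state and action that produced it.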