Bianca Zadrozny
3
05-06-2001 11:34 PM ET (US)
I like the idea of min-max reinforcement learning that was introduced in the paper, but the experimental results are not very convincing to me. First, they only show experimental comparisons against heuristic players. Are these heuristic players good? How do they compare to humans? What is the state of the art in Othello playing? Second, all the learning curves should have error bars, or the authors should at least say approximately how much the result varies from run to run. For example, when they compare RL1 against RL2, they say RL2 is "slightly stronger". But without error bars, we don't know whether this difference is statistically significant.
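To make the error-bar point concrete, here is a minimal sketch of how wide a 95% interval on a win rate is under a simple normal approximation. The game counts are hypothetical, since the number of evaluation games behind each point is not given here, and the function names are my own. Checking whether two intervals overlap is only a rough, conservative heuristic, not a formal significance test.

```python
import math

def win_rate_interval(wins, games, z=1.96):
    """Normal-approximation (Wald) interval for a win rate.

    z=1.96 corresponds to roughly 95% coverage.
    """
    p = wins / games
    se = math.sqrt(p * (1 - p) / games)  # standard error of a proportion
    return p - z * se, p + z * se

def intervals_overlap(a, b):
    """True if the two (low, high) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical counts: 75% vs. 80% wins over 100 evaluation games each.
small_a = win_rate_interval(75, 100)
small_b = win_rate_interval(80, 100)

# The same win rates over 10000 evaluation games each (also hypothetical).
large_a = win_rate_interval(7500, 10000)
large_b = win_rate_interval(8000, 10000)

print(small_a, small_b, intervals_overlap(small_a, small_b))
print(large_a, large_b, intervals_overlap(large_a, large_b))
```

With only 100 games per estimate the two intervals overlap substantially, so a 5-point gap could easily be noise. With 10000 games they separate cleanly. This is why the number of games and the run-to-run variation need to be reported.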
Some other minor problems:
- In the comparisons with TD(0) and MLP, I'm not sure who those players play against when they are being trained.
- They say RL1 wins against HP1 about 80% of the time after 30000 training games. But, in fact, the average value in the graph is about 75%.
- The explanations that use parentheses to describe the white and black moves are confusing.