Messages

#1 | sameer agarwal | 05-03-2001 01:14 AM ET (US)

#2 | Melanie Dumas | 05-06-2001 05:23 PM ET (US)
I find it fascinating that an agent, knowing only the rules of the game, can build an effective neural network based on an evaluation of the game state alone. This defies intuition, since the agent does not incorporate any kind of strategy or prior knowledge. The success of reinforcement learning on neural nets is known (InfoSpiders), but it is still surprising to see the weights of a network evolve into a killer game-playing strategy.
It seems odd that the selection of the next move is based on only a one-ply min-max search. It might be interesting to compare the results with a deeper search. I'm a little surprised that the authors did not justify this decision, or at least write a note explaining why the search is only one layer deep.
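To make the depth question concrete, here is a minimal sketch of depth-limited min-max over a small game tree. The tree, the `Node` class, and all numbers are hypothetical and invented for illustration, not taken from the paper. Each node carries a static evaluation standing in for a learned evaluation function, and the search falls back on that value when it hits its depth limit. In this sketch depth 1 looks only at the position after our own move, and depth 2 also considers the opponent's best reply (conventions for counting ply differ, so this does not pin down what the paper's search does).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A position in a hypothetical game tree.

    `value` is a static evaluation from the first player's point of view,
    standing in for a learned evaluation function such as a neural network.
    """
    name: str
    value: float
    children: List["Node"] = field(default_factory=list)


def minimax(node: Node, depth: int, maximizing: bool) -> float:
    """Depth-limited min-max: use the static evaluation at the search horizon."""
    if depth == 0 or not node.children:
        return node.value
    scores = [minimax(child, depth - 1, not maximizing) for child in node.children]
    return max(scores) if maximizing else min(scores)


def best_move(root: Node, depth: int) -> str:
    """Pick the root move whose subtree scores best for the player to move."""
    # After our move it is the opponent's turn, so each child is scored as a min node.
    best = max(root.children, key=lambda child: minimax(child, depth - 1, False))
    return best.name


# Hypothetical toy tree: move "A" looks strong statically but allows a crushing reply.
root = Node("root", 0, [
    Node("A", 5, [Node("A1", -10), Node("A2", 3)]),
    Node("B", 1, [Node("B1", 2), Node("B2", 4)]),
])

print(best_move(root, depth=1))  # A: only the static evaluation after our move
print(best_move(root, depth=2))  # B: also accounts for the opponent's best reply
```

At depth 1 the search is fooled by A's attractive static score; one more ply exposes the strong reply A1. Whether a shallow search suffers from this in practice is exactly what a comparison across depths would measure.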

#3 | Bianca Zadrozny | 05-06-2001 11:34 PM ET (US)
I like the idea of min-max reinforcement learning introduced in the paper, but the experimental results are not very convincing to me. First, they only show experimental comparisons against heuristic players. Are these heuristic players good? How do they compare to humans? What is the state of the art in Othello playing? Second, all the learning curves should have error bars, or the authors should at least say roughly how much the result varies from run to run. For example, when they compare RL1 against RL2, they say RL2 is "slightly stronger". But without error bars, we don't know whether this result is statistically significant.
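One common way to put error bars on a win rate is a binomial confidence interval, and a two-proportion z-test can check whether one player's win rate is really higher than another's. The sketch below uses the Wilson score interval and assumes the evaluation games are independent; the win counts are hypothetical and are not taken from the paper.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval for a win rate (z = 1.96 gives ~95%)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return center - half, center + half


def two_proportion_z(wins_a: int, games_a: int, wins_b: int, games_b: int) -> float:
    """z statistic for H0: both players have the same win rate (pooled estimate)."""
    p_a, p_b = wins_a / games_a, wins_b / games_b
    pooled = (wins_a + wins_b) / (games_a + games_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / games_a + 1 / games_b))
    return (p_a - p_b) / se


# Hypothetical counts: a 75% win rate measured over 1000 test games.
print(wilson_interval(750, 1000))

# Hypothetical comparison: "RL2" wins 530/1000, "RL1" wins 510/1000.
z = two_proportion_z(530, 1000, 510, 1000)
print(abs(z) > 1.96)  # False: not significant at the 5% level
```

With these made-up counts, a 2-point gap over 1000 games each gives |z| of about 0.9. Note that this interval only captures sampling noise within one trained player; run-to-run variation would need several independent training runs, for example plotting mean plus or minus standard error at each point on the learning curve.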
Some other minor problems:
- In the comparisons with TD(0) and MLP, I'm not sure whom those players play against during training.
- They say RL1 wins against HP1 about 80% of the time after 30000 training games. But in fact, the average value in the graph is about 75%.
- The explanations that use parentheses to describe the white and black moves are confusing.

#4 | Dave Kauchak | 05-07-2001 03:29 AM ET (US) (edited by author 05-07-2001 03:35 AM)
First, I agree with Bianca's analysis that some of the experiments should have been justified a bit better. Knowing how strong the algorithms their system played against were would be extremely helpful.
One interesting question the paper began to raise is what effect the opponent the system trains against has on the system. The paper mentions that the main influence of different opponents is that they present different states to the system. I would be curious what training would look like against a totally random player. An analysis of how training is affected by different types of opponents might lead to insights into a better learning algorithm. Maybe reinforcement learning is not the best approach for this domain.
Also, along the same lines as Melanie's message, I think the authors need to be careful to state the goals of the paper explicitly. If the goal is to create the best Othello player, then some of their choices may be questionable (such as searching only 2 ply ahead). A finely tuned min-max algorithm that employs various pruning tactics might do an extremely good job, and combined with heuristics it could make a powerful competitor (as has been the case in games such as chess). However, if the goal was to explore reinforcement learning in this domain, then the paper's choices may be more appropriate. Either way, the paper should make its goals and the ideas it investigates a bit clearer.
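Alpha-beta is the standard pruning technique for min-max search. Below is a minimal sketch over a hypothetical toy tree (nested lists whose leaves are static evaluations from the maximizing player's view; nothing here comes from the paper). It counts leaf evaluations to show how much work pruning saves while returning the same value as plain min-max.

```python
import math
from typing import List, Tuple, Union

# A tree is either a leaf score (int) or a list of child trees.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, maximizing: bool, counter: List[int]) -> float:
    """Plain min-max; counter[0] tracks how many leaves were evaluated."""
    if isinstance(node, int):
        counter[0] += 1
        return node
    values = [minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node: Tree, alpha: float, beta: float,
              maximizing: bool, counter: List[int]) -> float:
    """Min-max with alpha-beta pruning; returns the same value as minimax()."""
    if isinstance(node, int):
        counter[0] += 1
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, counter))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizing parent will never allow this line
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, counter))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizing parent already has a better option
    return value


def search(tree: Tree) -> Tuple[Tuple[float, int], Tuple[float, int]]:
    """Return (value, leaves evaluated) for plain min-max and for alpha-beta."""
    plain, pruned = [0], [0]
    v_plain = minimax(tree, True, plain)
    v_pruned = alphabeta(tree, -math.inf, math.inf, True, pruned)
    return (v_plain, plain[0]), (v_pruned, pruned[0])


# Hypothetical tree: a max root over three min nodes with leaf scores.
toy_tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(search(toy_tree))  # ((3, 9), (3, 7))
```

Both searches return 3, but alpha-beta evaluates 7 leaves instead of 9. With good move ordering the savings grow quickly with depth, which is part of what lets tuned game programs search much deeper than a naive min-max.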
Dave

#5 | Gyozo Gidofalvi (Victor) | 05-07-2001 12:39 PM ET (US)
I really liked the idea of applying reinforcement learning to automatic strategy acquisition for the game Othello. The success of reinforcement learning in this domain is not surprising to me. Even though the authors clearly stated that the RL players initially knew only the game rules, I consider the min-max strategy a general heuristic for a whole class of games.
Although the notation used in the paper was more or less consistent, I sometimes found it confusing. The min-max strategy could have been stated simply as: "At game step t, choose the move that maximizes the evaluation function value at step t+1, assuming the 'worst' possible move by the opponent." Similarly, I had trouble with the intuitive meaning of the reinforcement signal V_t when t != t_fin.
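The one-sentence statement of the min-max rule can also be written as a formula. The notation here is an assumption for illustration and not necessarily the paper's: s_t is the position at game step t, A(s_t) the legal moves, B(s_t, a) the opponent's legal replies after move a, s_{t+1}(a, b) the resulting position, and V the learned evaluation function.

```latex
a_t^{*} = \arg\max_{a \in A(s_t)} \; \min_{b \in B(s_t, a)} V\bigl(s_{t+1}(a, b)\bigr)
```

Read from the inside out: for each candidate move a, assume the reply b that is worst for us, then pick the move whose worst case is best.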
Like Bianca, I also missed error bars on the graphs. However, the results for the 0-game in the correctness graph (Figure 5) imply that the results are statistically significant, at least for that method. It would also have been interesting to see how the strategies behave in earlier moves of the game. Such an analysis might have given better insight into the effects of the intermediate reinforcement signal used.
Overall, I found the paper interesting, but I agree with the comments about the need to evaluate against better players (for example, strategies evolved through GP).

#6 | Joe Drish | 05-07-2001 02:36 PM ET (US)
I liked the paper. I think they could have provided a better explanation of the high-level intuition and motivation behind the min-max search strategy. I would also be curious to see how they went about developing the heuristic players, and whether a simple brute-force search would work better. However, I do think that research on applying RL strategies to game playing in general is productive and a good thing.

#7 | Hector Jasso | 05-09-2001 03:45 PM ET (US)
I agree that it is surprising that reinforcement learning actually works for a "strategy" game. I wonder whether the fact that the layout of the board can change so dramatically from one move to the next is what makes reinforcement learning a good approach for Othello. That is, I wonder whether it would work for other games like chess, where the layout of the board does not change as much.
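The mechanism behind these large swings is Othello's flipping rule: a placed disc captures every contiguous run of opponent discs it brackets, in all eight directions at once. Here is a minimal sketch of the standard move rules; the board encoding, the function names, and the constructed position are hypothetical and written only for illustration.

```python
from typing import List, Tuple

EMPTY, BLACK, WHITE = 0, 1, 2
SIZE = 8
# The eight compass directions a capture line can run in.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
Board = List[List[int]]


def initial_board() -> Board:
    """Standard opening: two White and two Black discs on the central diagonals."""
    board = [[EMPTY] * SIZE for _ in range(SIZE)]
    board[3][3] = board[4][4] = WHITE
    board[3][4] = board[4][3] = BLACK
    return board


def on_board(r: int, c: int) -> bool:
    return 0 <= r < SIZE and 0 <= c < SIZE


def flips(board: Board, row: int, col: int, player: int) -> List[Tuple[int, int]]:
    """Opponent discs that placing `player` at (row, col) would flip; empty if illegal."""
    if board[row][col] != EMPTY:
        return []
    opponent = WHITE if player == BLACK else BLACK
    flipped: List[Tuple[int, int]] = []
    for dr, dc in DIRECTIONS:
        r, c = row + dr, col + dc
        line = []
        # Walk over a contiguous run of opponent discs...
        while on_board(r, c) and board[r][c] == opponent:
            line.append((r, c))
            r, c = r + dr, c + dc
        # ...which is captured only if it is capped by one of our own discs.
        if line and on_board(r, c) and board[r][c] == player:
            flipped.extend(line)
    return flipped


def legal_moves(board: Board, player: int) -> List[Tuple[int, int]]:
    return [(r, c) for r in range(SIZE) for c in range(SIZE) if flips(board, r, c, player)]


def play(board: Board, row: int, col: int, player: int) -> Board:
    """Return a new board with the move applied (the input board is not modified)."""
    captured = flips(board, row, col, player)
    if not captured:
        raise ValueError("illegal move")
    new_board = [list(cells) for cells in board]
    new_board[row][col] = player
    for r, c in captured:
        new_board[r][c] = player
    return new_board


start = initial_board()
print(legal_moves(start, BLACK))        # [(2, 3), (3, 2), (4, 5), (5, 4)]
print(len(flips(start, 2, 3, BLACK)))   # 1

# Hypothetical position: six White discs on the top row, capped by Black on the right.
edge = [[EMPTY] * SIZE for _ in range(SIZE)]
for c in range(1, 7):
    edge[0][c] = WHITE
edge[0][7] = BLACK
print(len(flips(edge, 0, 0, BLACK)))    # 6
```

In the opening a move flips a single disc, but the same rule lets one move in the constructed position turn six discs at once, and in real mid-game positions captures along several lines can combine.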
On another topic, I would like to comment on an idea that came up during the presentation and that I have always found intriguing. For games like chess and Othello, where it is computationally impossible to examine all moves, we usually compare any strategy we develop against human players in order to make the results credible. But what happens when the game actually IS tractable? Should the evaluation of our strategy change? Put another way, consider how the presented results would change on a 6x6 Othello board, which has been solved: with perfect play, one side (White, the second player) is assured of a win. Playing the losing side against a perfect opponent, the authors' algorithm (or any algorithm, for that matter) would NEVER win! On the 8x8 board a perfect strategy also exists (in any finite two-player game of perfect information, either one side can force a win or both sides can force a draw); it's just that no one has been able to find it yet because it is intractable. So I find it disturbing that this ethereal thing called tractability should haunt every heuristic we develop, given that AI is full of heuristics. Or maybe this paradox defines the field?
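When a game is small enough, min-max can be run to the end of every line, giving the exact game-theoretic value instead of a heuristic estimate. Here is a minimal sketch of such exhaustive, memoized min-max, using tic-tac-toe as a stand-in because solving 6x6 Othello this way is far beyond a short example. The string board encoding and the function names are hypothetical.

```python
from functools import lru_cache

# Board: a 9-character string, row by row, with "X", "O" or "." for empty.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board: str) -> str:
    """Return "X" or "O" if that player has three in a row, else ""."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return ""


@lru_cache(maxsize=None)
def solve(board: str, to_move: str) -> int:
    """Exact value with perfect play: +1 X wins, 0 draw, -1 O wins."""
    w = winner(board)
    if w:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0  # full board with no line: draw
    nxt = "O" if to_move == "X" else "X"
    values = [solve(board[:i] + to_move + board[i + 1:], nxt)
              for i, cell in enumerate(board) if cell == "."]
    # X maximizes the value, O minimizes it.
    return max(values) if to_move == "X" else min(values)


print(solve("." * 9, "X"))       # 0: perfect play from both sides is a draw
print(solve("XX.OO....", "X"))   # 1: X completes the top row
print(solve("XX.OO....", "O"))   # -1: O completes the middle row first
```

The empty board evaluates to 0, the well-known result that tic-tac-toe is a draw under perfect play. Once exact values are computable, one option is to score a learned strategy by how often its moves preserve the game value, rather than only by its win rate against human or heuristic opponents.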
Hector