Project: Othello
By Brian Whitman and Grace Junxin Zhang
During testing and example play, we found that the provided Othello project GP packet using the default settings (GPOthello.ini) to train GP Player is not very effective to generate good Othello GP Players: it will give a best individual such as (- white black) and then stop. After some experiment and analysis of the result, we found there are several things could be improved besides changing the settings to increase the standard for a good Player:
1. Fitness measurement during the learning process
Let scoreEdgar be the score that Edgar get for a game against the GP player
Let scoreRandom_N be the sum of the scores that the Random Player get for N
games against the GP player
Let scoreLength be the complexity of the hypothesis measured in terms of length
The original Fitness of a GP player is measured this way:
Playing five games against the random player is not enough to train a good GP player because most of the time,
the random player plays poorly. But if we play more than 5
games we will increase the training time significantly.
Therefore we change our fitness measurement:
Our heuristic to use this formula is as following:
If we assume Edgar is a "good" Othello player, we can use it to distinguish between a good and poor GP player.
Since both Edgar and the GP player play Othello deterministically, they can only
play one game (all others would return the same result). But playing only one game against Edgar will result a player whose
performance is rather unstable (a hypothesis that is not general enough, shaped
only by one game). So, we let the GP player play four more games against the random player.
According to the experiment, the score that a random player gets against a good
GP player can be as low as 1, while Edgar usually gets at least 20.
So we should
increase the weight of scoreRandom_N to make it more effective. In the original
measurement, it favors too much of a shorter hypothesis. Intuitively, since Othello is a
complex game, a good hypothesis for evaluating a board shouldn't be too simple. So
we decrease the weight of the complexity penalty which is expressed in
scoreLength.
2. Variations on the Hypothesis
We add following primitives to the GP tree:
Reasons for adding this primitive:
Reasons for adding this primitive:
Since Edgar is fixed to play WHITE, we modified the program so that the GP player
plays BLACK and the random player plays WHITE. This way the GP player can play against
both Edgar and the Random Player with a minimum of future modifications.
We conduct the following experiments to verify our solutions. To save time, we modified goodRuns to 2 in the default setting.
For detail of the variables please refer to the Results section
For fitness measurement:
GPOthello.java GPOthelloGP.evaluate()
scoreEdgar+10*scoreRandom+0.2*scoreLength
For testing against both Edgar and Random player:
OthelloCache.java OthelloCache()
play 1 game against Edgar, 49 games against random
For adding num_op_pos_moves
* OthelloBoard.java
Add two Statistics data member:
OthelloBoard.num_white_clusters and OthelloBoard.num_black_clusters
OthelloBoard.updateBoardStatistics()
* GPOthello.java
GPOthello.createNodeSet()
GPOthelloGene.evaluate()
places that call OthelloGPTester
* OthelloGPPlayer.java
OthelloGPPlayer.evalFunction()
OthelloGPPlayer.evalSubFunction()
OthelloGPPlayer.evalMove()
* OthelloGPTester.java (similar to OthelloGPPlayer.java)
OthelloGPTester.evalFunction()
OthelloGPTester.evalMove()
*OthelloCache places that call OthelloGPPlayer
For adding white_clusters, black_clusters,
* GPOthello.java
GPOthello.createNodeSet()
GPOthelloGene.evaluate()
* OthelloGPPlayer.java
OthelloGPPlayer.evalFunction()
OthelloGPPlayer.evalSubFunction()
OthelloGPPlayer.evalMove()
* OthelloGPTester.java (similar to OthelloGPPlayer.java)
OthelloGPTester.evalFunction()
OthelloGPTester.evalMove()
Results
Settings | exp 1 stc det | exp2 stc det | exp3 stc det | exp4 stc det |
---|---|---|---|---|
PopulationSize | 200 | 64 | 64 | 64 |
NumberOfGenerations | 31 | 50 | 50 | 50 |
TerminationFitness | 5.0 | 25.0 | 25.0 | 25.0 |
GoodRuns | 2 | 2 | 2 | 2 |
Fitness measurement | scoreRandom_5 | formula 1 | formula 1 | formula 1 |
Good Runs | ||||
goodRun #1 | ||||
Run number | 1 | 14(**) | 2 | 12 |
best fitness | 1.00 | 28.4 | 14.8 | 23.6 |
RPB | black | RPB_1 | RPB_3 | RPB_5 |
test result(*) | 0/47 | 1/47 | 1/49 | 1/49 |
goodRun #2 | ||||
Run number | 2 | 3(**) | 13 | 12(***) |
best fitness | 3.00 | 28.8 | 22.60 | 30.4 |
RPB | ( - black white ) | RPB_2 | RPB_4 | RPB_6 |
test result(*) | 0/48 | 1/47 | 1/49 | 1/49 |
RPB_1: ( * ( - ( - ( * black_corners black_corners ) white_near_corners ) white_near_corners ) ( * ( * ( - ( * black_corners black_edges ) white_near_corners ) black_edges ) black_edges ))
RPB_2: ( + ( * ( * white_near_corners ( / black_edges white_near_corners )) white_corners ) black_edges )
RPB_3: ( * black ( / ( * black ( - black num_op_pos_moves )) white_edges ))
PRP_4: ( / ( + ( + ( + black_corners 10 ) ( - black white )) ( - ( - ( - black white ) ( + num_op_pos_moves white_near_corners )) ( - white white_corners ))) ( - white white_corners ))
RPB_5: ( + ( - ( / ( + ( + ( - white black_edges ) ( * black_edges black_clusters )) ( + ( - white_clusters 10 ) ( - white black_edges ))) ( * ( + ( - ( + ( / ( + black black_near_corners ) ( * ( / black black ) ( * 10 white_near_corners ))) ( * black_edges black_clusters )) ( / ( * ( * white_corners white_edges ) ( / white_edges white )) ( + ( / white_edges white ) ( * black_edges black_clusters )))) ( / black_clusters black_corners )) ( * 10 white_near_corners ))) ( / ( - white_clusters 10 ) ( - white_clusters 10 ))) ( + black_corners ( * ( * ( * 10 white_near_corners ) ( / ( + black black_near_corners ) ( + black black_near_corners ))) ( / ( / ( - white_clusters 10 ) ( - white_clusters 10 )) ( + ( / white_edges white ) ( * black_edges black_clusters ))))))
RPB_6: ( + ( - ( / ( + ( - white black_edges ) ( * black_edges black_clusters )) ( * ( + black black_near_corners ) ( * 10 white_near_corners ))) ( / ( - white_clusters 10 ) ( - white_clusters 10 ))) ( + black_corners ( * ( * 10 white_near_corners ) ( / ( * white_corners white_edges ) ( + black_corners white_clusters )))))
(**)
after 26 runs it still can not get a good run. We choose 2 individuals with the
top fitness out of all the generations of all the runs.
(***)
At run 16, the program was killed unexpectedly, so we only got one good run before that.
Using the original fitness measurement can't generate a good GP player most of the time. We show this in experiment 1.
A modified fitness measurement can result in a GP player that is capable of defeating Edgar. We show this in experiments 2 and 3.
By adding the two new primitives num_op_pos_moves and white_clusters, black_clusters: (contrast experiments 3 and 4 with experiment 2)