The following caveats and technical addenda supplement the GP-Othello assignment and are meant to help other machine learning teachers use it smoothly. -Eric

Note from one student:

The training on CS machines in the department took a very long time. We were using the "nohup" command to run the processes overnight. The processes were getting killed before they were able to generate any results. As we have figured out later, the process of drawing the best individual could not have been handled together with the "nohup" command. We were able to get the results only after we turned off the drawing option in the "ini" file.

Turn off the number-of-good-runs and complexity penalty parameters in the GP implementation.

The next few emails are important addenda for running this assignment smoothly:

Subject: GP Othello assignment

Folks,

The TA from last semester and I are making sure the Othello and GP source are still available online.

Below is the email that got folks started on the GP system in F97. It gives a little assignment to play with the GP system -- YOU ARE EXEMPT FROM ACTUALLY DOING THIS ASSIGNMENT AND HANDING IT IN! But you are more than welcome to look at its details in order to learn about the GP system.

Note that we hope to add your Othello writeups to the Othello page.

-----------------------------------------------------------------

ML Class,

As promised in class, here is your short assignment to familiarize yourself with the Java GP implementation we will use for Project Othello. It will take you a couple of hours at most. We strongly urge you to do this before Tuesday, since it will also introduce and reinforce genetic programming concepts for the midterm, and will help you understand our presentation of Project Othello in class on Tuesday.

0. Read about GA and GP in the text, and the in-class handout of 3 papers (all stapled together as one).

1.
Follow the directions in /u/niform/eeskin/TA/dist/GP/README-1.txt (on cs machines) to install and test the implementation. On AcIS machines, look in ~ee67/GP/

This will get you started on a system that tries to regress on x**4 + x**3 + x**2 + x. That is, it evaluates the fitness of individual function trees in the population by how well they predict the values of the above equation for 20 values of x (i.e., 20 fitness/training cases). The fitness measure is not squared error but, more simply, the sum of the absolute errors over these 20 cases. This is the "classic" GP symbolic regression paradigm.

2. Add as a new primitive the constant "2", which can now appear as a terminal in hypotheses (i.e., function trees). Note, only the files in the SymbReg directory need to be modified for this assignment.

3. Change the target function to 2*(x**2) + 4.

4. Recompile and run it.

5. Try (only) 4-6 runs with variations on at least two parameters, such as the population size, frequency of crossover, number of fitness cases (which cannot be changed in the .ini file), etc. You may want to decrease the GoodRuns parameter (number of runs before it stops) down to only 1 run.

6. Email the resulting function trees (text only) and their fitnesses to email: evs at cs dot columbia dot edu.

7. Is anything wrong with this classic paradigm? In particular, do you see it testing for generalization performance? What would be needed to make a fair assessment of resulting hypotheses? Do not try to implement this.

Have fun,
Eric

Subject: GP assignment ADDENDUM S98

ML folks,

A few more items re othello project:

1. Have you received the photocopy handout with 3 GP articles? I think CVN kept it and does that stuff. Otherwise, let me know and I will photocopy it and bring it to CVN ASAP.

2. One student said:

The training on CS machines in the department took a very long time. We were using the "nohup" command to run the processes overnight.
The processes were getting killed before they were able to generate any results. As we have figured out later, the process of drawing the best individual could not have been handled together with the "nohup" command. We were able to get the results only after we turned off the drawing option in the "ini" file.

3. Turn off number-of-good-runs so it only does one run.

4. Turn off the complexity penalty, probably, unless you want to experiment with it.

5. Below is a message from last semester.

----------------------------------------------------

ML Folks,

Please read this entire message before starting your Othello work.

Here are two more hypothesis representation alternatives to help inspire ideas. The second comes from Chris (as well as from a GA Othello paper we have). If you have any additional ideas that you are not doing yourself, please share them with everyone over email.

* more fundamental, simple primitives, e.g., x/y coordinates of the piece just placed to get to this new board configuration -- could GP use this and automatically build the concepts of "edge", "corner", "one-away-from-corner", or other useful concepts we haven't thought of? (Get rid of (some) other primitives for this experiment.)

* perform a shallow search, e.g., minimax, and apply the tree at the end-points of this search. That is, the resulting player uses the hypothesis/tree to evaluate the endpoints of a shallow search to determine which is the best current move. This of course requires a fair amount of programming, and results in an even lengthier fitness evaluation -- other changes would be required to keep the fitness measure fast enough.

------------------------------------------------

If you are on a cs account, you need to use it judiciously. It is very easy to clog up the systems with our long processes. This always creates instant negative vibes from all the cs users. We are at risk as a class because of the computational hog-ness of this project.
Therefore, please limit yourself to one or two runs going at a time. You can write shell scripts to sequentially do an array of runs with certain variations.

fyi, you can use nohup to start a process that keeps going even if you log out.

It is better not to do runs on the department's "cluster" machines, such as age, ground, shadow and a couple of others. Also, try to use machines that have less going on and fewer people logged in. It is good to use the machines in SRL. There is a cluster of HPs that are fast, but I don't know if they have java or java compilers or JIT. They are called:

    donner blitzen cupid dancer prancer comet dasher vixen

Note that one or two of these are permanently down. If you discover whether these do or don't have java, please email everyone.

btw, (only) if you have time, it is better to do multiple runs (different random seeds) of the same experiment, to examine average performance.

Finally, unless absolutely necessary, please send cs system questions/issues to me and Eleazar *before* sending email to crf, so we can screen questions -- they are overloaded, and our place on cs machines is touch and go...

Thanks,
Eric

Subject: JAVA GP

fyi,

Here is a version of the README for the gp system -- the one in the system directory may be more up to date...

------------------------------

In order to install the Genetic Programming package into your directory, you must copy all of the files into a directory in your account from the directory

    "/u/niform/eeskin/TA/dist/GP/"

You must also copy all of the subdirectories to this directory, so you must use the "-R" flag to copy everything.

This will give you a copy of the genetic programming package. Note there are several subdirectories within the package. "gpjpp" contains the source files. "docs" contains the javadoc documentation for the source files. The other directories correspond to example applications.

In order to compile the GP package, you must first include the compiler in your path.
To do this, add to your path on the cs cluster:

    "/opt/SUNWJava/JDK-1.1.3/bin"

In order for java to find the relevant source files, add the following paths to your CLASSPATH variable:

    "/opt/SUNWJava/JDK-1.1.3/lib/classes.zip"
    "{your_path}/GP"

Later you will also add:

    "{your_path}/othello/"

Now that you have set these variables, it is time to compile. Enter the directory "GP/gpjpp" and type

    "javac *.java"

This should take a while, but you should receive no errors. Then enter the directory "GP/SymReg" and type

    "javac *.java"

Now the Symbolic Regression package should be compiled and ready to run. In order to execute it, you must type:

    "java SymReg"

This should be all you need to get the GP implementation up and running in your directory.

There are several files in the SymReg directory that are of interest. SymReg.java is the source file where the Symbolic Regression is defined. Note this is the ONLY source file that has to do with Symbolic Regression. You will not need much more to use the GP with othello.

There is also a file called SymReg.ini. This file has a lot of parameters for the GP. A complete set of possible parameters is contained in the documentation for the GPVariables class. Note you can tweak these parameters pretty easily to experiment with the GP.

There is also a file called SymReg.stc that gives statistics on a run of the program. A file called SymReg.det can be created if you set the PrintDetails option to true in the SymReg.ini file. This will give you plenty of details for each generation, including the best and worst individual.

After a run is over, a gif file of the tree of the best individual is created. Note, to keep the GP from crashing you must set an appropriate DISPLAY variable.

Note also that the documentation is in html.
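
For readers who want to see the symbolic regression fitness measure in isolation before opening SymReg.java, here is a minimal standalone sketch of the sum of absolute errors over 20 fitness cases, using the target 2*(x**2) + 4 from the warm-up assignment. It does not use the gpjpp classes and is not the code from the package. The class name FitnessSketch, the method names, and the choice of evenly spaced sample points in [-1, 1] are all assumptions made for illustration. It also uses lambdas, so it needs a much newer Java than the JDK 1.1.3 mentioned above.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical illustration only; not part of the gpjpp package.
final class FitnessSketch {

    // Target function from the warm-up assignment: 2*(x**2) + 4.
    public static double target(double x) {
        return 2.0 * x * x + 4.0;
    }

    // Sum of absolute errors between a candidate hypothesis and the target,
    // taken over `cases` evenly spaced x values in [lo, hi].
    // Lower is better; 0.0 means the hypothesis matches on every case.
    public static double sumAbsError(DoubleUnaryOperator hypothesis,
                                     int cases, double lo, double hi) {
        if (cases < 2) {
            throw new IllegalArgumentException("need at least 2 fitness cases");
        }
        double total = 0.0;
        for (int i = 0; i < cases; i++) {
            // Assumed sampling scheme: evenly spaced points, endpoints included.
            double x = lo + (hi - lo) * i / (cases - 1);
            total += Math.abs(hypothesis.applyAsDouble(x) - target(x));
        }
        return total;
    }

    public static void main(String[] args) {
        // A hypothesis identical to the target, and one that is off by x**2.
        DoubleUnaryOperator exact = x -> 2.0 * x * x + 4.0;
        DoubleUnaryOperator rough = x -> x * x + 4.0;
        System.out.println("exact: " + sumAbsError(exact, 20, -1.0, 1.0));
        System.out.println("rough: " + sumAbsError(rough, 20, -1.0, 1.0));
    }
}
```

Note that the score is computed on the same 20 cases a run would train on, so on its own it says nothing about how a hypothesis behaves on other values of x.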
email: evs at cs dot columbia dot edu