Introduction. For this homework, you will implement an entire machine induction system from scratch. In particular, given a set of training cases (each one a set of facts, i.e., a knowledge base), your system will derive a rule (i.e., a Horn clause) that classifies any given knowledge base. This learned rule will then be tested on example knowledge bases that were not used during the learning process.
I highly recommend you read through this entire assignment at least once immediately. Then, before our next class, you should spend at least a few hours starting the implementation.
If you program the algorithm in a high-level language like Java or LISP, I think you'll find progress pretty smooth; C will be more difficult.
Reading:
Reading on belief networks and planning is assigned below in Part III.
Russell and Norvig, Sections 18.1 and 18.2. Also, 18.5, which describes the algorithm CURRENT-BEST-LEARNING, of which the FIND-S algorithm, below, is a variation. In 18.5, the "Least-commitment search" and "Discussion" sub-subsections are optional. They describe version spaces (which will be covered early in the machine learning course next semester).
Although you are not responsible for Chapter 19, let me bring your attention to the fact that it includes methods to automatically learn belief networks (e.g., given the graph itself, automatically find the values in each conditional probability table).
Also, you may find optional section 21.4 interesting. Actually, you really should look through it for just a few minutes after reading through and understanding this entire homework project. It explores ways to automatically learn logical sentences in general. In this homework, we do this in a somewhat limited way, but following the same principles; Section 21.4's FOIL algorithm creates a Horn clause by going from general to specific (while, as you'll read below, we are going the opposite direction).
If you like machine learning, consider taking the course (W4771) next semester (it is only offered every other year).
Your assignment is to implement the FIND-S algorithm, below. You should submit a hardcopy of your code (well-commented), as well as a transcript showing your system working on the examples below -- the result of training (what is learned), and the result of applying what is learned to the testing data.
For this project you will make use of your Forward-Chaining system in two ways:

1. You will probably re-use parts of your old code to build this system, since you need to read in facts from standard input, output facts and rules, and manipulate a Rule structure and variable bindings.

2. This new system will learn a rule (over training data), and you will use your old system to apply the rule (to testing data).
It is up to you whether you connect the two systems in your code. If you don't want to, you can just use the Forward-Chaining system as-is, without making any changes to it. If you did not complete a working Forward-Chaining system in the last homework, you may use someone else's system. In this case, make it clear on your homework submission.
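As a small illustration of the fact-reading mentioned in point 1 above, here is a minimal Java sketch that splits facts such as Trained(Bob,Frank) into a predicate and an argument list. The class name and the flat string-array representation are just choices I made for this sketch, not anything the assignment requires, and your old code may already do this differently.

import java.util.*;

public class ReadFacts {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        List<String[]> facts = new ArrayList<>();      // each fact: {predicate, arg1, arg2, ...}
        while (in.hasNext()) {
            String tok = in.next();                    // e.g. "Trained(Bob,Frank)" or "StableSituation()"
            String pred = tok.substring(0, tok.indexOf('('));
            String inner = tok.substring(tok.indexOf('(') + 1, tok.lastIndexOf(')'));
            String[] arguments = inner.isEmpty() ? new String[0] : inner.split(",");
            String[] fact = new String[arguments.length + 1];
            fact[0] = pred;
            System.arraycopy(arguments, 0, fact, 1, arguments.length);
            facts.add(fact);
        }
        for (String[] f : facts)                       // echo predicate/arity, just for debugging
            System.out.println(f[0] + "/" + (f.length - 1));
    }
}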
One intuitive way to understand FIND-S is that it finds what the positive training examples have in common, i.e., what is their "least common denominator" -- what is the most you can say that is true about them all?
FIND-S Algorithm ("Find most specific (consistent) concept")

Local variable h is the hypothesis we are forming. It is a concept (i.e., a boolean-valued function) in a specific form: a single Horn-clause implication (i.e., a rule) with no variables on the RHS.

Use the first positive training example to directly create the first value of h. Just take all the facts except the target classification fact and put them on the LHS, and then put the target on the RHS. This is the most specific rule that will successfully cover this case, i.e., it is not general at all since it only applies to this one case.

for each additional positive training example, e:
    generalize h as little as possible so it also covers e

That's it. That's the whole algorithm. Well, almost. We still have to define very carefully what we mean by "as little as possible" and "generalize". Let's make that a sub-routine:
Algorithm "generalize h as little as possible so it also covers e": initialize local variable oldvalues, a set of variable bindings, to empty initialize local variable newvalues, a set of variable bindings, to empty for each atom on the LHS of h, a1 if there is an atom in e, a2, with the same predicate as a1 for each argument of a1, g1, corresponding with constant g2 from a2 if g1 is a variable and newvalues binds it to other than g2, change it to a brand new variable add to newvalues: the binding of the new variable to constant g2 else if g1 is a variable add to newvalues the binding of variable g1 to constant g2 else if g1 is a different constant than g2 if [there is a variable bound to g2's value in newvalues and that same variable is bound to g1 in oldvalues] /* This condition requires a nested loop to check */ /* so make it a separate function */ change g1 into that variable else change g1 into a brand new variable add to newvalues: the binding of the new variable to constant g2 add to oldvalues: the binding of the new variable to constant g1 else g1 is the same constant as g2 -- do nothing to h END OF FOR-LOOP else (a1 has a predicate that doesn't appear in e at all) eliminate a1 from hTechnical notes:
For your information and edification, the following simplifying assumptions are implicit in the above algorithm. These assumptions make our algorithm much simpler than the FOIL algorithm in the text's optional reading -- FOIL has to perform a search procedure, but our algorithm has no searching at all.
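To make the bookkeeping concrete, here is a minimal Java sketch of the algorithm as described above. The names (Atom, Rule, FindS, freshVar, and so on) and the convention of writing variables as ?v1, ?v2, ... are my own choices for this sketch, not something the assignment requires; like the pseudocode, it assumes that at most one atom of e shares a predicate with a given atom of h.

import java.util.*;

class Atom {
    String pred;
    List<String> args;                       // constants, or variables written "?v1", "?v2", ...
    Atom(String pred, List<String> args) { this.pred = pred; this.args = args; }
    public String toString() { return pred + "(" + String.join(",", args) + ")"; }
}

class Rule {
    List<Atom> lhs;                          // conjunction of atoms
    Atom rhs;                                // the target classification fact
    Rule(List<Atom> lhs, Atom rhs) { this.lhs = lhs; this.rhs = rhs; }
    public String toString() {
        StringJoiner j = new StringJoiner(" ^ ");
        for (Atom a : lhs) j.add(a.toString());
        return j + " -> " + rhs;
    }
}

public class FindS {
    static int varCount = 0;
    static String freshVar() { return "?v" + (++varCount); }
    static boolean isVar(String s) { return s.startsWith("?"); }
    static Atom copy(Atom a) { return new Atom(a.pred, new ArrayList<>(a.args)); }

    // FIND-S: seed h with the first positive example, then generalize over the rest.
    static Rule findS(List<List<Atom>> positives, Atom target) {
        List<Atom> lhs = new ArrayList<>();
        for (Atom a : positives.get(0))
            if (!a.toString().equals(target.toString())) lhs.add(copy(a));  // drop the target fact
        Rule h = new Rule(lhs, target);
        for (int i = 1; i < positives.size(); i++) generalize(h, positives.get(i));
        return h;
    }

    // "generalize h as little as possible so it also covers e"
    static void generalize(Rule h, List<Atom> e) {
        Map<String,String> oldvalues = new HashMap<>();  // variable -> constant it replaced in h
        Map<String,String> newvalues = new HashMap<>();  // variable -> constant it matches in e
        Iterator<Atom> it = h.lhs.iterator();
        while (it.hasNext()) {
            Atom a1 = it.next();
            Atom a2 = samePredicate(e, a1.pred);
            if (a2 == null) { it.remove(); continue; }   // predicate absent from e: eliminate a1
            for (int i = 0; i < a1.args.size(); i++) {
                String g1 = a1.args.get(i), g2 = a2.args.get(i);
                if (isVar(g1) && newvalues.containsKey(g1) && !newvalues.get(g1).equals(g2)) {
                    String v = freshVar();               // variable already used for a different constant
                    a1.args.set(i, v);
                    newvalues.put(v, g2);
                } else if (isVar(g1)) {
                    newvalues.put(g1, g2);
                } else if (!g1.equals(g2)) {             // two different constants
                    String v = linkingVariable(g1, g2, oldvalues, newvalues);
                    if (v == null) {
                        v = freshVar();
                        newvalues.put(v, g2);
                        oldvalues.put(v, g1);
                    }
                    a1.args.set(i, v);
                }                                        // else: same constant, do nothing to h
            }
        }
    }

    // assumes at most one atom of e has this predicate, as the pseudocode does
    static Atom samePredicate(List<Atom> e, String pred) {
        for (Atom a : e) if (a.pred.equals(pred)) return a;
        return null;
    }

    // the nested-loop check: a variable bound to g2 in newvalues and to g1 in oldvalues
    static String linkingVariable(String g1, String g2,
                                  Map<String,String> oldvalues, Map<String,String> newvalues) {
        for (Map.Entry<String,String> en : newvalues.entrySet())
            if (en.getValue().equals(g2) && g1.equals(oldvalues.get(en.getKey())))
                return en.getKey();
        return null;
    }
}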
You must show a transcript of your system working on these examples.
GiveFootMassage(a,Eric) ^ Trained(b,a) ^ InTouchWithFeelings(b) -> Happy(Eric)
Here are the training examples. The target function will determine whether Eric is happy, by seeing whether the fact Happy(Eric) is listed in the training example. The only reason we know that this defines the job of the target function is that I said so, i.e., for this example, we have chosen to try to predict under what circumstances Eric is happy. Note that the order in which facts are listed in each example is arbitrary.
We know that Happy(Eric) is not true in the first two examples -- those are the negative training examples. The rest are positive training examples.
CatDied(C) Owns(Eric,C)

GiveFootMassage(Bill,Eric) Trained(Sally,Bill) AngryAllTheTime(Sally)

InTouchWithFeelings(Andy) GiveFootMassage(Susan,Eric) Happy(Eric) Trained(Andy,Susan) WeatherToday(Sunny)

WeatherToday(Rainy) GiveFootMassage(Frank,Eric) InTouchWithFeelings(Bob) Trained(Bob,Frank) Happy(Eric)

Happy(Eric) InTouchWithFeelings(Rupp) Trained(Rupp,Zvi) Owns(Eric,Porche) GiveFootMassage(Zvi,Eric)

You can follow a link from the course web page to see a transcript tracing the algorithm as applied to the three training cases above (as gone over in class).
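As a sanity check, here is a hypothetical main method one might add to the FindS sketch above, feeding it the three positive training cases as they are grouped above (the small atom(...) helper is my own). It should print a rule equivalent, up to variable names, to the target rule given earlier.

    static Atom atom(String pred, String... args) { return new Atom(pred, Arrays.asList(args)); }

    public static void main(String[] unused) {
        List<Atom> pos1 = Arrays.asList(atom("InTouchWithFeelings", "Andy"),
                atom("GiveFootMassage", "Susan", "Eric"), atom("Happy", "Eric"),
                atom("Trained", "Andy", "Susan"), atom("WeatherToday", "Sunny"));
        List<Atom> pos2 = Arrays.asList(atom("WeatherToday", "Rainy"),
                atom("GiveFootMassage", "Frank", "Eric"), atom("InTouchWithFeelings", "Bob"),
                atom("Trained", "Bob", "Frank"), atom("Happy", "Eric"));
        List<Atom> pos3 = Arrays.asList(atom("Happy", "Eric"), atom("InTouchWithFeelings", "Rupp"),
                atom("Trained", "Rupp", "Zvi"), atom("Owns", "Eric", "Porche"),
                atom("GiveFootMassage", "Zvi", "Eric"));
        Rule h = findS(Arrays.asList(pos1, pos2, pos3), atom("Happy", "Eric"));
        System.out.println(h);
        // expected, up to variable names:
        // InTouchWithFeelings(?v1) ^ GiveFootMassage(?v2,Eric) ^ Trained(?v1,?v2) -> Happy(Eric)
    }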
Create a set of 3 positive and 3 negative testing examples. Unlike the training examples above, these examples are not supervised, i.e., they must not have the correct answer associated with each one. In this case, that means that even if Eric is happy, Happy(Eric) is not included. It is up to the learned hypothesis to derive this if it is the case. Show your F-C system working on your examples with the learned rule.
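The actual rule application is your Forward-Chaining system's job, but as a rough sketch of what it has to do with the learned rule, the following methods (which could sit alongside the FindS sketch above, reusing its Atom class and isVar helper) test whether the rule's LHS can be satisfied in a test knowledge base by searching for a consistent set of variable bindings; if it can, the RHS fact should be derived.

    // returns true if every LHS atom can be matched against some fact in kb
    // under one consistent set of variable bindings (variables start with '?')
    static boolean covers(List<Atom> lhs, List<Atom> kb) {
        return match(lhs, 0, kb, new HashMap<>());
    }

    static boolean match(List<Atom> lhs, int i, List<Atom> kb, Map<String,String> bind) {
        if (i == lhs.size()) return true;                     // all atoms matched: the rule fires
        Atom a = lhs.get(i);
        for (Atom f : kb) {
            if (!f.pred.equals(a.pred) || f.args.size() != a.args.size()) continue;
            Map<String,String> b = new HashMap<>(bind);       // tentative extension of the bindings
            boolean ok = true;
            for (int k = 0; k < a.args.size() && ok; k++) {
                String term = a.args.get(k), val = f.args.get(k);
                if (isVar(term)) {
                    if (b.containsKey(term)) ok = b.get(term).equals(val);  // must stay consistent
                    else b.put(term, val);                                  // bind variable to constant
                } else ok = term.equals(val);                               // constants must be equal
            }
            if (ok && match(lhs, i + 1, kb, b)) return true;  // this fact works; try the next atom
        }
        return false;                                         // backtrack: no fact fits atom i
    }

Calling covers(h.lhs, testKB) on one of your unsupervised test cases would then indicate whether Happy(Eric) should be added to it.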
BadDay(Today) Cause(Injury,Pain) Weather(Rainy,Today) Happen(Injury,Today) Bad(Pain)

Happy(Me) WorldPeaceAchieved(Today) FreeHotDogs(Today)

Bad(Pain) Weather(Rainy,Yesterday) BadDay(Today) Happen(Insult,Today) Cause(Insult,Pain)

Bad(Sadness) Happen(Nothing,Today) Cause(Insult,Sadness)

Bad(Broke) Happen(Lawsuit,Today) Cause(Lawsuit,Broke) BadDay(Today) NobodyLoves(Me)
Once again, it is up to you to make up (from your head) the test data (3 negative and 3 positive examples). Remember that the test data does not have the target classification fact in it -- it is up to the forward-chaining system to derive it, if it is true. Show your system working on your test examples.
On(A,B) Sturdy(B) Bigger(B,A) StableSituation() Color(C,Yellow) WeighsLessThan(A,TenPounds)

Bigger(C,A) On(A,C) Sturdy(C) StableSituation() WeighsLessThan(A,TenPounds) Color(C,Yellow)

On(A,B) Bigger(B,A) Color(C,Yellow) WeighsLessThan(A,TenPounds) Sturdy(B) Bigger(B,C) StableSituation()

Color(C,Yellow) WeighsLessThan(C,TenPounds) On(C,B)

Color(C,Yellow) Bigger(A,C) On(A,C) Sturdy(C) WeighsLessThan(A,TenPounds)

Sturdy(B) Color(C,Yellow) WeighsLessThan(C,TenPounds) On(C,B)
Write three positive and three negative training examples for a user who invokes ghostview if and only if she has, in any order, read an e-mail from someone, written to that same person, changed directories to some directory, and that directory has a ps file in it (and possibly done other things). Also, write the target function you expect your system to learn. Show your system working on this example.
email: evs at cs dot columbia dot edu