Introduction. For this homework, you will implement an entire machine induction system from scratch. In particular, given a set of training cases (each one a set of facts, i.e., a knowledge base), your system will derive a rule (i.e., a Horn clause) that classifies any given knowledge base. This learned rule will then be tested on example knowledge bases that were not used during the learning process.
I highly recommend you read through this entire assignment at least once immediately. Then, before our next class, you should spend at least a few hours starting the implementation.
If you program the algorithm in a high-level language like Java or LISP, I think you'll find progress pretty smooth; C will be more difficult.
Reading:
Reading on belief networks and planning is assigned below in Part III.
Russell and Norvig, Sections 18.1 and 18.2. Also, 18.5, which describes the algorithm CURRENT-BEST-LEARNING, of which the FIND-S algorithm, below, is a variation. In 18.5, the "Least-commitment search" and "Discussion" sub-subsections are optional. They describe version spaces (which will be covered early in the machine learning course next semester).
Although you are not responsible for Chapter 19, let me bring your attention to the fact that it includes methods to automatically learn belief networks (e.g., given the graph itself, automatically find the values in each conditional probability table).
Also, you may find optional section 21.4 interesting. Actually, you really should look through it for just a few minutes after reading through and understanding this entire homework project. It explores ways to automatically learn logical sentences in general. In this homework, we do this in a somewhat limited way, but following the same principles; Section 21.4's FOIL algorithm creates a Horn clause by going from general to specific (while, as you'll read below, we are going the opposite direction).
If you like machine learning, consider taking the course (W4771) next semester (it is only offered every other year).
Your assignment is to implement the FIND-S algorithm, below. You should submit a hardcopy of your code (well-commented), as well as a transcript showing your system working on the examples below -- the result of training (what is learned), and the result of applying what is learned to the testing data.
For this project you will make use of your Forward-Chaining system in two ways:

1. You will probably re-use parts of your old code to build this system, since you need to read in facts from standard input, output facts and rules, and manipulate a Rule structure and variable bindings.

2. This new system will learn a rule (over training data), and you will use your old system to apply the rule (to testing data).
It is up to you whether you connect the two systems in your code. If you don't want to, you can just use the Forward-Chaining system as-is, without making any changes to it. If you did not complete a working Forward-Chaining system in the last homework, you may use someone else's system. In this case, make it clear on your homework submission.
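As a small illustration of the fact-reading mentioned in point 1 above, here is a minimal Java sketch that splits facts such as Trained(Bob,Frank) into a predicate and an argument list. The class name and the flat string-array representation are just choices I made for this sketch, not anything the assignment requires, and your old code may already do this differently.

import java.util.*;

public class ReadFacts {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        List<String[]> facts = new ArrayList<>();      // each fact: {predicate, arg1, arg2, ...}
        while (in.hasNext()) {
            String tok = in.next();                    // e.g. "Trained(Bob,Frank)" or "StableSituation()"
            String pred = tok.substring(0, tok.indexOf('('));
            String inner = tok.substring(tok.indexOf('(') + 1, tok.lastIndexOf(')'));
            String[] arguments = inner.isEmpty() ? new String[0] : inner.split(",");
            String[] fact = new String[arguments.length + 1];
            fact[0] = pred;
            System.arraycopy(arguments, 0, fact, 1, arguments.length);
            facts.add(fact);
        }
        for (String[] f : facts)                       // echo predicate/arity, just for debugging
            System.out.println(f[0] + "/" + (f.length - 1));
    }
}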
One intuitive way to understand FIND-S is that it finds what the positive training examples have in common, i.e., what is their "least common denominator" -- what is the most you can say that is true about them all?
FIND-S Algorithm ("Find most specific (consistent) concept")

Local variable h is the hypothesis we are forming. It is a concept (i.e., a boolean-valued function) in a specific form: a single Horn-clause implication (i.e., a rule) with no variables on the RHS.

Use the first positive training example to directly create the first value of h. Just take all the facts except the target classification fact and put them on the LHS, and then put the target on the RHS. This is the most specific rule that will successfully cover this case, i.e., it is not general at all since it only applies to this one case.

for each additional positive training example, e:
    generalize h as little as possible so it also covers e

That's it. That's the whole algorithm. Well, almost. We still have to define very carefully what we mean by "as little as possible" and "generalize". Let's make that a sub-routine:
Algorithm "generalize h as little as possible so it also covers e": initialize local variable oldvalues, a set of variable bindings, to empty initialize local variable newvalues, a set of variable bindings, to empty for each atom on the LHS of h, a1 if there is an atom in e, a2, with the same predicate as a1 for each argument of a1, g1, corresponding with constant g2 from a2 if g1 is a variable and newvalues binds it to other than g2, change it to a brand new variable add to newvalues: the binding of the new variable to constant g2 else if g1 is a variable add to newvalues the binding of variable g1 to constant g2 else if g1 is a different constant than g2 if [there is a variable bound to g2's value in newvalues and that same variable is bound to g1 in oldvalues] /* This condition requires a nested loop to check */ /* so make it a separate function */ change g1 into that variable else change g1 into a brand new variable add to newvalues: the binding of the new variable to constant g2 add to oldvalues: the binding of the new variable to constant g1 else g1 is the same constant as g2 -- do nothing to h END OF FOR-LOOP else (a1 has a predicate that doesn't appear in e at all) eliminate a1 from hTechnical notes:
For your information and edification, the following simplifying assumptions are implicit in the above algorithm. These assumptions make our algorithm much simpler than the FOIL algorithm in the text's optional reading -- FOIL has to perform a search procedure, but our algorithm has no searching at all.
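To make the bookkeeping concrete, here is a minimal Java sketch of the algorithm as described above. The names (Atom, Rule, FindS, freshVar, and so on) and the convention of writing variables as ?v1, ?v2, ... are my own choices for this sketch, not something the assignment requires; like the pseudocode, it assumes that at most one atom of e shares a predicate with a given atom of h.

import java.util.*;

class Atom {
    String pred;
    List<String> args;                       // constants, or variables written "?v1", "?v2", ...
    Atom(String pred, List<String> args) { this.pred = pred; this.args = args; }
    public String toString() { return pred + "(" + String.join(",", args) + ")"; }
}

class Rule {
    List<Atom> lhs;                          // conjunction of atoms
    Atom rhs;                                // the target classification fact
    Rule(List<Atom> lhs, Atom rhs) { this.lhs = lhs; this.rhs = rhs; }
    public String toString() {
        StringJoiner j = new StringJoiner(" ^ ");
        for (Atom a : lhs) j.add(a.toString());
        return j + " -> " + rhs;
    }
}

public class FindS {
    static int varCount = 0;
    static String freshVar() { return "?v" + (++varCount); }
    static boolean isVar(String s) { return s.startsWith("?"); }
    static Atom copy(Atom a) { return new Atom(a.pred, new ArrayList<>(a.args)); }

    // FIND-S: seed h with the first positive example, then generalize over the rest.
    static Rule findS(List<List<Atom>> positives, Atom target) {
        List<Atom> lhs = new ArrayList<>();
        for (Atom a : positives.get(0))
            if (!a.toString().equals(target.toString())) lhs.add(copy(a));  // drop the target fact
        Rule h = new Rule(lhs, target);
        for (int i = 1; i < positives.size(); i++) generalize(h, positives.get(i));
        return h;
    }

    // "generalize h as little as possible so it also covers e"
    static void generalize(Rule h, List<Atom> e) {
        Map<String,String> oldvalues = new HashMap<>();  // variable -> constant it replaced in h
        Map<String,String> newvalues = new HashMap<>();  // variable -> constant it matches in e
        Iterator<Atom> it = h.lhs.iterator();
        while (it.hasNext()) {
            Atom a1 = it.next();
            Atom a2 = samePredicate(e, a1.pred);
            if (a2 == null) { it.remove(); continue; }   // predicate absent from e: eliminate a1
            for (int i = 0; i < a1.args.size(); i++) {
                String g1 = a1.args.get(i), g2 = a2.args.get(i);
                if (isVar(g1) && newvalues.containsKey(g1) && !newvalues.get(g1).equals(g2)) {
                    String v = freshVar();               // variable already used for a different constant
                    a1.args.set(i, v);
                    newvalues.put(v, g2);
                } else if (isVar(g1)) {
                    newvalues.put(g1, g2);
                } else if (!g1.equals(g2)) {             // two different constants
                    String v = linkingVariable(g1, g2, oldvalues, newvalues);
                    if (v == null) {
                        v = freshVar();
                        newvalues.put(v, g2);
                        oldvalues.put(v, g1);
                    }
                    a1.args.set(i, v);
                }                                        // else: same constant, do nothing to h
            }
        }
    }

    // assumes at most one atom of e has this predicate, as the pseudocode does
    static Atom samePredicate(List<Atom> e, String pred) {
        for (Atom a : e) if (a.pred.equals(pred)) return a;
        return null;
    }

    // the nested-loop check: a variable bound to g2 in newvalues and to g1 in oldvalues
    static String linkingVariable(String g1, String g2,
                                  Map<String,String> oldvalues, Map<String,String> newvalues) {
        for (Map.Entry<String,String> en : newvalues.entrySet())
            if (en.getValue().equals(g2) && g1.equals(oldvalues.get(en.getKey())))
                return en.getKey();
        return null;
    }
}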
You must show a transcript of your system working on these examples.
GiveFootMassage(a,Eric) ^ Trained(b,a) ^ InTouchWithFeelings(b) -> Happy(Eric)
Here are the training examples. The target function will determine whether Eric is happy, by seeing whether the fact Happy(Eric) is listed in the training example. The only reason we know that this defines the job of the target function is that I said so, i.e., for this example, we have chosen to try to predict under what circumstances Eric is happy. Note that the order in which facts are listed in each example is arbitrary.
We know that Happy(Eric) is not true in the first two examples -- those are the negative training examples. The rest are positive training examples.
CatDied(C) Owns(Eric,C)

GiveFootMassage(Bill,Eric) Trained(Sally,Bill) AngryAllTheTime(Sally)

InTouchWithFeelings(Andy) GiveFootMassage(Susan,Eric) Happy(Eric) Trained(Andy,Susan) WeatherToday(Sunny)

WeatherToday(Rainy) GiveFootMassage(Frank,Eric) InTouchWithFeelings(Bob) Trained(Bob,Frank) Happy(Eric)

Happy(Eric) InTouchWithFeelings(Rupp) Trained(Rupp,Zvi) Owns(Eric,Porche) GiveFootMassage(Zvi,Eric)

You can follow a link from the course web page to see a transcript tracing the algorithm as applied to the three training cases above (as gone over in class).
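As a sanity check, here is a hypothetical main method one might add to the FindS sketch above, feeding it the three positive training cases as they are grouped above (the small atom(...) helper is my own). It should print a rule equivalent, up to variable names, to the target rule given earlier.

    static Atom atom(String pred, String... args) { return new Atom(pred, Arrays.asList(args)); }

    public static void main(String[] unused) {
        List<Atom> pos1 = Arrays.asList(atom("InTouchWithFeelings", "Andy"),
                atom("GiveFootMassage", "Susan", "Eric"), atom("Happy", "Eric"),
                atom("Trained", "Andy", "Susan"), atom("WeatherToday", "Sunny"));
        List<Atom> pos2 = Arrays.asList(atom("WeatherToday", "Rainy"),
                atom("GiveFootMassage", "Frank", "Eric"), atom("InTouchWithFeelings", "Bob"),
                atom("Trained", "Bob", "Frank"), atom("Happy", "Eric"));
        List<Atom> pos3 = Arrays.asList(atom("Happy", "Eric"), atom("InTouchWithFeelings", "Rupp"),
                atom("Trained", "Rupp", "Zvi"), atom("Owns", "Eric", "Porche"),
                atom("GiveFootMassage", "Zvi", "Eric"));
        Rule h = findS(Arrays.asList(pos1, pos2, pos3), atom("Happy", "Eric"));
        System.out.println(h);
        // expected, up to variable names:
        // InTouchWithFeelings(?v1) ^ GiveFootMassage(?v2,Eric) ^ Trained(?v1,?v2) -> Happy(Eric)
    }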
Create a set of 3 positive and 3 negative testing examples. Unlike the training examples above, these examples are not supervised, i.e., they must not have the correct answer associated with each one. In this case, that means that even if Eric is happy, Happy(Eric) is not included. It is up to the learned hypothesis to derive this if it is the case. Show your F-C system working on your examples with the learned rule.
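The actual rule application is your Forward-Chaining system's job, but as a rough sketch of what it has to do with the learned rule, the following methods (which could sit alongside the FindS sketch above, reusing its Atom class and isVar helper) test whether the rule's LHS can be satisfied in a test knowledge base by searching for a consistent set of variable bindings; if it can, the RHS fact should be derived.

    // returns true if every LHS atom can be matched against some fact in kb
    // under one consistent set of variable bindings (variables start with '?')
    static boolean covers(List<Atom> lhs, List<Atom> kb) {
        return match(lhs, 0, kb, new HashMap<>());
    }

    static boolean match(List<Atom> lhs, int i, List<Atom> kb, Map<String,String> bind) {
        if (i == lhs.size()) return true;                     // all atoms matched: the rule fires
        Atom a = lhs.get(i);
        for (Atom f : kb) {
            if (!f.pred.equals(a.pred) || f.args.size() != a.args.size()) continue;
            Map<String,String> b = new HashMap<>(bind);       // tentative extension of the bindings
            boolean ok = true;
            for (int k = 0; k < a.args.size() && ok; k++) {
                String term = a.args.get(k), val = f.args.get(k);
                if (isVar(term)) {
                    if (b.containsKey(term)) ok = b.get(term).equals(val);  // must stay consistent
                    else b.put(term, val);                                  // bind variable to constant
                } else ok = term.equals(val);                               // constants must be equal
            }
            if (ok && match(lhs, i + 1, kb, b)) return true;  // this fact works; try the next atom
        }
        return false;                                         // backtrack: no fact fits atom i
    }

Calling covers(h.lhs, testKB) on one of your unsupervised test cases would then indicate whether Happy(Eric) should be added to it.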
BadDay(Today) Cause(Injury,Pain) Weather(Rainy,Today) Happen(Injury,Today) Bad(Pain)

Happy(Me) WorldPeaceAchieved(Today) FreeHotDogs(Today)

Bad(Pain) Weather(Rainy,Yesterday) BadDay(Today) Happen(Insult,Today) Cause(Insult,Pain)

Bad(Sadness) Happen(Nothing,Today) Cause(Insult,Sadness)

Bad(Broke) Happen(Lawsuit,Today) Cause(Lawsuit,Broke) BadDay(Today) NobodyLoves(Me)
Once again, it is up to you to make up (from your head) the test data (3 negative and 3 positive examples). Remember that the test data does not have the target classification fact in it -- it is up to the forward-chaining system to derive it, if it is true. Show your system working on your test examples.
On(A,B) Sturdy(B) Bigger(B,A) StableSituation() Color(C,Yellow) WeighsLessThan(A,TenPounds)

Bigger(C,A) On(A,C) Sturdy(C) StableSituation() WeighsLessThan(A,TenPounds) Color(C,Yellow)

On(A,B) Bigger(B,A) Color(C,Yellow) WeighsLessThan(A,TenPounds) Sturdy(B) Bigger(B,C) StableSituation()

Color(C,Yellow) WeighsLessThan(C,TenPounds) On(C,B)

Color(C,Yellow) Bigger(A,C) On(A,C) Sturdy(C) WeighsLessThan(A,TenPounds)

Sturdy(B) Color(C,Yellow) WeighsLessThan(C,TenPounds) On(C,B)
Write three positive and three negative training examples for a user who invokes ghostview if and only if she has, in any order, read an e-mail from someone, written to that same person, changed directories to some directory, and that directory has a ps file in it (and possibly done other things). Also, write the target function you expect your system to learn. Show your system working on this example.
email: evs at cs dot columbia dot edu