CIS 120 Fall 2007 Homework 4: Machine Learning

Due: Friday, February 29th at 5 PM.

Files to submit: Homework.java NotAPictureException.java IncompatibleExampleException.java Example.java Perceptron.java NearestNeighbor.java NotADirException.java

In this project we will perform some simple machine learning. We will learn to distinguish human faces from cougar faces, using the Perceptron learning algorithm and the Nearest Neighbor learning algorithm.

Cougar

Human

The javadocs outline the classes that you should implement. Starting points for these classes as well as the training and testing images are available in a zip file: hw4supplied.zip. The images for the human faces are from Rice University while the images of cougar faces are from the Caltech 101 dataset.

Supplied Code (javadocs)

Classes You Need to Implement

You need to implement all the classes for which there are javadocs. The exceptions should be self-explanatory (we have provided an implementation of one of them for reference). You can implement them in any order, but it is probably easiest to implement the exceptions first and the other classes in the order we have given them here.

Example.java

The Example class does most of the work. It contains the representation of the image that the learning algorithms will use, as well as the methods to deal with that representation. The representation we will use for the relevant data is a one dimensional double array. The constructor of the Example class converts an image (a SimplePicture object) into this representation. It needs to do three things:

We only use some of the pixels, because even with our small images (250*300 pixels) using all of them takes up a lot of memory. The interval Example variable controls what fraction of the pixels in an image we include in our representation. We include the pixel at (0,0) and every intervalth row and column. For example, if interval is set to 3 and we have a 4x5 image with the values:

10

20

30

40

50

60

70

80

90

100

110

120

130

140

150

160

170

180

190

200

We would include entries corresponding to the pixels with values (10, 40, 130, 160). That is we consider only rows 0,3,6,etc and columns 0,3,6,etc.

We represent a pixel as a double with the value of (pix-127.0)/128.0 where pix is the value of the pixel.

The entire image is represented as a one dimensional array of doubles. A possible representation would then be a double array containing the values: (-0.9140625,-0.6796875,0.0234375,0.2578125).

The Example class supports several methods:

An Example also tracks whether it is supposedly a 'yes' or a 'no' Example by storing a label (fixed at creation time). For training Examples, this is assumed to be correct; later, for testing Examples, the algorithm will ignore the Example's label and form its own opinion. To allow for algorithms to check Example type while they are being trained, there is another method, getLabel.

Here are some sample interactions that should work when you are done.

> Example inst1 = new Example(new SimplePicture("../human/arnab1.jpg"), 4, true);
> Example inst2 = new Example(new SimplePicture("../human/kurt1.jpg"), 4, true);
> inst1.getLabel()
true
> inst1.dot(inst2)
694.4557495117188
> inst1.add(inst2)
> inst1.dot(inst2)
2055.0662841796875
> inst1.subtract(inst2)
> inst1.dot(inst2)
694.4557495117188
> inst1.distance(inst2)
37.70682547629806
> inst1.distance(inst1)
0.0

Homework.java

The Homework class contains some methods for reading in images directly to Examples. See the javadocs for more information on what these are and what they do. It will probably be useful to look at the javadocs for java.io.File. In particular the exists, isDirectory and list methods.

NearestNeighbor.java

This is the first class that extends the OnlineLearningAlgorithm class.

The Nearest Neighbor algorithm maintains two lists of Examples. Each time the training method is called it adds an Example to the appropriate list. When asked to classify an Example the algorithm finds the training Example with the smallest distance to the new Example and decides the Example in question should have the corresponding label (and then returns whether this agrees with its provided label or not). For example, if the nearest Example is on the true list, then the classify method would call it a true Example, and return true if the Example called itself a true Example, or false otherwise. If the NearestNeighbor object was never trained, then classify should return false.

Keep in mind the implications of Example's getLabel: important during training, but potentially bogus during classification.

Perceptron.java

The main idea behind the Perceptron algorithm is to use the value of a dot product as a measure of the similarity of two Examples. If the Examples are very similar then the dot product will be a large positive number, while if they are very different the dot product will be a large negative number.

The algorithm maintains an Example object that is very similar to all the true Examples that it has seen in training and very different from all the false Examples it has seen. This Example object represents the separating hyperplane (the name has to do with the linear algebra that provides a theory of what kinds of problems a Perceptron can learn). When asked to classify a new Example, the algorithm computes the dot product between the Example and its separating hyperplane. If the result is more than 0, then it classifies the Example as a true Example; otherwise it considers it a false Example. This is how the algorithm uses the separating hyperplane that it has learned from training.

In order to learn a separating hyperplane, the algorithm will use another simple fact. If we add an Example to our separating hyperplane then the dot product of that Example with the separating hyperplane increases. If we subtract an Example, then the dot product with that Example decreases. To train with a new Example, the algorithm checks whether the Example would have been classified correctly. If so, then it decides that it has nothing new to learn from this Example right now and does nothing. Otherwise, it adds the Example to its separating hyperplane if that Example was a true Example and subtracts it if it was a false Example. To summarize: we test whether we would have made a mistake on this Example and if we would have, we take a simple step to reduce the error. Even such a simple algorithm is actually pretty powerful.

Here is the output of running the main method of class Homework. You do not need to do anything for this method — it is provided as a way to test your code. It reads in the directories of training images as well as the test images and runs the Perceptron algorithm on them. (Since Perceptron functions best with mixed input (not all of one type of Example before the other), we use a seeded Random to shuffle the test list.) The output below is for just Perceptron, with no shuffling:

read 6 human and 5 cougar Examples
perceptron says ../test/amber1.jpg is human
perceptron says ../test/amy1.jpg is human
perceptron says ../test/andrew1.jpg is human
perceptron says ../test/andy1.jpg is human
perceptron says ../test/andyp1.jpg is human
perceptron says ../test/anita1.jpg is human
perceptron says ../test/chris1.jpg is human
perceptron says ../test/cougar_10.jpg is cougar
perceptron says ../test/cougar_11.jpg is human
perceptron says ../test/cougar_12.jpg is cougar
perceptron says ../test/cougar_13.jpg is cougar
perceptron says ../test/cougar_14.jpg is cougar
perceptron says ../test/cougar_15.jpg is human
perceptron says ../test/cougar_16.jpg is cougar
perceptron says ../test/cougar_17.jpg is cougar
perceptron says ../test/cougar_18.jpg is cougar
perceptron says ../test/cougar_5.jpg is cougar
perceptron says ../test/cougar_6.jpg is cougar
perceptron says ../test/cougar_7.jpg is cougar
perceptron says ../test/cougar_8.jpg is cougar
perceptron says ../test/cougar_9.jpg is cougar
perceptron says ../test/dane1.jpg is human
perceptron says ../test/eric1.jpg is human
perceptron says ../test/gabe1.jpg is human
perceptron says ../test/hill4.jpg is human
perceptron says ../test/indra1.jpg is human
perceptron says ../test/jack1.jpg is human
perceptron says ../test/jill1.jpg is human
perceptron says ../test/jimmy1.jpg is human
perceptron says ../test/joan1.jpg is human
perceptron says ../test/kevin1.jpg is human
perceptron says ../test/nate1.jpg is human
perceptron says ../test/paul1.jpg is human
perceptron says ../test/pooja1.jpg is human
perceptron says ../test/reuven1.jpg is human
perceptron says ../test/tats1.jpg is human
perceptron says ../test/thad1.jpg is human
perceptron says ../test/tim1.jpg is human
perceptron says ../test/tulika1.jpg is human
perceptron says ../test/zach1.jpg is human

 

Extra credit:

KNearestNeighbor.java

An extension of the Nearest Neighbor algorithm is the K-Nearest Neighbor algorithm. As an extra-credit problem, implement this extension as a class called KNearestNeighbor. Note: if there are the same number of true and false Examples that are nearest to the given Example, then classify should decide it's a false Example. You might find it useful to use the sort method from java.util.Collections.

Sample Final Output

Finally, here is sample output from Homework for when all three learning algorithms are done.

read 6 human and 5 cougar Examples
perceptron says ../test/gabe1.jpg is human, nearest says ../test/gabe1.jpg is human, 3-nearest says ../test/gabe1.jpg is human
perceptron says ../test/thad1.jpg is human, nearest says ../test/thad1.jpg is human, 3-nearest says ../test/thad1.jpg is human
perceptron says ../test/dane1.jpg is human, nearest says ../test/dane1.jpg is human, 3-nearest says ../test/dane1.jpg is human
perceptron says ../test/chris1.jpg is human, nearest says ../test/chris1.jpg is human, 3-nearest says ../test/chris1.jpg is human
perceptron says ../test/nate1.jpg is human, nearest says ../test/nate1.jpg is human, 3-nearest says ../test/nate1.jpg is human
perceptron says ../test/andyp1.jpg is human, nearest says ../test/andyp1.jpg is human, 3-nearest says ../test/andyp1.jpg is human
perceptron says ../test/eric1.jpg is human, nearest says ../test/eric1.jpg is human, 3-nearest says ../test/eric1.jpg is human
perceptron says ../test/cougar_12.jpg is cougar, nearest says ../test/cougar_12.jpg is cougar, 3-nearest says ../test/cougar_12.jpg is cougar
perceptron says ../test/cougar_14.jpg is cougar, nearest says ../test/cougar_14.jpg is cougar, 3-nearest says ../test/cougar_14.jpg is cougar
perceptron says ../test/cougar_7.jpg is cougar, nearest says ../test/cougar_7.jpg is cougar, 3-nearest says ../test/cougar_7.jpg is cougar
perceptron says ../test/hill4.jpg is human, nearest says ../test/hill4.jpg is human, 3-nearest says ../test/hill4.jpg is human
perceptron says ../test/cougar_5.jpg is cougar, nearest says ../test/cougar_5.jpg is cougar, 3-nearest says ../test/cougar_5.jpg is cougar
perceptron says ../test/cougar_16.jpg is cougar, nearest says ../test/cougar_16.jpg is cougar, 3-nearest says ../test/cougar_16.jpg is cougar
perceptron says ../test/cougar_13.jpg is cougar, nearest says ../test/cougar_13.jpg is cougar, 3-nearest says ../test/cougar_13.jpg is cougar
perceptron says ../test/jimmy1.jpg is human, nearest says ../test/jimmy1.jpg is human, 3-nearest says ../test/jimmy1.jpg is human
perceptron says ../test/joan1.jpg is human, nearest says ../test/joan1.jpg is cougar, 3-nearest says ../test/joan1.jpg is human
perceptron says ../test/cougar_8.jpg is cougar, nearest says ../test/cougar_8.jpg is cougar, 3-nearest says ../test/cougar_8.jpg is cougar
perceptron says ../test/reuven1.jpg is human, nearest says ../test/reuven1.jpg is human, 3-nearest says ../test/reuven1.jpg is human
perceptron says ../test/kevin1.jpg is human, nearest says ../test/kevin1.jpg is human, 3-nearest says ../test/kevin1.jpg is human
perceptron says ../test/tulika1.jpg is human, nearest says ../test/tulika1.jpg is human, 3-nearest says ../test/tulika1.jpg is human
perceptron says ../test/paul1.jpg is human, nearest says ../test/paul1.jpg is human, 3-nearest says ../test/paul1.jpg is human
perceptron says ../test/cougar_18.jpg is cougar, nearest says ../test/cougar_18.jpg is cougar, 3-nearest says ../test/cougar_18.jpg is cougar
perceptron says ../test/andrew1.jpg is human, nearest says ../test/andrew1.jpg is human, 3-nearest says ../test/andrew1.jpg is human
perceptron says ../test/amber1.jpg is human, nearest says ../test/amber1.jpg is human, 3-nearest says ../test/amber1.jpg is human
perceptron says ../test/tats1.jpg is human, nearest says ../test/tats1.jpg is cougar, 3-nearest says ../test/tats1.jpg is human
perceptron says ../test/jill1.jpg is human, nearest says ../test/jill1.jpg is cougar, 3-nearest says ../test/jill1.jpg is human
perceptron says ../test/jack1.jpg is human, nearest says ../test/jack1.jpg is human, 3-nearest says ../test/jack1.jpg is human
perceptron says ../test/zach1.jpg is human, nearest says ../test/zach1.jpg is human, 3-nearest says ../test/zach1.jpg is human
perceptron says ../test/cougar_15.jpg is human, nearest says ../test/cougar_15.jpg is cougar, 3-nearest says ../test/cougar_15.jpg is cougar
perceptron says ../test/amy1.jpg is human, nearest says ../test/amy1.jpg is human, 3-nearest says ../test/amy1.jpg is human
perceptron says ../test/indra1.jpg is human, nearest says ../test/indra1.jpg is human, 3-nearest says ../test/indra1.jpg is human
perceptron says ../test/anita1.jpg is human, nearest says ../test/anita1.jpg is human, 3-nearest says ../test/anita1.jpg is human
perceptron says ../test/pooja1.jpg is human, nearest says ../test/pooja1.jpg is human, 3-nearest says ../test/pooja1.jpg is human
perceptron says ../test/cougar_6.jpg is cougar, nearest says ../test/cougar_6.jpg is cougar, 3-nearest says ../test/cougar_6.jpg is cougar
perceptron says ../test/tim1.jpg is human, nearest says ../test/tim1.jpg is human, 3-nearest says ../test/tim1.jpg is human
perceptron says ../test/cougar_17.jpg is cougar, nearest says ../test/cougar_17.jpg is cougar, 3-nearest says ../test/cougar_17.jpg is cougar
perceptron says ../test/cougar_9.jpg is cougar, nearest says ../test/cougar_9.jpg is cougar, 3-nearest says ../test/cougar_9.jpg is cougar
perceptron says ../test/andy1.jpg is human, nearest says ../test/andy1.jpg is human, 3-nearest says ../test/andy1.jpg is human
perceptron says ../test/cougar_10.jpg is cougar, nearest says ../test/cougar_10.jpg is cougar, 3-nearest says ../test/cougar_10.jpg is cougar
perceptron says ../test/cougar_11.jpg is human, nearest says ../test/cougar_11.jpg is cougar, 3-nearest says ../test/cougar_11.jpg is cougar