This assignment is due before 11:00PM on Wednesday, October 18, 2017. There are two parts to this homework assignment:

• A programming assignment on kd-trees. This assignment also requires you to submit unit tests along with your implementation.
• A set of written questions about Binary Search Trees.

You’ll need to submit your solutions to both parts of the homework before the deadline.

Collaboration policy.

• You must do this assignment by yourself.
• You must never give or expose your solutions to an assignment to anyone who is taking the course. For example, you may not place your solutions in a public location (such as a website, a public code repository, or a printout left in a lab). If you leave your computer unattended, be sure to protect it with a password.
• You must never view someone else’s solutions to a programming assignment (or variant of an assignment). For example, you may not download solutions to a Coursera version of the assignment from the web. You may not use Coursera’s autograder to check the correctness of your solution.
• All solutions are checked with plagiarism detection software. Any assignment that is flagged by the software will be automatically referred to the Office of Student Conduct, which will adjudicate whether the course collaboration policy was violated. The first violation will result in your overall course grade being decreased by one letter grade. A second violation will result in an F in the class.

# Programming Assignment 4: Kd-Trees

This assignment is also used in Coursera. You may not download solutions to a Coursera version of the assignment from the web. You may not use Coursera’s autograder to check the correctness of your solution.

Write a data type to represent a set of points in the unit square (all points have x- and y-coordinates between 0 and 1) using a 2d-tree to support efficient range search (find all of the points contained in a query rectangle) and nearest-neighbor search (find a closest point to a query point). 2d-trees have numerous applications, ranging from classifying astronomical objects to computer animation to speeding up neural networks to mining data to image retrieval.

The goals of this part of the assignment are:

• Understand Binary Search Trees (BSTs).
• Extend BSTs to a more generalized version with multidimensional keys.
• Implement an efficient data structure for ordered operations in scientific applications.

## Geometric primitives

To get started, use the following geometric primitives for points and axis-aligned rectangles in the plane.

The immutable data type Point2D (part of algs4.jar) represents points in the plane. Here is the subset of its API that you may use:

The immutable data type RectHV (part of algs4.jar) represents axis-aligned rectangles. Here is the subset of its API that you may use:

Do not modify these data types.

## Brute-force implementation

Write a mutable data type PointSET.java that represents a set of points in the unit square. Implement the following API by using a red–black BST:

### Implementation requirements

You must use either SET or java.util.TreeSet; do not implement your own red–black BST.

### Performance requirements

Your implementation should support insert() and contains() in time proportional to the logarithm of the number of points in the set in the worst case; it should support nearest() and range() in time proportional to the number of points in the set.

### Corner cases

Throw a java.lang.IllegalArgumentException if any argument is null.
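
As a rough illustration of the requirements above, here is a minimal sketch of a brute-force point set backed by java.util.TreeSet. It is not the required API: the class name BruteForceSketch, the stand-in point type Pt (used here instead of Point2D so the example compiles without algs4.jar), and the method signatures are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch only: Pt stands in for Point2D, and all names are hypothetical.
final class BruteForceSketch {
    static final class Pt implements Comparable<Pt> {
        final double x, y;
        Pt(double x, double y) { this.x = x; this.y = y; }

        // Order by y, breaking ties by x, so TreeSet treats equal coordinates as duplicates.
        @Override public int compareTo(Pt that) {
            int byY = Double.compare(y, that.y);
            return byY != 0 ? byY : Double.compare(x, that.x);
        }

        double distanceSquaredTo(Pt that) {
            double dx = x - that.x, dy = y - that.y;
            return dx * dx + dy * dy;
        }
    }

    private final TreeSet<Pt> points = new TreeSet<>(); // red-black tree from java.util

    // Logarithmic-time operations delegate to the TreeSet.
    void insert(Pt p) { requireNonNull(p); points.add(p); }
    boolean contains(Pt p) { requireNonNull(p); return points.contains(p); }
    int size() { return points.size(); }

    // Linear scan: test every point against the query rectangle (boundary included).
    List<Pt> range(double xmin, double ymin, double xmax, double ymax) {
        List<Pt> hits = new ArrayList<>();
        for (Pt p : points)
            if (p.x >= xmin && p.x <= xmax && p.y >= ymin && p.y <= ymax) hits.add(p);
        return hits;
    }

    // Linear scan: keep the point with the smallest squared distance; null if the set is empty.
    Pt nearest(Pt query) {
        requireNonNull(query);
        Pt best = null;
        for (Pt p : points)
            if (best == null || p.distanceSquaredTo(query) < best.distanceSquaredTo(query)) best = p;
        return best;
    }

    private static void requireNonNull(Object argument) {
        if (argument == null) throw new IllegalArgumentException("argument is null");
    }
}
```

Comparing squared distances avoids a square root and does not change which point is closest.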

## 2d-tree implementation

Write a mutable data type KdTree.java that uses a 2d-tree to implement the same API (but replace PointSET with KdTree). A 2d-tree is a generalization of a BST to two-dimensional keys. The idea is to build a BST with points in the nodes, using the x- and y-coordinates of the points as keys in strictly alternating sequence.

• Search and insert. The algorithms for search and insert are similar to those for BSTs, but at the root we use the x-coordinate (if the point to be inserted has a smaller x-coordinate than the point at the root, go left; otherwise go right); then at the next level, we use the y-coordinate (if the point to be inserted has a smaller y-coordinate than the point in the node, go left; otherwise go right); then at the next level the x-coordinate, and so forth.
  Example insertion sequence: (0.7, 0.2), (0.5, 0.4), (0.2, 0.3), (0.4, 0.7), (0.9, 0.6).
• Draw. A 2d-tree divides the unit square in a simple way: all the points to the left of the root go in the left subtree; all those to the right go in the right subtree; and so forth, recursively. Your draw() method should draw all of the points to standard draw in black and the subdivisions in red (for vertical splits) and blue (for horizontal splits). This method need not be efficient—it is primarily for debugging.
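
The search and insert rule from the list above can be sketched as follows. This is a minimal illustration, not the required KdTree API: the class name TwoDTreeSketch, the use of raw double coordinates instead of Point2D, and the choice to ignore duplicate points are assumptions made for this example.

```java
// Illustrative sketch only: names are hypothetical and raw coordinates replace Point2D.
final class TwoDTreeSketch {
    private static final class Node {
        final double x, y;
        Node left, right;
        Node(double x, double y) { this.x = x; this.y = y; }
    }

    private Node root;
    private int size;

    void insert(double x, double y) {
        root = insert(root, x, y, true); // the root compares x-coordinates
    }

    private Node insert(Node node, double x, double y, boolean compareX) {
        if (node == null) {
            size++;
            return new Node(x, y);
        }
        if (node.x == x && node.y == y) return node; // point already present
        // Smaller key goes left; otherwise go right. The key alternates between x and y by level.
        boolean goLeft = compareX ? x < node.x : y < node.y;
        if (goLeft) node.left = insert(node.left, x, y, !compareX);
        else node.right = insert(node.right, x, y, !compareX);
        return node;
    }

    boolean contains(double x, double y) {
        Node node = root;
        boolean compareX = true;
        while (node != null) {
            if (node.x == x && node.y == y) return true;
            boolean goLeft = compareX ? x < node.x : y < node.y;
            node = goLeft ? node.left : node.right;
            compareX = !compareX; // switch coordinate at each level
        }
        return false;
    }

    int size() { return size; }
}
```

With the five example points inserted in the order given, (0.5, 0.4) becomes the left child of the root (0.7, 0.2), and (0.2, 0.3) and (0.4, 0.7) become its left and right children because that level compares y-coordinates.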

The prime advantage of a 2d-tree over a BST is that it supports efficient implementation of range search and nearest-neighbor search. Each node corresponds to an axis-aligned rectangle in the unit square, which encloses all of the points in its subtree. The root corresponds to the unit square; the left and right children of the root correspond to the two rectangles split by the x-coordinate of the point at the root; and so forth.

• Range search. To find all points contained in a given query rectangle, start at the root and recursively search for points in both subtrees using the following pruning rule: if the query rectangle does not intersect the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). A subtree is searched only if it might contain a point contained in the query rectangle.

• Nearest-neighbor search. To find a closest point to a given query point, start at the root and recursively search in both subtrees using the following pruning rule: if the closest point discovered so far is closer than the distance between the query point and the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). That is, search a node only if it might contain a point that is closer than the best one found so far. The effectiveness of the pruning rule depends on quickly finding a nearby point. To do this, organize the recursive method so that when there are two possible subtrees to go down, you always choose the subtree that is on the same side of the splitting line as the query point as the first subtree to explore—the closest point found while exploring the first subtree may enable pruning of the second subtree.
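
The two pruning rules above can be sketched as follows. This is one possible arrangement under stated assumptions, not the required implementation: the class name PruningSketch, the representation of rectangles as {xmin, ymin, xmax, ymax} arrays instead of RectHV, the recomputation of each node's rectangle during the recursion, and the visited counter (added only to make pruning observable) are all choices made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names are hypothetical; rectangles are {xmin, ymin, xmax, ymax}.
final class PruningSketch {
    private static final class Node {
        final double x, y;
        Node left, right;
        Node(double x, double y) { this.x = x; this.y = y; }
    }

    private Node root;
    int visited; // nodes examined by the most recent query

    void insert(double x, double y) { root = insert(root, x, y, true); }

    private Node insert(Node n, double x, double y, boolean vertical) {
        if (n == null) return new Node(x, y);
        if (n.x == x && n.y == y) return n; // duplicate point
        boolean goLeft = vertical ? x < n.x : y < n.y;
        if (goLeft) n.left = insert(n.left, x, y, !vertical);
        else n.right = insert(n.right, x, y, !vertical);
        return n;
    }

    List<double[]> range(double[] query) {
        visited = 0;
        List<double[]> hits = new ArrayList<>();
        range(root, true, new double[] {0, 0, 1, 1}, query, hits);
        return hits;
    }

    private void range(Node n, boolean vertical, double[] cell, double[] q, List<double[]> hits) {
        if (n == null) return;
        boolean intersects = cell[0] <= q[2] && q[0] <= cell[2] && cell[1] <= q[3] && q[1] <= cell[3];
        if (!intersects) return; // pruning rule: the query rectangle misses this node's rectangle
        visited++;
        if (q[0] <= n.x && n.x <= q[2] && q[1] <= n.y && n.y <= q[3]) hits.add(new double[] {n.x, n.y});
        range(n.left, !vertical, lowerCell(n, vertical, cell), q, hits);
        range(n.right, !vertical, upperCell(n, vertical, cell), q, hits);
    }

    // Returns {x, y} of a closest point, or null if the tree is empty.
    double[] nearest(double qx, double qy) {
        visited = 0;
        double[] best = {Double.NaN, Double.NaN, Double.POSITIVE_INFINITY}; // x, y, squared distance
        nearest(root, true, new double[] {0, 0, 1, 1}, qx, qy, best);
        return root == null ? null : new double[] {best[0], best[1]};
    }

    private void nearest(Node n, boolean vertical, double[] cell, double qx, double qy, double[] best) {
        if (n == null) return;
        double dx = Math.max(0, Math.max(cell[0] - qx, qx - cell[2]));
        double dy = Math.max(0, Math.max(cell[1] - qy, qy - cell[3]));
        if (dx * dx + dy * dy >= best[2]) return; // pruning rule: rectangle is no closer than the best point
        visited++;
        double d = (n.x - qx) * (n.x - qx) + (n.y - qy) * (n.y - qy);
        if (d < best[2]) { best[0] = n.x; best[1] = n.y; best[2] = d; }
        boolean lowerFirst = vertical ? qx < n.x : qy < n.y; // side of the splitting line holding the query
        double[] lower = lowerCell(n, vertical, cell), upper = upperCell(n, vertical, cell);
        nearest(lowerFirst ? n.left : n.right, !vertical, lowerFirst ? lower : upper, qx, qy, best);
        nearest(lowerFirst ? n.right : n.left, !vertical, lowerFirst ? upper : lower, qx, qy, best);
    }

    // Rectangle of the left (or bottom) child: the parent's rectangle cut at the splitting line.
    private static double[] lowerCell(Node n, boolean vertical, double[] c) {
        return vertical ? new double[] {c[0], c[1], n.x, c[3]} : new double[] {c[0], c[1], c[2], n.y};
    }

    // Rectangle of the right (or top) child.
    private static double[] upperCell(Node n, boolean vertical, double[] c) {
        return vertical ? new double[] {n.x, c[1], c[2], c[3]} : new double[] {c[0], n.y, c[2], c[3]};
    }
}
```

Recomputing rectangles during the recursion trades a little time for memory; storing a rectangle in each node is an equally reasonable design. With the five example points from the search and insert description, a range query on {0.1, 0.2, 0.55, 0.5} never enters the right subtree of the root, and a nearest-neighbor query at (0.45, 0.65) prunes two of the five nodes.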

## Clients

You may use the following interactive client programs to test and debug your code.

• KdTreeVisualizer.java computes and draws the 2d-tree that results from the sequence of points clicked by the user in the standard drawing window.
• RangeSearchVisualizer.java reads a sequence of points from a file (specified as a command-line argument) and inserts those points into a 2d-tree. Then, it performs range searches on the axis-aligned rectangles dragged by the user in the standard drawing window.
• NearestNeighborVisualizer.java reads a sequence of points from a file (specified as a command-line argument) and inserts those points into a 2d-tree. Then, it performs nearest-neighbor queries on the point corresponding to the location of the mouse in the standard drawing window.

## Analysis of running time and memory usage (optional and not graded)

• Give the total memory usage in bytes (using tilde notation) of your 2d-tree data structure as a function of the number of points n, using the memory-cost model from lecture and Section 1.4 of the textbook. Count all memory that is used by your 2d-tree, including memory for the nodes, points, and rectangles.

• Give the expected running time in seconds (using tilde notation) to build a 2d-tree on n random points in the unit square. (Do not count the time to read in the points from standard input.)

• How many nearest-neighbor calculations can your 2d-tree implementation perform per second for input100K.txt (100,000 points) and input1M.txt (1 million points), where the query points are random points in the unit square? (Do not count the time to read in the points or to build the 2d-tree.) Repeat this question but with the brute-force implementation.
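
For the last question, one way to estimate calls per second is a simple timing loop like the sketch below. It is only a rough measurement aid under stated assumptions: the class name ThroughputSketch and the Query interface are hypothetical, the loop also counts the cost of generating random query points and reading the clock, and JIT warm-up is not accounted for.

```java
import java.util.Random;

// Illustrative sketch only: the Query interface and method names are hypothetical.
final class ThroughputSketch {
    interface Query {
        void run(double x, double y);
    }

    // Runs the query on random points in the unit square for roughly durationNanos
    // and returns the observed number of calls per second.
    static double callsPerSecond(Query query, long durationNanos, long seed) {
        Random random = new Random(seed);
        long calls = 0;
        long start = System.nanoTime();
        long elapsed;
        do {
            query.run(random.nextDouble(), random.nextDouble());
            calls++;
            elapsed = System.nanoTime() - start;
        } while (elapsed < durationNanos);
        return calls / (Math.max(elapsed, 1) / 1e9);
    }
}
```

A caller would build the data structure first, then pass a lambda that performs one nearest-neighbor query per call, so that reading the input and building the tree are excluded from the measurement.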

## Deliverables

Submit the files PointSET.java and KdTree.java along with PointSETTest.java and KdTreeTest.java.

We will supply algs4.jar. You may not call library functions except those in java.lang, java.util, and algs4.jar.

# Written Assignment 4: Binary Search Trees

The goals of this assignment are to test your understanding of the material covered in sections 3.1 to 3.3 of the textbook, and the lecture and recitation materials. You should read the textbook chapters before doing this part of the assignment.

Written homeworks must be typeset in LaTeX and submitted in PDF format. Please insert a page break between each question, so that your answer to each question starts on a new page in your PDF document.

## Q1. BST height

What is the best-case BST height? Worst case? If shuffling, probabilistically, leads to a log n tree height, why don’t we simply shuffle our input data before building our BST-based symbol table to avoid worst-case behavior?

## Q2. Deleting nodes from a BST

Describe two methods for deleting nodes from a BST. What effect do they have on its running time? (No need for a formal proof; an English explanation is fine.)

## Q3. Traversals

Draw the binary tree for this level-order traversal: P E S A N V L Y I. Give the in-order traversal, preorder traversal, and postorder traversal of the tree that you drew, including elements for the null leaves. For each of these traversals, could you reconstruct the tree from it? If so, explain how.

## Q4. Proof 1

Prove that if a node in a BST has two children, its successor has no left child and its predecessor has no right child.

## Q5. Proof 2

Prove that no compare-based algorithm can build a BST using fewer than lg(N!) ~ N lg N compares.