
This assignment is due before 11:00PM on Wednesday, September 13, 2017. There are two parts to this homework assignment:

• A programming assignment on percolation
• A set of questions about Union-Find and Analysis of Algorithms

You’ll need to submit your solutions to both parts of the homework before the deadline.

Collaboration policy.

• You must do this assignment by yourself.
• You must never give or expose your solutions to an assignment to anyone who is taking the course. For example, you may not place your solutions in a public location (such as a website, a public code repository, or a printout left in a lab). If you leave your computer unattended, be sure to protect it with a password.
• You must never view someone else’s solutions to a programming assignment (or variant of an assignment). For example, you may not download solutions to a Coursera version of the assignment from the web.
• All solutions are checked with plagiarism detection software. Any assignment that is flagged by the software will be automatically referred to the Office of Student Conduct, which will adjudicate whether the course collaboration policy was violated. The first violation will result in your overall course grade being decreased by one letter grade. A second violation will result in an F in the class.

Programming Assignment 1: Percolation

Write a program to estimate the value of the percolation threshold via Monte Carlo simulation.

The goals of this part of the assignment are:

• Refresh your memory of Java if you’re rusty after the summer
• Install Java and an IDE
• Give the professor and the TAs a chance to test out the new grading infrastructure
• Show how an efficient data structure can facilitate scientific experiments in other fields

Install Java and Eclipse

You should install Java and the Eclipse IDE on your computer for your operating system. You should also download algs4.jar, which contains Java classes for I/O and all of the algorithms in the textbook, and import the jar file in your Eclipse project.

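To access a class in algs4.jar from a Java class that you write, you will need to include an import statement for each class you use, such as import edu.princeton.cs.algs4.StdRandom; or import edu.princeton.cs.algs4.WeightedQuickUnionUF;.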

Note that your code must be in the default package; if you use a package statement, the autograder will not be able to assess your work.

Percolation

Given a composite system comprised of randomly distributed insulating and metallic materials: what fraction of the materials need to be metallic so that the composite system is an electrical conductor? Given a porous landscape with water on the surface (or oil below), under what conditions will the water be able to drain through to the bottom (or the oil to gush through to the surface)? Scientists have defined an abstract process known as percolation to model such situations.

The model

We model a percolation system using an $n$-by-$n$ grid of sites. Each site is either open or blocked. A full site is an open site that can be connected to an open site in the top row via a chain of neighboring (left, right, up, down) open sites. We say the system percolates if there is a full site in the bottom row. In other words, a system percolates if we fill all open sites connected to the top row and that process fills some open site on the bottom row. (For the insulating/metallic materials example, the open sites correspond to metallic materials, so that a system that percolates has a metallic path from top to bottom, with full sites conducting. For the porous substance example, the open sites correspond to empty space through which water might flow, so that a system that percolates lets water fill open sites, flowing from top to bottom.)
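To make the definition of a full site concrete, here is a minimal brute-force sketch that fills open sites starting from the top row and reports whether the fill reaches the bottom row. It is an illustration of the model only: the class name FloodFillDemo and its method are hypothetical, it uses 0-based indices and java.util collections for self-containment, and it is not the union-find design that the assignment requires.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper for illustration only; not the union-find design the assignment asks for.
class FloodFillDemo {

    // Returns true if some open site in the bottom row is full,
    // i.e. reachable from an open top-row site through open neighbors.
    // Uses 0-based indices: open[0] is the top row.
    static boolean percolates(boolean[][] open) {
        int n = open.length;
        boolean[][] full = new boolean[n][n];
        Deque<int[]> stack = new ArrayDeque<>();

        // Every open site in the top row is full by definition.
        for (int col = 0; col < n; col++) {
            if (open[0][col]) {
                full[0][col] = true;
                stack.push(new int[] {0, col});
            }
        }

        int[][] steps = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        while (!stack.isEmpty()) {
            int[] site = stack.pop();
            if (site[0] == n - 1) return true;   // a full site in the bottom row
            for (int[] d : steps) {
                int r = site[0] + d[0];
                int c = site[1] + d[1];
                if (r >= 0 && r < n && c >= 0 && c < n && open[r][c] && !full[r][c]) {
                    full[r][c] = true;
                    stack.push(new int[] {r, c});
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        boolean[][] grid = {
            {true,  false, false},
            {true,  true,  false},
            {false, true,  false}
        };
        System.out.println(percolates(grid));
    }
}
```

Each call of this sketch scans the grid from scratch, which is why it would not satisfy the performance requirements stated for the Percolation data type.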

The problem

In a famous scientific problem, researchers are interested in the following question: if sites are independently set to be open with probability $p$ (and therefore blocked with probability $1 − p$), what is the probability that the system percolates? When $p$ equals 0, the system does not percolate; when $p$ equals 1, the system percolates. The plots below show the site vacancy probability $p$ versus the percolation probability for a 20-by-20 random grid (left) and a 100-by-100 random grid (right).

When $n$ is sufficiently large, there is a threshold value $p^*$ such that when $p < p^*$, a random $n$-by-$n$ grid almost never percolates, and when $p > p^*$, a random $n$-by-$n$ grid almost always percolates. No mathematical solution for determining the percolation threshold $p^*$ has yet been derived. Your task is to write a computer program to estimate $p^*$.

Percolation data type

To model a percolation system, create a data type Percolation with the following API:

Corner cases

By convention, the row and column indices are integers between 1 and $n$, where (1, 1) is the upper-left site. Throw a java.lang.IllegalArgumentException if any argument to open(), isOpen(), or isFull() is outside its prescribed range. The constructor should throw a java.lang.IllegalArgumentException if $n$ ≤ 0.
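One possible way to handle the 1-based convention is to validate the arguments once and then map each site to a 0-based index in row-major order. The sketch below assumes that layout; the class GridIndex and the method names validate() and toIndex() are hypothetical and not part of the assignment API.

```java
// Hypothetical helper class; validate() and toIndex() are not part of the assignment API.
class GridIndex {

    // Throws if row or col is outside the range 1..n.
    static void validate(int row, int col, int n) {
        if (row < 1 || row > n || col < 1 || col > n) {
            throw new IllegalArgumentException(
                "site (" + row + ", " + col + ") is outside 1.." + n);
        }
    }

    // Maps the 1-based site (row, col) to a 0-based index in row-major order.
    static int toIndex(int row, int col, int n) {
        validate(row, col, n);
        return (row - 1) * n + (col - 1);
    }

    public static void main(String[] args) {
        System.out.println(toIndex(1, 1, 5));   // upper-left site
        System.out.println(toIndex(5, 5, 5));   // lower-right site
    }
}
```

With this mapping, the upper-left site (1, 1) becomes index 0 and the lower-right site ($n$, $n$) becomes index $n^2 - 1$.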

Performance requirements

The constructor should take time proportional to $n^2$; all methods should take constant time plus a constant number of calls to the union–find methods union(), find(), connected(), and count().

Monte Carlo simulation

To estimate the percolation threshold, consider the following computational experiment:

• Initialize all sites to be blocked.
• Repeat the following until the system percolates:
• Choose a site uniformly at random among all blocked sites.
• Open the site.
• The fraction of sites that are opened when the system percolates provides an estimate of the percolation threshold.

For example, if sites are opened in a 20-by-20 lattice according to the snapshots below, then our estimate of the percolation threshold is 204/400 = 0.51 because the system percolates when the 204th site is opened.
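One way to choose uniformly among the blocked sites is to shuffle all site indices once and open sites in that order, since each next entry of a uniformly random permutation is uniform among the entries not yet used. The sketch below shows only that step. The class RandomSiteOrder is hypothetical, and it uses java.util.Random to stay self-contained, whereas the assignment asks for StdRandom.

```java
import java.util.Random;

// Hypothetical helper; java.util.Random is used only to keep the sketch self-contained.
class RandomSiteOrder {

    // Returns the site indices 0..n*n-1 in uniformly random order (Fisher-Yates shuffle).
    static int[] shuffledSites(int n, Random rng) {
        int[] order = new int[n * n];
        for (int i = 0; i < order.length; i++) order[i] = i;
        for (int i = order.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);   // uniform in 0..i
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }
        return order;
    }

    public static void main(String[] args) {
        int n = 3;
        int[] order = shuffledSites(n, new Random());
        for (int index : order) {
            int row = index / n + 1;   // back to 1-based coordinates
            int col = index % n + 1;
            System.out.println("open (" + row + ", " + col + ")");
        }
    }
}
```

A simulation would stop consuming this order as soon as the system percolates; the number of sites opened so far, divided by $n^2$, is then the estimate for that run.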

By repeating this computational experiment $T$ times and averaging the results, we obtain a more accurate estimate of the percolation threshold. Let $x_t$ be the fraction of open sites in computational experiment $t$. The sample mean $\bar{x} = \frac{x_1 + x_2 + \cdots + x_T}{T}$ provides an estimate of the percolation threshold; the sample standard deviation $s$, defined by $s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_T - \bar{x})^2}{T - 1}$, measures the sharpness of the threshold.

Assuming $T$ is sufficiently large (say, at least 30), the following provides a 95% confidence interval for the percolation threshold: $\left[\bar{x} - \frac{1.96 s}{\sqrt{T}},\ \bar{x} + \frac{1.96 s}{\sqrt{T}}\right]$

To perform a series of computational experiments, create a data type PercolationStats with the following API.

The constructor should throw a java.lang.IllegalArgumentException if either $n$ ≤ 0 or trials ≤ 0. Also, include a main() method that takes two command-line arguments $n$ and $T$, performs $T$ independent computational experiments (discussed above) on an $n$-by-$n$ grid, and prints the sample mean, sample standard deviation, and the 95% confidence interval for the percolation threshold. Use StdRandom to generate random numbers; use StdStats to compute the sample mean and sample standard deviation.
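The three statistics that main() prints can be sketched in plain Java as follows. This is only an illustration of the arithmetic: the class ThresholdStats and its method names are hypothetical, the input values are made up, and the assignment itself asks you to use StdStats for the mean and standard deviation.

```java
// Hypothetical helper; the assignment itself asks for StdStats to compute mean and stddev.
class ThresholdStats {

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    // Sample standard deviation (divides by T - 1); needs at least two values.
    static double stddev(double[] x) {
        double m = mean(x);
        double sumSq = 0.0;
        for (double v : x) sumSq += (v - m) * (v - m);
        return Math.sqrt(sumSq / (x.length - 1));
    }

    // Returns {low, high}: the mean minus and plus 1.96 * s / sqrt(T).
    static double[] confidence95(double[] x) {
        double m = mean(x);
        double half = 1.96 * stddev(x) / Math.sqrt(x.length);
        return new double[] {m - half, m + half};
    }

    public static void main(String[] args) {
        double[] fractions = {0.58, 0.61, 0.59, 0.60};   // made-up values, not results
        double[] ci = confidence95(fractions);
        System.out.println("mean                    = " + mean(fractions));
        System.out.println("stddev                  = " + stddev(fractions));
        System.out.println("95% confidence interval = [" + ci[0] + ", " + ci[1] + "]");
    }
}
```

Note that with a single trial the sample standard deviation is undefined, because the denominator $T - 1$ is zero.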

Analysis of running time and memory usage (optional and not graded)

Implement the Percolation data type using the quick find algorithm in QuickFindUF.

Use Stopwatch to measure the total running time of PercolationStats for various values of $n$ and $T$. How does doubling $n$ change the total running time? How does doubling $T$ change the total running time? Give a formula (using tilde notation) of the total running time on your computer (in seconds) as a single function of both $n$ and $T$.
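A doubling experiment can be turned into an estimate of the exponent in a power-law running time. The sketch below assumes the running time has the form $a n^b$, so that the ratio of the time at $2n$ to the time at $n$ approaches $2^b$. The class DoublingRatio is hypothetical and the timings in main() are made up, not measurements.

```java
// Hypothetical helper for a doubling experiment.
class DoublingRatio {

    // If T(n) ~ a * n^b, then T(2n) / T(n) approaches 2^b,
    // so b is estimated by the base-2 logarithm of the ratio.
    static double estimateExponent(double timeAtN, double timeAt2N) {
        return Math.log(timeAt2N / timeAtN) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // Made-up timings in seconds, not measurements.
        double b = estimateExponent(0.5, 2.0);
        System.out.println("estimated exponent b = " + b);
    }
}
```

Once $b$ is estimated, the constant $a$ can be recovered by solving $a n^b$ = measured time for one of the larger values of $n$.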

Using the 64-bit memory-cost model from lecture, give the total memory usage in bytes (using tilde notation) that a Percolation object uses to model an $n$-by-$n$ percolation system. Count all memory that is used, including memory for the union–find data structure.

Now, implement the Percolation data type using the weighted quick union algorithm in WeightedQuickUnionUF, and answer the same running-time and memory questions for that implementation.

Deliverables

Submit only Percolation.java and PercolationStats.java. Your Percolation.java should use the weighted quick-union algorithm from the WeightedQuickUnionUF class in the algs4.jar. Your submission may not call library functions except those in StdIn, StdOut, StdRandom, StdStats, WeightedQuickUnionUF, and packages and methods under java.lang like java.lang.Math.sqrt().

Written Assignment 1: Union-Find and Analysis of Algorithms

The goals of this assignment are to test your understanding of the material covered in sections 1.4 and 1.5 of the textbook, and the lecture and recitation materials. You should read the textbook chapters before doing this part of the assignment.

Written homeworks must be typeset in LaTeX and submitted in PDF format.

Q1. Weighted quick-union by height

Develop a UF implementation that uses the same basic strategy as weighted quick-union but keeps track of the tree height and always links the shorter tree to the taller one. Prove a logarithmic upper bound on the height of the trees with N nodes with your algorithm.

Q2. Different uses of the id array in Union Find.

For the following diagram:

1. Give the contents of the id[] for the Quick Union algorithm discussed in class.
2. Give the contents of the id[] for the Quick Find algorithm.

Q3: Analysis of Algorithms

For each of the statements below, say whether it is true or false, and give a one-sentence explanation of your answer.

1. Worst case analysis provides a running time bound that holds for every input of length N.
2. Worst case analysis is usually easier to establish than average case analysis.
3. We retain lower order terms in asymptotic analysis, since we are concerned with getting a very accurate estimate of running time.
4. Constant factors can depend on system architecture, choice of compiler or programming language.
5. To establish the bounds on the class of algorithms that solve a problem, we typically implement an algorithm to establish the lower bound, and rely on a proof to establish the upper bound.
6. Big Oh provides a good estimate of the average running time for an algorithm.
7. Asymptotic analysis is concerned with large values of N and can be inaccurate for small N.
8. If an algorithm has a running time of Θ(N log N), then (say True/False for each of the items below, and explain each):
   a. It is O(N log N).
   b. It is Ω(N log N).
   c. It is optimal.
9. If the lower bound on the class of algorithms that solve a problem is Ω($N^2$), and an algorithm in that class is O($N^2$), then:
   a. The algorithm is Θ($N^2$).
   b. The algorithm is Ω($N^2$).
   c. The algorithm is optimal.
10. If two algorithms have the same running time in terms of Big Oh, then they will have equivalent running times in tilde notation.

Q4: Evaluating the Claims of A Startup Company

Facebook has hired you as a special advisor to Mark Zuckerberg. Congrats - taking CIS 121 really paid off. Mark is considering acquiring a startup in stealth mode. The company claims to have invented a new algorithm that uses polynomial time to solve a problem previously thought to be solvable only in exponential time. The company refuses to release its code because it is worried that the code will be stolen. They will allow you to do black box testing by sending whatever inputs you want to their server. You get back the output, and you can time how long it takes to run. Mark asks you to test it on the double, so you sketch out this chart.

| Input size | Response time |
|---|---|
| 64 | 2.9 |
| 128 | 24 |
| 256 | 188 |
| 512 | 1503 |
| 1024 | 12026 |

1. What is your estimate of the order of growth of the company’s algorithm? How did you arrive at this estimate?