For all assignments, this course has a dedicated cluster of machines, named spec01 through approximately spec40 (the exact number depends on the resources available to CETS). These machines have less port blocking than a typical SEAS machine, so it is up to you to use them responsibly.
Development will be in Java 6, aka JDK 1.6.
We recommend using Subversion, a version control system, to maintain your project code. See here for details.
As a development environment, we recommend Eclipse 3.4 or later (available on the spec cluster). You can get an Eclipse plug-in for Subversion here.
Assignment 1: Web and application servers; thread pools; learning APIs.
You will also want the servlet helper classes, the servlet API jar, and the simple command-line servlet runner (aka TestHarness).
If you use Eclipse, you may need to set the classpath through the Eclipse GUI (in Eclipse 3.x, this is under Window|Preferences, Java, Build Path, User Libraries).
Some useful URLs:
Assignment 2: web crawling, XPath, XQuery.
You will also need to use the following:
Assignment 3: Web services, distributed hash tables.
You will want the following:
Final team project: P2P web crawler and search engine.
We will be using Amazon's Elastic Compute Cloud (EC2) and Simple Storage Service (S3) for the project. See here for details on how to get started with EC2. We will distribute some EC2 credits, which may cover part of the costs. The "Getting Started" guide is here. An excerpt tailored for our class is at this location.
At least one member of your team will need to learn Hadoop MapReduce. You may want to install it locally by following these instructions.