CSE 455:  Internet and Web Systems

Project Description

The project is to build a peer-to-peer information retrieval and categorization system, in some ways a hybrid of the techniques used by today's search engines and those used by databases.

The front-end will consist of an indexing system, a search interface, APIs for exchanging queries and answers with other systems, and facilities for defining views that combine the results of queries with manual insertions and deletions. There will be multiple back-ends: crawlers for email, Amazon product information, the web, and other sources.
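
To make the idea of a view concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the required design: the class names InvertedIndex and View, their methods, and the document ids are all hypothetical, and the index is a toy in-memory structure rather than a peer-to-peer one. It shows a view as a stored query whose results are adjusted by manual insertions and deletions.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory index (hypothetical): lowercase term -> set of doc ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        # Index every whitespace-separated term of the document.
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents that contain every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


class View:
    """A stored query plus manual insertions and deletions (hypothetical)."""

    def __init__(self, index, query):
        self.index = index
        self.query = query
        self.inserted = set()
        self.deleted = set()

    def insert(self, doc_id):
        # A manual insertion overrides an earlier manual deletion.
        self.inserted.add(doc_id)
        self.deleted.discard(doc_id)

    def delete(self, doc_id):
        # A manual deletion overrides an earlier manual insertion.
        self.deleted.add(doc_id)
        self.inserted.discard(doc_id)

    def contents(self):
        # Re-run the query, add manual insertions, drop manual deletions.
        return (self.index.search(self.query) | self.inserted) - self.deleted


# Usage with made-up documents from three hypothetical back-ends.
index = InvertedIndex()
index.add("mail-1", "project meeting on web crawlers")
index.add("web-7", "peer to peer web search")
index.add("amzn-3", "database systems textbook")

view = View(index, "web")
view.insert("amzn-3")   # not matched by the query, added by hand
view.delete("mail-1")   # matched by the query, removed by hand
print(sorted(view.contents()))  # prints ['amzn-3', 'web-7']
```

Because the query is re-evaluated each time contents() is called, documents indexed later appear in the view automatically, while the manual insertions and deletions persist. Whether a real system should recompute or materialize view contents is a design choice this sketch leaves open.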

Architecturally, the system will include the following:

See here for further details.

You can use this link to test your web crawler in a controlled environment.

Project code is due by 1:00 AM on Monday, 4/26/04, and reports are due by 1:30 PM on 5/5/04.