CMSC 471 Course Project

Building an Intelligent Agent for Search and Rescue

Introduction

With the increasing nationwide concern with domestic disasters, there has been a surge in developing rescue mechanisms for victims of these disasters. One major recent effort includes developing autonomous robots that search through a disaster area for victims. These autonomous robots can search in areas that would be hazardous for rescue workers, aiding their search and possibly providing limited treatment to the victims.

Robocup rescue and the NIST Urban Search and Rescue competitions are two of the more popular forums for developing agents to solve this problem. However, the current simulators offered by these groups focus on realistic simulation, which focuses much of the initial development of an intelligent agent on low-level robotic hardware issues, such as driving in a straight line from one location to another. I've developed a 2D simulator that will allow you to develop an intelligent agent to solve the search and rescue problem from a higher-level perspective, allowing you to ignore the low-level details and focus on applying artificial intelligence techniques to the problem.

Overview

In this project, you will develop an intelligent agent for search and rescue in a 2D gridworld simulating the interior of a building struck by disaster. Your intelligent agent's task will be to search the building for victims and carry as many of them as possible to safety while they are still alive in a limited period of time.

Your intelligent agent will control a simulated robot with sophisticated sensors and effectors, including a long-range object recognition system, GPS, short-range medical diagnostic sensors, and accurate navigation. Your agent will be preloaded with the structural plans for the building, but will need to explore the building for victims simultaneously with other competing agents. The disaster may have blocked certain areas of the building, so your agent must be robust to such changes. Additionally, the structural plans of the building include only the floorplan, so your agent should be robust to the location of objects within the building (such as desks and chairs) that may complicate navigation. The robot's long-range object recognition system and GPS will be useful in navigating the disaster-striken building.

Your agent's task is to locate victims and to carry them to one of the building's exits. Your agent is only credited with rescuing live victims, so the robot's short-range medical diagnostic sensors may provide information useful for determining the status of a patient's injuries and predicting how long they will survive. Your agent might use this information to triage the victims and determine the order in which to rescue them.

Although search and rescue is usually a collaborative effort (using multiagent techniques), your intelligent agent will be (indirectly) competing with other rescue agents to rescue the most live victims. In the spirit of the application, your agent should not take any action to hinder another agent's rescue efforts, but may be affected by them. For example, a victim your agent plans on rescuing may have already been rescued by a competing agent. The purpose of this competition is to gauge the effectiveness of each intelligent agent's strategy, not to use adversarial reasoning techniques to thwart another agent's rescue efforts.

Project Requirements

Project teams

You will work in teams of two or three people to develop your intelligent agent. A team of three people is expected to have done 1.5x the work of a two-person team; the project will not become easier for larger teams, although, larger teams may be able to develop more sophisticated solutions.

Some of the project deliverables will be individual efforts and others (most notably the design and coding) will be group efforts.

You may not work alone without my permission. Students who have a strong preference for working alone should talk with me privately about the workload; I will expect a student working individually to produce the same scope and quality project as a two-person team.

AI requirement

Your intelligent agent must use at least two AI techniques/concepts that we are studying this semester. You may (and are encouraged) to incorporate more techniques into your intelligent agent, but you are required to use at least two.

Simply hacking together a controller for the robot that does not use any AI techniques will earn you a low grade, even if it wins the competition. Conversely, a well-designed agent that correctly applies several AI techniques can earn you an A, even if it does very poorly in the competition.

Here are some examples of AI techniques which you may wish to incorporate into your design:

Heuristic search and/or local search
Constraint satisfaction
Adversarial reasoning (except you are prohibited from taking adversarial action against other rescue agents)
Logical reasoning (forward-chaining, resolution theorem proving)
Planning (STRIPS, partial-order, or hierarchical task network)
Inductive machine learning (decision trees, SVMs, naive Bayes, neural networks, etc...)
Reinforcement learning (that is, iteratively learning reactive policies from reward feedback, typically through repeated "self-play")
Bayesian reasoning

This list is not exhaustive. I would prefer that you only consider techniques which we will cover in-class; however, I will consider allowing you to use other AI techniques if you make a strong argument for their use in your project proposal.

In particular, you are prohibited from taking any form of adversarial action against competing rescue agents to deliberately reduce the number of victims competing agents can rescue. For example, your agent can not deliberately attempt to complicate other agents' navigation around the gridworld. Such an approach is against the spirit of the application and will be construed as dishonest behavior in the tournament. You are welcome to use adversarial reasoning against the victims (for example, to pin a moving victim into a corner so you can rescue him or her) or other aspects of the environment, but not against other rescue agents.

Grading

As stated on the syllabus, this project will count for 25% of your final grade. The project grade will be broken down as follows:

Proposal -- 5%
Design -- 5%
Tournament Behavior -- 5% for whether and how well your agent works (following the rules, not crashing, playing fair, returning actions within the allotted time, completing the simulation)
Tournament Performance -- 5% for how well it performs in the actual tournament
Project report and implementation -- 80%

Additionally, I will ask you several times to individually send me a brief summary of how the project is going. These reports are confidential and individual. They will not count toward your grade, but may help me allocate credit within the team. These reports will also alert me to any developing problems, so please be frank and honest. At a minimum, these reports should include: how you are dividing the work, whether you think the balance within the team is fair, and whether there are any problems with the team.

Deliverables and Milestones

Wed 9/26	Project description out
Wed 10/3	Project teams formed
Wed 10/17	Project proposal due (5% of project grade)
Mon 11/5	Project design due (5% of project grade)
Mon 11/12	Tournament dry run #1
Wed 12/5	Tournament dry run #2. Draft report (optional) due.
Wed 12/10	Tournament (5% for behavior, 5% for tournament performance)
Wed 12/19	Project code and final report due (80% of project grade; due at beginning of the final exam)

All written assignments will be due at the start of class in hard copy on the due date. The course lateness policy applies. I expect all assignments to be well-written, clear and organized, with proper grammar and spelling. Proofread your work before you turn it in.

Project teams formed

(Group deliverable.) Your team should turn in a single sheet of paper listing all of the team members. If you need help finding someone else to work with in the class, let me know ASAP.

Project proposal

(Group deliverable) The proposal should be a 1-2 page description of your team's proposed project. Each team should turn in one proposal that you worked on together. It is not binding, and can be changed later. However, if you do change it later, your team should submit another proposal, most likely before your design is due.

Your proposal should include the following information at a minimum: what AI techniques/concepts have you chosen? How do you anticipate using these techniques to solve the problem? What issues do you anticipate with these techniques? What resources have you identified that will help you develop/code these techniques or understand the search and rescue problem? How does your team plan to divide the work? What challenges and difficulties do you anticipate with your proposed project?

You don't need to have a deep understanding of the techniques you plan to use at this point. However, your proposal should demonstrate that you've thought through the problem, done a bit of reading on the techniques you've chosen, and understand how they may apply to your proposed solution.

Project design

(Individual deliverable) Each team member is responsible for turning in their own project design report; although designing the solution, of course, is group work. The designs should be comprehensive and should describe parts of the project that other teams members will work on primarily.

The design should detail the approach your team proposed. It should begin with an overview of the approach (several paragraphs to several pages in length), a diagram of your system architecture, details on each element of the design (especially why and how it will work), a list of the major functions you intend to write, a breakdown of how your team intends to divide the work, and a schedule of internal milestones and deadlines for your team. Overall, this report should be at least several pages in length.

I do expect you to do significant outside reading on the AI techniques you've chosen.

Tournament dry run #1

The first dry run is very simple and will be homework. The goal is for your agent to demonstrate that it can control the robot and perform basic navigation around the gridworld without bumping into walls or objects. I will provide more details closer to the time when it is due.

Tournament dry run #2 and (optional) draft report

For the second dry run, you should have a working intelligent agent. Your agent may still be tweaked for the tournament, but it should be mostly functional, work without crashing, and be able to complete an episode of search and rescue. The idea is that the dry run will be as close as possible to a single episode of the tournament. I will provide more details closer to the deadline.

Additionally, I am giving you the option of submitting a draft of your final report to me for feedback before the final version is due. This is purely optional, but I highly recommend it.

Tournament

The tournament will be in-class and will give you a chance to compete with other teams' intelligent agents. The tournament will use several simulated buildings that you have never before seen, although they will be similar to ones you have seen. More details to come later.

Project code and final report

(Individual deliverable) Your final report should describe your project in detail. Neatness, clarity, organization, and writing skills all count. Be warned -- I expect this report to be well-written. If you have any reservations about this, I strongly encourage you to submit a draft report and to make use of the UMBC writing center in the library.

Here is a suggestion for how you might organize it, although you are free to change this format:

Introduction: a basic overview of the search and rescue problem and your approach to solving it.
Methods and Design: a detailed description of your intelligent agent's design, details on the AI techniques you chose to use in building your agent (including background sections describing the methods in general), details on why you chose these techniques and how you applied these techniques to the problem. (For example, what adaptations, simplifications, or extensions did you need to make to the general technique to get it to work?)
Challenges: a description of the main challenges you faced in developing the agent and how you adapted to overcome them.
Future Work: what would you do if you had more time? How would you change your approach, what would you have done differently, and what extensions (other techniques?) would you include and how would you use them in your agent?
References: a complete list of properly cited references (in APA format) that you used in designing and developing your agent. These should include all sources that you consulted while learning about the AI techniques as well.
Appendix with a hard copy of your code and any data you've gathered in developing the agent. This data might include the performance of your agent competing against itself, against a human, and/or against the course-provided agents. Your appendix might also include graphical depictions of any learned models your agent used in diagnosing the victims.

Your code must also be submitted electronically via UMBC's submit command to cs471 project.

Gridworld Search and Rescue Description

Gridworld

The simulated building lies on a rectangular gridworld with m rows and n columns. Each cell in the grid can be occupied by only one object at a time (with the single exception of your agent carrying another object, as described later).

The gridworld has cardinal directions north, south, east and west and an inherent coordinate system, with the south-west corner having coordinate (0,0). The x-coordinate increases from the origin for cells to the east, and the y-coordinate increases for cells to the north. All coordinates remain fixed throughout the duration of the simulation.

Each cell in the gridworld may have up to four walls, corresponding to each of the four directions. Objects cannot move through the walls and sensors cannot penetrate the walls. Walls cannot be demolished. For simplicity, the building does not contain any doors that require opening.

The agent starts at one of the building's entrances (assigned randomly) and must rescue victims by returning them to any of the building's entrances (for simulated pickup by rescue workers). The gridworld cells at the building's entry points are flagged with special markers, allowing the agent to detect them within long-distance sensor range. The building's outer walls lie inside the gridworld's boundaries, allowing passage outside of the building for the agent to move between entrances, if necessary.

Starting information

At initialization, the agent will be given the size of the gridworld, the location of all walls of the building (the building's "structural diagram"), and the location of all building entry points. Note that the number and location of victims, the location of other agents, and the location of objects within the building are unknown to the agent at this time. However, it is known that no victim is outside the building (otherwise, rescue workers would have picked them up already).

Objects within the gridworld

While the agent knows the building's layout at initialization and can navigate roughly based on it, there may be objects within the gridworld that complicate the navigation paths. For example, a hallway may be blocked by debris, effectively acting as another "wall" in the building. The building will contain an assortment of standard objects (such as furniture, chairs, etc.), some of which can be moved by the agent and others of which cannot. Since only one object can occupy a gridworld cell, these objects will also complicate navigation for the agent. Moreover, since multiple rescue agents are present in the environment simultaneously, another agent may move an object during the simulation, making the location of these objects slightly dynamic.

Figure 1: A screenshot from the visualization tool (version 0.02) with different aspects of the simulated world labeled in black.

The Rescue Agent

Your intelligent agent will control a simulated robot with sophisticated sensors and effectors. In keeping with a level of realism, the sensors cannot penetrate the walls, making the environment partially observable to the intelligent agent. Additionally, the rescue agent's effectors are constrained to a limited number of actions.

Sensors

The rescue agent is equipped with the following sensors:

Long-range object recognition and localization sensors

The long-range sensors cover a rectangular range around the agent. This rectangular range covers four grid cells in all directions, forming a nine-by-nine square with the agent in the center of the square. Any object within this range will be identified by name and its precise coordinate location. Also, any object being carried by that object will be identified. Additionally, any walls within this range will be included. The long-range sensors cannot penetrate walls, so the covered area may be reduced depending on the agent's location and will not always be a complete rectangle.

These sensors are always active and provide information to the agent immediately following every action.

A short-range medical diagnostic sensor array

The short-range sensor array must be activated by the agent and provides information on the victim in the specified adjacent cell (north, south, east, or west only). This sensor characterizes the victim as a feature vector of real-valued data. This feature vector might include such information as the victim's heart rate, systolic blood pressure, diastolic blood pressure, etc., but exact specifications of the sensor data will be forthcoming.

I will make available to you labeled training data that use can use to learn a model to triage the victims and predict how long they will survive. The exact specification of the feature vector will be available when I release this data.

This sensor is activated by the action sense and requires a direction {north, south, east, west} to specify the adjacent cell.

Self-feedback sensors

The self-feedback sensors provide information on the agent itself and any object it is carrying. This information will include the agent's current location, the current simulation time, and the name of the object it is carrying. Additionally, it will provide the status of the last action the agent attempted to execute, such as whether the action succeeded.

These sensors are always active and provide information to the agent immediately following every action.

Effectors and actions

The resue agent is equiped with omni-directional wheels, allowing the agent to move in any of the cardinal directions immediately and a lift capable of carrying an object. These effectors allow the agent to perform the following actions:

Move

The move action provides for basic navigation in the gridworld along the cardinal directions. It requires a direction {north, south, east, west} to specify where to move. The action may fail if it attempts to violate the rules of the gridworld, such as trying to move the agent through a wall or to a cell already occupied by an object.

Pickup

The pickup action allows you to pick up an object in an adjacent cell and carry them with you as you navigate the gridworld. It requires a direction {north, south, east, west} to specify the adjacent cell. The robot is not able to pick up all objects, so the move may fail. Also, the agent can carry only one object at a time. Obviously, the pickup action cannot work through walls.

Dropoff

The dropoff action is the opposite of the pickup action: it drops the object your agent is carrying into the specified adjacent cell. Like pickup, it requires a direction {north, south, east, west} to specify the adjacent cell. The dropoff action cannot work through walls or when another object is occupying the target cell.

Sense

The sense action activates the short-range medical diagnostic sensors on an object in the specified adjacent cell specified by a direction {north, south, east, west}. The feature vector response from the sensor is returned to the agent as the status of this action. Recall that the short-range sensors cannot penetrate walls.

No-Op

Additionally, there is a noop action available to the agent, which specifies that the agent will perform no action this timestep. Trying to execute an invalid action (one that doesn't exist) will default to a noop.

Victims

Like victims in the real-world, the simulated victims each behave differently. They are autonomous and move around the environment. Less-injured victims may move around quite a bit.

Also like victims in the real-world, the simulated victims may die. They may be dead at the start of the simulation or may die during the simulation. The intelligent agent has access to victim vital signs and diagnostic information through its short range sensors; this data might be useful in triaging the victim and predicting how long the victim will live.

Victims in general "want" to be rescued, and will not move if they see a rescue agent. However, you may observe some victims exhibiting "scared" behavior and running away from the rescue agents. Such behaviors are not built into the simulation, but may be emergent. We will touch on emergent behaviors later in the semester.

The Simulation

Initialization

Upon registration with the simulation, the agent will be given the initial information about the gridworld (including the gridworld size, wall locations, and the building exits). The agent will be given 1 minute to process this information before the simulation begins.

Simulation time

The simulation will run for a certain number of timesteps and then total the score for each agent. Each action is considered to take one timestep.

I have not yet determined the number of timesteps we will use in the tournament (it might vary between rounds), but it will be large. Perhaps 250 - 500 timesteps or more, depending on the size and complexity of the gridworld. We may have to reduce this if we find that the tournament episodes are taking too long.

The perception-response cycle

After initialization, at each simulation timestep, the agent will be presented with its current perception of the world. This perception will include data from the long-range sensors, and self-feedback. If the short-range sensors were activated the previous timestep, the result from those sensors will be available via the self-feedback. Once the agent is presented with the perception, it will have a limited time (one minute) in which to respond with an action that the robot should execute in the current timestep.

The simulation will proceed as soon as all agents have responded with their actions for the current timestep. However, if an agent does not respond back with an action within the limited time, that agent's action for the current timestep will default to a noop. Even if the agent chooses to execute a noop action, it should respond and explicitly execute that action so the simulation can proceed, rather than waiting for the timer to expire.

Obviously, if the simulation runs for a large number of timesteps and each agent takes the maximum amount of time to respond with the next action, the simulation will run for a very long time. Please keep this in mind and consider that your agent should return a new move within a few seconds under most circumstances.

Conflicting actions

It may be the case that multiple rescue agents choose to execute actions which conflict. For example, two agents may try to move into the same cell at the same time. In such a case, one agent's action will succeed and the other agents' conflicting actions will fail; this choice will be determined randomly.

Actions from a non-agent (such as a victim) will never conflict with any action from an agent.

Scoring

Each agent will be credited with one point for each live victim it delivers to a building exit. Dead victims are not worth any points, nor are victims that rescue themselves by wandering to an exit.

Results

At the end of each search and rescue episode (after the allotted number of timesteps have elapsed), the agent will be informed of its score. This feedback might be useful to reinforcement learning schemes or other AI techniques which involve performance feedback.

Implementation

Your implementation of the intelligent agent will be in Lisp and must run on the UMBC GL systems. I will provide the simulator, a real-time visualization tool, and a Lisp client interface to the simulator. Additionally, I may provide various utility functions, data structures, tools, and gridworld maps which you may find useful in your implementation.

As with the rest of the course, any code you write for this project must be your own. There is one exception to this as described below in the Policy on using code libraries. Of course, you are permitted to share code within your team.

Linking your intelligent agent to the simulator will be a simple procedure. You will need to override three functions:

initialize(x-dimension y-dimension walls exits) This function will be called upon registering the agent with the simulator informing your agent of the gridworld dimensions, a list of the walls, and the location of the building exits.

choose-action(perceptions) This function will be called during each perception-response cycle. It will take in a list of perceptions and should returning a valid action.

inform-results(score) This function will be called at the end of the simulation to inform the agent of its score. You do not have to override this function if your agent does not need performance feedback.

Details on each of these functions and how to implement them will be available at the release of the simulator, on or before Friday, October 12th. Please note that this project is new, and as such, there may be minor changes between the project description and the release of the simulator. However, I expect those changes to be minor and to have a minimal effect on your design.

Your intelligent agent must follow the interaction protocols documented in this project description and with the simulator release. In particular, your intelligent agent is prohibited from interacting with the server outside of the perscribed methods. Similarly, you are prohibited from reverse-engineering any of the simulation or visualization code for any reason (including and especially in an attempt to discover the inner workings of the simulation server or visualization tool).

Policy on using code libraries

There are many code libraries for AI techniques freely available on the internet. However, the purpose of this project is to give you experience with developing and applying the AI techniques we study in-class. For this reason, your implementation should not use any AI code libraries without my prior approval.

There is an exception, however. Later in this semester, we will be using the Weka machine learning toolkit when we study machine learning. You are welcome to use the Weka machine learning toolkit in developing your project. Be warned that Weka is in Java, so it won't be very useful to you as a code library. However, you may decide to use it to train a machine learning model for triaging and diagnosing the victims based on their feature vector description from the short-range sensors. You could then take this learned model and implement it yourself in Lisp for use by your agent.

If your team decides that it absolutely must use another code library that I haven't mentioned here, your team must meet with me in-person and present a very compelling argument for using it before I will allow you to use it. Make sure you do this before you start using the library in your code, for you own sake.

Utility functions

Your team can decide to share utility functions that other teams may find useful. Such utilities should be e-mailed to me for review, and then I will post them to the course project website for other teams to use. Such utility functions must be well-documented and correct.

Your team can earn up to five extra-credit percentage points on the project by posting such utility functions, which is a nice incentive to do it. However, by posting such utilities, you may give up your strategy to other teams.