The SUBTLE team brings together researchers with expertise in a wide range of disciplines: computational linguistics, including formal language theory, computational semantics, and parsing; syntax, semantics, and pragmatics within linguistic theory; probabilistic modeling and machine learning; robotics; and human-robot interaction (HRI). Our methodology will be strongly empirical. Thus, methodologies and research in HRI form the backdrop of the research plan, and the collection of a significant experimental corpus will be among our deliverables. At the heart of our proposal is fundamental new research, extending previous work by each of us, to develop methods for constructing a computationally tractable end-to-end system for a habitable subset of English. Such a system takes us from utterances all the way to their understanding, including both a formal representation of the implicit meaning of utterances and the generation of control programs for a robot platform, here an iRobot ATRV-JR. In parallel, we will develop a virtual simulation of the USAR environment so that inexpensive large-scale corpus collection can proceed during many stages of system development.

The initial linguistic steps compute a syntactic structure, which leads to a semantic representation; the final interpretation is reached through pragmatic enrichment. At every stage, multiple interpretations are possible and likely. New machine learning techniques, which we will develop and which also form a backdrop to the component pipeline, will allow us to manage these ambiguities in a tractable way, so that only the intended interpretation survives. If more than one interpretation survives, the system should be able to request clarification. Finally, we will extend and combine aspects of previous research to move from linguistically oriented logical representations of meaning to sets of constraints for robot controllers, and then on to the controllers themselves.
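
As a purely illustrative sketch of this staged design, and not the proposed system itself, the following Python fragment chains three stages, keeps only the best-scoring hypotheses after each one, and asks for clarification when more than one interpretation remains close to the best. Every name here (`Hypothesis`, `run_pipeline`, the toy stages, and the `beam` and `margin` parameters) is a hypothetical stand-in, and the scores are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Hypothesis:
    content: str   # stand-in for a parse, a logical form, or an enriched meaning
    score: float   # hypothetical model confidence, higher is better


# A stage maps one hypothesis to zero or more scored successors.
Stage = Callable[[Hypothesis], List[Hypothesis]]


def run_pipeline(utterance: str, stages: List[Stage],
                 beam: int = 3, margin: float = 0.1) -> Tuple[str, List[Hypothesis]]:
    """Run the stages in order, pruning to the `beam` best hypotheses after each."""
    hyps = [Hypothesis(utterance, 1.0)]
    for stage in stages:
        expanded = [h2 for h in hyps for h2 in stage(h)]
        expanded.sort(key=lambda h: h.score, reverse=True)
        hyps = expanded[:beam]
    if not hyps:
        return ("fail", [])
    best = hyps[0].score
    # Hypotheses within `margin` of the best score count as surviving.
    survivors = [h for h in hyps if best - h.score <= margin]
    return ("interpret" if len(survivors) == 1 else "clarify", survivors)


# Toy stages with invented scores (assumptions for illustration only).
def toy_syntax(h: Hypothesis) -> List[Hypothesis]:
    return [Hypothesis(h.content + "|attach-high", h.score * 0.6),
            Hypothesis(h.content + "|attach-low", h.score * 0.4)]


def toy_semantics(h: Hypothesis) -> List[Hypothesis]:
    return [Hypothesis(h.content + "|lf", h.score)]


def toy_pragmatics(h: Hypothesis) -> List[Hypothesis]:
    weight = 0.9 if "attach-high" in h.content else 0.2
    return [Hypothesis(h.content + "|enriched", h.score * weight)]


def flat_pragmatics(h: Hypothesis) -> List[Hypothesis]:
    # A pragmatic stage that fails to discriminate between the readings.
    return [Hypothesis(h.content + "|enriched", h.score * 0.9)]


if __name__ == "__main__":
    text = "go to the room with the ladder"
    print(run_pipeline(text, [toy_syntax, toy_semantics, toy_pragmatics]))
    print(run_pipeline(text, [toy_syntax, toy_semantics, flat_pragmatics], margin=0.2))
```

In this toy setting, the first call leaves a single surviving reading, while the second leaves two readings within the margin and therefore signals that a clarification request is needed. A fixed beam and score margin are only one simple way to keep the search tractable; the learning techniques proposed here are intended to replace such hand-set choices.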

Such goals have been conceived before, and some have been partially implemented. However, the resulting systems have not proved scalable. What makes this goal achievable now is the following set of developments: (1) the development of mathematical (formal and logical) systems for syntax, semantics, and pragmatics (including some aspects of discourse); (2) the application of new machine learning methods for structured data to statistical language processing over raw and annotated corpora, including methods capable of handling a vast array of features in a computationally tractable manner; (3) the emergence of HRI as a viable subfield; and (4) advances in robot control.