Fluent and effective communication between warfighters is imperative for conveying orders and intentions and for ensuring adequate situational awareness. Members of the same unit must continually exchange information as the environment changes. As autonomous bots move onto the battlefield, we must ensure that they can participate in these complex linguistic interactions. This almost necessarily means equipping them with natural language capabilities. A soldier whose native language is English can communicate a rich range of information and intentions in English with little appreciable increase in cognitive load, even under high stress. We cannot ask the soldier to compromise these abilities for the sake of our bots. We must strive to bring our bots to the soldier’s level.
Consider the following exchange between firefighters in a hostile urban setting, the 1999 Worcester Cold Storage and Warehouse Fire, under high stress and time pressure because they were lost and rapidly running out of air (these firefighters lost their lives before they could be rescued):
I’m on the second floor. Come up that flight of stairs and you go through a freezer chest, and I’m in the back where another freezer chest is. That’s where you’ll find it.
The second firefighter answers the question literally asked, but then appears to give a command that is unrelated to the question. But the implicit question the first speaker asks, what linguists call the question under discussion or QUD, is how he and his team can find the second speaker. What appears to be a command by the second speaker is a set of directions that answers the QUD. Such implicit meaning is fundamental to human language use. Effective communication with autonomous bots in real-time, high-stress situations requires that bots understand not only what is literally said, but also what is intended. In short, we must move from robust sentence processing, the current state of the art, to robust utterance understanding.
To achieve these goals, the communication system must exploit broad context. Natural language utterances in isolation are highly ambiguous, but experiments have shown that humans are rarely aware of this fact. Ambiguities pass unnoticed because humans bring contextual information to bear on the task of utterance understanding, and this information disambiguates most (though not all) utterances. Thus, if a bot (or, for that matter, any computational system) is going to cope with the realities of human linguistic communication, it will need to be situated in its environment, pooling information from its surroundings and using that information to determine not only which sentences its interlocutor used, but also what those sentences were intended to communicate. It must also interact with its environment as a result of language use. The understanding of intentions must therefore lead to action: a physically situated bot (as opposed to an agent in a purely informational universe) must translate utterance meanings first into specific plans to move through and interact with the environment, and ultimately into signals that drive appropriate motors and activate appropriate information-gathering sensors.
Painful experience has shown that if a very rich interface language is to be habitable, i.e., easily learned and fluently used without constant reference to manuals, it must be a subset of natural language and, if a proper subset, one carefully and empirically tailored for habitability. To date, the only way to assure this is empirical: the design methodology must (1) collect data about the usability of proposed subsets, (2) modify the design as needed, and (3) iterate this process as necessary. Because the language design must evolve easily, there are strong practical advantages if the underlying linguistic specification incorporates formal models of language: of syntactic structure, of the structure of explicit meanings (semantics), and of the enrichment of explicit meanings with additional implicit meaning (pragmatics). These formal models must be rich enough to cover the range of phenomena that actually occur in natural languages, yet they must admit computationally efficient, automatic analysis methods so that we can study the habitability and effectiveness of the resulting computational artifacts.
We propose to undertake fundamental research to develop a language understanding capability of the kind described above for a bot functioning in a real-world environment, focusing on the task of Urban Search and Rescue (USAR), which can be tested by civilians and yet is a subset of capabilities required by the military. The structure of this capability, and the research team we propose to bring to bear to develop it, is shown in figure 1.
The SUBTLE team brings together researchers with expertise across the wide range of disciplines this pipeline of components requires: computational linguistics, including formal language theory, computational semantics, and parsing; syntax, semantics, and pragmatics within linguistic theory; probabilistic modeling and machine learning; robotics; and human–robot interaction (HRI). Our methodology will be significantly empirical. Thus, methodologies and research in HRI form the backdrop of the research plan, and the collection of a significant experimental corpus will be among our deliverables. At the heart of our proposal is fundamental new research (extending previous work by each of us) to develop methods for constructing a computationally tractable end-to-end system for a habitable subset of English, one that takes us from utterances all the way to understanding them, including both a formal representation of the implicit meaning of utterances and the generation of control programs for a robot platform, here an iRobot ATRV-JR. In parallel, we will develop a virtual simulation of the USAR environment to enable inexpensive large-scale corpus collection during many stages of system development.
As figure 1 shows, initial linguistic steps involve computations of syntactic structure leading to a semantic representation, with the final interpretation achieving pragmatic enrichment. At every stage, multiple interpretations are possible and likely. New machine learning techniques, which we will develop and which also form a backdrop to the component pipeline, will allow us to manage these ambiguities in a tractable way so that only the intended interpretation survives (and if more than one survives the system should be able to request clarification). Finally, we will extend and combine aspects of previous research to move from linguistically oriented logical representations of meaning to sets of constraints for robot controllers, and then on to the controllers themselves.
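The ambiguity-management scheme just described can be sketched as a toy pipeline. Everything here (function names, candidate sets, scores, the threshold) is our own illustrative invention, not part of the proposed system; it shows only the control flow in which context-conditioned scores prune candidate interpretations and a clarification request is raised when more than one survives:

```python
# Hypothetical sketch: each stage maps its input to a set of scored
# candidate interpretations; pragmatic enrichment rescores them against
# context, and ambiguity that survives triggers a clarification request.

def parse(utterance):
    # Candidate syntactic analyses with prior scores (illustrative values).
    return [("parse_A", 0.6), ("parse_B", 0.4)]

def interpret(parses):
    # Map each parse to a candidate semantic representation.
    return [(f"sem({p})", score) for p, score in parses]

def enrich(semantics, context):
    # Pragmatic enrichment: reweight each meaning by contextual fit.
    return [(m, score * context.get(m, 0.1)) for m, score in semantics]

def understand(utterance, context, threshold=0.2):
    candidates = enrich(interpret(parse(utterance)), context)
    survivors = [(m, s) for m, s in candidates if s >= threshold]
    if len(survivors) == 1:
        return survivors[0][0]      # unique intended interpretation
    return "REQUEST_CLARIFICATION"  # ambiguity remains: ask the speaker

# Context strongly favors one enriched meaning (illustrative weights).
context = {"sem(parse_A)": 0.9, "sem(parse_B)": 0.05}
print(understand("come up that flight of stairs", context))  # sem(parse_A)
```

With an uninformative (empty) context, both candidates fall below the threshold and the sketch returns the clarification request instead, mirroring the behavior the pipeline is intended to have.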
Such goals have been conceived before, and some have been partially implemented, but the resulting systems have not proved scalable. What makes this goal achievable now are the following developments: (1) mathematical (formal and logical) systems for syntax, semantics, and pragmatics (including some aspects of discourse); (2) new machine learning methods for structured data that support statistical language processing over raw and annotated corpora, including methods capable of handling a vast array of features in a computationally tractable manner; (3) the emergence of HRI as a viable subfield; and (4) advances in robot control.
We strongly believe that until significant progress towards this central pipeline of understanding is achieved, incorporation of additional information channels such as gesture and intonation is premature. Indeed, because continuous speech transcription technology will advance rapidly during the next decade, given other intensive DoD research programs, it is best to focus on informal text input rather than battle current high error rates for casual speech.
The SUBTLE project will lead to (a) new frameworks and theories for (i) robust automated understanding of linguistic intention and (ii) linking linguistic intention to appropriate robot control, (b) new machine learning algorithms for structured problems such as natural language, (c) a formal computational specification of a significant subset of natural language (English), and (d) a testbed system for investigating HRI using natural language which will ultimately enable military designers to develop powerful communication methods between bots and humans. The project will consist of eight interrelated research tasks. We will maintain a SUBTLE testbed that integrates research results, providing a basis for two integrated demonstrations during the research period. In addition, through workshops and community building activities, we will also seed a community of researchers in this area that will drive research for the coming decades. These tasks are:
- Task ML Machine learning methods for language interpretation using time- and space-efficient inference and learning techniques based on stochastic search (McCallum, Pereira).
- Task HRI Human–robot interaction (Yanco) to discover the aspects of language and robot algorithms that are necessary to support human–bot dialogue.
- Task L1 Syntactic analysis using Tree-Adjoining Grammars (Joshi) and associated efficient discriminative parsing algorithms (Joshi, Marcus, Pereira).
- Task L2 Semantic interpretation of Tree-Adjoining Grammar parse trees (Joshi, Romero), with emphasis on quantificational structures, including attitude predications (need, believe).
- Task L3 Pragmatic enrichment with a decision-theoretic formalization of Gricean reasoning (Potts), using the same probabilistic foundations as the rest of the project.
- Task ACT1 From propositions to movement in the Parameterized Action Representation (PAR) formalism (Badler, Joshi) to bridge the gap between semantics and agent control.
- Task ACT2 A new formalism for mathematically formulating specifications for robot motion planning using linear temporal logic and computation tree logic (Pappas).
- Task DATA Corpus collection (i) to begin to measure language entropy, (ii) to test the adequacy of our language coverage, and (iii) to provide data to train statistical models for use at each level of language analysis (Marcus, Badler).
- Activity HBC Creating an interdisciplinary community of researchers in Human–Bot Communication (HBC) to bridge the areas of robotics, human–robot interaction, and computational linguistics (Yanco, Marcus, and others).
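As an illustration of the kind of specification Task ACT2 envisions (this formula is our own illustrative example, not drawn from the task description), a reach-avoid instruction such as "get to the victim's room without entering hazard areas" can be written in linear temporal logic as

$$\varphi \;=\; \Diamond\,\mathit{victim\_room} \;\wedge\; \Box\,\neg\mathit{hazard},$$

where $\Diamond$ ("eventually") requires that the robot reach the region labeled $\mathit{victim\_room}$ at some point, and $\Box$ ("always") requires that it never occupy a region labeled $\mathit{hazard}$. A motion planner then synthesizes a controller whose trajectories satisfy $\varphi$, linking the output of semantic and pragmatic analysis to robot control.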