Computer and Information Science (CIS)
Modern approaches to machine translation, like those used in Google's online translation system, are data-driven. Statistical translation models are trained on bilingual parallel texts, which consist of sentences in one language paired with their translations into another language. Translation dictionaries and their associated probabilities are extracted from human-translated parallel texts and then serve as the basic building blocks of automatic translation systems. These data-driven methods have proven extremely successful for languages that have large bitexts. Chris's research focuses on extending these methods to a much wider range of the world's languages. His work examines: (1) improving the underlying translation models by incorporating rich syntactic models, (2) using crowdsourcing to translate large volumes of text at low cost, achieving professional-level translation quality with non-professional translators, and (3) exploring new techniques for learning translation without bilingual training data, relying instead on distributional properties across languages that can be observed in large collections of monolingual texts.

Chris has also made contributions to the classic AI problem of understanding language. His approach to natural language understanding uses the data and methods of translation. In particular, he has shown that paraphrases and other meaning-preserving English transformations can be learned automatically from bilingual data. He and his students are exploring how paraphrases can be used to recognize that two sentences share the same meaning even when they have no words in common, as in "Riots in Denmark were sparked by 12 editorial cartoons that were offensive to Muhammad" and "Twelve illustrations insulting the prophet caused unrest".
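As a rough illustration of the two ideas above (estimating translation probabilities from parallel text, and learning paraphrases from bilingual data), the sketch below uses relative-frequency counts and a simple "pivot" through the foreign language. It is a minimal sketch under stated assumptions, not a description of any particular system: the function names, the pivoting formulation as written here, and the four English-German word pairs are all invented for illustration, and real systems work with automatically aligned phrases over millions of sentence pairs.

```python
from collections import defaultdict

def translation_probs(aligned_pairs):
    """Relative-frequency estimate of p(target | source) from aligned pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for src, tgt in aligned_pairs:
        counts[src][tgt] += 1
    probs = {}
    for src, tgts in counts.items():
        total = sum(tgts.values())
        probs[src] = {t: c / total for t, c in tgts.items()}
    return probs

def pivot_paraphrases(phrase, e2f, f2e):
    """Score English paraphrases of `phrase` by pivoting through foreign phrases:
    score(e2) = sum over f of p(f | phrase) * p(e2 | f)."""
    scores = defaultdict(float)
    for f, p_f in e2f.get(phrase, {}).items():
        for e2, p_e2 in f2e.get(f, {}).items():
            if e2 != phrase:  # a phrase is not its own paraphrase
                scores[e2] += p_f * p_e2
    return dict(scores)

# Toy aligned (English, German) pairs, invented for illustration only.
pairs = [
    ("riots", "Unruhen"),
    ("riots", "Krawalle"),
    ("unrest", "Unruhen"),
    ("unrest", "Unruhen"),
]

e2f = translation_probs(pairs)                       # p(German | English)
f2e = translation_probs([(f, e) for e, f in pairs])  # p(English | German)

print(pivot_paraphrases("riots", e2f, f2e))
```

In this toy data, "riots" and "unrest" are linked only because both are aligned to the same German word, which is the intuition behind using bilingual data to find English expressions with the same meaning.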
Affiliations: Chair - North American chapter of the Association for Computational Linguistics (NAACL), Editorial Board - Transactions of the Association for Computational Linguistics (TACL)
PhD 2008 - University of Edinburgh
MS 2002 - University of Edinburgh
BS 2000 - Stanford University