(Last updated October 2016)
I teach three courses at Penn. Two are courses related to my research, and one is an introductory undergraduate course. My teaching philosophy is that students are best engaged through hands-on work. This idea is borne out in the project-based design of my classes, and in my mentorship of undergraduates and master’s students through research projects. Here, I’ll describe my courses, my work with undergraduates and master’s students, and my personal efforts at increasing the number of women in computer science.
I have been working in the field of machine translation since 2000, when statistical data-driven approaches became the dominant paradigm. Statistical machine translation emerged as the clear winner of more than a decade’s worth of DARPA bakeoffs, and it underlies commercial translation services like Google’s online translation platform and Skype’s speech-to-speech translation. I co-designed a course at Johns Hopkins University with my colleagues Adam Lopez and Matt Post to teach the fundamental techniques that underlie statistical machine translation to graduate students and advanced undergrads. In addition to a set of 20+ full lectures, we created an innovative set of hands-on projects. We published a journal article about the course projects (Lopez et al., 2013). Here is the abstract of the article:
Machine translation (MT) draws from several different disciplines, making it a complex subject to teach. There are excellent pedagogical texts, but problems in MT and current algorithms for solving them are best learned by doing. As a centerpiece of our MT course, we devised a series of open-ended challenges for students in which the goal was to improve performance on carefully constrained instances of four key MT tasks: alignment, decoding, evaluation, and reranking. Students brought a diverse set of techniques to the problems, including some novel solutions which performed remarkably well. A surprising and exciting outcome was that student solutions or their combinations fared competitively on some tasks, demonstrating that even newcomers to the field can help improve the state-of-the-art on hard NLP problems while simultaneously learning a great deal. The problems, baseline code, and results are freely available.
All of the projects represent contemporary research topics within the field of machine translation. They differ from problem sets in that they are open-ended assignments, and none of them has a “correct” solution. However, they all have objective measures of how good the students’ solutions are. We automatically score the solutions and maintain a public leaderboard that shows which students currently have the best solution. This encourages friendly competition among the students, and sometimes drives them to create novel solutions that approach the state-of-the-art. Our course materials and lectures are all freely available, and they have been used to teach courses at several other universities (Carnegie Mellon University, Johns Hopkins University, University of Illinois at Urbana-Champaign, Simon Fraser University, and the University of Edinburgh). Professors at these universities have contributed improvements to the class. We advertise our courses collectively at http://mt-class.org/.
I helped to popularize the use of crowdsourcing within the field of natural language processing (NLP). Through my grants, I have spent approximately $250,000 to create new data sets for NLP that would have cost millions using previous techniques. Crowdsourcing has transformed the way that I approach my research. Instead of taking an existing data set and thinking “How can I build a model that performs better on this problem?”, I now think “What problems do I actually want to solve, and what sort of data do I need to solve them?” This opens new horizons in research, and allows creative solutions to new problems.
My interest in crowdsourcing has expanded beyond NLP. I created a new course on Crowdsourcing and Human Computation. Here is the course description:
Crowdsourcing and human computation are emerging fields that sit squarely at the intersection of economics and computer science. They examine how people can be used to solve complex tasks that are currently beyond the capabilities of artificial intelligence algorithms. Online marketplaces like Mechanical Turk and CrowdFlower provide an infrastructure that allows micropayments to be given to people in return for completing human intelligence tasks. This opens up previously unthinkable possibilities like people being used as function calls in software. We will investigate how crowdsourcing can be used for computer science applications like machine learning, next-generation interfaces, and data mining. Beyond these computer science aspects, we will also delve into topics like prediction markets, the sharing economy, how businesses capitalize on collective intelligence, and the fundamental principles that underlie democracy and other group decision-making processes.
I designed the course to appeal to students in Penn’s interdisciplinary major Networks and Social Systems Engineering (“NETS” for short). NETS students bring computer science fundamentals to bear on a variety of problems in other disciplines. Among other things, my course shows how computer science can be used to do things like empower epidemiologists and enable data-driven public policy.
In the homework assignments, we build a structured database about all reported incidents of gun violence in the United States. This idea came through a collaboration with Doug Wiebe, a professor in the Department of Biostatistics and Epidemiology in Penn’s School of Medicine, who studies gun violence from a public health perspective. Congress has blocked the CDC and NIH from conducting research on this topic. The homework assignments in NETS 213 combine to create exactly the sort of gun violence database that partisan congressional action has sought to block. First, students use machine learning to train a text classifier to predict whether an article describes an incident of gun violence or not. They apply it to more than 2 million web pages harvested from over 2,000 local newspapers around the country. Next, they have crowd workers validate the predictions of the classifier. The students learn how to use Mechanical Turk and how to perform quality control on contributions from anonymous crowd workers. They then build an interface that allows crowd workers to extract structured information from the gun violence articles (including things like the location of the shooting, demographic information of the shooter and the victim, and details about the circumstances like whether alcohol was involved, or if it was an incident of domestic violence). Finally, they create visualizations to analyze the data that they created.
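To give a flavor of the quality-control step, here is a minimal sketch in Python. It assumes that each article is judged by several crowd workers and that redundant judgments are combined by simple majority vote; this is one common approach, not necessarily the method used in the course. The function names, the `(item_id, worker_id, label)` tuple format, and the example data are all hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(judgments):
    """Aggregate (item_id, worker_id, label) judgments into one label per item.

    Items where the top two labels are tied are mapped to None (unresolved).
    """
    votes = defaultdict(Counter)
    for item_id, _worker_id, label in judgments:
        votes[item_id][label] += 1

    aggregated = {}
    for item_id, counts in votes.items():
        (top_label, top_count), *rest = counts.most_common()
        # A tie for first place means the crowd did not agree on this item.
        if rest and rest[0][1] == top_count:
            aggregated[item_id] = None
        else:
            aggregated[item_id] = top_label
    return aggregated


def worker_agreement(judgments, aggregated):
    """Fraction of each worker's judgments that match the aggregated label.

    Unresolved items are skipped, so they neither help nor hurt a worker.
    """
    matches = Counter()
    totals = Counter()
    for item_id, worker_id, label in judgments:
        consensus = aggregated.get(item_id)
        if consensus is None:
            continue
        totals[worker_id] += 1
        matches[worker_id] += int(label == consensus)
    return {worker: matches[worker] / totals[worker] for worker in totals}


# Hypothetical judgments: does the article describe gun violence?
judgments = [
    ("article-1", "w1", True), ("article-1", "w2", True), ("article-1", "w3", False),
    ("article-2", "w1", False), ("article-2", "w2", False), ("article-2", "w3", False),
    ("article-3", "w1", True), ("article-3", "w3", False),
]
labels = majority_vote(judgments)
agreement = worker_agreement(judgments, labels)
```

In this sketch, workers whose agreement with the consensus is low could be flagged for manual review, and tied items could be sent out for additional judgments.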
My goal is to engage students by showing them how research can have social impact. So far the strategy has been working. The course enrollment doubled from 25 students in the first year, to 50 in the second. More than 100 students have signed up to take it when I teach it again this Spring.
This year I have started teaching CIS 121, which is Penn’s introduction to data structures and algorithms. This is the third or fourth course that undergraduates take in their computer science major, depending on whether they have had programming experience in high school or not. It’s an incredibly fun course to teach for a lot of reasons. This is the course where the science part of computer science really comes into focus through the analysis of algorithms. Students become much better programmers because they implement and analyze the algorithms and data structures that are the fundamental building blocks of more sophisticated programs. Also, this course is probably the most valuable course that students take to prepare for computer science job interviews.
The course has an enrollment of more than 200 students. I work with a staff of 26 incredible undergraduate TAs. In many ways the TAs have a larger influence on the students’ experience in the course than I do. They run the recitations and give the majority of the office hours in the course. Over the years the TAs have designed the programming assignments as well. In addition to lecturing and writing exams and written homework assignments, my role is really organizational and logistical.
When I teach the course in the future, I hope to help turn it into a conduit for recruiting more students into the major, and to encourage more students from outside of Engineering to pursue a minor in computer science. I would like to make the course accessible to all students by showing the practical value of the course material. I have made efforts to recruit more women to be part of the teaching staff. In my first year teaching the course, only 3 of the 19 staff members were women. Now 10 of the 26 TAs are women, and the 3 head TAs are all women.
I am currently working with 21 undergraduate and master’s students. I supervise these students through independent studies, paid research assistantships, and team project work. My own career was profoundly influenced by working with a professor when I was an undergraduate at Stanford University. It was that experience of doing research that made me decide that grad school was in my future, which in turn led to my career in academia. I have a policy that I will start a research project with any student who is interested in working with me. I formulate a reasonably circumscribed project that should be doable in 1-2 months. At the end of that time, the student and I evaluate whether we should continue, with the idea that either of us can back out without causing offense. The student gets to assess whether NLP research actually fits their interests. I assess whether it is worth it for me to set aside time to work with the student, which I typically do if the student is enthusiastic and productive. I have weekly one-on-one meetings with each of the students, plus a weekly group meeting where the students practice presenting their research in a way that is understandable to others and get advice from one another.
I am proud of my track record of trying to promote women in computer science. At Johns Hopkins University, I was the chair of the diversity committee for the CS department. I helped to start a Women in Computer Science (WiCS) group. The goals of WiCS were to foster a sense of community and to improve retention of women in our undergraduate program. We offered undergraduates mentorship from female graduate students and from faculty (like myself), introduced the students to research opportunities, offered them advice on applying for graduate programs and jobs post-graduation, and wrote letters of recommendation for NSF Fellowships and grad school. I mentored 6 undergraduates at JHU through WiCS. One of them, Ellie Pavlick, went on to apply to the PhD program at Penn. She has become one of my strongest PhD students.
Of the PhD students, postdocs, and visiting scholars whom I am currently advising at JHU and Penn, 6 of 9 are women. At Penn, 5 of 6 of them are women. Of the undergraduates and master’s students whom I am now mentoring, 13 of 21 are women. I hope to continue improving the gender diversity of the computer science department here, and I feel that the best way of doing so is to engage women in research.
Learning to translate with products of novices: a suite of open-ended challenge problems for teaching MT.
Adam Lopez, Matt Post, Chris Callison-Burch, Jonathan Weese, Juri Ganitkevitch, Narges Ahmidi, Olivia Buzek, Leah Hanson, Beenish Jamil, Matthias Lee, Ya-Ting Lin, Henry Pao, Fatima Rivera, Leili Shahriyari, Debu Sinha, Adam Teichert, Stephen Wampler, Michael Weinberger, Daguang Xu, Lin Yang, and Shang Zhao.