CIS 620 - Learning in Few-Labels Settings

Spring 2021, University of Pennsylvania

Dan Roth

Course Description

Machine Learning works when we have a lot of labeled data. However, in many realistic settings we do not have enough training data. In most cases this is due to semantic shift (a shift in the label space Y) or domain shift (where the domain X of the target differs from the domain for which we have training data), but it can also be due to the complexity and compositionality of the task. Some examples of these settings are:

Variable label space: imagine that you classify documents into a set of topical labels and then want to use a different label space, or that you classify entities into their semantic types and then want to update the set of semantic types used.
Domain adaptation: you train a model on news data but want to use it on email data, where you don’t have training data.
Low Resource Languages: only 30 languages (out of around 3,500 written languages) have annotated data for basic tasks such as named entity recognition. How can we develop basic NLP tools for other languages?
Complex tasks: many natural language understanding decisions are “one-in-a-million” – they are very sparse; how can we learn models for these?

And, of course, similar challenges exist in computer vision and other subareas of AI.

The goal of this class is to define and understand the space of Learning in Low Labels Settings: to understand the problems and the methods that have been studied for these settings. We will do this mostly in the context of natural language understanding, with possibly some digressions into computer vision.

We will consider methods such as

few-/zero-shot settings
semi-supervised and transductive learning
self-supervised learning
the use of incidental supervision signals
transfer learning
adaptation methods

We will study these methods in the context of multiple tasks.

You will read, present, and discuss papers, and work on two projects: a small, well-defined one in the first third of the semester, and a large, open-ended one for the rest of the semester.

Important Dates

Date	Event
Feb 15, 2021	First Critical Survey Due
Mar 8, 2021	Second Critical Survey Due
Mar 15, 2021	Project 1 Paper Submission Deadline and Presentation
Mar 22, 2021	Project 2 Proposal Due
Mar 29, 2021	Third Critical Survey Due
Apr 5, 2021	Project 2 Progress Report and Brief Presentation
Apr 19, 2021	Fourth Critical Survey Due
Apr 26, 2021	Project 2 Final Presentation
May 5, 2021	Project 2 Due

Pre-requisites

Machine Learning: CIS 419/519/520 or an equivalent class. NLP: knowledge of NLP equivalent to a basic Computational Linguistics/NLP class.
