Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
ML/DM
NLP/CL
Paramveer Dhillon, Dean Foster and Lyle Ungar. JMLR (Journal of Machine Learning Research 12), Feb. 2011 (Avg. Impact Factor: ~3-5)
We propose a framework, MIC (Multiple Inclusion Criterion), for learning sparse models based on the information-theoretic Minimum Description Length (MDL) principle. MIC provides an elegant way of incorporating arbitrary sparsity patterns in the feature space by using two-part MDL coding schemes. We present MIC-based models for the problems of grouped feature selection (MIC-Group) and multi-task feature selection (MIC-Multi). MIC-Group assumes that the features are divided into groups and induces two-level sparsity, selecting a subset of
the feature groups, and also selecting features within each selected group.
MIC-Multi applies when there are multiple related tasks
that share the same set of potentially predictive features. It also
induces two-level sparsity, selecting a subset of the features, and
then selecting which of the tasks each feature should be added to. Lastly, we
propose a model, TransFeat, that can be used to transfer knowledge from a set of
previously learned tasks to a new task that is expected to share
similar features. All three methods are designed for selecting
a small set of predictive features from a large pool of candidate
features. We demonstrate the effectiveness of our approach with
experimental results on data from genomics and from word sense disambiguation problems.
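As a rough illustration of the two-part coding idea behind MIC-Group, the sketch below compares total description lengths in bits: a model cost for naming the selected groups and the features inside them, plus a data cost for the residuals given the model. The specific coding choices (log2 of the number of groups to name a group, log2 of the group size to name a feature, a flat 2 bits per coefficient, and a Gaussian residual code with known variance) are assumptions made for illustration and are not taken from the paper; all function names are hypothetical.

```python
import math

def model_bits(selected, group_sizes, coef_bits=2.0):
    """First part of the code: bits needed to say which features are in the model.

    selected:    dict mapping group index -> list of chosen feature indices in that group
    group_sizes: number of candidate features in each group
    coef_bits:   assumed flat cost for coding one coefficient (illustrative only)
    """
    n_groups = len(group_sizes)
    bits = 0.0
    for g, feats in selected.items():
        bits += math.log2(n_groups)                      # name the group once
        bits += len(feats) * math.log2(group_sizes[g])   # name each feature within it
        bits += len(feats) * coef_bits                   # code each coefficient
    return bits

def data_bits(rss, n, sigma2=1.0):
    """Second part of the code: Gaussian code length (in bits) for n residuals
    with residual sum of squares rss and assumed known variance sigma2."""
    nats = 0.5 * n * math.log(2.0 * math.pi * sigma2) + rss / (2.0 * sigma2)
    return nats / math.log(2.0)

def total_bits(selected, group_sizes, rss, n):
    """Two-part description length: model cost plus data cost given the model."""
    return model_bits(selected, group_sizes) + data_bits(rss, n)

# Hypothetical comparison: empty model vs. two features from one group of size 8.
sizes = [4, 8, 16, 8]
empty = total_bits({}, sizes, rss=100.0, n=50)
grouped = total_bits({1: [0, 3]}, sizes, rss=40.0, n=50)
prefer_grouped = grouped < empty
```

Under these assumed costs the group is named once and shared by both features, so adding a second feature from an already selected group is cheaper than opening a new group, which is one way to picture the two-level sparsity described above.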
@article{dhillon11a,
author = {Paramveer S. Dhillon and Dean Foster and Lyle Ungar},
title = {Minimum Description Length Penalization for Group and Multi-Task Sparse Learning},
journal = {Journal of Machine Learning Research (JMLR)},
volume = {12},
month = {February},
year = {2011},
issn = {1532-4435},
pages = {525--564},
numpages = {40},
publisher = {JMLR.org}
}
Partial Sparse Canonical Correlation Analysis (PSCCA) for population studies in Medical Imaging
BMI/NI
Paramveer Dhillon, Brian Avants, Lyle Ungar and James Gee. ISBI (IEEE International Symposium on Biomedical Imaging), Barcelona, Spain, May 2012 (Acceptance Rate: Unknown)
We propose a new multivariate method, partial sparse canonical
correlation analysis (PSCCA), for computing the statistical
comparisons needed by population studies in medical imaging. PSCCA is a
multivariate generalization of linear regression that allows one to
statistically parameterize imaging studies in terms of
multiple views of the population (e.g., the full collection of
measurements taken from an image set along with batteries of cognitive
or genetic data) while controlling for nuisance variables. This paper
develops the theory of PSCCA, provides an algorithm and illustrates
PSCCA performance on both simulated and real datasets. We show, as a
first application and evaluation of this new methodology, that
PSCCA can improve detection power over mass univariate approaches
while retaining the interpretability and biological plausibility of
the estimated effects. We also discuss the strengths, limitations and
future potential of this methodology.
Deterministic Annealing for Semi-Supervised Structured Output Learning
ML/DM
Paramveer Dhillon, Sathiya Keerthi, Olivier Chapelle, Kedar Bellare and S. Sundararajan. AISTATS (International Conference on Artificial Intelligence and Statistics), La Palma, Canary Islands, April 2012 (Acceptance Rate: < 33.5%)
In this paper we propose a new approach for semi-supervised structured output learning. Our approach uses relaxed labeling on unlabeled data to deal with the combinatorial nature of the label space and further uses domain constraints to guide the learning. Since the overall objective is non-convex, we alternate between the optimization of the model parameters and the label distribution of unlabeled data. The alternating optimization coupled with deterministic annealing helps us achieve better local optima and as a result our approach leads to better constraint satisfaction during inference. Experimental results on sequence labeling benchmarks show superior performance of our approach compared to CoDL (Constraint Driven Learning) and PR (Posterior Regularization).
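The relaxed-labeling step with annealing can be pictured with a small sketch. It shows only a temperature-controlled distribution over candidate labels for one unlabeled item: close to uniform at high temperature and approaching a hard assignment as the temperature drops. The scores, the geometric cooling schedule, and the function names are all hypothetical; the domain constraints and the model-parameter update from the paper are omitted.

```python
import math

def relaxed_labels(scores, temperature):
    """Label distribution proportional to exp(score / T), computed stably."""
    m = max(scores)
    weights = [math.exp((s - m) / temperature) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]

def anneal(scores, t_start=10.0, t_end=0.1, rate=0.5):
    """Cool the temperature geometrically and record the label distribution.

    In a full alternating scheme, the model parameters would be re-optimized
    at each temperature using the current distribution; here the scores are
    held fixed to keep the sketch short.
    """
    history = []
    t = t_start
    while t >= t_end:
        history.append((t, relaxed_labels(scores, t)))
        t *= rate
    return history

history = anneal([1.0, 2.0, 0.5])
first_dist = history[0][1]   # high temperature: nearly uniform
last_dist = history[-1][1]   # low temperature: nearly one-hot
```

Starting soft and sharpening gradually is the usual motivation for annealing in non-convex problems: early iterations are not committed to a single labeling of the unlabeled data.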
@inproceedings{dhillon_aistats12,
Author = {Paramveer S. Dhillon and S. Sathiya Keerthi and Kedar Bellare and Olivier Chapelle and S. Sundararajan},
Title = {Deterministic Annealing for Semi-Supervised Structured Output Learning},
Booktitle = {Proceedings of the International Conference on
Artificial Intelligence and Statistics},
Volume = {15},
Year = {2012}
}
Multi-View Learning of Word Embeddings via CCA
Paramveer Dhillon, Dean Foster and Lyle Ungar. NIPS 24 (Advances in Neural Information Processing Systems), Granada, Spain, Dec. 2011 (Acceptance Rate: 21.8%)
Recently, there has been substantial interest in using
large amounts of unlabeled data to learn word representations
which can then be used as features in supervised classifiers for NLP tasks.
However, most current approaches are slow to train, do not
model the context of the word, and lack theoretical grounding. In this
paper, we present a new learning method, Low Rank Multi-View Learning
(LR-MVL) which uses a fast spectral method to estimate
low dimensional context-specific word representations from unlabeled
data. These representation features can then be used with any
supervised learner. LR-MVL is extremely fast, gives guaranteed
convergence to a global optimum, is theoretically elegant, and
achieves state-of-the-art performance on named entity recognition
(NER) and chunking problems.
@inproceedings{dhillon11multiviewcca,
title = {Multi-View Learning of Word Embeddings via CCA},
author = {Paramveer S. Dhillon and Dean Foster and Lyle Ungar},
booktitle = {Advances in Neural Information Processing Systems (NIPS)},
volume={24},
year = {2011}
}
Semi-supervised Multi-task Learning of Structured Prediction Models for Web Information Extraction (Oral Presentation)
ML/DM
Paramveer Dhillon, S. Sundararajan and S. Sathiya Keerthi. CIKM (ACM International Conference on Information and Knowledge Management), Glasgow, U.K., Oct. 2011 (Acceptance Rate (Full Paper): 15.0%)
Extracting information from web pages is an important problem; it has several applications such as providing improved search results and construction of databases to serve user queries. In this paper we propose a novel structured prediction method to address two important aspects of the extraction problem: (1) labeled data is available only for a small number of sites and (2) a machine learned global model does not generalize adequately well across many websites. For this purpose, we propose a weight space based graph regularization method. This method has several advantages. First, it can use unlabeled data to address the limited labeled data problem and falls in the class of graph regularization based semi-supervised learning approaches. Second, to address the generalization inadequacy of a global model, this method builds a local model for each website. Viewing the problem of building a local model for each website as a task, we learn the models for a collection of sites jointly; thus our method can also be seen as a graph regularization based multi-task learning approach. Learning the models jointly with the proposed method is very useful in two ways: (1) learning a local model for a website can be effectively influenced by labeled and unlabeled data from other websites; and (2) even for a website with only unlabeled examples it is possible to learn a decent local model. We demonstrate the efficacy of our method on several real-life datasets; experimental results show that significant performance improvement can be obtained by combining semi-supervised and multi-task learning in a single framework.
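One way to picture graph regularization in weight space is a penalty that pulls the weight vectors of similar websites toward each other. The sketch below assumes a quadratic form, the sum over graph edges of similarity times the squared distance between the two sites' weight vectors. The abstract does not give the exact regularizer or how the site graph is built, so this form, the site names, and the function name are assumptions for illustration only.

```python
def graph_penalty(weights, edges):
    """Weight-space graph regularizer (assumed quadratic form).

    weights: dict mapping site name -> list of model weights for that site
    edges:   list of (site_a, site_b, similarity) tuples
    Returns the sum over edges of similarity * ||w_a - w_b||^2.
    """
    total = 0.0
    for a, b, sim in edges:
        sq_dist = sum((x - y) ** 2 for x, y in zip(weights[a], weights[b]))
        total += sim * sq_dist
    return total

# Hypothetical local models for three sites; only "shop" has labeled data.
site_weights = {
    "shop": [1.0, 0.0],
    "blog": [0.0, 0.0],
    "store": [1.0, 0.0],
}
site_edges = [("shop", "blog", 2.0), ("shop", "store", 1.0)]
penalty = graph_penalty(site_weights, site_edges)
```

Adding such a term to each site's training loss is what would let a site with only unlabeled examples inherit a usable local model from its neighbors in the graph.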
@inproceedings{dhillon11cikm,
author = {Dhillon, Paramveer S. and Sellamanickam, Sundararajan and Selvaraj, Sathiya Keerthi},
title = {Semi-supervised multi-task learning of structured prediction models for web information extraction},
booktitle = {Proceedings of the 20th ACM international conference on Information and knowledge management},
series = {CIKM '11},
year = {2011},
isbn = {978-1-4503-0717-8},
location = {Glasgow, Scotland, UK},
pages = {957--966},
numpages = {10},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {information extraction, multitask learning, semi-supervised learning, structured predictions},
}
A New Approach to Lexical Disambiguation of Arabic Text
NLP/CL
Rushin Shah, Paramveer Dhillon, Mark Liberman, Dean Foster, Mohamed Maamouri and Lyle Ungar. EMNLP (International Conference on Empirical Methods in Natural Language Processing), Cambridge, MA, U.S.A, Oct. 2010 (Acceptance Rate: 25.0%)
We describe a model for the lexical analysis of Arabic text, using the
lists of alternatives supplied by a broad-coverage morphological
analyzer, SAMA, which include stable lemma IDs that correspond to
combinations of broad word sense categories and POS tags.
We break down each of the hundreds of thousands of possible lexical
labels into its constituent elements, including lemma ID and
part-of-speech. Features are computed for each
lexical token based on its local and document-level context and used
in a novel, simple, and highly efficient two-stage supervised
machine learning algorithm that overcomes the extreme sparsity of
label distribution in the training data. The resulting system achieves
accuracy of 90.6\% for its first choice, and 96.2\% for its top two
choices, in selecting among the alternatives provided by the SAMA
lexical analyzer. We have successfully used this system in
applications such as an online reading helper for intermediate
learners of the Arabic language, and a tool for improving the
productivity of Arabic Treebank annotators.
@InProceedings{shah-dhillon_emnlp10,
author = {Rushin Shah and Paramveer S. Dhillon and Mark Liberman and Dean Foster and Mohamed Maamouri and Lyle Ungar},
title = {A New Approach to Lexical Disambiguation of Arabic Text},
booktitle = {Proceedings of the EMNLP 2010 Conference},
month = {Oct.},
year = {2010},
address = {Cambridge, MA, U.S.A},
publisher = {International Conference on Empirical Methods in Natural Language Processing (EMNLP)}
}
Learning Better Data Representation using Inference-Driven Metric Learning (IDML)
NLP/CL
Paramveer Dhillon, Partha Pratim Talukdar and Koby Crammer. ACL (Annual Meeting of the Association of Computational Linguistics), Uppsala, Sweden, July 2010 (Acceptance Rate: 22.0%)
We initiate a study comparing the effectiveness of the transformed spaces learned by recently proposed supervised and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA). Through a variety of experiments on different real-world datasets, we find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.
@InProceedings{dhillon_acl10,
author = {Paramveer S. Dhillon and Partha Pratim Talukdar and Koby Crammer},
title = {Learning Better Data Representation using Inference-Driven Metric Learning (IDML)},
booktitle = {Proceedings of the ACL 2010 Conference},
month = {July},
year = {2010},
address = {Uppsala, Sweden},
publisher = {Association for Computational Linguistics}
}
Feature Selection using Multiple Streams
Paramveer Dhillon, Dean Foster and Lyle Ungar. AISTATS (International Conference on Artificial Intelligence and Statistics), Sardinia, Italy, May 2010 (Acceptance Rate: 40.58%)
Feature selection for supervised learning can be greatly improved by
making use of the fact that features often come in classes. For example, in gene
expression data, the genes which serve as features may be
divided into classes based on their membership in gene families
or pathways. When labeling words with senses for word sense
disambiguation, features fall into classes including adjacent words, their parts of speech,
and the topic and venue of the document the word is in. We present a streamwise feature selection method that allows
dynamic generation and selection of features, while taking advantage of the different feature classes, and the fact that they are
of different sizes and have different (but unknown) fractions of good features.
Experimental results show that our approach provides significant improvement in performance and is
computationally less expensive than comparable ``batch'' methods that do not take advantage of the feature classes
and expect all features to be known in advance.
@inproceedings{dhillon_aistats10,
Author = {Paramveer S. Dhillon and Dean Foster and Lyle Ungar},
Title = {Feature Selection using Multiple Streams},
Booktitle = {Proceedings of the International Conference on
Artificial Intelligence and Statistics},
Volume = {13},
Year = {2010}
}
Transfer Learning, Feature Selection and Word Sense Disambiguation
NLP/CL
Paramveer Dhillon and Lyle Ungar. ACL-IJCNLP (Annual Meeting of the Association of Computational Linguistics), Singapore, Aug. 2009 (Acceptance Rate: 24.6%)
We propose a novel approach for improving Feature Selection for Word
Sense Disambiguation by incorporating a feature relevance prior for
each word indicating which features are more likely to be
selected. We use transfer of knowledge from similar words to
learn this prior over the features, which permits us to learn
higher accuracy models, particularly for the rarer word senses.
Results on the OntoNotes verb data show significant improvement over the baseline
feature selection algorithm and results that are
comparable to or better than other state-of-the-art methods.
@InProceedings{dhillon_acl09,
author = {Dhillon, Paramveer S. and Ungar, Lyle H.},
title = {Transfer Learning, Feature Selection and Word Sense Disambiguation},
booktitle = {Proceedings of the ACL-IJCNLP 2009 Conference Short Papers},
month = {August},
year = {2009},
address = {Suntec, Singapore},
publisher = {Association for Computational Linguistics},
pages = {257--260},
url = {http://www.aclweb.org/anthology/P/P09/P09-2065}
}
Multi-Task Feature Selection using the Multiple Inclusion Criterion (MIC)
ML/DM
Paramveer Dhillon, Brian Tomasik, Dean Foster and Lyle Ungar. ECML-PKDD (European Conference on Machine Learning), Bled, Slovenia, Sept. 2009 (Acceptance Rate: 24.9%)
We address the problem of joint feature selection in multiple related
classification or regression tasks. When doing feature selection
across multiple tasks, usually one can ``borrow strength'' across these
tasks to get a more sensitive criterion for deciding which features to
select. We propose a novel method, the Multiple Inclusion
Criterion (MIC), which can be used in stepwise feature selection to
improve feature selection across multiple related tasks. Our approach allows each feature
to be added to none, some, or all of the tasks. MIC is
most beneficial for selecting a small set of predictive features from
a large pool of features, as is common in genomic and biological
datasets. Experimental results on such datasets show
that MIC usually outperforms other competing multi-task learning
methods not only in terms of accuracy but also by building simpler
and more interpretable models.
@inproceedings{dhillon_ecml09,
author = {Paramveer S. Dhillon and Brian Tomasik and Dean Foster and Lyle Ungar},
title = {Multi-Task Feature Selection Using The Multiple Inclusion Criterion (MIC)},
booktitle = {European Conference on Machine Learning (ECML)-PKDD},
publisher = {Springer},
series = {Lecture Notes in Computer Science},
month = {September},
year = {2009},
city = {Bled},
country = {Slovenia}
}
Efficient Feature Selection in the Presence of Multiple Feature Classes
ML/DM
Paramveer Dhillon, Dean Foster and Lyle Ungar. ICDM (IEEE International Conference on Data Mining), Pisa, Italy, December 2008 (Acceptance Rate: 19.9%)
We present an information theoretic approach to feature selection when
the data possesses feature classes. Feature classes are pervasive in
real data. For example, in gene expression data, the genes which serve
as features may be divided into classes based on their membership in
gene families or pathways. When doing word sense disambiguation or
named entity extraction, features fall into classes including adjacent
words, their parts of speech, and the topic and venue of the document
the word is in. When predictive features occur predominantly in a
small number of feature classes, our information theoretic approach
significantly improves feature selection. Experiments on real and
synthetic data demonstrate substantial improvement in predictive
accuracy over the standard $\ell_0$ penalty-based stepwise and streamwise
feature selection methods as well as over Lasso and Elastic Nets, all
of which are oblivious to the existence of feature classes.
@inproceedings{dhillonICDM08,
author = {Paramveer S. Dhillon and Dean Foster and Lyle H. Ungar},
title = {Efficient Feature Selection in the Presence of Multiple Feature Classes},
booktitle = {ICDM (International Conference on Data Mining)},
year = {2008},
pages = {779--784},
ee = {http://dx.doi.org/10.1109/ICDM.2008.56},
crossref = {DBLP:conf/icdm/2008}
}
Inference Driven Metric Learning for Graph Construction
ML/DM
Paramveer Dhillon, Partha Pratim Talukdar and Koby Crammer. NESCAI (North East Student Symposium on Artificial Intelligence), Amherst, MA, U.S.A, April 2010
Graph-based semi-supervised learning (SSL) methods usually consist of two stages: in the first stage, a graph is constructed from the set of input instances; and in the second stage, the available label information along with the constructed graph is used to assign labels to the unlabeled instances.
Most of the previously proposed graph construction methods are unsupervised in nature, as they ignore the label information already present in the SSL setting in which they operate. In this paper, we explore how available labeled instances can be used to construct a better graph which is tailored to the current classification task. To achieve this goal, we evaluate the effectiveness of various supervised metric learning algorithms during graph construction. Additionally, we propose a new metric learning framework: Inference Driven Metric Learning (IDML), which extends existing supervised metric learning algorithms to exploit widely available unlabeled data during the metric learning step itself. We provide extensive empirical evidence demonstrating that inference over a graph constructed using an IDML-learned metric can lead to a significant reduction in classification error, compared to inference over graphs constructed using existing techniques.
Finally, we demonstrate how active learning can be successfully incorporated within the IDML framework to reduce the amount of supervision necessary during graph construction.
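The first stage described above, building a graph from the input instances, depends heavily on the distance used. The sketch below builds a k-nearest-neighbor graph under a Mahalanobis-style distance d(x, y) = (x - y)^T A (x - y), where A stands in for a learned metric. How IDML actually learns A is not shown; the matrices, points, and function names here are hypothetical and chosen only to show that changing the metric changes the graph.

```python
def mahalanobis_sq(x, y, A):
    """Squared distance (x - y)^T A (x - y) for a square matrix A given as nested lists."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    return sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n))

def knn_graph(points, A, k):
    """Map each point index to the indices of its k nearest neighbors under metric A."""
    graph = {}
    for i, p in enumerate(points):
        dists = sorted(
            (mahalanobis_sq(p, q, A), j) for j, q in enumerate(points) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph

points = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
euclidean = [[1.0, 0.0], [0.0, 1.0]]       # identity matrix: plain Euclidean distance
reweighted = [[0.01, 0.0], [0.0, 100.0]]   # hypothetical metric that stresses the 2nd feature

graph_euclidean = knn_graph(points, euclidean, k=1)
graph_reweighted = knn_graph(points, reweighted, k=1)
```

With the identity matrix, point 0 links to point 1; with the reweighted metric it links to point 2 instead, so label propagation in the second stage would run over a different neighborhood structure.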
@article{dhillon_nescai10,
author = {Paramveer S. Dhillon and Partha Pratim Talukdar and Koby Crammer},
title = {Inference Driven Metric Learning for Graph Construction},
journal ={NESCAI (North East Student Symposium on Artificial Intelligence)},
year = {2010},
address = {Amherst, MA, USA}
}
Combining Appearance and Motion for Human Action Classification in Videos
CV/IP
Paramveer Dhillon, Sebastian Nowozin and Christoph Lampert. International Workshop on Visual Scene Understanding (ViSU) at CVPR 2009, Miami, Florida, U.S.A
An important cue to high level scene understanding is to analyze the objects in the scene and their behavior and interactions. In this paper, we study the problem of classification of activities in videos, as this is an integral component of any scene understanding system, and present a novel approach for recognizing human action categories in videos by combining information from appearance and motion of human body parts. Our approach is based on tracking human body parts by using mixture particle filters and then clustering the particles using local non-parametric clustering, hence associating a local set of particles to each cluster mode. The trajectory of these cluster modes provides the ``motion'' information and the ``appearance'' information is provided by the statistical information about the relative motion of these local sets of particles over a number of frames. Later we use a ``Bag of Words'' model to build one histogram per video sequence from the set of these robust appearance and motion descriptors. These histograms provide us characteristic information which helps us to discriminate among various human actions, which ultimately helps us in better understanding of the complete scene.
We tested our approach on the standard KTH and Weizmann human action datasets and the results were comparable to the state-of-the-art methods. Additionally, our approach is able to distinguish between activities that involve the motion of the complete body from those in which only certain body parts move. In other words, our method discriminates well between activities with ``global body motion'' like running, jogging etc. and ``local motion'' like waving, boxing etc.
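The ``Bag of Words'' step mentioned above can be sketched as follows: each descriptor from a video is assigned to its nearest codeword, and the normalized counts form one histogram per video. The codebook is assumed to be given (for example from clustering training descriptors, which is an assumption and not stated in the abstract), and the descriptors, codewords, and function names are hypothetical.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    best_index, best_dist = 0, float("inf")
    for i, codeword in enumerate(codebook):
        dist = sum((x - y) ** 2 for x, y in zip(descriptor, codeword))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index

def video_histogram(descriptors, codebook):
    """Normalized histogram of codeword assignments for one video sequence."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]

# Hypothetical 2-D descriptors and a two-word codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.0], [1.2, 0.8], [0.0, 0.2]]
hist = video_histogram(descriptors, codebook)
```

Because every video is reduced to a fixed-length histogram regardless of its number of frames, the histograms can be fed directly to a standard classifier.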
@article{ dhillonCVPR-VISU09,
author = {P.S. Dhillon and S. Nowozin and C.H. Lampert},
title = {Combining appearance and motion for human action classification in videos},
journal ={Computer Vision and Pattern Recognition Workshop},
volume = {0},
year = {2009},
isbn = {978-1-4244-3994-2},
pages = {22--29},
doi = {http://doi.ieeecomputersociety.org/10.1109/CVPR.2009.5204237},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
}
Robust Real-Time Face Tracking Using an Active Camera
CV/IP
(Undergrad Research)
Paramveer Dhillon. International Workshop on CISIS (Springer-Lecture Notes in Computer Science (LNCS)), Burgos, Spain
This paper addresses the problem of facial feature detection
and tracking in real-time using a single active camera. The variable parameters of the camera (i.e. pan, tilt and zoom) are changed adaptively
to track the face of the agent in successive frames and detect the facial
features which may be used for facial expression analysis for surveillance
or mesh generation for animation purposes, at a later stage. Our tracking
procedure assumes planar motion of the face. It also detects invalid
feature points i.e. those feature points which do not correspond to actual
facial features, but are outliers. They are subsequently abandoned by our
procedure in order to extract ``high level'' information from the face for
facial mesh generation or emotion recognition which might be helpful for
Video Surveillance purposes. The only limitation on the performance of
the procedure is imposed by the maximum pan/tilt range of the camera.
@inproceedings{dhillon_cisis,
author = {Paramveer S. Dhillon},
title = {Robust Real-Time Face Tracking Using an Active Camera},
booktitle = {Proceedings of 2nd International Workshop on CISIS},
publisher = {Springer},
series = {Advances in Intelligent and Soft Computing , Vol. 63},
month = {September},
year = {2009},
isbn = {978-3-642-04090-0},
country={Spain}
}
Contact Information:
Department of Computer & Information Sciences
University of Pennsylvania
Levine Hall, 3330 Walnut Street
Philadelphia, PA 19104-6106
e-mail: dhillon@cis.upenn.edu