We propose robust methods for estimating camera egomotion in noisy, real-world monocular image sequences in the general case of unknown observer rotation and translation with two views and a small baseline. We introduce the expected residual likelihood method (ERL), which estimates confidence weights for noisy optical flow data using likelihood distributions of the residuals of the flow field under a range of counterfactual model parameters. We show that ERL is effective at identifying outliers and recovering appropriate confidence weights in many settings. We find that ERL outperforms modern robust lifted kernel methods and baseline monocular egomotion estimation strategies on the challenging KITTI dataset, while adding almost no runtime cost over baseline egomotion methods.
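As a rough illustration of the ERL idea (not the paper's exact estimator — the residual model and weighting here are simplified assumptions), each flow vector can be scored by how likely its residual is under the residual distributions induced by a set of counterfactual motion models; points that sit in the tail across many models get low confidence:

```python
import numpy as np

def erl_weights(residuals):
    """Sketch of expected-residual-likelihood confidence weighting.

    residuals: (M, N) array of residual magnitudes for N flow vectors
    under M counterfactual egomotion models.
    Returns one confidence weight per flow vector, normalized to [0, 1].
    """
    # fit a simple exponential residual distribution per model (assumption:
    # the real method may use a different residual likelihood)
    scale = residuals.mean(axis=1, keepdims=True) + 1e-12
    lik = np.exp(-residuals / scale) / scale
    # a point that is consistently unlikely across models is an outlier
    w = lik.mean(axis=0)
    return w / w.max()
```

A point whose residual is large under every counterfactual model cannot be explained by any plausible egomotion, so its weight collapses, while inliers keep near-maximal weight.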

It has been very difficult in past work to understand what kinds of representations Neural Networks learn, and there is still relatively little work on learning motion with a Neural Network. We therefore seek to study, in a controlled and scientific manner, what a Neural Network can learn in the motion domain. Specifically, given an optical flow field, this work asks how well a Neural Network can automatically learn the egomotion generating that flow field. Additionally, we test what representations it learns that make it more robust in the presence of perturbations (such as outliers in the flow). To do this, we generate synthetic flow fields from realistic depth structures and train the network under different regimes to see how well it adapts and how it changes in different training circumstances.

Example of a simple synthetic scene

Flow generated from the scene, with the legend of the flow in the upper right. Given various perturbations of this, how well can a Neural Network learn the egomotion?
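Generating such a flow field from a depth map and a camera motion follows the standard instantaneous motion-field equations. A minimal sketch, using one common sign convention (Longuet-Higgins/Prazdny; the project's own generator may use a different convention or add noise and outliers):

```python
import numpy as np

def motion_field(x, y, Z, t, w):
    """Optical flow (u, v) at normalized image coordinates (x, y)
    for a point at depth Z, given camera translation t = (tx, ty, tz)
    and rotational velocity w = (wx, wy, wz)."""
    tx, ty, tz = t
    wx, wy, wz = w
    # translational part scales with inverse depth; rotational part does not
    u = (-tx + x * tz) / Z + wx * x * y - wy * (1 + x**2) + wz * y
    v = (-ty + y * tz) / Z + wx * (1 + y**2) - wy * x * y - wz * x
    return u, v
```

Note the depth dependence: only the translational component carries scene-structure information, which is exactly what makes pure-rotation sequences uninformative about depth.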

Articulated objects, such as robotic arms or the human body, appear in many applications of computer vision, e.g. motion capture. To estimate their pose more precisely, we are researching ways to exploit the geometric properties of the joints of an articulated object. This work seeks to estimate the full pose of an articulated object from the projection of its joints onto an image. It extends the previous work "Articulated motion estimation from a monocular image sequence using spherical tangent bundles" by Spyridon Leonardos and Kostas Daniilidis to handle ambiguities in the projections of the joints.

Motion in an articulated arm and the projection of the joints onto the image
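The joint-projection setup can be sketched as simple forward kinematics followed by a pinhole projection (an illustrative toy, not the paper's spherical tangent-bundle formulation; all names here are made up for the sketch). It also exposes the ambiguity the work addresses: chains at different depths can project to the same image points.

```python
import numpy as np

def project_chain(base, link_dirs, link_lens, f=1.0):
    """Chain the 3D joint positions of a kinematic chain, then project
    them with a pinhole camera of focal length f.

    base: 3D position of the root joint (camera frame, z forward).
    link_dirs: unit direction vector of each link; link_lens: lengths.
    Returns (joints, projections): (K+1, 3) and (K+1, 2) arrays.
    """
    joints = [np.asarray(base, float)]
    for d, L in zip(link_dirs, link_lens):
        joints.append(joints[-1] + L * np.asarray(d, float))
    joints = np.stack(joints)
    # pinhole projection: (f*x/z, f*y/z) for each joint
    proj = f * joints[:, :2] / joints[:, 2:3]
    return joints, proj
```

A link pointing straight along the optical axis, for instance, projects onto the same image point as its parent joint, so the 2D observation alone cannot pin down the 3D pose.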

Creating a particle filter for an arbitrary manifold, using the sphere as an example
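The core of a particle filter on a manifold is replacing additive noise with a perturbation in the tangent space followed by the exponential map back onto the manifold. A minimal sketch for the sphere example (the noise models and function names are assumptions for illustration, not the project's implementation):

```python
import numpy as np

def exp_map(p, v):
    """Exponential map on the unit sphere: geodesic step from p along
    tangent vector v (v must lie in the tangent plane at p)."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return p
    return np.cos(theta) * p + np.sin(theta) * (v / theta)

def sphere_particle_filter_step(particles, obs, sigma_q, sigma_r, rng):
    """One predict/update/resample step on S^2.

    particles: (N, 3) unit vectors; obs: noisy unit-vector observation;
    sigma_q: process noise scale; sigma_r: observation noise scale.
    """
    N = len(particles)
    # predict: diffuse each particle in its tangent plane, exp-map back
    for i in range(N):
        n = rng.normal(scale=sigma_q, size=3)
        n -= n.dot(particles[i]) * particles[i]  # project onto tangent plane
        particles[i] = exp_map(particles[i], n)
    # update: von Mises-Fisher-style likelihood, peaked around obs
    w = np.exp(particles @ obs / sigma_r**2)
    w /= w.sum()
    # systematic resampling
    idx = np.searchsorted(np.cumsum(w), (rng.random() + np.arange(N)) / N)
    return particles[idx]
```

The same predict step works on any manifold with an exponential map and tangent-space projection, which is what makes the construction generic: only `exp_map` and the tangent projection are sphere-specific here.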

This work seeks to find an abstract representation of general motion using only weak signals from video sequences. Using these weak signals, we train a Neural Network to learn a representation of motion. As a proof of concept, we have been using ViZDoom to generate simple synthetic motion sequences to train on and extract the motion representation from.

Simple diagram of extracting motion from an arbitrary sequence of frames generated from ViZDoom

I am a Computer Science PhD student at the University of Pennsylvania advised by Kostas Daniilidis, researching geometric computer vision and machine learning and how the two relate. I have a Bachelor of Science in Computer Science from UCLA, where I studied under Stefano Soatto, working on SLAM, specifically mapping.