Oleh Rybkin

I am a second-year Ph.D. student in the GRASP laboratory at the University of Pennsylvania, advised by Kostas Daniilidis. I am spending the summer of 2019 at UC Berkeley with Sergey Levine. I am interested in deep learning, computer vision, and cognitive robotics.

Previously, I received my bachelor's degree from Czech Technical University in Prague, where I also worked on camera geometry as an undergraduate researcher advised by Tomas Pajdla. As part of this research, I spent two summers abroad, at INRIA with Josef Sivic and at TiTech with Akihiko Torii.


Google Scholar  /  GitHub  /  Email  /  CV  /  LinkedIn

  • May 2019: Three new workshop papers accepted to ICML and RSS workshops!
  • Apr 2019: New preprint on keyframe-based video prediction.
  • Mar 2019: I gave an invited talk on predictive models at Google, Mountain View (slides).
  • Feb 2019: I will be spending Spring 2019 at UC Berkeley with Sergey Levine.
  • Dec 2018: Paper on discovering an agent's action space accepted to ICLR 2019 in New Orleans.
  • Jul 2018: I presented our work on discovering an agent's action space at ICVSS 2018 in Sicily.

I am broadly interested in neural network models as computational models of intelligence, which includes problems in artificial intelligence, machine perception, and cognitive robotics. My recent interests are in temporal representation learning through generative and predictive models. Specifically, I've been working toward making machines understand phenomena like agent motion, physics, and interesting moments in time through video prediction, and toward making use of this understanding for planning. I am also exploring several ideas in intrinsic curiosity and meta-learning.

During my bachelor's, I worked on camera geometry for structure from motion and proposed an algorithm for robust estimation of camera focal length. Check out this and my other fun projects on my GitHub page.

HEDGE: Hierarchical Event-Driven Generation
Frederik Ebert*, Karl Pertsch*, Oleh Rybkin*, Chelsea Finn, Dinesh Jayaraman, Sergey Levine
Workshop on Generative Modeling and Model-Based Reasoning for Robotics and AI at ICML, 2019
paper / poster / workshop page

We propose a hierarchical predictive model that predicts a sequence starting from high-level events and progressively fills in finer and finer details. We train the model on goal-conditioned prediction of videos up to 80 frames (12.5 seconds) long.

Visual Planning with Semi-Supervised Stochastic Action Representations
Karl Schmeckpeper, David Han, Kostas Daniilidis, Oleh Rybkin
Workshop on Generative Modeling and Model-Based Reasoning for Robotics and AI at ICML, 2019
paper / poster / workshop page

We learn to infer an action representation from either motor or sensory input using a dual variational autoencoder. By learning a dynamics model in this semi-supervised manner, we achieve both high data efficiency and strong planning performance.

Perception-Driven Curiosity with Bayesian Surprise
Bernadette Bucher, Anton Arapin, Ramanan Sekar, Feifei Duan, Marc Badger, Kostas Daniilidis, Oleh Rybkin
Workshop on Combining Learning and Reasoning at RSS, 2019
paper / poster / workshop page

We use variational inference to learn a dynamics model of image observations, and construct an agent that maximizes Bayesian surprise of the future frames. The Bayesian agent is more robust to stochastic environments than simpler prior prediction schemes.
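At its core, Bayesian surprise can be measured as the KL divergence between the belief over the dynamics after seeing a new frame (the posterior) and the belief before (the prior). A minimal sketch for diagonal Gaussian beliefs, with illustrative names only (this is not the paper's implementation):

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

# Toy illustration: the prior belief before observing a frame,
# and the posterior belief after updating on that frame.
prior_mu, prior_var = np.zeros(4), np.ones(4)
post_mu, post_var = np.array([0.5, 0.0, -0.3, 0.1]), np.full(4, 0.8)

# Bayesian surprise: how much the observation shifted the belief.
surprise = gaussian_kl(post_mu, post_var, prior_mu, prior_var)
```

An observation that leaves the belief unchanged yields zero surprise, so purely stochastic noise that the model has already captured stops being rewarding, which is the source of the robustness mentioned above.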

KeyIn: Discovering Subgoal Structure with Keyframe-based Video Prediction
Karl Pertsch*, Oleh Rybkin*, Jingyun Yang, Kosta Derpanis, Joseph Lim, Kostas Daniilidis, Andrew Jaegle
Workshop on Task-Agnostic Reinforcement Learning at ICLR, 2019
project page & videos / arXiv / poster / slides / workshop page

We discover keyframes in videos by learning to select frames that enable prediction of the entire sequence. We show that our method improves performance of hierarchical planning by finding meaningful keyframes in demonstration data.

Hover the mouse (or tap the screen) here to see the video.

Learning what you can do before doing anything
Oleh Rybkin*, Karl Pertsch*, Kosta Derpanis, Kostas Daniilidis, Andrew Jaegle
International Conference on Learning Representations (ICLR), 2019
project page & videos / paper / arXiv / poster / slides

We learn to discover an agent's action space along with a dynamics model from pure video data. After a calibration stage, the model can be used to perform model predictive control, requiring orders of magnitude fewer action-annotated videos than other methods.


Predicting the Future with Transformational States
Andrew Jaegle, Oleh Rybkin, Kosta Derpanis, Kostas Daniilidis
ArXiv, 2018
project page & videos / arXiv

The model predicts future video frames by learning to represent the present state of a system together with a high-level transformation that is used to produce its future state.



The reasonable ineffectiveness of pixel metrics for future prediction

MSE loss and its variants are commonly used to train and evaluate future prediction models. But is this the right thing to do?
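One well-known symptom, sketched here as a toy example of my own (not taken from the post): when the future is multimodal, a blurry average of the possible futures achieves lower expected MSE than committing to any single sharp future, even though the average looks like neither.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two equally likely sharp futures (e.g. an object moves left or right).
future_a = rng.random((16, 16))
future_b = rng.random((16, 16))

# A model that commits to one sharp mode vs. one that predicts the blurry mean.
sharp_pred = future_a
blurry_pred = 0.5 * (future_a + future_b)

def mse(pred, target):
    return np.mean((pred - target) ** 2)

# Expected MSE over the two equally likely futures.
sharp_err = 0.5 * (mse(sharp_pred, future_a) + mse(sharp_pred, future_b))
blurry_err = 0.5 * (mse(blurry_pred, future_a) + mse(blurry_pred, future_b))

# blurry_err comes out lower: the metric rewards hedging, not realism.
```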


website template credit