Oleh Rybkin

I am a Ph.D. student in the GRASP laboratory at the University of Pennsylvania advised by Kostas Daniilidis. I am interested in deep learning, computer vision, and robotics. Most of my recent work concerns deep predictive models of videos.

I received my bachelor's degree from Czech Technical University in Prague, where I worked with Tomas Pajdla. I've spent time at INRIA with Josef Sivic, TiTech with Akihiko Torii, and UC Berkeley with Sergey Levine and Chelsea Finn.


Google Scholar  /  GitHub  /  Email  /  CV  /  LinkedIn

  • Jun 2019: Three new workshop papers presented at ICML and RSS workshops!
  • Apr 2019: New preprint on keyframe-based video prediction.
  • Mar 2019: I gave an invited talk on predictive models at Google, Mountain View (slides).
  • Feb 2019: I will be spending Spring and Summer 2019 at UC Berkeley with Sergey Levine and Chelsea Finn.
  • Dec 2018: Paper on discovering an agent's action space accepted to ICLR 2019 in New Orleans.
  • Jul 2018: I presented our work on discovering an agent's action space at ICVSS 2018 in Sicily.

I am interested in building agents that are capable of predicting the future and using this predictive capability to act in the world. I believe that using vision as a sensing modality is crucial for making such agents general-purpose, and that testing these algorithms on a real robotic system is one of the surest ways to make progress toward intelligence. My recent work in this area involves machines that try to understand agent motion, physics, interesting moments in time, and human behavior, as well as intrinsically motivated machines.

During my bachelor's, I worked on camera geometry for structure from motion and proposed an algorithm for robust estimation of camera focal length. Check out this and my other fun projects on my GitHub page.

HEDGE: Hierarchical Event-Driven Generation
Frederik Ebert*, Karl Pertsch*, Oleh Rybkin*, Chelsea Finn, Dinesh Jayaraman, Sergey Levine
Workshop on Generative Modeling and Model-Based Reasoning for Robotics and AI at ICML, 2019
paper / poster / workshop page

We propose a hierarchical predictive model that predicts a sequence starting from high-level events and progressively fills in finer and finer details. We train the model on goal-conditioned prediction for videos of up to 80 frames (12.5 seconds).

Visual Planning with Semi-Supervised Stochastic Action Representations
Karl Schmeckpeper, David Han, Kostas Daniilidis, Oleh Rybkin
Workshop on Generative Modeling and Model-Based Reasoning for Robotics and AI at ICML, 2019
paper / poster / workshop page

We learn to infer an action representation from either motor or sensory input using a dual variational autoencoder. By learning a dynamics model in this semi-supervised manner, we achieve both high data efficiency and strong planning performance.

Perception-Driven Curiosity with Bayesian Surprise
Bernadette Bucher, Anton Arapin, Ramanan Sekar, Feifei Duan, Marc Badger, Kostas Daniilidis, Oleh Rybkin
Workshop on Combining Learning and Reasoning at RSS, 2019
paper / poster / workshop page

We learn a latent variable model of the dynamics of image observations and use it to construct an agent that maximizes the Bayesian surprise of future frames. The Bayesian agent explores more robustly in stochastic environments than agents based on simpler prediction schemes.
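As a rough illustration (not the paper's implementation), Bayesian surprise can be measured as the KL divergence between the model's posterior belief after seeing a new frame and its prior prediction. A minimal sketch with diagonal-Gaussian beliefs, where all parameter values are made up for the example:

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between two diagonal Gaussians given means and variances."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

# Hypothetical belief parameters: the prior is the model's prediction of the
# next latent state; the posterior is its belief after observing the frame.
prior_mu, prior_var = np.zeros(8), np.ones(8)
post_mu, post_var = 0.5 * np.ones(8), 0.8 * np.ones(8)

# Bayesian surprise: how much the observation shifted the model's belief.
surprise = gaussian_kl(post_mu, post_var, prior_mu, prior_var)
```

An agent rewarded with this quantity seeks observations that change its beliefs, rather than observations that are merely hard to predict, which is why it degrades more gracefully under irreducible stochasticity.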

KeyIn: Discovering Subgoal Structure with Keyframe-based Video Prediction
Karl Pertsch*, Oleh Rybkin*, Jingyun Yang, Kosta Derpanis, Joseph Lim, Kostas Daniilidis, Andrew Jaegle
Workshop on Task-Agnostic Reinforcement Learning at ICLR, 2019
project page & videos / arXiv / poster / slides / talk (1 minute) / workshop page

We discover keyframes in videos by learning to select frames that enable prediction of the entire sequence. We show that our method improves performance of hierarchical planning by finding meaningful keyframes in demonstration data.


Learning what you can do before doing anything
Oleh Rybkin*, Karl Pertsch*, Kosta Derpanis, Kostas Daniilidis, Andrew Jaegle
International Conference on Learning Representations (ICLR), 2019
project page & videos / paper / arXiv / poster / slides

We learn to discover an agent's action space along with a dynamics model from pure video data. After a calibration stage, the model can be used to perform model predictive control, requiring orders of magnitude fewer action-annotated videos than other methods.
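To give a flavor of how a learned dynamics model can drive control (a generic random-shooting sketch, not the method from the paper), one can sample candidate action sequences, roll each out through the model, and execute the first action of the lowest-cost sequence. The dynamics function and dimensions below are toy stand-ins:

```python
import numpy as np

def plan_action(dynamics, state, goal, horizon=10, n_samples=100, seed=0):
    """Random-shooting MPC: sample action sequences, roll out the learned
    dynamics model, and return the first action of the best sequence."""
    rng = np.random.default_rng(seed)
    best_cost, best_action = np.inf, None
    for _ in range(n_samples):
        # Toy assumption: action dimension equals state dimension.
        actions = rng.uniform(-1, 1, size=(horizon, state.shape[0]))
        s = state
        for a in actions:
            s = dynamics(s, a)  # one-step prediction by the learned model
        cost = np.linalg.norm(s - goal)  # distance to goal after the rollout
        if cost < best_cost:
            best_cost, best_action = cost, actions[0]
    return best_action

# Stand-in dynamics: the state drifts by a fraction of the action.
toy_dynamics = lambda s, a: s + 0.1 * a
action = plan_action(toy_dynamics, np.zeros(2), np.ones(2))
```

In practice the first action is executed, the new observation is fed back in, and planning repeats at every step (receding horizon).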


Predicting the Future with Transformational States
Andrew Jaegle, Oleh Rybkin, Kosta Derpanis, Kostas Daniilidis
ArXiv, 2018
project page & videos / arXiv

The model predicts future video frames by learning to represent the present state of a system together with a high-level transformation that is used to produce its future state.



The reasonable ineffectiveness of pixel metrics for future prediction

MSE loss and its variants are commonly used for training and evaluation of future prediction. But is this the right thing to do?


Science reading list

The Structure of Scientific Revolutions, Thomas S. Kuhn.
Vision, David C. Marr.

Computing Machinery and Intelligence, Alan M. Turing.
The importance of stupidity in scientific research, Martin A. Schwartz.
As we may think, Vannevar Bush.

Note for undergraduate/master students

I am actively looking for students who are strongly motivated to work on a research project, including students who want to do a Master's thesis. Check out some of my work above and if you find it interesting, do send me an email!

Current mentees: Ramanan Sekar, Shenghao Zhou.
Previous mentees: Karl Schmeckpeper (PhD @ Penn), Anton Arapin (MS @ UChicago).

website template credit