Oleh Rybkin

I am a second-year Ph.D. student in the GRASP Laboratory at the University of Pennsylvania, advised by Kostas Daniilidis. I am currently spending the Spring 2019 semester at UC Berkeley with Sergey Levine.

I am interested in deep learning, computer vision, and cognitive robotics. Recently, I've been working on using deep predictive models to discover different kinds of semantic structure in video.

Previously, I received my bachelor's degree from the Czech Technical University in Prague, where I also worked on camera geometry as an undergraduate researcher advised by Tomas Pajdla. As part of this research, I spent two summers at INRIA and TiTech, working with Josef Sivic and Akihiko Torii respectively.


Google Scholar  /  GitHub  /  Email  /  CV  /  LinkedIn

  • Apr 2019: New preprint on keyframe-based video prediction.
  • Mar 2019: I gave an invited talk on predictive models at Google, Mountain View (slides).
  • Feb 2019: I will be spending Spring 2019 at UC Berkeley with Sergey Levine.
  • Dec 2018: Paper on discovering an agent's action space accepted to ICLR 2019 in New Orleans.
  • Jul 2018: I presented our work on discovering an agent's action space at ICVSS 2018 in Sicily.

My general interest is in creating algorithms with properties of human intelligence that current AI methods lack, a broad goal encompassing problems in artificial intelligence, machine perception, and cognitive robotics. To this end, I have been working toward making machines understand phenomena such as agent motion, physics, and interesting moments in time through video prediction. I am also exploring several ideas related to intrinsic curiosity and meta-learning.

During my bachelor's, I worked on camera geometry for structure from motion and proposed an algorithm for robust estimation of camera focal length. Check out this and my other fun projects on my GitHub page.

KeyIn: Discovering Subgoal Structure with Keyframe-based Video Prediction
Karl Pertsch*, Oleh Rybkin*, Jingyun Yang, Kosta Derpanis, Joseph Lim, Kostas Daniilidis, Andrew Jaegle
arXiv, 2019
project page & videos / arXiv / poster

We discover keyframes in videos by learning to select frames that enable prediction of the entire sequence. We show that our method improves performance of hierarchical planning by finding meaningful keyframes in demonstration data.


Learning what you can do before doing anything
Oleh Rybkin*, Karl Pertsch*, Kosta Derpanis, Kostas Daniilidis, Andrew Jaegle
International Conference on Learning Representations (ICLR), 2019
project page & videos / paper / arXiv / poster

We learn an agent's action space along with a predictive model of observations from pure video data. The model can then be used to perform model predictive control, requiring orders of magnitude fewer action-annotated videos than other methods.


Predicting the Future with Transformational States
Andrew Jaegle, Oleh Rybkin, Kosta Derpanis, Kostas Daniilidis
arXiv, 2018
project page & videos / arXiv

The model predicts future video frames by learning to represent the present state of a system together with a high-level transformation that is used to produce its future state.



The reasonable ineffectiveness of pixel metrics for future prediction

MSE loss and its variants are commonly used for training and evaluation of future prediction. But is this the right thing to do?


website template credit