SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching

May 15, 15150·

Yecheng Jason Ma

Andrew Shen

Dinesh Jayaraman

Osbert Bastani

Abstract

We propose State Matching Offline DIstribution Correction Estimation (SMODICE), a novel and versatile algorithm for offline imitation learning (IL) via state-occupancy matching. We show that the SMODICE objective admits a simple optimization procedure through an application of Fenchel duality, as well as an analytic solution in tabular MDPs. Because it does not require access to expert actions, SMODICE can be applied to three offline IL settings: (i) imitation from observations (IfO), (ii) IfO with a dynamics- or morphologically mismatched expert, and (iii) example-based reinforcement learning, which we show can be formulated as a state-occupancy matching problem. We extensively evaluate SMODICE on both gridworld environments and high-dimensional offline benchmarks. Our results demonstrate that SMODICE is effective in all three problem settings and significantly outperforms prior state-of-the-art methods.
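To make the notion of state-occupancy matching concrete, the following is a minimal sketch of what the matched quantity looks like in a tabular MDP. It computes the discounted state occupancy of a policy in closed form and measures its discrepancy from an expert's occupancy. The three-state chain MDP, the function names `state_occupancy` and `kl`, and the use of KL divergence as the discrepancy are all assumptions made for illustration. This is not the SMODICE algorithm itself, which works from offline data and optimizes a dual objective obtained via Fenchel duality.

```python
import numpy as np


def state_occupancy(P, pi, mu0, gamma=0.9):
    """Discounted state occupancy d(s) = (1 - gamma) * sum_t gamma^t Pr(s_t = s).

    P:   transition tensor, shape (S, A, S), P[s, a, s2] = Pr(s2 | s, a)
    pi:  policy, shape (S, A), pi[s, a] = Pr(a | s)
    mu0: initial state distribution, shape (S,)
    """
    # State-to-state transition matrix induced by the policy.
    P_pi = np.einsum("sa,sat->st", pi, P)
    num_states = P.shape[0]
    # Solve (I - gamma * P_pi^T) x = mu0, then normalize by (1 - gamma).
    x = np.linalg.solve(np.eye(num_states) - gamma * P_pi.T, mu0)
    return (1.0 - gamma) * x


def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions (eps avoids log(0))."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))


# Hypothetical toy MDP: a 3-state chain with actions 0 = left, 1 = right.
S, A = 3, 2
P = np.zeros((S, A, S))
for s in range(S):
    P[s, 0, max(s - 1, 0)] = 1.0      # move left (stay at the left end)
    P[s, 1, min(s + 1, S - 1)] = 1.0  # move right (stay at the right end)
mu0 = np.array([1.0, 0.0, 0.0])

always_right = np.tile([0.0, 1.0], (S, 1))  # stand-in "expert" policy
always_left = np.tile([1.0, 0.0], (S, 1))   # a poorly matching policy

d_expert = state_occupancy(P, always_right, mu0)
d_left = state_occupancy(P, always_left, mu0)

print("expert occupancy:", d_expert)
print("KL(left || expert):", kl(d_left, d_expert))
print("KL(expert || expert):", kl(d_expert, d_expert))
```

Note that the discrepancy depends only on visited states, not on the actions taken, which is why an objective of this form can be evaluated without expert action labels.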

Publication

In ICML

Last updated on Feb 8, 80849

Reinforcement Learning Robot Transfer
