imitation: Clean Imitation Learning Implementations

imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch. We include three inverse …

Adam Gleave, Mohammad Taufeeque, Juan Rocamonde, Erik Jenner, Steven H. Wang, Sam Toyer, Maximilian Ernestus, Nora Belrose, Scott Emmons, Stuart Russell

Calculus on MDPs: Potential Shaping as a Gradient

In reinforcement learning, different reward functions can be equivalent in terms of the optimal policies they induce. A particularly …

Erik Jenner, Herke van Hoof, Adam Gleave

Reducing Exploitability with Population Based Training

Self-play reinforcement learning has achieved state-of-the-art, and often superhuman, performance in a variety of zero-sum games. Yet …

Pavel Czempin, Adam Gleave

A Primer on Maximum Causal Entropy Inverse Reinforcement Learning

Inverse Reinforcement Learning (IRL) algorithms infer a reward function that explains demonstrations provided by an expert acting in …

Adam Gleave, Sam Toyer

Uncertainty Estimation for Language Reward Models

Language models can learn a range of capabilities from unsupervised training on text corpora. However, to solve a particular problem …

Adam Gleave, Geoffrey Irving