In “An Optimistic Perspective on Offline RL”, we propose a simple experimental setup for offline RL on Atari 2600 games, based on the logged experiences of a DQN agent. We demonstrate that standard off-policy RL algorithms can train agents that achieve higher returns than the data-collecting agent, without any explicit correction for distribution mismatch. We also develop a robust RL algorithm, called random ensemble mixture (REM), which shows promising results in the offline setting. Overall, we present an optimistic perspective: robust RL algorithms trained on sufficiently large and diverse offline datasets can yield high-quality behaviour, strengthening the emerging data-driven RL paradigm. To facilitate the development and evaluation of offline RL methods, we are also publicly releasing the DQN Replay Dataset and have open-sourced our code. More details can be found at offline-rl.github.io.
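To make the REM idea concrete, here is a minimal sketch of its training loss, written against the paper's description rather than the released code: a multi-head Q-network is trained with a randomly sampled convex combination of its heads serving as the Q-estimate at each gradient step, with the same mixture weights applied to the target network. The function and tensor names (`rem_loss`, `q_net`, a `q_net(s)` output of shape `[batch, num_heads, num_actions]`) are illustrative assumptions, not the authors' API.

```python
import torch
import torch.nn.functional as F

def rem_loss(q_net, target_net, batch, num_heads, gamma=0.99):
    """One REM training step: TD loss on a random convex mixture of Q-heads."""
    s, a, r, s_next, done = batch  # minibatch drawn from the fixed, logged dataset

    # q_net(s) is assumed to return Q-values of shape [batch, num_heads, num_actions].
    q_values = q_net(s)

    # Sample a fresh convex combination over the heads for this gradient step;
    # the same weights are used for the online and target estimates.
    alphas = torch.rand(num_heads, device=q_values.device)
    alphas = (alphas / alphas.sum()).view(1, -1, 1)

    q_mix = (alphas * q_values).sum(dim=1)               # [batch, num_actions]
    q_sa = q_mix.gather(1, a.unsqueeze(1)).squeeze(1)    # Q(s, a) under the mixture

    with torch.no_grad():
        next_mix = (alphas * target_net(s_next)).sum(dim=1)
        target = r + gamma * (1.0 - done) * next_mix.max(dim=1).values

    # Huber loss on the TD error, as in standard DQN training.
    return F.smooth_l1_loss(q_sa, target)
```

Resampling the mixture weights at every step acts like training a large implicit ensemble of Q-functions with shared parameters, which is where the method's robustness is argued to come from.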
Link: https://ai.googleblog.com/2020/04/an-optimistic-perspective-on-offline.html