Single-machine implementations of distributed reinforcement learning algorithms with Ray and PyTorch. See below for a quick, minimal demo of Ray, as well as general notes on the implementations.
- Ape-X DQN (also with Quantile Regression DQN) (currently fixing memory leak issues)
- Ape-X DPG (currently fixing memory leak issues)
- D4PG (with quantile regression instead of C51)
Ray allows zero-copy reads of numpy arrays from its shared-memory object store. The distributed components are implemented such that only numpy arrays are communicated across processes (actor -> replay buffer, replay buffer -> learner, learner -> parameter server, parameter server -> actor).
At the moment, the implementations run only on a single machine; I do not have the experience or resources to extend the algorithms to multi-node settings.
I also do not have the computational resources to fully train on PyBullet environments, so I cannot provide pre-trained models.