Author: IODA  Date: 2019-07-05
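TF2RL is a deep reinforcement learning library that implements various deep reinforcement learning algorithms using TensorFlow 2.0.
GitHub project: https://github.com/keiohta/tf2rl
Algorithms
The project supports the following algorithms: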
| Algorithm | Discrete action | Continuous action | Support | Category |
|---|---|---|---|---|
| VPG | ✓ | ✓ | GAE | Model-free On-policy RL |
| DQN (including DDQN, Prior. DQN, Duel. DQN, Distrib. DQN, Noisy DQN) | ✓ | - | ApeX | Model-free Off-policy RL |
| DDPG (including TD3, BiResDDPG) | - | ✓ | ApeX | Model-free Off-policy RL |
| SAC | - | ✓ | ApeX | Model-free Off-policy RL |
| GAIL (including Spectral Normalization) | - | ✓ | - | Imitation Learning |
The following papers have been implemented in tf2rl:
Policy Gradient Methods for Reinforcement Learning with Function Approximation
High-Dimensional Continuous Control Using Generalized Advantage Estimation
Playing Atari with Deep Reinforcement Learning
Human-level control through Deep Reinforcement Learning
Deep Reinforcement Learning with Double Q-learning
Prioritized Experience Replay
Dueling Network Architectures for Deep Reinforcement Learning
A Distributional Perspective on Reinforcement Learning
Noisy Networks for Exploration
Distributed Prioritized Experience Replay
Continuous control with deep reinforcement learning
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Addressing Function Approximation Error in Actor-Critic Methods
Deep Residual Reinforcement Learning
Generative Adversarial Imitation Learning
Spectral Normalization for Generative Adversarial Networks
Installation
You can install tf2rl from PyPI:
$ pip install tf2rl
Alternatively, you can install it from source:
$ git clone https://github.com/keiohta/tf2rl.git tf2rl
$ cd tf2rl
$ pip install .
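To confirm that the installation succeeded, one option is to check whether the current Python interpreter can locate the package. The snippet below is a minimal sketch using only the standard library; the helper name `is_installed` is made up for illustration and is not part of tf2rl.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located, without importing it."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "tf2rl" is the package name used by the pip commands above.
    print("tf2rl installed:", is_installed("tf2rl"))
```

If this prints `False`, the package was probably installed into a different Python environment than the one running the script.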
Getting started
Here is a quick example of how to train a DDPG agent on the Pendulum environment:
import gym
from tf2rl.algos.ddpg import DDPG
from tf2rl.experiments.trainer import Trainer

parser = Trainer.get_argument()
parser = DDPG.get_argument(parser)
args = parser.parse_args()

env = gym.make("Pendulum-v0")
test_env = gym.make("Pendulum-v0")
policy = DDPG(
    state_shape=env.observation_space.shape,
    action_dim=env.action_space.high.size,
    gpu=-1,  # Run on CPU. If you want to run on GPU, specify GPU number
    memory_capacity=10000,
    max_action=env.action_space.high[0],
    batch_size=32,
    n_warmup=500)
trainer = Trainer(policy, env, args, test_env=test_env)
trainer()
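The first lines of the example build one parser in two steps: the trainer creates it, then the algorithm adds its own options to it. The sketch below illustrates that chaining pattern with plain `argparse`. It is not tf2rl's actual implementation; the classes `DemoTrainer` and `DemoAlgo` and the flags `--max-steps` and `--batch-size` are hypothetical and chosen only for illustration.

```python
import argparse


class DemoTrainer:
    """Hypothetical stand-in for a trainer that owns the generic options."""

    @staticmethod
    def get_argument(parser=None):
        if parser is None:
            parser = argparse.ArgumentParser(conflict_handler="resolve")
        parser.add_argument("--max-steps", type=int, default=1000)
        return parser


class DemoAlgo:
    """Hypothetical stand-in for an algorithm that adds its own options."""

    @staticmethod
    def get_argument(parser=None):
        if parser is None:
            parser = argparse.ArgumentParser(conflict_handler="resolve")
        parser.add_argument("--batch-size", type=int, default=32)
        return parser


def build_args(argv=None):
    """Chain both classes onto one parser, then parse argv (or sys.argv)."""
    parser = DemoTrainer.get_argument()
    parser = DemoAlgo.get_argument(parser)
    # Script-level defaults can be overridden before parsing.
    parser.set_defaults(max_steps=5000)
    return parser.parse_args(argv)


if __name__ == "__main__":
    print(build_args())
```

Because both classes write into the same parser, a single command line can carry trainer options and algorithm options together, and a script can still adjust defaults with `set_defaults` before parsing.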
You can find the implemented algorithms in the examples directory. For example, to train a DDPG agent:
# You must change directory to avoid importing local files.
$ cd examples
$ python run_ddpg.py