OpenI/PARL: PARL is a high-performance, flexible reinforcement learning framework




Reproduce IMPALA with PARL

This example reproduces the IMPALA deep reinforcement learning algorithm with PARL and reaches the same level of performance as reported in the paper on classic Atari games.

IMPALA was introduced in the paper "Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures".

Atari games introduction

Please see here to learn more about Atari games.

Benchmark result

Results with one learner (on a P40 GPU) and 32 actors (on 32 CPUs):

PongNoFrameskip-v4: mean_episode_rewards reaches a score of 18 to 19 in about 7 to 10 minutes.
Results for other games after one hour of training:

[Figures: IMPALA_Breakout, IMPALA_BeamRider, IMPALA_Qbert, IMPALA_SpaceInvaders]

How to use

Dependencies

paddlepaddle>=1.5.1
parl
gym
atari-py
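
Before starting a run, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the PARL example itself. It assumes the usual import names (paddlepaddle imports as paddle, atari-py as atari_py) and does not check the paddlepaddle>=1.5.1 version requirement.

```python
import importlib.util

# Import names assumed for the packages listed above:
# paddlepaddle -> paddle, atari-py -> atari_py.
REQUIRED_MODULES = ["paddle", "parl", "gym", "atari_py"]


def find_missing(modules):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Any module reported as missing can be installed with pip under the package name given in the list above.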

Distributed Training:

Learner

python train.py

Actors (suggested: 32 or more actors on 32 or more CPUs)

for i in $(seq 1 32); do
    python actor.py &
done;
wait
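
The shell loop above can also be written in Python, which may be convenient on systems without a POSIX shell. This is a sketch under the assumption that actor.py sits in the current directory. The helper names launch_actors and wait_all are hypothetical and not part of PARL.

```python
import os
import subprocess
import sys


def launch_actors(num_actors, command):
    """Start `num_actors` copies of `command` and return the Popen handles."""
    return [subprocess.Popen(command) for _ in range(num_actors)]


def wait_all(processes):
    """Block until every process exits and return the exit codes in order."""
    return [proc.wait() for proc in processes]


if __name__ == "__main__":
    # Mirrors: for i in $(seq 1 32); do python actor.py & done; wait
    if os.path.exists("actor.py"):
        actors = launch_actors(32, [sys.executable, "actor.py"])
        wait_all(actors)
    else:
        print("actor.py not found; run this from the IMPALA example directory.")
```

As with the shell version, the learner (python train.py) is started separately.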

You can change training settings (e.g. env_name, server_ip) in impala_config.py.
Training results are saved to log_dir/train/result.csv.
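
To inspect the saved results, the CSV file can be read with the standard library. This is a minimal sketch that assumes result.csv begins with a header row. The column name mean_episode_rewards is taken from the benchmark description and may not match the actual header, so the code prints the header it finds before looking the column up.

```python
import csv
import os


def read_rows(lines):
    """Parse CSV lines (first line is the header) into a list of dicts."""
    return list(csv.DictReader(lines))


def last_value(rows, column):
    """Return the value of `column` in the final row, or None if unavailable."""
    if not rows or column not in rows[-1]:
        return None
    return rows[-1][column]


if __name__ == "__main__":
    path = os.path.join("log_dir", "train", "result.csv")
    if os.path.exists(path):
        with open(path, newline="") as f:
            rows = read_rows(f)
        if rows:
            print("Columns:", list(rows[0].keys()))
        # Assumed column name; adjust to the header printed above.
        print("Latest value:", last_value(rows, "mean_episode_rewards"))
    else:
        print("No results found at", path)
```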

Reference

PARL is a high-performance, flexible reinforcement learning framework

https://parl.readthedocs.io
