OpenI/PARL: PARL is a high-performance, flexible reinforcement learning framework - PARL - OpenI




Reproduce GA3C with PARL

This example reproduces the GA3C deep reinforcement learning algorithm with PARL. On the Atari benchmarks it reaches performance comparable to that reported in the paper.

Original paper: GA3C: GPU-based A3C for Deep Reinforcement Learning

GA3C is a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm.
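In the GA3C paper, the hybrid design keeps a single model on the GPU instead of giving every A3C agent its own copy. CPU-side agents push states into a shared prediction queue, and a predictor thread drains that queue in batches, so one forward pass serves many agents. The following is a minimal, stdlib-only sketch of that batching pattern. It is not PARL's implementation: `PredictionServer`, `policy_fn`, `max_batch`, and `toy_policy` are hypothetical names, and `toy_policy` stands in for a real network.

```python
import queue
import threading


class PredictionServer:
    """GA3C-style predictor: serves many agents with batched forward passes.

    Hypothetical sketch; `policy_fn` stands in for one batched GPU call.
    """

    def __init__(self, policy_fn, max_batch=32):
        self.policy_fn = policy_fn
        self.max_batch = max_batch
        self.requests = queue.Queue()  # the shared prediction queue
        self.batch_sizes = []          # size of every batch actually served
        worker = threading.Thread(target=self._serve, daemon=True)
        worker.start()

    def predict(self, state):
        """Called from an agent thread; blocks until the result is ready."""
        reply = queue.Queue(maxsize=1)
        self.requests.put((state, reply))
        return reply.get()

    def _serve(self):
        while True:
            # Wait for one request, then drain whatever else is already queued.
            batch = [self.requests.get()]
            while len(batch) < self.max_batch:
                try:
                    batch.append(self.requests.get_nowait())
                except queue.Empty:
                    break
            outputs = self.policy_fn([state for state, _ in batch])
            self.batch_sizes.append(len(batch))
            for (_, reply), output in zip(batch, outputs):
                reply.put(output)


def toy_policy(states):
    # Placeholder "network": the chosen action is the parity of the state.
    return [s % 2 for s in states]


server = PredictionServer(toy_policy, max_batch=8)
results = {}


def agent(agent_id, steps=5):
    # Each agent queries the shared predictor instead of owning a model copy.
    results[agent_id] = [server.predict(agent_id * 10 + t) for t in range(steps)]


threads = [threading.Thread(target=agent, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results[3])               # [0, 1, 0, 1, 0]
print(sum(server.batch_sizes))  # 20
```

Batch sizes grow with load. When many agents are waiting, a single call serves several of them, and this is how one GPU keeps pace with many CPU simulators. A trainer that batches experiences from a training queue follows the same pattern.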

Atari games introduction

Please see the OpenAI Gym Atari environments page to learn more about Atari games.

Benchmark result

Results with one learner (on a P40 GPU) and 24 simulators (on 12 CPUs) over 10 million sample steps:
[Figure: GA3C benchmark results on Pong, Breakout, BeamRider, Qbert, and SpaceInvaders]

How to use

Dependencies

- paddlepaddle>=1.5.1
- parl
- gym
- atari-py
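Before launching training, you can check the dependency list with the standard-library `importlib.metadata` module (Python 3.8+). This is a minimal sketch under two assumptions. First, GPU builds of Paddle are installed under the distribution name `paddlepaddle-gpu`. Second, `atari-py` may be registered as `atari_py`. The helpers `parse_version` and `check` are hypothetical and do not ship with the example. Only `paddlepaddle` has a minimum version in this README.

```python
from importlib import metadata

# (label, candidate distribution names, minimum version or None).
# Assumption: GPU Paddle installs as "paddlepaddle-gpu".
REQUIREMENTS = [
    ("paddlepaddle", ("paddlepaddle", "paddlepaddle-gpu"), (1, 5, 1)),
    ("parl", ("parl",), None),
    ("gym", ("gym",), None),
    ("atari-py", ("atari-py", "atari_py"), None),
]


def parse_version(text):
    """Turn '1.5.1' or '1.5.1.post97' into (1, 5, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check(requirements=REQUIREMENTS, lookup=metadata.version):
    """Return {label: status} for every requirement."""
    report = {}
    for label, dists, minimum in requirements:
        installed = None
        for dist in dists:
            try:
                installed = lookup(dist)
                break
            except metadata.PackageNotFoundError:
                continue
        if installed is None:
            report[label] = "missing"
        elif minimum is not None and parse_version(installed) < minimum:
            report[label] = f"too old ({installed})"
        else:
            report[label] = f"ok ({installed})"
    return report


for name, status in check().items():
    print(f"{name}: {status}")
```

The `lookup` parameter exists so that the check can be exercised with a fake version table. In normal use, call `check()` with no arguments.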

Distributed Training

Learner

python train.py

Simulators (suggested: 24 simulators on 12+ CPUs)

for i in $(seq 1 24); do
    python simulator.py &
done;
wait
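If a POSIX shell is not available, you can express the same "start N background processes, then wait" pattern with Python's `subprocess` module. This sketch assumes that `simulator.py` is in the current directory and takes no arguments, as in the loop above. `launch_simulators` is a hypothetical helper, not part of the example code.

```python
import subprocess
import sys


def launch_simulators(num, cmd=None):
    """Start `num` copies of `cmd` in parallel and wait for all of them.

    Mirrors the shell loop: `python simulator.py &` repeated, then `wait`.
    Returns the exit codes in launch order.
    """
    if cmd is None:
        # Default assumes simulator.py sits in the working directory.
        cmd = [sys.executable, "simulator.py"]
    procs = [subprocess.Popen(cmd) for _ in range(num)]
    return [p.wait() for p in procs]
```

For usage, `launch_simulators(24)` starts the suggested 24 simulators and blocks until they exit. The optional `cmd` argument lets you substitute any other command.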

You can change training settings (e.g. env_name, server_ip) in ga3c_config.py.
Training results are saved to log_dir/train/result.csv.
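To check progress without opening the whole file, you can read the CSV with the standard `csv` module. This is a sketch under one assumption: the first row of `result.csv` is a header. This README does not document the file's columns, so the code reads the column names from the file instead of hard-coding them. `summarize_results` is a hypothetical helper.

```python
import csv
from pathlib import Path


def summarize_results(path="log_dir/train/result.csv"):
    """Return (header, latest_row) from a training-result CSV.

    Assumes the first row is a header; column names come from the file itself.
    """
    with Path(path).open(newline="") as f:
        rows = list(csv.reader(f))
    if not rows:
        raise ValueError(f"{path} is empty")
    header, data = rows[0], rows[1:]
    latest = data[-1] if data else []
    return header, latest
```

For example, `header, latest = summarize_results()` followed by `print(dict(zip(header, latest)))` shows the most recently logged values by column name.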

[Tips] Performance can degrade dramatically in slower computing environments, especially when training with low-speed CPUs. A likely cause is policy lag: the experience the learner trains on was generated by an older version of the policy than the one being updated.
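One way to observe policy lag is to tag each batch of experience with the learner's update count (policy version) at the time the simulator acted. You then compare that tag with the learner's count when the batch is trained on. The following is a hypothetical bookkeeping sketch and does not come from the example's `learner.py`. `PolicyLagTracker` and its methods are illustrative names.

```python
from statistics import mean


class PolicyLagTracker:
    """Hypothetical bookkeeping for policy lag on the learner side."""

    def __init__(self):
        self.version = 0  # parameter updates applied by the learner so far
        self.lags = []

    def on_update(self):
        """Call after every learner update."""
        self.version += 1

    def record(self, acting_version):
        """Call when training on data collected with policy `acting_version`."""
        lag = self.version - acting_version
        self.lags.append(lag)
        return lag

    def average_lag(self):
        return mean(self.lags) if self.lags else 0.0


tracker = PolicyLagTracker()
for _ in range(3):
    tracker.on_update()
print(tracker.record(3))      # 0: collected with the current policy
print(tracker.record(1))      # 2: collected two updates ago
print(tracker.average_lag())  # 1
```

If the average lag grows noticeably on slower machines, that is consistent with the policy-lag explanation above.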

Reference

tensorpack


https://parl.readthedocs.io
