Based on PARL, we reproduce the GA3C algorithm of deep reinforcement learning, reaching the same level of performance as reported in the paper on Atari benchmarks.
Original paper: GA3C: GPU-based A3C for Deep Reinforcement Learning
A hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm.
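GA3C's core idea is that many CPU simulators do not call the policy network directly; instead they enqueue observations into a shared prediction queue, and a single predictor thread drains that queue and evaluates the requests in one batch on the GPU. The following is a minimal, self-contained sketch of that batched-prediction pattern using only the Python standard library; all names are illustrative and the "policy" is a stand-in arithmetic function, not PARL's actual API:

```python
import queue
import threading

BATCH_SIZE = 4
NUM_SIMULATORS = 8

prediction_queue = queue.Queue()

def predictor():
    """Drain pending requests, answer up to BATCH_SIZE of them per batch.

    The single batched evaluation stands in for one GPU forward pass.
    """
    served = 0
    while served < NUM_SIMULATORS:  # demo only: stop after all requests
        batch = [prediction_queue.get()]          # block for the first request
        while len(batch) < BATCH_SIZE:
            try:
                batch.append(prediction_queue.get_nowait())
            except queue.Empty:
                break
        for obs, reply_q in batch:                # fake "policy": action = obs * 2
            reply_q.put(obs * 2)
        served += len(batch)

def simulator(sim_id, results):
    """One CPU simulator: request an action for its observation, then wait."""
    reply_q = queue.Queue()
    prediction_queue.put((sim_id, reply_q))       # observation is just sim_id here
    results[sim_id] = reply_q.get()               # block until predictor answers

results = {}
sims = [threading.Thread(target=simulator, args=(i, results))
        for i in range(NUM_SIMULATORS)]
pred = threading.Thread(target=predictor)
pred.start()
for t in sims:
    t.start()
for t in sims:
    t.join()
pred.join()
print(results)  # each simulator received action = 2 * its observation
```

The same queueing idea is used on the training side: simulators push (state, action, reward) experiences into a training queue that the learner consumes in batches.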
Please see here to learn more about Atari games.
Results with one learner (on a P40 GPU) and 24 simulators (on 12 CPUs) over 10 million sample steps.
First, start a local cluster with 24 CPUs:
xparl start --port 8010 --cpu_num 24
Note that if you have already started a master, you don't need to run the above
command. For more information about the cluster, please refer to our
documentation.
Then we can start the distributed training by running:
python train.py
[Tips] Performance can degrade dramatically in a slower computational
environment, especially when training with low-speed CPUs; this may be caused
by the policy-lag problem.
PARL is a high-performance, flexible reinforcement learning framework.