v0.4.5
API Change
- Move default examples about adding new env from extending BaseEnv to utilize DingEnvWrapper
- rename final_eval_reward to eval_episode_return in all related code (including envs and evaluators)
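
Downstream code that still reads the old key from an info dict may need a small shim during migration. The following is a minimal sketch, not part of DI-engine: the helper name `migrate_info` is hypothetical, and it assumes the value is stored in a plain dict under the key names listed above.

```python
# Hypothetical migration helper (not a DI-engine API).
# Key names are taken from the rename described in this release.
OLD_KEY = "final_eval_reward"
NEW_KEY = "eval_episode_return"


def migrate_info(info: dict) -> dict:
    """Return a copy of `info` with the old key renamed to the new key.

    If both keys are present, the value already stored under the new key wins.
    """
    migrated = dict(info)  # shallow copy so the caller's dict is untouched
    if OLD_KEY in migrated:
        old_value = migrated.pop(OLD_KEY)
        migrated.setdefault(NEW_KEY, old_value)
    return migrated


if __name__ == "__main__":
    legacy = {"final_eval_reward": 195.0, "step": 200}
    print(migrate_info(legacy))
```

A shim like this is only useful while old logs or custom envs still emit the old key; new code should write eval_episode_return directly.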
Env
- add beergame supply chain optimization env (#512)
- add env gym_pybullet_drones (#526)
- rename eval reward to episode return (#536)
Algorithm
- add policy gradient algo implementation (#544)
- add MADDPG algo implementation (#550)
- add IMPALA continuous algo implementation (#551)
- add MADQN algo implementation (#540)
Enhancement
- add new task IMPALA-type distributed training scheme (#321)
- add load and save method for replaybuffer (#542)
- add more DingEnvWrapper examples (#525)
- add support for visualizing more evaluator info (#538)
- add traceback log for subprocess env manager (#534)
Fix
- fix halfcheetah td3 config file (#537)
- fix mujoco action_clip args compatibility bug (#535)
- fix atari a2c config entry bug
- fix drex unittest compatibility bug
Style
- add Roadmap issue of DI-engine (#548)
- update related project link and new env doc
New Project
- PPOxFamily: PPO x Family DRL Tutorial Course
- ACE: [AAAI 2023] Official PyTorch implementation of paper "ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency".
Contributors: @PaParaZz1 @sailxjx @zjowowen @hiha3456 @Weiyuhong-1998 @kxzxvbk @song2181 @zerlinwang