v0.4.8
API Change
- stop_value is now an optional field in config and defaults to math.inf. Users can specify max_env_step or max_train_iter in the training entry to run the program with a fixed termination condition.
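The interaction between the three termination conditions can be sketched as follows. This is a minimal illustration, not DI-engine's implementation: the Progress class and the should_stop helper are hypothetical names introduced here, and only stop_value, max_env_step, max_train_iter and the math.inf default come from the note above.

```python
import math
from dataclasses import dataclass


@dataclass
class Progress:
    """Hypothetical training counters; not a DI-engine class."""
    eval_reward: float
    env_step: int
    train_iter: int


def should_stop(progress, stop_value=math.inf, max_env_step=None, max_train_iter=None):
    """Return True once any configured termination condition is met."""
    # With stop_value left at math.inf, no finite reward can trigger this check.
    if progress.eval_reward >= stop_value:
        return True
    # Fixed budget on environment steps (assumed to be disabled when None).
    if max_env_step is not None and progress.env_step >= max_env_step:
        return True
    # Fixed budget on training iterations (assumed to be disabled when None).
    if max_train_iter is not None and progress.train_iter >= max_train_iter:
        return True
    return False
```

With only the default stop_value, a run in this sketch never stops on reward, so one of the two budgets is what gives the program a fixed termination condition.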
Env
- fix gym hybrid reward dtype bug (#664)
- fix atari env id noframeskip bug (#655)
- fix typo in gym any_trading env (#654)
- update td3bc d4rl config (#659)
- polish bipedalwalker config
Algorithm
- add EDAC offline RL algorithm (#639)
- add LN and GN norm_type support in ResBlock (#660)
- add normal value norm baseline for PPOF (#658)
- polish last layer init/norm in MLP (#650)
- polish TD3 monitor variable
Enhancement
- add MAPPO/MASAC task example (#661)
- add PPO example for complex env observation (#644)
- add barrier middleware (#570)
Fix
- fix abnormal collector log and add record_random_collect option (#662)
- fix to_item compatibility bug (#646)
- fix trainer dtype transform compatibility bug
- fix pettingzoo 1.23.0 compatibility bug
- fix ensemble head unittest bug
Style
New Repo
- LightZero: A lightweight and efficient MCTS/AlphaZero/MuZero algorithm toolkit.
Full Changelog: v0.4.7...v0.4.8
Contributors: @PaParaZz1 @zjowowen @puyuan1996 @SolenoidWGT @Super1ce @karroyan @zhangpaipai @eltociear