v0.4.8

PaParaZz1 released this 25 May 05:27

API Change

stop_value is no longer a required config field and now defaults to math.inf. Users can pass max_env_step or max_train_iter to the training entry to run the program with a fixed termination condition.
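
The termination behaviour described above can be pictured with a small standalone sketch. This is not DI-engine's actual implementation: the helper name should_stop and the default budgets of int(1e10) are assumptions made for illustration only. It shows why an infinite stop value means the run ends only when a step or iteration budget is reached.

```python
import math


def should_stop(eval_reward, env_step, train_iter,
                stop_value=math.inf,
                max_env_step=int(1e10),
                max_train_iter=int(1e10)):
    """Hypothetical helper: return True once any termination condition holds.

    The default budgets are illustrative assumptions, not values taken
    from the release notes.
    """
    return (
        eval_reward >= stop_value        # never true while stop_value is math.inf
        or env_step >= max_env_step      # fixed environment-step budget
        or train_iter >= max_train_iter  # fixed training-iteration budget
    )


# stop_value is left at its default, so only the step budget can end the run.
before_budget = should_stop(eval_reward=200.0, env_step=5_000, train_iter=100,
                            max_env_step=10_000)
at_budget = should_stop(eval_reward=200.0, env_step=10_000, train_iter=200,
                        max_env_step=10_000)
print(before_budget, at_budget)
```

Setting an explicit stop_value in the config restores the earlier behaviour of stopping as soon as the evaluation reward reaches that threshold.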

Env

fix gym hybrid reward dtype bug (#664)
fix atari env id noframeskip bug (#655)
fix typo in gym any_trading env (#654)
update td3bc d4rl config (#659)
polish bipedalwalker config

Algorithm

add EDAC offline RL algorithm (#639)
add LN and GN norm_type support in ResBlock (#660)
add normal value norm baseline for PPOF (#658)
polish last layer init/norm in MLP (#650)
polish TD3 monitor variable

Enhancement

add MAPPO/MASAC task example (#661)
add PPO example for complex env observation (#644)
add barrier middleware (#570)

Fix

fix abnormal collector log and add record_random_collect option (#662)
fix to_item compatibility bug (#646)
fix trainer dtype transform compatibility bug
fix pettingzoo 1.23.0 compatibility bug
fix ensemble head unittest bug

Style

fix incompatible gym version bug in Dockerfile.env (#653)
add more algorithm docs

New Repo

LightZero: A lightweight and efficient MCTS/AlphaZero/MuZero algorithm toolkit.

Full Changelog: v0.4.7...v0.4.8

Contributors: @PaParaZz1 @zjowowen @puyuan1996 @SolenoidWGT @Super1ce @karroyan @zhangpaipai @eltociear
