v0.4.8

@PaParaZz1 PaParaZz1 released this 25 May 05:27
· 146 commits to main since this release

API Change

  1. `stop_value` is no longer a required field in the config and defaults to `math.inf`; users can instead specify `max_env_step` or `max_train_iter` in the training entry to terminate the program after a fixed budget.
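The new default behavior can be sketched as follows. This is a hypothetical illustration of the described semantics, not DI-engine's actual code; `should_stop` is an invented helper name:

```python
import math


def should_stop(eval_reward, env_step, train_iter,
                stop_value=math.inf, max_env_step=math.inf, max_train_iter=math.inf):
    """Illustrative termination check: with no stop_value configured it
    defaults to math.inf (never triggers), so training ends only when an
    explicit max_env_step or max_train_iter budget is reached."""
    return (eval_reward >= stop_value
            or env_step >= max_env_step
            or train_iter >= max_train_iter)


# With all three left at math.inf, the loop would never terminate on its own;
# supplying max_env_step gives a fixed termination condition.
print(should_stop(1.0, 10_000, 50))                         # no limits set
print(should_stop(1.0, 10_000, 50, max_env_step=10_000))    # budget reached
```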

Env

  1. fix gym hybrid reward dtype bug (#664)
  2. fix atari env id noframeskip bug (#655)
  3. fix typo in gym any_trading env (#654)
  4. update td3bc d4rl config (#659)
  5. polish bipedalwalker config

Algorithm

  1. add EDAC offline RL algorithm (#639)
  2. add LayerNorm (LN) and GroupNorm (GN) norm_type support in ResBlock (#660)
  3. add normal value norm baseline for PPOF (#658)
  4. polish last layer init/norm in MLP (#650)
  5. polish TD3 monitor variable

Enhancement

  1. add MAPPO/MASAC task example (#661)
  2. add PPO example for complex env observation (#644)
  3. add barrier middleware (#570)
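The barrier middleware above synchronizes parallel workers at a common point before any of them proceeds. A minimal stdlib sketch of the barrier idea (not DI-engine's actual middleware API; the worker setup here is invented for illustration):

```python
import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
results = []
lock = threading.Lock()


def worker(rank):
    # ... each worker would do its own collect/train step here ...
    barrier.wait()  # block until all NUM_WORKERS threads reach this point
    with lock:
        results.append(rank)  # only runs after every worker has arrived


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```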

Fix

  1. fix abnormal collector log and add record_random_collect option (#662)
  2. fix to_item compatibility bug (#646)
  3. fix trainer dtype transform compatibility bug
  4. fix pettingzoo 1.23.0 compatibility bug
  5. fix ensemble head unittest bug

Style

  1. fix incompatible gym version bug in Dockerfile.env (#653)
  2. add more algorithm docs

New Repo

  1. LightZero: A lightweight and efficient MCTS/AlphaZero/MuZero algorithm toolkit.

Full Changelog: v0.4.7...v0.4.8

Contributors: @PaParaZz1 @zjowowen @puyuan1996 @SolenoidWGT @Super1ce @karroyan @zhangpaipai @eltociear