- Fixed a bug where magnitude pruning pruned too many parameters when the weights were dense (>95% density) and the pruning rate was small (<5%). First experiments on LeNet-5 Caffe indicate that this change does not affect performance for networks that learn to have dense weights. I will replicate this across architectures to make sure this bugfix does not change performance.
- Fixed instabilities in SET (Sparse Evolutionary Training) pruning which could cause NaN values in specific circumstances.
- Added basic docstring documentation.
- MNIST/CIFAR: Separate log files are now created for different models/densities/names.
- MNIST/CIFAR: Aggregate mean test accuracy with standard errors can now be automatically extracted from logs with `python get_results_from_logs.py`.
- Changed names from "death" to "prune" to be more consistent with the terminology in the paper.
- Added --verbose argument to print the parameter distribution before/after pruning at the end of each epoch. By default, the pruning distribution will no longer be printed.
- Removed the --sparse flag and added a --dense flag. The default is args.dense==False, so sparse mode is enabled by default. To run a dense model, just pass the --dense argument.
- Fixed a bug where global pruning would throw an error if a layer was fully dense and had a low prune rate.
- Added FP16 support. Any model can now be run in 16-bit by passing the apex `FP16_Optimizer` into the `Masking` class and replacing `loss.backward()` with `optimizer.backward(loss)` (see the sketch below).
- Added an adapted Dynamic Sparse Reparameterization codebase that works with sparse momentum.
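A minimal sketch of the FP16 setup described above. The `Masking` constructor arguments, the model factory, and the data loader are assumptions for illustration only; the relevant parts are wrapping the optimizer with apex's `FP16_Optimizer` and calling `optimizer.backward(loss)` instead of `loss.backward()`.

```python
import torch
import torch.nn.functional as F
from apex.fp16_utils import FP16_Optimizer   # NVIDIA apex
from sparselearning.core import Masking      # this library

model = build_model().cuda().half()            # hypothetical model factory
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
optimizer = FP16_Optimizer(optimizer)          # wrap the optimizer for FP16 training
mask = Masking(optimizer)                      # constructor arguments abbreviated/assumed
mask.add_module(model)                         # further arguments omitted/assumed

for data, target in train_loader:              # hypothetical data loader
    data, target = data.cuda().half(), target.cuda()
    loss = F.nll_loss(model(data), target)
    optimizer.backward(loss)                   # instead of loss.backward()
    mask.step()                                # optimizer step + mask update (assumed API)
```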
- Added a modular architecture for growth/prune/redistribution algorithms which is decoupled from the main library. This enables you to write your own prune/growth/redistribution algorithms without touching the library internals; a sketch of a custom growth function follows below. A tutorial on how to add your own functions was also added: How to Add Your Own Algorithms.
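For example, a custom growth function might look like the following. The callback signature and how the function is registered are assumptions here; the tutorial linked above documents the actual interface.

```python
import torch

def random_growth(masking, name, new_mask, total_regrowth, weight):
    """Regrow `total_regrowth` connections at random positions that are
    currently inactive (zero) in this layer's mask."""
    inactive = torch.nonzero((new_mask == 0).flatten()).flatten()
    k = min(int(total_regrowth), inactive.numel())
    if k <= 0:
        return new_mask
    # Pick k inactive positions uniformly at random and switch them on.
    chosen = inactive[torch.randperm(inactive.numel())[:k]]
    new_mask = new_mask.clone()
    new_mask.view(-1)[chosen] = 1
    return new_mask
```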
- Changed to boolean indexing for PyTorch 1.2 compatibility.
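As a generic illustration of this change (not the library's internal code): PyTorch 1.2 deprecates indexing with uint8 masks in favor of bool masks.

```python
import torch

weight = torch.randn(4, 4)

# Comparison operators return torch.bool tensors in PyTorch >= 1.2,
# and boolean indexing with them is the supported path.
mask = weight.abs() > 0.5
surviving = weight[mask]

# Older uint8/byte masks trigger a deprecation warning when used as an
# index; convert them explicitly instead.
byte_mask = mask.to(torch.uint8)
surviving = weight[byte_mask.bool()]
```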
- Fixed an error that could occur in global pruning algorithms if very few weights were removed for a layer.
- Removed momentum reset. This feature did not have any effect on performance and made the algorithm more complex.
- Fixed an error where two layers of VGG16 were removed by use of the `remove_weight_partial_name()` function. Results were slightly degraded, but the weights needed for dense performance and the relative ordering compared to other methods remained the same.
- The evaluation script can now aggregate log files organized in a folder hierarchy; results are aggregated separately for each folder.
- Added a decay schedule argument. One can choose between linear and cosine prune rate decay schedules (sketched below).
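A rough sketch of the two schedules (function names and exact formulas are assumptions, not the library's API): the prune rate is annealed toward zero over training either linearly or along a cosine curve.

```python
import math

def linear_prune_rate(initial_rate, step, total_steps):
    # Linear decay from initial_rate down to 0 over the course of training.
    return initial_rate * (1.0 - step / total_steps)

def cosine_prune_rate(initial_rate, step, total_steps):
    # Cosine annealing from initial_rate down to 0 over the course of training.
    return initial_rate * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
```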
- Added a new ImageNet baseline based on the codebase of Mostafa & Wang (2019).
- Added a max-thread argument which sets the maximum total number of data loader threads across the training, validation, and test set data loaders.
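One way such a cap might be split across the three loaders; this is a sketch under assumed behavior, and the function name and the particular split are illustrative only.

```python
from torch.utils.data import DataLoader

def build_loaders(train_set, valid_set, test_set, batch_size, max_threads):
    # Give the training loader the larger share of the worker budget and
    # split the remainder between the validation and test loaders.
    train_workers = max(1, max_threads - 2)
    other_workers = max(1, (max_threads - train_workers) // 2)
    train = DataLoader(train_set, batch_size=batch_size, shuffle=True,
                       num_workers=train_workers)
    valid = DataLoader(valid_set, batch_size=batch_size,
                       num_workers=other_workers)
    test = DataLoader(test_set, batch_size=batch_size,
                      num_workers=other_workers)
    return train, valid, test
```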