# XCompression

## Introduction
This is a basic compression library that integrates several pruning and quantization algorithms.
## Included Methods

### Pruning

### Quantization
| Type | Name | Paper |
| --- | --- | --- |
| QAT | Default | |
| QAT | DoReFa | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients |
| QAT | PACT | PACT: Parameterized Clipping Activation for Quantized Neural Networks |
| QAT | TQT | Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks |
| QAT | LSQ | Learned Step Size Quantization |
| QAT | LSQ+ | LSQ+: Improving low-bit quantization through learnable offsets and better initialization |
| QAT | DSQ | Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks |
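To give a feel for what these QAT methods do during training, below is a minimal sketch of an LSQ-style fake quantizer in PyTorch. The class name, structure, and step-size initialization are illustrative assumptions and do not reflect this library's internal API; see the LSQ paper for the full method.

```python
import torch
import torch.nn as nn

def grad_scale(x, scale):
    # Forward value is unchanged; the backward gradient is scaled by `scale`.
    return (x - x * scale).detach() + x * scale

def round_pass(x):
    # Round in the forward pass; straight-through estimator in the backward pass.
    return (x.round() - x).detach() + x

class LSQQuantizer(nn.Module):
    """Illustrative LSQ-style fake quantizer (symmetric, signed)."""
    def __init__(self, bits=3):
        super().__init__()
        self.qn = -(2 ** (bits - 1))      # lower clip bound, e.g. -4 for 3 bits
        self.qp = 2 ** (bits - 1) - 1     # upper clip bound, e.g.  3 for 3 bits
        # Learnable step size; the paper initializes it from the weight statistics.
        self.step = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        # Gradient scale g = 1 / sqrt(N * Qp), as recommended in the LSQ paper.
        g = 1.0 / (x.numel() * self.qp) ** 0.5
        s = grad_scale(self.step, g)
        # Fake-quantize: scale, clip, round (with STE), then rescale.
        return round_pass((x / s).clamp(self.qn, self.qp)) * s
```

For example, `LSQQuantizer(bits=3)(conv.weight)` returns a fake-quantized copy of a convolution's weights while keeping the step size trainable.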
## Results and Models

Here are some experimental results; more quantized models with different configurations will be released soon. All models can be downloaded from Dropbox.
| Network | Config. File | Model | Bitwidth (W/A) | Top-1 Acc. (%) | Top-5 Acc. (%) |
| --- | --- | --- | --- | --- | --- |
| ResNet-18 | link | link | 3/2 | 66.9 | 87.2 |
## User Guide

### Install Dependencies

First, install the library dependencies within an Anaconda environment.
```bash
# Create an environment with Python 3.8 and activate it
conda create -n xcom python=3.8
conda activate xcom

# PyTorch GPU version >= 1.5
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

# TensorBoard visualization tool
conda install tensorboard

# Miscellaneous
conda install scikit-learn pyyaml munch
```
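A quick way to sanity-check the environment afterwards (this snippet is just a convenience, not part of the library):

```python
import torch
import torchvision

# Confirm the installed versions and that a CUDA device is visible.
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```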
Run Scripts with Your Configurations
Run the main.py
with your modified configuration file.
python main.py /path/to/your/config/file.yaml
The quantization parameters are defined as follows (a sample configuration fragment follows this list):

- `w_bits`: bit width of the quantized weights
- `a_bits`: bit width of the quantized activations
- `q_type`: quantization type (0 - fixed-point)
- `q_level`: quantization granularity (0 - per-channel, 1 - per-layer; weights only, activations use per-layer by default)
- `w_symmetric`: symmetric quantization for weights
- `a_symmetric`: symmetric quantization for activations
- `weight_observer`: weight observer (0 - MinMaxObserver, 1 - MovingAverageMinMaxObserver)
- `bn_fuse`: fuse batch normalization
- `bn_fuse_calib`: calibrate the batch-normalization fusion
- `qaft`: quantization-aware fine-tuning
- `ptq`: use post-training quantization (PTQ)
- `ptq_control`: PTQ control flag
- `ptq_batch`: the batch count used for PTQ calibration
- `ptq_percentile`: the percentile used for PTQ calibration
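As a rough illustration only, a quantization section built from the parameters above might look like the sketch below. The top-level key, the nesting, and the example values are assumptions; treat the files in the test folder as the authoritative schema.

```yaml
# Illustrative fragment only -- consult the shipped example configs
# in the test folder for the real file layout.
quan:                     # top-level key is an assumption
  w_bits: 3               # bit width of quantized weights
  a_bits: 2               # bit width of quantized activations
  q_type: 0               # 0 - fixed-point quantization
  q_level: 0              # 0 - per-channel, 1 - per-layer (weights only)
  w_symmetric: true       # symmetric quantization for weights
  a_symmetric: true       # symmetric quantization for activations
  weight_observer: 0      # 0 - MinMaxObserver, 1 - MovingAverageMinMaxObserver
  bn_fuse: true           # fuse batch normalization
  bn_fuse_calib: false    # calibrate the batch-normalization fusion
  qaft: false             # quantization-aware fine-tuning
  ptq: false              # post-training quantization
  ptq_control: false      # PTQ control flag
  ptq_batch: 32           # example batch count for PTQ calibration
  ptq_percentile: 0.9999  # example percentile for PTQ calibration
```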
You can find some example configuration files in the test folder.