ENet
Model description
ENet is a semantic segmentation network built on a compact encoder-decoder architecture.
Some design choices include:
- Using the SegNet approach to downsampling by saving the indices of elements chosen in max pooling layers, and using them to produce sparse upsampled maps in the decoder.
- Early downsampling to optimize the early stages of the network and reduce the cost of processing large input frames. The first two blocks of ENet heavily reduce the input size, and use only a small set of feature maps.
- Using PReLUs as an activation function.
- Using dilated convolutions.
- Using Spatial Dropout.
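The index-saving idea from the first bullet can be illustrated with a minimal pure-Python sketch. It is not the code used in this repository; the function names `max_pool_with_indices` and `max_unpool` are hypothetical, and the feature map is a small hand-written 2D list. In PyTorch, the same pairing is usually expressed with `nn.MaxPool2d(..., return_indices=True)` and `nn.MaxUnpool2d`.

```python
def max_pool_with_indices(fmap, k=2):
    """k x k max pooling with stride k on a 2D list.

    Returns the pooled map and, for each output cell, the flat index
    (row * width + col) of the element that was selected.
    """
    h, w = len(fmap), len(fmap[0])
    pooled, indices = [], []
    for i in range(0, h - k + 1, k):
        pooled_row, index_row = [], []
        for j in range(0, w - k + 1, k):
            # Scan the window and remember where the maximum sits.
            best_val, best_idx = fmap[i][j], i * w + j
            for di in range(k):
                for dj in range(k):
                    v = fmap[i + di][j + dj]
                    if v > best_val:
                        best_val, best_idx = v, (i + di) * w + (j + dj)
            pooled_row.append(best_val)
            index_row.append(best_idx)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, out_h, out_w):
    """Place each pooled value back at its saved position; all else stays 0."""
    out = [[0] * out_w for _ in range(out_h)]
    for pooled_row, index_row in zip(pooled, indices):
        for v, idx in zip(pooled_row, index_row):
            out[idx // out_w][idx % out_w] = v
    return out


if __name__ == "__main__":
    fmap = [
        [1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [6, 2, 3, 1],
    ]
    pooled, indices = max_pool_with_indices(fmap)
    print(pooled)   # [[4, 5], [6, 7]]
    print(indices)  # [[4, 7], [12, 10]]
    print(max_unpool(pooled, indices, 4, 4))
```

The unpooled map is sparse: only the positions that won the max in the encoder are non-zero, and the decoder's subsequent convolutions fill in the rest.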
Step 1: Installing
Install packages
pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm'
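To confirm the installation, a short check along the following lines may help. It is a sketch, not part of this repository: `PIP_TO_MODULE` and `missing_packages` are hypothetical names, and the mapping assumes the usual import names for these packages (notably, `opencv-python` is imported as `cv2`).

```python
import importlib.util

# pip package name -> module name used in `import` statements (assumed mapping)
PIP_TO_MODULE = {
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "pycocotools": "pycocotools",
    "opencv-python": "cv2",
    "easydict": "easydict",
    "tqdm": "tqdm",
}


def missing_packages(pip_to_module=PIP_TO_MODULE):
    """Return the pip names whose module cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module_name in pip_to_module.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```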
Step 2: Training
Preparing datasets
Go to the official COCO website and select the COCO dataset you want to download.
Taking the coco2017 dataset as an example, pass /path/to/coco2017 as your COCO path in the training step below. The unzipped dataset directory structure should look like:
coco2017
├── annotations
│ ├── instances_train2017.json
│ ├── instances_val2017.json
│ └── ...
├── train2017
│ ├── 000000000009.jpg
│ ├── 000000000025.jpg
│ └── ...
├── val2017
│ ├── 000000000139.jpg
│ ├── 000000000285.jpg
│ └── ...
├── train2017.txt
├── val2017.txt
└── ...
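Before training, the layout above can be sanity-checked with a small script such as the one below. This is a hedged sketch, not a tool shipped with this repository: `EXPECTED_ENTRIES` and `missing_entries` are hypothetical names, and the list covers only the entries named explicitly in the tree above.

```python
from pathlib import Path

# Entries named explicitly in the directory tree above.
EXPECTED_ENTRIES = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "train2017",
    "val2017",
    "train2017.txt",
    "val2017.txt",
]


def missing_entries(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    # Replace with your own COCO path.
    missing = missing_entries("/path/to/coco2017")
    if missing:
        print("Missing from dataset directory:")
        for entry in missing:
            print("  " + entry)
    else:
        print("Dataset layout looks complete.")
```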
Training on COCO dataset
bash train_enet_dist.sh --data-path /path/to/coco2017/ --dataset coco
Reference
Ref: https://github.com/LikeLy-Journey/SegmenTron
Ref: torchvision