MaskFormer

Per-Pixel Classification is Not All You Need for Semantic Segmentation

Abstract

Modern approaches typically formulate semantic segmentation as a per-pixel classification task, while instance-level segmentation is handled with an alternative mask classification. Our key insight: mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks in a unified manner using the exact same model, loss, and training procedure. Following this observation, we propose MaskFormer, a simple mask classification model which predicts a set of binary masks, each associated with a single global class label prediction. Overall, the proposed mask classification-based method simplifies the landscape of effective approaches to semantic and panoptic segmentation tasks and shows excellent empirical results. In particular, we observe that MaskFormer outperforms per-pixel classification baselines when the number of classes is large. Our mask classification-based method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
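
As a rough illustration of how a set of masks with one class prediction each can be turned into a per-pixel semantic map, the sketch below marginalizes over the predicted masks. It is a minimal NumPy example and not the MMDetection implementation. The function name `semantic_inference`, the array shapes, and the convention that the last class column means "no object" are assumptions made for illustration.

```python
import numpy as np


def semantic_inference(class_logits, mask_logits):
    """Combine per-query class scores and masks into a per-pixel label map.

    class_logits: (N, K + 1) array; the last column is ASSUMED to be "no object".
    mask_logits:  (N, H, W) array of per-query mask logits.
    """
    # Numerically stable softmax over the class dimension.
    shifted = class_logits - class_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    class_probs = probs[:, :-1]  # drop the assumed "no object" column -> (N, K)

    # Sigmoid turns mask logits into per-pixel membership probabilities.
    mask_probs = 1.0 / (1.0 + np.exp(-mask_logits))  # (N, H, W)

    # For each class and pixel, sum p_i(class) * m_i[h, w] over all queries i,
    # then take the best-scoring class at every pixel.
    scores = np.einsum("nk,nhw->khw", class_probs, mask_probs)  # (K, H, W)
    return scores.argmax(axis=0)  # (H, W)


# Toy input: two queries, two real classes, a 1x2 image.
toy_class_logits = np.array([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
toy_mask_logits = np.array([[[10.0, -10.0]], [[-10.0, 10.0]]])
toy_labels = semantic_inference(toy_class_logits, toy_mask_logits)
```

In the toy input, the first query is confident about class 0 and covers the left pixel, while the second is confident about class 1 and covers the right pixel, so `toy_labels` assigns class 0 to the left pixel and class 1 to the right one.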

Introduction

MaskFormer requires the COCO and COCO-panoptic datasets for training and evaluation. Download and extract them under the COCO dataset path.
The directory should be laid out as follows:

mmdetection
├── mmdet
├── tools
├── configs
├── data
│   ├── coco
│   │   ├── annotations
│   │   │   ├── panoptic_train2017.json
│   │   │   ├── panoptic_train2017
│   │   │   ├── panoptic_val2017.json
│   │   │   ├── panoptic_val2017
│   │   ├── train2017
│   │   ├── val2017
│   │   ├── test2017
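
Before launching training, it can help to check that the layout above is in place. The following is a minimal sketch, assuming the dataset root is `data/coco` relative to the `mmdetection` directory as shown in the tree. The helper name `missing_coco_entries` is hypothetical and is not part of MMDetection.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_ENTRIES = [
    "annotations/panoptic_train2017.json",
    "annotations/panoptic_train2017",
    "annotations/panoptic_val2017.json",
    "annotations/panoptic_val2017",
    "train2017",
    "val2017",
    "test2017",
]


def missing_coco_entries(coco_root="data/coco"):
    """Return the expected files or folders that are absent under coco_root."""
    root = Path(coco_root)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_coco_entries()
    if missing:
        print("Missing under data/coco:")
        for rel in missing:
            print("  " + rel)
    else:
        print("COCO panoptic layout looks complete.")
```

The check only tests that each path exists. It does not verify that the archives were fully extracted or that the annotation files are valid.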

Results and Models

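| Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | PQ | SQ | RQ | PQ_th | SQ_th | RQ_th | PQ_st | SQ_st | RQ_st | Config | Download | Detail |
| :------: | :-----: | :-----: | :------: | :------------: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----------: | :----- |
| R-50 | pytorch | 75e | 16.2 | - | 46.854 | 80.617 | 57.085 | 51.089 | 81.511 | 61.853 | 40.463 | 79.269 | 49.888 | config | model \| log | This version is reported in Table XI of the paper Masked-attention Mask Transformer for Universal Image Segmentation |
| Swin-L | pytorch | 300e | 27.2 | - | 53.249 | 81.704 | 64.231 | 58.798 | 82.923 | 70.282 | 44.874 | 79.863 | 55.097 | config | model \| log | - |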
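
For readers unfamiliar with the metric columns: PQ, SQ, and RQ are the panoptic, segmentation, and recognition quality scores, and the `_th` and `_st` suffixes restrict them to "thing" and "stuff" classes. The sketch below computes the three scores for a single class from already-matched segments. It is a simplified illustration and not the evaluation code behind the numbers above. The function name `panoptic_quality` and its inputs are hypothetical.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Return (PQ, SQ, RQ) for one class.

    matched_ious: IoU of each true-positive match between a predicted and a
                  ground-truth segment (a match requires IoU > 0.5).
    num_fp:       number of predicted segments left unmatched.
    num_fn:       number of ground-truth segments left unmatched.
    """
    num_tp = len(matched_ious)
    denom = num_tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        # Convention chosen for this sketch: a class with no segments scores 0.
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / num_tp if num_tp else 0.0  # mean IoU of matches
    rq = num_tp / denom                                  # F1-style detection score
    return sq * rq, sq, rq


# Two matches with IoU 0.8 and 0.6, one false positive, one false negative.
pq, sq, rq = panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1)
```

For a single class PQ equals SQ times RQ. Benchmark tables usually average each score over classes separately, which would explain why the reported PQ is not exactly the product of the reported SQ and RQ.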

Citation

@inproceedings{cheng2021maskformer,
  title={Per-Pixel Classification is Not All You Need for Semantic Segmentation},
  author={Bowen Cheng and Alexander G. Schwing and Alexander Kirillov},
  booktitle={NeurIPS},
  year={2021}
}