BEVFormer
Model description
In this work, the authors present BEVFormer, a framework that learns unified bird's-eye-view (BEV) representations with spatiotemporal transformers to support multiple autonomous-driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by letting predefined grid-shaped BEV queries interact with the spatial and temporal spaces. To aggregate spatial information, the authors design a spatial cross-attention in which each BEV query extracts spatial features from its regions of interest across the camera views. For temporal information, they propose a temporal self-attention that recurrently fuses historical BEV information.
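The two attention modules can be summarized as follows. This is a paraphrase of the formulation in the BEVFormer paper, so treat the notation as a sketch rather than a verbatim copy. Q_p is the BEV query at grid location p, F_t^i the image features of camera view i at time t, V_hit the set of views onto which the query's 3D reference points project, N_ref the number of reference points lifted from location p, P(p, i, j) the projection of the j-th reference point onto view i, and B'_{t-1} the previous BEV features aligned to the current ego frame.

```latex
\mathrm{SCA}(Q_p, F_t) = \frac{1}{|\mathcal{V}_{\mathrm{hit}}|}
  \sum_{i \in \mathcal{V}_{\mathrm{hit}}} \sum_{j=1}^{N_{\mathrm{ref}}}
  \mathrm{DeformAttn}\big(Q_p, \mathcal{P}(p, i, j), F_t^{i}\big)

\mathrm{TSA}\big(Q_p, \{Q, B'_{t-1}\}\big) =
  \sum_{V \in \{Q,\, B'_{t-1}\}} \mathrm{DeformAttn}(Q_p, p, V)
```

Both modules use deformable attention, so each query attends to a small set of sampled locations instead of the full image or BEV map.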
The approach achieves a new state-of-the-art 56.9% NDS on the nuScenes test set, 9.0 points higher than the previous best methods and on par with LiDAR-based baselines.
Prepare
Install mmcv-full.
cd mmcv
bash clean_mmcv.sh
bash build_mmcv.sh
bash install_mmcv.sh
Install mmdet and mmseg.
pip3 install mmdet==2.25.0
pip3 install mmsegmentation==0.25.0
Install mmdet3d from source code.
cd ../mmdetection3d
pip3 install -r requirements.txt
# or, alternatively, install the individual requirement files:
pip3 install -r requirements/optional.txt
pip3 install -r requirements/runtime.txt
pip3 install -r requirements/tests.txt
python3 setup.py install
Install timm.
pip3 install timm
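After installation, you can confirm that every package imports cleanly. The following is a minimal bash sketch. The function name check_python_modules is hypothetical and not part of the BEVFormer repository. It imports each module with python3 and prints its __version__, or MISSING if the import fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): try to import each Python module
# and print its __version__ ("unknown" if it has none) or "MISSING".
# Returns non-zero if any module could not be imported.
check_python_modules() {
  local status=0 mod version
  for mod in "$@"; do
    if version=$(python3 -c "import ${mod}; print(getattr(${mod}, '__version__', 'unknown'))" 2>/dev/null); then
      printf '%-10s %s\n' "$mod" "$version"
    else
      printf '%-10s MISSING\n' "$mod"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_python_modules mmcv mmdet mmseg mmdet3d timm`. Note that mmsegmentation is imported as mmseg.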
nuScenes
Download the nuScenes v1.0-mini data and the CAN bus expansion data from the nuScenes download page, then prepare the data as follows.
Download CAN bus expansion
cd ..
mkdir data
cd data
# download 'can_bus.zip'
unzip can_bus.zip
# make sure the extracted can_bus folder ends up at data/can_bus
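Before generating annotations, you may want to check the data layout. This is a minimal bash sketch that assumes the layout implied by the `--root-path ./data/nuscenes` and `--canbus ./data` arguments of the data-preparation command, with the mini split in data/nuscenes/v1.0-mini. The function name check_data_layout is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): report which expected
# directories exist under the data root (default ./data).
# Returns non-zero if any of them is missing.
check_data_layout() {
  local root="${1:-./data}" missing=0 dir
  for dir in "$root/can_bus" "$root/nuscenes" "$root/nuscenes/v1.0-mini"; do
    if [ -d "$dir" ]; then
      echo "found   $dir"
    else
      echo "missing $dir"
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_data_layout ./data` from the repository root. Adjust the directory list if your dataset version differs.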
Prepare nuScenes data
We generate custom annotation files, which differ from mmdet3d's:
cd ..
python3 tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes --version v1.0-mini --canbus ./data
Running the command above generates nuscenes_infos_temporal_{train,val}.pkl in ./data/nuscenes.
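A quick way to confirm that data preparation succeeded is to check that both info files exist and are non-empty. This is a minimal bash sketch. The function name check_temporal_infos is hypothetical, and the default directory mirrors the `--out-dir ./data/nuscenes` argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): check that the temporal info
# files for the train and val splits exist and are non-empty.
check_temporal_infos() {
  local out_dir="${1:-./data/nuscenes}" split f status=0
  for split in train val; do
    f="$out_dir/nuscenes_infos_temporal_${split}.pkl"
    if [ -s "$f" ]; then
      echo "ok      $f"
    else
      echo "missing $f"
      status=1
    fi
  done
  return "$status"
}
```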
Prepare pretrained models
mkdir ckpts
cd ckpts && wget https://github.com/zhiqi-li/storage/releases/download/v1.0/bevformer_r101_dcn_24ep.pth
cd ..
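To make the download step safe to re-run, you can skip it when the checkpoint is already present. This is a minimal bash sketch. The function name ensure_checkpoint is hypothetical. It uses wget's `-c` (continue a partial download) and `-P` (target directory) options.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): download a checkpoint into DIR
# (default ckpts) unless a non-empty file with the same name already exists.
ensure_checkpoint() {
  local url="$1" dir="${2:-ckpts}"
  local file="$dir/${url##*/}"   # file name = last path component of the URL
  mkdir -p "$dir"
  if [ -s "$file" ]; then
    echo "already present: $file"
    return 0
  fi
  wget -c -P "$dir" "$url"
}
```

Usage: `ensure_checkpoint https://github.com/zhiqi-li/storage/releases/download/v1.0/bevformer_r101_dcn_24ep.pth ckpts`.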
Prerequisites
Please ensure you have prepared the environment and the nuScenes dataset.
Train and Test
Train BEVFormer with 8 GPUs
./tools/dist_train.sh ./projects/configs/bevformer/bevformer_base.py 8
Eval BEVFormer with 8 GPUs
./tools/dist_test.sh ./projects/configs/bevformer/bevformer_base.py ./path/to/ckpts.pth 8
Note: evaluating with 1 GPU can yield slightly higher scores, because with multiple GPUs the continuous video sequences may be truncated, which interrupts the recurrent fusion of historical BEV features. By default, we report scores evaluated with 8 GPUs.
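Single-GPU evaluation uses the same dist_test.sh script with a GPU count of 1. This is a minimal bash sketch of a wrapper that defaults to 1 GPU. The function name bevformer_eval and the DRY_RUN variable are hypothetical conveniences, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repo): evaluate with a configurable
# GPU count, defaulting to 1. With DRY_RUN=1 it only prints the command.
bevformer_eval() {
  local config="$1" ckpt="$2" gpus="${3:-1}"
  local cmd=(./tools/dist_test.sh "$config" "$ckpt" "$gpus")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage: `bevformer_eval ./projects/configs/bevformer/bevformer_base.py ./path/to/ckpts.pth`. Expect evaluation to take longer than with 8 GPUs.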
Using FP16 to train the model
The training script above does not support FP16 training, so we provide a separate script to train BEVFormer with FP16:
./tools/fp16/dist_train.sh ./projects/configs/bevformer_fp16/bevformer_tiny_fp16.py 8
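If you switch between standard and FP16 training often, a small launcher can select the matching script. This is a minimal bash sketch. The function name bevformer_train, the `--fp16` switch, and DRY_RUN are hypothetical. The FP16 script still needs an FP16 config such as the one in projects/configs/bevformer_fp16/.

```shell
#!/usr/bin/env bash
# Hypothetical launcher (not part of the repo): use the FP16 training script
# when the third argument is --fp16, otherwise the standard one.
# With DRY_RUN=1 it only prints the command.
bevformer_train() {
  local config="$1" gpus="${2:-8}" script="./tools/dist_train.sh"
  if [ "${3:-}" = "--fp16" ]; then
    # pair this script with an FP16 config, e.g. bevformer_tiny_fp16.py
    script="./tools/fp16/dist_train.sh"
  fi
  local cmd=("$script" "$config" "$gpus")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```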
Results on BI-V100
GPUs | Model          | NDS    | mAP
1x8  | bevformer_base | 0.3516 | 0.3701
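For reference, NDS is the nuScenes detection score. As defined by the nuScenes benchmark, it combines mAP with five true-positive error metrics: translation (mATE), scale (mASE), orientation (mAOE), velocity (mAVE), and attribute (mAAE) errors. Each error is clipped at 1.

```latex
\mathrm{NDS} = \frac{1}{10}\Big[\, 5\,\mathrm{mAP}
  + \sum_{\mathrm{mTP} \in \mathbb{TP}} \big(1 - \min(1, \mathrm{mTP})\big) \Big],
\qquad
\mathbb{TP} = \{\mathrm{mATE}, \mathrm{mASE}, \mathrm{mAOE}, \mathrm{mAVE}, \mathrm{mAAE}\}
```

For the row above, mAP contributes 0.5 × 0.3701 ≈ 0.1851. The five error terms therefore contribute about 0.3516 − 0.1851 = 0.1665 to NDS, which explains why NDS can be lower than mAP when the error metrics are large.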
Reference:
Paper on arXiv