#22 master

Closed
xutianbao wants to merge 30 commits from master into cae_decoder4
  1. DATASET.md (+763, -0)
  2. README.md (+64, -0)
  3. cfgs/finetune_classification/full/finetune_scan_hardest.yaml (+3, -2)
  4. models/act.py (+2, -2)
  5. tools/builder.py (+4, -3)
  6. utils/parser.py (+3, -3)

DATASET.md (+763, -0)

@@ -104,6 +104,769 @@ The overall directory structure should be:
Please prepare the dataset following [PointNet](https://github.com/yanx27/Pointnet_Pointnet2_pytorch):
download the `Stanford3dDataset_v1.2_Aligned_Version` from [here](http://buildingparser.stanford.edu/dataset.html), and get the processed `stanford_indoor3d` with:

```shell
cd data_utils
python collect_indoor3d_data.py
```

## Datasets

The overall directory structure should be:
```shell
│ACT/
├──cfgs/
├──data/
│ ├──ModelNet/
│ ├──ModelNetFewshot/
│ ├──ScanObjectNN/
│ ├──ShapeNet55-34/
│ ├──shapenetcore_partanno_segmentation_benchmark_v0_normal/
│ ├──Stanford3dDataset_v1.2_Aligned_Version/
│ ├──s3dis/
├──datasets/
├──.......
```

### ModelNet40 Dataset:

```shell
│ModelNet/
├──modelnet40_normal_resampled/
│ ├── modelnet40_shape_names.txt
│ ├── modelnet40_train.txt
│ ├── modelnet40_test.txt
│ ├── modelnet40_train_8192pts_fps.dat
│ ├── modelnet40_test_8192pts_fps.dat
```
* Download: The data can be downloaded from [Point-BERT](https://github.com/lulutang0608/Point-BERT/blob/49e2c7407d351ce8fe65764bbddd5d9c0e0a4c52/DATASET.md), or downloaded from the [official website](https://modelnet.cs.princeton.edu/#) and processed yourself.

### ModelNet Few-shot Dataset:
```shell
│ModelNetFewshot/
├──5way10shot/
│ ├── 0.pkl
│ ├── ...
│ ├── 9.pkl
├──5way20shot/
│ ├── ...
├──10way10shot/
│ ├── ...
├──10way20shot/
│ ├── ...
```

* Download: The data can be downloaded from [Point-BERT](https://github.com/lulutang0608/Point-BERT/blob/49e2c7407d351ce8fe65764bbddd5d9c0e0a4c52/DATASET.md). We use the same data split as theirs.

### ScanObjectNN Dataset:
```shell
│ScanObjectNN/
├──main_split/
│ ├── training_objectdataset_augmentedrot_scale75.h5
│ ├── test_objectdataset_augmentedrot_scale75.h5
│ ├── training_objectdataset.h5
│ ├── test_objectdataset.h5
├──main_split_nobg/
│ ├── training_objectdataset.h5
│ ├── test_objectdataset.h5
```
* Download: The data can be downloaded from [official website](https://hkust-vgd.github.io/scanobjectnn/).

### ShapeNet55/34 Dataset:

```shell
│ShapeNet55-34/
├──shapenet_pc_masksurf_with_normal/
│ ├── 02691156-1a04e3eab45ca15dd86060f189eb133.npy
│ ├── 02691156-1a6ad7a24bb89733f412783097373bdc.npy
│ ├── .......
├──ShapeNet-55/
│ ├── train.txt
│ └── test.txt
```

* Download: The data can be downloaded from [Point-BERT](https://github.com/lulutang0608/Point-BERT/blob/49e2c7407d351ce8fe65764bbddd5d9c0e0a4c52/DATASET.md). We use the same data split as theirs.

### ShapeNetPart Dataset:

```shell
|shapenetcore_partanno_segmentation_benchmark_v0_normal/
├──02691156/
│ ├── 1a04e3eab45ca15dd86060f189eb133.txt
│ ├── .......
│── .......
│──train_test_split/
│──synsetoffset2category.txt
```

* Download: The data can be downloaded from [official website](https://shapenet.cs.stanford.edu/media/shapenetcore_partanno_segmentation_benchmark_v0_normal.zip).

### S3DIS Dataset:

```shell
|Stanford3dDataset_v1.2_Aligned_Version/
├──Area_1/
│ ├── conferenceRoom_1
│ ├── .......
│── .......
│stanford_indoor3d
│──Area_1_conferenceRoom_1.npy
│──Area_1_office_19.npy
```
Please prepare the dataset following [PointNet](https://github.com/yanx27/Pointnet_Pointnet2_pytorch):
download the `Stanford3dDataset_v1.2_Aligned_Version` from [here](http://buildingparser.stanford.edu/dataset.html), and get the processed `stanford_indoor3d` with:

```shell
cd data_utils
python collect_indoor3d_data.py
```

README.md (+64, -0)

@@ -4,10 +4,64 @@ Created by [Runpei Dong](https://runpeidong.com), [Zekun Qi](https://github.com/

[OpenReview](https://openreview.net/forum?id=8Oun8ZUVe8N) | [arXiv](https://arxiv.org/abs/2212.08320) | [Models](https://drive.google.com/drive/folders/1hZUmqRvAg64abnkaI1HxctfQPTetWpaH?usp=share_link)

Created by [Runpei Dong](https://runpeidong.com), [Zekun Qi](https://github.com/qizekun), [Linfeng Zhang](https://scholar.google.com.hk/citations?user=AK9VF30AAAAJ&hl=en&oi=ao), [Junbo Zhang](https://scholar.google.com.hk/citations?user=rSP0pGQAAAAJ&hl=en), [Jianjian Sun](https://scholar.google.com.hk/citations?user=MVZrGkYAAAAJ&hl=en&oi=ao), [Zheng Ge](https://scholar.google.com.hk/citations?user=hJ-VrrIAAAAJ&hl=en&oi=ao), [Li Yi](https://ericyi.github.io/), [Kaisheng Ma](http://group.iiis.tsinghua.edu.cn/~maks/leader.html)

This repository contains the code release of paper **Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?** (ICLR 2023).

## ACT:clapper:

The success of deep learning heavily relies on large-scale data with comprehensive labels, which is more expensive and time-consuming to fetch in 3D compared to 2D images or natural languages.
This motivates utilizing models pretrained on data beyond 3D as teachers for cross-modal knowledge transfer.
In this paper, we revisit masked modeling in a unified fashion of knowledge distillation, and we show that foundational Transformers pretrained with 2D images or natural languages can help self-supervised 3D representation learning through training **A**utoencoders as **C**ross-Modal **T**eachers (**ACT**:clapper:).
The pretrained Transformers are transferred as cross-modal 3D teachers using discrete variational autoencoding self-supervision, during which the Transformers are frozen with prompt tuning for better knowledge inheritance.
The latent features encoded by the 3D teachers are used as the target of masked point modeling, wherein the dark knowledge is distilled to the 3D Transformer students as foundational geometry understanding.
Our ACT pretrained 3D learner achieves state-of-the-art generalization capacity across various downstream benchmarks, e.g., 88.21% overall accuracy on ScanObjectNN.

<div align="center">
@@ -161,3 +215,13 @@ If you find our work useful in your research, please consider citing:
url={https://openreview.net/forum?id=8Oun8ZUVe8N}
}
```


## Environment

This codebase was tested with the following environment configurations. It may work with other versions.
- Ubuntu 18.04
- CUDA 11.3
- GCC 7.5.0
- Python 3.8.8
- PyTorch 1.10.0

cfgs/finetune_classification/full/finetune_scan_hardest.yaml (+3, -2)

@@ -1,7 +1,8 @@
optimizer :
type: AdamW
kwargs:
lr: 0.0008
lr: 0.0005
# lr: 0.0002
weight_decay: 0.05
# layer_decay: 0.85

@@ -32,7 +33,7 @@ model:
NAME: PointTransformer
embed_dim: 384
depth: 12
drop_path_rate: 0.1
drop_path_rate: 0.2
cls_dim: 15
num_heads: 6
group_size: 32
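
Aside on this hunk: it toggles the fine-tuning learning rate between 0.0008 and 0.0005 and the `drop_path_rate` between 0.1 and 0.2. The `drop_path_rate` sets the stochastic-depth rate inside the Transformer blocks, i.e. the probability of skipping a block's residual branch per sample during training. Below is a minimal sketch of a typical DropPath module for reference only; it illustrates the mechanism and is not this repository's implementation (which presumably comes from timm):

```python
import torch
import torch.nn as nn

class DropPath(nn.Module):
    """Stochastic depth: randomly skip a residual branch per sample (illustrative sketch)."""
    def __init__(self, drop_prob: float = 0.2):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.drop_prob == 0.0 or not self.training:
            return x
        keep_prob = 1.0 - self.drop_prob
        # One Bernoulli sample per example, broadcast over all remaining dimensions.
        shape = (x.shape[0],) + (1,) * (x.dim() - 1)
        mask = x.new_empty(shape).bernoulli_(keep_prob)
        # Rescale so the expected activation magnitude is unchanged.
        return x * mask / keep_prob
```

A higher rate (0.2 rather than 0.1) regularizes more aggressively, a common adjustment when fine-tuning overfits.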


models/act.py (+2, -2)

@@ -903,9 +903,9 @@ class PointTransformer(nn.Module):
self.side_projection = nn.Linear(self.embed_dim, self.embed_dim, bias=False)

def build_loss_func(self):
# self.loss_ce = nn.CrossEntropyLoss()
self.loss_ce = nn.CrossEntropyLoss()
#
self.loss_ce = LabelSmoothingCrossEntropy(smoothing=0.3)
# self.loss_ce = LabelSmoothingCrossEntropy(smoothing=0.1)

def get_loss_acc(self, ret, gt):
loss = self.loss_ce(ret, gt.long())
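
This hunk toggles the classification loss between plain `nn.CrossEntropyLoss` and a label-smoothing variant (the commented-out lines suggest both have been tried, with smoothing of 0.1 or 0.3). For reference, a minimal sketch of how a label-smoothing cross entropy of this kind is commonly implemented; the repository's own `LabelSmoothingCrossEntropy` may differ in detail:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelSmoothingCrossEntropy(nn.Module):
    """Cross entropy with uniform label smoothing (illustrative sketch)."""
    def __init__(self, smoothing: float = 0.3):
        super().__init__()
        assert 0.0 <= smoothing < 1.0
        self.smoothing = smoothing
        self.confidence = 1.0 - smoothing

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        logprobs = F.log_softmax(logits, dim=-1)
        # Negative log-likelihood of the true class ...
        nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1)).squeeze(1)
        # ... blended with the mean over all classes (the "smoothed" part).
        smooth_loss = -logprobs.mean(dim=-1)
        return (self.confidence * nll_loss + self.smoothing * smooth_loss).mean()
```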


tools/builder.py (+4, -3)

@@ -100,10 +100,11 @@ def build_opti_sche(base_model, config):
print("Param groups = %s" % json.dumps(parameter_group_names, indent=2))
return list(parameter_group_vars.values())
# config.optimizer.kwargs.layer_decay = 0.85
assigner = LayerDecayValueAssigner(list(0.85 ** (config.model.depth + 1 - i) for i in range(config.model.depth + 2)))
assigner = LayerDecayValueAssigner(list(0.65 ** (config.model.depth + 1 - i) for i in range(config.model.depth + 2)))
opti_config = config.optimizer
if opti_config.type == 'AdamW':
param_groups = add_weight_decay(base_model, weight_decay=opti_config.kwargs.weight_decay, get_num_layer=assigner.get_layer_id, get_layer_scale=assigner.get_scale)
# param_groups = add_weight_decay(base_model, weight_decay=opti_config.kwargs.weight_decay)
optimizer = optim.AdamW(param_groups, **opti_config.kwargs)
elif opti_config.type == 'RAdam':
param_groups = add_weight_decay(base_model, weight_decay=opti_config.kwargs.weight_decay)
@@ -123,8 +124,8 @@ def build_opti_sche(base_model, config):
scheduler = CosineLRScheduler(optimizer,
t_initial=sche_config.kwargs.epochs,
cycle_mul=1.,
lr_min=1e-6,
# lr_min=1e-7,
# lr_min=1e-6,
lr_min=1e-7,
cycle_decay=0.1,
warmup_lr_init=1e-6,
# warmup_lr_init=1e-7,
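
Two hyperparameters move in this hunk: the layer-wise learning-rate decay base used by `LayerDecayValueAssigner` (0.85 vs 0.65) and the cosine scheduler's `lr_min` (1e-6 vs 1e-7). The decay base assigns each Transformer block a learning-rate scale that shrinks geometrically toward the input layers. A small standalone illustration of the scale list that expression produces, assuming `config.model.depth = 12` as in the fine-tuning config above:

```python
# Per-layer learning-rate scales, as built by LayerDecayValueAssigner in builder.py.
# Index 0 corresponds to the embedding, the last index to the head (scale 1.0).
depth = 12                    # config.model.depth in the fine-tuning config
for decay in (0.85, 0.65):    # the two bases toggled in this diff
    scales = [decay ** (depth + 1 - i) for i in range(depth + 2)]
    print(decay, [round(s, 4) for s in scales])
# With 0.65 the earliest layers get a much smaller effective learning rate
# (~0.004 of the base lr) than with 0.85 (~0.12 of the base lr).
```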


utils/parser.py (+3, -3)

@@ -7,7 +7,7 @@ def get_args():
parser.add_argument(
'--config',
type = str,
default='cfgs/finetune_classification/full/finetune_scan_hardest.yaml',
default='cfgs/pretrain/full/pretrain_act_distill.yaml',
help = 'yaml config file')
parser.add_argument(
'--launcher',
@@ -33,7 +33,7 @@ def get_args():
parser.add_argument('--exp_name', type=str, default='default', help='experiment name')
parser.add_argument('--loss', type=str, default='cd1', help='loss name')
parser.add_argument('--start_ckpts', type=str, default=None, help='reload used ckpt path')
parser.add_argument('--ckpts', type=str, default='/tmp/dataset/point-cae-epoch-300/point-cae-epoch-300.pth', help='test used ckpt path')
parser.add_argument('--ckpts', type=str, default=None, help='test used ckpt path')
parser.add_argument('--val_freq', type=int, default=1, help='test freq')
parser.add_argument(
'--vote',
@@ -58,7 +58,7 @@ def get_args():
parser.add_argument(
'--finetune_model',
action='store_true',
default=True,
default=False,
help='finetune modelnet with pretrained weight')
parser.add_argument(
'--scratch_model',
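
These defaults revert the parser to the pretraining setup (`pretrain_act_distill.yaml`, no checkpoint, `finetune_model` off), so fine-tuning runs now have to state their intent on the command line instead of relying on defaults. A hypothetical fine-tuning invocation, wrapped in Python only for illustration; the `--config` path and flag names come from this PR, while the `main.py` entry point, checkpoint path, and experiment name are placeholders:

```python
# Hypothetical command line for a ScanObjectNN (hardest split) fine-tuning run.
# Everything except the --config path and flag names is a placeholder / assumption.
import subprocess

subprocess.run([
    "python", "main.py",                          # assumed entry point
    "--config", "cfgs/finetune_classification/full/finetune_scan_hardest.yaml",
    "--finetune_model",                           # no longer enabled by default
    "--ckpts", "checkpoints/pretrained_act.pth",  # placeholder checkpoint path
    "--exp_name", "scan_hardest_finetune",        # placeholder experiment name
], check=True)
```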

