11 KiB

Raw Permalink Blame History

KGNN for Ascend

KGNN Description
Model Architecture
Dataset
Features
- Mixed Precision
Environment Requirements
Script Description
Model Description
- Performance
  - Training Performance
  - Inference Performance
Description of Random Situation
ModelZoo Homepage

KGNN Description

KGNN can effectively capture drug and its potential neighborhoods by mining their associated relations in KG. To extract both high-order structures and semantic relations of the KG, KGNN learn from the neighborhoods for each entity in KG as their local receptive, and then integrate neighborhood information with bias from representation of the current entity. This way, the receptive field can be naturally extended to multiple hops away to model highorder topological information and to obtain drugs potential long-distance correlations. This idea was proposed in the paper "KGNN: Knowledge Graph Neural Network for Drug-Drug Interaction Prediction.", published in 2020.

Paper Xuan Lin, Zhe Quan, Zhi-Jie Wang, Tengfei Ma, Xiangxiang Zeng College of Information Science and Engineering, Hunan University, College of Computer Science, Chongqing University, Published in IJCAI-2020.

Model architecture

The network structure can be decomposed into three parts: DDI Extraction and KG Construction, KGNN Layer and Drug-drug Interaction Prediction.It takes the parsed DDI matrix and knowledge graph obtained from preprocessing (i.e., DDI extraction and KG construction) of dataset as the input. It outputs the interaction value for the drug-drug pair. Remind that the central idea of KGNN is to consider both high-order structure and semantic relation, by using graph neural network to encode the drug and its topological neighborhood information to a distributed representation.

Dataset

Dataset used KEGG-drug

Dataset: KEGG-drug:
- Drugs: 1925
- Interactions: 56983
- Entities: 129910
- Relation Types: 167
- KG Triples: 362870

In this project, the file organization is recommended as below:

.
└─raw_data
  ├─kegg
    ├─approved_example.txt
    ├─entity2id.txt
    ├─relation2id.txt
    └─train2id.txt

Features

Environment Requirements

Hardware（Ascend）
- Prepare hardware environment with Ascend processor.
Framework
- MindSpore
For more information, please check the resources below：
- MindSpore Tutorials
- MindSpore Python API

Script description

Script and sample code

.
└─kgnn
  ├─README.md                             # descriptions about kgcn
  ├─scripts
    ├─run_eval_ascend.sh                  # launch standalone training with GPU platform(1p)
    ├─run_eval_gpu.sh                     # launch evaluating with GPU platform
    ├─run_infer_310.sh                    # launch evaluating with ascend platform
    ├─run_standalone_train_ascend.sh      # launch standalone training with ascend platform(1p)
    └─run_train_gpu.sh                    # launch standalone training with GPU platform
  ├─src
    ├─model_utils
      ├─config.py                         # parameter configuration
      ├─device_adapter.py                 # device adapter
      ├─local_adapter.py                  # local adapter
      └─moxing_adapter.py                 # moxing adapter
    ├─aggregator.py                       # aggregator function
    ├─data_loader.py                      # data loader function
    ├─dataset.py                          # dataset function
    ├─kgcn.py                             # kgcn model
    ├─loss.py                             # loss
    └─utils.py                            # General components
  ├─default_config.yaml                   # parameter configuration
  ├─eval.py                               # evaluation script
  ├─export.py                             # export scripts
  ├─postprocess.py                        # postprocess script
  ├─preprocess.py                         # preprocess scripts
  └─train.py                              # training script

Training process

Usage

Ascend:

# standalone training
bash run_standalone_train_ascend.sh [RAW_DATA_DIR] [DATASET] [DEVICE_ID]
# example: bash run_standalone_train_ascend.sh ./raw_data/ kegg 0

Ascend on Openi:

1. Upload dataset.
2. Create a training task and set the following options: 
	Computing Resources: Ascend NPU
	Startup file: train.py
	Dataset: kegg.zip
	Running parameters: enable_modelarts = True
3. Download the model at the "Results Download" after completing the training.

GPU:

# standalone training
bash run_train_gpu.sh [RAW_DATA_DIR] [DATASET] [DEVICE_ID]
# example: bash run_train_gpu.sh ./raw_data/ kegg 0

Launch

# training example
    Ascend:
      # standalone training
      bash run_standalone_train_ascend.sh [RAW_DATA_DIR] [DATASET] [DEVICE_ID]
      # example: bash run_standalone_train_ascend.sh ./raw_data/ kegg 0
    GPU:
      # standalone training
      bash run_train_gpu.sh [RAW_DATA_DIR] [DATASET] [DEVICE_ID]
      # example: bash run_train_gpu.sh ./raw_data/ kegg 0

Result

Training result will be stored in the example path. Checkpoints will be stored at ckpt_url by default, and training log will be redirected to ./log

(1p)
...
Train epoch time: 21830.078 ms, per step time: 496.138 ms
epoch: 19 step: 1, loss is 0.09186707437038422
epoch: 19 step: 2, loss is 0.09054221957921982
epoch: 19 step: 3, loss is 0.0800129845738411
epoch: 19 step: 4, loss is 0.07796357572078705
epoch: 19 step: 5, loss is 0.07740001380443573
epoch: 19 step: 6, loss is 0.08665880560874939
epoch: 19 step: 7, loss is 0.09129385650157928
epoch: 19 step: 8, loss is 0.08729872107505798
epoch: 19 step: 9, loss is 0.08645150810480118
epoch: 19 step: 10, loss is 0.09781090915203094
epoch: 19 step: 11, loss is 0.10565177351236343
epoch: 19 step: 12, loss is 0.08685533702373505
epoch: 19 step: 13, loss is 0.08123263716697693
epoch: 19 step: 14, loss is 0.08262551575899124
epoch: 19 step: 15, loss is 0.08548043668270111
...

Eval process

Usage

You can start training using python or shell scripts. The usage of shell scripts as follows:

Ascend:

# eval example
  Ascend:
    bash run_eval_ascend.sh [CKPT_PATH] [DEVICE_ID]
    # example: bash run_eval_ascend.sh "./KGNN2.0/ckpt/ckpt_0/KGNN-34_44.ckpt" 0

Ascend on Openi:

1. Upload ckpt file.
2. Create a inference task and set the following options: 
	Computing Resources: Ascend NPU
	Select model: KGNN-12_44.ckpt
	Startup file: eval.py
	Dataset: kegg.zip
	Running parameters: enable_modelarts = True
3. Find the inference results in log.

Launch

Modify the parameters in eval.py and run:

# eval example
  GPU:
    bash run_eval_gpu.sh [CKPT_PATH] [DEVICE_ID]
    # example: bash run_eval_gpu.sh "./KGNN2.0/ckpt/ckpt_0/KGNN-34_44.ckpt" 0

checkpoint can be produced in training process.

Result

Evaluation result will be stored in the output file of evaluation script, you can find result like the followings in log.

Logging Info - test_auc: 0.9422428483164229, test_acc: 0.8876509232954546, test_f1: 0.8932487731478626, test_aupr: 0.9188073906877906

Inference Process

Export MindIR

python export.py --checkpoint_path=[checkpoint_path]

The checkpoint_path parameter is required,

Infer on Ascend310

# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATASET_NAME] [DATASET_PATH] [NEED_PREPROCESS] [DEVICE_ID]

DATASET_NAME must be in ['kegg', 'drugbank'].
NEED_PREPROCESS means weather need preprocess or not, it's value is 'y' or 'n'.
DEVICE_ID is optional, default value is 0.

Result

Inference result is saved in current path, you can find result like this in acc.log file.

Model description

Performance

Training Performance

Parameters	Ascend	GPU
Model Version	KGNN	KGNN
Resource	Ascend910	GPU3090
uploaded Date	10/28/2022	10/28/2022
MindSpore Version	1.8.0	1.8.0
Dataset	KEGG-drug	KEGG-drug
Batch_size	2048	2048
Training Parameters	epoch: 20, batch_size: 2048, lr=0.0005	epoch: 20, batch_size: 2048, lr=0.0005
Optimizer	Adam	Adam
Loss Function	binary cross-entropy	binary cross-entropy
Scripts	kgnn scripts	kgnn scripts

Inference Performance

Parameters	Ascend	GPU
Model Version	KGNN	KGNN
Resource	Ascend910	GPU3090
uploaded Date	10/28/2022	10/28/2022
MindSpore Version	1.8.0	1.8.0
Dataset	KEGG-drug	KEGG-drug
Batch_size	2048	2048
Accuracy	88.32%	88.90%
Model for inference	127M (.ckpt file)	127M (.ckpt file)

Description of Random Situation

There are two random situations:

Seed is set in data_loader.py. It is for neighbor random sampling.
Seed is set in data_loader.py. It is used to partition dataset.

ModelZoo Homepage

Please check the official homepage.

11 KiB Raw Permalink Blame History

KGNN for Ascend

Usage

Launch

Result

Usage

Launch

Result

Inference Process

Infer on Ascend310

Result

Training Performance

Inference Performance

11 KiB

Raw Permalink Blame History