This repository contains the source code for the paper "Leveraging Capsule Routing to Associate Knowledge with Medical Literature Hierarchically".
The program takes a piece of medical literature, the RCor text fragment, the KImp text fragment, and the knowledge as input, and predicts a label indicating the degree of relevance between the medical literature and the knowledge.
More details about the underlying model can be found in the paper.
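The core operation named in the paper, capsule dynamic routing, can be sketched roughly as follows. This is an illustrative numpy version of the standard routing-by-agreement algorithm, not the repository's actual TensorFlow implementation; shapes and iteration count are assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Squash non-linearity: keeps the vector's orientation, scales its norm into [0, 1).
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: prediction vectors of shape (num_input_caps, num_output_caps, out_dim).
    b = np.zeros(u_hat.shape[:2])  # routing logits, refined over iterations
    for _ in range(num_iters):
        # Coupling coefficients: softmax over the output capsules.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        # Weighted sum of predictions -> (num_output_caps, out_dim).
        s = (c[..., None] * u_hat).sum(axis=0)
        v = squash(s)  # output capsule vectors
        # Increase logits where predictions agree with the output.
        b = b + (u_hat * v[None]).sum(axis=-1)
    return v
```

In the paper's setting, the routed capsule outputs (caps-representations) are then combined as information gain to associate knowledge with the literature.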
Environment: Ubuntu 16.04, CUDA 10.2, cuDNN 8, 1 x NVIDIA Tesla V100 GPU
Dependencies: python > 3.5, tensorflow > 1.10.0, numpy, tqdm (pdb and codecs ship with the Python standard library)
HiCapsRKL
├── SampleData
│   ├── train.tsv
│   ├── relevance_prediction_test_data
│   │   └── test.tsv
│   └── medical_literature_retrieval_test_data
│       └── test.tsv
├── InitModel
│   └── modellink.txt
├── __init__.py
├── match_utils.py
├── modeling.py
├── optimization.py
├── tokenization.py
├── train_HiCapsRKL.py
├── f1.py
├── ranking_metrics.py
└── README.md
The training data (train.tsv), the relevance prediction test data (relevance_prediction_test_data/test.tsv), and the medical literature retrieval test data (medical_literature_retrieval_test_data/test.tsv) are randomly sampled from the corresponding full sets; they can be used to run the training and testing processes for this code.
The InitModel directory contains the BERT-Base, Chinese pre-trained model used as the initial checkpoint for training HiCapsRKL. If needed, the parameters can be downloaded from https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip .
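Before training, it may help to confirm that the unpacked checkpoint files are in place. The helper below is hypothetical (not part of the repository); the expected file names are taken from the training command's flags, plus the `.index` file that TensorFlow 1.x checkpoints are assumed to include.

```python
import os

# File names the training command below expects to find in InitModel/
# (vocab.txt and bert_config.json come from the command-line flags;
# bert_model.ckpt.index is an assumption about the TF 1.x checkpoint layout).
EXPECTED = ["vocab.txt", "bert_config.json", "bert_model.ckpt.index"]

def check_init_model(init_dir="InitModel"):
    # Return the list of missing files; an empty list means the
    # pre-trained model looks complete.
    return [f for f in EXPECTED
            if not os.path.exists(os.path.join(init_dir, f))]
```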
python train_HiCapsRKL.py --task_name=medrkg --do_train=true --data_dir=SampleData \
--vocab_file=InitModel/vocab.txt --bert_config_file=InitModel/bert_config.json \
--init_checkpoint=InitModel/bert_model.ckpt --max_seq_length=256 --train_batch_size=8 \
--learning_rate=2e-5 --num_train_epochs=10.0 --output_dir=output_dir/
* python train_HiCapsRKL.py --task_name=medrkg --do_predict=true \
--data_dir=SampleData/relevance_prediction_test_data --vocab_file=InitModel/vocab.txt \
--bert_config_file=InitModel/bert_config.json --init_checkpoint=output_dir/***.ckpt \
--output_dir=output_dir/
* python f1.py output_dir
* python train_HiCapsRKL.py --task_name=medrkg --do_predict=true \
--data_dir=SampleData/medical_literature_retrieval_test_data \
--vocab_file=InitModel/vocab.txt --bert_config_file=InitModel/bert_config.json \
--init_checkpoint=output_dir/***.ckpt --output_dir=output_dir/
* python ranking_metrics.py output_dir
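f1.py and ranking_metrics.py score the two prediction runs. As a rough illustration of the kind of metrics involved (this is not the scripts' actual I/O or definitions), binary F1 for relevance prediction and mean average precision for retrieval can be computed like this:

```python
def binary_f1(y_true, y_pred):
    # F1 over 0/1 relevance labels (illustrative; f1.py defines the real metric).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def mean_average_precision(rankings):
    # rankings: one 0/1 relevance list per query, in ranked order, e.g. [1, 0, 1].
    aps = []
    for rel in rankings:
        hits, precisions = 0, []
        for i, r in enumerate(rel, start=1):
            if r:
                hits += 1
                precisions.append(hits / i)  # precision at each relevant hit
        aps.append(sum(precisions) / max(hits, 1))
    return sum(aps) / len(aps)
```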
If you use any of the resources listed here, please cite:
@inproceedings{liu-etal-2021-leveraging,
title = "Leveraging Capsule Routing to Associate Knowledge with Medical Literature Hierarchically",
author = "Liu, Xin and
Chen, Qingcai and
Chen, Junying and
Zhou, Wenxiu and
Liu, Tingyu and
Yang, Xinlan and
Peng, Weihua",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2021",
address = "Online and Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.emnlp-main.285",
pages = "3518--3532",
abstract = "Integrating knowledge into text is a promising way to enrich text representation, especially in the medical field. However, undifferentiated knowledge not only confuses the text representation but also imports unexpected noises. In this paper, to alleviate this problem, we propose leveraging capsule routing to associate knowledge with medical literature hierarchically (called HiCapsRKL). Firstly, HiCapsRKL extracts two empirically designed text fragments from medical literature and encodes them into fragment representations respectively. Secondly, the capsule routing algorithm is applied to two fragment representations. Through the capsule computing and dynamic routing, each representation is processed into a new representation (denoted as caps-representation), and we integrate the caps-representations as information gain to associate knowledge with medical literature hierarchically. Finally, HiCapsRKL are validated on relevance prediction and medical literature retrieval test sets. The experimental results and analyses show that HiCapsRKL can more accurately associate knowledge with medical literature than mainstream methods. In summary, HiCapsRKL can efficiently help selecting the most relevant knowledge to the medical literature, which may be an alternative attempt to improve knowledge-based text representation. Source code is released on GitHub.",
}