This is the official MindSpore repository for the paper "Graph Neural Topic Model with Commonsense Knowledge".
https://www.sciencedirect.com/science/article/pii/S0306457322003168
Traditional topic models are based on the bag-of-words assumption, which states that the topic assignment of each word is independent of the others. However, this assumption ignores the relationships between words, which may hinder the quality of the extracted topics. To address this issue, some recent works formulate documents as graphs based on word co-occurrence patterns, assuming that if two words co-occur frequently, they should have the same topic. Nevertheless, this introduces noisy edges into the model and thus hinders topic quality, since frequent co-occurrence does not imply that two words belong to the same topic. In this paper, we use the commonsense relationships between words as a bridge to connect the words in each document. Compared to word co-occurrence, a commonsense relationship explicitly implies semantic relevance between words, which can be used to filter out noisy edges. We use a relational graph neural network to capture the relation information in the graph. Moreover, manifold regularization is applied to constrain the documents' topic distributions. Experimental results on a public dataset show that our method is effective at extracting topics compared to baseline methods.
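The manifold regularization mentioned in the abstract can be sketched as a penalty that pulls the topic distributions of related documents together. The snippet below is an illustrative, hypothetical implementation, not the repository's code: the function name, the affinity matrix `W`, and how neighbours are chosen are all assumptions.

```python
import numpy as np

def manifold_regularizer(theta, W, lam=0.1):
    """Illustrative manifold regularization over document-topic distributions.

    theta : (D, K) array; each row is a document's topic distribution
    W     : (D, D) symmetric affinity matrix (e.g. 1 for nearest
            neighbours under some document similarity, else 0)
    lam   : manifold coefficient (lambda in the paper's notation)
    """
    D = theta.shape[0]
    reg = 0.0
    for i in range(D):
        for j in range(D):
            if W[i, j] > 0:
                # penalize distance between topic distributions of
                # documents that the affinity graph says are related
                diff = theta[i] - theta[j]
                reg += W[i, j] * np.dot(diff, diff)
    return lam * reg
```

Adding a term of this shape to the training loss encourages nearby documents (under the chosen affinity) to receive similar topic distributions.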
======================================= How to run =============================================
[1] Setup the environment:
conda env create -f environment.yml
(optional) pip install -r requirements.txt
[2] Set ROOTPATH in the file 'settings.py' to the absolute path of this code directory.
[3] All the data files are saved in the folders 'data/reuters/' and 'data/reuters_2hop/'.
The file 'preprocess.py' in the folder 'dataPrepare/' removes stop words
and builds the vocabulary from the raw Reuters dataset.
The triples extracted from ConceptNet for each doc are saved in the files whose names
are prefixed 'all_doc_triples_'. The word pairs that have a commonsense relationship in each
doc are saved in the files whose names are prefixed 'all_doc_pairs_'.
To see how the documents are represented as graphs in the code, see the file 'graph_data_concept.py'
in the folder 'dataPrepare/'.
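The idea of turning a document into a graph via its commonsense word pairs can be sketched as follows. This is a hypothetical illustration, not the logic of 'graph_data_concept.py': the function name, the `(word, word) -> relation` dictionary, and the edge format are assumptions.

```python
def build_doc_graph(doc_tokens, commonsense_pairs):
    """Build an edge list for one document from commonsense word pairs.

    doc_tokens        : list of word tokens in the document
    commonsense_pairs : dict mapping (w1, w2) -> relation name,
                        e.g. derived from ConceptNet triples
    Returns (nodes, edges) where each edge is (src, relation, dst).
    """
    nodes = sorted(set(doc_tokens))  # unique words become graph nodes
    edges = []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            # connect two words only if a commonsense relation links them,
            # checking both orderings of the pair
            rel = commonsense_pairs.get((u, v)) or commonsense_pairs.get((v, u))
            if rel is not None:
                edges.append((u, rel, v))
    return nodes, edges
```

Only word pairs backed by a commonsense relation become edges, which is how noisy co-occurrence edges are filtered out.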
[4] Pretrain the R-GCN using the script 'run_reuters_rgcn_pretrain_xhop.sh' (x=1 or x=2):
bash run_reuters_rgcn_pretrain_1hop.sh
This step obtains the initial node embeddings for the R-GCN.
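For reference, one R-GCN message-passing step (relation-specific weights plus a self-loop, as in Schlichtkrull et al.'s formulation) can be sketched in NumPy. This is an illustrative sketch, not the repository's pretraining code; the function signature and normalization choice are assumptions.

```python
import numpy as np

def rgcn_layer(H, adj_per_rel, W_rel, W_self):
    """One R-GCN message-passing step.

    H           : (N, d_in) node features
    adj_per_rel : list of (N, N) float adjacency matrices, one per relation
    W_rel       : list of (d_in, d_out) weight matrices, one per relation
    W_self      : (d_in, d_out) self-loop weight matrix
    """
    out = H @ W_self  # self-loop contribution
    for A, W in zip(adj_per_rel, W_rel):
        # normalize messages by each node's degree under this relation
        deg = A.sum(axis=1, keepdims=True)
        norm = np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)
        out = out + norm @ H @ W
    return np.maximum(out, 0.0)  # ReLU activation
```

Stacking such layers over the commonsense document graphs yields the node embeddings that the pretraining step initializes.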
[5] Train the GCNTM-CK model with the scripts 'run_reuters_mr{}_path{}num{}{}hop.sh' according to
the desired settings. For instance, the script run_reuters_mr0.1_path50_num50_1hop.sh will train
a model with the following parameter settings:
- H (hop number) -> 1
- P (maximum number of pairs) -> 50
- R (maximum number of nearest neighbors) -> 50
- \lambda (manifold coefficient) -> 0.1
```
bash run_reuters_mr0.1_path50_num50_1hop.sh
```
We train 5 times for each topic setting and report the average results.
[6] After finishing training all the model variants, use the script 'overall_results_reuters.sh'
to obtain the final evaluation results, which will contain three topic coherence scores
(c_v, c_npmi, c_uci) and one topic diversity score (td). The results will be saved in an xlsx
file. The results for the main setting (H=1, P=100, R=100, \lambda=0.1) in the paper can be
found in the folder 'final/'.
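Of the reported metrics, topic diversity (td) has a simple standard definition: the fraction of unique words among the top-k words of all topics. The sketch below illustrates that metric; the function name and the top-k value are illustrative, not taken from 'overall_results_reuters.sh'.

```python
def topic_diversity(topics, topk=25):
    """Topic diversity: fraction of unique words among the top-k
    words across all topics. 1.0 means no topic shares any word.
    """
    words = [w for topic in topics for w in topic[:topk]]
    return len(set(words)) / len(words)
```

The coherence scores (c_v, c_npmi, c_uci) are more involved; in practice they are typically computed with a library such as gensim's CoherenceModel.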
An example log for a specific topic setting of our model can be found in the folder 'models/'.
========================================= End =====================================================
Developed by Prof. Yi Cai's group at South China University of Technology.