The repository currently includes the following models.
Models in published papers
Model | Full name | Paper |
---|---|---|
NRMS | Neural News Recommendation with Multi-Head Self-Attention | https://www.aclweb.org/anthology/D19-1671/ |
Basic setup.
git clone https://github.com/yusanshi/NewsRecommendation
cd NewsRecommendation
pip3 install -r requirements.txt
Download and preprocess the data.
mkdir data && cd data
# Download GloVe pre-trained word embedding
wget https://nlp.stanford.edu/data/glove.840B.300d.zip
sudo apt install unzip
unzip glove.840B.300d.zip -d glove
rm glove.840B.300d.zip
# Download MIND dataset
# By downloading the dataset, you agree to the [Microsoft Research License Terms](https://go.microsoft.com/fwlink/?LinkID=206977). For more detail about the dataset, see https://msnews.github.io/.
# Uncomment the following lines to use the MIND Large dataset (Note MIND Large test set doesn't have labels, see #11)
# wget https://mind201910small.blob.core.windows.net/release/MINDlarge_train.zip https://mind201910small.blob.core.windows.net/release/MINDlarge_dev.zip https://mind201910small.blob.core.windows.net/release/MINDlarge_test.zip
# unzip MINDlarge_train.zip -d train
# unzip MINDlarge_dev.zip -d val
# unzip MINDlarge_test.zip -d test
# rm MINDlarge_*.zip
# The following lines download the MIND Small dataset (note that MIND Small doesn't have a test set, so we just copy the validation set as the test set)
wget https://mind201910small.blob.core.windows.net/release/MINDsmall_train.zip https://mind201910small.blob.core.windows.net/release/MINDsmall_dev.zip
unzip MINDsmall_train.zip -d train
unzip MINDsmall_dev.zip -d val
cp -r val test # MIND Small has no test set :)
rm MINDsmall_*.zip
# Preprocess data into appropriate format
cd ..
python3 src/data_preprocess.py
# Remember to update `num_*` in `src/config.py` according to the output of `src/data_preprocess.py`
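For orientation, here is a minimal sketch of how one line of a MIND `behaviors.tsv` file can be read. It relies on the dataset's documented layout (five tab-separated fields: impression ID, user ID, time, click history, impressions) and is not taken from `src/data_preprocess.py`; the names `Behavior` and `parse_behavior_line` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Behavior(NamedTuple):
    impression_id: str
    user_id: str
    time: str
    history: List[str]                             # previously clicked news IDs
    impressions: List[Tuple[str, Optional[int]]]   # (news ID, click label or None)


def parse_behavior_line(line: str) -> Behavior:
    """Split one tab-separated behaviors.tsv line into typed fields."""
    impression_id, user_id, time, history, impressions = line.rstrip("\n").split("\t")
    candidates = []
    for item in impressions.split():
        news_id, sep, label = item.rpartition("-")
        if sep and label in ("0", "1"):
            candidates.append((news_id, int(label)))
        else:
            # Unlabeled candidate, as in the MIND Large test set
            candidates.append((item, None))
    return Behavior(impression_id, user_id, time, history.split(), candidates)
```

An empty history field yields an empty list, and unlabeled impressions keep the news ID with a `None` label.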
Modify `src/config.py` to select the target model. The configuration file is organized into a general part (which applies to all models) and a model-specific part (which some models do not have).
vim src/config.py
Run.
# Train and save checkpoint into `checkpoint/{model_name}/` directory
python3 src/train.py
# Load latest checkpoint and evaluate on the test set
python3 src/evaluate.py
You can visualize metrics with TensorBoard.
tensorboard --logdir=runs
# or
tensorboard --logdir=runs/{model_name}
# for a specific model
Tip: by setting the `REMARK` environment variable, you can make the run names in TensorBoard more meaningful. For example, `REMARK=num-filters-300-window-size-5 python3 src/train.py`.
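As an illustration of the tip, the following sketch shows one way a training script could fold `REMARK` into its TensorBoard log directory. The helper `run_log_dir` and the timestamp-based naming scheme are assumptions made for this example; the actual naming in `src/train.py` may differ.

```python
import os
from datetime import datetime


def run_log_dir(model_name, root="runs", now=None):
    """Build a log directory such as runs/NRMS/2020-01-02T03-04-05-my-remark."""
    remark = os.environ.get("REMARK")  # optional free-text tag for the run
    stamp = (now or datetime.now()).strftime("%Y-%m-%dT%H-%M-%S")
    suffix = f"-{remark}" if remark else ""
    return os.path.join(root, model_name, f"{stamp}{suffix}")
```

The returned path could then be passed as the log directory of a TensorBoard writer, so that each run under `runs/{model_name}` carries its remark in the name.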
Model | AUC | MRR | nDCG@5 | nDCG@10 | Remark |
---|---|---|---|---|---|
baseline | 0.6253 | 0.2823 | 0.3051 | 0.3731 | |
+SGD | 0.5188 | 0.2148 | 0.2250 | 0.2905 | |
+AdamW | 0.6298 | 0.2841 | 0.3091 | 0.3765 |
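The result tables in this README report AUC, MRR, nDCG@5 and nDCG@10. The sketch below computes these metrics for a single impression using definitions common in MIND evaluation code, with MRR averaged over the clicked items. Whether `src/evaluate.py` uses exactly these definitions is an assumption, and the function names are illustrative.

```python
import math


def auc(labels, scores):
    """Fraction of (clicked, non-clicked) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one clicked and one non-clicked item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _ranked_labels(labels, scores):
    """Labels reordered by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def mrr(labels, scores):
    """Mean reciprocal rank over the clicked items of one impression."""
    ranked = _ranked_labels(labels, scores)
    reciprocal = [1.0 / rank for rank, y in enumerate(ranked, start=1) if y == 1]
    return sum(reciprocal) / len(reciprocal)


def dcg(ranked, k):
    """Discounted cumulative gain of the top-k labels."""
    return sum((2 ** y - 1) / math.log2(rank + 1)
               for rank, y in enumerate(ranked[:k], start=1))


def ndcg(labels, scores, k):
    """DCG@k divided by the DCG@k of the ideal ordering."""
    ideal = dcg(sorted(labels, reverse=True), k)
    return dcg(_ranked_labels(labels, scores), k) / ideal
```

A dataset-level number is then typically the mean of the per-impression values.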
Model | AUC | MRR | nDCG@5 | nDCG@10 | Remark |
---|---|---|---|---|---|
baseline | 0.6253 | 0.2823 | 0.3051 | 0.3731 | |
+BN | 0.5252 | 0.2476 | 0.2565 | 0.3181 | |
+GN | 0.6323 | 0.2884 | 0.3122 | 0.3795 | |
+IN | 0.6321 | 0.2847 | 0.3101 | 0.3785 | |
+LN | 0.6404 | 0.2905 | 0.3172 | 0.3835 |
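The row labels above presumably stand for batch (BN), group (GN), instance (IN) and layer (LN) normalization; the README does not say where in the model they are applied. As a rough illustration of how the two extremes differ, this dependency-free sketch normalizes a small batch of feature vectors per sample (layer norm) and per feature (batch norm). It omits the learnable scale and shift and the running statistics that framework implementations usually add.

```python
import math


def _normalize(values, eps=1e-5):
    """Shift to zero mean and scale to (approximately) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def layer_norm(batch, eps=1e-5):
    """Normalize each sample over its own features (row-wise)."""
    return [_normalize(row, eps) for row in batch]


def batch_norm(batch, eps=1e-5):
    """Normalize each feature over the samples in the batch (column-wise)."""
    columns = [_normalize(list(col), eps) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]
```

Layer normalization does not depend on the other samples in the batch, whereas batch normalization does.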
Model | AUC | MRR | nDCG@5 | nDCG@10 | Remark |
---|---|---|---|---|---|
baseline | 0.6253 | 0.2823 | 0.3051 | 0.3731 | |
+LN +AdamW + Cosine decay | 0.6421 | 0.2960 | 0.3239 | 0.3890 |
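For reference, cosine decay in its usual closed form anneals the learning rate from a starting value to a minimum along half a cosine period. The sketch below is a generic version of that formula; the function name, the `min_lr` default, and the absence of warmup are assumptions, since the README does not state the schedule's parameters.

```python
import math


def cosine_decay_lr(step, total_steps, base_lr, min_lr=0.0):
    """Learning rate at `step`, annealed from base_lr to min_lr over total_steps."""
    step = min(max(step, 0), total_steps)  # clamp to the schedule's range
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cosine
```

The rate starts at `base_lr`, passes the midpoint between `base_lr` and `min_lr` halfway through, and stays at `min_lr` once `total_steps` is reached.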
cd ..
python3 src/web.py
@misc{yusanshi2020news-recommendation,
title={news-recommendation},
author={yusanshi},
publisher = {GitHub},
journal = {GitHub repository},
howpublished={\url{https://github.com/yusanshi/news-recommendation}},
year={2020}
}