Two parameters were added to make the code easier to use:

parser.add_argument("--loss_type", type=int, default=1, help="1:CPR, 2:logcpr, 3:mlpcpr")
parser.add_argument("--eval_type", type=int, default=1, help="1:CPR, 2:logcpr, 3:mlpcpr")

loss_type controls how the loss is computed: 1 is the original CPR; 2 uses $\beta=sim_{u,i}\cdot\log_{2}(num\_item/(1+n_{i}))$; 3 uses $\beta=mlp(sim)$; 4 multiplies the WMF pretrained vectors to obtain a score and feeds it directly into the MLP layer to compute $\beta$.
eval_type works the same way, but for evaluation.
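The loss_type=2 weighting can be sketched in NumPy; the function name log_beta and its argument names are illustrative, not from the repo:

```python
import numpy as np

def log_beta(sim, n_i, num_item):
    """Per-interaction weight for loss_type=2:
    beta = sim * log2(num_item / (1 + n_i)).

    sim      -- user-item similarity score(s)
    n_i      -- interaction count (popularity) of each item
    num_item -- total number of items in the dataset
    """
    sim = np.asarray(sim, dtype=np.float64)
    n_i = np.asarray(n_i, dtype=np.float64)
    return sim * np.log2(num_item / (1.0 + n_i))

# A popular item (large n_i) is down-weighted relative to a rare one:
# with num_item = 128, n_i = 3 gives log2(32) = 5, n_i = 63 gives log2(2) = 1.
beta = log_beta(sim=[1.0, 1.0], n_i=[3, 63], num_item=128)
```

Note the weight turns negative once 1 + n_i exceeds num_item, so this form implicitly assumes item popularity stays well below the catalogue size.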
The code here is somewhat messy. All three cases have been tested and run end to end; please point out any logic errors.
Added the tiktok and mind2 datasets. Because the mind2 data processing has some issues, running it requires manually modifying the code in the Data class that computes n_user and n_item (simply hardcode n_user and n_item).
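The manual workaround for mind2 amounts to replacing the inferred counts with known constants; a minimal sketch of the idea (the class body and the constants shown are illustrative, not the repo's actual Data class):

```python
class Data:
    def __init__(self, train_pairs, n_user=None, n_item=None):
        # Default behaviour: infer counts from the largest id seen in the data.
        # For mind2 this inference misbehaves, so allow passing the true sizes.
        if n_user is None:
            n_user = max(u for u, _ in train_pairs) + 1
        if n_item is None:
            n_item = max(i for _, i in train_pairs) + 1
        self.n_user = n_user
        self.n_item = n_item

# For mind2, hardcode the known dataset sizes instead of inferring them
# (the numbers below are placeholders, not the real mind2 sizes):
data = Data([(0, 0), (1, 2)], n_user=50000, n_item=30000)
```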
# self.all_embeds_0 = tf.get_variable(
#     "all_embeds_0",
#     shape=[self.dataset.n_user + self.dataset.n_item, args.embed_size],
#     initializer=self.initializer,
# )
# Initialize with the pretrained WMF vectors; enable as needed.
self.all_embeds_0 = tf.get_variable(
    "all_embeds_0",
    shape=[self.dataset.n_user + self.dataset.n_item, args.embed_size],
    initializer=tf.constant_initializer(
        np.concatenate([self.user_pretrain, self.item_pretrain], axis=0)
    ),
)
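The pretrained matrices must be row-aligned with the embedding table: all user rows first, then all item rows. A small NumPy sanity check of that layout, with dummy matrices standing in for the real WMF vectors:

```python
import numpy as np

n_user, n_item, embed_size = 4, 6, 8  # dummy sizes for illustration

# Stand-ins for the WMF pretrained vectors loaded elsewhere.
user_pretrain = np.random.randn(n_user, embed_size)
item_pretrain = np.random.randn(n_item, embed_size)

# Same concatenation the initializer above performs: users first, then items.
all_embeds_0 = np.concatenate([user_pretrain, item_pretrain], axis=0)

assert all_embeds_0.shape == (n_user + n_item, embed_size)
# Row u is user u; row n_user + i is item i.
assert np.array_equal(all_embeds_0[n_user], item_pretrain[0])
```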
TensorFlow implementation of our paper "Cross Pairwise Ranking for Unbiased Item Recommendation" (WWW'22).
If you want to use our codes and datasets in your research, please cite:
@inproceedings{cpr22,
title={Cross Pairwise Ranking for Unbiased Item Recommendation},
author={Wan, Qi and He, Xiangnan and Wang, Xiang and Wu, Jiancan and Guo, Wei and Tang, Ruiming},
booktitle={Proceedings of the ACM Web Conference 2022},
pages={2370--2378},
year={2022}
}
Make sure GCC is installed in your environment.
The preprocessed data have already been placed in the data/ folder. See data_preprocess.py if you want to know how they were generated from the original data.
The samplers and evaluators in our codes are mainly implemented as extension modules in Cython and C++, which are much faster than a pure Python implementation. Run the following command to compile all the extension modules.
python setup.py build_ext --inplace
You can safely ignore the warnings in the output of this command.
Here we list some commands that reproduce the results presented in our paper.
The following commands can reproduce the results of CPR on 4 backbones (MF, LightGCN, NeuMF, NGCF) and 2 datasets (MovieLens, Netflix).
(MF is equivalent to 0-layer LightGCN.)
python CPR.py --dataset movielens_10m --lr 0.0001 --reg 0.001 --ks 20 --batch_size 2048 --n_layer 0 --embed_type lightgcn --sample_rate 3 --sample_ratio 3 --eval_types valid test --eval_epoch 4 --early_stop 10
python CPR.py --dataset ml_10m_10he --lr 0.0001 --reg 0.001 --ks 20 --batch_size 2048 --n_layer 0 --embed_type lightgcn --sample_rate 3 --sample_ratio 3 --eval_types valid test --eval_epoch 4 --early_stop 10 --loss_type 1 --eval_type 1
python CPR.py --dataset netflix --lr 0.0001 --reg 0.001 --ks 20 --batch_size 2048 --n_layer 0 --embed_type lightgcn --sample_rate 3 --sample_ratio 2 --eval_types valid test --eval_epoch 4 --early_stop 10
python CPR.py --dataset movielens_10m --lr 0.001 --reg 0.01 --ks 20 --batch_size 2048 --n_layer 3 --embed_type lightgcn --sample_rate 8 --sample_ratio 7 --eval_types valid test --eval_epoch 1 --early_stop 10
python CPR.py --dataset netflix --lr 0.001 --reg 0.0001 --ks 20 --batch_size 2048 --n_layer 3 --embed_type lightgcn --sample_rate 7 --sample_ratio 4 --eval_types valid test --eval_epoch 1 --early_stop 10
python CPR.py --dataset movielens_10m --lr 0.001 --reg 0.0001 --weight_reg 0.01 --weight_sizes 256 --ks 20 --batch_size 2048 --eval_batch_size 16 --n_layer 0 --embed_type lightgcn --inference_type mlp --sample_rate 1 --sample_ratio 4 --eval_types valid test --eval_epoch 10 --early_stop 5
python CPR.py --dataset netflix --lr 0.001 --reg 0.0001 --weight_reg 0.01 --weight_sizes 256 --ks 20 --batch_size 2048 --eval_batch_size 16 --n_layer 0 --embed_type lightgcn --inference_type mlp --sample_rate 1 --sample_ratio 6 --eval_types valid test --eval_epoch 10 --early_stop 5
python CPR.py --dataset movielens_10m --lr 0.001 --reg 0.01 --weight_reg 0 --ks 20 --batch_size 2048 --n_layer 3 --embed_type ngcf --sample_rate 1.5 --sample_ratio 5 --eval_types valid test --eval_epoch 1 --early_stop 10
python CPR.py --dataset netflix --lr 0.0001 --reg 0.001 --weight_reg 0.01 --ks 20 --batch_size 2048 --n_layer 3 --embed_type ngcf --sample_rate 3 --sample_ratio 4 --eval_types valid test --eval_epoch 4 --early_stop 10
We also implement some of the baselines. The following commands can reproduce their results on MF.
python BPR.py --dataset movielens_10m --lr 0.0001 --reg 0 --ks 20 --batch_size 2048 --n_layer 0 --embed_type lightgcn --eval_types valid test --eval_epoch 4 --early_stop 10
python BPR.py --dataset netflix --lr 0.0001 --reg 0.00001 --ks 20 --batch_size 2048 --n_layer 0 --embed_type lightgcn --eval_types valid test --eval_epoch 4 --early_stop 10
python UBPR.py --dataset movielens_10m --lr 0.0001 --reg 0.001 --ks 20 --batch_size 2048 --n_layer 0 --embed_type lightgcn --ps_pow 0.8 --clip 0 --eval_types valid test --eval_epoch 4 --early_stop 10
python UBPR.py --dataset netflix --lr 0.0001 --reg 0.0001 --ks 20 --batch_size 2048 --n_layer 0 --embed_type lightgcn --ps_pow 0.7 --clip 0 --eval_types valid test --eval_epoch 4 --early_stop 10
python DICE.py --dataset movielens_10m --lr 0.0001 --reg 0.01 --ks 20 --batch_size 2048 --n_layer 0 --embed_type lightgcn --int_weight 9 --pop_weight 9 --dis_pen 0.0001 --margin 10 --margin_decay 0.9 --loss_decay 0.9 --eval_types valid test --eval_epoch 4 --early_stop 10
python DICE.py --dataset netflix --lr 0.0001 --reg 0.01 --ks 20 --batch_size 2048 --n_layer 0 --embed_type lightgcn --int_weight 9 --pop_weight 9 --dis_pen 0.0001 --margin 40 --margin_decay 0.9 --loss_decay 0.9 --eval_types valid test --eval_epoch 4 --early_stop 10
Here is an example of the output log. It is the output of the following command:
python CPR.py --dataset movielens_10m --lr 0.0001 --reg 0.001 --ks 20 --batch_size 2048 --n_layer 0 --embed_type lightgcn --sample_rate 3 --sample_ratio 3 --eval_types valid test --eval_epoch 4 --early_stop 10
...
Epoch 1 : 3.51854 s | loss = 0.69320 = 0.69319 + 0.00000
Epoch 2 : 2.98906 s | loss = 0.69284 = 0.69283 + 0.00000
Epoch 3 : 2.98795 s | loss = 0.69080 = 0.69079 + 0.00001
Epoch 4 : 3.20985 s | loss = 0.68585 = 0.68583 + 0.00002
============================================================================================================================================
[ valid set ]
---- Item ----
Recall @20 : 0.16620
Precision @20 : 0.02216
NDCG @20 : 0.08712
Rec @20 : 20.00000
ARP @20 :1530.23474
[ test set ]
---- Item ----
Recall @20 : 0.17162
Precision @20 : 0.03469
NDCG @20 : 0.10235
ARP @20 :1588.61609
Evaluation : 1.39915 s
============================================================================================================================================
...
Early stopping triggered.
Best epoch: 76.
[ test set ]
---- Item ----
Recall @20 : 0.20007
Precision @20 : 0.04061
NDCG @20 : 0.12209
ARP @20 :1071.91736
============================================================================================================================================