MS MARCO Document Ranking

MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset. It currently contains 1,010,916 unique real queries that were generated by sampling and anonymizing Bing usage logs. The corpus for the document ranking task has 3.2 million documents, and the training set has 367,013 queries. More details are available at MSMARCO Document Ranking.

Results

Results of the runs we submitted, measured in MRR@100.

Retriever	Reranker	Coor-Ascent	dev	eval
ANCE FirstP	-	-	0.373	0.334
ANCE MaxP	-	-	0.383	0.342
ANCE FirstP+BM25	BERT Base FirstP	-	0.407	-
ANCE FirstP+BM25	BERT Base FirstP	+	0.431	0.380
ANCE MaxP	BERT Base MaxP	-	0.409	-
ANCE MaxP	BERT Base MaxP	+	0.432	0.391

Datasets & Checkpoints

For BERT FirstP, we concatenate the title and content of each document with a '[SEP]' token. For BERT MaxP, we use only the content of each document. To reproduce our runs, you need to preprocess the official document file into the format: doc_id \t doc.
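The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the script used for the submitted runs: the function names `format_doc` and `convert` are hypothetical, and the whitespace normalization is an assumption made so that the output stays one document per line.

```python
from typing import Iterable, Iterator, Sequence


def format_doc(title: str, body: str, mode: str) -> str:
    """Build the document text for 'firstp' (title [SEP] content) or 'maxp' (content only)."""
    # Assumption: collapse whitespace so the text contains no tabs or newlines.
    title = " ".join(title.split())
    body = " ".join(body.split())
    if mode == "firstp":
        return f"{title} [SEP] {body}"
    if mode == "maxp":
        return body
    raise ValueError(f"unknown mode: {mode}")


def convert(rows: Iterable[Sequence[str]], mode: str) -> Iterator[str]:
    """Turn (docid, url, title, body) rows into 'doc_id \\t doc' lines."""
    for row in rows:
        if len(row) != 4:
            continue  # skip malformed rows
        docid, _url, title, body = row
        yield f"{docid}\t{format_doc(title, body, mode)}"


# Usage sketch (the input path is a placeholder):
# with open("data/msmarco-docs.tsv", encoding="utf-8") as src:
#     rows = (line.rstrip("\n").split("\t") for line in src)
#     for out_line in convert(rows, "firstp"):
#         print(out_line)
```

Running it once with `"firstp"` and once with `"maxp"` would give the two document files referred to later as msmarco-docs-firstp.tsv and msmarco-docs-maxp.tsv.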

Type	File	Records	Format	Description
Corpus	msmarco-docs.tsv	3,213,835	tsv: docid, url, title, body	Document Collections
Train	msmarco-doctrain-queries.tsv	367,013	tsv: qid, query	Training Queries
Train	msmarco-doctrain-qrels.tsv	384,597	TREC qrels	Training Query-Doc Relevance Labels
Train	Training-Data-FirstP	7,340,240	tsv: qid, docid, label	ANCE FirstP training data
Train	Training-Data-MaxP	7,340,240	tsv: qid, docid, label	ANCE MaxP training data
Dev	msmarco-docdev-queries.tsv	5,193	tsv: qid, query	Dev Queries
Dev	msmarco-docdev-qrels.tsv	5,478	TREC qrels	Dev Query-Doc Relevance Labels
Dev	ANCE-FirstP-dev-top100	519,300	TREC submission	ANCE FirstP dev top100
Dev	ANCE-MaxP-dev-top100	519,300	TREC submission	ANCE MaxP dev top100
Test	docleaderboard-queries.tsv	5,793	tsv: qid, query	Test Queries
Test	ANCE-FirstP-eval-top100	579,300	TREC submission	ANCE FirstP eval top100
Test	ANCE-MaxP-eval-top100	579,300	TREC submission	ANCE MaxP eval top100
Model	BERT-Base-ANCE-FirstP	-	-	BERT Base ANCE FirstP checkpoint
Model	BERT-Base-ANCE-MaxP	-	-	BERT Base ANCE MaxP checkpoint
Model	F-MaxP	-	-	BERT Base ANCE MaxP Coor-Ascent weights
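For readers unfamiliar with the two TREC formats listed in the table, the sketch below parses them into plain dictionaries. It assumes the usual whitespace-separated layouts (`qid iter docid rel` for qrels and `qid Q0 docid rank score tag` for a submission); the helper names are hypothetical and not part of this repository.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def parse_qrels(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each qid to the set of docids judged relevant (rel > 0)."""
    qrels: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _iteration, docid, rel = parts
        if int(rel) > 0:
            qrels[qid].add(docid)
    return dict(qrels)


def parse_run(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each qid to its docids, ordered by descending score."""
    scored = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _q0, docid, _rank, score, _tag = parts
        scored[qid].append((float(score), docid))
    return {
        qid: [docid for _, docid in sorted(items, key=lambda item: -item[0])]
        for qid, items in scored.items()
    }
```

As a sanity check, a top-100 run over the 5,193 dev queries should contain 5,193 × 100 = 519,300 lines, which matches the record count in the table.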

Inference

BERT FirstP

We provide the ANCE FirstP top-100 documents for the dev and docleaderboard queries on Aliyun in standard TREC format (ANCE-FirstP-dev-top100 and ANCE-FirstP-eval-top100 in the table above).

Preprocess the dev and eval datasets. Here msmarco-docs-firstp.tsv is the preprocessed document file, in which each line is doc_id \t title [SEP] content:

python data/preprocess.py -input_trec data/ANCE_FirstP_dev.trec -input_qrels data/msmarco-docdev-qrels.tsv -input_queries data/msmarco-docdev-queries.tsv -input_docs data/msmarco-docs-firstp.tsv -output data/msmarco-doc_dev_firstp.jsonl
python data/preprocess.py -input_trec data/ANCE_FirstP_eval.trec -input_queries data/docleaderboard-queries.tsv -input_docs data/msmarco-docs-firstp.tsv -output data/msmarco-doc_eval_firstp.jsonl

The BERT Base FirstP checkpoint is available at BERT-Base-ANCE-FirstP. With it, you can reproduce the ANCE FirstP + BERT Base FirstP run, MRR@100 (dev): 0.4079.

CUDA_VISIBLE_DEVICES=0 \
python inference.py \
        -task classification \
        -model bert \
        -max_input 12800000 \
        -test ./data/msmarco-doc_dev_firstp.jsonl \
        -vocab bert-base-uncased \
        -pretrain bert-base-uncased \
        -checkpoint ./checkpoints/bert-base_ance_firstp.bin \
        -res ./results/bert-base_ance_dev_firstp.trec \
        -max_query_len 64 \
        -max_doc_len 445 \
        -batch_size 256
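The MRR@100 figure quoted above is the mean, over queries, of the reciprocal rank of the first relevant document within the top 100. A minimal sketch of that computation follows; it assumes the run and qrels are already loaded as dictionaries, and the official evaluation script may differ in details such as how queries without judgments are handled.

```python
from typing import Dict, List, Set


def mrr_at_k(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 100) -> float:
    """Mean reciprocal rank of the first relevant document in the top k.

    run:   qid -> docids ordered from best to worst
    qrels: qid -> set of relevant docids
    Queries with no relevant document in the top k contribute 0.
    """
    if not qrels:
        return 0.0
    total = 0.0
    for qid, relevant in qrels.items():
        for rank, docid in enumerate(run.get(qid, [])[:k], start=1):
            if docid in relevant:
                total += 1.0 / rank
                break
    return total / len(qrels)


example_run = {"q1": ["a", "b"], "q2": ["c", "d"]}
example_qrels = {"q1": {"b"}, "q2": {"x"}}
print(mrr_at_k(example_run, example_qrels))  # (1/2 + 0) / 2 = 0.25
```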

BERT MaxP

ANCE MaxP top-100 documents of dev and docleaderboard queries are also provided.

Preprocess the dev and eval datasets. Here msmarco-docs-maxp.tsv is the preprocessed document file, in which each line is doc_id \t content:

python data/preprocess.py -input_trec data/ANCE_MaxP_dev.trec -input_qrels data/msmarco-docdev-qrels.tsv -input_queries data/msmarco-docdev-queries.tsv -input_docs data/msmarco-docs-maxp.tsv -output data/msmarco-doc_dev_maxp.jsonl
python data/preprocess.py -input_trec data/ANCE_MaxP_eval.trec -input_queries data/docleaderboard-queries.tsv -input_docs data/msmarco-docs-maxp.tsv -output data/msmarco-doc_eval_maxp.jsonl

The BERT Base MaxP checkpoint is available at BERT-Base-ANCE-MaxP. With it, you can reproduce the ANCE MaxP + BERT Base MaxP run, MRR@100 (dev): 0.4094.

CUDA_VISIBLE_DEVICES=0 \
python inference.py \
        -task classification \
        -model bert \
        -max_input 12800000 \
        -test ./data/msmarco-doc_dev_maxp.jsonl \
        -vocab bert-base-uncased \
        -pretrain bert-base-uncased \
        -checkpoint ./checkpoints/bert-base_ance_maxp.bin \
        -res ./results/bert-base_ance_dev_maxp.trec \
        -max_query_len 64 \
        -max_doc_len 445 \
        -maxp \
        -batch_size 64
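The difference between FirstP and MaxP can be illustrated with a toy scorer: FirstP scores only the first passage of a document, while MaxP scores every passage and keeps the maximum. The sketch below shows only this general idea. How `inference.py` actually segments documents (passage length, overlap, number of passages) is not stated here, so the non-overlapping split and the function names are assumptions.

```python
from typing import Callable, List, Sequence

ScoreFn = Callable[[List[str]], float]


def split_passages(tokens: Sequence[str], passage_len: int) -> List[List[str]]:
    """Cut a token sequence into consecutive, non-overlapping passages."""
    if passage_len <= 0:
        raise ValueError("passage_len must be positive")
    return [list(tokens[i:i + passage_len]) for i in range(0, len(tokens), passage_len)]


def firstp_score(tokens: Sequence[str], passage_len: int, score_fn: ScoreFn) -> float:
    """FirstP: score only the first passage of the document."""
    return score_fn(list(tokens[:passage_len]))


def maxp_score(tokens: Sequence[str], passage_len: int, score_fn: ScoreFn) -> float:
    """MaxP: score every passage and keep the best one."""
    passages = split_passages(tokens, passage_len) or [[]]
    return max(score_fn(passage) for passage in passages)


# Toy relevance function standing in for the BERT reranker: count query-term hits.
def count_hits(passage: List[str]) -> float:
    return float(passage.count("ranking"))


doc = ["intro", "text", "about", "ranking", "and", "more", "ranking", "ranking"]
print(firstp_score(doc, 4, count_hits))  # 1.0, only the first 4 tokens are seen
print(maxp_score(doc, 4, count_hits))    # 2.0, the second passage scores higher
```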

We also provide the weights of the BERT Base MaxP features learned by Coor-Ascent: F-MaxP. First, generate the BERT Base MaxP features for the eval dataset.

CUDA_VISIBLE_DEVICES=0 \
python gen_feature.py \
        -task classification \
        -model bert \
        -max_input 12800000 \
        -dev ./data/msmarco-doc_eval_maxp.jsonl \
        -vocab bert-base-uncased \
        -pretrain bert-base-uncased \
        -checkpoint ./checkpoints/bert-base_ance_maxp.bin \
        -res ./features/bert-base_ance_eval_maxp_features \
        -max_query_len 64 \
        -max_doc_len 445 \
        -maxp \
        -batch_size 64

Then, compute the ranking scores using these weights.

java -jar LeToR/RankLib-2.1-patched.jar -load checkpoints/f_maxp.ca -rank features/bert-base_ance_eval_maxp_features -score f0.score
python LeToR/gen_trec.py -dev data/msmarco-doc_eval_maxp.jsonl -res results/bert-base_ance_eval_maxp_ca.trec -k -1
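A Coordinate Ascent model in RankLib is a linear model, so scoring a document amounts to a weighted sum of its features. The sketch below shows that step on a single feature line. It assumes the feature file uses the LETOR/SVMlight-style layout that RankLib reads (`label qid:<id> 1:<v> 2:<v> ... # comment`); which features `gen_feature.py` writes is not specified here, and the weights in the example are made up.

```python
from typing import Dict, Tuple


def parse_letor_line(line: str) -> Tuple[float, str, Dict[int, float], str]:
    """Parse 'label qid:<id> 1:<v> 2:<v> ... # comment' into its parts."""
    body, _, comment = line.partition("#")
    parts = body.split()
    label = float(parts[0])
    qid = parts[1].split(":", 1)[1]
    features: Dict[int, float] = {}
    for item in parts[2:]:
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, qid, features, comment.strip()


def linear_score(features: Dict[int, float], weights: Dict[int, float]) -> float:
    """Weighted sum of features; features without a weight count as 0."""
    return sum(weights.get(index, 0.0) * value for index, value in features.items())


# Hypothetical feature line and weights, for illustration only.
label, qid, features, doc = parse_letor_line("1 qid:12 1:0.5 2:2.0 # D7")
print(qid, doc, linear_score(features, {1: 0.4, 2: 0.6}))
```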

Training

You can also finetune BERT yourself instead of using our checkpoints.

BERT FirstP

We provide our training data (qid did label): Training-Data-FirstP. Ten negative documents are randomly sampled for each training query from the ANCE FirstP top-100 documents. Since the dev dataset is too large to evaluate every 10,000 steps, we evaluate only the top-100 documents of the first 50 dev queries: msmarco-doc_dev_firstp-50.jsonl.

CUDA_VISIBLE_DEVICES=0 \
python train.py \
        -task classification \
        -model bert \
        -train queries=./data/msmarco-doctrain-queries.tsv,docs=./data/msmarco-docs-firstp.tsv,qrels=./data/msmarco-doctrain-qrels.tsv,trec=./data/bids_marco-doc_ance-firstp-10.tsv \
        -max_input 12800000 \
        -save ./checkpoints/bert-base-firstp.bin \
        -dev ./data/msmarco-doc_dev_firstp-50.jsonl \
        -qrels ./data/msmarco-docdev-qrels.tsv \
        -vocab bert-base-uncased \
        -pretrain bert-base-uncased \
        -res ./results/bert.trec \
        -metric mrr_cut_100 \
        -max_query_len 64 \
        -max_doc_len 445 \
        -epoch 1 \
        -batch_size 4 \
        -lr 3e-6 \
        -n_warmup_steps 100000 \
        -eval_every 10000
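The negative sampling used to build the training data can be sketched as follows. This is an illustration under stated assumptions rather than the script that produced Training-Data-FirstP: it assumes that positives come from the training qrels, that they are excluded from the candidate pool, and that labels are 1 for positives and 0 for negatives. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple


def sample_negatives(ranked_docids: Sequence[str], positives: Set[str],
                     n: int = 10, seed: int = 0) -> List[str]:
    """Randomly pick up to n non-relevant docids from a query's top-ranked list."""
    rng = random.Random(seed)
    candidates = [docid for docid in ranked_docids if docid not in positives]
    return rng.sample(candidates, min(n, len(candidates)))


def build_examples(qid: str, ranked_docids: Sequence[str], positives: Set[str],
                   n: int = 10, seed: int = 0) -> List[Tuple[str, str, int]]:
    """Return (qid, docid, label) rows: positives first, then sampled negatives."""
    rows = [(qid, docid, 1) for docid in sorted(positives)]
    rows += [(qid, docid, 0) for docid in sample_negatives(ranked_docids, positives, n, seed)]
    return rows


top100 = [f"D{i}" for i in range(100)]  # stand-in for an ANCE top-100 list
for row in build_examples("q1", top100, {"D3"}):
    print("\t".join(map(str, row)))
```

Fixing the seed keeps the sampled set reproducible across runs.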

After BERT finetuning, we choose the best checkpoint on the dev dataset to generate BERT features.

CUDA_VISIBLE_DEVICES=0 \
python gen_feature.py \
        -task classification \
        -model bert \
        -max_input 12800000 \
        -dev ./data/msmarco-doc_dev_firstp.jsonl \
        -vocab bert-base-uncased \
        -pretrain bert-base-uncased \
        -checkpoint ./checkpoints/bert-base-firstp.bin \
        -res ./features/bert-base_ance_dev_firstp_features \
        -max_query_len 64 \
        -max_doc_len 445 \
        -batch_size 256

Then, we run Coor-Ascent on these features using RankLib to learn the weight of each feature.

java -jar LeToR/RankLib-2.1-patched.jar -train features/bert-base_ance_dev_firstp_features -ranker 4 -metric2t RR@100 -save checkpoints/f_firstp.ca
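To give an intuition for what this step optimizes, here is a deliberately simplified coordinate ascent over linear feature weights with mean reciprocal rank as the objective. It is not RankLib's implementation, which adds random restarts, weight normalization, and its own step schedule; the step sizes, the data layout, and the function names below are all assumptions.

```python
from typing import List, Sequence, Tuple

# One query: a list of (feature_vector, is_relevant) candidates.
Query = List[Tuple[List[float], bool]]


def reciprocal_rank(query: Query, weights: Sequence[float]) -> float:
    """1 / rank of the first relevant candidate under a linear scoring function."""
    ranked = sorted(query, key=lambda c: -sum(w * f for w, f in zip(weights, c[0])))
    for rank, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0


def mean_rr(queries: List[Query], weights: Sequence[float]) -> float:
    return sum(reciprocal_rank(q, weights) for q in queries) / len(queries)


def coordinate_ascent(queries: List[Query], n_features: int,
                      steps: Sequence[float] = (0.5, 0.1, 0.01),
                      rounds: int = 5) -> Tuple[List[float], float]:
    """Change one weight at a time, keeping any change that improves mean RR."""
    weights = [1.0 / n_features] * n_features
    best = mean_rr(queries, weights)
    for _ in range(rounds):
        improved = False
        for i in range(n_features):
            for step in steps:
                for delta in (step, -step):
                    trial = list(weights)
                    trial[i] += delta
                    score = mean_rr(queries, trial)
                    if score > best:
                        weights, best, improved = trial, score, True
        if not improved:
            break  # no single-weight change helps any more
    return weights, best


# Toy data: feature 0 is misleading, feature 1 is informative.
toy = [
    [([1.0, 0.0], False), ([0.0, 0.8], True)],
    [([0.9, 0.1], False), ([0.1, 0.7], True)],
]
learned, score = coordinate_ascent(toy, n_features=2)
print(learned, score)
```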

Finally, generate the features of the eval dataset and compute the ranking scores with the learned feature weights, following the same steps as in the Inference section.

BERT MaxP

We provide our training data (qid did label): Training-Data-MaxP. Ten negative documents are randomly sampled for each training query from the ANCE MaxP top-100 documents. Since the dev dataset is too large to evaluate every 10,000 steps, we evaluate only the top-100 documents of the first 50 dev queries: msmarco-doc_dev_maxp-50.jsonl.

Train.

CUDA_VISIBLE_DEVICES=0,1,2,3 \
python train.py \
        -task classification \
        -model bert \
        -train queries=./data/msmarco-doctrain-queries.tsv,docs=./data/msmarco-docs-maxp.tsv,qrels=./data/msmarco-doctrain-qrels.tsv,trec=./data/bids_marco-doc_ance-maxp-10.tsv \
        -max_input 12800000 \
        -save ./checkpoints/bert-base-maxp.bin \
        -dev ./data/msmarco-doc_dev_maxp-50.jsonl \
        -qrels ./data/msmarco-docdev-qrels.tsv \
        -vocab bert-base-uncased \
        -pretrain bert-base-uncased \
        -res ./results/bert.trec \
        -metric mrr_cut_100 \
        -max_query_len 64 \
        -max_doc_len 445 \
        -maxp \
        -epoch 1 \
        -batch_size 8 \
        -lr 2e-5 \
        -n_warmup_steps 50000 \
        -eval_every 10000

After BERT finetuning, we choose the best checkpoint on the dev dataset to generate BERT features.

CUDA_VISIBLE_DEVICES=0 \
python gen_feature.py \
        -task classification \
        -model bert \
        -max_input 12800000 \
        -dev ./data/msmarco-doc_dev_maxp.jsonl \
        -vocab bert-base-uncased \
        -pretrain bert-base-uncased \
        -checkpoint ./checkpoints/bert-base-maxp.bin \
        -res ./features/bert-base_ance_dev_maxp_features \
        -max_query_len 64 \
        -max_doc_len 445 \
        -maxp \
        -batch_size 64

Then, we run Coor-Ascent on these features using RankLib to learn the weight of each feature.

java -jar LeToR/RankLib-2.1-patched.jar -train features/bert-base_ance_dev_maxp_features -ranker 4 -metric2t RR@100 -save checkpoints/f_maxp.ca

Finally, generate the features of the eval dataset and compute the ranking scores with the learned feature weights, following the same steps as in the Inference section.
