This repository provides the source code and checkpoints of the paper "Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents". You can download the checkpoint of Lawformer from the huggingface model hub or from here. In addition, the checkpoint of our baseline model, Legal RoBERTa, can be downloaded from here.
The new judgement prediction dataset, CAIL-Long, can be downloaded from here.
Install the dependencies:

pip install -r requirements.txt
We have uploaded our model to the huggingface model hub. Make sure you have installed transformers.
>>> from transformers import AutoModel, AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
>>> model = AutoModel.from_pretrained("thunlp/Lawformer")
>>> inputs = tokenizer("任某提起诉讼,请求判令解除婚姻关系并对夫妻共同财产进行分割。", return_tensors="pt")
>>> outputs = model(**inputs)
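Because Lawformer is a Longformer-style model, it can encode documents far longer than 512 tokens. The snippet below is a minimal sketch that continues the session above, assuming the released checkpoint follows the standard Hugging Face Longformer interface (including the optional global_attention_mask argument); the repeated sentence only stands in for a real long judgment document.

>>> import torch
>>> long_text = "任某提起诉讼,请求判令解除婚姻关系并对夫妻共同财产进行分割。" * 50
>>> inputs = tokenizer(long_text, truncation=True, max_length=4096, return_tensors="pt")
>>> global_attention_mask = torch.zeros_like(inputs["input_ids"])
>>> global_attention_mask[:, 0] = 1  # global attention on the [CLS] token
>>> outputs = model(**inputs, global_attention_mask=global_attention_mask)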
We continue pre-training Lawformer from hfl/chinese-roberta-wwm-ext. Therefore, we first convert the RoBERTa model to a Longformer by running the following command:
python3 convert_roberta_lfm.py
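At a high level, the conversion copies the RoBERTa weights into a Longformer and stretches the learned position embeddings from 512 positions to the longer maximum length. The sketch below illustrates only the position-embedding part, assuming the standard Hugging Face model layout and an illustrative target length of 4096; convert_roberta_lfm.py is the authoritative implementation and additionally installs the sliding-window attention.

import torch
from transformers import AutoModel

max_pos = 4096  # illustrative target maximum sequence length
roberta = AutoModel.from_pretrained("hfl/chinese-roberta-wwm-ext")

# Tile the original 512 learned position embeddings until they cover max_pos.
old_pos = roberta.embeddings.position_embeddings.weight.data  # [512, hidden]
new_pos = old_pos.new_empty(max_pos, old_pos.size(1))
step = old_pos.size(0)
for start in range(0, max_pos, step):
    end = min(start + step, max_pos)
    new_pos[start:end] = old_pos[: end - start]

# The extended embeddings (plus the copied encoder weights) would then be
# loaded into a Longformer-style model before pre-training.
state_dict = roberta.state_dict()
state_dict["embeddings.position_embeddings.weight"] = new_pos
torch.save(state_dict, "lawformer_init.bin")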
Then run the following command to pre-train the model:
python3 -m torch.distributed.launch --master_port 10086 --nproc_per_node 8 train.py -c config/Lawformer.config -g 0,1,2,3,4,5,6,7
If you use the pre-trained models, please cite this paper:
@article{xiao2021lawformer,
  title={Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents},
  author={Xiao, Chaojun and Hu, Xueyu and Liu, Zhiyuan and Tu, Cunchao and Sun, Maosong},
  year={2021}
}