GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on
various natural language understanding and generation tasks.
Please refer to our paper for a detailed description of GLM:
All NLP Tasks Are Generation Tasks: A General Pretraining Framework
Zhengxiao Du*, Yujie Qian*, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang (*: equal contribution)
Part of the code is based on Megatron-LM and PET.
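The autoregressive blank-filling objective described above can be illustrated with a toy sketch. The function and special-token layout below are a simplified illustration of the idea (a masked span moved to the end and generated autoregressively), not the repository's actual tokenizer or data pipeline.

```python
# Toy illustration of GLM-style autoregressive blank filling.
# Token names and the helper below are illustrative, not the repo's real API.

def make_blank_filling_example(tokens, span_start, span_end):
    """Mask tokens[span_start:span_end] and build a GLM-style example:
    the corrupted context (Part A) followed by the masked span (Part B),
    which the model generates autoregressively after a start marker."""
    masked_span = tokens[span_start:span_end]
    context = tokens[:span_start] + ["[MASK]"] + tokens[span_end:]
    # Part A attends bidirectionally; Part B is predicted left to right.
    input_tokens = context + ["<|startofpiece|>"] + masked_span
    # Training target: each span token given the context and previous span tokens.
    targets = masked_span + ["<|endofpiece|>"]
    return input_tokens, targets

inp, tgt = make_blank_filling_example(
    ["GLM", "is", "a", "general", "language", "model"], 3, 5)
print(inp)  # ['GLM', 'is', 'a', '[MASK]', 'model', '<|startofpiece|>', 'general', 'language']
print(tgt)  # ['general', 'language', '<|endofpiece|>']
```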
You can download the pretrained models used in the paper below.
Name | Params | File | Config |
---|---|---|---|
GLM-Base | 110M | glm-base-blank.tar.bz2 | model_blocklm_base.sh |
GLM-Large | 335M | glm-large-blank.tar.bz2 | model_blocklm_large.sh |
GLM-Large (multi-task) | 335M | glm-large-generation.tar.bz2 | model_blocklm_large_generation.sh |
GLM-410M (multi-task) | 410M | glm-1.25-generation.tar.bz2 | model_blocklm_1.25_generation.sh |
GLM-515M (multi-task) | 515M | glm-1.5-generation.tar.bz2 | model_blocklm_1.5_generation.sh |
GLM-RoBERTa | 335M | glm-roberta-large-blank.tar.bz2 | model_blocklm_roberta_large.sh |
GLM-XXLarge | 10B | apply here | model_blocklm_10B.sh |
SuperGLUE dev set results (single model, single-task finetuning):
Model | COPA | WSC | RTE | WiC | CB | MultiRC | BoolQ | ReCoRD |
---|---|---|---|---|---|---|---|---|
GLM-XXLarge | 98.0 | 95.2 | 93.1 | 75.7 | 98.7/98.2 | 88.1/63.3 | 88.7 | 94.4/94.0 |
RoBERTa-Large | 94.0 | 91.3 | 86.6 | 75.6 | 98.2/- | 85.7/- | 86.9 | 89.5/89.0 |
DeBERTa-XXLarge-v2 | 97.0 | - | 93.5 | - | - | 87.8/63.6 | 88.3 | 94.1/93.7 |
CNN/Daily Mail (test set, no additional data used)
Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
---|---|---|---|
GLM-XXLarge | 44.7 | 21.4 | 41.4 |
T5-11B | 43.5 | 21.6 | 40.7 |
PEGASUS-Large | 44.2 | 21.5 | 41.4 |
BART-Large | 44.2 | 21.3 | 40.9 |
XSum (test set, no additional data used)
Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
---|---|---|---|
GLM-XXLarge | 48.9 | 25.7 | 40.4 |
PEGASUS-Large | 47.2 | 24.6 | 39.3 |
BART-Large | 45.1 | 22.3 | 37.3 |
Zero-shot language modeling results (test set):
Model | LAMBADA (accuracy) | Wikitext103 (perplexity) |
---|---|---|
GLM-XXLarge (bi) | 72.35 | 11.33 |
GLM-XXLarge (uni) | 67.18 | 12.22 |
GPT-2 | 52.66 | 17.48 |
Megatron-LM (8.3B) | 66.51 | 10.81 |
Turing-NLG | 67.98 | 10.21 |
We provide two docker images, based on CUDA 10.2 and CUDA 11.2. You can pull the pre-built images from Docker Hub and run them with docker v19.03+:
docker run --gpus all --rm -it --ipc=host zxdu20/glm-cuda102
or replace glm-cuda102 with glm-cuda112.
You can also modify the image according to your requirements in docker/cuda102.dockerfile and build it yourself:
docker build -f cuda102.dockerfile . -t glm-cuda102
Please first install PyTorch (we use 1.7.0) and apex, and then install the other dependencies with pip install -r requirements.txt.
To clone this repo:
git clone https://github.com/THUDM/GLM
cd GLM
We provide scripts for finetuning GLM on some downstream tasks.
Download the SuperGLUE data and check the experiment setup in scripts/ds_finetune_superglue.sh. Note that DATA_ROOT, CHECKPOINT_PATH and SAVE_PATH need to be changed to your local paths. You may also change the batch size and nproc_per_node according to your available hardware.
Run the following script (using the COPA dataset as an example):
bash scripts/ds_finetune_superglue.sh \
config_tasks/model_blocklm_10B.sh \
config_tasks/task_copa.sh
To finetune with the prompt-based variant of the script instead, run:
bash scripts/ds_finetune_superglue_prompt.sh \
config_tasks/model_blocklm_10B.sh \
config_tasks/task_copa.sh
To apply GLM to a new NLP dataset, implement a DataProcessor and a PVP (pattern-verbalizer pair) for the task.
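As a purely hypothetical sketch of the two pieces a new task needs, the classes below show the general shape of a data processor and a pattern-verbalizer pair; the repository's actual base classes, method names, and file locations differ.

```python
# Hypothetical sketch of a task's DataProcessor and PVP.
# Class and method names here are illustrative, not the repo's real API.

class MyTaskProcessor:
    """Reads raw examples for the new dataset."""
    def get_examples(self, split):
        # Each example carries the input text(s) and a label.
        return [{"premise": "It is raining.", "label": 1}]

class MyTaskPVP:
    """Pattern-Verbalizer Pair: turns an example into a cloze pattern
    and maps each label to an answer word for the blank."""
    VERBALIZER = {0: "dry", 1: "wet"}

    def get_parts(self, example):
        # Cloze pattern with a blank for the model to fill in.
        return f"{example['premise']} The ground is [MASK]."

    def verbalize(self, label):
        return self.VERBALIZER[label]

proc, pvp = MyTaskProcessor(), MyTaskPVP()
ex = proc.get_examples("train")[0]
print(pvp.get_parts(ex))            # It is raining. The ground is [MASK].
print(pvp.verbalize(ex["label"]))   # wet
```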
Download the Gigaword, CNN/Daily Mail or XSum dataset and check the experiment setup in scripts/ds_finetune_seq2seq.sh. Change DATA_ROOT, CHECKPOINT_PATH and SAVE_PATH to your local paths.
Run the following script (using the CNN/Daily Mail dataset as an example):
bash scripts/ds_finetune_seq2seq.sh \
config_tasks/model_blocklm_10B.sh \
config_tasks/seq_cnndm_org.sh
The summaries are written into ./runs/experiment_name/test.jsonl.hyps, and the references into test.jsonl.refs in the same directory. To calculate ROUGE, install file2rouge and download Stanford CoreNLP from here. Then run the following script:
bash scripts/evaluate_seq2seq.sh \
./runs/experiment_name/test.jsonl.hyps ./runs/experiment_name/test.jsonl.refs
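For a quick sanity check of hypotheses against references before running the full pipeline, a minimal ROUGE-1 F1 can be computed in pure Python; this toy function is an illustration only, and reported numbers should come from the file2rouge pipeline above.

```python
# Minimal ROUGE-1 F1 sketch (unigram overlap), for sanity checks only.
from collections import Counter

def rouge_1_f(hypothesis, reference):
    """F1 over unigram counts shared by hypothesis and reference."""
    hyp, ref = hypothesis.split(), reference.split()
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if not hyp or not ref or not overlap:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

score = rouge_1_f("police kill the gunman", "the gunman was shot by police")
print(round(score, 3))  # 0.6
```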
For LAMBADA, change DATA_ROOT and CHECKPOINT_PATH in scripts/evaluate_lm.sh, then run the following script:
bash scripts/evaluate_lm.sh \
config_tasks/model_blocklm_large_generation.sh \
config_tasks/zero_lambada.sh
For Wikitext-103, likewise change DATA_ROOT and CHECKPOINT_PATH, then run:
bash scripts/evaluate_lm.sh \
config_tasks/model_blocklm_large_generation.sh \
config_tasks/zero_wikitext.sh
Download the Yahoo dataset and check the experiment setup in scripts/finetune_blank.sh. Change DATA_ROOT, CHECKPOINT_PATH and SAVE_PATH to your local paths.
Run the following script:
bash scripts/finetune_blank.sh \
config_tasks/model_blocklm_large.sh \
config_tasks/seq_blank.sh
Change CHECKPOINT_PATH to your local path, then run the following script:
bash scripts/generate_block.sh \
config_tasks/model_blocklm_large.sh
Example:
Context: Ng is an adjunct professor at [MASK] (formerly associate professor and Director of its Stanford AI Lab or SAIL ). Also a pioneer in online education, Ng co-founded Coursera and deeplearning.ai.
GLM: [CLS] ng is an adjunct professor at [MASK] ( formerly associate professor and director of its stanford ai lab or sail ) . also a pioneer in online education , ng co - founded coursera and deeplearning . ai . [PAD] <|startofpiece|> the stanford university
Run the following script to pretrain the GLM-Large model:
bash scripts/ds_pretrain_nvidia.sh config/ds_block_large.sh
The script scripts/ds_pretrain_nvidia.sh launches the training program with DeepSpeed. Change NUM_WORKERS and NUM_GPUS_PER_WORKER to the number of workers and the number of GPUs per worker, and change HOST_FILE_PATH to the path of an OpenMPI-style hostfile. More details about the DeepSpeed launcher can be found here.
The file config/ds_block_large.sh defines the hyperparameters for pretraining. Most of the arguments are fairly self-explanatory. In particular, --train-data can be multiple keywords defined in NAMED_CORPORA in data_utils/corpora.py. The hyperparameters of the optimizer are defined in the corresponding JSON file under config. The semantics of the JSON file can be found here.
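For orientation, a DeepSpeed optimizer config might look like the fragment below. The keys shown are standard DeepSpeed options, but the values are illustrative; the actual settings are in this repo's JSON files under config.

```json
{
  "train_micro_batch_size_per_gpu": 8,
  "gradient_accumulation_steps": 4,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 1e-4,
      "betas": [0.9, 0.95],
      "eps": 1e-8,
      "weight_decay": 0.1
    }
  },
  "fp16": {
    "enabled": true
  }
}
```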
Please cite our paper if you find this code useful for your research:
@article{DBLP:journals/corr/abs-2103-10360,
author = {Zhengxiao Du and
Yujie Qian and
Xiao Liu and
Ming Ding and
Jiezhong Qiu and
Zhilin Yang and
Jie Tang},
title = {All {NLP} Tasks Are Generation Tasks: {A} General Pretraining Framework},
journal = {CoRR},
volume = {abs/2103.10360},
year = {2021},
url = {https://arxiv.org/abs/2103.10360}
}