CPM-Ant+ is an open-source bilingual pre-trained language model (PLM) with 10B parameters, which is the second milestone of the live training process of CPM-Live. CPM-Ant+ is an enhanced version of CPM-Ant. For more details on CPM-Ant, please check here. The code, log files, and checkpoints of CPM-Ant+ are available under an open license.
Compared to CPM-Ant, CPM-Ant+ has several new features:
First, you need to clone the cpm-ant-plus branch of this repository.
$ git clone -b cpm-ant-plus --single-branch https://github.com/OpenBMB/CPM-Live.git
Then, please make sure that your environment meets the following requirements:
We recommend using Anaconda to manage the environment and installing additional dependencies from PyPI:
$ cd CPM-Live/cpm-live
$ pip install -r requirements.txt
We release the checkpoint of CPM-Ant+ (10B), and you can download it from here.
If you want to compress CPM-Ant+ into smaller models, please check our guidelines in BMCook!
If you want to adapt CPM-Ant+ to your own tasks, we recommend using parameter-efficient tuning (a.k.a., delta tuning). With the help of OpenDelta, we can conduct delta tuning without modifying the code of the original model.
We install OpenDelta from source. Note that we use the with_bmtrain branch, which enables us to conduct distributed delta tuning on multiple computing nodes.
$ git clone -b with_bmtrain --single-branch https://github.com/thunlp/OpenDelta.git
$ cd OpenDelta
$ python setup.py install
We need to download a checkpoint of CPM-Ant+ and load it.
from cpm_live.models import CPMAntPlus, CPMAntConfig
import bmtrain as bmt
bmt.init_distributed(seed=0)
config = CPMAntConfig.from_json_file("YOUR_PATH/cpm-ant-plus-10b.json")
ckpt_path = "YOUR_PATH/cpm-ant-plus-10b.pt"
model = CPMAntPlus(config=config)
bmt.load(model, ckpt_path)
Using OpenDelta, we can insert a delta model (e.g., LoRA) into CPM-Ant+ with three lines of code:
from opendelta import LoraModel
delta_model = LoraModel(backbone_model=model, modified_modules=["project_q", "project_v"])
delta_model.freeze_module(exclude=["deltas"], set_state_dict=True)
delta_model.log()
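For intuition, LoRA leaves each frozen weight matrix W untouched and adds a trainable low-rank update scaled by alpha/r; only the two small factors receive gradients. The following is a conceptual NumPy sketch of that idea, not the OpenDelta implementation (all names here are illustrative):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Forward pass of a LoRA-augmented linear layer.

    W: frozen (out, in) weight. A: (r, in) and B: (out, r) are the
    trainable low-rank factors; r is the LoRA rank.
    """
    r = A.shape[0]
    scale = alpha / r  # standard LoRA scaling
    return x @ W.T + (x @ A.T) @ B.T * scale

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
W = rng.standard_normal((4, 8))
A = rng.standard_normal((2, 8)) * 0.01  # rank r = 2
B = np.zeros((4, 2))  # B starts at zero, so training begins from the frozen model

out = lora_forward(x, W, A, B)
assert np.allclose(out, x @ W.T)  # with B = 0, the output equals the frozen layer's
```

Because B is initialized to zero, inserting the delta model does not change the backbone's behavior until tuning begins.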
We provide a sample of the data used for pre-training.
If you want to know how we convert the data to binary files, run the following command:
$ bash scripts/preprocess_dataset.sh
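In general terms, converting text data to binary files means tokenizing each document and writing length-prefixed records that can be memory-mapped or streamed during training. A generic sketch of that pattern follows; the actual format produced by scripts/preprocess_dataset.sh may differ:

```python
import struct

def write_records(token_id_lists, path):
    # Each record: a little-endian uint32 length, then that many uint32 token ids.
    with open(path, "wb") as f:
        for ids in token_id_lists:
            f.write(struct.pack("<I", len(ids)))
            f.write(struct.pack(f"<{len(ids)}I", *ids))

def read_records(path):
    records = []
    with open(path, "rb") as f:
        while (header := f.read(4)):
            (n,) = struct.unpack("<I", header)
            records.append(list(struct.unpack(f"<{n}I", f.read(4 * n))))
    return records

write_records([[1, 2, 3], [42]], "sample.bin")
assert read_records("sample.bin") == [[1, 2, 3], [42]]
```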
If you want to use CPM-Ant+ on your own tasks, we provide several examples of adapting CPM-Ant+ to tasks from the CUGE benchmark, including summarization, dialogue, classification, and re-ranking. Please check the examples folder.
You can use CPM-Ant+ directly for various NLP tasks.
You can use CPM-Ant+ for text generation in either Chinese or English. Currently, we implement two decoding strategies: beam search and top-k/top-p sampling. Here is an example:
$ python text_generation.py
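Top-k/top-p (nucleus) sampling restricts sampling to the k highest-scoring tokens and to the smallest set whose cumulative probability exceeds p. A standalone NumPy sketch of the filtering step (illustrative only; the script's actual decoding code may differ):

```python
import numpy as np

def top_k_top_p_filter(logits, k=0, p=1.0):
    """Mask logits outside the top-k set and outside the top-p nucleus."""
    logits = np.asarray(logits, dtype=float).copy()
    if k > 0:
        kth = np.sort(logits)[-k]  # k-th largest value
        logits[logits < kth] = -np.inf
    if p < 1.0:
        order = np.argsort(logits)[::-1]  # indices from highest to lowest
        probs = np.exp(logits[order] - np.max(logits))
        probs /= probs.sum()
        # keep the smallest prefix whose cumulative probability reaches p
        cutoff = np.searchsorted(np.cumsum(probs), p) + 1
        logits[order[cutoff:]] = -np.inf
    return logits  # sample from softmax over the surviving logits

filtered = top_k_top_p_filter([2.0, 1.0, 0.5, -1.0], k=2)
assert np.isneginf(filtered[2]) and np.isneginf(filtered[3])  # only top 2 survive
```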
CPM-Ant+ can answer your questions based on a provided document. Try it!
$ python question_answering.py
With the help of CPM-Ant+, you can extract key sentences from a document.
$ python summarization.py
You can also use CPM-Ant+ for Chinese-English and English-Chinese translation!
$ python translation.py
If you want to experience our big models but don't have enough GPU memory, we recommend using BMInf, which can help you use our models for inference on most consumer-level GPUs. Let's try it!
Install BMInf:
$ pip install bminf
Assuming that you have a GPU with 8 GB of memory, you can run the text generation script with the following command:
$ python text_generation.py --use-bminf --memory-limit 4
Note that memory-limit should be less than the total GPU memory, as some intermediate computation results also need to be stored.
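As a back-of-envelope check of why offloading is needed at all: the 10B-parameter model's weights alone, in half precision, already exceed an 8 GB card, before accounting for activations:

```python
params = 10 * 10**9  # 10B parameters
bytes_per_param = 2  # fp16 / half precision
weights_gib = params * bytes_per_param / 2**30
assert weights_gib > 18  # roughly 18.6 GiB of weights, well above 8 GB
```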