CPM-Ant is an open-source Chinese pre-trained language model (PLM) with 10B parameters. It is also the first milestone of the live training process of CPM-Live. The training process is cost-effective and environmentally friendly. CPM-Ant also achieves promising results with delta tuning on the CUGE benchmark. Besides the full model, we also provide various compressed versions to meet the requirements of different hardware configurations. The code, log files, and checkpoints of CPM-Ant are available under an open license. More specifically, CPM-Ant is:
Efficient: BMTrain enables us to take full advantage of distributed computing power to efficiently train big models. The training of CPM-Ant lasted 68 days and cost 430,000 RMB, which is much lower than the cost of existing model training practices. The greenhouse gas (GHG) emissions of training CPM-Ant are about 4,872 kg CO2e, while the emissions of training T5-11B are 46.7 t CO2e.
Effective: OpenDelta enables us to adapt CPM-Ant to downstream tasks through delta tuning. In our experiments, by tuning only 6.3 million parameters, CPM-Ant achieved the best performance on 3 of the 6 tasks in the CUGE benchmark, outperforming baselines (CPM-2 with 11B parameters and Yuan 1.0 with 245B parameters) that tune all parameters.
Economical: BMCook & BMInf enable us to drive CPM-Ant with limited computing resources. Based on BMInf, we can efficiently perform big model inference using a single GPU (even a consumer-level GPU like GTX 1060) instead of computing clusters. To make the deployment of CPM-Ant more economical, we use BMCook to further compress the original 10B CPM-Ant into multiple versions. These compressed checkpoints (7B, 3B, 1B, 300M) can meet the requirements of various low-resource scenarios.
Easy-to-Use: Both the original 10B model and its compressed versions can be loaded and run with only a few lines of code. We will integrate CPM-Ant into ModelCenter soon, making further development on our model easier.
Egalitarian: The training process of CPM-Ant is completely open. We have released all code, log files, and final checkpoints. All these files are publicly available. CPM-Ant also adopts an open license that allows commercial use.
First, clone the cpm-ant branch of this repository.
$ git clone -b cpm-ant --single-branch https://github.com/OpenBMB/CPM-Live.git
Then, please make sure that your environment meets the project requirements. We recommend using Anaconda to manage the environment and installing the dependencies from PyPI:
$ cd CPM-Live/cpm-live
$ pip install -r requirements.txt
We release all checkpoints of CPM-Ant, including the 10B model and its compressed versions.
Model | # Attn. Layers | # FFN Layers | Hidden Dim | Download |
---|---|---|---|---|
CPM-Ant-10B | 48 | 48 | 4096 | link |
CPM-Ant-7B | 37 | 32 | 4096 | link |
CPM-Ant-3B | 37 | 32 | 2560 | link |
CPM-Ant-1B | 25 | 21 | 2048 | link |
CPM-Ant-300M | 25 | 21 | 512 | link |
With the help of BMCook, we use task-agnostic structured pruning on attention layers and feedforward layers to compress CPM-Ant. We list the configuration of each model in the above table, i.e., the number of remaining attention and feedforward layers, and the dimension of hidden states. If you are interested in how we compress CPM-Ant, please check our guidelines!
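The layer-level pruning can be illustrated with a toy sketch: score each layer, keep the top-k, drop the rest. The scores and function below are made up for illustration; the actual BMCook procedure is described in our guidelines.

```python
# Toy sketch of task-agnostic structured pruning at the layer level.
# The importance scores here are hypothetical; BMCook derives them
# from the training signal rather than assigning them by hand.

def prune_layers(importance, keep):
    """Return indices of the `keep` most important layers, in original order."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:keep])

# E.g., compressing 6 attention layers down to 4.
scores = [0.9, 0.1, 0.7, 0.8, 0.2, 0.6]
print(prune_layers(scores, keep=4))  # → [0, 2, 3, 5]
```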
Here is a performance summary of all models:
Model | CPM-Ant-10B | CPM-Ant-7B | CPM-Ant-3B | CPM-Ant-1B | CPM-Ant-300M |
---|---|---|---|---|---|
Loss | 2.420 | 2.510 | 2.603 | 2.759 | 2.998 |
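Assuming the reported loss is per-token cross-entropy in nats (the table does not state the unit), it maps to perplexity via exp(loss), which makes the gap between model sizes easier to read:

```python
import math

# Reported losses from the table above.
losses = {
    "CPM-Ant-10B": 2.420,
    "CPM-Ant-7B": 2.510,
    "CPM-Ant-3B": 2.603,
    "CPM-Ant-1B": 2.759,
    "CPM-Ant-300M": 2.998,
}

# Perplexity = exp(cross-entropy loss in nats) — an assumption about the unit.
perplexity = {name: math.exp(loss) for name, loss in losses.items()}
for name, ppl in perplexity.items():
    print(f"{name}: {ppl:.2f}")
```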
If you want to adapt CPM-Ant to your own tasks, we recommend using parameter-efficient tuning (a.k.a., delta tuning). With the help of OpenDelta, we can conduct delta tuning without modifying the code of the original model.
We install OpenDelta from source. Note that we use the with_bmtrain branch, which enables us to conduct distributed delta tuning on multiple computing nodes.
$ git clone -b with_bmtrain --single-branch https://github.com/thunlp/OpenDelta.git
$ cd OpenDelta
$ python setup.py install
We need to download a checkpoint of CPM-Ant and load it.
from cpm_live.models import CPMAnt, CPMAntConfig
import bmtrain as bmt
bmt.init_distributed(seed=0)
config = CPMAntConfig.from_json_file("YOUR_PATH/cpm-ant-10b.json")
ckpt_path = "YOUR_PATH/cpm-ant-10b.pt"
# You can load the compressed models in the same way!
# config = CPMAntConfig.from_json_file("YOUR_PATH/cpm-ant-3b.json")
# ckpt_path = "YOUR_PATH/cpm-ant-3b.pt"
model = CPMAnt(config=config)
bmt.load(model, ckpt_path)
Using OpenDelta, we can insert a delta model (e.g., LoRA) into CPM-Ant with three lines of code:
from opendelta import LoraModel
delta_model = LoraModel(backbone_model=model, modified_modules=["project_q", "project_v"])
delta_model.freeze_module(exclude=["deltas"], set_state_dict=True)
delta_model.log()
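The idea behind the injected deltas can be sketched in plain NumPy: the frozen weight W is augmented with a low-rank update B @ A, so only r × (d_in + d_out) parameters per adapted matrix are trained. The names and shapes below are illustrative, not OpenDelta's internals.

```python
import numpy as np

d_out, d_in, r = 4096, 4096, 8  # LoRA rank r is much smaller than the hidden dim

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen backbone weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def forward(x):
    # Frozen path plus low-rank delta; only A and B would receive gradients.
    return W @ x + B @ (A @ x)

# Trainable parameters per adapted matrix:
print(A.size + B.size)  # 8 * (4096 + 4096) = 65536
```

Because B starts at zero, the delta is initially a no-op and the adapted model reproduces the backbone exactly, which is why tuning only a few million parameters can start from the pre-trained behavior.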
We provide a sample of the data used for pre-training.
If you want to know how we convert the data to binary files, run the following command:
$ bash scripts/preprocess_dataset.sh
If you want to use CPM-Ant on your own tasks, we provide several examples of adapting CPM-Ant to tasks on the CUGE benchmark, including summarization, dialogue, classification, and re-ranking. Please check the examples folder.
You can use CPM-Ant directly for text generation. Currently, we implement two decoding strategies: beam search and top-k/top-p sampling. Here is an example:
$ python text_generation.py
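Under the hood, top-k/top-p sampling filters the next-token distribution before drawing from it. A standalone NumPy sketch of that filtering step (not the script's actual implementation):

```python
import numpy as np

def top_k_top_p_filter(probs, k=0, p=1.0):
    """Zero out tokens outside the top-k and outside the top-p nucleus,
    then renormalize. k=0 disables top-k; p=1.0 disables top-p."""
    probs = np.asarray(probs, dtype=float).copy()
    order = np.argsort(probs)[::-1]           # token indices, highest prob first
    keep = np.zeros(probs.shape, dtype=bool)
    cum = 0.0
    for rank, idx in enumerate(order):
        if k and rank >= k:                   # past the top-k cutoff
            break
        keep[idx] = True
        cum += probs[idx]
        if cum >= p:                          # nucleus covers mass p
            break
    probs[~keep] = 0.0
    return probs / probs.sum()

print(top_k_top_p_filter([0.5, 0.3, 0.1, 0.1], k=2))  # → [0.625 0.375 0. 0.]
```

The next token is then sampled from the renormalized distribution, while beam search instead keeps the highest-scoring partial sequences deterministically.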
If you want to experience our big models but don't have enough GPU memory, we recommend using BMInf, which can help you use our models for inference on most consumer-level GPUs. Let's try it!
Install BMInf:
$ pip install bminf
Assuming that you have a GPU with 8 GB of memory, you can run the text generation script with the following command:
$ python text_generation.py --use-bminf --memory-limit 4
Note that memory-limit should be less than the total GPU memory, since some intermediate computation results also need to be stored.