CPM-Ant is an open-source Chinese pre-trained language model (PLM) with 10B parameters. It is also the first milestone of the live training process of CPM-Live. The training process is cost-effective and environmentally friendly. CPM-Ant also achieves promising results with delta tuning on the CUGE benchmark. Besides the full model, we also provide various compressed versions to meet the requirements of different hardware configurations. The code, log files, and checkpoints of CPM-Ant are available under an open license. More specifically, CPM-Ant is:
Efficient: BMTrain enables us to take full advantage of distributed computing power to efficiently train big models. The training of CPM-Ant lasted 68 days and cost 430,000 RMB, which is much lower than the cost of existing model training practices. The greenhouse gas (GHG) emissions of training CPM-Ant are about 4,872 kg CO2e, while the emissions of training T5-11B are 46.7 t CO2e.
Effective: OpenDelta enables us to adapt CPM-Ant to downstream tasks through delta tuning. In our experiments, by tuning only 6.3 million parameters, CPM-Ant achieved the best performance on 3 of the 6 tasks in the CUGE benchmark, outperforming baselines (CPM-2 with 11B parameters and Yuan 1.0 with 245B parameters) that tune all parameters.
Economical: BMCook & BMInf enable us to drive CPM-Ant with limited computing resources. Based on BMInf, we can efficiently perform big model inference using a single GPU (even a consumer-level GPU like GTX 1060) instead of computing clusters. To make the deployment of CPM-Ant more economical, we use BMCook to further compress the original 10B CPM-Ant into multiple versions. These compressed checkpoints (7B, 3B, 1B, 300M) can meet the requirements of various low-resource scenarios.
Easy-to-Use: Both the original 10B model and its compressed versions can be loaded and run with only a few lines of code. We will integrate CPM-Ant into ModelCenter soon, making further development on our model easier.
Egalitarian: The training process of CPM-Ant is completely open. We have released all code, log files, and final checkpoints. All these files are publicly available. CPM-Ant also adopts an open license that allows commercial use.
First, clone the cpm-ant branch of this repository.
$ git clone -b cpm-ant --single-branch https://github.com/OpenBMB/CPM-Live.git
Then, please make sure that your environment meets the project requirements. We recommend using Anaconda to manage the environment and installing the dependencies from PyPI:
$ cd CPM-Live/cpm-live
$ pip install -r requirements.txt
We release all checkpoints of CPM-Ant, including the 10B model and its compressed versions.
Model | # Attn. Layers | # FFN Layers | Hidden Dim | Download |
---|---|---|---|---|
CPM-Ant-10B | 48 | 48 | 4096 | link |
CPM-Ant-7B | 37 | 32 | 4096 | link |
CPM-Ant-3B | 37 | 32 | 2560 | link |
CPM-Ant-1B | 25 | 21 | 2048 | link |
CPM-Ant-300M | 25 | 21 | 512 | link |
With the help of BMCook, we use task-agnostic structured pruning on attention layers and feedforward layers to compress CPM-Ant. We list the configuration of each model in the above table, i.e., the number of remaining attention and feedforward layers, and the dimension of hidden states. If you are interested in how we compress CPM-Ant, please check our guidelines!
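The layer-level pruning can be illustrated with a toy sketch: score each layer, keep the top-k, drop the rest. The scores and function below are made up for illustration; the actual BMCook procedure is described in our guidelines.

```python
# Toy sketch of task-agnostic structured pruning at the layer level.
# The importance scores here are hypothetical; BMCook derives them
# from the training signal rather than assigning them by hand.

def prune_layers(importance, keep):
    """Return indices of the `keep` most important layers, in original order."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:keep])

# E.g., compressing 6 attention layers down to 4.
scores = [0.9, 0.1, 0.7, 0.8, 0.2, 0.6]
print(prune_layers(scores, keep=4))  # → [0, 2, 3, 5]
```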
Here is a performance summary of all models:
Model | CPM-Ant-10B | CPM-Ant-7B | CPM-Ant-3B | CPM-Ant-1B | CPM-Ant-300M |
---|---|---|---|---|---|
Loss | 2.420 | 2.510 | 2.603 | 2.759 | 2.998 |
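Assuming the reported loss is per-token cross-entropy in nats (the table does not state the unit), it maps to perplexity via exp(loss), which makes the gap between model sizes easier to read:

```python
import math

# Reported losses from the table above.
losses = {
    "CPM-Ant-10B": 2.420,
    "CPM-Ant-7B": 2.510,
    "CPM-Ant-3B": 2.603,
    "CPM-Ant-1B": 2.759,
    "CPM-Ant-300M": 2.998,
}

# Perplexity = exp(cross-entropy loss in nats) — an assumption about the unit.
perplexity = {name: math.exp(loss) for name, loss in losses.items()}
for name, ppl in perplexity.items():
    print(f"{name}: {ppl:.2f}")
```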
If you want to adapt CPM-Ant to your own tasks, we recommend using parameter-efficient tuning (a.k.a., delta tuning). With the help of OpenDelta, we can conduct delta tuning without modifying the code of the original model.
We install OpenDelta from source. Note that we use the with_bmtrain branch, which enables us to conduct distributed delta tuning on multiple computing nodes.
$ git clone -b with_bmtrain --single-branch https://github.com/thunlp/OpenDelta.git
$ cd OpenDelta
$ python setup.py install
We need to download a checkpoint of CPM-Ant and load it.
from cpm_live.models import CPMAnt, CPMAntConfig
import bmtrain as bmt
bmt.init_distributed(seed=0)
config = CPMAntConfig.from_json_file("YOUR_PATH/cpm-ant-10b.json")
ckpt_path = "YOUR_PATH/cpm-ant-10b.pt"
# You can load the compressed models in the same way!
# config = CPMAntConfig.from_json_file("YOUR_PATH/cpm-ant-3b.json")
# ckpt_path = "YOUR_PATH/cpm-ant-3b.pt"
model = CPMAnt(config=config)
bmt.load(model, ckpt_path)
Using OpenDelta, we can insert a delta model (e.g., LoRA) into CPM-Ant with three lines of code:
from opendelta import LoraModel
delta_model = LoraModel(backbone_model=model, modified_modules=["project_q", "project_v"])
delta_model.freeze_module(exclude=["deltas"], set_state_dict=True)
delta_model.log()
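The idea behind the injected deltas can be sketched in plain NumPy: the frozen weight W is augmented with a low-rank update B @ A, so only r × (d_in + d_out) parameters per adapted matrix are trained. The names and shapes below are illustrative, not OpenDelta's internals.

```python
import numpy as np

d_out, d_in, r = 4096, 4096, 8  # LoRA rank r is much smaller than the hidden dim

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen backbone weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def forward(x):
    # Frozen path plus low-rank delta; only A and B would receive gradients.
    return W @ x + B @ (A @ x)

# Trainable parameters per adapted matrix:
print(A.size + B.size)  # 8 * (4096 + 4096) = 65536
```

Because B starts at zero, the delta is initially a no-op and the adapted model reproduces the backbone exactly, which is why tuning only a few million parameters can start from the pre-trained behavior.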
We provide a sample of the data used for pre-training.
If you want to know how we convert the data to binary files, run the following command:
$ bash scripts/preprocess_dataset.sh
If you want to use CPM-Ant on your own tasks, we provide several examples of adapting CPM-Ant to tasks on the CUGE benchmark, including summarization, dialogue, classification, and re-ranking. Please check the examples folder.
You can use CPM-Ant directly for text generation. Currently, we implement two decoding strategies: beam search and top-k/top-p sampling. Here is an example:
$ python text_generation.py
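Under the hood, top-k/top-p sampling filters the next-token distribution before drawing from it. A standalone NumPy sketch of that filtering step (not the script's actual implementation):

```python
import numpy as np

def top_k_top_p_filter(probs, k=0, p=1.0):
    """Zero out tokens outside the top-k and outside the top-p nucleus,
    then renormalize. k=0 disables top-k; p=1.0 disables top-p."""
    probs = np.asarray(probs, dtype=float).copy()
    order = np.argsort(probs)[::-1]           # token indices, highest prob first
    keep = np.zeros(probs.shape, dtype=bool)
    cum = 0.0
    for rank, idx in enumerate(order):
        if k and rank >= k:                   # past the top-k cutoff
            break
        keep[idx] = True
        cum += probs[idx]
        if cum >= p:                          # nucleus covers mass p
            break
    probs[~keep] = 0.0
    return probs / probs.sum()

print(top_k_top_p_filter([0.5, 0.3, 0.1, 0.1], k=2))  # → [0.625 0.375 0. 0.]
```

The next token is then sampled from the renormalized distribution, while beam search instead keeps the highest-scoring partial sequences deterministically.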
If you want to experience our big models but don't have enough GPU memory, we recommend using BMInf, which can help you use our models for inference on most consumer-level GPUs. Let's try it!
Install BMInf:
$ pip install bminf
Assuming that you have a GPU with 8 GB of memory, you can run the text generation script with the following command:
$ python text_generation.py --use-bminf --memory-limit 4
Note that memory-limit should be less than the total GPU memory, since some intermediate computation results also need to be stored.