# Efficient Inference for Big Models
Overview • Demo • Documentation • Installation • Quick Start • Supported Models • 简体中文
## Overview

BMInf (Big Model Inference) is a low-resource inference package for large-scale pretrained language models (PLMs). It is built on `cupy` and supports PyTorch backpropagation. The latest release updates the `generate` interface and adds a new CPM 2.1 demo.
For more demos, please refer to BMInf-demos.
Our documentation provides more information about the package.
## Installation

- From pip: `pip install bminf==1.0.2`
- From source code: download the package and run `python setup.py install`
- From docker: `docker run -it --gpus 1 -v $HOME/.cache/bigmodels:/root/.cache/bigmodels --rm openbmb/bminf python3 examples/fill_blank.py`
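After installing by any of these methods, a one-line import confirms the package is visible to Python. A minimal sketch; the `__version__` attribute is an assumption and may be absent in some releases, hence the fallback:

```python
import bminf

# Sanity check that the installation succeeded; __version__ is an
# assumption, so fall back to a plain success message if it is missing.
print(getattr(bminf, "__version__", "bminf imported successfully"))
```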
### Hardware Requirements

Here we list the minimum and recommended configurations for running BMInf.
| | Minimum Configuration | Recommended Configuration |
|---|---|---|
| Memory | 16GB | 24GB |
| GPU | NVIDIA GeForce GTX 1060 6GB | NVIDIA Tesla V100 16GB |
| PCI-E | PCI-E 3.0 x16 | PCI-E 3.0 x16 |
BMInf supports GPUs with compute capability 6.1 or higher. Refer to NVIDIA's compute capability table to check whether your GPU is supported.

BMInf requires CUDA >= 10.1; all other dependencies are installed automatically during installation. If you want to use backpropagation with PyTorch, make sure `torch` is installed on your device.
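If `torch` is installed, both requirements can be checked programmatically. A minimal sketch using standard PyTorch calls:

```python
import torch

# BMInf requires CUDA >= 10.1 and a GPU with compute capability >= 6.1.
assert torch.cuda.is_available(), "no CUDA device found"
print("CUDA version:", torch.version.cuda)          # e.g. "10.2"
major, minor = torch.cuda.get_device_capability(0)  # e.g. (6, 1) for GTX 1060
print(f"Compute capability: {major}.{minor}")
assert (major, minor) >= (6, 1), "GPU compute capability below 6.1"
```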
## Quick Start

Here we provide a simple script for using BMInf.
First, import a model from the model base (e.g. CPM1, CPM2, EVA).

```python
import bminf

cpm2 = bminf.models.CPM2()
```
Then define the text and use the `<span>` token to denote the blanks to fill in.

```python
# Chinese passage about Universal Beijing Resort ticket pricing tiers;
# each <span> marks a blank for the model to fill.
text = "北京环球度假区相关负责人介绍,北京环球影城指定单日门票将采用<span>制度,即推出淡季日、平季日、旺季日和特定日门票。<span>价格为418元,<span>价格为528元,<span>价格为638元,<span>价格为<span>元。北京环球度假区将提供90天滚动价格日历,以方便游客提前规划行程。"
```
Use the `fill_blank` function to obtain the results and replace the `<span>` tokens with them.
```python
for result in cpm2.fill_blank(text,
                              top_p=1.0,
                              top_n=5,
                              temperature=0.5,
                              frequency_penalty=0,
                              presence_penalty=0):
    value = result["text"]
    # Replace the first remaining <span> with the prediction,
    # highlighted in ANSI green for terminal output.
    text = text.replace("<span>", "\033[0;32m" + value + "\033[0m", 1)
print(text)
```
Finally, you can get the predicted text. For more examples, see the `examples` folder.
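To reuse this pattern, the loop above can be wrapped in a small helper. A minimal sketch that relies only on the `fill_blank` call shown above; the function name and `highlight` flag are illustrative, not part of the BMInf API:

```python
def fill_spans(model, text, highlight=True, **sampling_kwargs):
    """Fill every <span> blank in `text` with the model's prediction.

    `model` is a BMInf model instance such as bminf.models.CPM2();
    keyword arguments (top_p, top_n, temperature, ...) are forwarded
    to fill_blank.
    """
    for result in model.fill_blank(text, **sampling_kwargs):
        value = result["text"]
        if highlight:
            # Wrap the prediction in ANSI green for terminal output.
            value = "\033[0;32m" + value + "\033[0m"
        text = text.replace("<span>", value, 1)
    return text

# Usage: print(fill_spans(cpm2, text, top_p=1.0, top_n=5, temperature=0.5))
```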
## Supported Models

BMInf currently supports these models:
- **CPM2.1.** CPM2.1 is an upgraded version of CPM2 [1], a general Chinese pre-trained language model with 11 billion parameters. Based on CPM2, CPM2.1 introduces a generative pre-training task and was trained via the continual learning paradigm. In experiments, CPM2.1 shows better generation ability than CPM2.
- **CPM1.** CPM1 [2] is a generative Chinese pre-trained language model with 2.6 billion parameters. Its architecture is similar to GPT [4], and it can be used for various NLP tasks such as conversation, essay generation, cloze tests, and language understanding.
- **EVA.** EVA [3] is a Chinese pre-trained dialogue model with 2.8 billion parameters. EVA performs well on many dialogue tasks, especially in multi-turn human-bot conversation.
Besides these models, we are working on adding more PLMs, especially large-scale ones. We welcome every contributor to add their models to this project by opening an issue.
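For scripts that switch between these models, a small lookup table keeps the choice in one place. A minimal sketch; the exact class names are assumptions based on the list above:

```python
import bminf

# Map human-readable names to the model classes mentioned above
# (assumed names, following the README's examples).
MODELS = {
    "cpm1": bminf.models.CPM1,
    "cpm2": bminf.models.CPM2,
    "eva": bminf.models.EVA,
}

def load_model(name: str):
    """Instantiate a supported model by name; raises KeyError if unknown."""
    return MODELS[name.lower()]()

# e.g. model = load_model("cpm2")
```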
## Performance

Here we report the speeds of the CPM2 encoder and decoder that we tested on different platforms. You can also run `benchmark/cpm2/encoder.py` and `benchmark/cpm2/decoder.py` to test the speed on your machine. A rough timing sketch is given after the table.
| Implementation | GPU | Encoder Speed (tokens/s) | Decoder Speed (tokens/s) |
|---|---|---|---|
| BMInf | NVIDIA GeForce GTX 1060 | 718 | 4.4 |
| BMInf | NVIDIA GeForce GTX 1080Ti | 1200 | 12 |
| BMInf | NVIDIA GeForce RTX 2080Ti | 2275 | 19 |
| BMInf | NVIDIA Tesla V100 | 2966 | 20 |
| BMInf | NVIDIA Tesla A100 | 4365 | 26 |
| PyTorch | NVIDIA Tesla V100 | - | 3 |
| PyTorch | NVIDIA Tesla A100 | - | 7 |
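For a rough, informal measurement without the benchmark scripts, the `fill_blank` call from the Quick Start can be timed directly. A minimal sketch; the prompt is illustrative, and characters per second is only a stand-in for tokens/s since this sketch does not tokenize:

```python
import time
import bminf

cpm2 = bminf.models.CPM2()
prompt = "今天天气很<span>。"  # one blank to fill (illustrative prompt)

start = time.time()
results = list(cpm2.fill_blank(prompt, top_n=5, top_p=1.0, temperature=0.5))
elapsed = time.time() - start

# Approximate throughput; treat the number as indicative only.
generated = sum(len(r["text"]) for r in results)
print(f"{generated / elapsed:.1f} chars/s over {elapsed:.2f}s")
```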
## Contributing

We welcome everyone to contribute code following our contributing guidelines.
You can also find us on other platforms.
## License

The package is released under the Apache 2.0 License.