# simple_thu_chatglm6b
## 📣 The thuglm-6b model

### How the thuglm-6b model differs from the gpt2 model

Reading the thuglm-6b model source, its loss is essentially the same as the loss of gpt2 and other autoregressive models (only autoregressive training is considered here).
```python
# This is the loss of the thuglm model
if labels is not None:
    lm_logits = lm_logits.to(torch.float32)

    # Shift so that tokens < n predict n
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    # Flatten the tokens
    loss_fct = CrossEntropyLoss()
    loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))

    lm_logits = lm_logits.to(hidden_states.dtype)
    loss = loss.to(hidden_states.dtype)
```
```python
# From class GPT2LMHeadModel(GPT2PreTrainedModel) in src/transformers/models/gpt2/modeling_gpt2.py
# This is the loss of gpt2
loss = None
if labels is not None:
    # Shift so that tokens < n predict n
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    # Flatten the tokens
    loss_fct = CrossEntropyLoss()
    loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
```
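Both snippets compute the same next-token cross-entropy. To see concretely what the shift-and-flatten pattern does, here is a minimal self-contained sketch on dummy tensors (the shapes and vocabulary size are made up purely for illustration):

```python
import torch
from torch.nn import CrossEntropyLoss

# Dummy shapes: batch of 2, sequence length 5, vocabulary of 10
lm_logits = torch.randn(2, 5, 10)
labels = torch.randint(0, 10, (2, 5))

# Drop the last logit and the first label so that position i predicts token i + 1
shift_logits = lm_logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()

loss_fct = CrossEntropyLoss()
loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
print(loss)  # a scalar: the mean next-token cross-entropy
```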
The thuglm-6b source and the gpt2 source in the transformers package look very much alike; the design pattern is identical. From an engineering point of view, once you have read the gpt2 code, the thuglm-6b code framework will certainly not be hard for you. In their forward methods, the thuglm-6b model and the gpt2 model in the transformers package take essentially the same parameters, so the data they expect looks much the same as well. thuglm-6b does not yet have classes like thuglmForSequenceClassification or thuglmForTokenClassification; if you need them, just write them in the gpt2 style: change the loss and change the downstream head, as sketched below.
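For example, a sequence-classification wrapper could look roughly like this. This is purely a hypothetical sketch in the style of GPT2ForSequenceClassification; the class name, the constructor arguments, and the assumption that the backbone returns hidden states of shape (batch, seq, hidden) as its first output are mine, not part of the repo:

```python
import torch
from torch import nn
from torch.nn import CrossEntropyLoss

class ChatGLMForSequenceClassification(nn.Module):
    """Hypothetical classification head written in the gpt2 style."""

    def __init__(self, base_model: nn.Module, hidden_size: int, num_labels: int):
        super().__init__()
        self.num_labels = num_labels
        self.transformer = base_model  # the backbone, e.g. a loaded ChatGLM model
        self.score = nn.Linear(hidden_size, num_labels, bias=False)  # the new downstream head

    def forward(self, input_ids, attention_mask=None, labels=None):
        # Assumes the backbone returns hidden states of shape (batch, seq, hidden) first
        hidden_states = self.transformer(input_ids, attention_mask=attention_mask)[0]
        logits = self.score(hidden_states[:, -1, :])  # classify from the last token, gpt2 style
        loss = None
        if labels is not None:
            # The autoregressive loss is swapped for a plain classification loss
            loss = CrossEntropyLoss()(logits.view(-1, self.num_labels), labels.view(-1))
        return {"loss": loss, "logits": logits}
```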
### My take on the thuglm-6b model

Where thuglm-6b does better than a gpt2 model, the difference likely comes down to the data or the hardware it was trained with; architecturally it is just a gpt2-style model (other factors aside).

## Fine-tuning the thuglm-6b model
| No. | Description | Folder | Completed | Bugs remaining |
|---|---|---|---|---|
| 1 | Fine-tune thuglm-6b with the lora algorithm | v1_train_thuglm-lora | ☑️ | ✅ |
| 2 | Fine-tune thuglm-6b with the transformers Trainer | v2_train_thuglm | ☑️ | ✅ |
### 1. Fine-tune the thuglm-6b model with lora

The code for this part is in the folder v1_train_thuglm-lora.
It uses the peft package (https://github.com/huggingface/peft), which implements the lora algorithm, to fine-tune thuglm-6b. As mentioned above, the loss of thuglm-6b's ChatGLMForConditionalGeneration and the loss of gpt2's GPT2LMHeadModel are essentially the same: both are autoregressive models, just under different names.
So you can simply look at the data requirements I used when training my chinese-gpt2 model. (The sample data can be obtained by following 【统计学人】 and replying 【gpt2】.) The data is a set of .csv files with a column named content, where each content value is one sentence. Loading it takes nothing more than the datasets and glob packages. Of course, you can also just generate a dataset directly, like this:
```python
import os
import pandas as pd

data_dir = "data"
os.makedirs(name=data_dir, exist_ok=True)

# Write 20 csv files, each holding 100 copies of the same sentence
for i in range(20):
    data = pd.DataFrame({'sentence': [
        'ChatGLM-6B 是一个开源的、支持中英双语的对话语言模型,基于 [General Language Model (GLM)](https://github.com/THUDM/GLM) 架构,具有 62 亿参数。结合模型量化技术,'] * 100})
    data.to_csv(f"{data_dir}/{i}.csv", index=False)
```
As long as there is a column named sentence (or content), that is enough. Although we use csv files here, huggingface's datasets package handles json-format data just as well. Training a large model naturally calls for a lot of data. Worried that you cannot handle a few hundred GB of it? Don't be: just pass in the paths of all the files and let datasets take care of the rest. It processes the data automatically and memory-maps it where it sits on disk, so large datasets are handled effortlessly.
Here are the details of loading the data:

```python
from glob import glob
from datasets import load_dataset

all_data_list = glob("v1_train_thuglm_lora/data/*")[:10]  # if you have more data, just make this list longer
dataset = load_dataset(
    "csv",
    data_files={
        "train": all_data_list[:6],
        "validation": all_data_list[6:],
    },
)
```
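From here the dataset can be tokenized lazily with map. A minimal sketch, assuming the tokenizer loaded later in this README and the sentence column from above; the max_length value is an illustrative assumption:

```python
# Assumes `tokenizer` has been loaded as shown further below
def tokenize_function(examples):
    return tokenizer(examples["sentence"], truncation=True, max_length=512)  # max_length is illustrative

tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["sentence"])
```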
The lora algorithm is already implemented in the peft package. The peft_lora_seq2seq_accelerate_fsdp.py example under peft's examples is a useful reference, but applied to thuglm it runs into the `RuntimeError: expected scalar type Half but found Float` problem. Some people may ask: peft's lora does not claim support for models of the thuglm type, so won't using it this way cause problems?

Look at the lora.py source: target_modules lists ['q', 'v'] as its example.
```python
# src/peft/tuners/lora.py
@dataclass
class LoraConfig(PeftConfig):
    """
    This is the configuration class to store the configuration of a [`~peft.Lora`].

    Args:
        r (`int`): Lora attention dimension
        target_modules (`Union[List[str],str]`): The names of the modules to apply Lora to.
        lora_alpha (`float`): The alpha parameter for Lora scaling.
        lora_dropout (`float`): The dropout probability for Lora layers.
        merge_weights (`bool`):
            Whether to merge the weights of the Lora layers with the base transformer model in `eval` mode.
        fan_in_fan_out (`bool`): Set this to True if the layer to replace stores weight like (fan_in, fan_out)
        enable_lora (`List[bool]`): Used with `lora.MergedLinear`.
        bias (`str`): Bias type for Lora. Can be 'none', 'all' or 'lora_only'
        modules_to_save (`List[str]`): List of modules apart from LoRA layers to be set as trainable
            and saved in the final checkpoint.
    """

    r: int = field(default=8, metadata={"help": "Lora attention dimension"})
    target_modules: Optional[Union[List[str], str]] = field(
        default=None,
        metadata={
            "help": "List of module names or regex expression of the module names to replace with Lora."
            "For example, ['q', 'v'] or '.*decoder.*(SelfAttention|EncDecAttention).*(q|v)$' "
        },
    )
```
Checking the T5 model source in the transformers package, the ['q', 'v'] there correspond to nn.Linear layers:

```python
# src/transformers/models/t5/modeling_t5.py
class T5Attention(nn.Module):
    def __init__(self, config: T5Config, has_relative_attention_bias=False):
        super().__init__()
        self.is_decoder = config.is_decoder
        self.has_relative_attention_bias = has_relative_attention_bias
        self.relative_attention_num_buckets = config.relative_attention_num_buckets
        self.relative_attention_max_distance = config.relative_attention_max_distance
        self.d_model = config.d_model
        self.key_value_proj_dim = config.d_kv
        self.n_heads = config.num_heads
        self.dropout = config.dropout_rate
        self.inner_dim = self.n_heads * self.key_value_proj_dim

        self.q = nn.Linear(self.d_model, self.inner_dim, bias=False)
        self.k = nn.Linear(self.d_model, self.inner_dim, bias=False)
        self.v = nn.Linear(self.d_model, self.inner_dim, bias=False)
        self.o = nn.Linear(self.inner_dim, self.d_model, bias=False)
```
So all we need to do is find the names of the nn.Linear layers in the thuglm model, for example with the quick sketch below.
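One practical way to find those names is to walk the loaded model and print every nn.Linear submodule (a small sketch; here model is the thuglm model loaded as shown in the next section):

```python
import torch.nn as nn

# Print the qualified name of every nn.Linear submodule in the model
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        print(name)
```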
#### Use lora to modify the thuglm model
```python
from peft import LoraConfig, TaskType, get_peft_model
from peft.utils.other import fsdp_auto_wrap_policy
from train_thuglm.v1_train_thuglm_lora.thuglm.modeling_chatglm import ChatGLMForConditionalGeneration
from train_thuglm.v1_train_thuglm_lora.thuglm.tokenization_chatglm import ChatGLMTokenizer

model = ChatGLMForConditionalGeneration.from_pretrained(
    "THUDM/chatglm-6b", load_in_8bit=False)
tokenizer = ChatGLMTokenizer.from_pretrained("THUDM/chatglm-6b")

# Convert the thuglm model with lora
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1,
    # 'query_key_value' is another candidate nn.Linear module name
    target_modules=['dense', 'dense_h_to_4h', 'dense_4h_to_h'],
)
model = get_peft_model(model, peft_config)
```
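After the conversion you can check that only the injected lora parameters are trainable:

```python
# Only the lora matrices should show up as trainable now
model.print_trainable_parameters()
```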
All of the key parts have been covered; what remains is basically the same as training any other pytorch model, so it is not repeated here. For reference, a minimal loop is sketched below.
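A minimal training-loop sketch under the usual pytorch conventions; the optimizer choice, learning rate, and batch fields are my assumptions, not the repo's settings:

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # assumed hyperparameters
model.train()  # assumes the model has already been moved to the GPU
for batch in train_dataloader:  # a DataLoader over the tokenized dataset
    outputs = model(input_ids=batch["input_ids"].cuda(), labels=batch["labels"].cuda())
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```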
### 2. Fine-tune thuglm-6b with the transformers Trainer

The code for this part is in the folder v2_train_thuglm. The main work done here: the modeling_chatglm.py model source was modified so that training can be driven by the Trainer from the transformers package.

Drawback: the model files need to live in the thu-chatglm-6b folder, and you must keep the modeling_chatglm.py file I put there.
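For reference, a minimal sketch of how the Trainer could then be driven; the TrainingArguments values are illustrative assumptions, not the repo's actual settings:

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="output",              # illustrative values only
    per_device_train_batch_size=1,
    num_train_epochs=1,
    logging_steps=10,
)
trainer = Trainer(
    model=model,                               # the model loaded from the thu-chatglm-6b folder
    args=training_args,
    train_dataset=tokenized_dataset["train"],  # tokenized as sketched earlier
)
trainer.train()
```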