No adapter_config.json after LoRA fine-tuning the deepseek-chat model #238

Open
HolyCrazy opened this issue Aug 12, 2024 · 2 comments
Comments

@HolyCrazy

After LoRA fine-tuning the deepseek-chat model there is no adapter_config.json. Other issues say this is caused by the transformers version: the LoRA weights get merged directly into the base model's weights. But when I run the fine-tuned weights directly, every token the model returns is 0, so it feels like no real inference is happening.
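
(For context: a PEFT LoRA checkpoint is normally a small directory containing adapter_config.json plus the adapter weights, and it is loaded on top of the base model rather than replacing it. A minimal sketch of that loading step, with placeholder paths rather than anything from this issue:)

```python
# Minimal sketch (placeholder paths): how a LoRA adapter directory is
# normally loaded on top of its base model with peft.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("/path/to/base-model", trust_remote_code=True)
# PeftModel.from_pretrained expects adapter_config.json and the adapter
# weights (adapter_model.bin / adapter_model.safetensors) in this directory.
model = PeftModel.from_pretrained(base, "/path/to/lora-checkpoint")
```

If the fine-tuned weights were merged straight into the base model, that directory layout never appears, which matches the missing adapter_config.json described above.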

@KMnO4-zx
Contributor

You could try setting the transformers version to 4.31.3.
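
(A quick way to confirm which versions are actually installed before training, as a small sketch:)

```python
# Sketch: print the installed versions so the exact transformers/peft/torch
# combination can be reported alongside the issue.
import torch
import transformers
import peft

print("transformers:", transformers.__version__)
print("peft:", peft.__version__)
print("torch:", torch.__version__)
```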

@HolyCrazy
Author

HolyCrazy commented Aug 16, 2024

transformers 4.31.3 doesn't exist, so I used transformers==4.31.0, but it still doesn't work. I'm wondering whether the problem is in my script:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, TaskType
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    GenerationConfig,
    Trainer,
    TrainingArguments,
)

MAX_LENGTH = 384  # the Llama tokenizer splits one Chinese character into several tokens, so allow a larger max length to keep the data intact
path = '/mnt/bn/models/deepseek-coder-6.7b-instruct'
tokenizer = AutoTokenizer.from_pretrained(path, use_fast=False, trust_remote_code=True)
tokenizer.padding_side = 'right'  # pad on the right

model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True, torch_dtype=torch.half, device_map="auto")
model.generation_config = GenerationConfig.from_pretrained(path)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

def process_func(example):
    input_ids, attention_mask, labels = [], [], []
    instruction = tokenizer(f"User: {example['instruction']+example['input']}\n\n", add_special_tokens=False)  # add_special_tokens=False: do not prepend special tokens
    response = tokenizer(f"Assistant: {example['output']}<|end▁of▁sentence|>", add_special_tokens=False)
    input_ids = instruction["input_ids"] + response["input_ids"] + [tokenizer.pad_token_id]
    attention_mask = instruction["attention_mask"] + response["attention_mask"] + [1]  # we also want to attend to the eos token, so append 1
    labels = [-100] * len(instruction["input_ids"]) + response["input_ids"] + [tokenizer.pad_token_id]
    if len(input_ids) > MAX_LENGTH:  # truncate
        input_ids = input_ids[:MAX_LENGTH]
        attention_mask = attention_mask[:MAX_LENGTH]
        labels = labels[:MAX_LENGTH]
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels
    }

dataset = load_dataset("json", data_files={"/zhen_huan_dataset_1000.json"}, split='train')

processed_dataset = dataset.map(
    process_func,
    remove_columns=["instruction", "input", "output"]
)
print(processed_dataset)

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=False,  # training mode
    r=8,                   # LoRA rank
    lora_alpha=32,         # LoRA alpha; see the LoRA paper for its role
    lora_dropout=0.1       # dropout ratio
)

output_dir = "/code/lora"

args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    logging_steps=10,
    num_train_epochs=3,
    save_steps=100,
    learning_rate=1e-4,
    save_on_each_node=True,
    gradient_checkpointing=True
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=processed_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)
trainer.train()
print("train finish_______________")
```
