T5: full name Text-to-Text Transfer Transformer, a Transformer model trained by Google on the C4 dataset in 2019.
Paper: C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer", JMLR, 2020.
Dataset used: WMT16 (English-Romanian)
The corresponding file paths are as follows:
```text
└── wmt_en_ro
    ├── test.source
    ├── test.target
    ├── train.source
    ├── train.target
    ├── val.source
    └── val.target
```
Developers need to clone the repository in advance. The sketch below shows how the parallel files pair up.
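As an illustration (a minimal sketch, not part of the repository; it assumes the `.source`/`.target` files are line-aligned plain text), sentence pairs can be read like this:

```python
# Minimal sketch: read line-aligned source/target files (paths are placeholders).
src_path = "/your_path/wmt_en_ro/train.source"
tgt_path = "/your_path/wmt_en_ro/train.target"

with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
    for src_line, tgt_line in zip(src, tgt):
        pair = (src_line.strip(), tgt_line.strip())  # one English-Romanian sentence pair
        print(pair)
        break  # show only the first pair
```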
- Launching with the script
The example command below trains a T5 model with only 1 layer:
```shell
python run_mindformer.py --config configs/t5/run_t5_tiny_on_wmt16.yaml --run_mode train \
                         --device_target Ascend \
                         --train_dataset_dir /your_path/wmt_en_ro
```
Here `device_target` selects the device to run on and can be GPU, Ascend, or CPU. The `--config` argument can also be set to `configs/t5/run_t5_small.yaml`; under that configuration the `t5_small` weights are loaded and fine-tuning begins, as in the sketch below.
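For example (a sketch assuming the same dataset path and launch arguments as the command above):

```shell
python run_mindformer.py --config configs/t5/run_t5_small.yaml --run_mode train \
                         --device_target Ascend \
                         --train_dataset_dir /your_path/wmt_en_ro
```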
Developers need to install the package via pip in advance. For detailed interface descriptions, refer to the API documentation.
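For example (assuming the distribution is published on PyPI under the name `mindformers`):

```shell
pip install mindformers
```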
- Computing the loss with the model
The scalar printed at the end is the loss for the given source/target sentence pair:
```python
from mindformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained('t5_small')
tokenizer = T5Tokenizer.from_pretrained('t5_small')

# Encoder input: the source sentence, padded to the model's sequence length
src_output = tokenizer(["hello world"], padding='max_length', max_length=model.config.seq_length,
                       return_tensors='ms')
# Decoder input: the target sentence, padded to the maximum decoding length
model_input = tokenizer(["So happy to see you!"], padding='max_length', max_length=model.config.max_decode_length,
                        return_tensors='ms')["input_ids"]

input_ids = src_output['input_ids']
attention_mask = src_output['attention_mask']
output = model(input_ids, attention_mask, model_input)
print(output)
# [5.64458]
```
- Inference
Running the following code automatically pulls the t5_small model from the cloud and performs inference:
```python
from mindformers import T5ForConditionalGeneration, T5Tokenizer

t5 = T5ForConditionalGeneration.from_pretrained("t5_small")
tokenizer = T5Tokenizer.from_pretrained("t5_small")
words = tokenizer("translate the English to the Romanian: UN Chief Says There Is No Military "
                  "Solution in Syria")['input_ids']
output = t5.generate(words, do_sample=False)
output = tokenizer.decode(output, skip_special_tokens=True)
print(output)
# "eful ONU declară că nu există o soluţie militară în Siri"
```
- Launching training/prediction with the Trainer API:

```python
import mindspore; mindspore.set_context(mode=0, device_id=0)  # mode=0 is graph mode
from mindformers.trainer import Trainer

# Initialize the translation task
trainer = Trainer(task='translation', model='t5_small', train_dataset="your data file path")

# Option 1: run training, then use the trained weights for inference
trainer.train()
res = trainer.predict(predict_checkpoint=True, input_data="translate the English to Romanian: a good boy!")
print(res)
# [{'translation_text': ['un băiat bun!']}]

# Option 2: download trained weights from OBS and run inference
res = trainer.predict(input_data="translate the English to Romanian: a good boy!")
print(res)
# [{'translation_text': ['un băiat bun!']}]
```
- Quick inference with the pipeline API

```python
from mindformers.pipeline import pipeline

pipeline_task = pipeline("translation", model='t5_small')
pipeline_result = pipeline_task("translate the English to Romanian: a good boy!", top_k=3)
print(pipeline_result)
# [{'translation_text': ['un băiat bun!']}]
```
The t5_small in this repository comes from HuggingFace's t5_small and was obtained through the following steps:

- Download the HuggingFace weights for t5_small from the link above; the file is named pytorch_model.bin.
- Run the conversion script to produce the converted output file mindspore_t5.ckpt:

```shell
python mindformers/models/t5/convert_weight.py --layers 6 --torch_path pytorch_model.bin --mindspore_path ./mindspore_t5.ckpt
```
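To sanity-check the conversion (a minimal sketch using standard MindSpore checkpoint APIs; the path matches the command above):

```python
import mindspore

# Load the converted checkpoint and inspect its parameters
param_dict = mindspore.load_checkpoint("./mindspore_t5.ckpt")
print(len(param_dict))       # number of converted parameter tensors
print(list(param_dict)[:5])  # a few parameter names
```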