I am doing multimodal training of Qwen2.5VL with swift 3.1, Transformers 4.49, and xtuner 0.1.23. Because the data is long, I enabled sequence parallel and found that some samples raise a rope dimension-mismatch error while others train fine, so I added debug output in Qwen2.5's modeling_qwen2_5_vl.py.
Since each of my samples always has exactly one image input, the debug output shows that rope misjudges the image span in the input ids. Looking closer, in some samples the image ids have been split apart and are incomplete. For example: input_tokens: [151644, 8948, 198, 2610, 525, 35678, 11, 458, 15235, 10822, 17847, ..., 397, 151645, 198, 151644, 872, 198, 151652, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, ..., 151655, 151655] <- the image span never terminates; it was presumably cut off by the sequence-parallel split, which makes the rope judgment wrong.
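To confirm this on my side I used roughly the check below (just a diagnostic sketch; I'm assuming 151652 = <|vision_start|>, 151653 = <|vision_end|>, 151655 = <|image_pad|> based on what appears in the dump above, so please correct me if those ids are off). It flags shards whose image span was cut in the middle:

# Rough diagnostic sketch: flag input_ids shards whose image span is cut off.
# Assumed special-token ids (151652/151655 match the dump above; 151653 is the
# expected closing token that never appears in the broken shards).
VISION_START, VISION_END, IMAGE_PAD = 151652, 151653, 151655

def vision_span_is_complete(input_ids: list) -> bool:
    starts = input_ids.count(VISION_START)
    ends = input_ids.count(VISION_END)
    # A complete sample has every <|vision_start|> matched by a <|vision_end|>;
    # a shard that ends inside the <|image_pad|> run has starts > ends.
    if starts != ends:
        return False
    # A shard can also begin mid-span: image pad tokens with no opening marker.
    if IMAGE_PAD in input_ids and starts == 0:
        return False
    return True

On the failing samples this returns False for one of the sequence-parallel shards, which matches the rope error.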
I took a look at swift's base.py:
if self.sequence_parallel_size > 1 and input_ids is not None:
    bs, seq_len = input_ids.shape
    position_ids = torch.arange(seq_len).unsqueeze(0).long().repeat(bs, 1)
    assert padding_side == 'right' or bs == 1, 'Sequence parallel only support padding_side=right'
    from swift.trainers.xtuner import get_xtuner_sequence_parallel_world_size
    if get_xtuner_sequence_parallel_world_size() > 1:
        from swift.trainers.xtuner import pad_and_split_for_sequence_parallel
        input_ids, labels, position_ids, attention_mask, loss_scale = pad_and_split_for_sequence_parallel(
            tokenizer, input_ids, labels, position_ids, attention_mask, loss_scale)
    res['position_ids'] = position_ids
_local_var = locals()
It does not seem to prevent multimodal data from being cut apart, and I wonder whether that is the cause.
Is this something the current framework does not support yet, or am I using it incorrectly? What would be a good strategy for me right now? Should I modify the parallel splitting so that, when it detects an unfinished vision input, it either does not split there or splits at a later point? A rough sketch of that idea follows below.
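What I have in mind is something like this (illustration only, not swift code; it uses the same assumed special-token ids as above): shift any split boundary that lands inside a vision span to the end of that span before the pad-and-split step.

# Illustration only: choose per-rank split boundaries that never fall inside a
# <|vision_start|> ... <|vision_end|> span. Token ids as assumed above.
VISION_START, VISION_END = 151652, 151653

def safe_split_points(input_ids: list, sp_world_size: int) -> list:
    # Naive even boundaries, as the current splitting would produce.
    seq_len = len(input_ids)
    boundaries = [seq_len * r // sp_world_size for r in range(1, sp_world_size)]

    # Record [start, end) index ranges of every vision span.
    spans, start = [], None
    for i, tok in enumerate(input_ids):
        if tok == VISION_START:
            start = i
        elif tok == VISION_END and start is not None:
            spans.append((start, i + 1))
            start = None

    # Move any boundary that falls inside a span to the span's end,
    # so all image tokens stay on one rank.
    adjusted = []
    for b in boundaries:
        for s, e in spans:
            if s < b < e:
                b = e
                break
        adjusted.append(b)
    return adjusted

Of course this breaks the equal-length assumption of sequence parallel, so the real fix would probably need to pad each chunk back to a common length or handle the rope index differently. I'd like to know what the recommended approach is.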
My swift arguments:
MODEL_P="7B"
TOKENIZERS_PARALLELISM=false NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 MASTER_PORT=$METIS_WORKER_0_PORT MAX_PIXELS=1003520 \
swift sft \
    --model /opt/tiger/soweval/C/Qwen2.5VL/Qwen2.5-VL-$MODEL_P-Instruct \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --train_type lora \
    --lora_rank 16 \
    --lora_alpha 32 \
    --max_length 16000 \
    --truncation_strategy delete \
    --max_pixels 1003520 \
    --output_dir /opt/tiger/soweval/C/creator_model/Qwen2.5VL/swift_train/output_$MODEL_P \
    --dataset /opt/tiger/soweval/C/accumulated_data/train_data_post_process/2025-01-26_2025-02-20/init.jsonl \
    --val_dataset /opt/tiger/soweval/C/accumulated_data/train_data_post_process/2025-01-26_2025-02-20/init.jsonl \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --learning_rate 5e-5 \
    --deepspeed zero3 \
    --save_total_limit 2 \
    --eval_steps 10 \
    --freeze_vit true \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --attn_impl eager \
    --sequence_parallel_size 2