I am doing multimodal training of Qwen2.5VL with swift 3.1, Transformers 4.49, and xtuner 0.1.23. Because the data is long, I enabled sequence parallel and found that some samples raise a rope dimension-mismatch error while others train fine, so I added debug output in Qwen2.5's modeling_qwen2_5_vl.py.
Since each of my samples always has exactly one image input, the debug output shows that rope misjudges the image span in the input ids. Looking closer, in some samples the image ids have been split apart and are incomplete. For example: input_tokens: [151644, 8948, 198, 2610, 525, 35678, 11, 458, 15235, 10822, 17847, ..., 397, 151645, 198, 151644, 872, 198, 151652, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, ..., 151655, 151655] <- the image span never terminates; it was presumably cut off by the sequence-parallel split, which makes the rope judgment wrong.
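To confirm this on my side I used roughly the check below (just a diagnostic sketch; I'm assuming 151652 = <|vision_start|>, 151653 = <|vision_end|>, 151655 = <|image_pad|> based on what appears in the dump above, so please correct me if those ids are off). It flags shards whose image span was cut in the middle:

# Rough diagnostic sketch: flag input_ids shards whose image span is cut off.
# Assumed special-token ids (151652/151655 match the dump above; 151653 is the
# expected closing token that never appears in the broken shards).
VISION_START, VISION_END, IMAGE_PAD = 151652, 151653, 151655

def vision_span_is_complete(input_ids: list) -> bool:
    starts = input_ids.count(VISION_START)
    ends = input_ids.count(VISION_END)
    # A complete sample has every <|vision_start|> matched by a <|vision_end|>;
    # a shard that ends inside the <|image_pad|> run has starts > ends.
    if starts != ends:
        return False
    # A shard can also begin mid-span: image pad tokens with no opening marker.
    if IMAGE_PAD in input_ids and starts == 0:
        return False
    return True

On the failing samples this returns False for one of the sequence-parallel shards, which matches the rope error.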
I took a look at swift's base.py:
if self.sequence_parallel_size > 1 and input_ids is not None:
    bs, seq_len = input_ids.shape
    position_ids = torch.arange(seq_len).unsqueeze(0).long().repeat(bs, 1)
    assert padding_side == 'right' or bs == 1, 'Sequence parallel only support padding_side=right'
    from swift.trainers.xtuner import get_xtuner_sequence_parallel_world_size
    if get_xtuner_sequence_parallel_world_size() > 1:
        from swift.trainers.xtuner import pad_and_split_for_sequence_parallel
        input_ids, labels, position_ids, attention_mask, loss_scale = pad_and_split_for_sequence_parallel(
            tokenizer, input_ids, labels, position_ids, attention_mask, loss_scale)
    res['position_ids'] = position_ids
_local_var = locals()
It does not seem to prevent multimodal data from being cut apart, and I wonder whether that is the cause.
Is this something the current framework does not support yet, or am I using it incorrectly? What would be a good strategy for me right now? Should I modify the parallel splitting so that, when it detects an unfinished vision input, it either does not split there or splits at a later point? A rough sketch of that idea follows below.
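What I have in mind is something like this (illustration only, not swift code; it uses the same assumed special-token ids as above): shift any split boundary that lands inside a vision span to the end of that span before the pad-and-split step.

# Illustration only: choose per-rank split boundaries that never fall inside a
# <|vision_start|> ... <|vision_end|> span. Token ids as assumed above.
VISION_START, VISION_END = 151652, 151653

def safe_split_points(input_ids: list, sp_world_size: int) -> list:
    # Naive even boundaries, as the current splitting would produce.
    seq_len = len(input_ids)
    boundaries = [seq_len * r // sp_world_size for r in range(1, sp_world_size)]

    # Record [start, end) index ranges of every vision span.
    spans, start = [], None
    for i, tok in enumerate(input_ids):
        if tok == VISION_START:
            start = i
        elif tok == VISION_END and start is not None:
            spans.append((start, i + 1))
            start = None

    # Move any boundary that falls inside a span to the span's end,
    # so all image tokens stay on one rank.
    adjusted = []
    for b in boundaries:
        for s, e in spans:
            if s < b < e:
                b = e
                break
        adjusted.append(b)
    return adjusted

Of course this breaks the equal-length assumption of sequence parallel, so the real fix would probably need to pad each chunk back to a common length or handle the rope index differently. I'd like to know what the recommended approach is.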
My swift arguments:
MODEL_P="7B"
TOKENIZERS_PARALLELISM=false NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 MASTER_PORT=$METIS_WORKER_0_PORT MAX_PIXELS=1003520 \
swift sft \
    --model /opt/tiger/soweval/C/Qwen2.5VL/Qwen2.5-VL-$MODEL_P-Instruct \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --train_type lora \
    --lora_rank 16 \
    --lora_alpha 32 \
    --max_length 16000 \
    --truncation_strategy delete \
    --max_pixels 1003520 \
    --output_dir /opt/tiger/soweval/C/creator_model/Qwen2.5VL/swift_train/output_$MODEL_P \
    --dataset /opt/tiger/soweval/C/accumulated_data/train_data_post_process/2025-01-26_2025-02-20/init.jsonl \
    --val_dataset /opt/tiger/soweval/C/accumulated_data/train_data_post_process/2025-01-26_2025-02-20/init.jsonl \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --learning_rate 5e-5 \
    --deepspeed zero3 \
    --save_total_limit 2 \
    --eval_steps 10 \
    --freeze_vit true \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --attn_impl eager \
    --sequence_parallel_size 2