Does sequence parallel support multimodal tasks (Qwen 2.5 VL)? #3215

Open
lisonsown960321 opened this issue Feb 21, 2025 · 0 comments
lisonsown960321 commented Feb 21, 2025

I am running multimodal training for Qwen2.5-VL with swift 3.1, Transformers 4.49, and xtuner 0.1.23. Because my data is long I enabled sequence parallel, and I found that some samples raise a RoPE dimension-mismatch error while others train normally, so I added debug output in Qwen2.5-VL's modeling_qwen2_5_vl.py.

Every one of my samples always has exactly one image input. The debug output shows that the RoPE code misjudges the image extent in the input ids; looking closer, in some samples the image token ids have been split and are incomplete. For example: input_tokens: [151644, 8948, 198, 2610, 525, 35678, 11, 458, 15235, 10822, 17847, ..., 397, 151645, 198, 151644, 872, 198, 151652, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, 151655, ..., 151655, 151655] <- the image span never ends, presumably because it was cut off by the sequence-parallel split, which then makes the RoPE check go wrong.
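
To confirm this, each rank can scan its own slice for an image span that was opened but never closed (or vice versa). A minimal sketch, assuming the Qwen2.5-VL special token ids 151652 (<|vision_start|>), 151653 (<|vision_end|>) and 151655 (<|image_pad|>); the helper name is my own, not part of swift:

    import torch

    VISION_START_ID = 151652  # assumed Qwen2.5-VL <|vision_start|> id (matches the dump above)
    VISION_END_ID = 151653    # assumed Qwen2.5-VL <|vision_end|> id
    IMAGE_PAD_ID = 151655     # assumed Qwen2.5-VL <|image_pad|> id (matches the dump above)

    def has_truncated_image_span(chunk_ids) -> bool:
        """Return True if this per-rank chunk starts or ends in the middle of an image span,
        i.e. the sequence-parallel split cut through the image tokens."""
        if isinstance(chunk_ids, torch.Tensor):
            chunk_ids = chunk_ids.tolist()
        open_span = False
        for t in chunk_ids:
            if t == VISION_START_ID:
                open_span = True
            elif t == VISION_END_ID:
                if not open_span:
                    return True   # span was opened on an earlier rank
                open_span = False
            elif t == IMAGE_PAD_ID and not open_span:
                return True       # image pads with no <|vision_start|> in this chunk
        return open_span          # span opened here but never closed

Running this on each rank's slice right after the split should flag exactly the samples that later fail with the RoPE mismatch.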

Looking at swift's base.py, I found:
if self.sequence_parallel_size > 1 and input_ids is not None:
    bs, seq_len = input_ids.shape
    position_ids = torch.arange(seq_len).unsqueeze(0).long().repeat(bs, 1)
    assert padding_side == 'right' or bs == 1, 'Sequence parallel only support padding_side=right'
    from swift.trainers.xtuner import get_xtuner_sequence_parallel_world_size
    if get_xtuner_sequence_parallel_world_size() > 1:
        from swift.trainers.xtuner import pad_and_split_for_sequence_parallel
        input_ids, labels, position_ids, attention_mask, loss_scale = \
            pad_and_split_for_sequence_parallel(
                tokenizer, input_ids, labels, position_ids, attention_mask, loss_scale)
    res['position_ids'] = position_ids
_local_var = locals()
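
As far as I understand, pad_and_split_for_sequence_parallel pads the sequence to a multiple of the SP world size and then hands each rank one contiguous slice along the sequence dimension, roughly like the following (my own illustration, not the actual swift/xtuner code):

    import torch

    def pad_and_split_illustration(input_ids: torch.Tensor, pad_id: int,
                                   sp_world_size: int, sp_rank: int) -> torch.Tensor:
        """Rough illustration of the padding + contiguous slicing done for sequence parallel.
        It only shows why a per-rank slice can stop in the middle of an image-token run."""
        bs, seq_len = input_ids.shape
        # Pad on the right so seq_len is divisible by the SP world size.
        pad_len = (-seq_len) % sp_world_size
        if pad_len:
            pad = input_ids.new_full((bs, pad_len), pad_id)
            input_ids = torch.cat([input_ids, pad], dim=-1)
        # Each rank keeps one contiguous chunk of the sequence.
        chunk = input_ids.shape[-1] // sp_world_size
        return input_ids[:, sp_rank * chunk:(sp_rank + 1) * chunk]

If that is the behavior, a boundary can land anywhere, including in the middle of the <|image_pad|> run.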

It does not appear to prevent multimodal data from being split mid-image, so I wonder whether that is the cause.

Is this something the framework does not support yet, or am I using it incorrectly? What would be the best strategy for me right now? Should I modify the parallel mechanism so that, when it detects an unfinished vision input, it either does not split there or splits later?
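
If the boundary really is the problem, one direction I am considering (a sketch under my own assumptions, not something swift provides) is to nudge each sequence-parallel boundary so it never falls inside an image span, then pad the resulting uneven chunks; the other direction would be to compute the multimodal position_ids on the full sequence before splitting. The token ids and function below are illustrative only:

    import torch

    VISION_START_ID = 151652  # assumed Qwen2.5-VL <|vision_start|> id
    VISION_END_ID = 151653    # assumed Qwen2.5-VL <|vision_end|> id

    def safe_split_points(input_ids: torch.Tensor, sp_world_size: int) -> list:
        """Pick split points for one sample so that no sequence-parallel boundary falls
        inside a <|vision_start|> ... <|vision_end|> span. Illustrative sketch only."""
        ids = input_ids.tolist()
        # open_after[i] is True when a vision span is still open after token i,
        # i.e. cutting between i and i+1 would break the image tokens apart.
        open_after, open_span = [], False
        for t in ids:
            if t == VISION_START_ID:
                open_span = True
            elif t == VISION_END_ID:
                open_span = False
            open_after.append(open_span)
        seq_len = len(ids)
        points = []
        for r in range(1, sp_world_size):
            p = r * seq_len // sp_world_size
            while p < seq_len and open_after[p - 1]:
                p += 1  # push the boundary past the end of the image span
            points.append(p)
        return points
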

My swift parameter settings:
MODEL_P="7B"
TOKENIZERS_PARALLELISM=false NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 MASTER_PORT=$METIS_WORKER_0_PORT MAX_PIXELS=1003520
swift sft
--model /opt/tiger/soweval/C/Qwen2.5VL/Qwen2.5-VL-$MODEL_P-Instruct
--torch_dtype bfloat16
--num_train_epochs 1
--train_type lora
--lora_rank 16
--lora_alpha 32
--max_length 16000
--truncation_strategy delete
--max_pixels 1003520
--output_dir /opt/tiger/soweval/C/creator_model/Qwen2.5VL/swift_train/output_$MODEL_P
--dataset /opt/tiger/soweval/C/accumulated_data/train_data_post_process/2025-01-26_2025-02-20/init.jsonl
--val_dataset /opt/tiger/soweval/C/accumulated_data/train_data_post_process/2025-01-26_2025-02-20/init.jsonl
--per_device_train_batch_size 1
--per_device_eval_batch_size 1
--gradient_accumulation_steps 1
--learning_rate 5e-5
--deepspeed zero3
--save_total_limit 2
--eval_steps 10
--freeze_vit true
--warmup_ratio 0.05
--dataloader_num_workers 4
--attn_impl eager
--sequence_parallel_size 2 \
