oom problem #155

Open
sms-s opened this issue Mar 13, 2025 · 1 comment

sms-s commented Mar 13, 2025

I'm running on two A6000 GPUs (48 GB each), but I'm hitting an out-of-memory error. Does anyone know how to optimize this? Here are the parameters:

cd src/open-r1-multimodal
export DEBUG_MODE="true"
export CUDA_VISIBLE_DEVICES=1,2
RUN_NAME="Qwen2.5-VL-3B-GRPO-REC"
export LOG_PATH="./debug_log_$RUN_NAME.txt"
torchrun --nproc_per_node="2" \
    --nnodes="1" \
    --node_rank="0" \
    --master_addr="127.0.0.1" \
    --master_port="12346" \
    src/open_r1/grpo_rec.py \
    --deepspeed local_scripts/zero3.json \
    --output_dir quanzhong/$RUN_NAME \
    --model_name_or_path VLM-R1/Qwen2.5-VL-3B-Instruct \
    --dataset_name data_config/rec.yaml \
    --image_root VLM-R1/camotrain \
    --max_prompt_length 1024 \
    --num_generations 2 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --logging_steps 1 \
    --bf16 \
    --torch_dtype bfloat16 \
    --data_seed 42 \
    --report_to wandb \
    --gradient_checkpointing true \
    --attn_implementation flash_attention_2 \
    --num_train_epochs 2 \
    --run_name $RUN_NAME \
    --save_steps 100 \
    --save_only_model true
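
To narrow down which phase spikes, a minimal PyTorch memory-logging sketch could help (a hypothetical helper, not part of grpo_rec.py; the torch.cuda calls are standard PyTorch):

import torch

def log_gpu_memory(tag=""):
    # Print current, cached, and peak memory per visible GPU, in GiB.
    for i in range(torch.cuda.device_count()):
        allocated = torch.cuda.memory_allocated(i) / 1024**3
        reserved = torch.cuda.memory_reserved(i) / 1024**3
        peak = torch.cuda.max_memory_allocated(i) / 1024**3
        print(f"[{tag}] cuda:{i} allocated={allocated:.1f} GiB "
              f"reserved={reserved:.1f} GiB peak={peak:.1f} GiB")

# e.g. call log_gpu_memory("after generation") and log_gpu_memory("after backward")
# inside the training loop to see whether generation or the backward pass hits the 48 GB limit.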

Contributor

xrc10 commented Mar 14, 2025

Since you already set per_device_train_batch_size = 1, another thing to try is setting --max_pixels to a smaller value, like 401408.
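
For intuition on why this helps (a rough sketch, not from the repo): Qwen2.5-VL's processor rescales each image so its area stays under max_pixels, and with 14x14 ViT patches plus a 2x2 token merge it produces roughly one language-model token per 28x28 pixel block, so max_pixels directly caps the visual tokens each image adds to the sequence. The approximation below ignores the rounding to multiples of 28 and the min_pixels floor that the real resize logic in qwen_vl_utils applies:

def approx_visual_tokens(height, width, max_pixels=401408, block=28):
    # Downscale so the image area fits the pixel budget, keeping the aspect ratio.
    if height * width > max_pixels:
        scale = (max_pixels / (height * width)) ** 0.5
        height, width = int(height * scale), int(width * scale)
    # Roughly one token per 28x28 block after the 2x2 patch merge.
    return (height // block) * (width // block)

print(approx_visual_tokens(2048, 1536))                      # max_pixels=401408 -> roughly 500 tokens
print(approx_visual_tokens(2048, 1536, max_pixels=1003520))  # larger budget     -> roughly 1270 tokens

Since GRPO scores num_generations completions per prompt, those image tokens are processed multiple times per example, so a smaller image budget tends to cut activation memory noticeably.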
