When running:

```shell
python3 run.py --data CCOCR_MultiSceneOcr_Cord --model ${MODEL_NAME} --work-dir ${SUB_OUTPUT_DIR} --verbose
```
it fails with an out-of-memory error:

```
File "/opt/conda/envs/cc-ocr-py310/lib/python3.10/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 185, in eager_attention_forward
    attn_weights = torch.matmul(query, key_states.transpose(2, 3)) * scaling
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 28.43 GiB. GPU 0 has a total capacity of 15.77 GiB of which 13.38 GiB is free. Including non-PyTorch memory, this process has 2.38 GiB memory in use. Of the allocated memory 2.00 GiB is allocated by PyTorch, and 9.25 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
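The 28.43 GiB allocation is consistent with eager attention materializing the full attention-score matrix. A rough sanity check (the 28 attention heads and fp16 scores are assumptions based on the Qwen2.5-VL-7B text config; adjust for the actual model):

```python
def eager_attn_scores_gib(seq_len: int, num_heads: int,
                          batch: int = 1, bytes_per_el: int = 2) -> float:
    """Memory for the (batch, heads, seq, seq) score matrix that the
    eager attention path builds before softmax."""
    return batch * num_heads * seq_len * seq_len * bytes_per_el / 2**30

# A ~23k-token sequence (plausible for a high-resolution document image)
# already needs roughly 28 GiB for the scores alone:
print(f"{eager_attn_scores_gib(23_000, 28):.1f} GiB")
```

So the allocation grows quadratically with sequence length, which is why a single large OCR image can blow past a 16 GB V100 regardless of 8-bit weight loading.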
In `model.py`, inside the class `Qwen2VLChat`, I have:

```python
self.model = MODEL_CLS.from_pretrained(
    model_path,
    torch_dtype='auto',
    device_map="auto",
    attn_implementation='eager',
    load_in_8bit=True,
    low_cpu_mem_usage=True,
    max_memory={0: "15GiB", 1: "15GiB", 2: "15GiB", 3: "15GiB", "cpu": "40GiB"},
)
```
I am using eager attention because FlashAttention could not be installed due to a system-level GLIBC incompatibility (installed: GLIBC 2.31, required: 2.32).
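One possible workaround (a sketch, not a verified fix for this repo): PyTorch's built-in SDPA backends, including the memory-efficient kernel that runs on V100 (sm_70) hardware, do not require the flash-attn package at all. A minimal check that SDPA works without flash-attn installed:

```python
import torch
import torch.nn.functional as F

# Small illustrative shapes: (batch, heads, seq_len, head_dim).
q = torch.randn(1, 4, 128, 64)
k = torch.randn(1, 4, 128, 64)
v = torch.randn(1, 4, 128, 64)

# SDPA can dispatch to a memory-efficient kernel that avoids
# materializing the full (seq, seq) score matrix the eager path builds.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)
```

If this holds for the model in question, replacing `attn_implementation='eager'` with `attn_implementation='sdpa'` in the `from_pretrained` call above might avoid the quadratic score-matrix allocation, assuming the installed transformers version supports SDPA for Qwen2.5-VL.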
I have already tried:

```shell
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```
Machine config: 4 x NVIDIA V100.
Any suggestions are highly appreciated!