
[Bug] RuntimeError: CUDA out of memory. Tried to allocate 1048475.67 GiB #52

Closed
lin199811 opened this issue May 15, 2024 · 5 comments

@lin199811
Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmdetection3d

Environment

System environment:
sys.platform: linux
Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 1591519926
GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 10.1, V10.1.24
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 1.11.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3

  • C++ Version: 201402

  • Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications

  • Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)

  • OpenMP 201511 (a.k.a. OpenMP 4.5)

  • LAPACK is enabled (usually provided by MKL)

  • NNPACK is enabled

  • CPU capability usage: AVX2

  • CUDA Runtime 11.3

  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37

  • CuDNN 8.2

  • Magma 2.5.2

  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.12.0
OpenCV: 4.9.0
MMEngine: 0.10.3

Runtime environment:
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: 1591519926
Distributed launcher: none
Distributed training: False
GPU number: 1

Reproduces the problem - code sample

Traceback (most recent call last):
File "/home/fudongyi/anaconda3/envs/embodiedscan/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/home/fudongyi/anaconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/optim/optimizer/optimizer_wrapper.py", line 283, in optim_context
yield
File "/home/fudongyi/anaconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 114, in train_step
losses = self._run_forward(data, mode='loss') # type: ignore
File "/home/fudongyi/anaconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 361, in _run_forward
results = self(**data, mode=mode)
File "/home/fudongyi/anaconda3/envs/embodiedscan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/storage1/Fudongyi/AutoDrive/code/EmbodiedScan/embodiedscan/models/detectors/sparse_featfusion_single_stage.py", line 325, in forward
return self.loss(inputs, data_samples, **kwargs)
File "/storage1/Fudongyi/AutoDrive/code/EmbodiedScan/embodiedscan/models/detectors/sparse_featfusion_single_stage.py", line 242, in loss
losses = self.bbox_head.loss(x, batch_data_samples, **kwargs)
File "/storage1/Fudongyi/AutoDrive/code/EmbodiedScan/embodiedscan/models/dense_heads/fcaf3d_head.py", line 1037, in loss
outs = self(x)
File "/home/fudongyi/anaconda3/envs/embodiedscan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/storage1/Fudongyi/AutoDrive/code/EmbodiedScan/embodiedscan/models/dense_heads/fcaf3d_head.py", line 1010, in forward
x = self._prune(x, prune_score)
File "/storage1/Fudongyi/AutoDrive/code/EmbodiedScan/embodiedscan/models/dense_heads/fcaf3d_head.py", line 1103, in _prune
interpolated_scores = scores.features_at_coordinates(coordinates)
File "/home/fudongyi/anaconda3/envs/embodiedscan/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/MinkowskiSparseTensor.py", line 713, in features_at_coordinates
return MinkowskiInterpolationFunction().apply(
File "/home/fudongyi/anaconda3/envs/embodiedscan/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/MinkowskiInterpolation.py", line 52, in forward
out_feat, in_map, out_map, weights = fw_fn(
RuntimeError: CUDA out of memory. Tried to allocate 1048475.67 GiB (GPU 0; 23.70 GiB total capacity; 1.47 GiB already allocated; 20.20 GiB free; 1.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Reproduces the problem - command or script

python tools/train.py configs/detection/mv-det3d_8xb4_embodiedscan-3d-284class-9dof.py --work-dir=work_dirs/mv-3ddet

Reproduces the problem - error message

The traceback shows that the fw_fn() function tried to allocate 1048475.67 GiB on the GPU. I think this is a bug. How can I solve this problem?

Additional information

No response

@lin199811 (Author)

My train_dataloader is set to (batch_size=1, num_workers=1).

@Tai-Wang (Contributor)

Do you use our officially provided data? It is really strange to allocate such a huge amount of memory for the multi-view 3D detection model.

@lin199811 (Author)

I found this bug in MinkowskiEngine. I modified and rebuilt the MinkowskiEngine code: in the coo_spmm function in src/spmm.cu, I changed nnz to static_cast<std::size_t>(nnz). After that, the bug never appeared again.

mxh1999 pinned this issue May 19, 2024
@AmingWu commented Jun 1, 2024

@lin199811 How long does your method need to train?

@Zhaooyy commented Jul 3, 2024

There are a lot of occurrences of nnz in src/spmm.cu. Should I change them all to static_cast<std::size_t>(nnz), or just some of them?
