[Bug] RuntimeError: CUDA out of memory. Tried to allocate 1048475.67 GiB #52
Comments
My train_dataloader is set to batch_size=1, num_workers=1.
Do you use our officially provided data? It is really strange for the multi-view 3D detection model to allocate such a huge amount of memory.
I found that this bug is in MinkowskiEngine. I modified and rebuilt the MinkowskiEngine code: in the coo_spmm function in src/spmm.cu, I changed nnz to static_cast<std::size_t>(nnz). After that, the bug never appeared again.
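The cast matters because of where the widening happens. The minimal Python sketch below uses `ctypes` to mimic C's fixed-width integers. It shows how the product of two 32-bit `int` values can wrap around to a negative number, while the same product computed after `static_cast<std::size_t>(nnz)` stays exact. The sizes `nnz` and `width` are hypothetical and are not taken from MinkowskiEngine's `coo_spmm`. In real C++, signed overflow is undefined behavior, and the wrap-around shown here is only what typically happens in practice.

```python
import ctypes


def as_int32(value: int) -> int:
    """Truncate a Python int to 32-bit two's complement, like a C/C++ `int`."""
    return ctypes.c_int32(value).value


def as_size_t(value: int) -> int:
    """Reinterpret a Python int as a 64-bit unsigned `std::size_t` (x86-64 Linux)."""
    return ctypes.c_uint64(value).value


# Hypothetical sizes, chosen only so that the product exceeds INT_MAX (2**31 - 1).
nnz = 60_000_000   # number of non-zero entries
width = 64         # elements stored per entry

# `nnz * width` with both operands as int: the result wraps to a negative number.
wrapped = as_int32(nnz * width)

# `static_cast<std::size_t>(nnz) * width`: widened first, so the product is exact.
widened = as_size_t(as_size_t(nnz) * width)

# A wrapped negative value later converted to size_t becomes an enormous size.
bogus_size = as_size_t(wrapped)

print(f"int product:            {wrapped}")
print(f"size_t product:         {widened}")
print(f"negative int as size_t: {bogus_size}")
```

The exact figure in this issue's error depends on how MinkowskiEngine uses the value downstream, which this sketch does not model. It only illustrates why widening before the arithmetic avoids the wrap.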
@lin199811 How long does your method take to train?
There are many occurrences of nnz in src/spmm.cu. Should I change them all to static_cast<std::size_t>(nnz), or just some of them?
Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
main branch https://github.com/open-mmlab/mmdetection3d
Environment
System environment:
sys.platform: linux
Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 1591519926
GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 10.1, V10.1.24
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 1.11.0
PyTorch compiling details: PyTorch built with:
GCC 7.3
C++ Version: 201402
Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.3
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
CuDNN 8.2
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.12.0
OpenCV: 4.9.0
MMEngine: 0.10.3
Runtime environment:
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: 1591519926
Distributed launcher: none
Distributed training: False
GPU number: 1
Reproduces the problem - code sample
Traceback (most recent call last):
  File "/home/fudongyi/anaconda3/envs/embodiedscan/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/fudongyi/anaconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/optim/optimizer/optimizer_wrapper.py", line 283, in optim_context
    yield
  File "/home/fudongyi/anaconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 114, in train_step
    losses = self._run_forward(data, mode='loss') # type: ignore
  File "/home/fudongyi/anaconda3/envs/embodiedscan/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 361, in _run_forward
    results = self(**data, mode=mode)
  File "/home/fudongyi/anaconda3/envs/embodiedscan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/storage1/Fudongyi/AutoDrive/code/EmbodiedScan/embodiedscan/models/detectors/sparse_featfusion_single_stage.py", line 325, in forward
    return self.loss(inputs, data_samples, **kwargs)
  File "/storage1/Fudongyi/AutoDrive/code/EmbodiedScan/embodiedscan/models/detectors/sparse_featfusion_single_stage.py", line 242, in loss
    losses = self.bbox_head.loss(x, batch_data_samples, **kwargs)
  File "/storage1/Fudongyi/AutoDrive/code/EmbodiedScan/embodiedscan/models/dense_heads/fcaf3d_head.py", line 1037, in loss
    outs = self(x)
  File "/home/fudongyi/anaconda3/envs/embodiedscan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/storage1/Fudongyi/AutoDrive/code/EmbodiedScan/embodiedscan/models/dense_heads/fcaf3d_head.py", line 1010, in forward
    x = self._prune(x, prune_score)
  File "/storage1/Fudongyi/AutoDrive/code/EmbodiedScan/embodiedscan/models/dense_heads/fcaf3d_head.py", line 1103, in _prune
    interpolated_scores = scores.features_at_coordinates(coordinates)
  File "/home/fudongyi/anaconda3/envs/embodiedscan/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/MinkowskiSparseTensor.py", line 713, in features_at_coordinates
    return MinkowskiInterpolationFunction().apply(
  File "/home/fudongyi/anaconda3/envs/embodiedscan/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/MinkowskiInterpolation.py", line 52, in forward
    out_feat, in_map, out_map, weights = fw_fn(
RuntimeError: CUDA out of memory. Tried to allocate 1048475.67 GiB (GPU 0; 23.70 GiB total capacity; 1.47 GiB already allocated; 20.20 GiB free; 1.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Reproduces the problem - command or script
python tools/train.py configs/detection/mv-det3d_8xb4_embodiedscan-3d-284class-9dof.py --work-dir=work_dirs/mv-3ddet
Reproduces the problem - error message
This shows that the fw_fn() function tried to allocate 1048475.67 GiB on the GPU. I think it is a bug. How can I solve this problem?
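A quick conversion of the numbers in the error supports reading this as a bug rather than a real memory shortage. The request is just under 1 PiB (2^50 bytes), which is tens of thousands of times the 23.70 GiB card, so allocator settings such as `max_split_size_mb` cannot help. The sketch below only redoes that arithmetic on the figures quoted in the error. It does not model how MinkowskiEngine computes the size.

```python
GIB = 2 ** 30   # PyTorch's memory messages use binary units (1 GiB = 2**30 bytes)
PIB = 2 ** 50   # 1 PiB = 2**50 bytes = 1,048,576 GiB

# Numbers quoted in the RuntimeError above.
requested_gib = 1048475.67
capacity_gib = 23.70

requested_bytes = requested_gib * GIB
fraction_of_pib = requested_bytes / PIB
times_capacity = requested_gib / capacity_gib

print(f"requested: {requested_bytes:.3e} bytes ({fraction_of_pib:.4f} PiB)")
print(f"that is about {times_capacity:,.0f} times the card's total capacity")
```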
Additional information
No response