Have you ever met this kind of problem when running the training code? It happens after the training process has run for a few iterations:
```
*** Error in `/opt/conda/bin/python': double free or corruption (fasttop): 0x00007f0018011960 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f026f6987e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f026f6a137a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f026f6a553c]
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch.so(+0x3cead6e)[0x7f01f8755d6e]
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch.so(+0x3ceae19)[0x7f01f8755e19]
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch.so(+0x3ceaf95)[0x7f01f8755f95]
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch.so(_ZN5torch8autograd6Engine17evaluate_functionERNS0_8NodeTaskE+0x1210)[0x7f01f874d6b0]
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch.so(_ZN5torch8autograd6Engine11thread_mainEPNS0_9GraphTaskE+0x1c4)[0x7f01f874f564]
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so(_ZN5torch8autograd6python12PythonEngine11thread_initEi+0x2a)[0x7f026b2eebca]
/opt/conda/lib/python3.7/site-packages/torch/_C.cpython-37m-x86_64-linux-gnu.so(+0xf14f)[0x7f026be2d14f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f026f9f26ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f026f72841d]
======= Memory map: ========
200000000-200200000 rw-s 00000000 00:06 533    /dev/nvidiactl
200200000-200400000 ---p 00000000 00:00 0
200400000-200404000 rw-s 00000000 00:06 533    /dev/nvidiactl
200404000-200600000 ---p 00000000 00:00 0
200600000-200a00000 rw-s 00000000 00:06 533    /dev/nvidiactl
200a00000-201600000 ---p 00000000 00:00 0
201600000-201800000 rw-s 00000000 00:06 533    /dev/nvidiactl
201800000-201804000 rw-s 00000000 00:06 533    /dev/nvidiactl
201804000-201a00000 ---p 00000000 00:00 0
201a00000-201e00000 rw-s 00000000 00:06 533    /dev/nvidiactl
201e00000-201e04000 rw-s 00000000 00:06 533    /dev/nvidiactl
201e04000-202000000 ---p 00000000 00:00 0
202000000-202400000 rw-s 00000000 00:06 533    /dev/nvidiactl
202400000-202404000 rw-s 00000000 00:06 533    /dev/nvidiactl
202404000-202600000 ---p 00000000 00:00 0
202600000-202a00000 rw-s 00000000 00:06 533    /dev/nvidiactl
202a00000-202a04000 rw-s 00000000 00:06 533    /dev/nvidiactl
202a04000-202c00000 ---p 00000000 00:00 0
202c00000-203000000 rw-s 00000000 00:06 533    /dev/nvidiactl
203000000-203004000 rw-s 00000000 00:06 533    /dev/nvidiactl
```
I have never met this kind of error before, but according to issue pytorch/pytorch#2205, I think it is caused by a system memory constraint.
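If host memory pressure is really the trigger (that is the diagnosis suggested by the linked issue, not something confirmed for this crash), a common way to reduce it is to disable `DataLoader` worker processes and pinned-memory staging, both of which consume extra host RAM. A minimal sketch, using a throwaway `TensorDataset` as a stand-in for the real training data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in dataset; replace with the actual training dataset.
dataset = TensorDataset(
    torch.randn(256, 3, 32, 32),          # fake images
    torch.randint(0, 10, (256,)),         # fake labels
)

# num_workers=0 loads batches in the main process, so no forked worker
# processes hold extra copies of loader state; pin_memory=False avoids
# additional page-locked host allocations. Both trade throughput for a
# smaller host-memory footprint.
loader = DataLoader(dataset, batch_size=32, num_workers=0, pin_memory=False)

for images, labels in loader:
    pass  # training step would go here
```

If this makes the crash disappear, `num_workers` can be raised back gradually to find a setting the machine's RAM can sustain.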
@dzk9528 Hello, have you resolved this problem?