too large launch parameter: AddTakeGrad[130560,1], [64,1,1] #86

Open
KevinQian97 opened this issue Jul 7, 2019 · 0 comments


Hi, I ran into a strange problem after changing the image scale from your [1000,600] to [1920, 1080]. Training now fails with the error report below:

Traceback (most recent call last):
File "dff_rfcn/train_end2end.py", line 179, in <module>
main()
File "dff_rfcn/train_end2end.py", line 176, in main
config.TRAIN.begin_epoch, config.TRAIN.end_epoch, config.TRAIN.lr, config.TRAIN.lr_step)
File "dff_rfcn/train_end2end.py", line 169, in train_net
arg_params=arg_params, aux_params=aux_params, begin_epoch=begin_epoch, num_epoch=end_epoch)
File "/Deep-Feature-Flow/dff_rfcn/core/module.py", line 981, in fit
self.update_metric(eval_metric, data_batch.label)
File "/Deep-Feature-Flow/dff_rfcn/core/module.py", line 1073, in update_metric
self._curr_module.update_metric(eval_metric, labels)
File "/Deep-Feature-Flow/dff_rfcn/core/module.py", line 674, in update_metric
self._exec_group.update_metric(eval_metric, labels)
File "/Deep-Feature-Flow/dff_rfcn/core/DataParallelExecutorGroup.py", line 481, in update_metric
eval_metric.update(labels, texec.outputs)
File "/usr/local/lib/python2.7/dist-packages/mxnet/metric.py", line 318, in update
metric.update(labels, preds)
File "/Deep-Feature-Flow/dff_rfcn/core/metric.py", line 51, in update
pred_label = mx.ndarray.argmax_channel(pred).asnumpy().astype('int32')
File "/usr/local/lib/python2.7/dist-packages/mxnet/ndarray/ndarray.py", line 1980, in asnumpy
ctypes.c_size_t(data.size)))
File "/usr/local/lib/python2.7/dist-packages/mxnet/base.py", line 252, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [01:12:23] /work/mxnet/3rdparty/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:58: too large launch parameter: AddTakeGrad[130560,1], [64,1,1]

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x4114ba) [0x7f89e3d464ba]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x411ad1) [0x7f89e3d46ad1]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x4b7eddb) [0x7f89e84b3ddb]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x4e12eec) [0x7f89e8747eec]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x4e132cf) [0x7f89e87482cf]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x4f92da4) [0x7f89e88c7da4]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2ceb179) [0x7f89e6620179]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2cf1e67) [0x7f89e6626e67]
[bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2cd01c4) [0x7f89e66051c4]
[bt] (9) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2cd44b3) [0x7f89e66094b3]

I tried to locate the problem and found that it is raised in /Deep-Feature-Flow/dff_rfcn/core/metric.py, line 46:
pred_label = mx.ndarray.argmax_channel(pred).asnumpy().astype('int32')
If I delete asnumpy(), the error goes away, but the whole program runs much slower and a lot of other code has to be changed. Would you mind telling me why this error happens and whether there is another way to fix it?
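
To make the error message easier to read, here is a minimal Python sketch of the kind of launch-parameter check that could produce it. This is not the mshadow code itself: the function name `check_launch_param` is hypothetical, and the limits of 65535 for a grid dimension and 1024 threads per block are assumptions for illustration.

```python
# Assumed limits, for illustration only; not read from the mshadow source.
MAX_GRID_DIM = 65535
MAX_THREADS_PER_BLOCK = 1024


def check_launch_param(grid, block, name):
    """Return an error string if the launch configuration exceeds the
    assumed limits, otherwise None.

    grid  -- (x, y) number of blocks
    block -- (x, y, z) number of threads per block
    name  -- kernel name used in the message
    """
    threads = block[0] * block[1] * block[2]
    too_large = (
        threads > MAX_THREADS_PER_BLOCK
        or grid[0] > MAX_GRID_DIM
        or grid[1] > MAX_GRID_DIM
    )
    if too_large:
        return "too large launch parameter: %s[%d,%d], [%d,%d,%d]" % (
            name, grid[0], grid[1], block[0], block[1], block[2])
    return None


# Values taken from the error report above.
msg = check_launch_param((130560, 1), (64, 1, 1), "AddTakeGrad")
print(msg)
```

Under these assumptions, the block of 64 threads is fine and it is the grid x dimension of 130560 that exceeds the limit. MXNet also executes operators asynchronously, so a failure in an earlier GPU kernel may only surface at a blocking call such as asnumpy(), which might be why the traceback points at metric.py rather than at the operator that launched AddTakeGrad.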

Thanks for your help!
