Hi, I ran into a strange problem after changing the image scale from your [1000, 600] to [1920, 1080]. Training now fails with the error report below:
Traceback (most recent call last):
File "dff_rfcn/train_end2end.py", line 179, in <module>
main()
File "dff_rfcn/train_end2end.py", line 176, in main
config.TRAIN.begin_epoch, config.TRAIN.end_epoch, config.TRAIN.lr, config.TRAIN.lr_step)
File "dff_rfcn/train_end2end.py", line 169, in train_net
arg_params=arg_params, aux_params=aux_params, begin_epoch=begin_epoch, num_epoch=end_epoch)
File "/Deep-Feature-Flow/dff_rfcn/core/module.py", line 981, in fit
self.update_metric(eval_metric, data_batch.label)
File "/Deep-Feature-Flow/dff_rfcn/core/module.py", line 1073, in update_metric
self._curr_module.update_metric(eval_metric, labels)
File "/Deep-Feature-Flow/dff_rfcn/core/module.py", line 674, in update_metric
self._exec_group.update_metric(eval_metric, labels)
File "/Deep-Feature-Flow/dff_rfcn/core/DataParallelExecutorGroup.py", line 481, in update_metric
eval_metric.update(labels, texec.outputs)
File "/usr/local/lib/python2.7/dist-packages/mxnet/metric.py", line 318, in update
metric.update(labels, preds)
File "/Deep-Feature-Flow/dff_rfcn/core/metric.py", line 51, in update
pred_label = mx.ndarray.argmax_channel(pred).asnumpy().astype('int32')
File "/usr/local/lib/python2.7/dist-packages/mxnet/ndarray/ndarray.py", line 1980, in asnumpy
ctypes.c_size_t(data.size)))
File "/usr/local/lib/python2.7/dist-packages/mxnet/base.py", line 252, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [01:12:23] /work/mxnet/3rdparty/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:58: too large launch parameter: AddTakeGrad[130560,1], [64,1,1]
Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x4114ba) [0x7f89e3d464ba]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x411ad1) [0x7f89e3d46ad1]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x4b7eddb) [0x7f89e84b3ddb]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x4e12eec) [0x7f89e8747eec]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x4e132cf) [0x7f89e87482cf]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x4f92da4) [0x7f89e88c7da4]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2ceb179) [0x7f89e6620179]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2cf1e67) [0x7f89e6626e67]
[bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2cd01c4) [0x7f89e66051c4]
[bt] (9) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x2cd44b3) [0x7f89e66094b3]
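To make the numbers in the error message easier to read, here is a small sketch that pulls the kernel name, grid size, and block size out of the message and compares the grid against a per-dimension limit. The limit of 65535 is an assumption based on my recollection of the launch check in mshadow; I have not verified it against this MXNet build, and `parse_launch_params` is just a hypothetical helper for illustration.

```python
import re

# Assumed per-dimension grid limit enforced by mshadow's launch check
# (recalled from the mshadow source, not verified against this MXNet build).
ASSUMED_MAX_GRID = 65535


def parse_launch_params(message):
    """Extract (kernel, grid, block) from a 'too large launch parameter' message."""
    match = re.search(
        r"too large launch parameter: (\w+)\[([\d,]+)\], \[([\d,]+)\]", message
    )
    if match is None:
        raise ValueError("no launch parameters found in message")
    kernel = match.group(1)
    grid = tuple(int(v) for v in match.group(2).split(","))
    block = tuple(int(v) for v in match.group(3).split(","))
    return kernel, grid, block


message = "too large launch parameter: AddTakeGrad[130560,1], [64,1,1]"
kernel, grid, block = parse_launch_params(message)

# Grid dimensions that exceed the assumed limit.
oversized = [dim for dim in grid if dim > ASSUMED_MAX_GRID]
print(kernel, grid, block, "oversized grid dims:", oversized)
```

If that limit is right, the first grid dimension (130560) is the value being rejected, and the kernel involved is `AddTakeGrad` rather than anything in the metric code itself.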
I tried to locate the problem and found that it is raised in /Deep-Feature-Flow/dff_rfcn/core/metric.py, line 46:
pred_label = mx.ndarray.argmax_channel(pred).asnumpy().astype('int32')
If I remove asnumpy(), the error goes away, but the whole program runs much slower and a lot of other code has to be changed. Could you tell me why this error happens and whether there is another way to fix it?
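One possible reading, which I have not confirmed: MXNet queues GPU work asynchronously, and asnumpy() blocks until that work has finished, so a failure in an earlier kernel may only be reported at the asnumpy() call. The sketch below is a pure-Python analogy of that behaviour using a thread pool. It does not use MXNet at all, and `failing_kernel` and `read_result` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def failing_kernel():
    # Stands in for the asynchronous GPU work that actually fails.
    raise RuntimeError("too large launch parameter")


def read_result(future):
    # Stands in for asnumpy(): it blocks until the queued work has finished,
    # so a failure from that work is raised here.
    return future.result()


surfaced_at = None
error_text = None

with ThreadPoolExecutor(max_workers=1) as pool:
    # Submitting the work does not raise, even though the work will fail.
    future = pool.submit(failing_kernel)
    try:
        read_result(future)
    except RuntimeError as err:
        # The exception only becomes visible at the blocking read.
        surfaced_at = "read_result"
        error_text = str(err)

print(surfaced_at, error_text)
```

If the analogy holds, removing asnumpy() would not fix the underlying kernel launch; it would only remove the place where the failure is reported.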
Thanks for your help!