Memory error with T-DPSOM #3

Open
FrancescoDelBuono opened this issue Feb 18, 2022 · 0 comments

Hi,
I'm trying to use the T-DPSOM code with a different multivariate time series dataset, in which each series has 144 time steps and 9 channels.

I have modified the methods "inputs" and "x" in "TempDPSOM_model.py" in the following way:

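@lazy_scope
def inputs(self):
    # Placeholder for a batch of series: [batch, time steps, channels].
    x = tf.placeholder(tf.float32, shape=[None, self.input_size, self.input_channels], name="x")
    return x

@lazy_scope
def x(self):
    # Flatten batch and time into one axis: [batch * time steps, channels].
    x = tf.reshape(self.inputs, [-1, self.input_channels])
    return x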

I also changed the model initialization to pass the correct values for "input_size" and "input_channels".
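
To make the effect of the reshape in "x" concrete, here is a minimal pure-Python sketch of the shape arithmetic. The helper name `flattened_shape` is hypothetical, and the batch size of 300 is an assumption that is not stated anywhere in this post; it is chosen only because 300 × 144 = 43200 matches the leading dimension of the tensor in the error reported further down.

```python
def flattened_shape(batch_size, input_size, input_channels):
    """Shape after reshaping [batch, input_size, input_channels] to [-1, input_channels]."""
    rows = batch_size * input_size  # batch and time axes are merged into one
    return (rows, input_channels)


# 144 time steps and 9 channels come from the dataset description above;
# batch_size=300 is an assumed value, not taken from the original code.
shape = flattened_shape(batch_size=300, input_size=144, input_channels=9)
print(shape)
```

Under that assumption, every layer applied to "x" sees 43200 rows per training step rather than 300, so intermediate activations and their gradients grow with the series length.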

However, the code raises the following error before training starts:

2022-02-18 13:55:36.786045: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 11 Chunks of size 86400000 totalling 906.37MiB
2022-02-18 13:55:36.786143: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 156301824 totalling 149.06MiB
2022-02-18 13:55:36.786242: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 3 Chunks of size 345600000 totalling 988.77MiB
2022-02-18 13:55:36.786339: I tensorflow/core/common_runtime/bfc_allocator.cc:1078] Sum Total of in-use chunks: 2.04GiB
2022-02-18 13:55:36.786447: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] total_region_allocated_bytes_: 2258003456 memory_limit_: 2258003559 available bytes: 103 curr_region_allocation_bytes_: 4516007424
2022-02-18 13:55:36.786613: I tensorflow/core/common_runtime/bfc_allocator.cc:1086] Stats: 
Limit:                      2258003559
InUse:                      2188879360
MaxInUse:                   2188879616
NumAllocs:                         181
MaxAllocSize:                345600000
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2022-02-18 13:55:36.799434: W tensorflow/core/common_runtime/bfc_allocator.cc:474] ******************__*****************************************************************************xxx
2022-02-18 13:55:36.799597: W tensorflow/core/framework/op_kernel.cc:1733] RESOURCE_EXHAUSTED: failed to allocate memory
ERROR - hyperopt - Failed after 0:00:38!
Traceback (most recent calls WITHOUT Sacred internals):
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1377, in _do_call
    return fn(*args)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1360, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1453, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[43200,2000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node optimize/gradients_2/zeros_10}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

	 [[optimize/Adam_2/update/_92]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[43200,2000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node optimize/gradients_2/zeros_10}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent calls WITHOUT Sacred internals):
  File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 660, in main
    results = train_model(model, data_train, data_val, endpoints_total_val, lr_val, prior_val)
  File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 254, in train_model
    train_step_ae.run(feed_dict=f_dic)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 2755, in run
    _run_using_default_session(self, feed_dict, self.graph, session)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 5804, in _run_using_default_session
    session.run(operation, feed_dict)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 967, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1190, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\client\session.py", line 1396, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Graph execution error:

Detected at node 'optimize/gradients_2/zeros_10' defined at (most recent call last):
    File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 638, in <module>
      def main(input_size, latent_dim, som_dim, learning_rate, decay_factor, alpha, beta, gamma, theta, ex_name, kappa, prior,
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 190, in automain
      self.run_commandline()
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 312, in run_commandline
      return self.run(
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 276, in run
      run()
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\run.py", line 238, in __call__
      self.result = self.main_function(*args)
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\config\captured_function.py", line 42, in captured_function
      result = wrapped(*args, **kwargs)
    File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 652, in main
      model = TDPSOM(input_size=input_size, latent_dim=latent_dim, som_dim=som_dim, learning_rate=lr_val,
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 114, in __init__
      self.optimize
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 39, in decorator
      setattr(self, attribute, function(self))
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 463, in optimize
      train_step_ae = optimizer.minimize(self.loss_reconstruction_ze, global_step=self.global_step)
Node: 'optimize/gradients_2/zeros_10'
Detected at node 'optimize/gradients_2/zeros_10' defined at (most recent call last):
    File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 638, in <module>
      def main(input_size, latent_dim, som_dim, learning_rate, decay_factor, alpha, beta, gamma, theta, ex_name, kappa, prior,
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 190, in automain
      self.run_commandline()
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 312, in run_commandline
      return self.run(
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 276, in run
      run()
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\run.py", line 238, in __call__
      self.result = self.main_function(*args)
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\config\captured_function.py", line 42, in captured_function
      result = wrapped(*args, **kwargs)
    File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 652, in main
      model = TDPSOM(input_size=input_size, latent_dim=latent_dim, som_dim=som_dim, learning_rate=lr_val,
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 114, in __init__
      self.optimize
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 39, in decorator
      setattr(self, attribute, function(self))
    File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 463, in optimize
      train_step_ae = optimizer.minimize(self.loss_reconstruction_ze, global_step=self.global_step)
Node: 'optimize/gradients_2/zeros_10'
2 root error(s) found.
  (0) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[43200,2000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node optimize/gradients_2/zeros_10}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

	 [[optimize/Adam_2/update/_92]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[43200,2000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node optimize/gradients_2/zeros_10}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

Original stack trace for 'optimize/gradients_2/zeros_10':
  File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 638, in <module>
    def main(input_size, latent_dim, som_dim, learning_rate, decay_factor, alpha, beta, gamma, theta, ex_name, kappa, prior,
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 190, in automain
    self.run_commandline()
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 312, in run_commandline
    return self.run(
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\experiment.py", line 276, in run
    run()
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\run.py", line 238, in __call__
    self.result = self.main_function(*args)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\sacred\config\captured_function.py", line 42, in captured_function
    result = wrapped(*args, **kwargs)
  File "C:/Users/delbu/Projects/PythonProjects/DPSOM/dpsom/TempDPSOM.py", line 652, in main
    model = TDPSOM(input_size=input_size, latent_dim=latent_dim, som_dim=som_dim, learning_rate=lr_val,
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 114, in __init__
    self.optimize
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 39, in decorator
    setattr(self, attribute, function(self))
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\dpsom\TempDPSOM_model.py", line 463, in optimize
    train_step_ae = optimizer.minimize(self.loss_reconstruction_ze, global_step=self.global_step)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\training\optimizer.py", line 477, in minimize
    grads_and_vars = self.compute_gradients(
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\training\optimizer.py", line 603, in compute_gradients
    grads = gradients.gradients(
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 165, in gradients
    return gradients_util._GradientsHelper(
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 671, in _GradientsHelper
    out_grads[i] = control_flow_state.ZerosLike(op, i)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\control_flow_state.py", line 835, in ZerosLike
    return _ZerosLikeV1(op, index)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\control_flow_state.py", line 801, in _ZerosLikeV1
    return array_ops.zeros(zeros_shape, dtype=val.dtype)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\util\traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\util\dispatch.py", line 1082, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\array_ops.py", line 2927, in wrapped
    tensor = fun(*args, **kwargs)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\array_ops.py", line 2988, in zeros
    output = fill(shape, constant(zero, dtype=dtype), name=name)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\util\traceback_utils.py", line 150, in error_handler
    return fn(*args, **kwargs)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\util\dispatch.py", line 1082, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\array_ops.py", line 238, in fill
    result = gen_array_ops.fill(dims, value, name=name)
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 3508, in fill
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 740, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 3776, in _create_op_internal
    ret = Operation(
  File "C:\Users\delbu\Projects\PythonProjects\DPSOM\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 2175, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)


  0%|          | 0/250 [00:31<?, ?it/s]
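
As a rough sanity check on the numbers in the log above, the sketch below works out the size of the tensor that failed to allocate and compares it with what the allocator had left. The shape [43200, 2000] and the Limit and InUse figures are copied from the log. The helper name `tensor_bytes`, the reading of "type float" as 4-byte float32, and the interpretation of 43200 as batch size × 144 time steps are assumptions made for illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumption: "type float" in the log means 32-bit floats


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the memory needed for a dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Figures copied from the log above.
oom_shape = (43200, 2000)
limit_bytes = 2258003559
in_use_bytes = 2188879360

requested = tensor_bytes(oom_shape)
free_bytes = limit_bytes - in_use_bytes

print(f"requested: {requested} bytes ({requested / 2**20:.2f} MiB)")
print(f"free:      {free_bytes} bytes ({free_bytes / 2**20:.2f} MiB)")
print("fits:", requested <= free_bytes)

# Hypothetical batch sizes, assuming rows = batch_size * 144 time steps.
for batch_size in (300, 100, 50):
    size = tensor_bytes((batch_size * 144, 2000))
    print(f"batch {batch_size}: {size / 2**20:.2f} MiB")
```

The requested size comes out to 345,600,000 bytes, which matches MaxAllocSize and the 345600000-byte chunks listed by the allocator, while only about 69 MB of the roughly 2.1 GiB limit was still free. If the leading dimension really is batch size × 144, a smaller batch would shrink this tensor proportionally, although the other activations would have to fit as well.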

How can I solve it?
