Reproduction steps:
Run Code 1 (below) with the torch backend on a CUDA GPU.
In another cell, run the same code but change the loss to 'mse'.

Expectation:
Keras runs the second cell without restarting Python.

Actual:
The second cell cannot be run.

System: Python 3.10.16, Keras 3.8, Torch 2.3.1, CUDA 12.4

I am building a GUI component that lets users build custom architectures, and while doing some random testing I found the following: the code below (Code 1) causes an assertion failure with the torch backend, whereas TensorFlow completes it gracefully. What is more troublesome is that the torch backend also fails to recover until Python is restarted, which is detrimental for interactive environments such as ipykernel and Python-based IDEs.

Code 1:
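The exact snippet is not preserved here, so the following is a minimal sketch reconstructed from the traceback in the error output below; x_values and y_values are placeholder data, the important part being that the targets fall outside [0, 1]:

```python
import numpy as np
from keras import Sequential
from keras.layers import Dense, Input

# Placeholder data (assumed): y_values deliberately contains values > 1,
# while binary cross-entropy expects targets in [0, 1].
x_values = np.array([[1.0], [2.0], [3.0], [4.0]], dtype="float32")
y_values = np.array([[1.0], [2.0], [3.0], [4.0]], dtype="float32")

model = Sequential()

model.add(Input(shape=(1,)))

model.add(Dense(1, activation='relu'))

model.compile(optimizer='sgd', loss='binary_crossentropy')

# On the torch backend with CUDA, this fit call triggers the
# device-side assert described in this issue.
model.fit(x_values, y_values, epochs=10, batch_size=1)
```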
An equivalent script implemented directly in Torch (Code 2) has a tendency to fail as well, but Torch is able to recover, so it can be re-run without restarting the whole Python process.

Code 2:
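The original Code 2 is likewise not preserved; a plain PyTorch training loop along these lines (assumed layer sizes and data, mirroring Code 1) is one way it could look:

```python
import torch
from torch import nn

device = "cuda"

# Same placeholder data as Code 1: targets outside [0, 1].
x_values = torch.tensor([[1.0], [2.0], [3.0], [4.0]], device=device)
y_values = torch.tensor([[1.0], [2.0], [3.0], [4.0]], device=device)

model = nn.Sequential(nn.Linear(1, 1), nn.ReLU()).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.BCELoss()

for epoch in range(10):
    optimizer.zero_grad()
    # BCELoss rejects values outside [0, 1], so this loop fails,
    # but the process can be re-run without a restart.
    loss = loss_fn(model(x_values), y_values)
    loss.backward()
    optimizer.step()
```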
Error output:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[3], line 15
     11 model = Sequential()
     13 model.add(Input(shape=(1,)))
---> 15 model.add(Dense(1, activation='relu'))
     17 model.compile(optimizer='sgd', loss='binary_crossentropy')
     19 model.fit(x_values, y_values, epochs=10, batch_size=1)

File ~/python3.10/site-packages/keras/src/models/sequential.py:122, in Sequential.add(self, layer, rebuild)
    120 self._layers.append(layer)
    121 if rebuild:
--> 122     self._maybe_rebuild()
    123 else:
    124     self.built = False

File ~/python3.10/site-packages/keras/src/models/sequential.py:141, in Sequential._maybe_rebuild(self)
    139 if isinstance(self._layers[0], InputLayer) and len(self._layers) > 1:
    140     input_shape = self._layers[0].batch_shape
--> 141     self.build(input_shape)
    142 elif hasattr(self._layers[0], "input_shape") and len(self._layers) > 1:
    143     # We can build the Sequential model if the first layer has the
    144     # `input_shape` property. This is most commonly found in Functional
    145     # model.
    146     input_shape = self._layers[0].input_shape

File ~/python3.10/site-packages/keras/src/layers/layer.py:228, in Layer.__new__.<locals>.build_wrapper(*args, **kwargs)
    226 with obj._open_name_scope():
    227     obj._path = current_path()
--> 228     original_build_method(*args, **kwargs)
    229 # Record build config.
    230 signature = inspect.signature(original_build_method)

File ~/python3.10/site-packages/keras/src/models/sequential.py:187, in Sequential.build(self, input_shape)
    185 for layer in self._layers[1:]:
    186     try:
--> 187         x = layer(x)
    188     except NotImplementedError:
    189         # Can happen if shape inference is not implemented.
    190         # TODO: consider reverting inbound nodes on layers processed.
    191         return

File ~/python3.10/site-packages/keras/src/utils/traceback_utils.py:122, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    119     filtered_tb = _process_traceback_frames(e.__traceback__)
    120     # To get the full stack trace, call:
    121     # `keras.config.disable_traceback_filtering()`
--> 122     raise e.with_traceback(filtered_tb) from None
    123 finally:
    124     del filtered_tb

File ~/python3.10/site-packages/torch/_dynamo/eval_frame.py:451, in _TorchDynamoContext.__call__.<locals>._fn(*args, **kwargs)
    449 prior = set_eval_frame(callback)
    450 try:
--> 451     return fn(*args, **kwargs)
    452 finally:
    453     set_eval_frame(prior)

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
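As the message itself suggests, re-running with CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the reported frame points at the call that actually triggered the assert. One way to set it (assuming a fresh process) is:

```python
import os

# CUDA reads this at context initialization, so set it before the first
# CUDA call, e.g. before importing torch in a new Python process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402
```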
The reason for the error is that Torch expects the targets to lie in the range [0, 1] (which makes sense, because binary cross-entropy is being used, so the targets should lie between 0 and 1). What is your use case exactly, i.e., why are you looking to set targets > 1?
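For instance, a variant of Code 1 with targets kept in {0, 1} and a sigmoid output (placeholder data, just to illustrate what the loss expects) should not trip the assert:

```python
import numpy as np
from keras import Sequential
from keras.layers import Dense, Input

# Targets restricted to {0, 1}, as binary cross-entropy expects.
x_values = np.array([[1.0], [2.0], [3.0], [4.0]], dtype="float32")
y_values = np.array([[0.0], [1.0], [1.0], [0.0]], dtype="float32")

model = Sequential()
model.add(Input(shape=(1,)))
model.add(Dense(1, activation="sigmoid"))  # output stays in (0, 1)

model.compile(optimizer="sgd", loss="binary_crossentropy")
model.fit(x_values, y_values, epochs=10, batch_size=1)
```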
jobs-git changed the title from "Torch backend fails to recover after assertion failed" to "Torch-cuda backend fails to recover after assertion failed" on Feb 21, 2025.