Sagemaker training smdebug/core/state_store.py FileNotFoundError #2370
Replies: 2 comments
-
Hello, I encountered a similar issue using the TensorFlow estimator. I used the pre-built container image with framework version = 2.2 and py_version = py37. Training completed 3 epochs and then threw the error in the 4th epoch.
-
Hi
I1121 13:42:35.783539 140148447449344 basic_session_run_hooks.py:260] loss = 0.65470797 (0.405 sec)
-
Hello,
I am trying to train a model using the MXNet estimator. As soon as training starts, I see the following error when SageMaker tries to upload checkpoints:
A few details about the job: