Error with PersistentDataset in pytorch distributed setting #2079
jpcenteno80 started this conversation in General
Replies: 2 comments
-
Hi @yiheng-wang-nv , could you please help verify? Thanks.
-
Hi @jpcenteno80 , I think this PR can fix your issue: #2086. Thanks.
- I am using `.nrrd` files and was getting errors while using `CacheDataset` (probably related to the note under the `ITKReader` class about the reading process not being thread safe; I even updated ITK to version 5.2.0). So I switched to `PersistentDataset` and it worked great on a single GPU. However, when I follow the setup in the `dynunet_pipeline` tutorial using 1 node with 8 GPUs, I get the error `No such file or directory: 'persistent_data_cache/9fd3bad5a1225c76284263dc0bcbb196.temp_write_cache'`. This is the temp file generated while the final `.pt` file is being created. I notice that the GPUs are already processing while the persistent dataset is still being written to disk. So I was wondering how I could make sure the persistent dataset is fully created before letting the GPUs start their work, without diverging too far from the template in the `dynunet_pipeline` tutorial.
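
  For illustration, one way to serialize the cache creation is to let rank 0 warm the cache and make the other ranks wait at a barrier. This is only a sketch, assuming `torch.distributed` is already initialized by the launch script (as in the `dynunet_pipeline` tutorial); `train_files` and `train_transforms` are hypothetical placeholders for your own data list and transform chain:

  ```python
  import torch.distributed as dist
  from monai.data import PersistentDataset

  # Hypothetical: train_files / train_transforms come from your own pipeline setup.
  train_ds = PersistentDataset(
      data=train_files,
      transform=train_transforms,
      cache_dir="persistent_data_cache",
  )

  # Assumes dist.init_process_group() was already called by the launcher.
  if dist.get_rank() == 0:
      # Touching every item once forces PersistentDataset to write its .pt cache files.
      for i in range(len(train_ds)):
          _ = train_ds[i]
  dist.barrier()  # remaining ranks wait here until the cache is fully written
  ```

  After the barrier, all ranks read from the completed cache instead of racing against the `.temp_write_cache` files being renamed on disk.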