refine-tomogram crashes due to /tmp/ out of disk space #36

@luk8r

Hi,
I have been trying to use DDW. The preparation and training steps seem to work, but whenever I get to the refinement step, the job crashes after a few seconds of runtime. After some digging, I realized that there was not enough free space in /tmp on my cluster nodes, and that is why the job crashes. Unless I missed something, would it be possible to add a temp-path option to the config? For now I changed line 15 of normalization.py from
with tempfile.TemporaryDirectory() as subtomo_dir:
to
with tempfile.TemporaryDirectory(dir='.') as subtomo_dir:
for a quick workaround.
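
The workaround above could be generalized roughly as follows. This is only a sketch of what a configurable temp path might look like, not DDW's actual code: the helper name `subtomo_tmpdir` and the `tmp_root` parameter are hypothetical. Passing `dir=None` keeps the current behavior, where Python's `tempfile` picks the default location.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def subtomo_tmpdir(tmp_root: Optional[str] = None) -> Iterator[str]:
    """Yield a scratch directory that is removed again on exit.

    `tmp_root` is a hypothetical config value. If it is None, tempfile
    falls back to its default location (TMPDIR, TEMP, TMP, then
    platform defaults such as /tmp).
    """
    if tmp_root is not None:
        # Create the requested parent directory if it does not exist yet.
        os.makedirs(tmp_root, exist_ok=True)
    with tempfile.TemporaryDirectory(dir=tmp_root) as path:
        yield path


if __name__ == "__main__":
    # Same effect as the dir='.' workaround: scratch data goes to the
    # current working directory instead of /tmp.
    with subtomo_tmpdir(".") as subtomo_dir:
        print("writing sub-tomograms to", os.path.abspath(subtomo_dir))
```

Because `tempfile` consults the `TMPDIR` environment variable when choosing its default directory, setting `TMPDIR` to a directory with enough free space before launching the job should also redirect the temporary files without editing the installed package. The variable has to be set before the Python process starts, since the default is cached after first use.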

Standardizing tomogram 'warp_tiltseries/reconstruction/even/Position_1_2_10.00Apx.mrc' before extracting sub-tomograms.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /software/micromamba/envs/ddw_env/ │
│ lib/python3.10/site-packages/torch/serialization.py:629 in save              │
│                                                                              │
│    626 │                                                                     │
│    627 │   if _use_new_zipfile_serialization:                                │
│    628 │   │   with _open_zipfile_writer(f) as opened_zipfile:               │
│ ❱  629 │   │   │   _save(obj, opened_zipfile, pickle_module, pickle_protocol │
│    630 │   │   │   return                                                    │
│    631 │   else:                                                             │
│    632 │   │   with _open_file_like(f, 'wb') as opened_file:                 │
│                                                                              │
│ software/micromamba/envs/ddw_env/ │
│ lib/python3.10/site-packages/torch/serialization.py:863 in _save             │
│                                                                              │
│    860 │   │   │   storage = storage.cpu()                                   │
│    861 │   │   # Now that it is on the CPU we can directly copy it into the  │
│    862 │   │   num_bytes = storage.nbytes()                                  │
│ ❱  863 │   │   zip_file.write_record(name, storage.data_ptr(), num_bytes)    │
│    864                                                                       │
│    865                                                                       │
│    866 def load(                                                             │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: [enforce fail at inline_container.cc:764] . PytorchStreamWriter failed writing file data/0: file write failed

During handling of the above exception, another exception occurred:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /software/micromamba/envs/ddw_env/ │
│ lib/python3.10/site-packages/ddw/refine_tomogram.py:153 in refine_tomogram   │
│                                                                              │
│   150 │   with torch.no_grad():                                              │
│   151 │   │   for t0_file, t1_file in zip(tomo0_files, tomo1_files):         │
│   152 │   │   │   if recompute_normalization:                                │
│ ❱ 153 │   │   │   │   loc, scale = get_avg_model_input_mean_and_std(         │
│   154 │   │   │   │   │   tomo_file=t0_file,                                 │
│   155 │   │   │   │   │   subtomo_size=subtomo_size,                         │
│   156 │   │   │   │   │   subtomo_extraction_strides=3 * [subtomo_size - sub │
│                                                                              │
│ /software/micromamba/envs/ddw_env/ │
│ lib/python3.10/site-packages/ddw/utils/normalization.py:16 in                │
│ get_avg_model_input_mean_and_std                                             │
│                                                                              │
│   13 │   Computes the average mean and standard deviation of model-input-typ │
│   14 │   """                                                                 │
│   15 │   with tempfile.TemporaryDirectory() as subtomo_dir:                  │
│ ❱ 16 │   │   prepare_data(                                                   │
│   17 │   │   │   tomo0_files=[tomo_file],                                    │
│   18 │   │   │   tomo1_files=[tomo_file],                                    │
│   19 │   │   │   mask_files=[],                                              │
│                                                                              │
│ /software/micromamba/envs/ddw_env/ │
│ lib/python3.10/site-packages/ddw/prepare_data.py:231 in prepare_data         │
│                                                                              │
│   228 │   │   fitting_ids = [k for k in range(len(subtomos0)) if k not in va │
│   229 │   │                                                                  │
│   230 │   │   for idx in sorted(fitting_ids):                                │
│ ❱ 231 │   │   │   torch.save(                                                │
│   232 │   │   │   │   subtomos0[idx].clone(), f"{fitting_subtomo_dir}/subtom │
│   233 │   │   │   )                                                          │
│   234 │   │   │   torch.save(                                                │
│                                                                              │
│ /software/micromamba/envs/ddw_env/ │
│ lib/python3.10/site-packages/torch/serialization.py:628 in save              │
│                                                                              │
│    625 │   _check_save_filelike(f)                                           │
│    626 │                                                                     │
│    627 │   if _use_new_zipfile_serialization:                                │
│ ❱  628 │   │   with _open_zipfile_writer(f) as opened_zipfile:               │
│    629 │   │   │   _save(obj, opened_zipfile, pickle_module, pickle_protocol │
│    630 │   │   │   return                                                    │
│    631 │   else:                                                             │
│                                                                              │
│ /software/micromamba/envs/ddw_env/ │
│ lib/python3.10/site-packages/torch/serialization.py:476 in __exit__          │
│                                                                              │
│    473 │   │   │   super().__init__(torch._C.PyTorchFileWriter(self.name))   │
│    474 │                                                                     │
│    475 │   def __exit__(self, *args) -> None:                                │
│ ❱  476 │   │   self.file_like.write_end_of_file()                            │
│    477 │   │   if self.file_stream is not None:                              │
│    478 │   │   │   self.file_stream.close()                                  │
│    479                                                                       │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: [enforce fail at inline_container.cc:595] . unexpected pos 448 vs 342
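
The failed writes in the traceback above match what I found on the nodes, a full /tmp. For anyone who wants to check this on their own nodes, here is a small sketch that reports the free space in the directory Python's `tempfile` would use by default. The helper name `free_gib` is just for illustration.

```python
import shutil
import tempfile


def free_gib(path: str) -> float:
    """Return the free space of the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


# Directory that tempfile.TemporaryDirectory() uses when no dir is given.
tmp = tempfile.gettempdir()
print(f"{tmp}: {free_gib(tmp):.1f} GiB free")
```

Running this inside the job on a compute node, rather than on the login node, shows the value that matters for the refinement step.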
