os.makedirs race condition #443

@turmon

Describe the bug
A race condition when creating a cache directory causes an uncaught FileExistsError.

To Reproduce

Traceback (most recent call last):
  File "/gpfs/scratch/exo-yield/Sandbox/hwo/Local/sandbox_driver.py", line 380, in <module>
    message = main(args, xspecs)
              ^^^^^^^^^^^^^^^^^^
  File "/gpfs/scratch/exo-yield/Sandbox/hwo/Local/sandbox_driver.py", line 269, in main
    sim = EXOSIMS.MissionSim.MissionSim(args.scriptfile, **xspecs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/scratch/exo-yield/Sandbox/Python-venvs/exosims-202509-YUrun-mjt/EXOSIMS/EXOSIMS/MissionSim.py", line 167, in __init__
    self.SurveySimulation = get_module(
                            ^^^^^^^^^^^
  File "/gpfs/scratch/exo-yield/Sandbox/Python-venvs/exosims-202509-YUrun-mjt/EXOSIMS/EXOSIMS/SurveySimulation/coroOnlyScheduler.py", line 64, in __init__
    SurveySimulation.__init__(self, **specs)
  File "/gpfs/scratch/exo-yield/Sandbox/Python-venvs/exosims-202509-YUrun-mjt/EXOSIMS/EXOSIMS/Prototypes/SurveySimulation.py", line 262, in __init__
    self.cachedir = get_cache_dir(cachedir)
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/scratch/exo-yield/Sandbox/Python-venvs/exosims-202509-YUrun-mjt/EXOSIMS/EXOSIMS/util/get_dirs.py", line 203, in get_cache_dir
    cache_dir = get_exosims_dir("cache", cachedir)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/scratch/exo-yield/Sandbox/Python-venvs/exosims-202509-YUrun-mjt/EXOSIMS/EXOSIMS/util/get_dirs.py", line 127, in get_exosims_dir
    os.makedirs(indir)
  File "<frozen os>", line 225, in makedirs
FileExistsError: [Errno 17] File exists: 'sims/Yokohama_Ultimate.fam/Trials.fam/TimeMultiplyAdd-v4.exp/Caches/s_YU_VIS_D7.42_iwa029_C2.30e-10_nf1.01e-12_att0.216_sn0.440'

==== The relevant lines from get_exosims_dir (EXOSIMS/util/get_dirs.py) are:

   124          # if it doesn't exist, try creating it
   125          if not (os.path.isdir(indir)):
   126              try:
   127                  os.makedirs(indir)
   128              except PermissionError:
   129                  print("Cannot create directory: {indir}.")
   130  

This happens because the directory gets created between the moment os.path.isdir() runs and the moment os.makedirs() runs, a time-of-check to time-of-use (TOCTOU) race.
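
The window can be reproduced deterministically in a single process. The sketch below is a hypothetical stand-in for the quoted lines, not EXOSIMS code: racy_create mirrors the isdir()/makedirs() pattern, and its interloper argument (an assumption added for illustration) runs inside the window to play the part of whatever creates the directory first.

```python
import errno
import os
import tempfile


def racy_create(indir, interloper=None):
    """Hypothetical copy of the check-then-create pattern quoted above.

    `interloper` is called between the check and the create, standing in
    for whatever else creates the directory during that window.
    """
    if not os.path.isdir(indir):          # time of check
        if interloper is not None:
            interloper(indir)             # the race window
        try:
            os.makedirs(indir)            # time of use
        except PermissionError:
            print(f"Cannot create directory: {indir}.")


def reproduce():
    """Return the errno of the exception raised when the race is lost."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "Caches", "demo")
        try:
            racy_create(target, interloper=os.makedirs)
        except FileExistsError as err:
            return err.errno
    return None


if __name__ == "__main__":
    print(reproduce() == errno.EEXIST)  # True
```

Only PermissionError is caught, so the FileExistsError from os.makedirs() propagates, just as in the traceback.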

I'd suggest either:

  • Catching FileExistsError in addition to PermissionError.
  • Passing the optional exist_ok=True argument to os.makedirs().
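
Both suggestions can be sketched as replacements for the quoted block. The helper names ensure_dir and ensure_dir_catching are hypothetical, not part of EXOSIMS. The sketch also adds the f prefix that the quoted print() on line 129 is missing, so the path is actually interpolated into the message.

```python
import os
import tempfile


def ensure_dir(indir):
    """Suggestion 2 (hypothetical helper): let os.makedirs accept an existing directory.

    With exist_ok=True there is no separate check, so there is no race window.
    os.makedirs still raises FileExistsError if indir exists as a
    non-directory, which is worth surfacing rather than hiding.
    """
    try:
        os.makedirs(indir, exist_ok=True)
    except PermissionError:
        print(f"Cannot create directory: {indir}.")
    return os.path.isdir(indir)


def ensure_dir_catching(indir):
    """Suggestion 1 (hypothetical helper): keep the check, tolerate losing the race."""
    if not os.path.isdir(indir):
        try:
            os.makedirs(indir)
        except FileExistsError:
            pass  # created by someone else first; confirmed by the isdir() below
        except PermissionError:
            print(f"Cannot create directory: {indir}.")
    return os.path.isdir(indir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "Caches", "demo")
        os.makedirs(target)                 # another creator already won
        print(ensure_dir(target))           # True, no exception
        print(ensure_dir_catching(target))  # True
```

exist_ok=True (available since Python 3.2) is the simpler option because it removes the separate check entirely. Catching FileExistsError keeps the existing structure, but the final isdir() is needed so that a stray file at that path is not mistaken for a directory.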

Expected behavior
No exception.

JSON script:

EXOSIMS version:

  • v3.6.5

Additional context
On the JPL supercomputer, writing to a GPFS storage system. Only one process should be writing to or creating this directory (the exception happens while a single canary process is generating the cache files). Nevertheless, the error does occur: 3 of 768 runs raised this exception.

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
