Job Not Found Error #93

@davramov

I am encountering an issue where submit_job() raises a "Job not found" error even though the job ID is generated and the job is successfully scheduled and run on Perlmutter. This complicates an automated workflow I am developing, in which each task must wait for the results of the previous one before the next can start.

From my script job_controller.py:

try:
  logger.info("Submitting reconstruction job script to Perlmutter.")
  job = self.client.perlmutter.submit_job(job_script)
except Exception as e:
  logger.error(f"Failed to submit or complete reconstruction job: {e}")

Error log from the exception:

13:11:27.498 | INFO    | orchestration.flows.bl832.job_controller - Submitting reconstruction job script to Perlmutter.
13:12:03.894 | ERROR   | orchestration.flows.bl832.job_controller - Failed to submit or complete reconstruction job: Job not found: 33821565

It seems like this could arise from one of the SfApiErrors raised by the submit_job() function defined in sfapi_client/_sync/compute.py.
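As a possible stopgap while the root cause is unclear, here is a minimal sketch of how a workflow could keep tracking the job instead of resubmitting it. It rests on two assumptions that are not confirmed anywhere above: that the error is transient (the job exists but is not yet visible when it is first queried), and that the error message keeps the "Job not found: <id>" format shown in the log. The names extract_job_id, lookup_with_retry, and the lookup callable are hypothetical helpers for illustration, not part of sfapi_client.

```python
import re
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Matches the message format seen in the log above: "Job not found: 33821565".
_JOB_NOT_FOUND = re.compile(r"Job not found:\s*(\d+)")


def extract_job_id(error: Exception) -> Optional[str]:
    """Return the job ID embedded in a 'Job not found' error, or None."""
    match = _JOB_NOT_FOUND.search(str(error))
    return match.group(1) if match else None


def lookup_with_retry(
    lookup: Callable[[str], T],
    job_id: str,
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call lookup(job_id), retrying while it raises 'Job not found' for that ID.

    `lookup` is a hypothetical callable supplied by the caller; it should
    wrap whatever job-lookup call the client library provides. Any other
    exception propagates immediately. The wait doubles after each failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return lookup(job_id)
        except Exception as exc:
            # Re-raise unrelated errors, and give up after the last attempt.
            if extract_job_id(exc) != job_id or attempt == attempts - 1:
                raise
            sleep(delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

In the except block of job_controller.py, extract_job_id(e) would recover the ID of the job that was already submitted, and lookup_with_retry could then be pointed at a job-lookup call. Retrying submit_job() itself is deliberately avoided here, since the log suggests the job was already accepted and a second submission would presumably create a duplicate.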
