Skip to content

Filelist-log file to yaml and with checksum, to be used with esm_tests for the new filedicts.py#973

Draft
mandresm wants to merge 3 commits intosprint/filedicts/mainfrom
sprint/filedicts/file_log
Draft

Filelist-log file to yaml and with checksum, to be used with esm_tests for the new filedicts.py#973
mandresm wants to merge 3 commits intosprint/filedicts/mainfrom
sprint/filedicts/file_log

Conversation

@mandresm
Copy link
Contributor

@mandresm mandresm commented Jun 7, 2023

Draft pull request for the new log_used_files method of the filedicts.

TODO

  • Decide whether we use kind or type property for the files to define whether they are input, output...
  • Standardize the variables used in the lo with the ones used at the _gather_file_movements (src, dest, kind vs source, target, type in the log)
  • Cleanup the test
  • Add the intermediate step

…s to config (to be available for the log_used_files function later, and add a unit test for the future log_file_movements method
@mandresm mandresm requested review from nwieters and pgierz June 7, 2023 14:48
@mandresm mandresm self-assigned this Jun 7, 2023
Copy link
Member

@pgierz pgierz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some ideas, see below. I'm not going to push yes or no on the review yet, I think there are still some points we should discuss.

Comment on lines +1245 to +1248
config[sim_file_obj["component"]]["files"][sim_file_id]["intermediate"] = None
config[sim_file_obj["component"]]["files"][sim_file_id]["dest"] = (
sim_file_obj.paths[dest]
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure about this, it's a design thing...

what do you think of the following. At this point:

config[sim_file_obj["component"]]["files"][sim_file_id]

we have the SimulationFile. Should it know about the current phase of movement it is in, and potentially, if it needs an intermediate location?

I would say no, that job belongs somewhere else. Therefore that info also belongs somewhere else...here you are injecting extra info in the SimulationFile's dictionary, right? ...????

Maybe I am overthinking it.

if config["general"].get("verbose", False):
logger.info("\n::: Logging used files")

filetypes = config["general"]["relevant_filetypes"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be discussed: If we use the Enum thing, we should use it everywhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

relevant_filtypes is a list of files that is defined depending on the step tidy, compute, preprocess, ... (grep -r relevant_filtypes src/esm_runscripts). I suggest we don't touch it for consistency with previous versions. Also it seems natural to me that the relevant files for the given ESM-Tools phase are defined in the phase itself and not in filedicts.py.

return config


def log_used_files(config: ConfigSetup) -> ConfigSetup:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! The only thing I don't like about this function is that does two things: 1) Gathers all the model files and then 2) writes them to a location. I need number 1 in a separate place for my unknown files, so it would be nice to break this down into smaller pieces. I don't want to need to care (at this point at least) about the recipe order yet, or we will get into the situation that we rely on behaviour of one step of the recipe to even make the next one possible (and yes, I know we cannot avoid that entirely)

@github-actions
Copy link

This PR has been inactive for the last 365 days. It will now be marked as stale. Please close this PR if no longer needed.

@github-actions github-actions bot added the Stale label Jun 21, 2024
@pgierz
Copy link
Member

pgierz commented Jun 24, 2024

Hey @mandresm: want to re-activate work on FileDicts after the summer break? (very very slowly, on the side as a hobby project?)

@github-actions github-actions bot removed the Stale label Jun 24, 2024
@github-actions
Copy link

This PR has been inactive for the last 365 days. It will now be marked as stale. Please close this PR if no longer needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants