Filelist-log file to yaml and with checksum, to be used with esm_tests for the new filedicts.py#973
mandresm wants to merge 3 commits into sprint/filedicts/main from
…s to config (to be available for the log_used_files function later, and add a unit test for the future log_file_movements method
pgierz
left a comment
I had some ideas, see below. I'm not going to push yes or no on the review yet, I think there are still some points we should discuss.
config[sim_file_obj["component"]]["files"][sim_file_id]["intermediate"] = None
config[sim_file_obj["component"]]["files"][sim_file_id]["dest"] = (
    sim_file_obj.paths[dest]
)
I'm not 100% sure about this, it's a design thing...
What do you think of the following? At this point, in config[sim_file_obj["component"]]["files"][sim_file_id], we have the SimulationFile. Should it know which phase of movement it is currently in and, potentially, whether it needs an intermediate location?
I would say no, that job belongs somewhere else, and therefore that info also belongs somewhere else. Here you are injecting extra info into the SimulationFile's dictionary, right?
Maybe I am overthinking it.
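To make the alternative concrete, here is a minimal sketch of keeping the movement information outside the SimulationFile. Everything in it is an assumption for discussion: FileMovement and record_movement are hypothetical names that do not exist in esm_runscripts, and the movements dictionary is only one of several places where this info could live.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FileMovement:
    """Hypothetical record of one file movement (not part of esm_runscripts)."""

    src: str
    dest: str
    # Only set when the movement needs an intermediate location.
    intermediate: Optional[str] = None


def record_movement(
    movements: Dict[str, Dict[str, FileMovement]],
    component: str,
    file_id: str,
    src: str,
    dest: str,
    intermediate: Optional[str] = None,
) -> Dict[str, Dict[str, FileMovement]]:
    """Store the movement next to, not inside, the file object's dictionary."""
    movements.setdefault(component, {})[file_id] = FileMovement(
        src=src, dest=dest, intermediate=intermediate
    )
    return movements
```

With something like this, the file object stays unaware of the movement phase, and whatever does the logging later reads from the separate movements structure.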
src/esm_runscripts/filedicts.py
Outdated
if config["general"].get("verbose", False):
    logger.info("\n::: Logging used files")

filetypes = config["general"]["relevant_filetypes"]
To be discussed: If we use the Enum thing, we should use it everywhere
relevant_filetypes is a list of file types that is defined depending on the step: tidy, compute, preprocess, ... (grep -r relevant_filetypes src/esm_runscripts). I suggest we don't touch it, for consistency with previous versions. It also seems natural to me that the relevant files for a given ESM-Tools phase are defined in the phase itself and not in filedicts.py.
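For the discussion, a rough sketch of how an Enum could coexist with phase-defined lists without changing them. The FileType class, its members, the RELEVANT_FILETYPES_BY_PHASE mapping and the relevant_filetypes helper are all hypothetical and only illustrative; the real lists live in the phase modules and may look different.

```python
from enum import Enum
from typing import List


class FileType(str, Enum):
    """Hypothetical enum of file types; the members are illustrative only."""

    INPUT = "input"
    FORCING = "forcing"
    RESTART = "restart"
    OUTDATA = "outdata"
    LOG = "log"


# Hypothetical per-phase lists, kept as plain strings the way a phase
# might define them today.
RELEVANT_FILETYPES_BY_PHASE = {
    "compute": ["input", "forcing", "restart"],
    "tidy": ["outdata", "restart", "log"],
}


def relevant_filetypes(phase: str) -> List[FileType]:
    """Convert a phase's plain-string list into FileType members.

    Raises ValueError if a phase lists a name that is not a FileType.
    """
    return [FileType(name) for name in RELEVANT_FILETYPES_BY_PHASE[phase]]
```

Because FileType mixes in str, its members compare equal to the plain strings, so existing checks against string lists would keep working while the conversion catches misspelled type names early.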
    return config


def log_used_files(config: ConfigSetup) -> ConfigSetup:
This looks great! The only thing I don't like about this function is that it does two things: 1) it gathers all the model files, and 2) it writes them to a location. I need number 1 in a separate place for my unknown files, so it would be nice to break this down into smaller pieces. I don't want to have to care about the recipe order yet (at this point at least), or we will end up relying on the behaviour of one recipe step to even make the next one possible (and yes, I know we cannot avoid that entirely).
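A minimal sketch of the split suggested above, under stated assumptions: gather_used_files and write_used_files are hypothetical names, the config is treated as a plain nested dictionary, each file entry is assumed to carry a "kind" key, and json.dumps stands in for the serializer so the sketch has no third-party dependency.

```python
import json
from typing import Any, Callable, Dict, Iterable


def gather_used_files(
    config: Dict[str, Any], filetypes: Iterable[str]
) -> Dict[str, Dict[str, Any]]:
    """Step 1: collect the files of the relevant kinds per component. No I/O."""
    wanted = set(filetypes)
    gathered: Dict[str, Dict[str, Any]] = {}
    for component, section in config.items():
        # Sections without a "files" entry (e.g. "general") are skipped.
        files = section.get("files") if isinstance(section, dict) else None
        if not files:
            continue
        # The "kind" key is an assumption; the real attribute may be named differently.
        selected = {
            file_id: dict(file_obj)
            for file_id, file_obj in files.items()
            if file_obj.get("kind") in wanted
        }
        if selected:
            gathered[component] = selected
    return gathered


def write_used_files(
    gathered: Dict[str, Dict[str, Any]],
    path: str,
    dump: Callable[[Any], str] = json.dumps,
) -> None:
    """Step 2: serialize the gathered files and write them to ``path``."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(dump(gathered))
```

The gathering step can then be reused on its own, for example for the unknown files, and a YAML dumper such as PyYAML's yaml.safe_dump could be passed as dump to get the YAML log.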
This PR has been inactive for the last 365 days. It will now be marked as stale. Please close this PR if no longer needed.

Hey @mandresm: want to re-activate work on FileDicts after the summer break? (very very slowly, on the side as a hobby project?)

This PR has been inactive for the last 365 days. It will now be marked as stale. Please close this PR if no longer needed.
Draft pull request for the new log_used_files method of the filedicts.
TODO
- kind or type property for the files to define whether they are input, output...
- _gather_file_movements (src, dest, kind vs source, target, type in the log)
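As a rough illustration of what one log entry with a checksum could look like, here is a sketch using only the standard library. The helper names file_checksum and movement_log_entry are hypothetical, the choice of SHA-256 and of checksumming the destination file are assumptions, and the keys use the src, dest, kind variant only because the naming is still open in the TODO.

```python
import hashlib
from typing import Dict


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def movement_log_entry(src: str, dest: str, kind: str) -> Dict[str, str]:
    """Build one log entry; the checksum is taken from the destination file (assumption)."""
    return {
        "src": src,
        "dest": dest,
        "kind": kind,
        "checksum": file_checksum(dest),
    }
```

An entry built this way is a plain dictionary, so it can be dumped to YAML as is and compared between runs in the tests.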