Filelist-log file to yaml and with checksum, to be used with esm_tests for the new filedicts.py#973

Draft

mandresm wants to merge 3 commits intosprint/filedicts/mainfrom

sprint/filedicts/file_log

Contributor

mandresm commented Jun 7, 2023 •

edited

Loading

Draft pull request for the new log_used_files method of the filedicts.

TODO

Decide whether we use kind or type property for the files to define whether they are input, output...
Standardize the variables used in the lo with the ones used at the _gather_file_movements (src, dest, kind vs source, target, type in the log)
Cleanup the test
Add the intermediate step


          fix errors in filedicts._gather_file_movements, add src and dest path…

93444af

…s to config (to be available for the log_used_files function later, and add a unit test for the future log_file_movements method

mandresm requested review from nwieters and pgierz

June 7, 2023 14:48

mandresm commented

View reviewed changes

src/esm_runscripts/filedicts.py Show resolved Hide resolved


          add log_used_files to filedicts

abcbb44

mandresm self-assigned this

pgierz reviewed

View reviewed changes

Member

pgierz left a comment

I had some ideas, see below. I'm not going to push yes or no on the review yet, I think there are still some points we should discuss.

src/esm_runscripts/filedicts.py

Comment on lines +1245 to +1248

+                          config[sim_file_obj["component"]]["files"][sim_file_id]["intermediate"] = None
+                          config[sim_file_obj["component"]]["files"][sim_file_id]["dest"] = (
+                              sim_file_obj.paths[dest]
+                          )

Member

pgierz Jun 7, 2023

I'm not 100% sure about this, it's a design thing...

what do you think of the following. At this point:

config[sim_file_obj["component"]]["files"][sim_file_id]

we have the SimulationFile. Should it know about the current phase of movement it is in, and potentially, if it needs an intermediate location?

I would say no, that job belongs somewhere else. Therefore that info also belongs somewhere else...here you are injecting extra info in the SimulationFile's dictionary, right? ...????

Maybe I am overthinking it.

src/esm_runscripts/filedicts.py Show resolved Hide resolved

src/esm_runscripts/filedicts.py Outdated

+                  if config["general"].get("verbose", False):
+                      logger.info("\n::: Logging used files")
+                  filetypes = config["general"]["relevant_filetypes"]

Member

pgierz Jun 7, 2023

To be discussed: If we use the Enum thing, we should use it everywhere

Contributor Author

mandresm Jun 21, 2023

relevant_filtypes is a list of files that is defined depending on the step tidy, compute, preprocess, ... (grep -r relevant_filtypes src/esm_runscripts). I suggest we don't touch it for consistency with previous versions. Also it seems natural to me that the relevant files for the given ESM-Tools phase are defined in the phase itself and not in filedicts.py.

src/esm_runscripts/filedicts.py

		return config


		def log_used_files(config: ConfigSetup) -> ConfigSetup:

Member

pgierz Jun 7, 2023

This looks great! The only thing I don't like about this function is that does two things: 1) Gathers all the model files and then 2) writes them to a location. I need number 1 in a separate place for my unknown files, so it would be nice to break this down into smaller pieces. I don't want to need to care (at this point at least) about the recipe order yet, or we will get into the situation that we rely on behaviour of one step of the recipe to even make the next one possible (and yes, I know we cannot avoid that entirely)


          add some suggestions from @pgierz

36ac4b8

github-actions bot commented Jun 21, 2024

This PR has been inactive for the last 365 days. It will now be marked as stale. Please close this PR if no longer needed.

github-actions bot added the Stale label

Member

pgierz commented Jun 24, 2024

Hey @mandresm: want to re-activate work on FileDicts after the summer break? (very very slowly, on the side as a hobby project?)

github-actions bot removed the Stale label

github-actions bot commented Jun 25, 2025

This PR has been inactive for the last 365 days. It will now be marked as stale. Please close this PR if no longer needed.

github-actions bot added the Stale label

mandresm added ESM-Tools+ WP File Handling labels

mandresm added this to the ESM-Tools+ 4 File Handling milestone

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ESM-Tools+ Stale WP File Handling