Extmodules workflow misidentifies name/version in certain filesystem hierarchies #2
Comments
Some additional details:
This means IPF needs to be able to collect information about modulefiles in a way that will support either |
I've developed and am currently testing what I think may be a solution to this problem.

Fundamentally, the extmodules workflow of IPF was failing to parse the filesystem hierarchy of Tcl and/or Lua module files in the same way that lmod does, leading to incorrect names/versions for modules. The obvious way around this is to use lmod itself as the canonical source for name/version, instead of trying to infer them from the filesystem hierarchy. However, the various lmod commands (module avail, spider, etc.) didn't return all the needed information, nor did they return what they did provide in a particularly useful format.

Then I stumbled upon lmod's Spider System Cache. One can use "update_lmod_system_cache_files" to create a Lua cache file that contains all the info that spider knows about the modules on the system. Thus, my path forward was to add a "lmod_cache_file" parameter to the ExtendedModApplicationsStep of IPF. IPF uses the lupa Python package to execute the Lua code from the lmod_cache_file, then converts the spiderT table to a Python dict. It then goes through the (converted) spiderT dict looking for fileT or metaModuleT structures.

This all appears to work, and appears to solve the issue. Some care will have to be taken to ensure that the lmod_cache_file is created with the desired MODULEPATH, and is recreated often enough to stay current.

If anyone has feedback, specifically with regard to how I am interpreting the spiderT table (or whether it is likely to be substantively different for different versions of lmod), it is appreciated. So far, I have only run on Expanse, so there may well be variations I haven't encountered yet. |
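For readers unfamiliar with the approach, here is a minimal sketch of the "walk the converted spiderT dict" step described above. It is not the code from the lmod_cache branch: the function name `collect_file_entries`, the sample table, and the `fn` field are assumptions made for illustration, the real spiderT layout may differ between lmod versions, and metaModuleT handling is left out.

```python
def collect_file_entries(node, found=None):
    """Recursively gather every entry stored under a "fileT" key.

    Assumes spiderT has already been converted to nested Python dicts.
    """
    if found is None:
        found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "fileT" and isinstance(value, dict):
                found.update(value)
            else:
                collect_file_entries(value, found)
    return found


# Hypothetical, simplified stand-in for a converted spiderT table.
sample_spider_t = {
    "/cm/shared/modulefiles": {
        "cuda11.7/toolkit": {
            "fileT": {
                "cuda11.7/toolkit/11.7.1": {
                    "fn": "/cm/shared/modulefiles/cuda11.7/toolkit/11.7.1",
                },
            },
        },
    },
}

for full_name, entry in collect_file_entries(sample_spider_t).items():
    # Split the full module name at the last "/" into name and version.
    name, _, version = full_name.rpartition("/")
    print(name, version, entry["fn"])
```

Keying on the full module name, rather than on directory position, is what lets the name come out as lmod reports it instead of as the leaf directory.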
Current code that I am testing is in the lmod_cache branch of this repo |
@ericblau Thanks for the update. I will share with our team. |
I believe that this issue is fixed, as of commit c31125e, and release 1.7.1. As a note, the lmod_cache code is not part of 1.7.1, because the default behavior (without using lmod_cache) now addresses the issue for both modules and lmod. |
Thanks for the update @ericblau. I'll let our team know. |
This is a combination of 5 commits. 1st commit message: =================== CTT-221 create an install-from-repo method create prep script new script ipf_configure_modules config file for ipf_configure_modules removes need for cmdline options all steps in QUICKSTART can be cut-n-paste configure_extmodules looks for multiple conf files wfm - WorkFlow Manager wfm - use init.d files wfm - update init-WORKFLOW template, assume running as non-root new script to save_configs prep will restore config links add FAQ This is the commit message #2: ============================== CTT-303 Add version to IPF Quick Install fixes #15 update FAQ This is the commit message #3: ============================== CTT-304 backwards compatible quick install fixes #14 This is the commit message #4: ============================== Update README for 3 install methods Summary of changes to be committed: modified: README.md - Now a short overview, 3 install methods, and support info new file: docs/history.md - The old, long overview from README deleted: docs/Quickstart.md - broken into the following files... 
new file: docs/best-practices.md new file: docs/configure-workflows.md new file: docs/install-from-pip.md - Just the PIP parts new file: docs/testing.md renamed: QUICKSTART.md -> docs/install-from-github.md renamed: FAQ.md -> docs/install-from-github-FAQ.md renamed: docs/INSTALL.md -> docs/install-from-rpm.md - in all it's original glory, no changes made All of the following still have 'xsede' at the top assuming no updates have been made for ACCESS and these should probably go away, eventually renamed: docs/COMMUNITY_SOFTWARE_PROVIDER.md -> docs/xsede/COMMUNITY_SOFTWARE_PROVIDER.md renamed: docs/COMMUNITY_SOFTWARE_SP_SETUP.md -> docs/xsede/COMMUNITY_SOFTWARE_SP_SETUP.md renamed: docs/Configuring.Service.Files.OLD.md -> docs/xsede/Configuring.Service.Files.OLD.md renamed: docs/Configuring.Service.Files.md -> docs/xsede/Configuring.Service.Files.md renamed: docs/GENERIC_PUBLISHER.md -> docs/xsede/GENERIC_PUBLISHER.md renamed: docs/INSTALL.OLD -> docs/xsede/INSTALL.OLD
The Extmodules workflow assumes that modulefiles are stored in a directory structure where, at the leaf directories, the filename is the version, and the directory in which the file resides is the package name.
This is true in many cases, and irrelevant in many other cases, as IPF will override name/version from the filesystem if it can discover Name and Version key/value pairs inside the files. However, there are some files, at least on Expanse, such as the Bright Cluster Manager CUDA Toolkit, where the file is at:
/cm/shared/modulefiles/cuda11.7/toolkit/11.7.1
so IPF identifies the "Name" as "toolkit".
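A minimal sketch of the leaf-directory rule described above, to make the failure concrete. This is not IPF's actual code, and the helper name `infer_name_version` is hypothetical: it takes the filename as the version and the containing directory as the name.

```python
from pathlib import PurePosixPath


def infer_name_version(modulefile_path):
    """Hypothetical helper: parent directory -> name, filename -> version."""
    path = PurePosixPath(modulefile_path)
    return path.parent.name, path.name


name, version = infer_name_version("/cm/shared/modulefiles/cuda11.7/toolkit/11.7.1")
print(name, version)  # toolkit 11.7.1
```

The `cuda11.7` component, which carries the package identity here, is never consulted by this rule.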
This could be ameliorated if comments could be added to the module file, including a Name key/value pair. But it would be good if IPF could figure this situation out itself.
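A rough sketch of how such an override could work. The exact key/value syntax IPF parses is not given in this issue, so the `# Name: value` comment form, the helper name `apply_overrides`, and the replacement name `cuda` are all assumptions for illustration.

```python
import re

# Assumed syntax: a comment line such as "# Name: cuda" (Tcl) or "-- Name: cuda" (Lua).
# The real format IPF recognizes may differ.
KEY_VALUE = re.compile(r"^\s*(?:#|--)+\s*(Name|Version)\s*:\s*(.+?)\s*$")


def apply_overrides(text, name, version):
    """Return (name, version), replaced by any Name/Version pairs found in text."""
    values = {"Name": name, "Version": version}
    for line in text.splitlines():
        match = KEY_VALUE.match(line)
        if match:
            values[match.group(1)] = match.group(2)
    return values["Name"], values["Version"]


modulefile_text = '#%Module1.0\n# Name: cuda\nmodule-whatis "CUDA toolkit"\n'
print(apply_overrides(modulefile_text, "toolkit", "11.7.1"))  # ('cuda', '11.7.1')
```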
One possibility would be a rework of how the Extmodules flow traverses subdirectories of the directories in the MODULEPATH. This may require a better understanding of all the ways MODULEPATHs and module file hierarchies are set up in practice.
Another possibility is to try to integrate lmod's "spider" command into the workflow, though I don't think we would be able to rely solely on spider, as it doesn't understand various key/value pairs that XSEDE/ACCESS have defined as extensions for more specific, detailed information. But it might be possible to get a list of modules from spider and match them up to their module files for extra info extraction.
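The matching step in that last idea might look roughly like the sketch below. It assumes the list of full module names has already been obtained from spider by some means (not shown), and the helper name `locate_modulefile` is hypothetical. Each name is resolved against the MODULEPATH directories so the file can then be read for the extra key/value pairs.

```python
import os
import tempfile


def locate_modulefile(full_name, modulepath_dirs):
    """Return the first existing file for full_name under the MODULEPATH dirs.

    Lua modulefiles carry a ".lua" suffix on disk, so both spellings are tried.
    """
    for root in modulepath_dirs:
        for candidate in (full_name, full_name + ".lua"):
            path = os.path.join(root, candidate)
            if os.path.isfile(path):
                return path
    return None


# Demo against a throwaway directory tree instead of a real MODULEPATH.
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "cuda11.7", "toolkit", "11.7.1")
    os.makedirs(os.path.dirname(target))
    with open(target, "w") as handle:
        handle.write("#%Module1.0\n")
    found = locate_modulefile("cuda11.7/toolkit/11.7.1", [root])
    print(found == target)  # True
```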