Extmodules workflow misidentifies name/version in certain filesystem hierarchies #2

Open
ericblau opened this issue Jun 28, 2023 · 6 comments

@ericblau

The Extmodules workflow assumes that modulefiles are stored in a directory structure where, at the leaf directories, the filename is the version, and the directory in which the file resides is the package name.

This holds in many cases and is irrelevant in many others, because IPF overrides the name/version inferred from the filesystem whenever it can discover Name and Version key/value pairs inside the file. However, some files do not fit this layout. On Expanse, for example, the Bright Cluster Manager CUDA toolkit modulefile is at:
/cm/shared/modulefiles/cuda11.7/toolkit/11.7.1

so IPF identifies the "Name" as "toolkit".
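
To make the failure concrete, here is a minimal sketch of the leaf-directory heuristic described above. The function name `naive_name_version` is hypothetical and is not taken from the IPF source; it only illustrates the stated assumption (filename is the version, parent directory is the name).

```python
from pathlib import PurePosixPath


def naive_name_version(modulefile_path):
    """Hypothetical helper: treat the leaf filename as the version and
    the directory that contains it as the package name."""
    path = PurePosixPath(modulefile_path)
    return path.parent.name, path.name


name, version = naive_name_version(
    "/cm/shared/modulefiles/cuda11.7/toolkit/11.7.1"
)
print(name, version)  # toolkit 11.7.1
```

The heuristic returns "toolkit" because it only ever looks one directory up, so the "cuda11.7" component is lost.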

This could be ameliorated if comments could be added to the module file, including a Name key/value pair. But it would be good if IPF could figure this situation out itself.
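
As a rough illustration of the comment-based workaround, the sketch below scans a modulefile's text for Name and Version key/value pairs. The `Key: value` syntax, the regular expression, the helper name `read_overrides`, and the sample file contents are all assumptions made for illustration; the exact format IPF accepts is not shown in this thread.

```python
import re

# Assumed syntax: "Name: value" or "Version: value" somewhere on a line,
# typically inside a comment. The format IPF actually accepts may differ.
KEY_VALUE = re.compile(r"\b(Name|Version):[ \t]*(\S.*)$")


def read_overrides(modulefile_text):
    """Hypothetical helper: return the first Name/Version pair found."""
    overrides = {}
    for line in modulefile_text.splitlines():
        match = KEY_VALUE.search(line)
        if match:
            # Keep the first occurrence of each key.
            overrides.setdefault(match.group(1), match.group(2).strip())
    return overrides


# Hand-written sample, not a real vendor modulefile.
sample = """#%Module1.0
# Name: cuda
# Version: 11.7.1
setenv EXAMPLE_HOME /hypothetical/path
"""
print(read_overrides(sample))
```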

One possibility would be a rework of how the Extmodules flow traverses subdirectories of the directories in the MODULEPATH. This may require a better understanding of all the ways MODULEPATHs and module file hierarchies are set up in practice.
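
One way such a rework might look, sketched under assumptions: derive the name from the path relative to the MODULEPATH entry rather than from the immediate parent directory. The helper name, the choice of "/cm/shared/modulefiles" as the MODULEPATH entry, and the rule "everything before the leaf is the name" are illustrative guesses, not the behavior of IPF or a statement of how lmod resolves names.

```python
from pathlib import PurePosixPath


def name_version_from_modulepath(modulefile_path, modulepath_dirs):
    """Hypothetical helper: split a modulefile path into (name, version)
    relative to whichever MODULEPATH directory contains it."""
    path = PurePosixPath(modulefile_path)
    for root in modulepath_dirs:
        try:
            relative = path.relative_to(root)
        except ValueError:
            continue  # not under this MODULEPATH entry
        parts = list(relative.parts)
        # Lua modulefiles carry a ".lua" suffix that is not part of the version.
        if parts[-1].endswith(".lua"):
            parts[-1] = parts[-1][: -len(".lua")]
        if len(parts) < 2:
            return parts[0], None  # modulefile with no version level
        return "/".join(parts[:-1]), parts[-1]
    raise ValueError("path is not under any MODULEPATH directory")


print(
    name_version_from_modulepath(
        "/cm/shared/modulefiles/cuda11.7/toolkit/11.7.1",
        ["/cm/shared/modulefiles"],  # assumed MODULEPATH entry
    )
)
```

This keeps the "cuda11.7" component in the name, though it would still need to account for the other hierarchy layouts mentioned above.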

Another possibility is to try to integrate lmod's "spider" command into the workflow, though I don't think we would be able to rely solely on spider, as it doesn't understand various key/value pairs that XSEDE/ACCESS have defined as extensions for more specific, detailed information. But it might be possible to get a list of modules from spider and match them up to their module files for extra info extraction.

@tcooper

tcooper commented Jun 28, 2023

Some additional details:

  • Lmod supports the use of both Lua and TCL modulefiles.
  • Expanse application software stack is built with Spack and generates Lua modules for Lmod.
  • Some vendor provided modulefiles on Expanse are TCL modulefiles (for example, Bright Cluster Manager CUDA related).

This means IPF needs to be able to collect information about modulefiles in a way that supports either Lmod or environment-modules usage on the resource. It should also support both Lua and TCL modulefiles, and should not assume that either is used exclusively, regardless of which module environment is in use.
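
A minimal sketch of telling the two modulefile types apart, relying on two conventions: Lmod expects Lua modulefiles to carry a `.lua` extension, and TCL modulefiles begin with the `#%Module` magic cookie. The function name `modulefile_language` is hypothetical and not part of IPF.

```python
def modulefile_language(filename, first_line):
    """Hypothetical helper: classify a modulefile as 'lua', 'tcl', or 'unknown'."""
    if filename.endswith(".lua"):
        return "lua"
    if first_line.startswith("#%Module"):
        return "tcl"
    return "unknown"


print(modulefile_language("11.7.1", "#%Module1.0"))
print(modulefile_language("9.2.0.lua", "-- -*- lua -*-"))
```

Classifying per file, rather than per site, avoids assuming that a resource uses only one modulefile language.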

@ericblau
Author

ericblau commented Jan 3, 2024

I've developed and am currently testing what I think may be a solution to this problem.

Fundamentally, the extmodules workflow of IPF was failing to parse the filesystem hierarchy of TCL and/or Lua module files in the same way that lmod does, leading to incorrect names/versions for modules.

The obvious way around this is to use lmod itself as the canonical source for name/version, instead of trying to infer them from the filesystem hierarchy. However, the various lmod commands (module avail, spider, etc.) didn't return all the needed information, and what they did return was not in a particularly useful format.

Then I stumbled upon lmod's Spider System Cache. One can use "update_lmod_system_cache_files" to create a lua cache file that contains all the info that spider knows about the modules on the system.

Thus, my path forward was to add an "lmod_cache_file" parameter to the ExtendedModApplicationsStep of IPF. IPF uses the lupa Python package to execute the Lua code in the lmod_cache_file, then converts the spiderT table to a Python dict and walks the converted dict looking for fileT or metaModuleT structures.
For each module name in each fileT, IPF records the "fn" field as a module file to examine, along with the module name and the "Version" field. IPF then opens each module file to check for keyword overrides of any fields, and publishes the module.
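
The traversal described above can be sketched in plain Python, assuming the spiderT table has already been converted to nested dicts. The function name `iter_modulefiles` and the sample dict are hypothetical: the sample is hand-written to mimic the fileT / fn / Version fields named in this comment, is not taken from a real cache file, and the real layout may differ between lmod versions. The lupa conversion step and metaModuleT handling are omitted.

```python
def iter_modulefiles(node):
    """Hypothetical helper: yield (module_name, fn, version) for every
    entry of every fileT found anywhere under node."""
    if not isinstance(node, dict):
        return
    for key, value in node.items():
        if key == "fileT" and isinstance(value, dict):
            for module_name, entry in value.items():
                if isinstance(entry, dict):
                    yield module_name, entry.get("fn"), entry.get("Version")
        else:
            # Recurse into any other nested table.
            yield from iter_modulefiles(value)


# Hand-written stand-in for a converted spiderT table (assumed shape).
spider_t = {
    "/cm/shared/modulefiles": {
        "cuda11.7/toolkit": {
            "fileT": {
                "cuda11.7/toolkit/11.7.1": {
                    "fn": "/cm/shared/modulefiles/cuda11.7/toolkit/11.7.1",
                    "Version": "11.7.1",
                },
            },
        },
    },
}

for module_name, fn, version in iter_modulefiles(spider_t):
    print(module_name, fn, version)
```

Searching for the fileT key at any depth, rather than at a fixed depth, is a deliberate choice here so that the sketch does not depend on exactly how deeply the tables are nested.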

This all appears to work, and appears to solve the issue. Some care will have to be taken to ensure that the lmod_cache_file is created with the desired MODULEPATH, and is recreated with sufficient periodicity.
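
As one possible guard against an outdated cache, a workflow could check the cache file's modification time before trusting it. This is only a sketch: the function name `cache_is_stale` and the idea of a maximum age in seconds are assumptions, not part of IPF or lmod, and the check says nothing about which MODULEPATH the cache was built with.

```python
import os
import time


def cache_is_stale(cache_path, max_age_seconds, now=None):
    """Hypothetical helper: True if the cache file is missing or older
    than max_age_seconds."""
    now = time.time() if now is None else now
    try:
        modified = os.path.getmtime(cache_path)
    except OSError:
        return True  # a missing or unreadable cache counts as stale
    return (now - modified) > max_age_seconds
```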

If anyone has feedback, specifically on how I am interpreting the spiderT table (or on whether it is likely to differ substantively between versions of lmod), it is appreciated. So far, I have only run this on Expanse, so there may well be variations I haven't encountered yet.

ericblau self-assigned this Jan 3, 2024

@ericblau
Author

ericblau commented Jan 3, 2024

The code that I am currently testing is in the lmod_cache branch of this repo.

@tcooper

tcooper commented Jan 4, 2024

@ericblau Thanks for the update. I will share with our team.

@ericblau
Author

I believe that this issue is fixed, as of commit c31125e, and release 1.7.1. As a note, the lmod_cache code is not part of 1.7.1, because the default behavior (without using lmod_cache) now addresses the issue for both modules and lmod.

@tcooper

tcooper commented Jan 25, 2024

Thanks for the update @ericblau. I'll let our team know.

andylytical added a commit that referenced this issue Jan 7, 2025
This is a combination of 5 commits.
1st commit message:
===================

CTT-221 create an install-from-repo method

create prep script
new script ipf_configure_modules
config file for ipf_configure_modules removes need for cmdline options
all steps in QUICKSTART can be cut-n-paste
configure_extmodules looks for multiple conf files
wfm - WorkFlow Manager
wfm - use init.d files
wfm - update init-WORKFLOW template, assume running as non-root
new script to save_configs
prep will restore config links
add FAQ

This is the commit message #2:
==============================

CTT-303 Add version to IPF Quick Install

fixes #15
update FAQ

This is the commit message #3:
==============================

CTT-304 backwards compatible quick install

fixes #14

This is the commit message #4:
==============================

Update README for 3 install methods

Summary of changes to be committed:

modified:   README.md
  - Now a short overview, 3 install methods, and support info

new file:   docs/history.md
  - The old, long overview from README

deleted:    docs/Quickstart.md
  - broken into the following files...
new file:   docs/best-practices.md
new file:   docs/configure-workflows.md
new file:   docs/install-from-pip.md
  - Just the PIP parts
new file:   docs/testing.md

renamed:    QUICKSTART.md -> docs/install-from-github.md
renamed:    FAQ.md -> docs/install-from-github-FAQ.md

renamed:    docs/INSTALL.md -> docs/install-from-rpm.md
  - in all its original glory, no changes made

All of the following still have 'xsede' at the top
assuming no updates have been made for ACCESS
and these should probably go away, eventually
renamed:    docs/COMMUNITY_SOFTWARE_PROVIDER.md -> docs/xsede/COMMUNITY_SOFTWARE_PROVIDER.md
renamed:    docs/COMMUNITY_SOFTWARE_SP_SETUP.md -> docs/xsede/COMMUNITY_SOFTWARE_SP_SETUP.md
renamed:    docs/Configuring.Service.Files.OLD.md -> docs/xsede/Configuring.Service.Files.OLD.md
renamed:    docs/Configuring.Service.Files.md -> docs/xsede/Configuring.Service.Files.md
renamed:    docs/GENERIC_PUBLISHER.md -> docs/xsede/GENERIC_PUBLISHER.md
renamed:    docs/INSTALL.OLD -> docs/xsede/INSTALL.OLD