Possible update to GPU feature #535
I like it! And I wonder if we need to make it easier to parse these feature groups - e.g. recent changes to default_version have a similar kind of logic - check the value and act differently depending on the case, and we would soon have the same for mpi (and maybe others in the future). I can give this a shot at implementation, although I want to work on update first (probably this weekend) since I think the binoc runs are generating incorrect listings!
Haven't gotten to try this out yet - worked on the update functionality today! Not sure I made progress there, but this is next on my TODO to play around with.
okay this is next in my queue @marcodelapierre! I haven't forgotten!
no rush.
I'm not terribly flooded (yet, knock on wood!), but I like working on one new shpc feature at a time! So in Linux terms let's just say my brain works fairly serially, or in HPC terms I'm single-threaded within a single project. 🧵 😆
Ahah always a great metaphor, love it! 😄 [In other SHPC issues, hopefully I will get to comment on the environments/views, it is a very powerful concept, and I do have a scenario to share with you and the other contributors]
@marcodelapierre one quick question! So this approach:
Assumes that a container can only be built for one GPU type. E.g., tensorflow/tensorflow could be matched to nvidia, but not amd. Is that correct? And would we not run into issues with different tags being intended for different GPUs? This does feel like something that should still be general in the container recipe, so as not to hard-code a bias (e.g., true/false), but then on a particular install it should be up to the admin to decide the customizations. Our previous approach assumed a center is using one GPU type, and currently the admin would need a "one off" to install the same container name with a different GPU. Is that the action that is annoying / can be improved upon? Some more thinking:
So TLDR: I think we want to make this easy and support it, but we want to ensure that we don't hard code a preference into a container.yaml that might be different / change with tags, and I think we should find the right way to scope this (e.g., scoped in a view I think would make sense!)
Great points, sorry @vsoch, I have been swamped these days, trying to catch up! To be honest, I would consider it very unlikely that a single container image tag contains builds for multiple GPU vendors (happy to be proven wrong...). On the other hand, one scenario which I agree we definitely need to support is the one where different tags of the same image are built for different vendors. To this end, here is the issue on this feature: #536. So, bottom line, I agree with you that we need to improve this aspect of the functionality, starting from the case where multiple tags of the same image support distinct vendors. What do you think?
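Just to illustrate that scenario, a purely hypothetical sketch (not existing shpc syntax; the image name, tags and digests are made up):

```yaml
# hypothetical recipe: different tags of the same image built for different GPU vendors
docker: example/deep-learning
tags:
  2.9.1-cuda: "sha256:<digest>"   # CUDA build  -> would want --nv
  2.9.1-rocm: "sha256:<digest>"   # ROCm build  -> would want --rocm
# hypothetical per-tag vendor hint (not existing syntax):
features:
  gpu:
    2.9.1-cuda: nvidia
    2.9.1-rocm: amd
```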
Thinking more about environments in this context, and your point on AMD+Nvidia containers... why not?! I am not really adding much here, just paraphrasing your thoughts, which I can say I support!
Just to loop back here to the discussion - when you review #545, think of it in the context of some of these questions. E.g., if we can find a way to customize a specific module install (still maintaining symbolic links, or something else?) I think we could handle specifics like this.
See my comment on #545.
If we restrict the scope of the current issue to a single GPU vendor, then I would just suggest changing the functionality inside the container yaml from the current boolean value to a vendor-specific value, on the grounds that typically a container is only built for one vendor.
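A sketch of the suggested change (values for illustration only, exact naming to be agreed):

```yaml
# current form in container.yaml: the recipe only states that it wants a GPU
features:
  gpu: true
---
# possible vendor-specific form: the recipe states which vendor it was built for
features:
  gpu: nvidia   # or "amd"
```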
This is next in the queue after views! I did start working on it actually but paused with views in case it's a subset of that (which right now it looks like it will be in addition to them).
@marcodelapierre now that we have views could there be a way to allow this additional customization through them? |
Hi @vsoch, I think we could provide the functionality in two ways: either by specifying the GPU vendor in the container.yaml recipe (as sketched above), or by allowing the customization through views.
My personal preference is the first, as it still seems to be simple and flexible at the same time. However, we've also learnt that it is good to provide multiple ways to achieve the same setup, as different people/centres will have different preferences. SHPC views seem great in providing this additional flexibility in setups.
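For the views-based way, a purely hypothetical sketch (the per-view override field below does not exist today, and the surrounding view file layout is an assumption):

```yaml
# views/amd-gpu/view.yaml -- hypothetical per-view override of the GPU feature
view:
  name: amd-gpu
  modules: []
  # hypothetical field: modules installed into this view would get --rocm
  container_features:
    gpu: amd
```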
This thought came out of the issue on MPI #527, so thanks @georgiastuart for the inspiration!
Current interface of the GPU feature:
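A minimal sketch of the two pieces involved, assuming the boolean form in the recipe and the vendor selection in the global settings (as referenced in the comments above):

```yaml
# container.yaml (recipe): only states that the container wants a GPU
features:
  gpu: true
---
# settings.yml (global, per centre): states which GPU vendor to target
container_features:
  gpu: null   # or "amd" or "nvidia"
```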
I have realised that the current interface does not specify, for a given recipe, whether the corresponding package/container was built for Nvidia or AMD cards, which is known beforehand.
As a consequence, this is limiting in the (probably unlikely?) scenario where a centre has both Nvidia and AMD cards.
Updated interface/configuration, for consideration:
- add the --rocm flag if the global setting contains amd, ... ignore if the latter is null(?)
- add the --nv flag if the global setting contains nvidia, ... ignore if the latter is null(?)

Small implication: update the documents, and update the few preexisting recipes which have "gpu: true" (all Nvidia, apart from "tensorflow/tensorflow", for which it is to be checked).
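Putting the pieces together, a sketch of the intended behaviour (the vendor value in the recipe and the exact matching rule are assumptions for illustration):

```yaml
# settings.yml (global, per centre)
container_features:
  gpu: nvidia        # or "amd", or null to add no GPU flag at all
---
# container.yaml (per recipe, proposed): state the vendor the image was built for
features:
  gpu: nvidia        # or "amd", instead of true/false
---
# sketch of the resulting flag at module generation time:
#   recipe amd    and global setting contains amd    -> add --rocm
#   recipe nvidia and global setting contains nvidia -> add --nv
#   global setting null                              -> no GPU flag added
```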
What do you think @vsoch?