H100 support issue #56
Comments
I just merged this PR that updates the dependencies for running on H100s. You may want to use the
depending on your training config. Let us know if you run into issues!
Hi, thanks for the quick reply. I used the new Docker image and installed ninja and xformers via pip, as you suggested. Have you tested it on H100? Following the old README, I got it working on a DGX A100, so I think the code and dataset are fine on my side. The issue above only appeared after switching to a DGX H100.
Update: this seems to be an issue on the xformers side. There were some problems with their H100 support. Working on it.
Solution:
When can we expect H100 support? I have tried building the environment based on CUDA 11.8 and 12.0. There seem to be some issues related to package discrepancies. Any suggestions for now?
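For anyone comparing environments while this gets sorted out, a quick sanity check along these lines may help narrow down a package mismatch. It is only a sketch: the helper names (`parse_version`, `cuda_supports_hopper`, `describe_environment`) are made up for illustration and are not part of this repo. It assumes that H100 reports compute capability 9.0 and that CUDA 11.8 is the first toolkit release that targets it.

```python
import re

HOPPER_CAPABILITY = (9, 0)     # H100 reports compute capability 9.0 (sm_90)
MIN_CUDA_FOR_HOPPER = (11, 8)  # assumed first CUDA toolkit release with sm_90 support


def parse_version(text):
    """Return (major, minor) from strings such as '11.8' or '12.0.1'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def cuda_supports_hopper(cuda_version):
    """True if the given CUDA toolkit version is new enough to target H100."""
    return parse_version(cuda_version) >= MIN_CUDA_FOR_HOPPER


def describe_environment():
    """Summarise the CUDA build of the installed PyTorch, if there is one."""
    try:
        import torch  # optional: only needed for the live check
    except ImportError:
        return "PyTorch is not installed"
    if torch.version.cuda is None or not torch.cuda.is_available():
        return "PyTorch has no usable CUDA device"
    capability = torch.cuda.get_device_capability(0)
    return (
        f"torch CUDA {torch.version.cuda}, device capability {capability}, "
        f"toolkit ok for H100: {cuda_supports_hopper(torch.version.cuda)}, "
        f"is Hopper: {capability == HOPPER_CAPABILITY}"
    )


if __name__ == "__main__":
    print(describe_environment())
```

This only checks the PyTorch side. It says nothing about whether the installed xformers build was compiled with H100 support.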