-
Notifications
You must be signed in to change notification settings - Fork 534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fx]add all_reduce test. #2784
base: main
Are you sure you want to change the base?
[fx]add all_reduce test. #2784
Conversation
@stellaraccident @ramiro050 Hi, happy to introduce the first PR to support communication operator. Do you have any idea to fix this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's unfortunate but I expect happens from time to time, especially on these bleeding edge things that aren't covered by the pytorch team's defacto compatibility requirements.
I haven't seen one of these in a long time. Do you remember what we usually do? Can probably exclude one of them based on version?
I don't think I've ever seen this particular issue. We do have a place where we check the PyTorch version because of differences in ops supported:
but the issue here is that the ODS for the ops is hard-coded. One simple workaround would be to modify the ODS generator to output two versions for that op and add a |
In my local test, when I use
It seems that So how about adding another td file |
Interesting. Yeah, your proposed solution seems fine to me. No need to add an extra td file. I would just place it here next to this op: torch-mlir/include/torch-mlir/Dialect/Torch/IR/TorchOps.td Lines 1085 to 1087 in ac8975e
|
Ok, I place it at the end of |
|
||
if __name__ == "__main__": | ||
world_size = 4 | ||
mp.spawn(run, args=(world_size,), nprocs=world_size, join=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for doing the multi-process? This PR is not really changing anything in the FX importer, so I don't think we need these tests. If all that is needed is ODS, then it is fine to just have the ODS changes.
// Torch c10d Functional Communication Ops | ||
//===----------------------------------------------------------------------===// | ||
// These ops is manully added because nightly-build torch's signature is | ||
// not convergent. Generate them if torch has stable op signature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can you also mention something like "Autogenerated by ./build_tools/update_torch_ods.sh
with torch==(some version)`", so that people don't manually modify these
0aa712f
to
d5a1aaa
Compare
No description provided.