
[fx]add all_reduce test. #2784

Open · wants to merge 8 commits into main

Conversation

linuxlonelyeagle
Member

No description provided.

@qingyunqu
Collaborator

@stellaraccident @ramiro050 Hi, happy to introduce the first PR to support communication operators.
The CI failed because the nightly-build torch has a different signature from the stable-build torch. The stable build's signature is c10d_functional::all_reduce : (Tensor, str, str, int[], int) -> (Tensor), whereas the nightly build's is _c10d_functional::all_reduce : (Tensor, str, str) -> (Tensor).
In our practice, the stable build's signature is the correct one.

Do you have any idea to fix this?

Collaborator

@stellaraccident left a comment


That's unfortunate, but I expect it happens from time to time, especially with these bleeding-edge things that aren't covered by the PyTorch team's de facto compatibility requirements.

I haven't seen one of these in a long time. Do you remember what we usually do? We can probably exclude one of them based on version?

@ramiro050
Collaborator

I don't think I've ever seen this particular issue. We do have a place where we check the PyTorch version because of differences in ops supported:

if torch_version_for_comparison() >= version.parse("2.1.0.dev"):

but the issue here is that the ODS for the ops is hard-coded. One simple workaround would be to modify the ODS generator to output two versions of that op, adding a Stable or Nightly suffix to the op names. Once things converge upstream, we can get rid of the workaround.
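
For illustration, here is a minimal sketch of how such a version gate could look (hedged: emit_op_signature and the cutover version are hypothetical stand-ins, not the actual torch_ods_gen.py code; the local helper only mirrors torch_version_for_comparison() in spirit):

# Hedged sketch: choose which all_reduce signature to emit based on the
# installed torch version. Names and the cutover version are hypothetical.
import torch
from packaging import version

def parsed_torch_version():
    # Drop local build tags such as "+cpu" before comparing.
    return version.parse(torch.__version__.split("+")[0])

def emit_op_signature(sig: str):
    print(f"would emit ODS for: {sig}")  # stand-in for the real emitter

if parsed_torch_version() >= version.parse("2.3.0.dev"):  # hypothetical cutover
    emit_op_signature("_c10d_functional::all_reduce : (Tensor, str, str) -> (Tensor)")
else:
    emit_op_signature("c10d_functional::all_reduce : (Tensor, str, str, int[], int) -> (Tensor)")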

@qingyunqu
Collaborator

qingyunqu commented Jan 24, 2024

I don't think I've ever seen this particular issue. We do have a place where we check the PyTorch version because of differences in ops supported:

if torch_version_for_comparison() >= version.parse("2.1.0.dev"):

but the issue here is that the ODS for the ops is hard-coded. One simple workaround would be to modify the ODS generator to output two versions of that op, adding a Stable or Nightly suffix to the op names. Once things converge upstream, we can get rid of the workaround.

In my local test, using torch==2.3.0.dev20240109+cpu with an unmodified GeneratedTorchOps.td, it generates:

test_import_frozen_exported_program
-----------------------------------
module {
  func.func @main(%arg0: !torch.vtensor<[4],f32>) -> !torch.vtensor<[4],f32> {
    %str = torch.constant.str "sum"
    %str_0 = torch.constant.str ""
    %int0 = torch.constant.int 0
    %int1 = torch.constant.int 1
    %int2 = torch.constant.int 2
    %int3 = torch.constant.int 3
    %0 = torch.prim.ListConstruct %int0, %int1, %int2, %int3 : (!torch.int, !torch.int, !torch.int, !torch.int) -> !torch.list<int>
    %int4 = torch.constant.int 4
    %1 = torch.operator "torch.c10d_functional.all_reduce"(%arg0, %str, %str_0, %0, %int4) : (!torch.vtensor<[4],f32>, !torch.str, !torch.str, !torch.list<int>, !torch.int) -> !torch.vtensor<[4],f32>
    %2 = torch.operator "torch.c10d_functional.wait_tensor"(%1) : (!torch.vtensor<[4],f32>) -> !torch.vtensor<[4],f32>
    return %2 : !torch.vtensor<[4],f32>
  }
}

It seems that torch.export always generates the op with the signature c10d_functional::all_reduce : (Tensor, str, str, int[], int) -> (Tensor), even on nightly-build torch.

So how about manually adding another td file, CommunicationOps.td (not generated by torch_ods_gen.py), as a workaround? We can merge this td file into GeneratedTorchOps.td once things converge upstream.
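
For reference, a minimal repro sketch along the lines of the dump above (hedged: the exact test setup is not shown here, and eager execution of these collective ops normally requires an initialized process group; the module and arguments below just mirror the MLIR output):

# Hedged sketch: export a module calling functional all_reduce with the
# stable-build 5-argument signature (tensor, reduce op, tag, ranks, group size).
import torch

class AllReduceModule(torch.nn.Module):
    def forward(self, x):
        y = torch.ops.c10d_functional.all_reduce(x, "sum", "", [0, 1, 2, 3], 4)
        return torch.ops.c10d_functional.wait_tensor(y)

exported = torch.export.export(AllReduceModule(), (torch.randn(4),))
print(exported)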

@ramiro050
Collaborator

It seems that torch.export always generates the op with the signature c10d_functional::all_reduce : (Tensor, str, str, int[], int) -> (Tensor), even on nightly-build torch.

Interesting. Yeah, your proposed solution seems fine to me. No need to add an extra td file. I would just place it here next to this op:

// The corresponding without underscore variant for `torch.aten.bernoulli_.float`
// doesn't exist in the pytorch ops registry. Add it here.
def Torch_ValsemVariantAtenBernoulliFloatOp: Torch_Op<"valsem.aten.bernoulli.float", [

@qingyunqu
Collaborator

It seems that torch.export always generates the op with the signature c10d_functional::all_reduce : (Tensor, str, str, int[], int) -> (Tensor), even on nightly-build torch.

Interesting. Yeah, your proposed solution seems fine to me. No need to add an extra td file. I would just place it here next to this op:

// The corresponding without underscore variant for `torch.aten.bernoulli_.float`
// doesn't exist in the pytorch ops registry. Add it here.
def Torch_ValsemVariantAtenBernoulliFloatOp: Torch_Op<"valsem.aten.bernoulli.float", [

OK, I placed it at the end of TorchOps.td.


if __name__ == "__main__":
    world_size = 4
    mp.spawn(run, args=(world_size,), nprocs=world_size, join=True)
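
For context, a self-contained sketch of what this excerpt presumably drives; the run worker is not shown in the diff, so its body below is an assumption:

# Hedged sketch: a plausible `run` worker for the mp.spawn call above.
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def run(rank: int, world_size: int):
    # Assumed rendezvous; a real test would pick a free port.
    dist.init_process_group(
        "gloo", init_method="tcp://127.0.0.1:29500",
        rank=rank, world_size=world_size,
    )
    t = torch.ones(4)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)  # every rank now holds world_size
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(run, args=(4,), nprocs=4, join=True)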
Collaborator


What is the reason for the multi-process setup? This PR isn't really changing anything in the FX importer, so I don't think we need these tests. If all that is needed is the ODS, then it is fine to just have the ODS changes.

// Torch c10d Functional Communication Ops
//===----------------------------------------------------------------------===//
// These ops are manually added because the nightly-build torch signature has
// not converged. Generate them once torch has a stable op signature.
Collaborator


nit: Can you also mention something like `Autogenerated by ./build_tools/update_torch_ods.sh with torch==(some version)`, so that people don't modify these manually?
