[torch] Add Canonicalize Pattern for embedding op #3277
Conversation
I can see the benefit of this optimization; however, it works around the compilation issue we have been encountering rather than preventing the crash.
Note this could also be a pessimization: if you have your embeddings as f32, gather them, and then convert to f16, you really want the conversion to fold into the embeddings so you aren't shipping (and paying the memory transactions for) f32 when you don't need those bits. This may get taken care of later in the pipeline, but it's important to note that patterns like this have massive implications: it's almost always better to hoist narrowing operations and sink widening operations, almost never the opposite.
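A back-of-envelope sketch of the narrowing case described above (f32 table, f16 result) makes the memory-traffic argument concrete. The sizes below are illustrative, not taken from the PR:

```python
# Illustrative byte counts for the narrowing case: f32 embedding table,
# f16 desired output. Sizes are hypothetical.
vocab, dim, batch = 50_000, 4096, 32

# Hoisted narrowing: convert the table to f16 once up front, then every
# lookup gathers 2-byte rows.
per_lookup_hoisted = batch * dim * 2   # bytes gathered per lookup step

# Sunk narrowing: every lookup gathers 4-byte f32 rows, then converts.
per_lookup_sunk = batch * dim * 4      # bytes gathered per lookup step

# Sinking the narrowing conversion doubles the gathered traffic on every step.
assert per_lookup_sunk == 2 * per_lookup_hoisted
```

This is why hoisting a narrowing conversion above the gather is generally the win: the one-time cost of converting the table is amortized over every subsequent lookup.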
To avoid hurting performance we should only perform the swap in the widening case; otherwise we are potentially loading more data just to truncate it back down, whereas truncating first is a net benefit.
There's a tradeoff between memory and compute: swapping may use more memory but is less compute-intensive, whereas the suggested ordering might be more compute-intensive if the backend cannot fuse both kernels. I will add a check to perform the swap only in the widening case.
I've made the necessary changes. Please review.
Converts PrimConvertOp followed by Embedding -> Embedding followed by PrimConvertOp. We don't need to cast the entire embedding matrix; casting just the output of the embedding op is sufficient.
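The rewrite, restricted to the widening case as agreed above, can be sketched at the array level. This is a hypothetical numpy illustration, not the actual torch-mlir pattern; it checks that gather-then-convert produces the same values as convert-then-gather while touching far fewer elements:

```python
# Sketch (hypothetical, numpy stand-in for the torch-mlir rewrite):
# widening case, f16 embedding table widened to f32.
import numpy as np

rng = np.random.default_rng(0)
table = rng.standard_normal((1000, 64)).astype(np.float16)  # f16 table
ids = rng.integers(0, 1000, size=8)                          # lookup indices

# Before the rewrite: convert the whole table, then gather
# (widens 1000 * 64 elements).
before = table.astype(np.float32)[ids]

# After the rewrite: gather first, then convert only the gathered rows
# (widens 8 * 64 elements).
after = table[ids].astype(np.float32)

# f16 -> f32 widening is exact and elementwise, so it commutes with gather.
assert np.array_equal(before, after)
```

Because widening is exact, the two orderings are bit-identical; in the narrowing case the values would still match, but the swap would gather wider data only to throw the extra bits away, which is why the pattern is gated on widening.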
Hi @pashu123, this PR has been open for quite some time. Can you please update it so it can be merged?
It’s not needed. I’ll close the PR.
Issue: iree-org/iree#17226 (comment)