
optimize the performance of FlashBert Path for HPU #575


Merged 1 commit into huggingface:main on Apr 16, 2025

Conversation

@kaixuanliu (Contributor) commented on Apr 9, 2025:

I used the WhereIsAI/UAE-Large-V1 model for benchmarking; below is the throughput (seq/s) comparison:

| Batch size | Before (seq/s) | After (seq/s) |
|------------|----------------|---------------|
| 1          | 199.65         | 219.61        |
| 2          | 239.61         | 284.78        |
| 4          | 321.37         | 373.13        |
| 8          | 506.40         | 549.61        |
| 16         | 759.07         | 822.17        |
| 32         | 1028.31        | 1285.57       |
| 64         | 1130.67        | 1708.73       |
| 128        | OOM            | 1030.06       |
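
For reference, a minimal throughput sketch along these lines could look as follows. It assumes a text-embeddings-inference server is already running locally on port 8080 and exposes the `/embed` route; the URL, iteration count, and sample text are illustrative, and the numbers in the table above come from the author's own setup, not from this script.

```python
import time

import requests

URL = "http://localhost:8080/embed"  # assumed local TEI endpoint


def measure_throughput(batch_size: int, iters: int = 50) -> float:
    """Return sequences per second for a fixed batch size."""
    batch = ["A short benchmark sentence."] * batch_size
    # Warm-up request so graph compilation / caching does not skew the timing.
    requests.post(URL, json={"inputs": batch}, timeout=300).raise_for_status()
    start = time.perf_counter()
    for _ in range(iters):
        requests.post(URL, json={"inputs": batch}, timeout=300).raise_for_status()
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed


for bs in (1, 2, 4, 8, 16, 32, 64, 128):
    print(f"bs={bs}: {measure_throughput(bs):.2f} seq/s")
```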

@kaixuanliu (Contributor, Author) commented:

@regisss @Narsil Please help review, thanks!

@Narsil (Collaborator) left a comment:

LGTM

```diff
@@ -323,19 +323,21 @@ def batch_type(self) -> Union[FlashBatch, PaddedBatch]:
     def embed(self, batch: Union[FlashBatch, PaddedBatch]) -> List[Embedding]:
         if isinstance(batch, PaddedBatch):
             input_lens = batch.attention_mask.cumsum(-1)[:, -1].to(torch.int32)
-            max_input_lens = input_lens.max().item()
+            max_input_lens = 0  # This value will not be used
```
@Narsil (Collaborator) commented on this line:

Suggested change
max_input_lens = 0 # This value will not be used

NIT

@kaixuanliu (Contributor, Author) replied:

Hi, sorry, there may be a misunderstanding. The comment "This value will not be used" means the variable can take any value, but we need to keep the assignment here because we still need to pass it to L352.

@kaixuanliu (Contributor, Author):

@Narsil, can you help double check?

@Narsil (Collaborator):

I guess there are cases where the model's forward pass does need the right value for this, right? Otherwise, why not remove it there?

@kaixuanliu (Contributor, Author):

Well, this is a common file shared by the CPU/XPU and HPU devices. On CPU/XPU we do need this variable with its exact meaning, while on HPU we do not have a real varlen_attention API, so we pass attn_mask to replace its functionality. Here we just need to set an arbitrary value for max_input_lens. The line cannot be deleted, because we still need to pass the variable on to L352.
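
To illustrate the point (a sketch of my own, not the code in this file): on HPU the padded path attends with an explicit attention mask, so the max-length argument is never read, while the CPU/XPU variable-length path genuinely needs the exact cumulative sequence lengths and maximum length. The helper name `attention_dispatch` and the `use_hpu` flag below are hypothetical.

```python
import torch
import torch.nn.functional as F


def attention_dispatch(q, k, v, *, attn_mask=None, cu_seqlens=None, max_s=0, use_hpu=False):
    """Hypothetical dispatch helper; not the actual TEI implementation."""
    if use_hpu:
        # HPU path: no real varlen attention kernel is available, so padded
        # attention with an explicit mask is used instead. `max_s` is never
        # read on this branch, which is why the caller can pass any
        # placeholder value (e.g. 0).
        return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
    # CPU/XPU path: a varlen-style kernel would be called here with the exact
    # cumulative sequence lengths (`cu_seqlens`) and the true `max_s`; both
    # values must be correct on this branch.
    raise NotImplementedError("varlen attention kernel call elided in this sketch")


# Padded HPU-style call: max_s is a dummy, the mask carries the real lengths.
q = k = v = torch.randn(2, 8, 16, 64)                 # (batch, heads, seq, head_dim)
mask = torch.ones(2, 1, 16, 16, dtype=torch.bool)     # all positions attended
out = attention_dispatch(q, k, v, attn_mask=mask, max_s=0, use_hpu=True)
```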

@Narsil (Collaborator):

Got it 👍

@kaixuanliu (Contributor, Author) commented:

@regisss @Narsil Hi, can you help merge this PR?

@regisss merged commit 5a791e5 into huggingface:main on Apr 16, 2025
2 of 13 checks passed