[Operator] Init NLL_LOSS #269
base: master
Conversation
LGTM
benchmark/test_reduction_perf.py
Outdated
@@ -135,6 +141,13 @@ def cumsum_input_fn(shape, cur_dtype, device):
        FLOAT_DTYPES + INT_DTYPES,
        marks=pytest.mark.cumsum,
    ),
    pytest.param(
        "nll_loss",
        torch.nn.NLLLoss,
NLLLoss is a class. Can we use it as the reference function?
Indeed. I've updated it to torch.nn.functional.nll_loss.
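For reference, the updated benchmark entry would look roughly like the sketch below; only the first two fields appear in the diff above, so the input-function name, dtype list, and mark are assumptions:

    pytest.param(
        "nll_loss",
        torch.nn.functional.nll_loss,  # functional reference instead of the nn.Module class
        nll_loss_input_fn,             # hypothetical input-generation helper
        FLOAT_DTYPES,                  # assumed dtype list
        marks=pytest.mark.nll_loss,    # assumed mark name
    ),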
    BLOCK_N: tl.constexpr,
):
    pid_n = tl.program_id(0)
    offsets_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
Check offsets_n for overflow.
Sorry, I didn't understand this suggestion. Shouldn't the subsequent mask generation be sufficient to handle potential overflow? Or are you suggesting we check if offsets_n
could exceed Triton's maximum representable number? If so, I think many operators will need to incorporate this check too.
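One common way to guard against this in Triton is to promote the program id to 64-bit before forming the offsets; the kernel below is only a minimal sketch of that pattern (the kernel name, signature, and loads are assumptions, not the PR's code):

    import triton
    import triton.language as tl

    @triton.jit
    def copy_rows_kernel(inp_ptr, out_ptr, N, BLOCK_N: tl.constexpr):
        pid_n = tl.program_id(0)
        # Promote pid_n to int64 so pid_n * BLOCK_N cannot wrap around
        # when N is large enough for the product to exceed int32 range.
        offsets_n = pid_n.to(tl.int64) * BLOCK_N + tl.arange(0, BLOCK_N)
        mask_n = offsets_n < N
        vals = tl.load(inp_ptr + offsets_n, mask=mask_n, other=0.0)
        tl.store(out_ptr + offsets_n, vals, mask=mask_n)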
):
    pid_n = tl.program_id(0)
    pid_d = tl.program_id(1)
    offset_d = pid_d * BLOCK_D + tl.arange(0, BLOCK_D)
Overflow check.
ditto
):
    pid_n = tl.program_id(0)
    pid_c = tl.program_id(1)
    offsets_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
Overflow check.
ditto
src/flag_gems/ops/nllloss.py
Outdated
    if weight is None:
        weight = torch.ones(
            [
Use tuple
Done.
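For illustration, the tuple form would look something like the sketch below; the class-count variable C and the dtype/device arguments are assumptions rather than the PR's exact code:

    if weight is None:
        # Pass the shape as a tuple instead of a list.
        weight = torch.ones((C,), dtype=inp.dtype, device=inp.device)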
    def forward(ctx, inp, target, weight, reduction, ignore_index):
        logging.debug("GEMS NLLLoss FWD")
        shape = list(inp.shape)
        dim = inp.ndim
The input shape/layout appears to be pretty complex as shown in the pytorch doc. Shall we add some documentation here to help clarify the inputs?
I've tried to add some annotations here about nll_loss's parameters. That said, I think users of this function should already be quite clear about how it works; the part that looks complicated is just the high-dimensional input 😄
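A docstring along these lines could spell out the expected shapes; this is only a sketch based on the public torch.nn.functional.nll_loss documentation, mirroring the forward signature shown above, and the integer reduction encoding is an assumption (ATen's none=0 / mean=1 / sum=2), which appears consistent with the reduction checks later in this thread:

    def forward(ctx, inp, target, weight, reduction, ignore_index):
        """Negative log likelihood loss.

        Args:
            inp: log-probabilities of shape (N, C) or (N, C, d1, ..., dk)
                in the K-dimensional case, where C is the number of classes.
            target: class indices of shape (N,) or (N, d1, ..., dk), each value
                in [0, C - 1] or equal to ignore_index.
            weight: optional 1-D tensor of length C with per-class weights.
            reduction: 0 = none, 1 = mean, 2 = sum.
            ignore_index: target value whose contribution to the loss is skipped.
        """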
):
    pid_n = tl.program_id(0)
    pid_d = tl.program_id(1)
    offset_d = pid_d * BLOCK_D + tl.arange(0, BLOCK_D)
Overflow check.
src/flag_gems/ops/nllloss.py
Outdated
    tl.store(inp_grad_ptrs, inp_grad.to(tl.float32), mask=(inp_mask & ignore_mask))


class NLLLoss(torch.autograd.Function):
This function is intended to be used as a substitute for nll_loss, whereas NLLLoss is already taken as the nn module name. We should avoid the name confusion.
Indeed. I've updated the class name.
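For illustration only, the rename could look like the sketch below; the class name used here is hypothetical, since the thread does not show the name that was actually chosen:

    import torch

    # Hypothetical name that avoids clashing with torch.nn.NLLLoss;
    # the real class name in the PR may differ.
    class NllLossFunction(torch.autograd.Function):
        @staticmethod
        def forward(ctx, inp, target, weight, reduction, ignore_index):
            raise NotImplementedError  # kernel dispatch lives here in the real code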
        if reduction == 0:
            res = out.to(inp.dtype)
        elif reduction == 1:
            ctx.total_weight = sum(w_tgt).item()
Shall we also add dim= args to avoid confusion?
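One way to read this suggestion (an assumption on my part) is to replace the Python built-in sum with an explicit torch reduction so the reduced axis is spelled out:

    # Explicit torch reduction with the axis named, instead of the built-in sum().
    ctx.total_weight = torch.sum(w_tgt, dim=0).item()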
A basic implementation of NLL_LOSS has been pushed. Based on the performance testing results summarized earlier, we believe that using the gather operation would lead to a more efficient implementation (judging from the latency results, this also appears to be how torch does it), and we will push forward with this optimization.
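For context, a gather-based formulation of the forward pass could look roughly like the sketch below; it covers only the 2-D (N, C) case, and the function and variable names are assumptions rather than the code in this PR:

    import torch

    def nll_loss_gather_sketch(logp, target, weight=None, ignore_index=-100, reduction="mean"):
        # logp: (N, C) log-probabilities; target: (N,) class indices.
        c = logp.shape[1]
        if weight is None:
            weight = logp.new_ones(c)
        valid = target != ignore_index
        safe_target = torch.where(valid, target, torch.zeros_like(target))
        # gather reads logp[i, target[i]] for every row in a single indexed load.
        picked = logp.gather(1, safe_target.unsqueeze(1)).squeeze(1)
        w = weight[safe_target] * valid
        loss = -picked * w
        if reduction == "none":
            return loss
        if reduction == "sum":
            return loss.sum()
        return loss.sum() / w.sum()  # "mean": weighted mean over non-ignored targets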