Make fused normalization functions backward-compatible #1760
Conversation
Hi @timmoon10, I just thought people might not be using the Function directly and forgot about Megatron. It might be best to submit another PR to Megatron-LM in tandem with this one, since I believe Megatron-DeepSpeed already has this feature (deepspeedai/Megatron-DeepSpeed#277) and it would be great if Megatron-LM had it as well.
@RuiWang1998 That's nifty, it'll be convenient to just reuse that existing work. These two approaches aren't mutually exclusive, so I don't think there is any harm in merging. This change won't break the newer code that uses `memory_efficient`.
Referenced in NeMo#7909:

* Add distopt support for FP8 params and BF16 optimizer state
* Removed unused import
* Update PyTorch container in Jenkins pipeline
* Use custom container with Apex bugfixes (see NVIDIA/apex#1760)
* Upgrade to PyTorch 23.11 container
* Update Apex commit

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
#1715 makes breaking API changes to some fused normalization functions, in particular adding `memory_efficient` as a positional argument. This PR makes `memory_efficient` a keyword argument to ensure backward compatibility.

This change is motivated by the fact that Megatron-LM uses the old API:
https://github.com/NVIDIA/Megatron-LM/blob/2bc6cd307a11423928c675f741e79e03df23e721/megatron/core/fusions/fused_layer_norm.py#L147
This prevents NeMo from upgrading from the 23.09 PyTorch container to the 23.11 container. See NVIDIA-NeMo/NeMo#7909 (comment).
Feedback would be appreciated. An alternative approach is to update Megatron-LM, but this seems simpler. Pinging @RuiWang1998.
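
For illustration, here is a minimal sketch of the backward-compatible pattern: give `memory_efficient` a default value so callers that still pass only the original positional arguments (as Megatron-LM does) keep working. The class below is a simplified stand-in, not the actual Apex source; the layer-norm math replaces the fused CUDA kernel, and the backward pass is omitted.

```python
import torch

class FusedLayerNormAffineFunction(torch.autograd.Function):
    """Simplified stand-in for the fused kernel (forward only)."""

    @staticmethod
    def forward(ctx, input, weight, bias, normalized_shape, eps, memory_efficient=False):
        # Default value keeps the pre-#1715 call signature (five positional args) working.
        mean = input.mean(dim=-1, keepdim=True)
        var = input.var(dim=-1, unbiased=False, keepdim=True)
        output = (input - mean) * torch.rsqrt(var + eps) * weight + bias
        # In the real implementation, memory_efficient controls whether the input is
        # saved for backward or recomputed from the output; this sketch only records it.
        ctx.memory_efficient = memory_efficient
        return output

x = torch.randn(2, 8)
w, b = torch.ones(8), torch.zeros(8)

# Old call (Megatron-LM style): five positional arguments, still valid.
y_old = FusedLayerNormAffineFunction.apply(x, w, b, (8,), 1e-5)

# New call: memory_efficient passed as the sixth positional argument
# (Function.apply generally takes positional arguments only).
y_new = FusedLayerNormAffineFunction.apply(x, w, b, (8,), 1e-5, True)
```

Because `apply` forwards its positional arguments straight to `forward`, adding the defaulted parameter at the end is enough to keep both call sites working without touching Megatron-LM.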