
Conversation

@timmoon10 (Contributor) commented on Dec 21, 2023:

#1715 makes breaking API changes to some fused normalization functions, in particular adding memory_efficient as a positional argument. This PR makes memory_efficient a keyword argument to ensure backward compatibility.
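
For illustration, here is a minimal runnable sketch of the pattern with a toy Function (a hypothetical stand-in, not apex's actual `FusedLayerNormAffineFunction`, whose real forward also takes weight, bias, and normalized_shape):

```python
import torch

class ToyFusedLayerNormFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, eps, memory_efficient=False):
        # Giving memory_efficient a default keeps pre-#1715 positional
        # call sites (like Megatron-LM's) working unchanged. The real
        # kernel would save fewer tensors for backward when the flag is
        # set; this toy just computes layer norm.
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        return (x - mean) / torch.sqrt(var + eps)

x = torch.randn(2, 8)
y_old = ToyFusedLayerNormFunction.apply(x, 1e-5)        # old-style call, no flag
y_new = ToyFusedLayerNormFunction.apply(x, 1e-5, True)  # opts into the new flag
```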

This change is motivated by the fact that Megatron-LM uses the old API:
https://github.com/NVIDIA/Megatron-LM/blob/2bc6cd307a11423928c675f741e79e03df23e721/megatron/core/fusions/fused_layer_norm.py#L147
This prevents NeMo from upgrading from the 23.09 to the 23.11 PyTorch container. See NVIDIA-NeMo/NeMo#7909 (comment).

Feedback would be appreciated. An alternative approach is to update Megatron-LM, but this seems simpler. Pinging @RuiWang1998.

timmoon10 added a commit to timmoon10/NeMo that referenced this pull request Dec 21, 2023
@RuiWang1998 (Contributor) commented:

Hi @timmoon10,

I just thought people might not be using the Function directly and forgot about Megatron. I believe it might be best to submit another PR to Megatron-LM in tandem with this one, since Megatron-DeepSpeed already has this feature (deepspeedai/Megatron-DeepSpeed#277) and it would be great if Megatron-LM had it as well.

@timmoon10 (Contributor, Author) commented:

@RuiWang1998 That's nifty; it'll be convenient to reuse that existing work.

These two approaches aren't mutually exclusive, so I don't think there's any harm in merging. This change won't break newer code that uses memory_efficient.

@crcrpar merged commit c07a4cf into NVIDIA:master on Jan 1, 2024
ericharper added a commit to NVIDIA-NeMo/NeMo that referenced this pull request Jan 12, 2024
* Add distopt support for FP8 params and BF16 optimizer state

Signed-off-by: Tim Moon <[email protected]>

* Removed unused import

Signed-off-by: Tim Moon <[email protected]>

* Update PyTorch container in Jenkins pipeline

Signed-off-by: Tim Moon <[email protected]>

* Use custom container with Apex bugfixes

See NVIDIA/apex#1760.

Signed-off-by: Tim Moon <[email protected]>

* Upgrade to PyTorch 23.11 container

Signed-off-by: Tim Moon <[email protected]>

* Update Apex commit

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
minitu pushed a commit to minitu/NeMo that referenced this pull request Jan 19, 2024
ssh-meister pushed a commit to ssh-meister/NeMo that referenced this pull request Feb 15, 2024
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024