
Specifying normalization layers. #31

Open
jakob-schloer opened this issue Nov 22, 2024 · 7 comments · May be fixed by ecmwf/anemoi-models#95
Labels: enhancement, models

Comments

@jakob-schloer
Collaborator

Is your feature request related to a problem? Please describe.

Currently, the processor is implemented with nn.LayerNorm. I would like to use other normalization layers (https://pytorch.org/docs/stable/nn.html#normalization-layers), including custom normalization layers.

Describe the solution you'd like

I would like to specify the normalization layer of the processor in the config, e.g. transformer.yaml:

layer_norm:  # This needs to be a partial instantiation since it is used in multiple places
  _target_: torch.nn.LayerNorm 
  _partial_: True
  normalized_shape: ${model.num_channels}

processor:
  _target_: anemoi.models.processor.TransformerProcessor
  _convert_: all
  activation: ${model.activation}
  num_layers: 16
  num_chunks: 2
  mlp_hidden_ratio: 4 # GraphTransformer or Transformer only
  num_heads: 16 # GraphTransformer or Transformer only
  window_size: 512
  dropout_p: 0.0 # GraphTransformer
  layer_norm: ${model.layer_norm} # (Optional) Default nn.LayerNorm
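
For reference, with _partial_: True Hydra resolves the layer_norm node into a functools.partial, so the processor can call it as many times as it needs norm layers. A minimal sketch of that behaviour (the hard-coded 1024 just stands in for ${model.num_channels} and is purely illustrative):

from functools import partial

import torch
from hydra.utils import instantiate
from omegaconf import OmegaConf

# Illustrative config mirroring the transformer.yaml snippet above
cfg = OmegaConf.create(
    {
        "layer_norm": {
            "_target_": "torch.nn.LayerNorm",
            "_partial_": True,
            "normalized_shape": 1024,
        }
    }
)

# With _partial_: True, instantiate() returns a functools.partial ...
layer_norm_factory = instantiate(cfg.layer_norm)
assert isinstance(layer_norm_factory, partial)

# ... which can be called repeatedly to build independent norm layers
norm1 = layer_norm_factory()
norm2 = layer_norm_factory()
print(norm1(torch.randn(2, 1024)).shape)  # torch.Size([2, 1024])

Passing the partial down means the processor never needs to know which norm class it is constructing.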

Describe alternatives you've considered

No response

Additional context

No response

Organisation

No response

@jakob-schloer jakob-schloer added the enhancement New feature or request label Nov 22, 2024
@jakob-schloer jakob-schloer self-assigned this Nov 22, 2024
@clessig

clessig commented Nov 22, 2024

Cathal has already done experiments with RMSNorm (from TransformerEngine, I think). It might have been hard-coded, but it would be good to coordinate.

CC: @cathalobrien

@cathalobrien
Contributor

cathalobrien commented Nov 22, 2024

Hey, yeah, I have this PR: ecmwf/anemoi-models#35. I put it on ice a while back because I thought it would cause problems in inference if we have arbitrary functions in the checkpoint file.

But now that the checkpoints are weights-only, it should be fine. I can refresh it next week.

@jakob-schloer
Collaborator Author

jakob-schloer commented Nov 22, 2024

I see; this is related, but I was thinking of something more general. I would like to be able to write custom normalization layers, e.g.

import logging
from functools import partial
from typing import Callable, Optional

from torch import Tensor, nn
from torch.distributed.distributed_c10d import ProcessGroup

# BaseBlock, the attention module and the MLP come from anemoi-models (imports omitted here)

LOGGER = logging.getLogger(__name__)


class TransformerProcessorBlock(BaseBlock):
    """Transformer block with MultiHeadSelfAttention and MLPs."""

    def __init__(
        self,
        num_channels: int,
        hidden_dim: int,
        num_heads: int,
        activation: str,
        window_size: int,
        dropout_p: float = 0.0,
        layer_norm: Optional[Callable[..., nn.Module]] = None,
    ):
        super().__init__()

        try:
            act_func = getattr(nn, activation)
        except AttributeError as ae:
            LOGGER.error("Activation function %s not supported", activation)
            raise RuntimeError from ae

        # Instantiate the normalization layers from the Hydra partial,
        # defaulting to nn.LayerNorm when none is configured
        if layer_norm is None:
            layer_norm = partial(nn.LayerNorm, normalized_shape=num_channels)
        self.layer_norm1 = layer_norm()
        self.layer_norm2 = layer_norm()
        ...

    def forward(
        self,
        x: Tensor,
        shapes: list,
        batch_size: int,
        model_comm_group: Optional[ProcessGroup] = None,
        **layer_kwargs,
    ) -> Tensor:
        # Needs to be out of place for gradient propagation
        x = x + self.attention(
            self.layer_norm1(x, **layer_kwargs), shapes, batch_size, model_comm_group=model_comm_group
        )
        x = x + self.mlp(self.layer_norm2(x, **layer_kwargs))
        return x
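
A hand-written norm would then only need to be constructible by the partial and to accept the hidden tensor in its forward, like nn.LayerNorm does. A rough sketch (the class name is made up; the config's _target_ would simply point at wherever it lives, e.g. my_project.norms.SimpleRMSNorm):

import torch
from torch import Tensor, nn


class SimpleRMSNorm(nn.Module):
    """Hypothetical custom normalization layer usable as `layer_norm`."""

    def __init__(self, normalized_shape: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(normalized_shape))

    def forward(self, x: Tensor) -> Tensor:
        # Scale by the reciprocal root-mean-square over the channel dimension
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight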

Do you think this could be combined with your PR @cathalobrien?

@cathalobrien
Contributor

Ah I see, yeah I think this should work.

I already have this implemented:

    LayerNorm:
      #_target_: "torch.nn.LayerNorm" #the default PyTorch implementation
      _target_: "liger_kernel.transformers.rms_norm.LigerRMSNorm" # my desired layernorm
      _partial_: True

I haven't tried with a handwritten layernorm, but I assume that as long as the import in _target_ points to the right place, it should be fine.

I like your idea of passing **layer_kwargs directly to the instantiated layer_norm; I was wondering how to handle arbitrary parameters at the time.

@jakob-schloer
Collaborator Author

> I like your idea of passing **layer_kwargs directly to the instantiated layer_norm; I was wondering how to handle arbitrary parameters at the time.

On second thought, I believe it should be just **kwargs, in case someone wants to do something else in the forward function in the future.
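
For example (purely hypothetical, nothing like this exists in anemoi-models), a conditional norm whose forward takes an extra tensor would simply receive it through those generic kwargs:

import torch
from torch import Tensor, nn


class ConditionalLayerNorm(nn.Module):
    """Hypothetical norm with an extra forward argument, to illustrate why
    the block should forward generic **kwargs to the norm layers."""

    def __init__(self, normalized_shape: int) -> None:
        super().__init__()
        self.norm = nn.LayerNorm(normalized_shape, elementwise_affine=False)
        self.scale = nn.Linear(normalized_shape, normalized_shape)

    def forward(self, x: Tensor, cond: Tensor) -> Tensor:
        # Condition-dependent scaling on top of a plain, affine-free LayerNorm
        return self.norm(x) * (1.0 + self.scale(cond))


x, cond = torch.randn(2, 8, 64), torch.randn(2, 8, 64)
out = ConditionalLayerNorm(64)(x, cond=cond)  # cond would arrive via **kwargs in the block
print(out.shape)  # torch.Size([2, 8, 64])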

@clessig

clessig commented Nov 22, 2024

Yes, e.g. cross attention or some fancy bias terms for the attention could also be passed.

@jakob-schloer
Collaborator Author

I'm closing this, since PR ecmwf/anemoi-models#35 already covers this.

@jakob-schloer jakob-schloer reopened this Nov 22, 2024
@jakob-schloer jakob-schloer closed this as not planned Nov 22, 2024
@jakob-schloer jakob-schloer reopened this Dec 4, 2024
@jakob-schloer jakob-schloer linked a pull request Dec 5, 2024 that will close this issue
@JesperDramsch JesperDramsch transferred this issue from ecmwf/anemoi-models Dec 19, 2024