
Misaligned information in RichModelSummary with `DeepSpeedStrategy` #21099

@GdoongMathew


Bug description

Currently, on the master branch, using RichModelSummary with DeepSpeedStrategy produces a misaligned summary table: the Params per Device column header is missing, so the remaining values shift into the wrong columns, like the following.

┏━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┓
┃   ┃ Name  ┃ Type   ┃ Params ┃ Mode ┃ FLOPs ┃ In sizes ┃ Out sizes ┃
┡━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━┩
│ 0 │ layer │ Linear │     66 │ 66   │ train │  [4, 32] │    [4, 2] │
└───┴───────┴────────┴────────┴──────┴───────┴──────────┴───────────┘
Trainable params: 66                                                            
Non-trainable params: 0                                                         
Total params: 66                                                                
Total estimated model params size (MB): 0                                       
Modules in train mode: 1                                                        
Modules in eval mode: 0                                                         
Total FLOPs: 512            

while the correct table should look like this:

┏━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃   ┃       ┃        ┃        ┃   Params ┃       ┃       ┃          ┃          ┃
┃   ┃       ┃        ┃        ┃      per ┃       ┃       ┃          ┃      Out ┃
┃   ┃ Name  ┃ Type   ┃ Params ┃   Device ┃ Mode  ┃ FLOPs ┃ In sizes ┃    sizes ┃
┡━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│ 0 │ layer │ Linear │     66 │       66 │ train │   512 │  [4, 32] │   [4, 2] │
└───┴───────┴────────┴────────┴──────────┴───────┴───────┴──────────┴──────────┘
Trainable params: 66                                                            
Non-trainable params: 0                                                         
Total params: 66                                                                
Total estimated model params size (MB): 0                                       
Modules in train mode: 1                                                        
Modules in eval mode: 0                                                         
Total FLOPs: 512       
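The misalignment can be sketched in plain Python. This is only an illustration of a header/row length mismatch and not Lightning's actual summary code; the list names and the `dict(zip(...))` pairing here are hypothetical:

```python
# Hypothetical sketch: under DeepSpeedStrategy each per-layer row gains a
# "Params per Device" cell, but if the header list is not extended to match,
# every subsequent value lines up under the wrong header and the trailing
# cell is silently dropped.
headers = ["", "Name", "Type", "Params", "Mode", "FLOPs", "In sizes", "Out sizes"]
row = ["0", "layer", "Linear", "66", "66", "train", "512", "[4, 32]", "[4, 2]"]

misaligned = dict(zip(headers, row))  # 8 headers vs. 9 cells: values shift left
print(misaligned["Mode"])             # params-per-device count lands under Mode
print(misaligned["FLOPs"])            # the mode string lands under FLOPs

# Adding the missing header when the strategy shards parameters restores
# the one-to-one pairing:
fixed_headers = headers[:4] + ["Params per Device"] + headers[4:]
aligned = dict(zip(fixed_headers, row))
print(aligned["Params per Device"])   # 66
print(aligned["FLOPs"])               # 512
```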

What version are you seeing the problem on?

master

Reproduced in studio

No response

How to reproduce the bug

Error messages and logs

# Error messages and logs here please

Environment

Current environment
#- PyTorch Lightning Version (e.g., 2.5.0):
#- PyTorch Version (e.g., 2.5):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):

More info

No response

cc @lantiga
