Multiple Enhancements to Make MMEngine Walk Further #1665
Conversation
FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
`FSDP.optim_state_dict_to_load` requires the following parameters: `model: Module`, `optim: Optimizer`, `optim_state_dict: Dict[str, Any]`
The current runner implementation does not yet support pure-Python style configurations for the model wrapper class. I follow the mainstream implementation to support this feature.
This may be due to a version conflict. A newer PyTorch may have introduced this optimizer.
2. Add a torch compiler disable flag to the message hub class. 3. The compile-time fault override has been moved from the history buffer to the message hub. 4. The MMDistributedDataParallel module has been restored to the original MMEngine implementation. The reason for the earlier modification was probably related to a train_step change in earlier projects; such a modification will be achieved by inheriting a new class in the future.
Great, I am glad to. I was worried the dedicated PRs would be useless. And yes, it will take some time, as I have been quite busy with my work recently.
Even though I have other commitments and cannot fulfill everyone's needs, I still hope that MMEngine can respond to users who are doing deep development. Please trust that as long as you're willing to progressively and sustainably advance these PRs, I will eventually merge these changes into the main branch and release a new version.
Hi, we have noticed that the MMLab repos were not maintained very well. We have updated 7 of the repos, including MMEngine, to work with PyTorch 2.8 and use pyproject.toml, along with some other modernization. We are committed to maintaining our version of mmengine, and this would help a lot too. Our repo has working CI pipelines, and although we are a small team, we have some dedicated resources to review and test MRs like these. Would you be interested in opening an MR in our fork (https://github.com/VBTI-development/onedl-mmengine)? And @HAOCHENYE maybe you can support us in maintaining our version, since the rest of the MMLab repos seem to be no longer maintained?
@lauriebax Impressive.
I'll take some time this weekend to review what changes have been made in this PR and in onedl-mmengine compared to the current main branch, and see which modifications can be merged into main first. I can test these changes with mmengine or mmcv initially, and then try contacting the maintainers of other repos to see if we can move this forward.
I've fixed the unit tests and lint issues to facilitate further validation of the PR content. However, the GitHub workflow isn't working yet, so manual local testing is required: `pytest tests --ignore tests/test_fileio/test_backends/test_petrel_backend.py`. After confirming all unit tests pass, you can have me review the PR. Additionally, I recommend removing lint-related code changes - let's focus on functional modifications first.
@HAOCHENYE OK, I'm cherry-picking the latest several commits from … and I'm running … Maybe I will open a new PR which only contains functional modifications.
Thanks @MGAMZ for your effort! I'll review the functional changes once all tests pass.
Motivation
The maintenance of mmengine has been slowing down recently. To enable this excellent architecture to progress further, I have introduced some forward-compatibility improvements and some minor optimizations.
Modification
Packaging according to PyPA
Packaging via `setup.py` is deprecated and due to be removed. PyPA now suggests using `pyproject.toml` as the packaging configuration, so the packaging files are refactored accordingly.
Upgrade yapf from 0.32.0 to 0.43.0
The original yapf version is not supported on Python 3.13 because `lib2to3` has been removed. In order to use the latest yapf version, the repository URL needs to be changed, so `.pre-commit-config.yaml` undergoes a small modification. This should only bring minor behavioral differences.
Upgrade numpy from 1.2x to 2.2.x+
`numpy.compat` is deprecated, so `numpy.fft` is used instead in tests/data/config/lazy_module_config/test_ast_transform.py and tests/test_config/test_lazy.py.
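As a rough illustration (not the literal content of those test files), the swap boils down to importing a still-supported numpy submodule in the lazy-import config:
```python
# Hypothetical sketch: replace the removed `numpy.compat` import with another
# real submodule such as `numpy.fft`, which is all the lazy-import AST test
# needs in order to resolve an attribute on an imported module.
from numpy import fft  # previously: from numpy import compat

value = fft.fft([0.0, 1.0, 0.0, -1.0])  # any attribute access suffices here
```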
torch.compile configuration
`torch.compile` accepts many arguments, and in mmengine everything under the `compile` key of a config file is passed through to `torch.compile`. mmengine checks several dependencies when the user specifies `compile` in their config and uses `hasattr(config, 'compile')` as the flag for enabling `torch.compile`. However, this is not exact: the compile config can include a `disable` argument that states whether the compiler is actually active, so mmengine needs to carefully determine whether the user is really enabling `torch.compile`. This caused a small modification in `mmengine/_strategy/base.py`.
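A minimal sketch of the kind of check this implies; the helper name `compile_requested` and the exact config shapes are assumptions on my part, not mmengine's actual API:
```python
# Hypothetical helper: treat `compile` as enabled only when it is present
# and not explicitly switched off via its `disable` argument.
def compile_requested(cfg) -> bool:
    compile_cfg = getattr(cfg, 'compile', None)
    if compile_cfg is None or compile_cfg is False:
        return False
    if compile_cfg is True:
        return True
    # dict-style config, e.g. compile = dict(mode='max-autotune', disable=True)
    return not compile_cfg.get('disable', False)
```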
Message Hub error caught
The message hub receives critical Tensor inputs and displays them. In its procedure, the `item` method is called when handling values (lr, loss, metrics, time, epoch, etc.). When an invalid value appears, the message hub simply crashes the process because of an assertion. Considering that the user can easily log something bad to the message hub from any line of code, a more robust behavior is to emit a warning and return the value as-is, letting the outer functions decide whether the directly returned value can still be processed.
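A small sketch of the warn-and-return behavior described above, assuming a helper around `.item()` (the function name is illustrative):
```python
import warnings

import torch


def to_scalar(value):
    """Hypothetical helper: reduce a tensor to a Python scalar, but warn and
    return the raw value instead of crashing when `.item()` fails."""
    if isinstance(value, torch.Tensor):
        try:
            return value.item()
        except (RuntimeError, ValueError) as exc:
            warnings.warn(f'Could not convert {value!r} to a scalar ({exc}); '
                          'returning the raw value for the caller to handle.')
            return value
    return value
```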
Message Hub ignore torch.compile
The message hub's operations will inevitably trigger graph breaks during compilation, so they are now excluded from `torch.compile`.
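With PyTorch's public API, such an exclusion can be expressed roughly as below; the method name `update_scalar` is only an example, not the actual message hub interface:
```python
import torch


class MessageHubSketch:
    """Hypothetical sketch: torch.compiler.disable tells torch.compile to skip
    tracing this method entirely instead of producing graph breaks inside it."""

    @torch.compiler.disable
    def update_scalar(self, key: str, value: float) -> None:
        print(f'{key}: {value}')  # stand-in for the real logging logic
```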
Support for Python-style config files containing `model_wrapper_constructor` or `model_wrapper`
The current parsing of constructor_cfg does not support the following case (python-style configuration file):
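For illustration only, such a python-style (lazy-import) config might pass class objects instead of registry type strings; the specific classes below are just examples, not necessarily the ones from the PR:
```python
# Hypothetical python-style config snippet: the wrapper and the constructor
# are referenced as imported class objects rather than registry type strings.
from mmengine.model import MMDistributedDataParallel
from mmengine.optim import DefaultOptimWrapperConstructor

model_wrapper_cfg = dict(type=MMDistributedDataParallel, find_unused_parameters=True)
optim_wrapper = dict(
    constructor=DefaultOptimWrapperConstructor,
    optimizer=dict(type='SGD', lr=0.01))
```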
So the corresponding init logic is added:
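The sketch below shows the general class-or-string pattern such init logic tends to follow; it is my own paraphrase, not the PR's exact code, and `registry` stands in for whichever mmengine registry is actually used:
```python
import inspect


def build_from_cfg_sketch(constructor_cfg: dict, registry):
    """Hypothetical helper: accept either a class object (python-style config)
    or a registry string (dict-style config) as the constructor type."""
    constructor_type = constructor_cfg['type']
    if inspect.isclass(constructor_type):
        constructor_cls = constructor_type                 # use the class directly
    else:
        constructor_cls = registry.get(constructor_type)   # resolve the name first
    kwargs = {k: v for k, v in constructor_cfg.items() if k != 'type'}
    return constructor_cls(**kwargs)
```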
A similar issue exists for `model_wrapper_cfg` too, so there are also some changes in `mmengine.runner.runner.wrap_model`.
torch.load usage
`torch.load` now requires a `weights_only` argument. By default it is set to `False` here so that existing checkpoint loading keeps working without issues.
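In practice the checkpoint loading calls end up looking roughly like this (the path is a placeholder; `weights_only=False` should only be used for trusted files):
```python
import torch

# Keeps the pre-2.6 behaviour and allows checkpoints that contain arbitrary
# pickled Python objects, at the cost of trusting the checkpoint source.
checkpoint = torch.load('epoch_12.pth', map_location='cpu', weights_only=False)
```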
Resume strategy
Advancing the dataloader to skip data that has already been trained on makes no sense for an InfiniteSampler-based dataloader, and it can waste a lot of time on data preprocessing, because the `next(self.dataloader_iterator)` operation actually performs all preprocessing steps.
Logic Breaking
In scenarios where we need to make sure that all data are visited at the same frequency after the model is trained, this modification might be unsuitable.
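A sketch of the guard this implies; the helper and attribute names are assumptions, and only the `InfiniteSampler` class is real mmengine API:
```python
from mmengine.dataset import InfiniteSampler


def fast_forward(dataloader, dataloader_iterator, num_batches_done: int) -> None:
    """Hypothetical resume helper: only skip already-consumed batches for
    finite, epoch-based samplers; with an InfiniteSampler the skip has no
    meaning and merely pays the preprocessing cost of every discarded batch."""
    if isinstance(getattr(dataloader, 'sampler', None), InfiniteSampler):
        return
    for _ in range(num_batches_done):
        next(dataloader_iterator)  # discard batches that were already trained on
```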
Removal of deprecations
`torch.cuda.amp` is no longer used; it is replaced with `from torch.amp import GradScaler`, and then `GradScaler = partial(amp_GradScaler, device='cuda')` is set.
`pkg_resources` is no longer supported and is due to be completely abandoned in Nov. 2025. It is replaced with `importlib`, which causes many modifications in `mmengine/utils/package_utils.py`.
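A compact sketch of both replacements, following the aliases named above (exact placement inside mmengine may differ):
```python
from functools import partial
from importlib import metadata

from torch.amp import GradScaler as amp_GradScaler

# A CUDA-flavoured GradScaler alias that mirrors the old torch.cuda.amp API.
GradScaler = partial(amp_GradScaler, device='cuda')
scaler = GradScaler(init_scale=2.**16)

# pkg_resources.get_distribution('torch').version -> importlib.metadata.version('torch')
print(metadata.version('torch'))
```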
Other fix
`vis_backend` error report hint.
Included PRs
#1650
#1654 (partially)
#1639
#1610
#1617
#1608
(Maybe missing some PRs)
Checklist
Due to my limited personal capacity, I am unable to check every item here.