Multiple Enhancements to Make MMEngine Walk Further #1665
Conversation
FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
`FSDP.optim_state_dict_to_load` requires the following parameters: `model: Module`, `optim: Optimizer`, `optim_state_dict: Dict[str, Any]`
The current runner implementation does not yet support pure-Python style configurations for the model wrapper class. I follow the mainstream implementation to support this feature.
This may be due to a version conflict. A newer PyTorch may have introduced this optimizer.
2. Add a torch compiler disable flag to the message hub class. 3. The compile-time fault override has been moved from the history buffer to the message hub. 4. The MMDistributedDataParallel module has been restored to the original MMEngine implementation. The reason for the earlier modification was probably related to a train_step change in earlier projects; such a modification will be achieved by inheriting a new class in the future.
Great, I am glad to. I was worried the dedicated PRs would be useless. And yes, it will take some time, as I have been quite busy with my work recently.
Even though I have other commitments and cannot fulfill everyone's needs, I still hope that MMEngine can respond to users who are doing deep development. Please trust that as long as you're willing to progressively and sustainably advance these PRs, I will eventually merge these changes into the main branch and release a new version.
Hi, we have noticed that the MMLab repos were not maintained very well. We have updated 7 of the repos, including MMEngine, to work with PyTorch 2.8 and use pyproject.toml, along with some other modernization. We are committed to maintaining our version of mmengine, and this would help a lot too. Our repo has working CI pipelines, and although we are a small team, we have some dedicated resources to review and test MRs like these. Would you be interested in opening an MR in our fork (https://github.com/VBTI-development/onedl-mmengine)? And @HAOCHENYE maybe you can support us in maintaining our version, since the rest of the MMLab repos seem to be no longer maintained?
@lauriebax Impressive.
I'll take some time this weekend to review what changes have been made in this PR and in onedl-mmengine compared to the current main branch, and see which modifications can be merged into main first. I can test these changes with mmengine or mmcv initially, and then try contacting the maintainers of other repos to see if we can move this forward.
I've fixed the unit tests and lint issues to facilitate further validation of the PR content. However, the GitHub workflow isn't working yet, so manual local testing is required: `pytest tests --ignore tests/test_fileio/test_backends/test_petrel_backend.py`. After confirming all unit tests pass, you can have me review the PR. Additionally, I recommend removing lint-related code changes - let's focus on functional modifications first.
@HAOCHENYE OK, I'm cherry-picking the latest several commits from … and I'm running … Maybe I will open a new PR which only contains functional modifications.
Thanks @MGAMZ for your effort! I'll review the functional changes once all tests pass.
Motivation
The maintenance of mmengine has been slowing down recently. To enable this excellent architecture to progress further, I have introduced some forward-compatibility improvements and some minor optimizations.
Modification
Packaging according to PyPA
Packaging via `setup.py` is deprecated and due to be removed. PyPA now suggests using `pyproject.toml` as the packaging configuration, so the packaging files are refactored accordingly.
Upgrade yapf from 0.32.0 to 0.43.0
The original yapf version is not supported on Python 3.13 because `lib2to3` has been removed. In order to use the latest yapf version, the repository URL needs to be changed, so `.pre-commit-config.yaml` undergoes a small modification. This should only bring minor behavioral differences.
Upgrade numpy from 1.2x to 2.2.x+
`numpy.compat` is deprecated, so `numpy.fft` is used instead in tests/data/config/lazy_module_config/test_ast_transform.py and tests/test_config/test_lazy.py.
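As a rough illustration (not the literal content of those test files), the swap boils down to importing a still-supported numpy submodule in the lazy-import config:
```python
# Hypothetical sketch: replace the removed `numpy.compat` import with another
# real submodule such as `numpy.fft`, which is all the lazy-import AST test
# needs in order to resolve an attribute on an imported module.
from numpy import fft  # previously: from numpy import compat

value = fft.fft([0.0, 1.0, 0.0, -1.0])  # any attribute access suffices here
```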
torch.compile configuration
`torch.compile` accepts many arguments, and in mmengine everything under the `compile` key of a config file is passed through to `torch.compile`. mmengine checks several dependencies when the user specifies `compile` in their config and uses `hasattr(config, 'compile')` as the flag for enabling `torch.compile`. However, this is not exact: the compile config can include a `disable` argument that states whether the compiler is actually active, so mmengine needs to carefully determine whether the user is really enabling `torch.compile`. This caused a small modification in `mmengine/_strategy/base.py`.
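A minimal sketch of the kind of check this implies; the helper name `compile_requested` and the exact config shapes are assumptions on my part, not mmengine's actual API:
```python
# Hypothetical helper: treat `compile` as enabled only when it is present
# and not explicitly switched off via its `disable` argument.
def compile_requested(cfg) -> bool:
    compile_cfg = getattr(cfg, 'compile', None)
    if compile_cfg is None or compile_cfg is False:
        return False
    if compile_cfg is True:
        return True
    # dict-style config, e.g. compile = dict(mode='max-autotune', disable=True)
    return not compile_cfg.get('disable', False)
```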
Message Hub error caught
The message hub receives critical Tensor inputs and displays them. In its procedure, the `item` method is called when handling values (lr, loss, metrics, time, epoch, etc.). When an invalid value appears, the message hub simply crashes the process because of an assertion. Considering that the user can easily log something bad to the message hub from any line of code, a more robust behavior is to emit a warning and return the value as-is, letting the outer functions decide whether the directly returned value can still be processed.
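A small sketch of the warn-and-return behavior described above, assuming a helper around `.item()` (the function name is illustrative):
```python
import warnings

import torch


def to_scalar(value):
    """Hypothetical helper: reduce a tensor to a Python scalar, but warn and
    return the raw value instead of crashing when `.item()` fails."""
    if isinstance(value, torch.Tensor):
        try:
            return value.item()
        except (RuntimeError, ValueError) as exc:
            warnings.warn(f'Could not convert {value!r} to a scalar ({exc}); '
                          'returning the raw value for the caller to handle.')
            return value
    return value
```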
Message Hub ignore torch.compile
The message hub's operations will inevitably trigger graph breaks during compilation, so they are now excluded from `torch.compile`.
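With PyTorch's public API, such an exclusion can be expressed roughly as below; the method name `update_scalar` is only an example, not the actual message hub interface:
```python
import torch


class MessageHubSketch:
    """Hypothetical sketch: torch.compiler.disable tells torch.compile to skip
    tracing this method entirely instead of producing graph breaks inside it."""

    @torch.compiler.disable
    def update_scalar(self, key: str, value: float) -> None:
        print(f'{key}: {value}')  # stand-in for the real logging logic
```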
Support for Python-style config files containing `model_wrapper_constructor` or `model_wrapper`
The current parsing of constructor_cfg does not support the following case (python-style configuration file):
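For illustration only, such a python-style (lazy-import) config might pass class objects instead of registry type strings; the specific classes below are just examples, not necessarily the ones from the PR:
```python
# Hypothetical python-style config snippet: the wrapper and the constructor
# are referenced as imported class objects rather than registry type strings.
from mmengine.model import MMDistributedDataParallel
from mmengine.optim import DefaultOptimWrapperConstructor

model_wrapper_cfg = dict(type=MMDistributedDataParallel, find_unused_parameters=True)
optim_wrapper = dict(
    constructor=DefaultOptimWrapperConstructor,
    optimizer=dict(type='SGD', lr=0.01))
```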
So the corresponding init logic is added:
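The sketch below shows the general class-or-string pattern such init logic tends to follow; it is my own paraphrase, not the PR's exact code, and `registry` stands in for whichever mmengine registry is actually used:
```python
import inspect


def build_from_cfg_sketch(constructor_cfg: dict, registry):
    """Hypothetical helper: accept either a class object (python-style config)
    or a registry string (dict-style config) as the constructor type."""
    constructor_type = constructor_cfg['type']
    if inspect.isclass(constructor_type):
        constructor_cls = constructor_type                 # use the class directly
    else:
        constructor_cls = registry.get(constructor_type)   # resolve the name first
    kwargs = {k: v for k, v in constructor_cfg.items() if k != 'type'}
    return constructor_cls(**kwargs)
```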
A similar issue exists for `model_wrapper_cfg` too, so there are also some changes in `mmengine.runner.runner.wrap_model`.
torch.load usage
`torch.load` now requires a `weights_only` argument. By default it is set to `False` here so that existing checkpoint loading keeps working without issues.
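In practice the checkpoint loading calls end up looking roughly like this (the path is a placeholder; `weights_only=False` should only be used for trusted files):
```python
import torch

# Keeps the pre-2.6 behaviour and allows checkpoints that contain arbitrary
# pickled Python objects, at the cost of trusting the checkpoint source.
checkpoint = torch.load('epoch_12.pth', map_location='cpu', weights_only=False)
```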
Resume strategy
Advancing the dataloader to skip data that has already been trained on makes no sense for an InfiniteSampler-based dataloader, and it can waste a lot of time on data preprocessing, because the `next(self.dataloader_iterator)` operation actually performs all preprocessing steps.
Logic Breaking
In scenarios where we need to make sure that all data are visited at the same frequency after the model is trained, this modification might be unsuitable.
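A sketch of the guard this implies; the helper and attribute names are assumptions, and only the `InfiniteSampler` class is real mmengine API:
```python
from mmengine.dataset import InfiniteSampler


def fast_forward(dataloader, dataloader_iterator, num_batches_done: int) -> None:
    """Hypothetical resume helper: only skip already-consumed batches for
    finite, epoch-based samplers; with an InfiniteSampler the skip has no
    meaning and merely pays the preprocessing cost of every discarded batch."""
    if isinstance(getattr(dataloader, 'sampler', None), InfiniteSampler):
        return
    for _ in range(num_batches_done):
        next(dataloader_iterator)  # discard batches that were already trained on
```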
Removal of deprecations
`torch.cuda.amp` is no longer used; it is replaced with `from torch.amp import GradScaler`, and then `GradScaler = partial(amp_GradScaler, device='cuda')` is set.
`pkg_resources` is no longer supported and is due to be completely abandoned in Nov. 2025. It is replaced with `importlib`, which causes many modifications in `mmengine/utils/package_utils.py`.
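A compact sketch of both replacements, following the aliases named above (exact placement inside mmengine may differ):
```python
from functools import partial
from importlib import metadata

from torch.amp import GradScaler as amp_GradScaler

# A CUDA-flavoured GradScaler alias that mirrors the old torch.cuda.amp API.
GradScaler = partial(amp_GradScaler, device='cuda')
scaler = GradScaler(init_scale=2.**16)

# pkg_resources.get_distribution('torch').version -> importlib.metadata.version('torch')
print(metadata.version('torch'))
```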
Other fix
`vis_backend` error report hint.
Included PRs
#1650
#1654 (partially)
#1639
#1610
#1617
#1608
(Maybe missing some PRs)
Checklist
Due to my limited personal capacity, I am unable to check every item here.