Cpu memory accumulation bug #20730
Seems this is not included in this PR yet.
for more information, see https://pre-commit.ci
…orch-lightning into cpu_memory_accumulation_bug
Hi @ved1beta, thanks for the contribution.
I added some review comments. Please try fixing the failing tests and also the mypy issue.
# Clear memory if not returning predictions
import gc

gc.collect()
Do you think it would be a good idea to have an argument, collect_gc or something similar, that users can toggle?
As Adrian said, it might be expensive in certain situations.
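The toggle suggested above could look roughly like the following. This is a minimal standalone sketch, not the code in this PR: `run_predict_loop` and the `collect_gc` flag are hypothetical names used for illustration, and the loop is a plain-Python stand-in for Lightning's prediction loop.

```python
import gc
from typing import Any, Callable, Iterable, List, Optional


def run_predict_loop(
    predict_step: Callable[[Any], Any],
    batches: Iterable[Any],
    return_predictions: bool = True,
    collect_gc: bool = False,  # hypothetical flag, not an existing Lightning argument
) -> Optional[List[Any]]:
    """Toy prediction loop that only runs the garbage collector on request."""
    predictions: List[Any] = []
    for batch in batches:
        output = predict_step(batch)
        if return_predictions:
            predictions.append(output)
        else:
            # Drop the reference so the output can be freed right away.
            del output
    if not return_predictions:
        if collect_gc:
            # A full collection can be expensive, so it is opt-in.
            gc.collect()
        return None
    return predictions
```

With `collect_gc=False` as the default, users who do not need the extra collection pay nothing for it.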
if predictions is None:
self._warning_cache.warn("predict returned None if it was on purpose, ignore this warning...")
step_args = self._build_step_args_from_hook_kwargs(hook_kwargs, "predict_step")
step_output = call._call_lightning_module_hook(trainer, "predict_step", *step_args)
Why are you calling the lightning module hook directly, without calling the strategy hook?
After a couple of checks and entering the precision_plugin context, the strategy hook does call the lightning_module's predict_step.
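To illustrate why the reviewer prefers the strategy hook, here is a small sketch with toy stand-ins. `ToyPrecision`, `ToyModule`, and `ToyStrategy` are hypothetical classes written for this example, not Lightning's real ones; they only mimic the pattern described above, where the strategy enters a precision context before delegating to the module's `predict_step`.

```python
from contextlib import contextmanager


class ToyPrecision:
    """Stand-in for a precision plugin that tracks whether its context is active."""

    def __init__(self):
        self.active = False

    @contextmanager
    def predict_step_context(self):
        self.active = True
        try:
            yield
        finally:
            self.active = False


class ToyModule:
    """Stand-in for a LightningModule; reports whether the context was active."""

    def __init__(self, precision):
        self.precision = precision

    def predict_step(self, batch):
        return {"batch": batch, "precision_active": self.precision.active}


class ToyStrategy:
    """Stand-in for a strategy: wraps the module call in the precision context."""

    def __init__(self, module, precision):
        self.module = module
        self.precision = precision

    def predict_step(self, batch):
        with self.precision.predict_step_context():
            return self.module.predict_step(batch)


precision = ToyPrecision()
module = ToyModule(precision)
strategy = ToyStrategy(module, precision)

via_strategy = strategy.predict_step(1)  # runs inside the precision context
direct = module.predict_step(1)          # skips the context entirely
```

In this toy setup, calling the module directly skips whatever the strategy layer would have set up, which is the concern raised in the review comment.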
What does this PR do?
This PR addresses the memory leak during prediction in PyTorch Lightning. It adds proper memory management when return_predictions=False and includes comprehensive tests to verify the fix.
Fixes #19398
Key Changes:
return_predictions=False
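One way a test could check that outputs are not retained when return_predictions=False is to hold only weak references to them. The sketch below is a plain-Python illustration under that assumption; `Output` and `predict_loop` are hypothetical stand-ins and are not the tests added in this PR.

```python
import gc
import weakref


class Output:
    """Stand-in for a prediction tensor; plain objects support weak references."""

    def __init__(self, value):
        self.value = value


def predict_loop(batches, return_predictions):
    """Toy loop that keeps outputs only when return_predictions is True."""
    kept = []
    refs = []
    for batch in batches:
        out = Output(batch)
        refs.append(weakref.ref(out))  # a weak reference does not keep `out` alive
        if return_predictions:
            kept.append(out)
        del out
    return (kept if return_predictions else None), refs


# Without returned predictions, every output should be collectable.
result_none, refs_none = predict_loop(range(3), return_predictions=False)
gc.collect()
all_freed = all(ref() is None for ref in refs_none)

# With returned predictions, the outputs stay alive as long as the list does.
result_kept, refs_kept = predict_loop(range(3), return_predictions=True)
all_alive = all(ref() is not None for ref in refs_kept)
```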
Before submitting
trainer.predict
#19398
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Reviewer checklist
📚 Documentation preview 📚: https://pytorch-lightning--20730.org.readthedocs.build/en/20730/