
Conversation

@Aki-07

@Aki-07 Aki-07 commented Nov 29, 2025

What does this PR do?

Fixes #11966

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul

@Aki-07 Aki-07 force-pushed the feature/group-offload-pinning branch from 7e50d90 to 3b3813d Compare November 29, 2025 14:03
@sayakpaul
Member

Thanks for your PR. However, it's being worked on in #12721.

@sayakpaul
Member

sayakpaul commented Dec 9, 2025

Could we resolve conflicts so that it's a bit easier to review? Seems like there's some overlap from #12692.

@Aki-07 Aki-07 force-pushed the feature/group-offload-pinning branch from 6d96002 to 33d8b52 Compare December 10, 2025 06:06
@Aki-07
Author

Aki-07 commented Dec 10, 2025

Done! Rebased on latest main and resolved conflicts with #12692. Should be much cleaner to review now.

@sayakpaul sayakpaul left a comment (Member)

Some initial comments.

Comment on lines 310 to 312
should_synchronize = (
not self.group.onload_self and self.group.stream is not None and not should_onload_next_group
)
Member

What if non_blocking=True?

Author

Even with non_blocking=True, if a previous group onloaded this one on a side stream, we need a sync before the default stream uses the weights, or we risk reading half-copied tensors. I've limited the sync to the record_stream=False case; when record_stream=True, the tensors are tied to the consumer stream, so we can safely skip the sync.
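
For reference, a minimal sketch of the limited sync described above (attribute names follow the quoted snippet; the exact diff may differ):

```python
# Only wait on the transfer stream when this group's weights were copied on a
# side stream by a previous group AND their lifetime is not already tracked via
# record_stream. With record_stream=True, CUDA tracks the tensor's lifetime on
# the consumer stream, so the explicit sync can be skipped.
should_synchronize = (
    not self.group.onload_self
    and self.group.stream is not None
    and not should_onload_next_group
    and not self.group.record_stream
)
if should_synchronize:
    self.group.stream.synchronize()
```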

Comment on lines 363 to 364
if len(tensors) == 0:
return True
Member

This means the group is empty. Why would we return True for this?

Author

Agreed, that was misleading. Now an empty group returns False so a ‘pinned’ empty group will still onload instead of claiming it’s already on device.
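
Roughly, the check now behaves like this (hypothetical standalone helper for illustration):

```python
from typing import Iterable

import torch


def _group_is_on_device(tensors: Iterable[torch.Tensor], onload_device: torch.device) -> bool:
    # Hypothetical helper mirroring the fix: an empty group is NOT treated as
    # already on device, so a pinned-but-empty group still goes through onload.
    tensors = list(tensors)
    if len(tensors) == 0:
        return False  # previously True, which skipped onloading for empty pinned groups
    return all(t.device == onload_device for t in tensors)
```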

Comment on lines 643 to 651
normalized_pin_groups = pin_groups
if isinstance(pin_groups, str):
normalized_pin_groups = pin_groups.lower()
if normalized_pin_groups not in {"first_last", "all"}:
raise ValueError("`pin_groups` must be one of `None`, 'first_last', 'all', or a callable.")
elif pin_groups is not None and not callable(pin_groups):
raise ValueError("`pin_groups` must be one of `None`, 'first_last', 'all', or a callable.")

pin_groups = normalized_pin_groups
Member

(nit): would prefer to have a small utility function: _normalize_pin_groups().

Author

Added a _normalize_pin_groups() helper.
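
Sketch of what the helper looks like (signature is illustrative; it centralizes the validation from the inline block above):

```python
from typing import Callable, Optional, Union


def _normalize_pin_groups(pin_groups: Optional[Union[str, Callable]]) -> Optional[Union[str, Callable]]:
    # Accept None or a callable as-is; lower-case and validate string values.
    if pin_groups is None or callable(pin_groups):
        return pin_groups
    if isinstance(pin_groups, str):
        normalized = pin_groups.lower()
        if normalized in {"first_last", "all"}:
            return normalized
    raise ValueError("`pin_groups` must be one of `None`, 'first_last', 'all', or a callable.")
```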

for name, submodule in module.named_children():
if not isinstance(submodule, (torch.nn.ModuleList, torch.nn.Sequential)):
# Check if this is an explicitly defined block module
if name in block_modules:
Member

Suggested change
if name in block_modules:
if block_modules and name in block_modules:

Author

Updated

block_modules: Optional[List[str]] = None
exclude_kwargs: Optional[List[str]] = None
module_prefix: Optional[str] = ""
pin_groups: Optional[Union[str, Callable]] = None
Member

This seems like a breaking change. Could you please elaborate?

Author

Thanks for flagging; I've fixed it to have a default. Sorry for the oversight.
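
In other words, the new field now ships with a default so existing call sites keep working (the config class name below is illustrative, not the exact class in the PR):

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class GroupOffloadingConfig:  # illustrative name only
    block_modules: Optional[List[str]] = None
    exclude_kwargs: Optional[List[str]] = None
    module_prefix: Optional[str] = ""
    # Defaulting the new option keeps previously-valid configurations valid.
    pin_groups: Optional[Union[str, Callable]] = None
```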

tensor.data = source_tensor.to(self.onload_device, non_blocking=self.non_blocking)
if self.record_stream:
tensor.data.record_stream(default_stream)
tensor.data.record_stream(self._torch_accelerator_module.current_stream())
Member

Could you elaborate on this change?
#12721 explains why it's the way it is.

Author

Fixed to match the behavior from #12721: record the tensor on the consumer/default stream (captured before entering the transfer stream) so its lifetime is tied to the forward stream.
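
The pattern, as a standalone sketch (function name and signature are illustrative):

```python
import torch


def _onload_tensor(source_tensor, onload_device, transfer_stream, non_blocking=True, record_stream=True):
    # Capture the consumer (default) stream BEFORE entering the transfer stream,
    # then record the copied tensor on it so CUDA keeps the memory alive until
    # the forward pass running on that stream has consumed it.
    consumer_stream = torch.cuda.current_stream()
    with torch.cuda.stream(transfer_stream):
        data = source_tensor.to(onload_device, non_blocking=non_blocking)
        if record_stream:
            data.record_stream(consumer_stream)
    return data
```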

@bconstantine

Thank you for the initial comments! We are working on addressing them now.



Development

Successfully merging this pull request may close these issues.

How about forcing the first and last block on device when groupoffloading is used?
