Fix dtype when loading to meta model #36447

zucchini-nlp · 2025-02-27T10:27:29Z

What does this PR do?

Before #36335, when loading weights into a meta model, we kept the dtype of the module params. This PR fixes it by setting assign=True. I am not sure why strict=False is used, though. Should I change it to True?

Before (no dtype is passed to accelerate):
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)

After:
module.load_state_dict({param_type: param[:].to(param_device)}, False, True)

github-actions · 2025-02-27T10:27:40Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

ArthurZucker · 2025-02-27T10:33:31Z

src/transformers/modeling_utils.py

-                        False,
-                        True,
+                        strict=False,
+                        assign=False,


Suggested change

-                        assign=False,
+                        assign=True,

We need to assign to get the speed boost and to leverage the meta device.

zucchini-nlp · 2025-02-27

OK, then we will need to either cast the params to the same dtype as the module or add more involved logic for passing a dtype for composite models. Personally, I think the second option will complicate things even more.

Otherwise, any time one wants to load a different dtype for each backbone with accelerate (device_map or low_cpu_mem_usage), one will get only one dtype for the whole model.

HuggingFaceDocBuilderDev · 2025-02-27T10:54:04Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zucchini-nlp added 2 commits February 27, 2025 11:22

fix

5bcfb5b

nit

fa0bc5f

zucchini-nlp requested a review from ArthurZucker February 27, 2025 10:27

github-actions bot marked this pull request as draft February 27, 2025 10:27

ArthurZucker reviewed Feb 27, 2025

zucchini-nlp added 2 commits February 27, 2025 12:25

update

8569757

revert

95abe6f

zucchini-nlp marked this pull request as ready for review February 28, 2025 07:52

Merge branch 'main' into fix-dtype-meta-model

8fd6145
