[FEATURE] Add Donut & Flava model #1271

zhtmike · 2025-09-11T03:49:42Z

Relies on Mbart #1195

What does this PR do?

Fixes # (issue)
In MindSpore 2.6/2.7, when a model has tied weights, the load_param_into_net API may produce incorrect results during weight loading. To address this, we switched to the load_state_dict API, which more closely matches PyTorch's behavior.

The following example demonstrates the incorrect result produced by load_param_into_net with tied weights.

import mindspore as ms
import mindspore.mint as mint


class Model(ms.nn.Cell):
    def __init__(self):
        super().__init__()
        self.layer1 = mint.nn.Linear(64, 64, bias=False)
        self.layer2 = mint.nn.Linear(64, 64, bias=False)
        self.layer1.weight = self.layer2.weight

# Way 1: load_state_dict
model = Model()
print(dict(model.parameters_and_names()).keys())  # only layer1.weight is listed; layer2.weight is missing
model.load_state_dict({"layer1.weight": mint.ones((64, 64))}, strict=False)  # loads successfully
print(model.layer2.weight.value())  # weight is correct

# Way 2: load_param_into_net
model = Model()
print(dict(model.parameters_and_names()).keys())  # only layer1.weight is listed; layer2.weight is missing
ms.load_param_into_net(model, {"layer1.weight": ms.Parameter(mint.ones((64, 64)))}, strict_load=False)
print(model.layer2.weight.value())  # weight is NOT correct
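
One way to read the key listing above is with a framework-free sketch. It assumes that parameter traversal skips objects it has already visited, which is an assumption about the observed behavior and not MindSpore's actual code. `FakeParam` and `named_parameters` are hypothetical names used only for illustration.

```python
class FakeParam:
    """Hypothetical stand-in for a framework parameter object."""

    def __init__(self, value):
        self.value = value


def named_parameters(modules):
    """Yield (name, param) pairs, skipping parameter objects already seen."""
    seen = set()
    for module_name, params in modules.items():
        for param_name, param in params.items():
            if id(param) in seen:
                continue  # tied parameter: already reported under another name
            seen.add(id(param))
            yield f"{module_name}.{param_name}", param


# layer1 and layer2 share one parameter object, i.e. the weights are tied.
shared = FakeParam(0.0)
modules = {"layer1": {"weight": shared}, "layer2": {"weight": shared}}

names = [name for name, _ in named_parameters(modules)]
print(names)  # ['layer1.weight']

# Updating the shared object through one name is visible through the other.
modules["layer1"]["weight"].value = 1.0
print(modules["layer2"]["weight"].value)  # 1.0
```

Under this assumption, a loader only has to update the single shared object for both layers to see the new value, which is the result way 1 shows.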

Adds # (feature)
Add Donut & Flava model

Donut

>>> from datasets import load_dataset
>>> from mindone.transformers import AutoProcessor, AutoModelForVision2Seq

>>> processor = AutoProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa", revision="refs/pr/23")
>>> model = AutoModelForVision2Seq.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa", revision="refs/pr/23")

>>> dataset = load_dataset("hf-internal-testing/example-documents", split="test")
>>> image = dataset[0]["image"]
>>> question = "What time is the coffee break?"
>>> task_prompt = f"<s_docvqa><s_question>{question}</s_question><s_answer>"
>>> inputs = processor(image, task_prompt, return_tensors="ms")

>>> outputs = model.generate(
...     input_ids=inputs.input_ids,
...     pixel_values=inputs.pixel_values,
...     max_length=512
... )
>>> answer = processor.decode(outputs[0], skip_special_tokens=True)
>>> print(answer)
"What time is the coffee break? 11-14 to 11:39 a.m."
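
The task prompt above wraps the question in `<s_question>` tags and leaves `<s_answer>` open for the model to complete. As a rough sketch, a tagged sequence of that shape could be split into fields with a regular expression. The `raw` string below is hand-written for illustration (including the closing `</s_answer>` tag) and is not captured model output, and `parse_tagged_fields` is a hypothetical helper, not part of the processor API.

```python
import re


def parse_tagged_fields(sequence):
    """Extract <s_name>value</s_name> pairs from a tagged sequence into a dict."""
    pattern = re.compile(r"<s_(\w+)>(.*?)</s_\1>", re.DOTALL)
    return {name: value.strip() for name, value in pattern.findall(sequence)}


# Hand-written example string, assumed to follow the prompt format shown above.
raw = (
    "<s_docvqa><s_question>What time is the coffee break?</s_question>"
    "<s_answer>11-14 to 11:39 a.m.</s_answer>"
)

fields = parse_tagged_fields(raw)
print(fields["question"])
print(fields["answer"])
```

The unclosed `<s_docvqa>` task tag is simply skipped, because the pattern only matches tags that have a matching closing tag.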

version	mode	model	precision	task	s/step	weight load(s)
ms2.7.0	pynative	VisionEncoderDecoderModel	fp32	VQA	0.02	13

Flava

>>> from PIL import Image
>>> import requests
>>> from mindone.transformers import AutoProcessor, FlavaModel

>>> model = FlavaModel.from_pretrained("facebook/flava-full", revision="refs/pr/6")
>>> processor = AutoProcessor.from_pretrained("facebook/flava-full", revision="refs/pr/6")

>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)

>>> inputs = processor(text=["a photo of a cat"], images=image, return_tensors="ms", padding=True)

>>> outputs = model(**inputs)

>>> image_embeddings = outputs.image_embeddings
>>> text_embeddings = outputs.text_embeddings
>>> multimodal_embeddings = outputs.multimodal_embeddings

>>> outputs.image_embeddings.shape
(1, 197, 768)

>>> text_embeddings.shape
(1, 7, 768)

>>> multimodal_embeddings.shape
(1, 205, 768)
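
The sequence lengths above can be sanity-checked with a little arithmetic. This sketch assumes a 224x224 input image split into 16x16 patches, one [CLS] token prepended by the image encoder, and one extra [CLS] token prepended by the multimodal encoder. These values are assumptions about the checkpoint configuration and are not stated in this PR.

```python
image_size = 224  # assumed input resolution
patch_size = 16   # assumed patch size

num_patches = (image_size // patch_size) ** 2  # 14 * 14 = 196
image_tokens = num_patches + 1                 # plus one [CLS] token

text_tokens = 7  # taken from text_embeddings.shape above

# Image tokens + text tokens + one additional [CLS] token (assumed).
multimodal_tokens = image_tokens + text_tokens + 1

print(image_tokens, text_tokens, multimodal_tokens)  # 197 7 205
```

If a checkpoint uses a different resolution or patch size, the first and last numbers change accordingly.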

version	mode	model	precision	task	s/step	weight load(s)
ms2.7.0	pynative	FlavaModel	fp32	FeatureExtraction	0.07	15

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the documentation guidelines
Did you build and run the code without any errors?
Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

SamitHuang · 2025-09-19T02:51:18Z

mindone/transformers/modeling_utils.py

    cm = silence_mindspore_logger() if is_sharded else nullcontext()
    with cm:
-        ms.load_param_into_net(model_to_load, state_dict, strict_load=True)
+        model_to_load.load_state_dict(state_dict, strict=False)


Why change strict to False?

With tied weights, a Hugging Face Transformers checkpoint may contain extra parameters or be missing some. Using strict=True would raise an error in that case, so we follow the Transformers repo's design and set strict=False.
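
The effect of the strict flag can be illustrated without any framework. The sketch below only compares key sets. `check_keys` is a hypothetical helper, not the MindSpore or MindONE implementation, and raising `RuntimeError` in strict mode is an assumption modeled on PyTorch-style loaders.

```python
def check_keys(model_keys, checkpoint_keys, strict):
    """Compare key sets; in strict mode any mismatch raises an error."""
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    if strict and (missing or unexpected):
        raise RuntimeError(f"missing={missing}, unexpected={unexpected}")
    return missing, unexpected


# The model reports the tied weight under one name only,
# while the checkpoint stores it under both names.
model_keys = ["layer1.weight"]
checkpoint_keys = ["layer1.weight", "layer2.weight"]

missing, unexpected = check_keys(model_keys, checkpoint_keys, strict=False)
print(missing, unexpected)  # [] ['layer2.weight']

try:
    check_keys(model_keys, checkpoint_keys, strict=True)
    strict_failed = False
except RuntimeError:
    strict_failed = True
print(strict_failed)  # True
```

In non-strict mode the mismatch is reported and loading can continue. In strict mode the same checkpoint is rejected even though every tensor the model needs is present.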

zhtmike · 2025-09-23T02:23:19Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for the Donut and Flava models, a significant and valuable addition to the library. The change to use load_state_dict for handling tied weights is also a crucial bug fix. The implementation of the new models appears to be a faithful port from the Hugging Face transformers library. However, I've identified several issues, including two critical bugs in the FlavaForPreTraining model that affect the model's output when return_dict=False. Additionally, there are a few minor typos in docstrings and log messages that should be corrected to improve clarity and usability. Addressing these points will greatly enhance the quality of this contribution.

mindone/transformers/models/flava/modeling_flava.py

mindone/transformers/models/donut/modeling_donut_swin.py

mindone/transformers/models/flava/modeling_flava.py

zhtmike added 3 commits September 10, 2025 17:22

add Donut model

440cf83

add Flava Model

2e2b244

add license

2282e49

zhtmike added the feature request Add new features label Sep 11, 2025

zhtmike added 3 commits September 11, 2025 17:57

fix bug

3cf4bed

fix bug 2

3b465e6

Merge branch 'master' into donut

7d4e1ec

zhtmike marked this pull request as ready for review September 18, 2025 08:28

zhtmike requested a review from vigo999 as a code owner September 18, 2025 08:28

zhtmike added the bug Something isn't working label Sep 18, 2025

zhtmike requested a review from wcrzlh September 18, 2025 08:28

SamitHuang reviewed Sep 19, 2025

Merge branch 'master' into donut

3eb59f2

zhtmike added new model add new model to mindone and removed feature request Add new features labels Sep 23, 2025

Merge branch 'master' into donut

42ec4ae

zhtmike self-assigned this Sep 23, 2025

gemini-code-assist bot reviewed Sep 23, 2025

zhtmike added 2 commits September 23, 2025 10:26

fix typo

15048ef

Merge branch 'master' into donut

388ebce

vigo999 added this to mindone Sep 29, 2025

vigo999 moved this to In Progress in mindone Sep 29, 2025

vigo999 approved these changes Sep 29, 2025

vigo999 requested a review from zhanghuiyao September 29, 2025 08:13

Merge branch 'master' into donut

151aeab

hadipash mentioned this pull request Sep 29, 2025

feat(transformers): add Nougat example #1336

Merged

wcrzlh mentioned this pull request Sep 30, 2025

feat(transformers): add flava model #1342

Merged

6 tasks

zhtmike added 2 commits October 2, 2025 09:44

Merge branch 'master' into donut

9892766

fix merge

4657bed

zhtmike commented Sep 11, 2025 • edited by vigo999

3 participants
