Conversation

@townwish4git (Contributor) commented Nov 4, 2025

What does this PR do?

Adaptations for MindSpore:

  • examples/server
  • examples/server-async

These examples demonstrate serving capabilities for inference with mindone.diffusers pipelines, supporting concurrent, multi-threaded request handling so that images can be generated for multiple users at the same time.

Note

⚠️ Because MindSpore does not support multi-threaded execution, the thread pool dedicated to model inference is strictly limited to a single worker. As a result, concurrent requests are accepted asynchronously but processed serially rather than in parallel in the backend.
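To make the constraint concrete, below is a minimal, hypothetical sketch (not the actual examples/server-async code) of an async FastAPI endpoint handing work to a single-worker thread pool: requests are accepted concurrently, but inference runs one at a time. The endpoint path and the run_inference helper are illustrative assumptions.

# Hypothetical sketch only; names such as run_inference and /generate are placeholders.
import asyncio
from concurrent.futures import ThreadPoolExecutor

from fastapi import FastAPI

app = FastAPI()
# MindSpore does not support multi-threaded inference, so the pool is pinned to one worker.
INFERENCE_EXECUTOR = ThreadPoolExecutor(max_workers=1)


def run_inference(prompt: str):
    # Placeholder for the request-scoped pipeline call used by examples/server-async;
    # per the PR, ms.runtime.empty_cache() is called after inference to release memory.
    ...


@app.post("/generate")
async def generate(prompt: str):
    loop = asyncio.get_running_loop()
    # Concurrent requests queue here and are executed serially on the single worker.
    result = await loop.run_in_executor(INFERENCE_EXECUTOR, run_inference, prompt)
    return {"status": "ok", "result": result}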

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the documentation guidelines.
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (It is better to record this for data loading, model inference, or training tasks.)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

@gemini-code-assist (bot) commented

Summary of Changes

Hello @townwish4git, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust asynchronous server demo designed to serve mindone.diffusers models for image generation. The core innovation lies in its ability to manage concurrent inference requests efficiently by utilizing a RequestScopedPipeline that intelligently shares large model components while isolating mutable state for each request. This approach aims to optimize resource usage and prevent common concurrency issues. While the architecture is built for parallelism, it currently operates with a single inference worker due to MindSpore's present multi-threading limitations, processing requests serially. The demo provides a comprehensive example for deploying and interacting with diffusion models in a server environment.

Highlights

  • New Server Demo: Introduces a new asynchronous server demo (examples/server-async) for mindone.diffusers pipeline inference, enabling serving capabilities for image generation models.
  • Concurrent Request Handling: Implements a RequestScopedPipeline mechanism to safely handle concurrent inference requests by sharing heavy model parameters and cloning only small, stateful components (like schedulers and RNG state) per request, mitigating race conditions and memory duplication (see the simplified sketch after this list).
  • MindSpore Adaptations: Includes specific adaptations for MindSpore, such as mindspore_dtype=ms.float16 for pipeline loading and MindSpore-specific memory management (ms.runtime.empty_cache()) after inference.
  • Concurrency Limitations: Explicitly addresses the current limitation of MindSpore's lack of multi-threading support, restricting the inference thread pool to a single worker, meaning concurrent requests are processed serially.
  • Comprehensive Example: Provides a complete FastAPI-based server (serverasync.py), pipeline initialization logic (Pipelines.py), utility functions (utils/), a client test script (test.py), and a detailed README.md for setup, usage, and troubleshooting.
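
The request-scoped idea above can be pictured with the following simplified sketch. It is not the PR's actual RequestScopedPipeline implementation, only an assumed illustration of the pattern: heavy sub-models stay shared, while small stateful parts such as the scheduler are deep-copied for each request.

# Illustrative sketch only; the real RequestScopedPipeline in examples/server-async may differ.
import copy


class RequestScopedPipelineSketch:
    """Share heavy modules across requests; clone small mutable state per request."""

    def __init__(self, base_pipeline):
        # Transformer, VAE and text encoders inside base_pipeline stay shared (no duplication).
        self._base = base_pipeline

    def generate(self, prompt, **kwargs):
        # A shallow copy keeps references to the large sub-models...
        local_pipe = copy.copy(self._base)
        # ...while the scheduler (small and stateful) gets a private deep copy,
        # so concurrent requests cannot corrupt each other's timestep state.
        local_pipe.scheduler = copy.deepcopy(self._base.scheduler)
        return local_pipe(prompt, **kwargs)

Per-request RNG state can be isolated in the same way, as the highlight above notes.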
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a valuable example of an asynchronous server for diffusers pipeline inference using FastAPI and MindSpore. The implementation correctly handles thread-safety for concurrent requests by using request-scoped pipeline objects and careful management of mutable state. The code is well-structured and includes useful features like metrics logging and graceful shutdown.

My review includes several suggestions to improve code quality, such as removing unused variables, using more specific exception types, improving error messages, and adhering to Python best practices like avoiding local imports. These changes will enhance the clarity and maintainability of the example code.

    mindspore_dtype=ms.float16,
)
else:
    raise Exception("No Ascend device available")

Severity: medium

Using the generic Exception is generally discouraged. It's better to raise a more specific exception to allow for more granular error handling. RuntimeError would be more appropriate here to indicate that an external condition (the absence of an Ascend device) prevents the program from running.

Suggested change
-    raise Exception("No Ascend device available")
+    raise RuntimeError("No Ascend device available")

Comment on lines +66 to +78
if self.model in preset_models.SD3:
    self.model_type = "SD3"
elif self.model in preset_models.SD3_5:
    self.model_type = "SD3_5"

# Create appropriate pipeline based on model type and type_models
if self.type_models == "t2im":
    if self.model_type in ["SD3", "SD3_5"]:
        self.pipeline = TextToImagePipelineSD3(self.model)
    else:
        raise ValueError(f"Model type {self.model_type} not supported for text-to-image")
elif self.type_models == "t2v":
    raise ValueError(f"Unsupported type_models: {self.type_models}")

Severity: medium

The current logic for handling unsupported models can be improved. If an unsupported model name is provided, self.model_type becomes None, leading to a less informative error message: Model type None not supported for text-to-image. It's better to raise an error immediately if the model is not in the preset lists. This refactoring also simplifies the subsequent logic.

Suggested change
-if self.model in preset_models.SD3:
-    self.model_type = "SD3"
-elif self.model in preset_models.SD3_5:
-    self.model_type = "SD3_5"
-# Create appropriate pipeline based on model type and type_models
-if self.type_models == "t2im":
-    if self.model_type in ["SD3", "SD3_5"]:
-        self.pipeline = TextToImagePipelineSD3(self.model)
-    else:
-        raise ValueError(f"Model type {self.model_type} not supported for text-to-image")
-elif self.type_models == "t2v":
-    raise ValueError(f"Unsupported type_models: {self.type_models}")
+if self.model in preset_models.SD3:
+    self.model_type = "SD3"
+elif self.model in preset_models.SD3_5:
+    self.model_type = "SD3_5"
+else:
+    raise ValueError(f"Model '{self.model}' is not a supported preset model.")
+# Create appropriate pipeline based on model type and type_models
+if self.type_models == "t2im":
+    self.pipeline = TextToImagePipelineSD3(self.model)
+elif self.type_models == "t2v":
+    raise ValueError(f"Unsupported type_models: {self.type_models}")

model_pipeline.start()

request_pipe = RequestScopedPipeline(model_pipeline.pipeline)
pipeline_lock = threading.Lock()

Severity: medium

The pipeline_lock variable is initialized but appears to be unused throughout the application. To improve code clarity and remove dead code, it should be removed. This also applies to its assignment to app.state.PIPELINE_LOCK on line 117.

num_images_per_prompt = json.num_images_per_prompt

wrapper = app.state.MODEL_PIPELINE
initializer = app.state.MODEL_INITIALIZER # noqa: F841

Severity: medium

The initializer variable is assigned but never used. It can be safely removed to clean up the code.

app.state.active_inferences += 1

# output = await run_in_threadpool(infer)
loop = asyncio.get_event_loop()

Severity: medium

The function asyncio.get_event_loop() has been deprecated since Python 3.10 and its usage is discouraged. It's recommended to use asyncio.get_running_loop() instead, which is safer as it raises a RuntimeError if no event loop is running.

Suggested change
-loop = asyncio.get_event_loop()
+loop = asyncio.get_running_loop()

return self._auto_detected_attrs

candidates: List[str] = []
seen = set()

Severity: medium

The seen variable is initialized as a set but is never used within the _autodetect_mutables method. It should be removed to avoid dead code.

Comment on lines +19 to +21
self.video_dir = os.path.join(tempfile.gettempdir(), "videos")
if not os.path.exists(self.video_dir):
    os.makedirs(self.video_dir)

Severity: medium

The self.video_dir attribute is initialized and the corresponding directory is created, but it is not used anywhere in the codebase. If this is not intended for future use, it should be removed to eliminate dead code.


def save_image(self, image):
    if isinstance(image, ms.Tensor):
        from mindspore.dataset.vision import transforms

Severity: medium

This import is performed inside a method. According to PEP 8, imports should be at the top of the file. This improves readability and makes dependencies clear. Please move from mindspore.dataset.vision import transforms to the top of the module.
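
As a hypothetical before/after sketch of that suggestion (the class name and conversion details are placeholders; only the import placement matters):

# Hypothetical sketch; the point is the module-level import placement per PEP 8.
import mindspore as ms
from mindspore.dataset.vision import transforms  # moved from inside save_image to module level


class ImageSaverSketch:
    def save_image(self, image):
        if isinstance(image, ms.Tensor):
            # The method can now use `transforms` directly; the concrete
            # tensor-to-image conversion from the example is omitted in this sketch.
            ...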
