Conversation

@townwish4git (Contributor) commented Nov 4, 2025

What does this PR do?

Adaptations for MindSpore:

  • examples/server
  • examples/server-async

These examples demonstrate serving capabilities for inference with mindone.diffusers pipelines, supporting concurrent, multi-threaded request handling so that images can be generated for multiple users at the same time.

Note

⚠️ Because MindSpore does not support multi-threaded execution, the thread pool dedicated to model inference is strictly limited to a single worker. As a result, concurrent requests are accepted asynchronously but processed serially rather than in parallel in the backend.
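To make the constraint concrete, below is a minimal, hypothetical sketch (not the actual examples/server-async code) of an async FastAPI endpoint handing work to a single-worker thread pool: requests are accepted concurrently, but inference runs one at a time. The endpoint path and the run_inference helper are illustrative assumptions.

# Hypothetical sketch only; names such as run_inference and /generate are placeholders.
import asyncio
from concurrent.futures import ThreadPoolExecutor

from fastapi import FastAPI

app = FastAPI()
# MindSpore does not support multi-threaded inference, so the pool is pinned to one worker.
INFERENCE_EXECUTOR = ThreadPoolExecutor(max_workers=1)


def run_inference(prompt: str):
    # Placeholder for the request-scoped pipeline call used by examples/server-async;
    # per the PR, ms.runtime.empty_cache() is called after inference to release memory.
    ...


@app.post("/generate")
async def generate(prompt: str):
    loop = asyncio.get_running_loop()
    # Concurrent requests queue here and are executed serially on the single worker.
    result = await loop.run_in_executor(INFERENCE_EXECUTOR, run_inference, prompt)
    return {"status": "ok", "result": result}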

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the documentation guidelines.
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (It is better to record this for data loading, model inference, or training tasks.)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

@gemini-code-assist (bot) commented

Summary of Changes

Hello @townwish4git, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust asynchronous server demo designed to serve mindone.diffusers models for image generation. The core innovation lies in its ability to manage concurrent inference requests efficiently by utilizing a RequestScopedPipeline that intelligently shares large model components while isolating mutable state for each request. This approach aims to optimize resource usage and prevent common concurrency issues. While the architecture is built for parallelism, it currently operates with a single inference worker due to MindSpore's present multi-threading limitations, processing requests serially. The demo provides a comprehensive example for deploying and interacting with diffusion models in a server environment.

Highlights

  • New Server Demo: Introduces a new asynchronous server demo (examples/server-async) for mindone.diffusers pipeline inference, enabling serving capabilities for image generation models.
  • Concurrent Request Handling: Implements a RequestScopedPipeline mechanism to safely handle concurrent inference requests by sharing heavy model parameters and cloning only small, stateful components (like schedulers and RNG state) per request, mitigating race conditions and memory duplication (see the simplified sketch after this list).
  • MindSpore Adaptations: Includes specific adaptations for MindSpore, such as mindspore_dtype=ms.float16 for pipeline loading and MindSpore-specific memory management (ms.runtime.empty_cache()) after inference.
  • Concurrency Limitations: Explicitly addresses the current limitation of MindSpore's lack of multi-threading support, restricting the inference thread pool to a single worker, meaning concurrent requests are processed serially.
  • Comprehensive Example: Provides a complete FastAPI-based server (serverasync.py), pipeline initialization logic (Pipelines.py), utility functions (utils/), a client test script (test.py), and a detailed README.md for setup, usage, and troubleshooting.
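
The request-scoped idea above can be pictured with the following simplified sketch. It is not the PR's actual RequestScopedPipeline implementation, only an assumed illustration of the pattern: heavy sub-models stay shared, while small stateful parts such as the scheduler are deep-copied for each request.

# Illustrative sketch only; the real RequestScopedPipeline in examples/server-async may differ.
import copy


class RequestScopedPipelineSketch:
    """Share heavy modules across requests; clone small mutable state per request."""

    def __init__(self, base_pipeline):
        # Transformer, VAE and text encoders inside base_pipeline stay shared (no duplication).
        self._base = base_pipeline

    def generate(self, prompt, **kwargs):
        # A shallow copy keeps references to the large sub-models...
        local_pipe = copy.copy(self._base)
        # ...while the scheduler (small and stateful) gets a private deep copy,
        # so concurrent requests cannot corrupt each other's timestep state.
        local_pipe.scheduler = copy.deepcopy(self._base.scheduler)
        return local_pipe(prompt, **kwargs)

Per-request RNG state can be isolated in the same way, as the highlight above notes.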
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a valuable example of an asynchronous server for diffusers pipeline inference using FastAPI and MindSpore. The implementation correctly handles thread-safety for concurrent requests by using request-scoped pipeline objects and careful management of mutable state. The code is well-structured and includes useful features like metrics logging and graceful shutdown.

My review includes several suggestions to improve code quality, such as removing unused variables, using more specific exception types, improving error messages, and adhering to Python best practices like avoiding local imports. These changes will enhance the clarity and maintainability of the example code.

    mindspore_dtype=ms.float16,
)
else:
    raise Exception("No Ascend device available")

Severity: medium

Using the generic Exception is generally discouraged. It's better to raise a more specific exception to allow for more granular error handling. RuntimeError would be more appropriate here to indicate that an external condition (the absence of an Ascend device) prevents the program from running.

Suggested change
-    raise Exception("No Ascend device available")
+    raise RuntimeError("No Ascend device available")

Comment on lines +66 to +78
if self.model in preset_models.SD3:
    self.model_type = "SD3"
elif self.model in preset_models.SD3_5:
    self.model_type = "SD3_5"

# Create appropriate pipeline based on model type and type_models
if self.type_models == "t2im":
    if self.model_type in ["SD3", "SD3_5"]:
        self.pipeline = TextToImagePipelineSD3(self.model)
    else:
        raise ValueError(f"Model type {self.model_type} not supported for text-to-image")
elif self.type_models == "t2v":
    raise ValueError(f"Unsupported type_models: {self.type_models}")

Severity: medium

The current logic for handling unsupported models can be improved. If an unsupported model name is provided, self.model_type becomes None, leading to a less informative error message: Model type None not supported for text-to-image. It's better to raise an error immediately if the model is not in the preset lists. This refactoring also simplifies the subsequent logic.

Suggested change
-if self.model in preset_models.SD3:
-    self.model_type = "SD3"
-elif self.model in preset_models.SD3_5:
-    self.model_type = "SD3_5"
-# Create appropriate pipeline based on model type and type_models
-if self.type_models == "t2im":
-    if self.model_type in ["SD3", "SD3_5"]:
-        self.pipeline = TextToImagePipelineSD3(self.model)
-    else:
-        raise ValueError(f"Model type {self.model_type} not supported for text-to-image")
-elif self.type_models == "t2v":
-    raise ValueError(f"Unsupported type_models: {self.type_models}")
+if self.model in preset_models.SD3:
+    self.model_type = "SD3"
+elif self.model in preset_models.SD3_5:
+    self.model_type = "SD3_5"
+else:
+    raise ValueError(f"Model '{self.model}' is not a supported preset model.")
+# Create appropriate pipeline based on model type and type_models
+if self.type_models == "t2im":
+    self.pipeline = TextToImagePipelineSD3(self.model)
+elif self.type_models == "t2v":
+    raise ValueError(f"Unsupported type_models: {self.type_models}")

model_pipeline.start()

request_pipe = RequestScopedPipeline(model_pipeline.pipeline)
pipeline_lock = threading.Lock()

Severity: medium

The pipeline_lock variable is initialized but appears to be unused throughout the application. To improve code clarity and remove dead code, it should be removed. This also applies to its assignment to app.state.PIPELINE_LOCK on line 117.

num_images_per_prompt = json.num_images_per_prompt

wrapper = app.state.MODEL_PIPELINE
initializer = app.state.MODEL_INITIALIZER # noqa: F841

Severity: medium

The initializer variable is assigned but never used. It can be safely removed to clean up the code.

app.state.active_inferences += 1

# output = await run_in_threadpool(infer)
loop = asyncio.get_event_loop()

Severity: medium

The function asyncio.get_event_loop() has been deprecated since Python 3.10 and its usage is discouraged. It's recommended to use asyncio.get_running_loop() instead, which is safer as it raises a RuntimeError if no event loop is running.

Suggested change
-loop = asyncio.get_event_loop()
+loop = asyncio.get_running_loop()

return self._auto_detected_attrs

candidates: List[str] = []
seen = set()

Severity: medium

The seen variable is initialized as a set but is never used within the _autodetect_mutables method. It should be removed to avoid dead code.

Comment on lines +19 to +21
self.video_dir = os.path.join(tempfile.gettempdir(), "videos")
if not os.path.exists(self.video_dir):
    os.makedirs(self.video_dir)

Severity: medium

The self.video_dir attribute is initialized and the corresponding directory is created, but it is not used anywhere in the codebase. If this is not intended for future use, it should be removed to eliminate dead code.


def save_image(self, image):
    if isinstance(image, ms.Tensor):
        from mindspore.dataset.vision import transforms

Severity: medium

This import is performed inside a method. According to PEP 8, imports should be at the top of the file. This improves readability and makes dependencies clear. Please move from mindspore.dataset.vision import transforms to the top of the module.
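
As a hypothetical before/after sketch of that suggestion (the class name and conversion details are placeholders; only the import placement matters):

# Hypothetical sketch; the point is the module-level import placement per PEP 8.
import mindspore as ms
from mindspore.dataset.vision import transforms  # moved from inside save_image to module level


class ImageSaverSketch:
    def save_image(self, image):
        if isinstance(image, ms.Tensor):
            # The method can now use `transforms` directly; the concrete
            # tensor-to-image conversion from the example is omitted in this sketch.
            ...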
