fix CPU offloading related fail cases on XPU #11288
Conversation
Signed-off-by: YAO Matrix <[email protected]>
@hlky, please help review and comment, thanks very much.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@bot /style
Style fixes have been applied. View the workflow run here.
Thanks @yao-matrix. The changes look good to me and save updating a lot of other code/docs/tests:
| Location | Function | Matches |
| --- | --- | --- |
| docs/ | `enable_model_cpu_offload()` | 169 results in 40 files |
| docs/ | `enable_sequential_cpu_offload()` | 15 results in 8 files |
| src/ | `enable_model_cpu_offload()` | `_load_lora_into_text_encoder`, `load_lora_adapter`, `load_attn_procs`, and 43 results in 35 files for examples |
| src/ | `enable_sequential_cpu_offload()` | `_load_lora_into_text_encoder`, `load_lora_adapter`, `load_attn_procs` |
| tests/ | `enable_model_cpu_offload()` | 47 results in 21 files |
| tests/ | `enable_sequential_cpu_offload()` | 14 results in 9 files |
There is one related fast test failing (others are transient Hub issues):
diffusers/tests/pipelines/test_pipelines.py, lines 1802 to 1822 in f1f38ff:
```python
def test_pipe_same_device_id_offload(self):
    unet = self.dummy_cond_unet()
    scheduler = PNDMScheduler(skip_prk_steps=True)
    vae = self.dummy_vae
    bert = self.dummy_text_encoder
    tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")

    sd = StableDiffusionPipeline(
        unet=unet,
        scheduler=scheduler,
        vae=vae,
        text_encoder=bert,
        tokenizer=tokenizer,
        safety_checker=None,
        feature_extractor=self.dummy_extractor,
    )

    sd.enable_model_cpu_offload(gpu_id=5)
    assert sd._offload_gpu_id == 5
    sd.maybe_free_model_hooks()
    assert sd._offload_gpu_id == 5
```
This test could either be moved to `PipelineSlowTests`, or, if it's already passing when run on a non-CUDA system, we could simply add `device="cuda"`, as the test is not actually offloading anything and only checks that `_offload_gpu_id` is set correctly.
diffusers/tests/pipelines/test_pipelines.py, line 1819 in f1f38ff:

```python
sd.enable_model_cpu_offload(gpu_id=5)
```
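For concreteness, the second option would amount to a one-line tweak along these lines (a sketch; it assumes `enable_model_cpu_offload` keeps accepting an explicit `device` argument):

```python
# Option 2 sketch: pin the device so the fast test stays a pure bookkeeping
# check of _offload_gpu_id, independent of which accelerator the runner has.
sd.enable_model_cpu_offload(gpu_id=5, device="cuda")
```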
Can you confirm whether this test passes on an XPU system, @yao-matrix?
Let's also get a second opinion, cc @sayakpaul.
I would prefer the second option here, as it only targets the SD pipeline and we already have a bunch of offloading-related tests in …
@sayakpaul This is a fast test that runs on CPU; if we decorate it with …
My bad. Then all good.
@hlky, this case passes on XPU; I pasted the log below. The reason is in the … Thanks.
Symptom
As of now, the 5 CPU-offloading-related cases below fail on XPU:
Proposal
change
device
arg's default value from"cuda"
toNone
inenable_model_cpu_offload
andenable_sequential_cpu_offload
, and add an automatically device detection logic in these 2 functions to detect accelerator if not specified.Possible Questions
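A minimal sketch of the proposed default-resolution logic, assuming a hypothetical `_detect_accelerator` helper (the names and probing order here are illustrative, not the actual diffusers implementation):

```python
import torch


def _detect_accelerator() -> str:
    # Hypothetical helper: probe for an available accelerator, falling back
    # to CPU. The real PR may use diffusers' own device utilities instead.
    if torch.cuda.is_available():
        return "cuda"
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        return "mps"
    return "cpu"


class Pipeline:
    def enable_model_cpu_offload(self, gpu_id=None, device=None):
        # `device` used to default to "cuda"; with the proposal it defaults to
        # None and is resolved to whatever accelerator is actually present.
        if device is None:
            device = _detect_accelerator()
        torch_device = torch.device(device)
        # ... existing offload-hook setup continues here, using torch_device.
```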
Possible Questions
Q1: why not just change the test case code to specify the device?
[A1]: These 2 utility functions are heavily used in docs and Jupyter notebooks, and are also used in the internal LoRA loader (as here) and TI loader (as here), with the assumption that, if not specified, the arg should be the accelerator (before this PR, that narrowly meant "cuda"). We should inherit this good assumption and generalize the accelerator from CUDA to all accelerator types, so diffusers can support multiple devices with zero code change, as illustrated below.
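To illustrate the zero-code-change point, the same snippet would then run unmodified on a CUDA or XPU machine (the checkpoint id is just an example):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5")
# No device argument: resolves to "cuda" on NVIDIA, "xpu" on Intel, and so on.
pipe.enable_model_cpu_offload()
```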
Q2: why not add "npu" and other devices in `get_device` while at it?
[A2]: I don't have access to these devices, so I'd prefer that people who do have access add them, so they can validate before merging their PRs. Adding an if-branch in `get_device` is simple; most of the effort is validation. A sketch of such a branch follows.
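For reference, a sketch of what such an if-branch could look like in a `get_device`-style helper (illustrative only; the `torch.npu` attribute assumes an Ascend `torch_npu` build and is not validated here):

```python
import torch


def get_device() -> str:
    # Illustrative probing order; not the actual diffusers implementation.
    if torch.cuda.is_available():
        return "cuda"
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    # Adding another accelerator is one more branch, but it should be
    # validated on real hardware before merging:
    if hasattr(torch, "npu") and torch.npu.is_available():
        return "npu"
    return "cpu"
```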
Backward Compatibility
Full backward compatibility.