
[Bug] Unexpected GPU0 Memory Usage When Using Multiple GPUs #1113

Open
e1ijah1 opened this issue Sep 20, 2024 · 0 comments
Labels
Request-bug Something isn't working

Comments


e1ijah1 commented Sep 20, 2024

Your current environment information

Environment:

accelerate==1.0.0rc1
aiohappyeyeballs==2.4.0
aiohttp==3.10.5
aiosignal==1.3.1
aiosqlite==0.20.0
annotated-types==0.7.0
anyio==4.4.0
appdirs==1.4.4
asgiref==3.8.1
async-timeout==4.0.3
attrs==24.2.0
bentoml==1.3.5
cattrs==23.1.2
certifi==2024.8.30
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
DeepCache==0.1.1
deepmerge==2.0
Deprecated==1.2.14
diffusers @ git+https://github.com/huggingface/diffusers@95a7832879a3ca7debd3f7a4ee05b08ddd19a8a7
exceptiongroup==1.2.2
filelock==3.16.0
frozenlist==1.4.1
fs==2.4.16
fsspec==2024.9.0
h11==0.14.0
httpcore==1.0.5
httpx==0.27.2
httpx-ws==0.6.0
huggingface-hub==0.24.7
idna==3.9
importlib-metadata==6.11.0
inflection==0.5.1
inquirerpy==0.3.4
Jinja2==3.1.4
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
multidict==6.1.0
networkx==3.3
numpy==1.24.1
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.68
nvidia-nvtx-cu12==12.1.105
omegaconf==2.4.0.dev3
onediff==1.2.1.dev23
onediffx==1.2.1.dev23
oneflow==0.9.1.dev20240913+cu121
onefx==0.0.3
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
packaging==24.1
pathspec==0.12.1
pfzy==0.3.4
pillow==10.4.0
pip-requirements-parser==32.0.1
prometheus_client==0.20.0
prompt_toolkit==3.0.47
protobuf==5.28.1
psutil==6.0.0
pydantic==2.9.2
pydantic_core==2.23.4
Pygments==2.18.0
pyparsing==3.1.4
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
python-multipart==0.0.9
PyYAML==6.0.2
pyzmq==26.2.0
regex==2024.9.11
requests==2.32.3
rich==13.8.1
safetensors==0.4.5
schema==0.7.7
sentencepiece==0.2.0
simple-di==0.1.5
six==1.16.0
sniffio==1.3.1
starlette==0.38.5
sympy==1.13.2
tokenizers==0.13.3
tomli==2.0.1
tomli_w==1.0.0
torch==2.3.0
tornado==6.4.1
tqdm==4.66.5
transformers==4.27.1
triton==2.3.0
typing_extensions==4.12.2
urllib3==2.2.3
uv==0.4.12
uvicorn==0.30.6
watchfiles==0.24.0
wcwidth==0.2.13
wrapt==1.16.0
wsproto==1.2.0
yarl==1.11.1
zipp==3.20.2

🐛 Describe the bug

I'm attempting to implement the acceleration method described in this article: https://github.com/siliconflow/onediff/tree/7c325253d4e280e470613be43fa3e582a476923e/onediff_diffusers_extensions/examples/kolors

When I specify a particular device for model loading, compilation, and inference, I consistently observe additional memory usage on GPU0.

To troubleshoot, I modified the code to explicitly set device=cuda:1. However, during inference GPU0 still shows roughly 266MB of memory in use.

Steps to reproduce:

  • Follow the setup instructions in the linked article.
  • Modify the script to pin model loading and inference to GPU1 (a rough sketch of this change follows these steps):
    ...
    device = torch.device("cuda:1")
    ...
  • Run with the oneflow compiler backend:

    python3 onediff_diffusers_extensions/examples/kolors/text_to_image_kolors.py \
        --compiler oneflow \
        --saved-image kolors_oneflow_compile.png
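
The change in the second step, paraphrased rather than copied from the example script (the checkpoint name and prompt below are placeholders for illustration only), amounts to roughly this:

    import torch
    from diffusers import DiffusionPipeline

    device = torch.device("cuda:1")

    # Placeholder checkpoint and prompt; the actual script uses the Kolors example's own arguments.
    pipe = DiffusionPipeline.from_pretrained("Kwai-Kolors/Kolors-diffusers", torch_dtype=torch.float16)
    pipe.to(device)  # move every submodule to the target GPU

    generator = torch.Generator(device=device).manual_seed(0)  # keep the RNG off GPU0 as well
    image = pipe("a photo of a cat", generator=generator).images[0]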

Expected behavior:
All operations and memory usage should be confined to the specified GPU (cuda:1 in this case).

Actual behavior:
GPU0 consistently shows about 266MB of memory in use, despite all operations being directed to the specified GPU.

(screenshot: GPU memory usage showing the extra 266MB allocated on GPU0)
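
For anyone reproducing this, the per-GPU usage can also be checked from inside the process with nvidia-ml-py (already listed in the environment above). A minimal sketch of that check, run right after inference:

    import pynvml

    # Query every visible GPU through NVML and print its used memory in MiB.
    pynvml.nvmlInit()
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU{i}: {mem.used / 1024**2:.0f} MiB used")
    pynvml.nvmlShutdown()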

Questions:

  • Is this 266MB of memory usage on GPU0 expected behavior?
  • If not, what could be causing this persistent allocation on GPU0?
  • Are there any known workarounds or solutions to ensure that all operations and memory usage stay isolated to the specified GPU? (The only approach I'm aware of is sketched below.)
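
For context on the last question: the only isolation approach I'm aware of is to hide GPU0 from the process entirely via CUDA_VISIBLE_DEVICES before anything initializes CUDA, roughly as sketched below. I have not verified whether the oneflow backend honors this, and I'd prefer a solution that lets an explicit device=cuda:1 work as expected.

    import os

    # Hypothetical workaround: expose only physical GPU1 to this process.
    # This must be set before torch/oneflow initialize CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = "1"

    import torch
    device = torch.device("cuda:0")  # physical GPU1 is now the only visible device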
I would greatly appreciate any insights or solutions the repository maintainers could provide to address this issue. Thank you for your time and assistance.

e1ijah1 added the Request-bug label on Sep 20, 2024