
[Bug] Unexpected GPU0 Memory Usage When Using Multiple GPUs #1113

Open
e1ijah1 opened this issue Sep 20, 2024 · 0 comments
Labels
Request-bug Something isn't working

Comments


e1ijah1 commented Sep 20, 2024

Your current environment information

Environment:

accelerate==1.0.0rc1
aiohappyeyeballs==2.4.0
aiohttp==3.10.5
aiosignal==1.3.1
aiosqlite==0.20.0
annotated-types==0.7.0
anyio==4.4.0
appdirs==1.4.4
asgiref==3.8.1
async-timeout==4.0.3
attrs==24.2.0
bentoml==1.3.5
cattrs==23.1.2
certifi==2024.8.30
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
DeepCache==0.1.1
deepmerge==2.0
Deprecated==1.2.14
diffusers @ git+https://github.com/huggingface/diffusers@95a7832879a3ca7debd3f7a4ee05b08ddd19a8a7
exceptiongroup==1.2.2
filelock==3.16.0
frozenlist==1.4.1
fs==2.4.16
fsspec==2024.9.0
h11==0.14.0
httpcore==1.0.5
httpx==0.27.2
httpx-ws==0.6.0
huggingface-hub==0.24.7
idna==3.9
importlib-metadata==6.11.0
inflection==0.5.1
inquirerpy==0.3.4
Jinja2==3.1.4
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
multidict==6.1.0
networkx==3.3
numpy==1.24.1
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.68
nvidia-nvtx-cu12==12.1.105
omegaconf==2.4.0.dev3
onediff==1.2.1.dev23
onediffx==1.2.1.dev23
oneflow==0.9.1.dev20240913+cu121
onefx==0.0.3
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
packaging==24.1
pathspec==0.12.1
pfzy==0.3.4
pillow==10.4.0
pip-requirements-parser==32.0.1
prometheus_client==0.20.0
prompt_toolkit==3.0.47
protobuf==5.28.1
psutil==6.0.0
pydantic==2.9.2
pydantic_core==2.23.4
Pygments==2.18.0
pyparsing==3.1.4
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
python-multipart==0.0.9
PyYAML==6.0.2
pyzmq==26.2.0
regex==2024.9.11
requests==2.32.3
rich==13.8.1
safetensors==0.4.5
schema==0.7.7
sentencepiece==0.2.0
simple-di==0.1.5
six==1.16.0
sniffio==1.3.1
starlette==0.38.5
sympy==1.13.2
tokenizers==0.13.3
tomli==2.0.1
tomli_w==1.0.0
torch==2.3.0
tornado==6.4.1
tqdm==4.66.5
transformers==4.27.1
triton==2.3.0
typing_extensions==4.12.2
urllib3==2.2.3
uv==0.4.12
uvicorn==0.30.6
watchfiles==0.24.0
wcwidth==0.2.13
wrapt==1.16.0
wsproto==1.2.0
yarl==1.11.1
zipp==3.20.2

🐛 Describe the bug

I'm attempting to implement the acceleration method described in this article: https://github.com/siliconflow/onediff/tree/7c325253d4e280e470613be43fa3e582a476923e/onediff_diffusers_extensions/examples/kolors

When I specify a particular device for model loading, compilation, and inference, I consistently observe additional memory usage on GPU0.

To troubleshoot, I modified the code to explicitly set device=cuda:1. However, during inference GPU0 still shows roughly 266MB of memory in use.

Steps to reproduce:

  • Follow the setup instructions in the linked article.
  • Modify the script to pin model loading and inference to GPU1 (a rough sketch of this change follows these steps):
    ...
    device = torch.device("cuda:1")
    ...
  • Run with the oneflow compiler backend:

    python3 onediff_diffusers_extensions/examples/kolors/text_to_image_kolors.py \
        --compiler oneflow \
        --saved-image kolors_oneflow_compile.png
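
The change in the second step, paraphrased rather than copied from the example script (the checkpoint name and prompt below are placeholders for illustration only), amounts to roughly this:

    import torch
    from diffusers import DiffusionPipeline

    device = torch.device("cuda:1")

    # Placeholder checkpoint and prompt; the actual script uses the Kolors example's own arguments.
    pipe = DiffusionPipeline.from_pretrained("Kwai-Kolors/Kolors-diffusers", torch_dtype=torch.float16)
    pipe.to(device)  # move every submodule to the target GPU

    generator = torch.Generator(device=device).manual_seed(0)  # keep the RNG off GPU0 as well
    image = pipe("a photo of a cat", generator=generator).images[0]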

Expected behavior:
All operations and memory usage should be confined to the specified GPU (cuda:1 in this case).

Actual behavior:
GPU0 consistently shows about 266MB of memory in use, despite all operations being directed to the specified GPU.

(screenshot: GPU memory usage showing the extra 266MB allocated on GPU0)
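
For anyone reproducing this, the per-GPU usage can also be checked from inside the process with nvidia-ml-py (already listed in the environment above). A minimal sketch of that check, run right after inference:

    import pynvml

    # Query every visible GPU through NVML and print its used memory in MiB.
    pynvml.nvmlInit()
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU{i}: {mem.used / 1024**2:.0f} MiB used")
    pynvml.nvmlShutdown()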

Questions:

  • Is this 266MB of memory usage on GPU0 expected behavior?
  • If not, what could be causing this persistent allocation on GPU0?
  • Are there any known workarounds or solutions to ensure that all operations and memory usage stay isolated to the specified GPU? (The only approach I'm aware of is sketched below.)
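
For context on the last question: the only isolation approach I'm aware of is to hide GPU0 from the process entirely via CUDA_VISIBLE_DEVICES before anything initializes CUDA, roughly as sketched below. I have not verified whether the oneflow backend honors this, and I'd prefer a solution that lets an explicit device=cuda:1 work as expected.

    import os

    # Hypothetical workaround: expose only physical GPU1 to this process.
    # This must be set before torch/oneflow initialize CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = "1"

    import torch
    device = torch.device("cuda:0")  # physical GPU1 is now the only visible device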
I would greatly appreciate any insights or solutions the repository maintainers could provide to address this issue. Thank you for your time and assistance.

e1ijah1 added the Request-bug label on Sep 20, 2024