Commit e1b3c61

Update readme with H100 instructions
1 parent 9cc41d0 commit e1b3c61

1 file changed

Lines changed: 13 additions & 0 deletions

File tree

README.md

@@ -47,6 +47,19 @@ Here are the system settings we recommend to start training your own diffusion m
 - Ubuntu Version: 20.04
 - Use a system with NVIDIA GPUs

+- For running on NVIDIA H100s, use a docker image with PyTorch 1.13+, e.g. [MosaicML's PyTorch base image](https://hub.docker.com/r/mosaicml/pytorch/tags)
+  - Recommended tag: `mosaicml/pytorch_vision:2.0.1_cu118-python3.10-ubuntu20.04`
+  - This image comes pre-configured with the following dependencies:
+    - PyTorch Version: 2.0.1
+    - CUDA Version: 11.8
+    - Python Version: 3.10
+    - Ubuntu Version: 20.04
+- Depending on the training config, an additional install of `xformers` may be needed:
+  ```
+  pip install -U ninja
+  pip install -U git+https://github.com/facebookresearch/xformers
+  ```
+
 # How many GPUs do I need?

 We benchmarked the U-Net training throughput as we scale the number of A100 GPUs from 8 to 128. Our time estimates are based on training Stable Diffusion 2.0 base on 1,126,400,000 images at 256x256 resolution and 1,740,800,000 images at 512x512 resolution. Our cost estimates are based on $2 / A100-hour. Since the time and cost estimates are for the U-Net only, these only hold if the VAE and CLIP latents are computed before training. It took 3,784 A100-hours (a cost of $7,600) to pre-compute the VAE and CLIP latents offline. If you are computing VAE and CLIP latents while training, expect a 1.4x increase in time and cost.
