Commit 64d686a

Remove NCCL_NVLS_ENABLE setting from JAX image (#1708)
Removing the [NCCL_NVLS_ENABLE](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/env.html#nccl-nvls-enable)=0 setting enables NVLink SHARP by default on systems that support it. Change (to be) added for the 25.10 NGC release.

- [x] Performance regression tests on internal clusters
- [ ] Performance regression tests on GKE, EKS (due to @nouiz)
1 parent 1268848 commit 64d686a
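
For workloads that still need the previous behavior, the variable can be set again at run time. The following is a minimal sketch, not part of this commit: the helper name `disable_nvls` is hypothetical, and it assumes the variable is set before JAX and NCCL initialize in the process.

```python
import os


def disable_nvls(env=None):
    """Set NCCL_NVLS_ENABLE=0 in `env` (defaults to this process's environment).

    Hypothetical helper: it restores the value that the image used to set
    via `ENV NCCL_NVLS_ENABLE=0`. Returns the mapping that was modified.
    """
    target = os.environ if env is None else env
    target["NCCL_NVLS_ENABLE"] = "0"
    return target


# Usage (assumption: call this before importing jax or running any collective,
# so the value is in place when NCCL reads its environment):
#
#     disable_nvls()
#     import jax
```

Passing `-e NCCL_NVLS_ENABLE=0` to the container runtime should have the same effect without any code changes.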

2 files changed: +0 -5 lines changed


.github/container/Dockerfile.jax

Lines changed: 0 additions & 1 deletion
@@ -97,7 +97,6 @@ ENV BUILD_DATE=${BUILD_DATE}
 # The following environment variables tune performance
 ENV XLA_FLAGS=""
 ENV XLA_FLAGS="${XLA_FLAGS} --xla_gpu_enable_latency_hiding_scheduler=true"
-ENV NCCL_NVLS_ENABLE=0
 
 COPY --from=builder ${BUILD_PATH_JAXLIB} ${BUILD_PATH_JAXLIB}
 COPY --from=builder ${SRC_PATH_JAX} ${SRC_PATH_JAX}

README.md

Lines changed: 0 additions & 4 deletions
@@ -218,10 +218,6 @@ The [JAX image](https://github.com/NVIDIA/JAX-Toolbox/pkgs/container/jax) is emb
 | --------- | ----- | ----------- |
 | `--xla_gpu_enable_latency_hiding_scheduler` | `true` | allows XLA to move communication collectives to increase overlap with compute kernels |
 
-| Environment Variable | Value | Explanation |
-| -------------------- | ----- | ----------- |
-| `NCCL_NVLS_ENABLE` | `0` | Disables NVLink SHARP ([1](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/env.html#nccl-nvls-enable)). Future releases will re-enable this feature. |
-
 There are various other XLA flags users can set to improve performance. For a detailed explanation of these flags, please refer to the [GPU performance](./rosetta/docs/GPU_performance.md) doc. XLA flags can also be tuned per workload. For example, each script includes a directory [xla_flags](./rosetta/rosetta/projects/maxtext/xla_flags).
 
 For a list of previously used XLA flags that are no longer needed, please also refer to the [GPU performance](./rosetta/docs/GPU_performance.md#previously-used-xla-flags) page.
