Commit 3965a7f
feat(RHOAIENG-29330): Deny RayCluster creation with Ray Version mismatches
1 parent e2fc98b commit 3965a7f

6 files changed (+525, -0 lines)
docs/sphinx/user-docs/cluster-configuration.rst

Lines changed: 73 additions & 0 deletions
@@ -44,6 +44,79 @@ requirements for creating the Ray Cluster.
documentation on building a custom image
`here <https://github.com/opendatahub-io/distributed-workloads/tree/main/images/runtime/examples>`__.

Ray Version Compatibility
-------------------------

The CodeFlare SDK requires that the Ray version in your runtime image matches the Ray version used by the SDK itself. When you specify a custom runtime image, the SDK automatically validates the Ray version in that image against this requirement.

Version Validation Behavior
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The SDK performs the following validation when creating a cluster:

1. **Compatible versions**: If the runtime image contains the same Ray version as the SDK (2.47.1 in this release), the cluster will be created successfully.

2. **Version mismatch**: If the runtime image contains a different Ray version, cluster creation will fail with a detailed error message explaining the mismatch.

3. **Unknown versions**: If the SDK cannot determine the Ray version from the image name (e.g., SHA digest references or tags such as ``latest``), a warning will be issued but cluster creation will continue.
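
The three outcomes above can be sketched as follows. This is a minimal illustration, assuming the Ray version has already been parsed from the image reference. The names ``check_ray_version`` and ``RayVersionMismatchError`` are hypothetical and are not part of the CodeFlare SDK; the SDK's own exception type and message wording may differ.

```python
import warnings
from typing import Optional


class RayVersionMismatchError(ValueError):
    """Hypothetical error type; the SDK's real exception class may differ."""


def check_ray_version(image_version: Optional[str], sdk_version: str = "2.47.1") -> bool:
    """Return True if cluster creation may proceed; raise on a confirmed mismatch.

    image_version is the Ray version parsed from the image reference, or None
    when it could not be determined (for example, a SHA digest reference).
    """
    if image_version is None:
        # Unknown version: warn, but do not block cluster creation.
        warnings.warn("Cannot determine Ray version from image name; continuing.")
        return True
    if image_version != sdk_version:
        # Confirmed mismatch: fail with a message naming both versions.
        raise RayVersionMismatchError(
            f"Ray version mismatch: SDK uses Ray {sdk_version}, "
            f"runtime image uses Ray {image_version}."
        )
    # Versions match exactly.
    return True
```

Only a confirmed mismatch blocks creation in this sketch; an undetermined version is treated as a warning so that digest-pinned images remain usable.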

Examples
~~~~~~~~

**Compatible image (recommended)**:

.. code:: python

   from codeflare_sdk import Cluster, ClusterConfiguration

   # This will work - versions match
   cluster = Cluster(ClusterConfiguration(
       name='ray-example',
       image='quay.io/modh/ray:2.47.1-py311-cu121'
   ))

**Incompatible image (will fail)**:

.. code:: python

   # This will fail with a version mismatch error
   cluster = Cluster(ClusterConfiguration(
       name='ray-example',
       image='ray:2.46.0'  # Different version!
   ))

**SHA-based image (will warn)**:

.. code:: python

   # This will issue a warning but continue
   cluster = Cluster(ClusterConfiguration(
       name='ray-example',
       image='quay.io/modh/ray@sha256:abc123...'
   ))

Best Practices
~~~~~~~~~~~~~~

- **Use versioned tags**: Always use semantic version tags (e.g., ``ray:2.47.1``) rather than ``latest`` or SHA digest references, so that the SDK can detect the Ray version.

- **Test compatibility**: When building custom images, test them with the CodeFlare SDK to ensure compatibility.

- **Check SDK version**: You can check the Ray version used by the SDK with:

  .. code:: python

     from codeflare_sdk.common.utils.constants import RAY_VERSION
     print(f"CodeFlare SDK uses Ray version: {RAY_VERSION}")

**Why is version matching important?**

Ray version mismatches can cause:

- Incompatible API calls between the SDK and Ray cluster
- Unexpected behavior in job submission and cluster management
- Potential data corruption or job failures
- Difficult-to-debug runtime errors


Ray Usage Statistics
-------------------

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
# Copyright 2024 IBM, Red Hat
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import pytest
from codeflare_sdk.common.utils.constants import RAY_VERSION, CUDA_RUNTIME_IMAGE


class TestConstants:
    """Test constants module for expected values."""

    def test_ray_version_is_defined(self):
        """Test that RAY_VERSION constant is properly defined."""
        assert RAY_VERSION is not None
        assert isinstance(RAY_VERSION, str)
        assert RAY_VERSION == "2.47.1"

    def test_cuda_runtime_image_is_defined(self):
        """Test that CUDA_RUNTIME_IMAGE constant is properly defined."""
        assert CUDA_RUNTIME_IMAGE is not None
        assert isinstance(CUDA_RUNTIME_IMAGE, str)
        assert "quay.io/modh/ray" in CUDA_RUNTIME_IMAGE
Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
# Copyright 2024 IBM, Red Hat
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import pytest
from codeflare_sdk.common.utils.validation import (
    extract_ray_version_from_image,
    validate_ray_version_compatibility,
)
from codeflare_sdk.common.utils.constants import RAY_VERSION


class TestRayVersionDetection:
    """Test Ray version detection from container image names."""

    def test_extract_ray_version_standard_format(self):
        """Test extraction from standard Ray image formats."""
        # Standard format
        assert extract_ray_version_from_image("ray:2.47.1") == "2.47.1"
        assert extract_ray_version_from_image("ray:2.46.0") == "2.46.0"
        assert extract_ray_version_from_image("ray:1.13.0") == "1.13.0"

    def test_extract_ray_version_with_registry(self):
        """Test extraction from images with registry prefixes."""
        assert extract_ray_version_from_image("quay.io/ray:2.47.1") == "2.47.1"
        assert (
            extract_ray_version_from_image("docker.io/rayproject/ray:2.47.1")
            == "2.47.1"
        )
        assert (
            extract_ray_version_from_image("gcr.io/my-project/ray:2.47.1") == "2.47.1"
        )

    def test_extract_ray_version_with_suffixes(self):
        """Test extraction from images with version suffixes."""
        assert (
            extract_ray_version_from_image("quay.io/modh/ray:2.47.1-py311-cu121")
            == "2.47.1"
        )
        assert extract_ray_version_from_image("ray:2.47.1-py311") == "2.47.1"
        assert extract_ray_version_from_image("ray:2.47.1-gpu") == "2.47.1"
        assert extract_ray_version_from_image("ray:2.47.1-rocm62") == "2.47.1"

    def test_extract_ray_version_complex_registry_paths(self):
        """Test extraction from complex registry paths."""
        assert (
            extract_ray_version_from_image("quay.io/modh/ray:2.47.1-py311-cu121")
            == "2.47.1"
        )
        assert (
            extract_ray_version_from_image("registry.company.com/team/ray:2.47.1")
            == "2.47.1"
        )

    def test_extract_ray_version_no_version_found(self):
        """Test cases where no version can be extracted."""
        # SHA-based tags
        assert (
            extract_ray_version_from_image(
                "quay.io/modh/ray@sha256:6d076aeb38ab3c34a6a2ef0f58dc667089aa15826fa08a73273c629333e12f1e"
            )
            is None
        )

        # Non-semantic versions
        assert extract_ray_version_from_image("ray:latest") is None
        assert extract_ray_version_from_image("ray:nightly") is None
        assert (
            extract_ray_version_from_image("ray:v2.47") is None
        )  # Missing patch version

        # Non-Ray images
        assert extract_ray_version_from_image("python:3.11") is None
        assert extract_ray_version_from_image("ubuntu:20.04") is None

        # Empty or None
        assert extract_ray_version_from_image("") is None
        assert extract_ray_version_from_image(None) is None

    def test_extract_ray_version_edge_cases(self):
        """Test edge cases for version extraction."""
        # Version with 'v' prefix should not match our pattern
        assert extract_ray_version_from_image("ray:v2.47.1") is None

        # Multiple version-like patterns - should match the first valid one
        assert (
            extract_ray_version_from_image("registry/ray:2.47.1-based-on-1.0.0")
            == "2.47.1"
        )


class TestRayVersionValidation:
    """Test Ray version compatibility validation."""

    def test_validate_compatible_versions(self):
        """Test validation with compatible Ray versions."""
        # Exact match
        is_compatible, message = validate_ray_version_compatibility(
            f"ray:{RAY_VERSION}"
        )
        assert is_compatible is True
        assert "Ray versions match" in message

        # With registry and suffixes
        is_compatible, message = validate_ray_version_compatibility(
            f"quay.io/modh/ray:{RAY_VERSION}-py311-cu121"
        )
        assert is_compatible is True
        assert "Ray versions match" in message

    def test_validate_incompatible_versions(self):
        """Test validation with incompatible Ray versions."""
        # Different version
        is_compatible, message = validate_ray_version_compatibility("ray:2.46.0")
        assert is_compatible is False
        assert "Ray version mismatch detected" in message
        assert "CodeFlare SDK uses Ray" in message
        assert "runtime image uses Ray" in message

        # Older version
        is_compatible, message = validate_ray_version_compatibility("ray:1.13.0")
        assert is_compatible is False
        assert "Ray version mismatch detected" in message

    def test_validate_empty_image(self):
        """Test validation with no custom image (should use default)."""
        # Empty string
        is_compatible, message = validate_ray_version_compatibility("")
        assert is_compatible is True
        assert "Using default Ray image compatible with SDK" in message

        # None
        is_compatible, message = validate_ray_version_compatibility(None)
        assert is_compatible is True
        assert "Using default Ray image compatible with SDK" in message

    def test_validate_unknown_version(self):
        """Test validation when version cannot be determined."""
        # SHA-based image
        is_compatible, message = validate_ray_version_compatibility(
            "quay.io/modh/ray@sha256:6d076aeb38ab3c34a6a2ef0f58dc667089aa15826fa08a73273c629333e12f1e"
        )
        assert is_compatible is True
        assert "Warning: Cannot determine Ray version" in message

        # Custom image without version
        is_compatible, message = validate_ray_version_compatibility(
            "my-custom-ray:latest"
        )
        assert is_compatible is True
        assert "Warning: Cannot determine Ray version" in message

    def test_validate_custom_sdk_version(self):
        """Test validation with custom SDK version."""
        # Compatible with custom SDK version
        is_compatible, message = validate_ray_version_compatibility(
            "ray:2.46.0", "2.46.0"
        )
        assert is_compatible is True
        assert "Ray versions match" in message

        # Incompatible with custom SDK version
        is_compatible, message = validate_ray_version_compatibility(
            "ray:2.47.1", "2.46.0"
        )
        assert is_compatible is False
        assert "CodeFlare SDK uses Ray 2.46.0" in message
        assert "runtime image uses Ray 2.47.1" in message

    def test_validate_message_content(self):
        """Test that validation messages contain expected guidance."""
        # Mismatch message should contain helpful guidance
        is_compatible, message = validate_ray_version_compatibility("ray:2.46.0")
        assert is_compatible is False
        assert "compatibility issues" in message.lower()
        assert "unexpected behavior" in message.lower()
        assert "please use a runtime image" in message.lower()
        assert "update your sdk version" in message.lower()
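
The extraction cases exercised by the tests above (registry prefixes, tag suffixes, a rejected `v` prefix, digest references, non-Ray images) can be reproduced with a single regular expression. The sketch below is an assumption inferred from those test cases only: `sketch_extract_ray_version` and `_RAY_TAG` are hypothetical names, and the SDK's real `extract_ray_version_from_image` may be implemented differently.

```python
import re
from typing import Optional

# Assumed pattern: the final repository path component is exactly "ray",
# followed by a tag that begins with MAJOR.MINOR.PATCH.
_RAY_TAG = re.compile(r"(?:^|/)ray:(\d+\.\d+\.\d+)")


def sketch_extract_ray_version(image: Optional[str]) -> Optional[str]:
    """Return the MAJOR.MINOR.PATCH version from a Ray image tag, or None."""
    if not image:
        # Covers both None and the empty string.
        return None
    match = _RAY_TAG.search(image)
    # Digest references ("ray@sha256:..."), "latest", and "v"-prefixed tags
    # do not match, so they fall through to None.
    return match.group(1) if match else None
```

Anchoring on `ray:` followed immediately by digits is what makes `ray:v2.47.1` and `python:3.11` return `None`, while anything after the patch number (such as `-py311-cu121`) is ignored.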
