[feat] validate the name length of RayCluster, RayService, and RayJob #3083

Open
wants to merge 1 commit into
base: master
from

rueian
Contributor

@rueian rueian commented Feb 20, 2025

Why are these changes needed?

Part of #3076. This PR adds name-length validation to RayCluster, RayService, and RayJob so that the K8s service names derived from them are never truncated:

  1. RayCluster: 63 - len("-serve-svc") = 53 characters at most.
  2. RayService: 53 - len("-xxxxx") = 47 characters at most.
  3. RayJob: 53 - len("-xxxxx") = 47 characters at most.

After #3101, the longest suffix for a K8s service is now "-serve-svc", which takes 10 characters. Therefore, we can only use 63-10=53 characters for the name of a RayCluster.

After #3102, we now add a "-" and 5 random characters to the name of either RayService or RayJob to create the underlying RayCluster. Therefore, they can only have 53-6=47 characters for their name.

Any RayCluster, RayService, or RayJob whose name exceeds the limits above will not be reconciled further and will receive an invalid spec event, which can be observed with kubectl describe.
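
The limits above can be sketched in Go as follows. This is a minimal illustration rather than the code in this PR: the constant names and the `validateNameLength` helper are hypothetical, and only the numbers (63, the "-serve-svc" suffix, and the "-" plus 5 random characters) come from the description.

```go
package main

import "fmt"

const (
	// Kubernetes Service names are DNS labels, capped at 63 characters.
	maxServiceNameLength = 63
	// Longest suffix appended to a RayCluster name to form a Service name.
	longestServiceSuffix = "-serve-svc"
	// RayService and RayJob append "-" plus 5 random characters
	// to name the underlying RayCluster.
	generatedSuffixLength = 1 + 5

	// Hypothetical names; the values follow the arithmetic in the description.
	MaxRayClusterNameLength = maxServiceNameLength - len(longestServiceSuffix) // 53
	MaxRayServiceNameLength = MaxRayClusterNameLength - generatedSuffixLength  // 47
	MaxRayJobNameLength     = MaxRayClusterNameLength - generatedSuffixLength  // 47
)

// validateNameLength returns an error when name is longer than maxLen.
// It never truncates: the caller decides to stop reconciling instead.
func validateNameLength(kind, name string, maxLen int) error {
	if len(name) > maxLen {
		return fmt.Errorf("%s name %q is %d characters long, should be no more than %d",
			kind, name, len(name), maxLen)
	}
	return nil
}
```

A reconciler could call `validateNameLength("RayCluster", cluster.Name, MaxRayClusterNameLength)` before creating any Service and emit the invalid spec event when it returns an error.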

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@rueian rueian force-pushed the validate-raycluster-name branch from 1032d24 to 5cd981c Compare February 20, 2025 22:24
@rueian rueian changed the title feat: validate the length of RayCluster name and worker group names [feat][RayCluster] validate the length of RayCluster name and worker group names Feb 21, 2025
@rueian rueian force-pushed the validate-raycluster-name branch from d465e97 to b24ba93 Compare February 21, 2025 02:22
@@ -15,7 +15,7 @@ import (
func BuildRouteForHeadService(cluster rayv1.RayCluster) (*routev1.Route, error) {
labels := map[string]string{
utils.RayClusterLabelKey: cluster.Name,
- utils.RayIDLabelKey: utils.GenerateIdentifier(cluster.Name, rayv1.HeadNode),
+ utils.RayIDLabelKey: utils.CheckLabel(utils.GenerateIdentifier(cluster.Name, rayv1.HeadNode)),
Contributor Author

This fixes an old bug where the value for utils.RayIDLabelKey was not passed through utils.CheckLabel to trim it.

@@ -21,7 +21,7 @@ func BuildIngressForHeadService(ctx context.Context, cluster rayv1.RayCluster) (

labels := map[string]string{
utils.RayClusterLabelKey: cluster.Name,
- utils.RayIDLabelKey: utils.GenerateIdentifier(cluster.Name, rayv1.HeadNode),
+ utils.RayIDLabelKey: utils.CheckLabel(utils.GenerateIdentifier(cluster.Name, rayv1.HeadNode)),
Contributor Author

This fixes an old bug where the value for utils.RayIDLabelKey was not passed through utils.CheckLabel to trim it.

@@ -21,7 +22,7 @@ const (
redisAddress = "redis:6379"
)

- func TestRayClusterGCSFaultTolerence(t *testing.T) {
+ func TestRayClusterGCSFaultTolerance(t *testing.T) {
Contributor Author

Fix typo.

@rueian rueian marked this pull request as ready for review February 21, 2025 05:09
@kevin85421 kevin85421 self-assigned this Feb 21, 2025
@rueian rueian force-pushed the validate-raycluster-name branch 2 times, most recently from e6ee5ac to 8817876 Compare February 25, 2025 00:18
@rueian rueian marked this pull request as draft February 25, 2025 23:49
@rueian rueian force-pushed the validate-raycluster-name branch from 3426bc8 to e26affb Compare March 2, 2025 23:13
@@ -1019,8 +1019,6 @@ func (r *RayClusterReconciler) createHeadRoute(ctx context.Context, route *route
func (r *RayClusterReconciler) createService(ctx context.Context, svc *corev1.Service, instance *rayv1.RayCluster) error {
logger := ctrl.LoggerFrom(ctx)

- // making sure the name is valid
- svc.Name = utils.CheckName(svc.Name)
Contributor Author

No more utils.CheckName to cut service names.

Comment on lines -265 to +267
- return CheckName(fmt.Sprintf("%s-%s-%s", ownerName, rayv1.HeadNode, "svc")), nil
+ return fmt.Sprintf("%s-%s-%s", ownerName, rayv1.HeadNode, "svc"), nil
case RayClusterCRD:
- headSvcName := CheckName(fmt.Sprintf("%s-%s-%s", ownerName, rayv1.HeadNode, "svc"))
+ headSvcName := fmt.Sprintf("%s-%s-%s", ownerName, rayv1.HeadNode, "svc")
Contributor Author

No more utils.CheckName to cut service names.

@@ -293,7 +293,7 @@ func ExtractRayIPFromFQDN(fqdnRayIP string) string {

// GenerateServeServiceName generates name for serve service.
func GenerateServeServiceName(serviceName string) string {
- return CheckName(fmt.Sprintf("%s-%s-%s", serviceName, ServeName, "svc"))
+ return fmt.Sprintf("%s-%s-%s", serviceName, ServeName, "svc")
Contributor Author

No more utils.CheckName to cut service names.

@rueian rueian changed the title [feat][RayCluster] validate the length of RayCluster name and worker group names [feat] validate the name length of RayCluster, RayService, and RayJob Mar 3, 2025
@rueian rueian force-pushed the validate-raycluster-name branch from e26affb to c9c3e9c Compare March 3, 2025 04:40
@rueian rueian force-pushed the validate-raycluster-name branch from c9c3e9c to 78f39a2 Compare March 3, 2025 05:18
@rueian rueian marked this pull request as ready for review March 3, 2025 06:51
@davidxia
Contributor

davidxia commented Mar 3, 2025

Would it be useful to also validate in webhooks like here?

func (w *RayClusterWebhook) validateName(rayCluster *rayv1.RayCluster) *field.Error {
if !nameRegex.MatchString(rayCluster.Name) {
return field.Invalid(field.NewPath("metadata").Child("name"), rayCluster.Name, "name must consist of lower case alphanumeric characters or '-', start with an alphabetic character, and end with an alphanumeric character (e.g. 'my-name', or 'abc-123', regex used for validation is '[a-z]([-a-z0-9]*[a-z0-9])?')")
}
return nil
}

These webhooks can optionally be enabled by the user (they are disabled by default). They prevent creation of invalid resources with a more obvious error than the controller events.
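
As a rough sketch of what a webhook-side check could look like, the snippet below combines the regex quoted above with a length limit. It uses only the standard library and returns a plain `error` rather than a `*field.Error`. The helper name `validateNameForWebhook`, the `maxLen` parameter, and the anchoring of the pattern with `^` and `$` are assumptions for illustration, not the existing webhook code.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameRegex uses the pattern quoted in the webhook error message.
// Anchoring it with ^ and $ is an assumption made for this sketch.
var nameRegex = regexp.MustCompile(`^[a-z]([-a-z0-9]*[a-z0-9])?$`)

// validateNameForWebhook is a hypothetical helper: it rejects names that
// fail the character rules or exceed maxLen (for example 53 for a RayCluster).
func validateNameForWebhook(name string, maxLen int) error {
	if !nameRegex.MatchString(name) {
		return fmt.Errorf("name %q must match %s", name, nameRegex.String())
	}
	if len(name) > maxLen {
		return fmt.Errorf("name %q is %d characters long, should be no more than %d",
			name, len(name), maxLen)
	}
	return nil
}
```

Rejecting the resource at admission time would surface the error directly in the kubectl apply output instead of in a later controller event.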
