Add EllipsizeName utility to handle Kubernetes object name limits #1408
Conversation
} else {
	name = fmt.Sprintf("p--%s-%s-%s", profileName, prefix, clusterName)
}
This is great. Thank you.
We need to handle the upgrade scenario. Since we are changing the name, a ClusterSummary instance may already exist from before the upgrade, and we need to handle that. I am thinking we first check whether a clusterSummary with the old name already exists; if so, return that name. Otherwise return the new name.
This method, getManagementClusterClient, returns the client to the management cluster.
Also, even when we use the ellipsized name, we need to handle conflicts. So when we generate the ellipsized name we probably need to:
- validate that no clusterSummary exists with that name (so it can be used)
- if one exists, verify we are talking about the same resource (by making sure the labels refer to the correct profile and cluster)
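The existence-plus-labels check in the two bullets above could look roughly like this. The label keys are hypothetical (the controller defines its own constants), and in the real code the lookup would go through a Get/List on the management cluster client:

```go
package main

import "fmt"

// Hypothetical label keys: the real controller defines its own constants,
// so these names are assumptions for illustration only.
const (
	profileLabelKey = "projectsveltos.io/profile-name"
	clusterLabelKey = "projectsveltos.io/cluster-name"
)

// sameOwner reports whether an existing ClusterSummary's labels refer to
// the same profile/cluster pair we are generating a name for. If it does,
// the existing name can be reused; otherwise we have a genuine conflict.
func sameOwner(labels map[string]string, profileName, clusterName string) bool {
	return labels[profileLabelKey] == profileName &&
		labels[clusterLabelKey] == clusterName
}

func main() {
	existing := map[string]string{
		profileLabelKey: "my-profile",
		clusterLabelKey: "cluster-a",
	}
	fmt.Println(sameOwner(existing, "my-profile", "cluster-a"))    // true: same resource
	fmt.Println(sameOwner(existing, "other-profile", "cluster-a")) // false: conflict
}
```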
We are changing behavior only when we actually hit the edge case of a long object name. At least, that was the idea of the change: don't change names that are already valid, and ellipsize only names that would otherwise be invalid and impossible to create.
There shouldn't be any name conflict; the probability of a collision when we add an FNV hash as the suffix is quite low (check out the fuzz test).
The problem with fetching the list of existing clusterSummary objects before generating a unique name is the possibility of a race condition: we would need to add retry logic in every place where we create these objects, which would be a much bigger and more impactful change.
If we do decide to go with the fetch-list approach, please let me know whether a hash suffix is acceptable as the object-name differentiator in that case. We would still need to create unique object names on conflict, and adding something like an index number as a suffix would probably increase the chance of races (at least I think it would).
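For context, a minimal sketch of the truncate-plus-FNV-suffix idea discussed here (the exact format and truncation point in the PR may differ; this just shows why the result is deterministic per input and always fits the limit):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const maxNameLen = 63 // the Kubernetes name limit discussed in this PR

// ellipsizeName leaves short names untouched and, for long ones, truncates
// and appends a deterministic FNV-32a hash of the full name, so the same
// input always produces the same output.
func ellipsizeName(name string) string {
	if len(name) <= maxNameLen {
		return name
	}
	h := fnv.New32a()
	h.Write([]byte(name))
	suffix := fmt.Sprintf("-%08x", h.Sum32()) // 9 chars including '-'
	return name[:maxNameLen-len(suffix)] + suffix
}

func main() {
	long := "p--some-extremely-long-profile-name-attached-to-an-equally-long-cluster-name"
	fmt.Println(ellipsizeName("p--short-profile-cluster")) // unchanged
	fmt.Println(len(ellipsizeName(long)))                  // 63
}
```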
Thanks. Sorry, I missed that we are doing this only when the length is over 63 characters. Then yes, you are right, we don't have to worry about upgrade.
Regarding the conflict, let me think a little more. Maybe instead of fetching, we can keep an in-memory map. The key could be the generated name and the value the profile/cluster names. That would prevent race conditions. And we can rebuild the map on restart and postpone profile reconciliation until that is complete.
Even if rare, I feel we should cover it. Overwriting a clusterSummary might cause stale resources or missing applications.
What do you think?
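The in-memory map proposed above could be sketched like this, assuming the key is the generated name and the value identifies the profile/cluster it was allocated to (names and types here are illustrative, not the controller's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// nameOwner identifies the profile/cluster pair a generated name belongs to.
type nameOwner struct {
	profile string
	cluster string
}

// nameRegistry is a sketch of the proposed in-memory map: it would be
// rebuilt from existing ClusterSummaries on restart, before reconciling
// any profile.
type nameRegistry struct {
	mu    sync.Mutex
	names map[string]nameOwner
}

func newNameRegistry() *nameRegistry {
	return &nameRegistry{names: map[string]nameOwner{}}
}

// reserve records name for the given owner. It returns false on a
// collision, i.e. when the name is already held by a different pair.
func (r *nameRegistry) reserve(name, profile, cluster string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if owner, ok := r.names[name]; ok {
		return owner == nameOwner{profile, cluster}
	}
	r.names[name] = nameOwner{profile, cluster}
	return true
}

// release frees a name, e.g. when its ClusterSummary is deleted.
func (r *nameRegistry) release(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.names, name)
}

func main() {
	r := newNameRegistry()
	fmt.Println(r.reserve("p--long-name-1a2b3c4d", "foo", "cluster-a")) // true: free
	fmt.Println(r.reserve("p--long-name-1a2b3c4d", "bar", "cluster-a")) // false: held by foo
}
```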
> we can keep an in-memory map

I guess the only question is how that would impact the memory footprint in general; probably not noticeably. Generally speaking, an in-memory map would simplify things a lot, avoiding retry logic in a lot of places.
On a side note, if we can say that our clusters won't go back in time, we could use unixtime as the object-name suffix and avoid hashing entirely. The downside is that names would always be unique rather than predictable per input. For our use case I think this could be acceptable.
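A minimal sketch of that unixtime alternative, just to make the trade-off concrete. Taking the timestamp as a parameter (callers would pass `time.Now().Unix()`) keeps it testable; everything here is illustrative:

```go
package main

import "fmt"

// timestampName is a sketch of the unixtime-suffix idea: unique as long as
// the clock never goes backwards, but not reproducible per input, which is
// the downside noted above. Callers would pass time.Now().Unix().
func timestampName(name string, maxLen int, unixSeconds int64) string {
	if len(name) <= maxLen {
		return name
	}
	suffix := fmt.Sprintf("-%d", unixSeconds)
	return name[:maxLen-len(suffix)] + suffix
}

func main() {
	long := "p--some-extremely-long-profile-name-attached-to-an-equally-long-cluster-name"
	fmt.Println(timestampName(long, 63, 1700000000))
}
```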
Thanks. Yes, we would only need to store names exceeding 63 characters, so it won't have much impact.
But good point: we can safely say clusters won't go back in time. I am not sure that will work, though. Those name-generation methods are also invoked after clusterSummary instances are created; for instance, clusterProfile reconciliation invokes them for every matching cluster at every reconciliation.
> Those name-generation methods are also invoked after clusterSummary instances are created; for instance, clusterProfile reconciliation invokes them for every matching cluster at every reconciliation.

This makes me think that we need everything to be deterministic, so any kind of counter-based suffix won't work (counter, unixtime, ...).
Let's assume we implement an in-memory map for tracking objects and use a hash function that is very prone to collisions. We can detect a collision via the map key (the same applies to a direct k8s object lookup), but what we do next is unclear: how do we recover from the collision and still keep names deterministic?
So I am thinking we need to invest a bit more time into that fuzz test: limit the upper-bound string length to what we actually use (~ 63*4 + 5), ensure it generates realistic-looking strings, run it against the current hash function for 24h, and see whether we get collisions. If we do, increase the hash size and/or replace the hash function, run again, and repeat until we no longer detect collisions. Use that as the solution, and document that for longer clusterProfile and cluster names we use hashing to create predictable object names; hashing has a low but non-zero chance of collision, so if you ever hit that kind of problem, please report it and rename your profile.
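A quick, non-authoritative sketch of that experiment (a real run would use Go's native fuzzing, `go test -fuzz`, over 24h; even a brute-force loop shows the birthday-bound behavior of a 32-bit hash at these input sizes):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789-"

// countCollisions generates `samples` random name-like strings bounded by
// the lengths we actually use (64 to ~63*4+5 chars) and counts how many
// distinct inputs share an FNV-32a hash. With 100k samples the birthday
// bound predicts only a collision or two for a 32-bit hash.
func countCollisions(samples int, seed int64) int {
	rng := rand.New(rand.NewSource(seed))
	seen := make(map[uint32]string, samples)
	collisions := 0
	for i := 0; i < samples; i++ {
		n := 64 + rng.Intn(63*4+5-64)
		b := make([]byte, n)
		for j := range b {
			b[j] = alphabet[rng.Intn(len(alphabet))]
		}
		s := string(b)
		h := fnv.New32a()
		h.Write([]byte(s))
		sum := h.Sum32()
		if prev, ok := seen[sum]; ok && prev != s {
			collisions++
		} else {
			seen[sum] = s
		}
	}
	return collisions
}

func main() {
	fmt.Println("collisions in 100k samples:", countCollisions(100000, 1))
}
```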
Thanks. Can we do this:
- if the name is shorter than 63 characters, no changes (you are already doing this)
- if the name is longer than 63 characters, check whether we have already allocated one for this profile/cluster; if so, use that one (this is done via the in-memory map)
- if the name is longer than 63 characters and we never allocated one before, allocate a new one (using the function you have currently); verify it has not been allocated to anybody else, update the in-memory map, and return it
- if the name is longer than 63 characters, we never allocated one before, and the name generated by your current function hits a collision, append an index to it and keep verifying until there is no collision; then update the in-memory map and return it

On restart we can rebuild this in-memory map before we reconcile any clusterProfile/profile.
I feel your PR already took care of most of this; we just need to handle those corner cases.
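The four steps above could be sketched as follows, with a plain map standing in for the in-memory registry. The name format and helper are illustrative, not the controller's actual code, and the index fallback is the least deterministic part:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const maxLen = 63

// allocateName sketches the four steps above. The registry maps a
// generated name to "profile/cluster"; in the real controller this would
// be the in-memory map rebuilt on restart.
func allocateName(registry map[string]string, profile, cluster string) string {
	full := fmt.Sprintf("p--%s--%s", profile, cluster) // illustrative format
	owner := profile + "/" + cluster
	if len(full) <= maxLen {
		return full // step 1: short names keep today's behavior
	}
	// Step 3: deterministic base name via truncation + FNV-32a suffix.
	h := fnv.New32a()
	h.Write([]byte(full))
	base := fmt.Sprintf("%s-%08x", full[:maxLen-9], h.Sum32())
	// Steps 2 and 4: reuse our own allocation if present, otherwise
	// probe base, base-0, base-1, ... until a name is free or ours.
	candidate := base
	for i := 0; ; i++ {
		if who, ok := registry[candidate]; !ok || who == owner {
			registry[candidate] = owner
			return candidate
		}
		idx := fmt.Sprintf("-%d", i)
		candidate = base[:maxLen-len(idx)] + idx
	}
}

func main() {
	registry := map[string]string{}
	fmt.Println(allocateName(registry, "p", "c")) // p--p--c
}
```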
I guess we can, but keep in mind that with collisions and appended indexes we can get into a situation where the other object that caused the collision gets removed, and we end up with a non-deterministic name for the new object (the one created with an index), basically reusing the same name.
(This is what I was referencing before.)
And if we are already talking about adding indexes, we might as well do the in-memory map + unixtime and remove the need for the hash and the additional hash+index.
Correct, but when a clusterSummary is deleted, if its name was allocated via this new mechanism, we can remove it from the map.
Also, what we want to solve is making sure the clusterSummary name is always valid. Today, if the ClusterProfile name is really long and the cluster name is also long (exceeding the 253-char limit), creation of the clusterSummary fails.
implements: #118