Changes upgrade game server template to use safe-to-evict: Always #4096
What type of PR is this?
/kind bug
What this PR does / Why we need it:
TL;DR
This PR updates the upgrade test game server to use safe-to-evict: true (AKA `eviction: Always`), which changes the `Node-Selectors` for game servers on Autopilot clusters to `<none>`. That is the same as the balloon pods that are in place to avoid the need for scale-up on Autopilot clusters. This should make spinning up new game server pods on Autopilot clusters faster and prevent frequent flakes.

A number of the `upgrade` test flakes appear to be caused by problems creating backing pods for the game servers. Looking at the logs, there are warnings like the following on Autopilot test clusters:
The game server is eventually assigned:
But then the pod repeatedly fails liveness probes:
The pod is then marked as unhealthy:
And the container is killed, which causes the test to fail as the gameserver is marked as unhealthy:
Both the "balloon" pods (evictable-pods-deployment) and the game server pods have the same labels and pod affinity:
However, they do not have the same `Node-Selectors`. The `evictable-pods-deployment` pods have `Node-Selectors: <none>`, while the game servers have `Node-Selectors: cloud.google.com/extended-duration-pods=0`. The difference in `Node-Selectors` comes from the game server's default safe-to-evict value of false (`Never`): GKE Autopilot automatically sets this node selector whenever the game server's `spec.eviction.safe` is set to `Never` or `OnUpgrade`. By setting the game server's eviction to `Always`, the `Node-Selectors` for game server pods will be `<none>`. The game server pods will then have the same node affinity/selector as the balloon pods, so they can evict the balloon pods and get backing pods for the game servers up more quickly.

Which issue(s) this PR fixes:
NA
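To make the node-selector reasoning in the description concrete, here is a minimal Python sketch. It builds a bare-bones GameServer manifest with `spec.eviction.safe` set, then models which node selector ends up on the pod, based only on the Autopilot behavior observed in this PR. The helper names (`autopilot_node_selector`, `game_server_template`) are hypothetical, and the mapping is an assumption drawn from the logs above, not GKE's or Agones' actual code.

```python
"""Sketch: why eviction.safe=Always lets game server pods share nodes with balloon pods.

The node-selector mapping below is a hypothetical model of the GKE Autopilot
behavior described in this PR, not GKE's or Agones' actual implementation.
"""

# Node selector observed on game server pods when eviction is Never/OnUpgrade.
EXTENDED_DURATION_SELECTOR = {"cloud.google.com/extended-duration-pods": "0"}

# The balloon pods (evictable-pods-deployment) show Node-Selectors: <none>.
BALLOON_NODE_SELECTOR: dict = {}


def autopilot_node_selector(eviction_safe: str) -> dict:
    """Return the node selector this PR observes for a given eviction.safe value (assumed mapping)."""
    if eviction_safe in ("Never", "OnUpgrade"):
        return dict(EXTENDED_DURATION_SELECTOR)
    if eviction_safe == "Always":
        return {}  # rendered by kubectl as Node-Selectors: <none>
    raise ValueError(f"unknown eviction.safe value: {eviction_safe!r}")


def game_server_template(eviction_safe: str = "Always") -> dict:
    """Build a minimal GameServer manifest (as a dict) with the given eviction policy."""
    return {
        "apiVersion": "agones.dev/v1",
        "kind": "GameServer",
        "spec": {
            "eviction": {"safe": eviction_safe},
            # ports, container template, etc. omitted from this sketch
        },
    }


if __name__ == "__main__":
    for safe in ("Never", "OnUpgrade", "Always"):
        gs = game_server_template(safe)
        selector = autopilot_node_selector(gs["spec"]["eviction"]["safe"])
        same = selector == BALLOON_NODE_SELECTOR
        print(f"{safe:9} selector={selector or '<none>'} matches balloon pods: {same}")
```

Under this model, only `Always` yields an empty selector, which is why the game server pods can land on the same nodes the balloon pods are holding instead of waiting for a scale-up.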
Special notes for your reviewer: