fix: undefined instance state on provisioning state failed #7750

Conversation

comtalyst
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

Compared to the prior behavior, it looks like ef3b63c accidentally introduced a case where provisioningStateFailed combined with enableFastDeleteOnFailedProvisioning == false leaves status.State unassigned, which results in an undefined instance state. This PR fixes that and adds a set of unit tests for it.

The provisioningStateFailed case is generally rare, and as of now the bug is likely harmless, since an undefined state gives the same result as InstanceRunning.

Many further improvements could be made in this area (e.g., reunifying the two near-identical methods, adding more test cases), but those should be revisited under a separate scope.
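For readers who want to see the shape of the regression, here is a minimal, self-contained sketch. It is not the provider's actual code: the `InstanceState` type, the `StateUndefined` zero value, the string values of the constants, and the behavior of the fast-delete branch (reporting `InstanceCreating`) are all simplifying assumptions made for illustration. Only the unassigned state in the failed case and the added `InstanceRunning` assignment come from the description and the diff.

```go
package main

import "fmt"

// InstanceState is a simplified stand-in for the cloudprovider state type.
type InstanceState int

const (
	StateUndefined InstanceState = iota // zero value: State was never assigned
	InstanceRunning
	InstanceCreating
)

// String values are assumptions for this sketch.
const (
	provisioningStateCreating = "Creating"
	provisioningStateFailed   = "Failed"
)

// stateBeforeFix models the regression: with fastDelete == false the
// failed case assigns nothing, so the zero value leaks out.
func stateBeforeFix(provisioningState string, fastDelete bool) InstanceState {
	var state InstanceState
	switch provisioningState {
	case provisioningStateCreating:
		state = InstanceCreating
	case provisioningStateFailed:
		if fastDelete {
			state = InstanceCreating // assumed fast-delete behavior
		}
	default:
		state = InstanceRunning
	}
	return state
}

// stateAfterFix assigns InstanceRunning first, mirroring the added line
// in the diff, and lets the fast-delete branch override it.
func stateAfterFix(provisioningState string, fastDelete bool) InstanceState {
	var state InstanceState
	switch provisioningState {
	case provisioningStateCreating:
		state = InstanceCreating
	case provisioningStateFailed:
		state = InstanceRunning
		if fastDelete {
			state = InstanceCreating // assumed fast-delete behavior
		}
	default:
		state = InstanceRunning
	}
	return state
}

func main() {
	fmt.Println(stateBeforeFix(provisioningStateFailed, false) == StateUndefined) // true
	fmt.Println(stateAfterFix(provisioningStateFailed, false) == InstanceRunning) // true
}
```

A table-driven unit test over (provisioning state, flag) pairs would catch this kind of gap, which is presumably the intent of the tests added here.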

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/cluster-autoscaler labels Jan 22, 2025
@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 22, 2025
@comtalyst comtalyst force-pushed the comtalyst/fix-missing-instance-state-from-provisioning-state-failed branch from 445f5b9 to d979d43 Compare January 22, 2025 21:35
@comtalyst
Contributor Author

/cherry-pick cluster-autoscaler-release-1.32
/cherry-pick cluster-autoscaler-release-1.31
/cherry-pick cluster-autoscaler-release-1.30
/cherry-pick cluster-autoscaler-release-1.29
/cherry-pick cluster-autoscaler-release-1.28

@k8s-infra-cherrypick-robot

@comtalyst: once the present PR merges, I will cherry-pick it on top of cluster-autoscaler-release-1.28, cluster-autoscaler-release-1.29, cluster-autoscaler-release-1.30, cluster-autoscaler-release-1.31, cluster-autoscaler-release-1.32 in new PRs and assign them to you.

In response to this:

/cherry-pick cluster-autoscaler-release-1.32
/cherry-pick cluster-autoscaler-release-1.31
/cherry-pick cluster-autoscaler-release-1.30
/cherry-pick cluster-autoscaler-release-1.29
/cherry-pick cluster-autoscaler-release-1.28

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@comtalyst
Contributor Author

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jan 22, 2025
@comtalyst
Contributor Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 22, 2025
@@ -823,6 +824,8 @@ func instanceStatusFromProvisioningStateAndPowerState(resourceID string, provisi
case provisioningStateCreating:
status.State = cloudprovider.InstanceCreating
case provisioningStateFailed:
status.State = cloudprovider.InstanceRunning
Contributor

This might be a bit more elegant if we simply set the running state at the declaration, e.g. at L819:

status := &cloudprovider.InstanceStatus{
    State: cloudprovider.InstanceRunning,
}

And then we could remove the default case.

This way we don't suggest that the Failed provisioning state is meaningfully connected to the InstanceRunning state.

wdyt?

Contributor Author

@comtalyst comtalyst Jan 23, 2025

I would actually prefer to keep the mapping between the two explicit, given how small the code is and how complex this area is (both Azure provisioning states and CAS instance states), at least until we have higher confidence in our understanding. What do you think?

Beyond that, I think we should refactor so that we no longer rely on the default case below and every case is handled explicitly. That can be revisited later.
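To make the trade-off in this thread concrete, here is a rough sketch of the two styles side by side. Everything in it is illustrative: the function names, the simplified `InstanceState` type, the constant string values, and the choice of which provisioning states map to which instance states are assumptions, not the provider's actual code.

```go
package main

import "fmt"

// InstanceState is a simplified stand-in for the cloudprovider state type.
type InstanceState int

const (
	InstanceRunning InstanceState = iota + 1
	InstanceCreating
	InstanceDeleting
)

// String values are assumptions for this sketch.
const (
	provisioningStateCreating  = "Creating"
	provisioningStateDeleting  = "Deleting"
	provisioningStateFailed    = "Failed"
	provisioningStateSucceeded = "Succeeded"
)

// stateWithInitialDefault follows the reviewer's suggestion: start from
// InstanceRunning and only override where needed, with no default case.
func stateWithInitialDefault(provisioningState string) InstanceState {
	state := InstanceRunning
	switch provisioningState {
	case provisioningStateCreating:
		state = InstanceCreating
	case provisioningStateDeleting:
		state = InstanceDeleting
	}
	return state
}

// stateExplicit follows the author's preference: every known provisioning
// state is listed, and the second return value reports whether the input
// was recognized at all.
func stateExplicit(provisioningState string) (InstanceState, bool) {
	switch provisioningState {
	case provisioningStateCreating:
		return InstanceCreating, true
	case provisioningStateDeleting:
		return InstanceDeleting, true
	case provisioningStateFailed, provisioningStateSucceeded:
		return InstanceRunning, true
	}
	return InstanceRunning, false // unrecognized input, flagged to the caller
}

func main() {
	fmt.Println(stateWithInitialDefault(provisioningStateFailed) == InstanceRunning) // true
	_, known := stateExplicit("Migrating")
	fmt.Println(known) // false
}
```

Both functions return the same state for every input. The difference is that the explicit version documents that Failed maps to InstanceRunning on purpose and lets a caller log or test for states nobody has considered yet.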

Contributor

sgtm

@jackfrancis
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 23, 2025
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 23, 2025
@@ -224,6 +225,8 @@ func (scaleSet *ScaleSet) instanceStatusFromVM(vm *compute.VirtualMachineScaleSe
case string(compute.GalleryProvisioningStateCreating):
Contributor

Let's also clean up these "Gallery*" constants at some point; even if there is no correct constant in compute (I don't think there is), let's have something reasonably named within the provider.
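One possible shape for such provider-local constants is sketched below. The type name, the `vm`-prefixed constant names, and the helper are hypothetical, and the string values are assumed to match what the API reports. The helper takes a `*string` on the assumption that the SDK exposes the VM's provisioning state as an optional string.

```go
package main

import "fmt"

// provisioningState is a hypothetical provider-local type, so that call
// sites no longer borrow the unrelated Gallery* constants.
type provisioningState string

// Values are assumed to match the strings the API reports.
const (
	vmProvisioningStateCreating  provisioningState = "Creating"
	vmProvisioningStateDeleting  provisioningState = "Deleting"
	vmProvisioningStateFailed    provisioningState = "Failed"
	vmProvisioningStateSucceeded provisioningState = "Succeeded"
)

// provisioningStateOf converts an optional raw string into the local type.
// A nil pointer becomes the empty state, which matches none of the constants.
func provisioningStateOf(raw *string) provisioningState {
	if raw == nil {
		return ""
	}
	return provisioningState(*raw)
}

func main() {
	raw := "Creating"
	fmt.Println(provisioningStateOf(&raw) == vmProvisioningStateCreating) // true
	fmt.Println(provisioningStateOf(nil) == vmProvisioningStateCreating)  // false
}
```

With a typed constant set like this, a switch over the state reads as being about VM provisioning rather than galleries, and the nil handling lives in one place.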

@tallaxes
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 23, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: comtalyst, jackfrancis, tallaxes

@comtalyst
Contributor Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 23, 2025
@k8s-ci-robot k8s-ci-robot merged commit 64ca097 into kubernetes:master Jan 23, 2025
7 checks passed
@k8s-infra-cherrypick-robot

@comtalyst: new pull request created: #7754

@k8s-infra-cherrypick-robot

@comtalyst: new pull request created: #7755

@k8s-infra-cherrypick-robot

@comtalyst: new pull request created: #7756

@k8s-infra-cherrypick-robot

@comtalyst: new pull request created: #7757

@k8s-infra-cherrypick-robot

@comtalyst: new pull request created: #7758
