Cannot disrupt NodeClaim: nodeclaim does not have an associated node #7631

IgorKurylo1988 · 2025-01-24T07:42:42Z

Description

Observed Behavior:
The instance with GPU taints does not start, and the node never joins the cluster.
We use a GPU AMI based on amazon-eks-gpu-node-1.30-*.
This is a new Karpenter installation. We have another cluster running Karpenter v0.32+, and GPU nodes work there.

Expected Behavior:
The instance joins the cluster.

Reproduction Steps (Please include YAML):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 24h
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu
      expireAfter: Never
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - g5
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values:
            - 2xlarge
            - 4xlarge
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
        - key: karpenter.k8s.aws/instance-gpu-manufacturer
          operator: In
          values: ["nvidia"]
      taints:
        - key: nvidia.com/gpu
          effect: "NoSchedule"
          value: "true"
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: gpu
spec:
  amiFamily: AL2 # Amazon Linux 2
  instanceProfile: "sfly-aws-apc-dev-svc-eks-node-group-InstanceProfile"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery/standard-app-dev-common: "standard-app-dev-common"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery/standard-app-dev-common: "standard-app-dev-common"
  amiSelectorTerms:
    - id: "ami-080bac37fb480fa75" # GPU AMI based on amazon-eks-gpu-node-1.30-v20250116
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 10000
        deleteOnTermination: true
        throughput: 125
  tags:
    Name: standard-app-dev-common-eks-gpu
    Environment: "dev"
    Provisioner: Karpenter
    ManagedBy: APC
    BusinessUnit: Consumer
    App: EKS
    Role: GPU Compute Node

Versions:

Chart Version: v1.1.1
Kubernetes Version (kubectl version): 1.30 (Amazon EKS)



rschalo · 2025-01-28T15:35:58Z

Does the instance launch at all? If so, can you check the kubelet logs for failures and share the output?

github-actions · 2025-02-13T12:05:45Z

This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.

allyrr · 2025-02-25T10:02:06Z

Hello,

I am encountering the same issue.

EKS version: v1.31.5-eks-8cce635
Karpenter version: 1.2.1

With the AL2 AMI family configured in the EC2NodeClass:

spec:
  amiFamily: AL2
  amiSelectorTerms:
  - id: ami-06806e88f71fcc3d2

I get the following NodeClaim events:

k describe NodeClaim dev-g4dn-test-zxd6r
...
Events:
  Type    Reason             Age    From       Message
  ----    ------             ----   ----       -------
  Normal  Launched           3m28s  karpenter  Status condition transitioned, Type: Launched, Status: Unknown -> True, Reason: Launched
  Normal  DisruptionBlocked  3m27s  karpenter  Nodeclaim does not have an associated node
  Normal  Registered         2m43s  karpenter  Status condition transitioned, Type: Registered, Status: Unknown -> True, Reason: Registered
  Normal  DisruptionBlocked  85s    karpenter  Node isn't initialized

Even though the NodeClaim status is Unknown, the node is in the Ready state in the EKS cluster.

With Bottlerocket there is no such issue.
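The events above show the NodeClaim reaching Launched and Registered but then blocked with "Node isn't initialized", even though the node itself is Ready. As a hedged illustration only, the sketch that follows models the kind of check Karpenter is understood to perform before marking a NodeClaim as Initialized: the node must be Ready, any startup taints must be removed, and every extended resource the NodeClaim requested (for GPU instances, `nvidia.com/gpu`) must appear in the node's allocatable resources. `NodeView` and `initialization_blockers` are hypothetical names introduced for this example; they are not part of Karpenter's API, and the real controller checks more than this.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical, simplified view of a Kubernetes Node (not a Karpenter type).
@dataclass
class NodeView:
    ready: bool  # the Node's Ready condition
    allocatable: dict[str, int] = field(default_factory=dict)  # resource name -> quantity
    taints: list[str] = field(default_factory=list)  # taint keys currently on the node


def initialization_blockers(
    node: NodeView,
    requested: dict[str, int],
    startup_taints: list[str],
) -> list[str]:
    """Return the reasons a NodeClaim would not yet count as initialized.

    Simplified model (an assumption, not Karpenter's code): the node must be
    Ready, startup taints must be gone, and each requested resource must be
    registered with a non-zero allocatable quantity.
    """
    blockers: list[str] = []
    if not node.ready:
        blockers.append("node is not Ready")
    for taint in startup_taints:
        if taint in node.taints:
            blockers.append(f"startup taint {taint} still present")
    for name in requested:
        if node.allocatable.get(name, 0) <= 0:
            blockers.append(f"resource {name} not registered")
    return blockers


if __name__ == "__main__":
    # A GPU node that is Ready but whose allocatable resources lack nvidia.com/gpu.
    # The regular NoSchedule taint nvidia.com/gpu is not a startup taint, so it does not block.
    node = NodeView(ready=True, allocatable={"cpu": 4}, taints=["nvidia.com/gpu"])
    print(initialization_blockers(node, requested={"nvidia.com/gpu": 1}, startup_taints=[]))
    # prints ['resource nvidia.com/gpu not registered']
```

In this model a node can be Ready yet never count as initialized when `nvidia.com/gpu` is missing from its allocatable resources, which is what the NVIDIA device plugin normally advertises on GPU nodes. Whether that is the actual cause here is not confirmed by the thread.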

IgorKurylo1988 added bug Something isn't working needs-triage Issues that need to be triaged labels Jan 24, 2025

rschalo added triage/needs-information Marks that the issue still needs more information to properly triage and removed needs-triage Issues that need to be triaged labels Jan 29, 2025

github-actions bot added the lifecycle/stale label Feb 13, 2025

github-actions bot removed the lifecycle/stale label Feb 25, 2025
