
Feature Request: Support EndpointSlices Without In-cluster Pod Targets in Ingress #4017

Open
kahirokunn opened this issue Jan 15, 2025 · 5 comments
Labels: kind/feature

Comments

@kahirokunn
Member

kahirokunn commented Jan 15, 2025

Related Problem

In a multi-cluster EKS environment that shares Services via the Multi-Cluster Services (MCS) API, multiple EndpointSlices may exist for a single Service. Currently, in “target-type: ip” mode, the AWS Load Balancer Controller registers only the IPs of Pods running in the local cluster. It does not register:

  1. Pod IPs from other clusters exposed via the MCS API and listed in EndpointSlices; or
  2. External IPs included in EndpointSlices whose TargetRef.Kind is not "Pod".

This behavior forces users to employ workarounds—such as using “target-type: instance” and routing traffic through NodePorts—which can introduce suboptimal routing and increase the risk of disruptions if a Node is scaled in or replaced.

Proposed Unified Solution

Enhance the AWS Load Balancer Controller to directly register IP addresses from EndpointSlices in “target-type: ip” mode, even if those addresses are intended for multi-cluster usage (MCS) or represent external endpoints. This can be done by:

  • Recognizing that an EndpointSlice may contain additional or external IP addresses (for instance, based on TargetRef.Kind != "Pod").
  • Incorporating these addresses into the Target Group, alongside the local cluster Pod IPs already handled.

A relevant part of the AWS Load Balancer Controller’s current endpoint resolution logic is the following check:

// Endpoints whose TargetRef is not an in-cluster Pod are skipped entirely.
if ep.TargetRef == nil || ep.TargetRef.Kind != "Pod" {
    continue
}

Here, the logic could be extended to handle these alternative address types. For example, if the endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io label is missing, the Controller might treat the EndpointSlice’s IP addresses as external IPs; or if EndpointSlice.Endpoints[].TargetRef.Kind != "Pod", the Controller might interpret them as external endpoints.
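
As a rough sketch, the extension could look something like the following. This is not the controller’s actual implementation: the resolveTargets function, the ipTarget type, and the port parameter are hypothetical names introduced only to illustrate the idea, using the standard k8s.io/api/discovery/v1 types.

package sketch

import (
	discoveryv1 "k8s.io/api/discovery/v1"
)

// ipTarget is a hypothetical stand-in for the controller's internal
// endpoint representation; it is not a real controller type.
type ipTarget struct {
	IP   string
	Port int32
}

// resolveTargets keeps the existing Pod path and adds a path for endpoints
// whose TargetRef is not an in-cluster Pod (for example, MCS-imported or
// otherwise external addresses).
func resolveTargets(slice discoveryv1.EndpointSlice, port int32) []ipTarget {
	var targets []ipTarget
	for _, ep := range slice.Endpoints {
		// Skip endpoints that are not ready to receive traffic.
		if ep.Conditions.Ready != nil && !*ep.Conditions.Ready {
			continue
		}
		if ep.TargetRef != nil && ep.TargetRef.Kind == "Pod" {
			// Existing behavior: the address belongs to a local Pod.
			// (The controller's Pod lookup and readiness-gate handling
			// are omitted from this sketch.)
			for _, addr := range ep.Addresses {
				targets = append(targets, ipTarget{IP: addr, Port: port})
			}
			continue
		}
		// Proposed behavior: no in-cluster Pod target, so register the
		// slice's addresses directly as IP targets.
		for _, addr := range ep.Addresses {
			targets = append(targets, ipTarget{IP: addr, Port: port})
		}
	}
	return targets
}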

In both cases, the goal remains the same: provide direct integration with new or external IP addresses listed in EndpointSlices, reducing complexity and offering more efficient traffic routing.

Alternatives Considered

Using “target-type: instance”

  • This solution leads to indirect routing (through NodePorts) and higher susceptibility to disruptions upon Node scale-in or replacement.

Example: MCS with Additional Cluster IPs

Below is a sample configuration demonstrating how MCS might export a Service, creating an EndpointSlice in one cluster with Pod IPs from another cluster:

apiVersion: v1
kind: Service
metadata:
  name: example-service
  namespace: default
spec:
  selector:
    app: example
  ports:
    - name: http
      port: 80
      protocol: TCP
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  namespace: default
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80}]'
spec:
  rules:
    - http:
        paths:
          - path: /*
            pathType: ImplementationSpecific
            backend:
              service:
                name: example-service
                port:
                  number: 80
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: example-service-remotecluster
  namespace: default
  labels:
    kubernetes.io/service-name: example-service
addressType: IPv4
ports:
  - name: "http"
    port: 80
    protocol: TCP
endpoints:
  - addresses:
      - 10.11.12.13   # Pod IP on a remote EKS cluster
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: remote-node-1
    zone: remote-az-1

With the proposed feature enabled, the IP “10.11.12.13” would be recognized by the AWS Load Balancer Controller and automatically registered in the Target Group.

@kahirokunn changed the title FeatureRequest: Support EndpointSlices Without In-cluster Pod Targets in Ingress to Feature Request: Support EndpointSlices Without In-cluster Pod Targets in Ingress Jan 15, 2025
@shraddhabang added the triage/needs-investigation and kind/feature labels and removed the triage/needs-investigation label Jan 15, 2025
@zac-nixon
Collaborator

Could you expand further on this point:

This solution leads to indirect routing (through NodePorts) and higher susceptibility to disruptions upon Node scale-in or replacement.

Later versions of Kubernetes and the controller have made using NodePorts for traffic a lot more reliable. For example, when using cluster autoscaler: #1688

@kahirokunn
Member Author

@zac-nixon
Thank you for your insight and all the work you've done on this project. I wanted to share my experience using Karpenter instead of the Cluster Autoscaler. In my tests, when running ab (ApacheBench) or other load-testing tools while a node scales in, I often observe connections that do not return any response (instead of a 5xx error). After multiple rounds of verification, I suspect the following factors may be playing a role:

  1. Karpenter may terminate a node before it is fully deregistered from the ALB’s Target Group.
  2. There may be insufficient coordination between Karpenter and the AWS Load Balancer Controller during node termination.
  3. Long-lived connections, such as WebSockets, long polling, or HTTP/2 streams, remain open on nodes that are about to be terminated, and slow requests and long-running processes also stay active. As a result, when Karpenter scales in a node, these open connections or in-flight requests can be abruptly severed, and no response is ever returned to the client.

Additionally, supporting direct IP-based communication as described in the Kubernetes documentation, rather than routing traffic exclusively through Nodes, would further improve interoperability with existing controllers, foster additional integrations, and enable more significant innovation in the future.

@kahirokunn
Member Author

I've created a separate issue regarding the problem we discussed about AWS Load Balancer Controller not handling Karpenter taints:
#4023
Along with this, I've also created a related PR:
#4022
However, I still want to continue the discussion about Ingress resources supporting custom EndpointSlices, as I believe this is a needed feature.
Thx 🙏

@zac-nixon
Collaborator

Sorry for the delayed response. What automation are you using to populate the custom endpoint slice? I wonder if you can use a Multicluster Target Group Binding (https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/targetgroupbinding/targetgroupbinding/#multicluster-target-group) and then point your automation to just register the targets directly into the Target Group?
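
For context, a minimal sketch of what such direct registration could look like with the AWS SDK for Go v2 is below. The Target Group ARN, IP address, and port are placeholder values, and any real automation would also need to deregister targets as endpoints disappear.

package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	elbv2 "github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2"
	elbv2types "github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2/types"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatalf("load AWS config: %v", err)
	}
	client := elbv2.NewFromConfig(cfg)

	// Placeholder ARN for a Target Group shared across clusters via the
	// multicluster TargetGroupBinding feature.
	tgARN := "arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/example/0123456789abcdef"

	// Register the remote-cluster Pod IP from the example above directly
	// as an IP target in the shared Target Group.
	_, err = client.RegisterTargets(ctx, &elbv2.RegisterTargetsInput{
		TargetGroupArn: aws.String(tgARN),
		Targets: []elbv2types.TargetDescription{
			{Id: aws.String("10.11.12.13"), Port: aws.Int32(80)},
		},
	})
	if err != nil {
		log.Fatalf("register targets: %v", err)
	}
}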

@kahirokunn
Member Author

I am currently trying to implement an MCS controller using Sveltos (Related Issue: projectsveltos/sveltos#435 (comment)).
While the proposed Multicluster Target Group Binding could achieve something similar, I believe there are challenges in the following areas:

  1. The ALB and Listener need to be managed by separate tools such as Terraform or Crossplane.
  2. The AWS Load Balancer Controller, and the information required for its operation, must be distributed to every cluster, which adds setup and management costs.
  3. It is not compatible with the sig-multicluster (MCS) API, making it difficult to extend and maintain in the long term.

On the other hand, if the AWS Load Balancer Controller directly supported custom EndpointSlices, which are a standard Kubernetes resource, the complicated setup described above would become unnecessary. I believe this approach is preferable because it achieves the configuration users ultimately need in a simpler way.
