Description
Not sure whether this is an enhancement, a feature request, or maybe a bug.
I naively assumed that an observability platform for networking on K8s would give me some explicit indication when a K8s service cannot be reached because the pods backing it are not available or not running.
However, I only see the usual DNS flows, which also occur when the pods are running. There is no indication whatsoever that the service cannot be used because its pods are not running:
```
Nov 20 12:53:48.042: pod-test/client-0 (ID:23235) <> kube-system/kube-dns:53 (world) pre-xlate-fwd TRACED (UDP)
Nov 20 12:53:48.042: pod-test/client-0 (ID:23235) <> kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) post-xlate-fwd TRANSLATED (UDP)
Nov 20 12:53:48.042: pod-test/client-0:41813 (ID:23235) -> kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) to-endpoint FORWARDED (UDP)
Nov 20 12:53:48.042: pod-test/client-0:41813 (ID:23235) <> kube-system/coredns-5dd5756b68-cnszf (ID:31735) pre-xlate-rev TRACED (UDP)
Nov 20 12:53:48.042: pod-test/client-0:41813 (ID:23235) <> kube-system/coredns-5dd5756b68-cnszf (ID:31735) pre-xlate-rev TRACED (UDP)
Nov 20 12:53:48.043: pod-test/client-0:41813 (ID:23235) <- kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) to-endpoint FORWARDED (UDP)
Nov 20 12:53:48.043: kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) <> pod-test/client-0 (ID:23235) pre-xlate-rev TRACED (UDP)
Nov 20 12:53:48.043: kube-system/kube-dns:53 (world) <> pod-test/client-0 (ID:23235) post-xlate-rev TRANSLATED (UDP)
Nov 20 12:53:48.043: kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) <> pod-test/client-0 (ID:23235) pre-xlate-rev TRACED (UDP)
Nov 20 12:53:48.043: kube-system/kube-dns:53 (world) <> pod-test/client-0 (ID:23235) post-xlate-rev TRANSLATED (UDP)
```
On the client side, I simply curled the service and got the expected error, because the server pods were not running:
```
root@client-0:/# curl http://pod-service
curl: (7) Couldn't connect to server
```
When the pods are running, the flows look like this (note that the first 10 lines are identical to the 10 lines of the error case above, apart from timestamps and client ports):
```
Nov 20 12:59:27.931: pod-test/client-0 (ID:23235) <> kube-system/kube-dns:53 (world) pre-xlate-fwd TRACED (UDP)
Nov 20 12:59:27.931: pod-test/client-0 (ID:23235) <> kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) post-xlate-fwd TRANSLATED (UDP)
Nov 20 12:59:27.931: pod-test/client-0:50175 (ID:23235) -> kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) to-endpoint FORWARDED (UDP)
Nov 20 12:59:27.931: pod-test/client-0:50175 (ID:23235) <> kube-system/coredns-5dd5756b68-cnszf (ID:31735) pre-xlate-rev TRACED (UDP)
Nov 20 12:59:27.931: pod-test/client-0:50175 (ID:23235) <> kube-system/coredns-5dd5756b68-cnszf (ID:31735) pre-xlate-rev TRACED (UDP)
Nov 20 12:59:27.931: pod-test/client-0:50175 (ID:23235) <- kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) to-endpoint FORWARDED (UDP)
Nov 20 12:59:27.931: kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) <> pod-test/client-0 (ID:23235) pre-xlate-rev TRACED (UDP)
Nov 20 12:59:27.931: kube-system/kube-dns:53 (world) <> pod-test/client-0 (ID:23235) post-xlate-rev TRANSLATED (UDP)
Nov 20 12:59:27.932: kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) <> pod-test/client-0 (ID:23235) pre-xlate-rev TRACED (UDP)
Nov 20 12:59:27.932: kube-system/kube-dns:53 (world) <> pod-test/client-0 (ID:23235) post-xlate-rev TRANSLATED (UDP)
Nov 20 12:59:27.932: pod-test/client-0 (ID:23235) <> pod-test/pod-service:80 (world) pre-xlate-fwd TRACED (TCP)
Nov 20 12:59:27.932: pod-test/client-0 (ID:23235) <> pod-test/server-0:80 (ID:15840) post-xlate-fwd TRANSLATED (TCP)
Nov 20 12:59:27.932: pod-test/client-0:50570 (ID:23235) -> pod-test/server-0:80 (ID:15840) to-endpoint FORWARDED (TCP Flags: SYN)
Nov 20 12:59:27.932: pod-test/client-0:50570 (ID:23235) <- pod-test/server-0:80 (ID:15840) to-endpoint FORWARDED (TCP Flags: SYN, ACK)
Nov 20 12:59:27.932: pod-test/client-0:50570 (ID:23235) -> pod-test/server-0:80 (ID:15840) to-endpoint FORWARDED (TCP Flags: ACK)
Nov 20 12:59:27.932: pod-test/client-0:50570 (ID:23235) <> pod-test/server-0 (ID:15840) pre-xlate-rev TRACED (TCP)
Nov 20 12:59:27.932: pod-test/server-0:80 (ID:15840) <> pod-test/client-0 (ID:23235) pre-xlate-rev TRACED (TCP)
Nov 20 12:59:27.932: pod-test/pod-service:80 (world) <> pod-test/client-0 (ID:23235) post-xlate-rev TRANSLATED (TCP)
Nov 20 12:59:27.932: pod-test/client-0:50570 (ID:23235) -> pod-test/server-0:80 (ID:15840) to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Nov 20 12:59:27.932: pod-test/client-0:50570 (ID:23235) <- pod-test/server-0:80 (ID:15840) to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Nov 20 12:59:27.932: pod-test/client-0:50570 (ID:23235) -> pod-test/server-0:80 (ID:15840) to-endpoint FORWARDED (TCP Flags: ACK, FIN)
Nov 20 12:59:27.933: pod-test/client-0:50570 (ID:23235) <- pod-test/server-0:80 (ID:15840) to-endpoint FORWARDED (TCP Flags: ACK, FIN)
Nov 20 12:59:27.933: pod-test/client-0:50570 (ID:23235) -> pod-test/server-0:80 (ID:15840) to-endpoint FORWARDED (TCP Flags: ACK)
```
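One way to see the difference between the two captures programmatically is to pair each `pre-xlate-fwd` flow (the service address the client targeted) with the `post-xlate-fwd` flow that follows it for the same source (the backend the service address was translated to). The following is a minimal Python sketch under stated assumptions: it parses the compact text lines exactly as they appear in this issue, the regular expression and the `service_translations` helper are hypothetical and not part of Hubble, and it assumes the two flows of one translation appear in order.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical parser for one compact flow line as printed in this issue, e.g.
# "Nov 20 12:59:27.932: pod-test/client-0 (ID:23235) <> pod-test/pod-service:80 (world) pre-xlate-fwd TRACED (TCP)"
FLOW_RE = re.compile(
    r"^(?P<ts>\w{3} \d+ [\d:.]+): "
    r"(?P<src>\S+) \([^)]*\) (?P<dir><>|->|<-) "
    r"(?P<dst>\S+) \([^)]*\) "
    r"(?P<point>\S+) (?P<verdict>\S+) \((?P<proto>[^)]*)\)$"
)


def service_translations(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (service address, backend address) pairs, in order of appearance.

    A pre-xlate-fwd flow records which service a source was sending to; the
    next post-xlate-fwd flow from the same source gives the chosen backend.
    """
    pending: Dict[str, str] = {}  # source -> service address awaiting translation
    pairs: List[Tuple[str, str]] = []
    for line in lines:
        m = FLOW_RE.match(line.strip())
        if not m:
            continue  # ignore anything that is not a flow line
        if m["point"] == "pre-xlate-fwd":
            pending[m["src"]] = m["dst"]
        elif m["point"] == "post-xlate-fwd" and m["src"] in pending:
            pairs.append((pending.pop(m["src"]), m["dst"]))
    return pairs
```

Fed the first capture, this yields only the `kube-system/kube-dns:53` to `coredns` pair; fed the second, it additionally yields `("pod-test/pod-service:80", "pod-test/server-0:80")`. In the failing case `pod-service` does not appear in any flow at all, which is the missing signal this issue is about.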
To reproduce the scenario, here is a simple YAML manifest with a client that can execute curl requests (e.g. `curl http://pod-service`). The error situation is provoked by setting the `nodeName` of the server pods to a node that does not exist.
By commenting out `nodeName` in the `server` StatefulSet, the scenario switches to a state where the pod runs successfully and serves as an endpoint for the service.
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: pod-test
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: server
  namespace: pod-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: server
  template:
    metadata:
      labels:
        app: server
    spec:
      nodeName: kind-worker-non-existing
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: client
  namespace: pod-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: client
  template:
    metadata:
      labels:
        app: client
    spec:
      containers:
      - image: ubuntu
        command: ['sh', '-c', 'apt update && apt install curl -y && sleep 7d']
        imagePullPolicy: Always
        name: ubuntu
---
apiVersion: v1
kind: Service
metadata:
  name: pod-service
  namespace: pod-test
spec:
  selector:
    app: server
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
```
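To confirm which of the two states the reproduction is in, independently of Hubble, one can check whether `pod-service` has any ready endpoints. The sketch below assumes the EndpointSlice list JSON returned by `kubectl get endpointslices -n pod-test -l kubernetes.io/service-name=pod-service -o json`; the helpers `fetch_endpoint_slices` and `ready_endpoint_count` are hypothetical names chosen for illustration. With `nodeName` pointing at a non-existent node the server pod never runs, so it would be expected not to show up as a ready endpoint, and the count should be 0.

```python
import json
import subprocess
from typing import Any, Dict


def fetch_endpoint_slices(namespace: str, service: str) -> Dict[str, Any]:
    """Fetch the EndpointSlices of a service via kubectl (requires cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "endpointslices", "-n", namespace,
         "-l", f"kubernetes.io/service-name={service}", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def ready_endpoint_count(slice_list: Dict[str, Any]) -> int:
    """Count endpoints across all slices that are ready to receive traffic."""
    count = 0
    for endpoint_slice in slice_list.get("items", []):
        # "endpoints" is absent or null when no pod currently backs the service
        for endpoint in endpoint_slice.get("endpoints") or []:
            conditions = endpoint.get("conditions") or {}
            # An unset "ready" condition is treated as ready, per the EndpointSlice API docs
            if conditions.get("ready") is not False:
                count += 1
    return count
```

Usage against a live cluster would be `ready_endpoint_count(fetch_endpoint_slices("pod-test", "pod-service"))`; this is only a side check on the cluster state, not something Hubble itself reports.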