
hubble-relay port-forwarding: context deadline exceeded / error reading server preface: EOF #1356

Closed
ledroide opened this issue Jan 22, 2024 · 8 comments


@ledroide

summary

Cannot use hubble through port-forwarding to hubble-relay

demo

$ cilium hubble port-forward

$ hubble status
failed to connect to 'localhost:4245': context deadline exceeded: connection error: desc = "error reading server preface: EOF"

$ hubble observe --since 1h --follow --namespace cilium-test
failed to connect to 'localhost:4245': context deadline exceeded: connection error: desc = "error reading server preface: EOF"
$ kubectl port-forward -n kube-system deployment/hubble-relay 4245:4245
Forwarding from 127.0.0.1:4245 -> 4245
Forwarding from [::1]:4245 -> 4245
Handling connection for 4245
Handling connection for 4245

$ hubble status
failed to connect to 'localhost:4245': context deadline exceeded: connection error: desc = "error reading server preface: EOF"

context and versions

  • cilium v1.14.5 running fine
  • hubble 0.13.0 - tried hubble 0.12.3, same error
  • cilium connectivity tests -> all OK
  • kubeconfig is cluster-admin -> full privileges for all namespaces
  • kubernetes v1.28.6, on-premises
  • cri-o 1.28.1 ; crun 1.8.5
  • ubuntu cloud minimal 23.10

info

There is this strange message, "no cilium pods found in namespace "kube-system"", although cilium status finds pods and image references as expected.

$ cilium version
cilium-cli: v0.15.20 compiled with go1.21.6 on linux/amd64
cilium image (default): v1.14.5
cilium image (stable): v1.14.5
cilium image (running): unknown. Unable to obtain cilium version, no cilium pods found in namespace "kube-system"

Everything looks fine:

$ cilium status --wait
    /¯¯\
 /¯¯\__/¯¯\    Cilium:             OK
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    disabled (using embedded mode)
 \__/¯¯\__/    Hubble Relay:       OK
    \__/       ClusterMesh:        disabled

Deployment             hubble-ui          Desired: 1, Ready: 1/1, Available: 1/1
Deployment             hubble-relay       Desired: 1, Ready: 1/1, Available: 1/1
Deployment             cilium-operator    Desired: 2, Ready: 2/2, Available: 2/2
DaemonSet              cilium             Desired: 6, Ready: 6/6, Available: 6/6
Containers:            cilium             Running: 6
                       hubble-ui          Running: 1
                       hubble-relay       Running: 1
                       cilium-operator    Running: 2
Cluster Pods:          33/33 managed by Cilium
Helm chart version:
Image versions         cilium             quay.io/cilium/cilium:v1.14.5: 6
                       hubble-ui          quay.io/cilium/hubble-ui:v0.11.0: 1
                       hubble-ui          quay.io/cilium/hubble-ui-backend:v0.11.0: 1
                       hubble-relay       quay.io/cilium/hubble-relay:v1.14.5: 1
                       cilium-operator    quay.io/cilium/operator:v1.14.5: 2

$ hubble version
hubble 0.13.0 compiled with go1.21.6 on linux/amd64

what we have tried

  • hubble-relay logs: nothing happens in the logs while trying to use hubble relay
  • cilium logs: some errors like this one; not sure it is related to hubble:
cilium-8pb7d cilium-agent level=info msg="Unable to determine next hop address" error="failed to retrieve route for remote node IP: network is unreachable" interface=kube-ipvs0 ipAddr=192.168.1.1 subsys=node-neigh-debug
  • another service on the cluster port-forwarded to the same port 4245 -> test passed OK, the service is forwarded using local port 4245
$ kubectl port-forward svc/mongo-express -n mongo 4245:80
Forwarding from [::1]:4245 -> 8081
Handling connection for 4245
  • port-forward hubble-relay to another local port -> same error when using local port 12000
$ cilium hubble port-forward --port-forward 12000
$ hubble status --server localhost:12000
failed to connect to 'localhost:12000': context deadline exceeded: connection error: desc = "error reading server preface: EOF"
  • port-forward hubble-ui -> test passed OK, UI available in browser using local port 12000
$ cilium hubble ui
ℹ️ Opening "http://localhost:12000" in your browser...

what we have read before

@ledroide
Author

Does anyone have an idea for a workaround to use hubble?
Since it looks like a port-forwarding issue, how can I display "hubble observe" output without port-forwarding?

@rolinh
Member

rolinh commented Feb 2, 2024

What you can try doing is accessing the Hubble Relay service via the Hubble CLI (e.g. from within a Cilium agent pod) and check if everything is fine there (e.g. hubble status against the Hubble Relay service).
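For example, something like this (a minimal sketch, assuming the default DaemonSet name and the kube-system namespace):

$ kubectl exec -ti ds/cilium -n kube-system -c cilium-agent -- hubble status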

@ledroide
Author

ledroide commented Feb 2, 2024

What you can try doing is accessing the Hubble Relay service via the Hubble CLI (e.g. from within a Cilium agent pod) and check if everything is fine there (e.g. hubble status against the Hubble Relay service).

Many thanks @rolinh for your suggestion. This is a good workaround.

$ kubectl exec -ti ds/cilium -c cilium-agent -- hubble observe --since 10m --follow --namespace kube-system
Feb  2 15:47:11.048: 10.233.68.120:42736 (remote-node) <> kube-system/metrics-server-6dbb566f54-n9j4g:10250 (ID:51921) to-overlay FORWARDED (TCP Flags: ACK, PSH)
Feb  2 15:47:19.722: 10.233.68.120:44872 (host) <- kube-system/dns-autoscaler-8576bb9f5b-n4vh7:8080 (ID:9883) to-stack FORWARDED (TCP Flags: SYN, ACK)

@ledroide
Author

Edit: the trick does not work as expected: the cilium pod only observes workloads that run on the same node.

If running hubble observe on a cilium pod on worker-1, hubble will only observe pods running on worker-1.

@rolinh
Member

rolinh commented Feb 16, 2024

If running hubble observe on a cilium pod on worker-1, hubble will only observe pods running on worker-1.

Yes, by default the Hubble CLI on the pod queries the local Hubble server. However, you can point it to the hubble-relay service instead, either with the --server flag or with the HUBBLE_SERVER environment variable. So from a pod, you want to do something like:

HUBBLE_SERVER=hubble-relay.kube-system:80 hubble observe
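The --server flag form is equivalent (same address assumption as above):

$ hubble observe --server hubble-relay.kube-system:80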

@ledroide
Author

Current status:

  • Nothing has changed after upgrading cilium to v1.15.1
  • cilium hubble port-forward still leads to "error reading server preface: EOF" when using hubble locally from a workstation
  • kubectl exec - the first trick from @rolinh - works only if we target a cilium pod that is running on a worker node
  • but hubble does not see any traffic when using kubectl exec on a master node
  • HUBBLE_SERVER=hubble-relay.kube-system:80 - the second trick from @rolinh - does not solve anything

Let's dig a bit further. Here are the cilium pods running on 3 masters and 3 workers:

$ kubectl get pod -n kube-system -l k8s-app=cilium -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP               NODE             NOMINATED NODE   READINESS GATES
cilium-2q56l   1/1     Running   0          11h   100.94.2.140     k8ststworker-3   <none>           <none>
cilium-5wdv5   1/1     Running   0          11h   100.94.2.45      k8ststmaster-1   <none>           <none>
cilium-lsgfm   1/1     Running   0          11h   100.182.210.97   k8ststworker-2   <none>           <none>
cilium-vnllq   1/1     Running   0          11h   100.94.2.86      k8ststmaster-2   <none>           <none>
cilium-w7lph   1/1     Running   0          11h   100.94.2.34      k8ststmaster-3   <none>           <none>
cilium-z65zm   1/1     Running   0          11h   100.94.2.101     k8ststworker-1   <none>           <none>

The first trick works if I choose a worker node:

$ kubectl exec -ti pod/cilium-2q56l -c cilium-agent -n kube-system -- hubble observe --since 1m --namespace sxxxxxxl -l section=wxxxxxr
Feb 20 06:57:25.239: sxxxxxxl/zincsearch-0:43618 (ID:63357) -> 74.192.137.83:443 (world) to-stack FORWARDED (TCP Flags: ACK, PSH)
Feb 20 06:57:25.239: sxxxxxxl/zincsearch-0:43618 (ID:63357) -> 74.192.137.83:443 (world) to-stack FORWARDED (TCP Flags: ACK, FIN)
Feb 20 06:57:25.243: sxxxxxxl/zincsearch-0:43618 (ID:63357) <- 74.192.137.83:443 (world) to-endpoint FORWARDED (TCP Flags: ACK)
Feb 20 06:57:25.243: sxxxxxxl/zincsearch-0:43618 (ID:63357) <- 74.192.137.83:443 (world) to-endpoint FORWARDED (TCP Flags: ACK, FIN)
Feb 20 06:57:25.243: sxxxxxxl/zincsearch-0:43618 (ID:63357) -> 74.192.137.83:443 (world) to-stack FORWARDED (TCP Flags: ACK)
Feb 20 06:57:28.645: 10.233.65.97:43796 (host) -> sxxxxxxl/zincsearch-0:4080 (ID:63357) policy-verdict:L3-Only INGRESS ALLOWED (TCP Flags: SYN)
Feb 20 06:57:28.645: 10.233.65.97:43796 (host) -> sxxxxxxl/zincsearch-0:4080 (ID:63357) to-endpoint FORWARDED (TCP Flags: SYN)

The same command with exec to a master node gives no answer: no line is displayed, hubble looks blind.

$ kubectl exec -ti pod/cilium-vnllq -c cilium-agent -n kube-system -- hubble observe --since 1m --namespace sxxxxxxl -l section=wxxxxxr

Now we try the variable HUBBLE_SERVER=hubble-relay.kube-system:80 from a worker node -> "connection refused"

$ kubectl exec -ti pod/cilium-2q56l -c cilium-agent -n kube-system -- bash
root@k8ststworker-3:/home/cilium# HUBBLE_SERVER=hubble-relay.kube-system:80 hubble observe --since 1m --namespace sxxxxxxl -l section=wxxxxxr
failed to connect to 'hubble-relay.kube-system:80': connection error: desc = "transport: error while dialing: dial tcp 10.233.3.157:80: connect: connection refused"
root@k8ststworker-3:/home/cilium# host hubble-relay.kube-system
bash: host: command not found
root@k8ststworker-3:/home/cilium# getent hosts hubble-relay.kube-system
10.233.3.157    hubble-relay.kube-system.svc.cluster.local
root@k8ststworker-3:/home/cilium# exit

service/hubble-relay listens on 443, not 80. Note that the hubble-relay pod has been scheduled to a worker node:

$ kubectl get svc,ep,pod -l k8s-app=hubble-relay -n kube-system -o wide
NAME                           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE   SELECTOR
service/hubble-relay           ClusterIP   10.233.3.157   <none>        443/TCP    42d   k8s-app=hubble-relay
service/hubble-relay-metrics   ClusterIP   None           <none>        9966/TCP   42d   k8s-app=hubble-relay

NAME                             ENDPOINTS            AGE
endpoints/hubble-relay           10.233.65.168:4245   42d
endpoints/hubble-relay-metrics   <none>               42d

NAME                               READY   STATUS    RESTARTS   AGE   IP              NODE             NOMINATED NODE   READINESS GATES
pod/hubble-relay-b4df78f74-p2rhd   1/1     Running   0          11h   10.233.65.168   k8ststworker-3   <none>           <none>
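For reference, the 443 -> 4245 mapping can be read straight from the service spec (a quick check, assuming the default kube-system install):

$ kubectl get svc hubble-relay -n kube-system -o jsonpath='{.spec.ports[0].port} -> {.spec.ports[0].targetPort}{"\n"}'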

So let's try port 443 from the same worker node:

$ kubectl exec -ti pod/cilium-2q56l -c cilium-agent -n kube-system -- bash
root@k8ststworker-3:/home/cilium# HUBBLE_SERVER=hubble-relay.kube-system:443 hubble observe --since 1m --namespace sxxxxxxl -l section=wxxxxxr
failed to connect to 'hubble-relay.kube-system:443': context deadline exceeded: connection error: desc = "error reading server preface: EOF"

Conclusions:

  • At first I suspected a network issue between my workstation and the cluster that would prevent port-forwarding for some ports - knowing that port-forwarding works fine with all other services in other namespaces of the same cluster
  • But now, after testing further, I can see the issue from inside the cluster too

Any idea?

@rolinh
Member

rolinh commented Feb 20, 2024

Any idea?

There's a new bit of information: the hubble-relay service runs on port 443. This means you have enabled TLS for the Relay service, so you need to enable TLS on the Hubble CLI side:

$ hubble help observe | grep -i tls
      --tls                           Specify that TLS must be used when establishing a connection to a Hubble server.
                                      By default, TLS is only enabled if the server address starts with 'tls://'.
      --tls-allow-insecure            Allows the client to skip verifying the server's certificate chain and host name.
                                      This option is NOT recommended as, in this mode, TLS is susceptible to machine-in-the-middle attacks.
                                      See also the 'tls-server-name' option which allows setting the server name.
      --tls-ca-cert-files strings     Paths to custom Certificate Authority (CA) certificate files.The files must contain PEM encoded data.
      --tls-client-cert-file string   Path to the public key file for the client certificate to connect to a Hubble server (implies TLS).
      --tls-client-key-file string    Path to the private key file for the client certificate to connect a Hubble server (implies TLS).
      --tls-server-name string        Specify a server name to verify the hostname on the returned certificate (eg: 'instance.hubble-relay.cilium.io').
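With the port-forward active, that would look something like this (a sketch; --tls-allow-insecure skips certificate verification and is only suitable for testing):

$ cilium hubble port-forward &
$ hubble status --tls --tls-allow-insecure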

@ledroide
Author

Issue is SOLVED thanks to @rolinh. It was a TLS issue.

$ cilium hubble port-forward &
$ hubble config set tls true
$ hubble config set tls-allow-insecure true
$ 
$ cat ~/.config/hubble/config.yaml
tls: true
tls-allow-insecure: true
$ 
$ hubble status
Healthcheck (via localhost:4245): Ok
Current/Max Flows: 24,570/24,570 (100.00%)
Flows/s: 51.15
Connected Nodes: 6/6
$ 
$ hubble observe -n trivy-system --since 1m
Feb 20 14:05:47.862: trivy-system/trivy-operator-69cff49598-kwvgq:44670 (ID:3610) <- 100.94.2.25:443 (kube-apiserver) to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb 20 14:05:47.862: trivy-system/trivy-operator-69cff49598-kwvgq:44670 (ID:3610) -> 100.94.2.25:6443 (kube-apiserver) to-stack FORWARDED (TCP Flags: ACK)
Feb 20 14:05:49.827: 10.233.70.129:49776 (host) -> trivy-system/trivy-operator-69cff49598-kwvgq:9090 (ID:3610) to-endpoint FORWARDED (TCP Flags: SYN)
Feb 20 14:05:49.827: 10.233.70.129:49776 (host) <- trivy-system/trivy-operator-69cff49598-kwvgq:9090 (ID:3610) to-stack FORWARDED (TCP Flags: SYN, ACK)
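As a follow-up, a verified connection is also possible instead of tls-allow-insecure, by pointing the CLI at the CA that signed the relay's server certificate - a sketch, where the CA file path is a placeholder and the server name is the default example from the help text above (both depend on how TLS was provisioned):

$ hubble config set tls-ca-cert-files /path/to/hubble-relay-ca.crt
$ hubble config set tls-server-name instance.hubble-relay.cilium.io
$ hubble config set tls-allow-insecure false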
