Commit f592655

Add troubleshooting guide for non-cluster hosts and VMs setup (#2350)
1 parent 05d1eff commit f592655

6 files changed: +338 -2 lines changed

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
---
description: Troubleshoot non-cluster hosts and VMs setup
---

# Troubleshoot non-cluster hosts and VMs setup

This document provides guidance for troubleshooting Calico running on hosts and VMs outside of a cluster.

## Useful commands

These commands can help you collect logs and monitor system activities during troubleshooting.

### On non-cluster hosts or VMs

```bash
journalctl -xeu calico-node.service -f
journalctl -xeu calico-fluent-bit.service -f
```

### On the cluster side

```bash
kubectl logs -n calico-system -l k8s-app=calico-typha-noncluster-host
kubectl logs -n tigera-manager -l k8s-app=tigera-manager -c tigera-voltron
```

To monitor CertificateSigningRequests (CSRs), run:

```bash
kubectl get certificatesigningrequest -w
```

Monitoring CSRs is useful for debugging the certificates used for mutual TLS (mTLS) communication between Calico Node and Typha. The automatic CSR approval and signing flow can fail in several ways. For example:

- The CSR might not be created or submitted correctly.
- The Tigera Operator CSR controller might not process it.
- The Tigera Operator signer might reject the request because of invalid fields or missing permissions.

When such a failure occurs, the CSR status contains detailed conditions and error messages that help identify the root cause.

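As a rough sketch, the conditions recorded on a single CSR can be printed with a `jsonpath` query. The function name `csr_conditions` is hypothetical and not part of any Calico tooling; the CSR name is whatever appears in the watch output.

```bash
# Hypothetical helper: print type, reason, and message for each condition on a CSR.
csr_conditions() {
  local name="$1"
  kubectl get certificatesigningrequest "$name" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
}
```

For example, `csr_conditions <csr-name>` prints one line per condition. `kubectl describe certificatesigningrequest <csr-name>` shows the same information in a longer form.
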
## Common problems

### No internet connection after installing the Calico Node package

By default, $[prodname] blocks all traffic to and from host interfaces. You can use a profile with host endpoints to modify the default behavior. Apply the built-in profile `projectcalico-default-allow`, which allows all ingress and egress traffic. Host endpoints that use this profile have *allow-all* behavior instead of *deny-all* when no network policy is applied.

Example `HostEndpoint` with the `projectcalico-default-allow` profile:

```yaml
apiVersion: projectcalico.org/v3
kind: HostEndpoint
metadata:
  name: <endpoint-name>
spec:
  interfaceName: <interface-name>
  node: <node-hostname>
  expectedIPs: ["<list-of-expected-ips>"]
  profiles:
    - projectcalico-default-allow
```

### Certificate signed by unknown authority

If the certificate presented by the Kubernetes API server or Tigera Manager endpoint is not signed by a trusted Certificate Authority (CA), add the correct CA certificate to the system trust store. Alternatively, for the Calico fluent-bit log forwarder, you can temporarily disable TLS verification by setting `tls.verify` to `Off` in the `[OUTPUT]` section of the configuration file `/etc/calico/calico-fluent-bit/calico-fluent-bit.conf`:

```conf
[OUTPUT]
    ...
    tls.verify Off
    ...
```

:::note

Disable TLS verification only for testing or troubleshooting.

:::

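How a CA certificate is added to the system trust store depends on the distribution. The following is a minimal sketch, assuming a Debian-family or RHEL-family host; the function names `ca_trust_target` and `install_ca` are hypothetical, and the CA file path is yours to supply.

```bash
# Hypothetical helper: map an os-release ID to the directory for local CA
# certificates and the command that rebuilds the trust store.
ca_trust_target() {
  case "$1" in
    debian|ubuntu)
      echo "/usr/local/share/ca-certificates update-ca-certificates" ;;
    rhel|centos|rocky|almalinux|fedora)
      echo "/etc/pki/ca-trust/source/anchors update-ca-trust" ;;
    *)
      return 1 ;;
  esac
}

# Copy a PEM-encoded CA certificate into the trust store (run as root).
install_ca() {
  local ca_file="$1" os_id="$2" target dir cmd
  target="$(ca_trust_target "$os_id")" || { echo "unsupported OS: $os_id" >&2; return 1; }
  dir="${target% *}"
  cmd="${target#* }"
  # update-ca-certificates only picks up files that end in .crt
  cp "$ca_file" "$dir/$(basename "${ca_file%.*}").crt" && "$cmd"
}
```

A typical call would source `/etc/os-release` first and then run `install_ca /path/to/ca.pem "$ID"`. Services that have already loaded the old trust store may need a restart to pick up the change.
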
### No object can be associated with CSR error

If a CSR is denied with the following error:

```text
invalid: no object can be associated with CSR node-certs-noncluster-host:<hostname>
```

verify the following:

* A corresponding host endpoint resource exists for the non-cluster host or VM.
* The `spec.node` field in the host endpoint resource matches the non-cluster host name exactly.

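The two checks above can be sketched as a small script. This assumes that `projectcalico.org/v3` resources are reachable through `kubectl` in your cluster; the function names are hypothetical.

```bash
# List every HostEndpoint as "name<TAB>spec.node".
list_hep_nodes() {
  kubectl get hostendpoints.projectcalico.org \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.node}{"\n"}{end}'
}

# Hypothetical helper: read "name<TAB>node" lines on stdin and succeed only if
# some host endpoint's spec.node equals the given host name exactly
# (case-sensitive string comparison).
hep_node_matches() {
  awk -F '\t' -v want="$1" '($2 "") == (want "") { found = 1 } END { exit !found }'
}
```

For example, `list_hep_nodes | hep_node_matches <hostname>` exits with a non-zero status when no host endpoint references that host name. Use the host name from the CSR error message, which may differ from what the `hostname` command prints on the machine.
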
### Peer certificate does not have required CN

If the non-cluster host fails to connect to the dedicated Typha deployment, check that the certificate Common Name (CN) values are consistent on both sides.

On the non-cluster host or VM under the `/etc/calico/calico-node` folder:

* In `calico-node.conf`, verify the `TyphaCN` value matches the remote Typha server certificate CN, or
* In `calico-node.env`, verify the `FELIX_TYPHACN` value matches the remote Typha server certificate CN.

On the cluster side (`calico-system/calico-typha-noncluster-host` deployment):

* The `TYPHA_CLIENTCN` environment variable must match the CN used in the non-cluster node certificate.

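To see which CN a certificate actually carries, `openssl x509` can print its subject. The sketch below uses two hypothetical helpers, `cert_subject` and `subject_cn`; it assumes the certificate is PEM-encoded and that the CN contains no escaped commas.

```bash
# Print the subject of a PEM certificate in RFC 2253 form,
# for example "subject=CN=example,O=org".
cert_subject() {
  openssl x509 -in "$1" -noout -subject -nameopt RFC2253
}

# Hypothetical helper: extract the CN value from an RFC 2253 subject line on stdin.
subject_cn() {
  sed -n 's/^subject= *\(.*,\)\{0,1\}CN=\([^,]*\).*$/\2/p'
}
```

On the non-cluster host, `cert_subject /etc/calico/calico-node/calico-node.crt | subject_cn` prints the CN of the node certificate, which can then be compared with the `TYPHA_CLIENTCN` value on the cluster side.
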
### Certificate is not renewed or updated

The `calico-noncluster-host-init` process, which runs before the main `calico-node` service, is responsible for renewing certificates that are expired or near expiry. Certificates are renewed automatically when they are within 90 days of expiry.

If you need to force immediate renewal, manually delete the existing certificate (`calico-node.crt`) and private key (`calico-node.key`) under the `/etc/calico/calico-node` folder and restart the service.
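
A minimal sketch of checking the remaining validity and forcing renewal follows. The function names are hypothetical, and the script assumes that "the service" is `calico-node.service`; adjust it if your setup differs.

```bash
# Succeed if the certificate is still valid for more than the given number of days.
# `openssl x509 -checkend` takes a number of seconds.
cert_valid_for_days() {
  local cert="$1" days="$2"
  openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null
}

# Hypothetical helper: remove the certificate and key, then restart the service
# so that they are requested again.
# Assumption: the service to restart is calico-node.service.
force_cert_renewal() {
  local dir="${1:-/etc/calico/calico-node}"
  rm -f "$dir/calico-node.crt" "$dir/calico-node.key" &&
    systemctl restart calico-node.service
}
```

For example, `cert_valid_for_days /etc/calico/calico-node/calico-node.crt 90` fails once the certificate enters the 90-day renewal window. Run `force_cert_renewal` as root, and watch the CSR list on the cluster side to confirm that a new request is created and approved.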

calico-enterprise_versioned_sidebars/version-3.22-2-sidebars.json

Lines changed: 2 additions & 1 deletion
@@ -110,7 +110,8 @@
   },
   "items": [
     "getting-started/bare-metal/about",
-    "getting-started/bare-metal/typha-node-tls"
+    "getting-started/bare-metal/typha-node-tls",
+    "getting-started/bare-metal/troubleshoot"
   ]
 },
 {

calico-enterprise_versioned_sidebars/version-3.23-1-sidebars.json

Lines changed: 2 additions & 1 deletion
@@ -110,7 +110,8 @@
   },
   "items": [
     "getting-started/bare-metal/about",
-    "getting-started/bare-metal/typha-node-tls"
+    "getting-started/bare-metal/typha-node-tls",
+    "getting-started/bare-metal/troubleshoot"
   ]
 },
 {

sidebars-calico-enterprise.js

Lines changed: 1 addition & 0 deletions
@@ -90,6 +90,7 @@ module.exports = {
   items: [
     'getting-started/bare-metal/about',
     'getting-started/bare-metal/typha-node-tls',
+    'getting-started/bare-metal/troubleshoot',
   ],
 },
 {
