bug: obs.nerc Grafana Dashboards showing x509: certificate has expired #800

Closed
1 task
schwesig opened this issue Nov 5, 2024 · 14 comments
Labels: bug, observability, openshift

Comments

@schwesig
Member

schwesig commented Nov 5, 2024

follow up:


Motivation

When opening a dashboard in Grafana on obs.nerc
e.g. https://grafana.apps.obs.nerc.mghpcc.org/d/20241028a/ai4dd-v5?orgId=1
there is an error:

  • Error updating options: x509: certificate has expired or is not yet valid: current time 2024-11-05T15:24:28Z is after 2024-11-01T13:48:52Z
  • on
    • Templating [Namespace]
    • Templating [Cluster]
    • Templating [ModelName]

Completion Criteria

Opening the dashboards in Grafana obs, seeing the data and getting no cert error.

Description

  • First step to resolve the issue

Completion dates

Desired - 2024-11-06
Required - 2024-11-08

Image

/CC @schwesig @computate @RH-csaggin @jtriley @larsks

@schwesig schwesig added bug Something isn't working observability openshift This issue pertains to NERC OpenShift labels Nov 5, 2024
@computate
Member

computate commented Nov 5, 2024

The ACM Observability metrics endpoint on the infra cluster also shows a certificate error, but a different one: its validity dates are fine.

Image

Image

@larsks
Contributor

larsks commented Nov 5, 2024

Starting with the second issue first:

The certificate presented by https://observatorium-api-open-cluster-management-observability.apps.nerc-ocp-infra.rc.fas.harvard.edu/api/metrics/v1/default is signed by the observability-server-ca-certificate:

$ urlcert https://observatorium-api-open-cluster-management-observability.apps.nerc-ocp-infra.rc.fas.harvard.edu/api/metrics/v1/default | showcert
sha256 Fingerprint=4A:6C:A5:5C:69:D3:7D:3E:B8:EA:12:D1:5C:3B:D3:A2:AF:15:38:1C:43:5A:1C:23:BF:9E:76:86:9A:08:7A:03
subject=C=US, O=Red Hat, Inc., CN=observability-server-certificate
issuer=C=US, O=Red Hat, Inc., CN=observability-server-ca-certificate
notBefore=Aug 20 14:16:50 2024 GMT
notAfter=Aug 20 14:16:50 2025 GMT
X509v3 Subject Alternative Name:
    DNS:observability-server-certificate, DNS:observability-observatorium-api.open-cluster-management-observability.svc.cluster.local, DNS:observatorium-api-open-cluster-management-observability.apps.nerc-ocp-infra.rc.fas.harvard.edu

That CA isn't going to be trusted by anybody, hence the "certificate issuer is unknown" error. The correct fix is probably to change the corresponding route from passthrough to reencrypt so that the default ingress certificate is exposed to outside clients.
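
For anyone following along without the urlcert and showcert helpers used above (they appear to be local scripts), roughly the same summary can be produced with plain openssl. This is a minimal sketch, assuming OpenSSL 1.1.1 or newer; host_cert and show_cert are hypothetical names, not the helpers themselves:

```shell
# Hypothetical stand-in for showcert: summarize a PEM certificate read from stdin.
# Assumes OpenSSL 1.1.1 or newer (needed for the -ext option).
show_cert() {
  openssl x509 -noout -fingerprint -sha256 -subject -issuer -dates -ext subjectAltName
}

# Hypothetical stand-in for urlcert: print the PEM certificate a TLS server presents.
# Takes a host name (and optional port) rather than a full URL.
host_cert() {
  host="$1"
  port="${2:-443}"
  # </dev/null closes the connection as soon as the handshake is done
  openssl s_client -connect "${host}:${port}" -servername "${host}" </dev/null 2>/dev/null \
    | openssl x509
}
```

Usage would be along the lines of: host_cert observatorium-api-open-cluster-management-observability.apps.nerc-ocp-infra.rc.fas.harvard.edu | show_cert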

@larsks
Contributor

larsks commented Nov 5, 2024

Regarding the first problem, which certificate is resulting in the "certificate is expired or not yet valid" error?

@larsks larsks self-assigned this Nov 5, 2024
@computate
Member

computate commented Nov 5, 2024

The second problem sounds like ACM Observability suddenly broke with its passthrough Route TLS handling.

kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: observatorium-api
  namespace: open-cluster-management-observability
  uid: a7f4bf8b-eba5-456b-b9b8-71e2e1dc4802
  resourceVersion: '1261594129'
  creationTimestamp: '2023-11-02T13:48:51Z'
  annotations:
    openshift.io/host.generated: 'true'
  ownerReferences:
    - apiVersion: observability.open-cluster-management.io/v1beta2
      kind: MultiClusterObservability
      name: observability
      uid: bcc31c98-3269-4ffc-bcfd-76257a9600d0
      controller: true
      blockOwnerDeletion: true

@larsks
Contributor

larsks commented Nov 5, 2024

Another possible solution would be to configure grafana to trust the observability ca certificate.

@computate
Member

The first one relates to dex and the OAuth configuration for Grafana, specifically GF_TLSCLIENTCERT in the vault path nerc-ocp-infra/dex/grafanas:

Validity
       Not Before: 2023-11-02 13:48:52 +0000 UTC
       Not After : 2024-11-01 13:48:52 +0000 UTC

@larsks
Contributor

larsks commented Nov 5, 2024

The expired certificate in the oauth-client-secret secret (in the grafana namespace) looks like it was generated by the observability tools:

$ k extract secret/oauth-client-secret
GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET
GF_AUTH_GENERIC_TLSCACERT
GF_AUTH_GENERIC_TLSCLIENTCERT
GF_AUTH_GENERIC_TLSCLIENTKEY
$ showcert !$
showcert GF_AUTH_GENERIC_TLSCLIENTCERT
sha256 Fingerprint=51:E6:4F:CC:F6:D7:07:17:75:4B:00:F4:37:A3:74:EE:0D:31:EB:97:57:B8:25:DD:9A:A2:49:4E:AD:70:B8:0B
subject=C=US, O=Red Hat, Inc., CN=grafana
issuer=C=US, O=Red Hat, Inc., CN=observability-client-ca-certificate
notBefore=Nov  2 13:48:52 2023 GMT
notAfter=Nov  1 13:48:52 2024 GMT
X509v3 Subject Alternative Name:
    DNS:grafana

Note the issuer entry. This suggests there must be some mechanism to regenerate this certificate.
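
To confirm the expiry from the extracted file without reading the dates by eye, openssl's -checkend option can do the comparison. A minimal sketch, assuming the openssl CLI is available; cert_valid_for_days is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if the certificate in file $1 is still
# valid $2 days from now (default 30). Any openssl error, such as a missing
# file, also counts as "not valid".
cert_valid_for_days() {
  # -checkend N exits 0 if the certificate does not expire within N seconds
  openssl x509 -in "$1" -noout -checkend "$(( ${2:-30} * 86400 ))" >/dev/null
}
```

For example, cert_valid_for_days GF_AUTH_GENERIC_TLSCLIENTCERT 0 || echo "expired" checks the client certificate extracted above against the current time.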

@schwesig
Member Author

schwesig commented Nov 5, 2024

@computate
Member

computate commented Nov 5, 2024

@larsks @schwesig I updated the certs and keys described in this issue (observability-grafana-certs, observability-server-ca-certs) in nerc-ocp-obs/dex/grafanas vault (GF_TLSCLIENTCERT, GF_TLSCLIENTKEY, GF_TLSCACERT) and restarted the grafana pods to get Grafana working again!

oc --as system:admin -n open-cluster-management-observability get secret/observability-grafana-certs -o jsonpath='{.data.tls\.crt}' | base64 -d
oc --as system:admin -n open-cluster-management-observability get secret/observability-grafana-certs -o jsonpath='{.data.tls\.key}' | base64 -d
oc --as system:admin -n open-cluster-management-observability get secret/observability-server-ca-certs -o jsonpath='{.data.ca\.crt}' | base64 -d

It's still a temporary solution, good only until the new validity period ends:

        Validity
            Not Before: Aug 20 14:16:50 2024 GMT
            Not After : Aug 20 14:16:50 2025 GMT

@larsks
Contributor

larsks commented Nov 5, 2024

@computate @schwesig A neat command for dealing with files embedded in secrets (and configmaps) is the oc extract command; this will extract each key to a file in your local directory:

$ oc -n open-cluster-management-observability extract secret/observability-grafana-certs
ca.crt
tls.crt
tls.key
$ ls -l
ca.crt	tls.crt  tls.key

Saves you from the whole jsonpath/base64 dance.
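
If the goal is only to inspect a certificate rather than keep the files, oc extract can also write a single key to stdout, which pipes straight into openssl. A sketch under the assumption that you are logged in with oc and the key holds a PEM certificate; secret_cert_enddate is a hypothetical helper name:

```shell
# Hypothetical helper: print the notAfter date of a certificate stored in a secret.
# Arguments: namespace, secret name, key within the secret (default tls.crt).
secret_cert_enddate() {
  namespace="$1"
  secret="$2"
  key="${3:-tls.crt}"
  # --keys selects one key; --to=- writes its content to stdout instead of a file
  oc -n "$namespace" extract "secret/$secret" --keys="$key" --to=- \
    | openssl x509 -noout -enddate
}
```

For the secret above that would be: secret_cert_enddate open-cluster-management-observability observability-grafana-certs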

@schwesig
Member Author

schwesig commented Nov 6, 2024

FYI: thanks to @RH-csaggin for recommending it, and a shout-out to @dcommisso (https://github.com/dcommisso) for writing this great tool:
https://github.com/dcommisso/certexplorer

@schwesig
Member Author

schwesig commented Nov 6, 2024

Can we call this issue closed now?
I created a follow-up for next year.
Do we need an issue for finding a different solution?

@computate
Member

You can close this issue, @schwesig.

@schwesig schwesig closed this as completed Nov 6, 2024
4 participants