Unable to upgrade Keycloak after Velero Backup / Restore #718

Closed
tworcester opened this issue Aug 30, 2024 · 4 comments · Fixed by #812

@tworcester

Environment

Device and OS: RHEL 8
App version: [v0.24.1]
Kubernetes distro being used: zarf k3s 1.28
Other:

Steps to reproduce

  1. Deploy UDS-core with Velero configured for EnableCSI (in this case, I was using aws-ebs-csi-driver).
  2. Trigger a backup: k -n velero exec -it deploy/velero -- velero backup create --from-schedule=velero-udsbackup
  3. Stand up another cluster.
  4. Deploy UDS-core on the new cluster (ideally you would install only Velero, but I didn't have access to just that zarf package).
  5. Trigger a restore: k -n velero exec -it deploy/velero -- velero restore create --from-backup=<reference> --existing-resource-policy=update --include-namespaces=keycloak
  6. Try to upgrade UDS-core to a newer version (or re-install the same version).

Expected result

Successful upgrade of UDS-core on a cluster with velero restored UDS-core components.

Actual Result

PVCs that are restored have these fields in the spec due to the restoration options:

spec:
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: velero-keycloak-data-12345
  dataSourceRef:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: velero-keycloak-data-12345

When you try to re-install or upgrade to a new version, the Keycloak deployment fails with an error saying that these fields are immutable.

Visual Proof (screenshots, videos, text, etc)

N/A

Severity/Priority

High - I vote this is pretty high, since Velero is supposed to be the disaster recovery mechanism if an entire cluster goes down. This bug would prevent you from ever doing updates or maintenance on UDS-core.

Additional Context

N/A

tworcester added the possible-bug label (Something may not be working) Aug 30, 2024
mjnagel self-assigned this Sep 16, 2024
mjnagel added this to the 0.28.0 milestone Sep 17, 2024
@mjnagel
Contributor

mjnagel commented Sep 23, 2024

I was able to reproduce this issue today. I still need to identify where this limitation comes from, but it appears that restored PVCs end up with a 1Gi size. When re-deploying core, the deployment tries to shrink the PVC to the default size of 512Mi, which fails since PVCs cannot be shrunk.
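The shrink failure described above can be illustrated with a minimal sketch. The helper names `to_bytes` and `would_shrink` are hypothetical, not part of UDS Core, Velero, or kubectl, and the sketch only handles whole-number quantities with binary suffixes (Ki, Mi, Gi, Ti) or plain bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of any real tool) for comparing PVC sizes.

# Convert a whole-number Kubernetes quantity such as 512Mi or 1Gi to bytes.
to_bytes() {
  local q="$1" num unit
  num="${q%%[KMGT]i}"   # strip a trailing Ki/Mi/Gi/Ti suffix, if present
  unit="${q#"$num"}"    # whatever was stripped is the unit
  case "$num" in
    ''|*[!0-9]*) echo "unsupported quantity: $q" >&2; return 1 ;;
  esac
  case "$unit" in
    '') echo "$num" ;;
    Ki) echo $(( num * 1024 )) ;;
    Mi) echo $(( num * 1024 * 1024 )) ;;
    Gi) echo $(( num * 1024 * 1024 * 1024 )) ;;
    Ti) echo $(( num * 1024 * 1024 * 1024 * 1024 )) ;;
    *)  echo "unsupported quantity: $q" >&2; return 1 ;;
  esac
}

# Succeed (exit 0) when the requested size is smaller than the current size,
# i.e. when applying the request would mean shrinking the PVC.
# Usage: would_shrink <current> <requested>
would_shrink() {
  local current requested
  current="$(to_bytes "$1")" || return 2
  requested="$(to_bytes "$2")" || return 2
  [ "$requested" -lt "$current" ]
}
```

With the sizes from this thread, `would_shrink 1Gi 512Mi` succeeds, which corresponds to the re-deploy being rejected, while `would_shrink 1Gi 1Gi` does not.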

Updating the default size should be a small, safe change that will fix this issue - will have a PR up by tomorrow to resolve this.

@mjnagel
Contributor

mjnagel commented Sep 23, 2024

Also wanted to note, we would generally recommend an external DB (e.g. RDS or even an in-cluster Postgres) when running in production. This allows scalability and doesn't rely on the PVCs, which can simplify backup/restore (you can leverage the backup/restore capabilities of tools like RDS). We have a short doc/example on Keycloak HA with a database here: https://uds.defenseunicorns.com/core/configuration/resource-configuration-and-ha/#keycloak

mjnagel added the bug label (Something isn't working) and removed the possible-bug label (Something may not be working) Sep 23, 2024
@mjnagel
Contributor

mjnagel commented Sep 24, 2024

@tworcester in further digging we discovered that this is a limitation specific to EBS. EBS volumes have a minimum size of 1Gi, so when the snapshots are created and used for a restore, Velero only has the snapshot size to go off of, not the original PVC size. I don't think we're going to update the default PVC size in this case, since it could cause issues on upgrade if someone's storage class does not allow volume expansion, but you should still have a few options here:

  • Use bundle overrides to set a larger PVC size (1Gi+), which will ensure that the restored volumes are the same size as the requested PVCs.
  • Switch to RDS/external DB for Keycloak's storage: this is definitely the ideal option for a more persistent staging/prod environment.
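Before choosing between these options, it may help to see what sizes the restored PVCs actually have and whether the storage class allows expansion. The following is a minimal sketch, assuming `kubectl` access to the cluster and the `keycloak` namespace used in the restore command above. The function names and the `KUBECTL` override variable are hypothetical conveniences, not part of UDS Core or Velero.

```shell
#!/usr/bin/env bash
# Hypothetical inspection helpers. KUBECTL may be set to another command
# (for example a wrapper or a test stub); it defaults to kubectl.

# Print name, requested size, and provisioned capacity for each PVC
# in the given namespace (defaults to keycloak).
pvc_sizes() {
  local ns="${1:-keycloak}"
  "${KUBECTL:-kubectl}" -n "$ns" get pvc \
    -o custom-columns='NAME:.metadata.name,REQUESTED:.spec.resources.requests.storage,CAPACITY:.status.capacity.storage'
}

# Print each StorageClass along with its allowVolumeExpansion setting.
# Classes that do not set the field show <none>.
storageclass_expansion() {
  "${KUBECTL:-kubectl}" get storageclass \
    -o custom-columns='NAME:.metadata.name,EXPANDABLE:.allowVolumeExpansion'
}
```

Running `pvc_sizes keycloak` on the restored cluster should show whether the Keycloak PVCs came back larger than the size the deployment requests, and `storageclass_expansion` shows whether raising the requested size on an existing cluster is likely to be accepted.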

We're likely going to make some slight changes to our pre-req notes on storage class to also help here.

@tworcester
Author

Awesome, thank you for tracking this down!

noahpb pushed a commit that referenced this issue Sep 25, 2024
## Description

EBS imposes a 1Gi minimum size on restored PVCs. This adds a short
note to pre-reqs about checking CSI limitations.

While testing with our EKS IAC I also discovered a few other issues:
- IRSA annotations were not correct
- Config did not properly variablize region
- Config had an unmatched `"` around one of the values
- Gitignore did not exclude terraform/tfstate files that shouldn't be
committed

## Related Issue

Fixes #718

## Type of change

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Other (security config, docs update, etc)

## Checklist before merging

- [x] Test, docs, adr added or updated as needed
- [x] [Contributor
Guide](https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md)
followed
docandrew pushed a commit that referenced this issue Sep 25, 2024
UnicornChance pushed a commit that referenced this issue Sep 26, 2024
docandrew pushed a commit that referenced this issue Oct 17, 2024