Describe the feature you'd like to have.
Similar to what k8up ("application-aware") and velero ("hooks") provide, I propose to include a mechanism to run pre/post-snapshot commands via snapscheduler's operator. Let's simply call these commands hooks.
What is the value to the end user? (why is it a priority?)
For example, this would allow dumping a database straight onto the disk or running other WAL-flush / fs-freeze / fs-flush tooling before the CSI snapshot is actually triggered (as part of a pre-snapshot hook). After a snapshot is completed, a post hook may run, e.g. for fs-unfreeze handling.
How will we know we have a good solution? (acceptance criteria)
General design
In my opinion, and as a starting point, this feature should be implemented by directly passing off commands to already running containers/pods (kubectl exec from the operator). The major advantage of this approach is immediate support for all PV access modes, especially when it comes to ReadWriteOncePod and ReadWriteOnce (though this could also be mitigated by using proper affinity settings for same-host execution).
With this approach, there is no need to inject sidecar containers or run additional pods/containers (e.g. a proper k8s Job) for this feature. Pre and post hooks could be anything and only depend on the tools/shell available within the targeted running container where they are executed. In contrast to k8up/velero, these commands should not produce a stream/artifact that needs to be processed by the operator; only the exit code is inspected. Hooks will typically do something on a mounted disk.
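For illustration, here is a minimal sketch of what the dispatch could look like from the operator's side, using client-go's remotecommand machinery (the same mechanism kubectl exec uses). The `runHook` helper and its signature are my own assumptions for discussion, not an existing snapscheduler API:

```go
package hooks

import (
	"bytes"
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/remotecommand"
)

// runHook executes a hook command inside a running container, like
// `kubectl exec` does. A nil return means the command exited 0; a
// non-zero exit (or a transport failure) surfaces as a non-nil error,
// which is all the operator needs to inspect.
func runHook(ctx context.Context, cfg *rest.Config, cs kubernetes.Interface,
	namespace, pod, container string, command []string) error {
	req := cs.CoreV1().RESTClient().Post().
		Resource("pods").
		Namespace(namespace).
		Name(pod).
		SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Container: container,
			Command:   command,
			Stdout:    true,
			Stderr:    true,
		}, scheme.ParameterCodec)

	executor, err := remotecommand.NewSPDYExecutor(cfg, "POST", req.URL())
	if err != nil {
		return err
	}

	// Output is drained but deliberately ignored: per the design above,
	// hooks produce no artifact, only an exit code.
	var stdout, stderr bytes.Buffer
	return executor.StreamWithContext(ctx, remotecommand.StreamOptions{
		Stdout: &stdout,
		Stderr: &stderr,
	})
}
```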
Out of scope
The obvious limitation is that pre and post hooks will not work if the pod/container the operator is targeting is not currently running. This is, however, fine in my opinion. The option to mark a hook's execution as optional might even be a nice-to-have feature (skipIfNoSelectorFound); a sketch of that decision logic follows below.
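To make the skip-vs-fail semantics concrete, here is a hypothetical sketch of how the operator could resolve a hook's target pod (function name and exact behavior are assumptions for discussion):

```go
import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// resolveTargetPod finds a running pod matching the hook's podSelector.
// Returning (nil, nil) means "skip this hook"; an error fails the snapshot.
func resolveTargetPod(ctx context.Context, cs kubernetes.Interface,
	namespace string, selector *metav1.LabelSelector,
	skipIfNoSelectorFound bool) (*corev1.Pod, error) {
	sel, err := metav1.LabelSelectorAsSelector(selector)
	if err != nil {
		return nil, err
	}
	pods, err := cs.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: sel.String(),
	})
	if err != nil {
		return nil, err
	}
	// Only a currently running pod can host an exec'd hook command.
	for i := range pods.Items {
		if pods.Items[i].Status.Phase == corev1.PodRunning {
			return &pods.Items[i], nil
		}
	}
	if skipIfNoSelectorFound {
		return nil, nil // hook is optional: skip it, snapshot proceeds
	}
	return nil, fmt.Errorf("no running pod matches hook selector in %s", namespace)
}
```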
CRD changes
There are 3 viable approaches when it comes to defining pre/post hooks:
1. as a pod-level annotation,
2. as part of a new CRD,
3. as part of the SnapshotSchedule CRD.
I'm not a fan of having annotations on pods for this feature, as manipulating the statefulsets/deployments (which typically manage those pods) will cause restarts/service disruptions (this is how k8up and velero do it). In my opinion, this design makes it hard to experiment with already running applications and thus hard to introduce into preexisting services.
Having a new CRD for this feature is overkill in my opinion, so I would propose having this as part of the SnapshotSchedule CRD.
We could do the following:
```yaml
---
apiVersion: snapscheduler.backube/v1
kind: SnapshotSchedule
metadata:
  name: hourly
  namespace: my-ns
spec:
  # [...]
  # pre hooks are executed within the container of a target pod in serial
  # order before the actual snapshot is triggered
  preHooks:
    # full example dumping a database
    - podSelector: # required (LabelSelector)
        matchLabels:
          app: my-database
      command: # required (^= a non-zero exit code will cause the snapshot to fail)
        - /bin/bash
        - -c
        - "pg_dump $DATABASE | gzip > /mnt/data-disk/dump.sql.gz"
      container: postgres # optional, defaults to first container in target pod
      namespace: my-ns # optional, defaults to same namespace as SnapshotSchedule
      timeoutSeconds: 60 # optional, defaults to 300 seconds (5 minutes)
      backoffLimit: 2 # optional, defaults to 0 (no retries, snapshot is immediately failed)
      skipIfNoSelectorFound: true # optional, defaults to false (^= the default will not create a snapshot if no pod matches the selector!)
    # minimal example freezing the fs
    - podSelector:
        matchLabels:
          app: nginx
      command: ["/sbin/fsfreeze", "--freeze", "/var/log/nginx"]
      container: fsfreeze
  # post hooks are executed within the container of a target pod in serial
  # order after the snapshot was triggered.
  # Note that post hooks may fail, but this will not cause the already
  # generated snapshot to vanish/fail!
  postHooks:
    # full example unfreezing the fs
    - podSelector:
        matchLabels:
          app: nginx
      command: ["/sbin/fsfreeze", "--unfreeze", "/var/log/nginx"]
      container: fsfreeze
      namespace: my-ns # optional, defaults to same namespace as SnapshotSchedule
      # These fields are NOT supported for postHooks:
      # timeoutSeconds, backoffLimit, skipIfNoSelectorFound
```
This is just a rough first draft and still lacks some details (e.g. how failure states are represented), but I think it's a good starting point for discussion.
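For discussion, the YAML above could map to Go API types roughly like this. This is a sketch only; the names, json tags, and defaults are taken from the draft above, not an agreed-upon API:

```go
import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// SnapshotHook mirrors one entry of preHooks/postHooks in the draft above.
type SnapshotHook struct {
	// PodSelector picks the target pod. Required.
	PodSelector metav1.LabelSelector `json:"podSelector"`
	// Command is exec'd in the target container; for pre hooks, a non-zero
	// exit code fails the snapshot. Required.
	Command []string `json:"command"`
	// Container defaults to the first container in the target pod.
	Container string `json:"container,omitempty"`
	// Namespace defaults to the SnapshotSchedule's namespace.
	Namespace string `json:"namespace,omitempty"`
	// The remaining fields would only be honored for pre hooks.
	TimeoutSeconds        *int64 `json:"timeoutSeconds,omitempty"`        // default: 300
	BackoffLimit          *int32 `json:"backoffLimit,omitempty"`          // default: 0 (no retries)
	SkipIfNoSelectorFound bool   `json:"skipIfNoSelectorFound,omitempty"` // default: false
}

// The existing SnapshotScheduleSpec would then gain two new fields:
//   PreHooks  []SnapshotHook `json:"preHooks,omitempty"`
//   PostHooks []SnapshotHook `json:"postHooks,omitempty"`
```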
Alternatives
An alternative approach may be to allow integrating full pod specifications (pre and post pods) into the SnapshotSchedule CRD, e.g. like k8up's pre backup pods. However, I really think this would get pretty complicated soon and would require a far more sophisticated job design.
Additional context
The fs freeze (fsfreeze -f [example-disk-location]) / unfreeze (fsfreeze -u [example-disk-location]) handling is a typical example from GCP's Linux application-consistent snapshots, which must be defined per host (of course unacceptable in a k8s environment).
I actually like that the hooks could target pods/containers in other namespaces (and I also love snapscheduler's SnapshotSchedule design, with the CRD as a namespaced resource). This would also make it possible to pass off actual hook handling to a centralized service in the cluster (in a specific namespace), so I think it's a good idea to include it.
I would need this at my dayjob (GKE clusters). While snapscheduler already integrates with GCP disk snapshots (well, it's CSI snapshots after all), it currently provides no major benefit when compared to GCP disk snapshot schedules. Having support for pre/post hooks at the CSI snapshot level would be super useful and also a quite unique feature that's hard to replicate outside the cluster in a managed solution.