Snapshot metrics #121
Comments
@JohnStrunk: Here are the additional stats that we are requesting:
Our Helm chart based YAML files are snapschedule.yaml and snapshotquota.yaml.
@JohnStrunk: Let me know if this looks okay to you.
I think I'd like to limit the metrics to objects that SnapScheduler actually manages (i.e., not report on all snapshots, just those created from a schedule).
The trick is to get metrics that are useful, not too difficult to implement, and don't have terribly high cardinality for Prometheus.
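To make the cardinality point concrete, here is a minimal sketch (not SnapScheduler's implementation) of counting only schedule-created snapshots and keying the results by namespace and schedule name alone. The label key `SCHEDULE_LABEL`, the metric names, and the dictionary shape of the snapshot objects are assumptions made for illustration.

```python
from collections import Counter

# Assumed label key that marks a snapshot as created from a schedule.
SCHEDULE_LABEL = "snapscheduler.backube/schedule"


def count_managed_snapshots(snapshots):
    """Return (total, ready) Counters keyed by (namespace, schedule_name).

    Snapshots without the schedule label are skipped, so the number of
    label combinations grows with schedules, not with snapshots.
    """
    total, ready = Counter(), Counter()
    for snap in snapshots:
        meta = snap.get("metadata", {})
        schedule = meta.get("labels", {}).get(SCHEDULE_LABEL)
        if schedule is None:
            continue  # not created from a schedule, so not reported
        key = (meta.get("namespace", ""), schedule)
        total[key] += 1
        if snap.get("status", {}).get("readyToUse") is True:
            ready[key] += 1
    return total, ready


def render(name, counts):
    """Format counts as Prometheus text exposition sample lines."""
    return [
        f'{name}{{namespace="{ns}",schedule_name="{sched}"}} {n}'
        for (ns, sched), n in sorted(counts.items())
    ]
```

With three snapshots from one schedule and any number of manual snapshots, this yields one sample line per metric, regardless of how many snapshots exist.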
"(i.e., not report on all snapshots, just those created from a schedule)." Agreed. readyToUse boolean flag based on
I was hoping the ready_total vs total would be sufficient for that use case. Could you explain a bit more about the need for match labels and VSC in the metrics? I'm particularly concerned about encoding the labels. If the labels and the VSC are determined by the SnapshotSchedule object, wouldn't its name/namespace be sufficient?
@JohnStrunk: Here is our use case. We are backing up a few StatefulSet services under a specific namespace, and they are currently identified by the "app" label. The ask is to notify the Ops team when a backup fails so that they can take a look and fix the issue. We use Prometheus to scrape the "metrics" endpoint, which feeds Alertmanager, which in turn notifies PagerDuty and Slack. Currently, there is a single VSC tied to "ebs.csi.aws.com", but later we want to connect to different drivers such as EFS and create a separate VSC for each, so a 1-1 mapping.
Now, this snapshot schedule taps into 3 different EBS volumes for the "app" cluster. We want to get notified if:
My thought here is that you'd monitor the "app-snapshot" schedule (by filtering on schedule_name) and expect 3 new ready snaps every day.
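The check described above can be sketched as follows: take the samples of a ready-snapshot counter for one schedule, compute its increase over the last day, and compare that to the expected number of new ready snapshots. The function names and the `(timestamp, value)` sample format are hypothetical, and the sketch assumes samples are sorted by timestamp and ignores counter resets.

```python
def counter_increase(samples, now, window_seconds=86400):
    """Increase of a counter over the window ending at `now`.

    `samples` is a list of (timestamp, value) pairs sorted by timestamp.
    Counter resets are not handled in this sketch.
    """
    in_window = [
        value for ts, value in samples if now - window_seconds <= ts <= now
    ]
    if len(in_window) < 2:
        return 0  # not enough data to measure an increase
    return in_window[-1] - in_window[0]


def schedule_is_healthy(samples, now, expected_new_ready=3, window_seconds=86400):
    """True if at least the expected number of snapshots became ready."""
    return counter_increase(samples, now, window_seconds) >= expected_new_ready
```

An alerting rule would fire when `schedule_is_healthy` is false for the samples filtered to the "app-snapshot" schedule.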
@JohnStrunk: Agreed. So far the plan looks good. Let me know once the implementation is done. I can test it and let you know how it goes.
@JohnStrunk: Just a gentle reminder: are there any updates? For us, having observability into backups is a high priority.
While it's on my list of items I'd like to add, I don't have a timeline for you.
Any updates yet, @JohnStrunk?
Seems like this is abandoned.
As I said before... I'd be happy to provide guidance if someone wants to contribute a PR. However, there doesn't seem to be sufficient interest in this feature for anyone to make it happen.
Hi @JohnStrunk, I would like to try to implement this. As I understand it, there are 4 required metrics
@KyriosGN0 That seems like a good summary. Thanks for offering to take a look!
I hope a more general metrics solution (kube-state-metrics in my case) will eventually add support for VolumeSnapshot and VolumeSnapshotContent metrics.
Describe the feature you'd like to have.
Currently, snapscheduler doesn't provide any metrics related to the snapshots attempted/created. It would be good to provide some stats that could be monitored and alerted on.
What is the value to the end user? (why is it a priority?)
Users who depend on having snapshots to protect their data should have a way to monitor whether those snapshots are being successfully created.
How will we know we have a good solution? (acceptance criteria)
Additional context
cc: @prasanjit-enginprogam