Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add velero_backup_active_total metric #5703

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mrnold
Copy link
Contributor

@mrnold mrnold commented Dec 16, 2022

Signed-off-by: Matthew Arnold [email protected]

Please add a summary of your change

Add a backup_active_total metric, as requested in #2254. This metric counts the number of backups in progress.

Does your change fix a particular issue?

Fixes #2254

Please indicate you've done the following:

  • Accepted the DCO. Commits without the DCO will delay acceptance.
  • Created a changelog file or added /kind changelog-not-required as a comment on this pull request.
  • Updated the corresponding documentation in site/content/docs/main.

@reasonerjt
Copy link
Contributor

As is pointed out in this comment:
#2254 (comment)

The active backup should always be 0 or 1, I'm not sure if we need a gauge here.

@sseago
Copy link
Collaborator

sseago commented Dec 19, 2022

@reasonerjt I think this is a useful change:

  1. The difference between 0 and 1 is significant here -- it tells us whether velero is currently backing something up
  2. Parallel/multi-threaded backup support is on the roadmap. Once that is in place, this metric will then let us know how many backups are currently being processed.

@stale
Copy link

stale bot commented Feb 18, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the staled label Feb 18, 2023
@stale stale bot removed the staled label Feb 21, 2023
@stale
Copy link

stale bot commented Apr 26, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the staled label Apr 26, 2023
@shubham-pampattiwar
Copy link
Collaborator

This issue is not stale

@stale stale bot removed the staled label Apr 26, 2023
@mrnold
Copy link
Contributor Author

mrnold commented Apr 26, 2023

Thanks, I am working on getting this rebased. I think it might make more sense with all the async work that's been done, I will try it and see.

@reasonerjt reasonerjt requested a review from sseago May 10, 2024 08:37
@reasonerjt reasonerjt added the Needs triage We need discussion to understand problem and decide the priority label May 10, 2024
Copy link

codecov bot commented May 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 59.01%. Comparing base (ebbeb7a) to head (6a98066).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5703      +/-   ##
==========================================
+ Coverage   58.99%   59.01%   +0.01%     
==========================================
  Files         367      367              
  Lines       38847    38864      +17     
==========================================
+ Hits        22918    22935      +17     
  Misses      14467    14467              
  Partials     1462     1462              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

sseago
sseago previously approved these changes May 10, 2024
@sseago
Copy link
Collaborator

sseago commented May 10, 2024

Now that we have an async phase of Backup after InProgress, this value will not just be 0 or 1 -- if one backup is InProgress and two are WaitingForPluginOperations, then we'll see a value of 3.

@shubham-pampattiwar
Copy link
Collaborator

@blackpiglet
Copy link
Contributor

The change looks good, but we still need to make this PR rebase to merge.

@blackpiglet
Copy link
Contributor

@Lyndon-Li Please also take a look.

@mrnold
Copy link
Contributor Author

mrnold commented Aug 30, 2024

I finally remembered to watch this while running some backups, and it does go above 1 now:

$ watch -n 1 "curl -s localhost:8085/metrics | grep active"
Every 1.0s: curl -s localhost:8085/metrics | grep active kedge: Fri Aug 30 16:43:03 2024

#HELP velero_backup_active_total Number of backups currently in progress
#TYPE velero_backup_active_total gauge
velero_backup_active_total 2

@reasonerjt
Copy link
Contributor

reasonerjt commented Oct 15, 2024

@mrnold Would you mind rebase to squash the commits?
Ideally, only one commit is needed for this PR.

@kaovilai
Copy link
Contributor

@reasonerjt

Per doc, you should be able to Squash and merge too right?

One could perhaps set this as default merge method.

@mrnold
Copy link
Contributor Author

mrnold commented Oct 15, 2024

Okay, this is rebased and squashed into one commit now.

@@ -121,6 +121,7 @@ func (r *backupFinalizerReconciler) Reconcile(ctx context.Context, req ctrl.Requ
velerov1api.BackupPhaseFailed,
velerov1api.BackupPhaseFailedValidation:
r.backupTracker.Delete(backup.Namespace, backup.Name)
r.metrics.DecrementActiveBackupTotal()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't we be decrementing twice? If not, please add a comment which decrement is the common happy path case and which is fallback/failure etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think I was trying to match every decrement with every Delete, but I guess this one only runs if the backup didn't get deleted for some reason? That would cause extra decrements, so I will take this one out.

@sseago
Copy link
Collaborator

sseago commented Oct 15, 2024

@kaovilai I'd think "squash and merge" would be better than the merge commit option. Merge commits make the history messy.

This counts the number of backups currently in progress.

Signed-off-by: Matthew Arnold <[email protected]>

Better metrics function names.

Signed-off-by: Matthew Arnold <[email protected]>

Update copyright notice as per contribution guidelines.

Signed-off-by: Matthew Arnold <[email protected]>

Add changelog file.

Signed-off-by: Matthew Arnold <[email protected]>

Rework backup_active_total after rebase.

Signed-off-by: Matthew Arnold <[email protected]>

Remove extraneous decrement.

Signed-off-by: Matthew Arnold <[email protected]>
@kaovilai
Copy link
Contributor

@kaovilai I'd think "squash and merge" would be better than the merge commit option. Merge commits make the history messy.

I did not disagree :D

@shubham-pampattiwar
Copy link
Collaborator

/assign @mpryc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has-changelog Needs triage We need discussion to understand problem and decide the priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add velero_backup_active_total metric
7 participants