Describe the bug
While not frequent, I have had the VolSyncVolumeOutOfSync alert raised with role="source" that does not clear on its own. I have to restart the volsync application pod to clear the alert.
Steps to reproduce
I have a Prometheus alert defined as:
- alert: VolSyncVolumeOutOfSync
  annotations:
    summary: >-
      {{ $labels.obj_namespace }}/{{ $labels.obj_name }} volume
      is out of sync.
  expr: |
    volsync_volume_out_of_sync == 1
  for: 15m
  labels:
    severity: critical
I didn't notice exactly when the alert was raised. I suspect the initial run was delayed by a Restic repository secret issue, but the alert was definitely raised before the job's second run:
Upon noticing the alert, I checked the replicationsource, which shows that the initial run was fine:
Last Sync Duration: 24.08989161s
Last Sync Time: 2024-03-12T14:14:05Z
Latest Mover Status:
Logs: no parent snapshot found, will read all files
Added to the repository: 390.739 MiB (282.820 MiB stored)
processed 697 files, 944.763 MiB in 0:09
snapshot c3b827c6 saved
Restic completed in 13s
Result: Successful
Next Sync Time: 2024-03-12T16:00:00Z
And the next one has not been reached yet:
$ date -u +"%Y-%m-%dT%H-%M-%SZ"
2024-03-12T15-38-06Z
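The comparison above can also be done programmatically. This is only a minimal sketch: the helper names `parse_k8s_time` and `next_sync_pending` are hypothetical, the timestamps are copied from the status output and the `date` command above, and it assumes the status fields use the `YYYY-MM-DDTHH:MM:SSZ` format shown there.

```python
from datetime import datetime, timezone


def parse_k8s_time(value: str) -> datetime:
    """Parse a timestamp such as 2024-03-12T16:00:00Z as UTC (hypothetical helper)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def next_sync_pending(next_sync_time: str, now: datetime) -> bool:
    """Return True while the scheduled next sync is still in the future."""
    return now < parse_k8s_time(next_sync_time)


# "Now" as printed by the date command above (note the dashes in the time part).
now = datetime.strptime("2024-03-12T15-38-06Z", "%Y-%m-%dT%H-%M-%SZ").replace(
    tzinfo=timezone.utc
)

pending = next_sync_pending("2024-03-12T16:00:00Z", now)
remaining = parse_k8s_time("2024-03-12T16:00:00Z") - now
print(pending, remaining)  # True 0:21:54
```

So at the time of the check, the next scheduled sync was still about 22 minutes away.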
I expected the alert to clear after the next run. I waited for the next run, which was also successful, but the alert did not clear:
Last Sync Duration: 52.147254329s
Last Sync Time: 2024-03-12T16:00:52Z
Latest Mover Status:
Logs: using parent snapshot c3b827c6
Added to the repository: 31.496 MiB (9.310 MiB stored)
processed 697 files, 935.255 MiB in 0:04
snapshot ee5e4fe3 saved
Restic completed in 5s
Result: Successful
Next Sync Time: 2024-03-12T20:00:00Z
Expected behavior
I was expecting the VolSyncVolumeOutOfSync alert to clear after the next trigger run.
Actual results
The alert did not clear until I manually restarted the volsync application pod. After the restart, the alert cleared immediately and stayed cleared.
Additional context
I am not sure what is relevant in the volsync pod log. These are the logs filtered on the keyword unifi, which had the raised alert, before the restart:
After I restarted the volsync application pod, the alert cleared immediately and did not come back. These are the logs, again filtered on unifi, after the restart: