Fix GC hidden attribute in case of SIGTERM signal #760
base: master
The GC can be interrupted by a SIGTERM signal. If the signal arrives while the GC is modifying a volume's hidden flag, the in-memory state can diverge from what is on disk, with bad consequences.

In the situation below, the hidden flag of a volume has been changed on disk, but the cached value (`self.hidden`) in the Python process still holds the old value because a `util.CommandException` was thrown before it could be updated. As a result, a VDI that should not be hidden is still hidden after `_undoInterruptedCoalesceLeaf` runs, because the undo logic relied on the stale cached value.

Code:

```
def _setHidden(self, hidden=True):
    vhdutil.setHidden(self.path, hidden)
    # Exception! Next line is never executed.
    self.hidden = hidden
```

Trace:

```
Jun 5 09:15:50 r620-q6 SMGC: [563219] Removed vhd-parent from dce4b0fc(2.000G/170.336M?)
Jun 5 09:15:50 r620-q6 SMGC: [563219] Removed vhd-blocks from dce4b0fc(2.000G/170.336M?)
Jun 5 09:15:50 r620-q6 SM: [563219] ['/usr/bin/vhd-util', 'set', '--debug', '-n', '/var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/OLD_dce4b0fc-6ad1-4750-857b-45d8d2758503.vhd', '-f', 'hidden', '-v', '1']
Jun 5 09:15:50 r620-q6 SM: [563219] GC: recieved SIGTERM
Jun 5 09:15:50 r620-q6 SM: [563219] FAILED in util.pread: (rc -15) stdout: '', stderr: ''
Jun 5 09:15:50 r620-q6 SMGC: [563219] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Jun 5 09:15:50 r620-q6 SMGC: [563219] ***********************
Jun 5 09:15:50 r620-q6 SMGC: [563219] * E X C E P T I O N *
Jun 5 09:15:50 r620-q6 SMGC: [563219] ***********************
Jun 5 09:15:50 r620-q6 SMGC: [563219] _doCoalesceLeaf: EXCEPTION <class 'util.CommandException'>, Signalled 15
Jun 5 09:15:50 r620-q6 SMGC: [563219] File "/opt/xensource/sm/cleanup.py", line 2653, in _liveLeafCoalesce
Jun 5 09:15:50 r620-q6 SMGC: [563219] self._doCoalesceLeaf(vdi)
Jun 5 09:15:50 r620-q6 SMGC: [563219] File "/opt/xensource/sm/cleanup.py", line 2717, in _doCoalesceLeaf
Jun 5 09:15:50 r620-q6 SMGC: [563219] vdi._setHidden(True)
Jun 5 09:15:50 r620-q6 SMGC: [563219] File "/opt/xensource/sm/cleanup.py", line 1063, in _setHidden
Jun 5 09:15:50 r620-q6 SMGC: [563219] vhdutil.setHidden(self.path, hidden)
Jun 5 09:15:50 r620-q6 SMGC: [563219] File "/opt/xensource/sm/vhdutil.py", line 235, in setHidden
Jun 5 09:15:50 r620-q6 SMGC: [563219] ret = ioretry(cmd)
Jun 5 09:15:50 r620-q6 SMGC: [563219] File "/opt/xensource/sm/vhdutil.py", line 94, in ioretry
Jun 5 09:15:50 r620-q6 SMGC: [563219] errlist=[errno.EIO, errno.EAGAIN])
Jun 5 09:15:50 r620-q6 SMGC: [563219] File "/opt/xensource/sm/util.py", line 347, in ioretry
Jun 5 09:15:50 r620-q6 SMGC: [563219] return f()
Jun 5 09:15:50 r620-q6 SMGC: [563219] File "/opt/xensource/sm/vhdutil.py", line 93, in <lambda>
Jun 5 09:15:50 r620-q6 SMGC: [563219] return util.ioretry(lambda: util.pread2(cmd, text=text),
Jun 5 09:15:50 r620-q6 SMGC: [563219] File "/opt/xensource/sm/util.py", line 255, in pread2
Jun 5 09:15:50 r620-q6 SMGC: [563219] return pread(cmdlist, quiet=quiet, text=text)
Jun 5 09:15:50 r620-q6 SMGC: [563219] File "/opt/xensource/sm/util.py", line 217, in pread
Jun 5 09:15:50 r620-q6 SMGC: [563219] raise CommandException(rc, str(cmdlist), stderr.strip())
Jun 5 09:15:50 r620-q6 SMGC: [563219]
Jun 5 09:15:50 r620-q6 SMGC: [563219] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Jun 5 09:15:50 r620-q6 SMGC: [563219] *** UNDO LEAF-COALESCE
Jun 5 09:15:50 r620-q6 SMGC: [563219] Renaming parent back: dce4b0fc-6ad1-4750-857b-45d8d2758503 -> 056b6f93-66ff-460a-9354-157540b584a8
Jun 5 09:15:50 r620-q6 SMGC: [563219] Renaming /var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/dce4b0fc-6ad1-4750-857b-45d8d2758503.vhd -> /var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/056b6f93-66ff-460a-9354-157540b584a8.vhd
Jun 5 09:15:50 r620-q6 SMGC: [563219] Renaming child back to dce4b0fc-6ad1-4750-857b-45d8d2758503
Jun 5 09:15:50 r620-q6 SMGC: [563219] Renaming /var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/OLD_dce4b0fc-6ad1-4750-857b-45d8d2758503.vhd -> /var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/dce4b0fc-6ad1-4750-857b-45d8d2758503.vhd
Jun 5 09:15:50 r620-q6 SMGC: [563219] Updating the VDI record
Jun 5 09:15:50 r620-q6 SMGC: [563219] Set vhd-parent = 056b6f93-66ff-460a-9354-157540b584a8 for dce4b0fc(2.000G/8.500K?)
Jun 5 09:15:50 r620-q6 SMGC: [563219] Set vdi_type = vhd for dce4b0fc(2.000G/8.500K?)
Jun 5 09:15:50 r620-q6 SM: [563219] ['/usr/bin/vhd-util', 'set', '--debug', '-n', '/var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/056b6f93-66ff-460a-9354-157540b584a8.vhd', '-f', 'hidden', '-v', '1']
Jun 5 09:15:50 r620-q6 SM: [563219] pread SUCCESS
Jun 5 09:15:50 r620-q6 SMGC: [563219] *** leaf-coalesce undo successful
```

Therefore, a VDI affected by this problem remains hidden and cannot be used correctly without manual intervention:

```
Jun 5 09:16:29 r620-q6 SM: [566174] lock: released /var/lock/sm/f816795d-e7a9-43df-170c-23bc329607fc/sr
Jun 5 09:16:29 r620-q6 SM: [566174] ***** generic exception: vdi_clone: EXCEPTION <class 'xs_errors.SROSError'>, Failed to clone VDI [opterr=hidden VDI]
Jun 5 09:16:29 r620-q6 SM: [566174] File "/opt/xensource/sm/SRCommand.py", line 113, in run
Jun 5 09:16:29 r620-q6 SM: [566174] return self._run_locked(sr)
Jun 5 09:16:29 r620-q6 SM: [566174] File "/opt/xensource/sm/SRCommand.py", line 163, in _run_locked
Jun 5 09:16:29 r620-q6 SM: [566174] rv = self._run(sr, target)
Jun 5 09:16:29 r620-q6 SM: [566174] File "/opt/xensource/sm/SRCommand.py", line 270, in _run
Jun 5 09:16:29 r620-q6 SM: [566174] return target.clone(self.params['sr_uuid'], self.vdi_uuid)
Jun 5 09:16:29 r620-q6 SM: [566174] File "/opt/xensource/sm/FileSR.py", line 704, in clone
Jun 5 09:16:29 r620-q6 SM: [566174] return self._do_snapshot(sr_uuid, vdi_uuid, VDI.SNAPSHOT_DOUBLE)
Jun 5 09:16:29 r620-q6 SM: [566174] File "/opt/xensource/sm/FileSR.py", line 754, in _do_snapshot
Jun 5 09:16:29 r620-q6 SM: [566174] return self._snapshot(snapType, cbtlog, consistency_state)
Jun 5 09:16:29 r620-q6 SM: [566174] File "/opt/xensource/sm/FileSR.py", line 797, in _snapshot
Jun 5 09:16:29 r620-q6 SM: [566174] raise xs_errors.XenError('VDIClone', opterr='hidden VDI')
Jun 5 09:16:29 r620-q6 SM: [566174]
```

Signed-off-by: Ronan Abhamon <[email protected]>
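To make the failure mode concrete, here is a minimal, self-contained sketch that reproduces it without XenServer. `FakeVhdStore`, `CachedVDI`, `undo` and the `killed` parameter are hypothetical stand-ins, not names from `cleanup.py` or `vhdutil.py`. The sketch assumes, as described above, that the flag reaches the disk before the signalled command is reported as failed, and that the undo step decides whether to unhide the VDI from the cached value.

```python
class FakeVhdStore:
    """Hypothetical stand-in for the on-disk VHD hidden flags."""

    def __init__(self):
        self.hidden = {}

    def set_hidden(self, path, hidden, killed=False):
        # Assumption: the flag is written to disk first...
        self.hidden[path] = hidden
        if killed:
            # ...then the command is reported as signalled, like
            # "FAILED in util.pread: (rc -15)" in the trace.
            raise RuntimeError("Signalled 15")


class CachedVDI:
    """Hypothetical VDI mirroring the pre-fix pattern: cache updated after the write."""

    def __init__(self, store, path):
        self.store = store
        self.path = path
        self.hidden = store.hidden.get(path, False)

    def _setHidden(self, hidden=True, killed=False):
        self.store.set_hidden(self.path, hidden, killed)
        # Never reached when set_hidden raises, so the cache stays stale.
        self.hidden = hidden


def undo(vdi):
    # Hypothetical simplification of the undo step: it trusts the cache.
    if vdi.hidden:
        vdi._setHidden(False)


store = FakeVhdStore()
vdi = CachedVDI(store, "OLD_dce4b0fc.vhd")
try:
    vdi._setHidden(True, killed=True)
except RuntimeError:
    undo(vdi)

print(vdi.hidden, store.hidden[vdi.path])  # False True
```

The two printed values disagree: the cache says the VDI is visible while the store still marks it hidden, so the undo step skips unhiding it. That is the state in which the later `vdi_clone` fails with `hidden VDI`.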
```
def _setHidden(self, hidden=True):
    self._hidden = None
```
Seems this line is not necessary?
Yes, it is necessary: it forces the code to re-read the value from the VHD. This is the core of the fix. Clearing the cached state means that if the next call fails, the value will have to be read from the file.
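The invalidate-then-reload idea from the reply above can be sketched as follows. This is not the code merged in the PR: `FakeVhdStore`, `undo`, the `killed` parameter and the `hidden` property that lazily reloads `_hidden` are assumptions for illustration. The key point is that `_setHidden` clears the cache before writing, so an interrupted write leaves the cache empty rather than stale.

```python
class FakeVhdStore:
    """Hypothetical stand-in for the on-disk VHD hidden flags."""

    def __init__(self):
        self.hidden = {}
        self.reads = 0  # counts how often the "disk" is consulted

    def get_hidden(self, path):
        self.reads += 1
        return self.hidden.get(path, False)

    def set_hidden(self, path, hidden, killed=False):
        # Assumption: the flag is written, then the command may be signalled.
        self.hidden[path] = hidden
        if killed:
            raise RuntimeError("Signalled 15")


class VDI:
    """Hypothetical VDI with a lazily reloaded hidden flag."""

    def __init__(self, store, path):
        self.store = store
        self.path = path
        self._hidden = None  # None means "unknown, read it from disk"

    @property
    def hidden(self):
        if self._hidden is None:
            self._hidden = self.store.get_hidden(self.path)
        return self._hidden

    def _setHidden(self, hidden=True, killed=False):
        # Invalidate first: if the write below is interrupted, the next
        # access to self.hidden re-reads the real on-disk value.
        self._hidden = None
        self.store.set_hidden(self.path, hidden, killed)
        self._hidden = hidden


def undo(vdi):
    # Hypothetical simplification of the undo step.
    if vdi.hidden:
        vdi._setHidden(False)


store = FakeVhdStore()
vdi = VDI(store, "OLD_dce4b0fc.vhd")
try:
    vdi._setHidden(True, killed=True)
except RuntimeError:
    undo(vdi)

print(vdi.hidden, store.hidden[vdi.path])  # False False
```

Here `undo` sees the real on-disk value (`True`), unhides the VDI, and both the cache and the store end up `False` after a single read from the store. Assigning the cache only after a successful write, as the original code did, cannot give this guarantee, because an exception skips the assignment.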
```
        else:
            VDI._loadInfoHidden(self)

    def _setHidden(self, hidden=True):
        if self.raw:
            self._hidden = None
```
You do not have to assign `None` here.