Lost lease while inactive #114

Open
TProhofsky opened this issue Oct 15, 2020 · 6 comments

Comments

@TProhofsky
Collaborator

One node out of 4 in the cluster lost its VG lease for an unknown reason.

Broadcast message from systemd-journald@node1 (Tue 2020-10-13 09:21:52 CDT):
lvmlockctl[25120]: Lost access to sanlock lease storage in VG sbvg_datalake.
Broadcast message from systemd-journald@node1 (Tue 2020-10-13 09:21:52 CDT):
lvmlockctl[25120]: Immediately deactivate LVs in VG sbvg_datalake.
Broadcast message from systemd-journald@node1 (Tue 2020-10-13 09:21:52 CDT):
lvmlockctl[25120]: Once VG is unused, run lvmlockctl --drop sbvg_datalake.

@TProhofsky
Collaborator Author

Issuing lvmlockctl -i -d shows kill_vg=1 on all nodes. Issuing a drop and lock start on just one node forces another kill event in less than 30 seconds. Issuing the drop on all nodes and then starting worked:
lvmlockctl --drop sbvg_datalake
vgchange --lock-start --lock-opt auto sbvg_datalake

@TProhofsky
Collaborator Author

FROM LEO:
I did some initial analysis; the log around the "kill_vg" trigger is:

2020-10-05 12:02:31 1465913 [21551]: <<<<< RAID lock dump: raid_renew_lock <<<<<
2020-10-05 12:02:31 1465913 [21551]: drive[0]=/dev/sg0 state=3
2020-10-05 12:02:31 1465913 [21551]: >>>>> RAID lock dump: raid_renew_lock >>>>>
2020-10-05 12:02:31 1465913 [21551]: idm_raid_multi_issue: start mutex op=ILM_OP_RENEW(4) mode=2 renew=1
2020-10-05 12:02:31 1465913 [21551]: _raid_state_find_op: state=IDM_LOCK(3) orignal op=ILM_OP_RENEW(4) op=ILM_OP_RENEW(4)
2020-10-05 12:02:31 1465913 [21551]: idm_raid_add_request: drive=/dev/sg0 state=IDM_LOCK(3) op=ILM_OP_RENEW(4) mode=2 renew=1 => raid_thread=0x7f5888003080
2020-10-05 12:02:31 1465913 [21551]: idm_raid_add_request: raid_thread=0x7f5888003080 renew_count=1
2020-10-05 12:02:31 1465913 [21551]: idm_raid_wait_renew: renew response [drive=/dev/sg0]
2020-10-05 12:02:31 1465913 [21551]: idm_raid_state_transition: drive=/dev/sg0 state=IDM_LOCK(3) -> next_state=IDM_LOCK(3) op=ILM_OP_RENEW(4) result=0
2020-10-05 12:02:31 1465913 [21551]: _raid_state_machine_end: state=3
2020-10-05 12:02:31 1465913 [21551]: idm_raid_multi_issue: drive result=0 mode=2 count=0
2020-10-05 12:02:31 1465913 [21551]: idm_raid_renew_lock: success

2020-10-22 14:05:17 2942079 [21551]: ilm_failure_handler: kill_path=/usr/sbin/lvmlockctl
2020-10-22 14:05:17 2942079 [21551]: ilm_lockspace_thread: has sent kill path or signal

So we can see that the lockspace thread (PID=21551) renewed the lock successfully, as the log line "idm_raid_renew_lock: success" shows; it then slept for 1 second and should have been woken up 1s later for the next renewal.

But the thread went to sleep on "2020-10-05" and was only woken up on "2020-10-22" (17 days later!). So the lockspace thread compared the current time with the previous renewal time, detected that the lock had timed out, and sent the kill signal (so lvmlockd finally received the "kill_vg" command).

Based on the log, I don't see that the drive firmware is relevant to this issue; so there are two potential causes:

  • The first possible reason is that the time is not reliable on the system; the lock manager relies on the time value to check whether a timeout has occurred. Currently the lock manager uses the following call to read the clock: clock_gettime(CLOCK_MONOTONIC, &ts);

  • Another possible reason, it seems to me, is that the container was frozen, no? If the container and its child processes are frozen, the lock manager has acquired the mutex but gets no chance to run until the container is scheduled back in, which easily triggers the timeout (see the sketch after this list).
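
To make that concrete, below is a minimal sketch of the kind of monotonic-clock timeout check being described; the struct, the field names, and the 60-second limit are illustrative assumptions, not the actual lock manager code.

#include <stdint.h>
#include <time.h>

static uint64_t monotonic_seconds(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec;
}

struct lockspace {
        uint64_t last_renewal;    /* updated after each successful renewal */
        uint64_t renewal_timeout; /* assumed limit, e.g. 60 seconds */
};

/* Returns 1 if the host lease must be treated as lost.  If the whole
 * process is frozen (e.g. by the cgroup freezer), no renewal runs while
 * CLOCK_MONOTONIC keeps advancing, so on wake-up this check fails
 * immediately and the kill path (lvmlockctl) gets invoked. */
static int lease_timed_out(const struct lockspace *ls)
{
        return monotonic_seconds() - ls->last_renewal > ls->renewal_timeout;
}

Note that on Linux CLOCK_MONOTONIC does not advance across a full system suspend, but it does keep running while a cgroup/container is frozen, which matches the second scenario above.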

@TProhofsky
Collaborator Author

FROM LEO:

I went through the sanlock lock manager and my conclusion is that sanlock should show the same behaviour for this kind of inactivity: the lockspace thread [1] renews the lease for the host, and the main thread loop [2] checks the host lease and kills the VG if it detects a timeout.

Rather than a self-correcting solution after the error happens, I think a potential solution is to set up a watchdog (e.g. a 10s interval) that can periodically wake the system up from the suspend state, so the lock manager has a chance to renew the mutex.
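
One possible shape for such a periodic wakeup, as a sketch only: a Linux timerfd armed on CLOCK_BOOTTIME_ALARM, whose expirations can wake the system from suspend (the process needs CAP_WAKE_ALARM). renew_host_lease() is a hypothetical stand-in for the real renewal path.

#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

extern int renew_host_lease(void);   /* hypothetical renewal hook */

static int renewal_watchdog_loop(void)
{
        /* 10-second periodic timer, matching the interval suggested above. */
        struct itimerspec its = {
                .it_interval = { .tv_sec = 10 },
                .it_value    = { .tv_sec = 10 },
        };
        uint64_t expirations;
        int tfd = timerfd_create(CLOCK_BOOTTIME_ALARM, 0);

        if (tfd < 0 || timerfd_settime(tfd, 0, &its, NULL) < 0)
                return -1;

        for (;;) {
                /* Blocks until the next tick; every tick renews the lease. */
                if (read(tfd, &expirations, sizeof(expirations)) != sizeof(expirations))
                        break;
                renew_host_lease();
        }

        close(tfd);
        return -1;
}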

@TProhofsky
Collaborator Author

From Leo:
I took some time to look into the DLM locking scheme, and I think the one thing we are missing is corosync, which acts as the pace maker for the cluster nodes.

The DLM daemon registers callbacks into corosync [1]:

static cpg_model_v1_data_t cpg_callbacks = {
        .cpg_deliver_fn = deliver_cb,
        .cpg_confchg_fn = confchg_cb,
        .cpg_totem_confchg_fn = totem_cb,
        .flags = CPG_MODEL_V1_DELIVER_INITIAL_TOTEM_CONF,
};

So the callback function confchg_cb() calls start_kernel() [2] and stop_kernel() [3] when the node joins or leaves the cluster.

On the kernel side, the DLM driver provides the callbacks dlm_ls_start() and dlm_ls_stop() to restart and stop the lockspace [4][5]; dlm_ls_start() notifies the lockspace daemon to adjust the timestamp when the lockspace restarts [6], which avoids the timeout issue.

So I suggest the below direction to move forward:

  • The IDM lock manager registers a callback into Corosync (the pace maker);
  • When the IDM lock manager receives the notification for stopping the kernel, it needs to set a flag that the locks are "stopped" and must not be used;
  • When the IDM lock manager receives the notification for starting the kernel, it needs to update the timestamp for the locks and try to acquire the mutex again (see the sketch below).

If we use Corosync, we don't need the watchdog anymore.
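
A rough sketch of that direction, following the corosync CPG API used by dlm_controld above; lockspace_mark_stopped() and lockspace_refresh_timestamps() are hypothetical hooks into the IDM lock manager, not existing functions.

#include <stddef.h>
#include <corosync/cpg.h>

extern void lockspace_mark_stopped(void);        /* flag locks as "stopped", do not use */
extern void lockspace_refresh_timestamps(void);  /* reset renewal times, reacquire the mutex */

static void idm_confchg_cb(cpg_handle_t handle,
                           const struct cpg_name *group_name,
                           const struct cpg_address *member_list, size_t member_list_entries,
                           const struct cpg_address *left_list, size_t left_list_entries,
                           const struct cpg_address *joined_list, size_t joined_list_entries)
{
        unsigned int local_nodeid = 0;
        size_t i;

        cpg_local_get(handle, &local_nodeid);

        /* This node left the group: analogous to dlm's stop_kernel(),
         * flag the locks as "stopped" until membership is regained. */
        for (i = 0; i < left_list_entries; i++)
                if (left_list[i].nodeid == local_nodeid)
                        lockspace_mark_stopped();

        /* This node (re)joined: analogous to dlm_ls_start(), refresh the
         * renewal timestamps before any timeout check runs again. */
        for (i = 0; i < joined_list_entries; i++)
                if (joined_list[i].nodeid == local_nodeid)
                        lockspace_refresh_timestamps();
}

static cpg_model_v1_data_t idm_cpg_callbacks = {
        .cpg_confchg_fn = idm_confchg_cb,
        .flags = CPG_MODEL_V1_DELIVER_INITIAL_TOTEM_CONF,
};

The lock manager would register idm_cpg_callbacks with cpg_model_initialize(), join a group with cpg_join(), and drive the callbacks from a cpg_dispatch() loop, the same way dlm_controld does.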

@Leo-Yan
Contributor

Leo-Yan commented Feb 22, 2021

Some follow-up discussion: https://listman.redhat.com/archives/lvm-devel/2021-February/msg00077.html

And on the mailing list there are two old patches to enable automatically deactivating the VG/LVs:
https://listman.redhat.com/archives/lvm-devel/2017-September/msg00011.html; I will verify the patches and give feedback to David Teigland (the LVM maintainer). If the patches work well, we can ask David to merge the changes.

@Leo-Yan
Contributor

Leo-Yan commented Mar 3, 2021

For the automatic failure handling, two patch sets have been merged:

So the latest repository now supports automatic failure handling.
