Lost lease while inactive #114

Open
TProhofsky opened this issue Oct 15, 2020 · 6 comments

Comments

@TProhofsky
Collaborator

One node out of 4 in the cluster lost its VG lease for an unknown reason.

Broadcast message from systemd-journald@node1 (Tue 2020-10-13 09:21:52 CDT):
lvmlockctl[25120]: Lost access to sanlock lease storage in VG sbvg_datalake.
Broadcast message from systemd-journald@node1 (Tue 2020-10-13 09:21:52 CDT):
lvmlockctl[25120]: Immediately deactivate LVs in VG sbvg_datalake.
Broadcast message from systemd-journald@node1 (Tue 2020-10-13 09:21:52 CDT):
lvmlockctl[25120]: Once VG is unused, run lvmlockctl --drop sbvg_datalake.

@TProhofsky
Collaborator Author

Issuing lvmlockctl -i -d shows kill_vg=1 on all nodes. Issuing a drop and lock start on just one node forces another kill event in less than 30 seconds. Issuing the drop on all nodes and then starting worked:
lvmlockctl --drop sbvg_datalake
vgchange --lock-start --lock-opt auto sbvg_datalake

@TProhofsky
Collaborator Author

FROM LEO:
I did some initial analysis; the log around the "kill_vg" trigger is:

2020-10-05 12:02:31 1465913 [21551]: <<<<< RAID lock dump: raid_renew_lock <<<<<
2020-10-05 12:02:31 1465913 [21551]: drive[0]=/dev/sg0 state=3
2020-10-05 12:02:31 1465913 [21551]: >>>>> RAID lock dump: raid_renew_lock >>>>>
2020-10-05 12:02:31 1465913 [21551]: idm_raid_multi_issue: start mutex op=ILM_OP_RENEW(4) mode=2 renew=1
2020-10-05 12:02:31 1465913 [21551]: _raid_state_find_op: state=IDM_LOCK(3) orignal op=ILM_OP_RENEW(4) op=ILM_OP_RENEW(4)
2020-10-05 12:02:31 1465913 [21551]: idm_raid_add_request: drive=/dev/sg0 state=IDM_LOCK(3) op=ILM_OP_RENEW(4) mode=2 renew=1 => raid_thread=0x7f5888003080
2020-10-05 12:02:31 1465913 [21551]: idm_raid_add_request: raid_thread=0x7f5888003080 renew_count=1
2020-10-05 12:02:31 1465913 [21551]: idm_raid_wait_renew: renew response [drive=/dev/sg0]
2020-10-05 12:02:31 1465913 [21551]: idm_raid_state_transition: drive=/dev/sg0 state=IDM_LOCK(3) -> next_state=IDM_LOCK(3) op=ILM_OP_RENEW(4) result=0
2020-10-05 12:02:31 1465913 [21551]: _raid_state_machine_end: state=3
2020-10-05 12:02:31 1465913 [21551]: idm_raid_multi_issue: drive result=0 mode=2 count=0
2020-10-05 12:02:31 1465913 [21551]: idm_raid_renew_lock: success

2020-10-22 14:05:17 2942079 [21551]: ilm_failure_handler: kill_path=/usr/sbin/lvmlockctl
2020-10-22 14:05:17 2942079 [21551]: ilm_lockspace_thread: has sent kill path or signal

So we can see that the lockspace thread (PID=21551) renewed the lock successfully, as the log line "idm_raid_renew_lock: success" shows; it then slept for 1 second and should have been woken up 1s later for the next renewal.

But the thread went to sleep on "2020-10-05" and was only woken up on "2020-10-22" (17 days later!). So the lockspace thread compared the current time with the previous renewal time, detected that the lock had timed out, and sent the kill signal (so lvmlockd finally received the "kill_vg" command).

Based on the log, I don't see that the drive firmware is relevant to this issue; so there are two potential causes:

  • The first possible reason is that the time is not reliable on the system; the lock manager relies on the time value to check whether a timeout has occurred. Currently the lock manager uses the following call to read the clock: clock_gettime(CLOCK_MONOTONIC, &ts);

  • Another possible reason, it seems to me, is that the container was frozen, no? If the container and its child processes are frozen, the lock manager has acquired the mutex but gets no chance to run until the container is scheduled back in, which easily triggers the timeout (see the sketch after this list).
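
To make that concrete, below is a minimal sketch of the kind of monotonic-clock timeout check being described; the struct, the field names, and the 60-second limit are illustrative assumptions, not the actual lock manager code.

#include <stdint.h>
#include <time.h>

static uint64_t monotonic_seconds(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec;
}

struct lockspace {
        uint64_t last_renewal;    /* updated after each successful renewal */
        uint64_t renewal_timeout; /* assumed limit, e.g. 60 seconds */
};

/* Returns 1 if the host lease must be treated as lost.  If the whole
 * process is frozen (e.g. by the cgroup freezer), no renewal runs while
 * CLOCK_MONOTONIC keeps advancing, so on wake-up this check fails
 * immediately and the kill path (lvmlockctl) gets invoked. */
static int lease_timed_out(const struct lockspace *ls)
{
        return monotonic_seconds() - ls->last_renewal > ls->renewal_timeout;
}

Note that on Linux CLOCK_MONOTONIC does not advance across a full system suspend, but it does keep running while a cgroup/container is frozen, which matches the second scenario above.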

@TProhofsky
Collaborator Author

FROM LEO:

I went through the sanlock lock manager and my conclusion is that sanlock should show the same behaviour for this kind of inactivity: the lockspace thread [1] renews the lease for the host, and the main thread loop [2] checks the host lease and kills the VG if it detects a timeout.

Rather than a self-correcting solution after the error happens, I think a potential solution is to set up a watchdog (e.g. a 10s interval) that can periodically wake the system up from the suspend state, so the lock manager has a chance to renew the mutex.
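
One possible shape for such a periodic wakeup, as a sketch only: a Linux timerfd armed on CLOCK_BOOTTIME_ALARM, whose expirations can wake the system from suspend (the process needs CAP_WAKE_ALARM). renew_host_lease() is a hypothetical stand-in for the real renewal path.

#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

extern int renew_host_lease(void);   /* hypothetical renewal hook */

static int renewal_watchdog_loop(void)
{
        /* 10-second periodic timer, matching the interval suggested above. */
        struct itimerspec its = {
                .it_interval = { .tv_sec = 10 },
                .it_value    = { .tv_sec = 10 },
        };
        uint64_t expirations;
        int tfd = timerfd_create(CLOCK_BOOTTIME_ALARM, 0);

        if (tfd < 0 || timerfd_settime(tfd, 0, &its, NULL) < 0)
                return -1;

        for (;;) {
                /* Blocks until the next tick; every tick renews the lease. */
                if (read(tfd, &expirations, sizeof(expirations)) != sizeof(expirations))
                        break;
                renew_host_lease();
        }

        close(tfd);
        return -1;
}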

@TProhofsky
Collaborator Author

From Leo:
I took some time to look into the DLM locking scheme, and I think the one thing we are missing is corosync, which acts as the pace maker for the cluster nodes.

The DLM daemon registers callbacks into corosync [1]:

static cpg_model_v1_data_t cpg_callbacks = {
        .cpg_deliver_fn = deliver_cb,
        .cpg_confchg_fn = confchg_cb,
        .cpg_totem_confchg_fn = totem_cb,
        .flags = CPG_MODEL_V1_DELIVER_INITIAL_TOTEM_CONF,
};

So the callback function confchg_cb() calls start_kernel() [2] and stop_kernel() [3] when the node joins or leaves the cluster.

On the kernel side, the DLM driver provides the callbacks dlm_ls_start() and dlm_ls_stop() to restart and stop the lockspace [4][5]; dlm_ls_start() notifies the lockspace daemon to adjust the timestamp when the lockspace restarts [6], which avoids the timeout issue.

So I suggest the below direction to move forward:

  • The IDM lock manager registers a callback into Corosync (the pace maker);
  • When the IDM lock manager receives the notification for stopping the kernel, it needs to set a flag that the locks are "stopped" and must not be used;
  • When the IDM lock manager receives the notification for starting the kernel, it needs to update the timestamp for the locks and try to acquire the mutex again (see the sketch below).

If we use Corosync, we don't need the watchdog anymore.
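
A rough sketch of that direction, following the corosync CPG API used by dlm_controld above; lockspace_mark_stopped() and lockspace_refresh_timestamps() are hypothetical hooks into the IDM lock manager, not existing functions.

#include <stddef.h>
#include <corosync/cpg.h>

extern void lockspace_mark_stopped(void);        /* flag locks as "stopped", do not use */
extern void lockspace_refresh_timestamps(void);  /* reset renewal times, reacquire the mutex */

static void idm_confchg_cb(cpg_handle_t handle,
                           const struct cpg_name *group_name,
                           const struct cpg_address *member_list, size_t member_list_entries,
                           const struct cpg_address *left_list, size_t left_list_entries,
                           const struct cpg_address *joined_list, size_t joined_list_entries)
{
        unsigned int local_nodeid = 0;
        size_t i;

        cpg_local_get(handle, &local_nodeid);

        /* This node left the group: analogous to dlm's stop_kernel(),
         * flag the locks as "stopped" until membership is regained. */
        for (i = 0; i < left_list_entries; i++)
                if (left_list[i].nodeid == local_nodeid)
                        lockspace_mark_stopped();

        /* This node (re)joined: analogous to dlm_ls_start(), refresh the
         * renewal timestamps before any timeout check runs again. */
        for (i = 0; i < joined_list_entries; i++)
                if (joined_list[i].nodeid == local_nodeid)
                        lockspace_refresh_timestamps();
}

static cpg_model_v1_data_t idm_cpg_callbacks = {
        .cpg_confchg_fn = idm_confchg_cb,
        .flags = CPG_MODEL_V1_DELIVER_INITIAL_TOTEM_CONF,
};

The lock manager would register idm_cpg_callbacks with cpg_model_initialize(), join a group with cpg_join(), and drive the callbacks from a cpg_dispatch() loop, the same way dlm_controld does.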

@Leo-Yan
Contributor

Leo-Yan commented Feb 22, 2021

Some follow-up discussion: https://listman.redhat.com/archives/lvm-devel/2021-February/msg00077.html

And on the mailing list there are two old patches to enable automatically deactivating the VG/LVs:
https://listman.redhat.com/archives/lvm-devel/2017-September/msg00011.html; I will verify the patches and give feedback to David Teigland (the LVM maintainer). If the patches work well, we can ask David to merge the changes.

@Leo-Yan
Contributor

Leo-Yan commented Mar 3, 2021

For the automatic failure handling, two patch sets have been merged:

So the latest repository now supports automatic failure handling.
