Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pacemaker: remove node on delete (SOC-11240) #2040

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sandonovsuse
Copy link
Contributor

On node delete, pacemaker needs to remove the node from the cluster
prior to being deleted from crowbar. This change adds said feature.

@sandonovsuse sandonovsuse force-pushed the pacemaker-remove-node-on-delete-bsc-1170479 branch from 59b58f0 to 9df7bdf Compare August 27, 2020 10:03
@sandonovsuse sandonovsuse self-assigned this Aug 27, 2020
@@ -180,6 +180,17 @@ def transition(inst, name, state)
xs <=> ys
end

# Make sure pacemaker is the first to execute on node delete
if state == "delete"
roles.each_with_index do |role, index|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop will not work as expected. See https://stackoverflow.com/questions/31294630/is-it-safe-to-delete-from-an-array-inside-each for details.
TLDR
"when you delete an element, the elements following it will be shifted, hence the element that was supposed to be iterated next would be moved to the position of the deleted element, which has been iterated over already, and will be skipped."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, but the iteration breaks after it is deleted, ie. no further iteration after unshift. I could do:

prole = roles.delete_at(index)
roles.unshift(prole)

though I already tested it, it should be OK. Test results are in bsc#1170479.

@sandonovsuse sandonovsuse force-pushed the pacemaker-remove-node-on-delete-bsc-1170479 branch 2 times, most recently from c879765 to d064f4c Compare August 27, 2020 12:59
@@ -180,6 +180,12 @@ def transition(inst, name, state)
xs <=> ys
end

# Make sure pacemaker is the first to execute on node delete
if state == "delete"
index = roles.index { |role| role.name.include? "pacemaker" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are actually more than one role with "pacemaker" in the name. I'm not sure what was the original intention here.
Maybe search for "pacemaker-cluster-member" if that's what you want.

Copy link
Contributor Author

@sandonovsuse sandonovsuse Aug 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you're right, it should be pacemaker-config-$inst. Will make the changes

On node delete, pacemaker needs to remove the node from the cluster
prior to being deleted from crowbar. This change adds said feature.
@sandonovsuse sandonovsuse force-pushed the pacemaker-remove-node-on-delete-bsc-1170479 branch from d064f4c to 2e60a22 Compare August 27, 2020 14:22
@skazi0 skazi0 requested a review from jgrassler August 28, 2020 07:44
@jgrassler
Copy link
Contributor

Hmm. So I tried to test this (with crowbar-core/crowbar-ha packages that contain this pull request and crowbar/crowbar-ha#376 as patches) and it doesn't yield what I'd expect to be the result. I deleted one controller...

root@crowbar:~ # knife node delete d52-54-77-77-01-02.vp5.cloud.suse.de d52-54-77-77-01-03.vp5.cloud.suse.de
Do you really want to delete d52-54-77-77-01-02.vp5.cloud.suse.de? (Y/N) y
Deleted node[d52-54-77-77-01-02.vp5.cloud.suse.de]
root@crowbar:

...and would have expected it to vanish from the stonith configuration in the cluster's proposal. It continues to be there though:

  "stonith": {
    "mode": "libvirt",
    "sbd": {
      "watchdog_module": "",
      "nodes": {
        "d52-54-77-77-01-01.vp5.cloud.suse.de": {
          "devices": [
            ""
          ]
        },
        "d52-54-77-77-01-02.vp5.cloud.suse.de": {  ### this should be gone if I understand the code correctly
          "devices": [
            ""
          ]
        },
        "d52-54-77-77-01-03.vp5.cloud.suse.de": {
          "devices": [
            ""
          ]
        }
      }
    },

@skazi0
Copy link
Member

skazi0 commented Aug 29, 2020

@jgrassler try to use "Forget" option in crowbar instead of direct deleting in chef.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants