Enhance galera to interact over multiple clusters #1141

Open: zzzeek wants to merge 1 commit into main

zzzeek (Contributor) commented Apr 19, 2018

This change adds a new resource agent "stretch_galera"
which builds off of the existing "galera" agent.
To accommodate this, the "galera" agent's shell script
structure is modified slightly so that it can be sourced
for its functions.

The new resource agent adds a new parameter "remote_node_map"
to the Galera resource agent which allows it to consider
galera node names that are in other clusters as part of its
Galera quorum. To achieve this, it launches read-only pcs
commands to the remote clusters in order to view
remote state variables.

Additionally, the stretch agent honors an optional pcs
attribute <node>-initial-bootstrap which, when applied to the
local pcs nodes, allows Galera to be bootstrapped with only
that subset of nodes, without the additional remote nodes
being available yet. An installer can set these attributes
to allow the first pcs cluster to come online before subsequent
clusters, and then remove the attributes.

knet-ci-bot commented

Can one of the admins verify this patch?

fabbione (Member) commented

add to whitelist

heartbeat/galera Outdated
@@ -215,6 +218,7 @@ mapping in option cluster_host_map.
<content type="string" default=""/>
</parameter>


Contributor:

Why are you adding empty lines to the longdesc?

Contributor Author:

Likely an editor artifact. I'm also looking for reactions to the overall concept of embedding SSH commands into a resource agent and communicating with other clusters (hence WIP).

heartbeat/galera Outdated
fi
}


Contributor:

Another new empty line with no reason.

heartbeat/galera Outdated
local remote_ssh=$(get_remote_node $node)

if [ -z "$remote_ssh" ]; then
$CRM_MASTER -N $node $@
Contributor:

CI fails due to missing quotes around $@ it seems:
heartbeat/galera:498:30: error: Double quote array expansions to avoid re-splitting elements. [SC2068]
heartbeat/galera:500:76: error: Double quote array expansions to avoid re-splitting elements. [SC2068]
heartbeat/galera:512:47: error: Double quote array expansions to avoid re-splitting elements. [SC2068]
heartbeat/galera:514:68: error: Double quote array expansions to avoid re-splitting elements. [SC2068]
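
For context on SC2068: an unquoted $@ is split again on whitespace, so an argument containing a space reaches the callee as several arguments. A minimal sketch of the difference, using hypothetical helper functions (count_args, forward_unquoted, forward_quoted) that are not part of the galera agent:

```shell
#!/bin/sh
# Hypothetical helper: report how many arguments were received.
count_args() {
    echo "$#"
}

# Unquoted expansion: each argument is re-split on whitespace.
forward_unquoted() {
    # shellcheck disable=SC2068
    count_args $@
}

# Quoted expansion: each argument is passed through unchanged.
forward_quoted() {
    count_args "$@"
}

forward_unquoted -v "100 200"   # prints 3
forward_quoted -v "100 200"     # prints 2
```

Applied to the hunk above, the fix would presumably be to write "$@" wherever the arguments are forwarded.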

@zzzeek zzzeek force-pushed the stretch_galera branch 2 times, most recently from 1fd8fc3 to 8ca2886 Compare April 20, 2018 13:11
zzzeek (Contributor Author) commented Apr 20, 2018

cc @beekhof

heartbeat/galera Outdated
#
# a. ssh as some other locked down account that can still run the
# necessary pcs commands on the other cluster? or some setuid script that
# does the things we need?
Contributor:

This sounds like the nicer solution.

heartbeat/galera Outdated
# b. totally other means of invoking pcs commands on remote cluster? web
# service or something?
#
SSH_CMD="${SSH} -oStrictHostKeyChecking=no"
Contributor:

can we make this configurable?

Contributor Author:

yeah, what other commands might we be using?
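
One way this could be made configurable is the usual resource-agent pattern of reading an OCF_RESKEY_* variable with a default. The sketch below assumes a hypothetical parameter named ssh_options; neither that parameter nor the build_ssh_cmd helper exists in the patch under review.

```shell
#!/bin/sh
# Default keeps the behaviour that is hard-coded in the patch.
OCF_RESKEY_ssh_options_default="-oStrictHostKeyChecking=no"

# Hypothetical helper: build the ssh command line from the (assumed)
# ssh_options parameter, falling back to the default when the
# parameter is unset or empty.
build_ssh_cmd() {
    echo "${SSH:-ssh} ${OCF_RESKEY_ssh_options:-$OCF_RESKEY_ssh_options_default}"
}

SSH_CMD=$(build_ssh_cmd)
```

With this shape, an operator could tighten or relax host key checking per resource without editing the agent.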

beekhof (Member) commented Jun 14, 2018

TBH, this scares the heck out of me.

At the minimum, can we put this in a new agent (source the original then add the new functionality)?

I can see the need for special handling (what to do if the remote isn't reachable for example) that I'd not want to complicate the non-stretch version with.

zzzeek (Contributor Author) commented Jul 26, 2018

@beekhof as far as sourcing the original, I will need to add a line to the case statement at the end that, upon being passed a command like "sourceonly", does a simple "return" to avoid the "exit" call. I'm not finding any bash magic to source all the variables and have them survive past an "exit".
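
The "sourceonly" idea can be sketched as follows. This is only an illustration under assumptions: the stand-in agent written to a temp file, the agent_status function, and the source_agent wrapper are all hypothetical, and the real galera script is considerably larger.

```shell
#!/bin/sh
# Write a stand-in "agent" whose final case statement normally ends in exit.
agent=$(mktemp)
cat > "$agent" <<'EOF'
agent_status() {
    echo "status from sourced agent"
}

case "$1" in
    sourceonly) return 0 ;;   # hand control back to the sourcing script
    status)     agent_status ;;
    *)          echo "usage: $0 {status|sourceonly}" >&2 ;;
esac
exit $?
EOF

# Source the agent with "sourceonly" as its first argument. Setting the
# positional parameters inside a function leaves the caller's untouched.
source_agent() {
    set -- sourceonly
    . "$agent"
}

source_agent
agent_status   # the function defined by the sourced file is now usable
rm -f "$agent"
```

Because the sourced file returns before reaching its exit, the calling script keeps running and retains every function and variable the file defined.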

beekhof (Member) left a comment

I'm still not crazy about using ssh, but I think it's worth merging.

zzzeek (Contributor Author) commented Aug 8, 2018

@beekhof note that this change no longer writes any data over SSH and only reports status. To that end, we could also replace SSH with an xinetd service linked to a script. However, in the xinetd case, authentication and encryption go out the window, as far as I understand how xinetd services can be created. A security audit would then say that if the xinetd script itself has a vulnerability, it is openly exposed.

Another option is to use SSH but limit the commands / scripts that the user can execute. This can be done either with a custom shell in /etc/passwd or, apparently, by limiting commands for a specific key in authorized_keys; I've been googling that a bit. I think having a "front end" that is reached via SSH but is nonetheless just a single script with an argument is the least we can do.
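
The authorized_keys approach could look roughly like this. sshd supports a command="..." option per key and exposes the client's requested command in SSH_ORIGINAL_COMMAND; everything else here (the script path, the handle_request and report_status names, the single "status" action) is a hypothetical sketch, not what the patch implements.

```shell
#!/bin/sh
# Hypothetical front-end script, installed as a forced command, e.g. in
# ~/.ssh/authorized_keys on the remote cluster node:
#
#   command="/usr/local/bin/galera-status-frontend",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding <key type> <public key>

# Placeholder: a real front end would run its read-only status query here.
report_status() {
    echo "status: placeholder"
}

# Allow exactly one action; reject anything else the client asks for.
handle_request() {
    case "$1" in
        status)
            report_status
            ;;
        *)
            echo "rejected: $1" >&2
            return 1
            ;;
    esac
}

# sshd places the command requested by the client in SSH_ORIGINAL_COMMAND.
if [ -n "${SSH_ORIGINAL_COMMAND:-}" ]; then
    handle_request "$SSH_ORIGINAL_COMMAND"
fi
```

With a forced command in place, whatever the client passes on the ssh command line is never executed directly; it only selects among the actions the script chooses to allow.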

oalbrigt (Contributor) commented

@zzzeek can you fix the quote-issues reported by Travis CI?

6 participants