This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

Orchestrator switches incorrectly, causing database service failure, problem analysis #1462

Open
duanhui8 opened this issue Oct 21, 2022 · 1 comment


Hi, I may have found a bug; please help.
When I execute a CHANGE MASTER command on 192.168.73.128:4307, the DistributePairs function writes the downed MySQL instance (192.168.73.128:4308) to Consul, so the database can no longer serve traffic.

1. This is my database topology:
[image: database topology]

2. Contents of the database_instance table in the SQLite backend database (the MySQL instance at 192.168.73.128:4308 is down):
[image: database_instance table contents]

3. This code filters out the real MySQL master (192.168.73.128:3307) and retains the entry for the downed MySQL instance (192.168.73.128:4308), which is then written to Consul by the code in step 4:
[image: code that filters the instances]

4. This code writes the downed MySQL instance (192.168.73.128:4308) to Consul:
[image: code that writes to Consul]

5. consul-template detects that the Consul key has changed and writes the downed MySQL instance into haproxy.cfg, which causes the failure.
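The code in steps 3 and 4 is only available as screenshots, so the following is a minimal sketch of the kind of guard that would avoid publishing a downed instance. It is not orchestrator's actual code: the `Instance` type, its fields, and `masterForKV` are hypothetical names invented for illustration, and the health flags on the sample rows are assumptions based on the description above.

```go
package main

import "fmt"

// Instance is a hypothetical, simplified stand-in for a row of the
// database_instance table; it is not orchestrator's real type.
type Instance struct {
	Host           string
	Port           int
	LastCheckValid bool // assumed: true if the most recent health check succeeded
	ReadOnly       bool // assumed: true for replicas
}

// Addr renders the instance as host:port, the form a KV value would take.
func (i Instance) Addr() string {
	return fmt.Sprintf("%s:%d", i.Host, i.Port)
}

// masterForKV returns the first reachable, writable instance.
// The second return value is false when no candidate qualifies,
// in which case the caller should leave the KV store untouched.
func masterForKV(candidates []Instance) (string, bool) {
	for _, c := range candidates {
		if c.LastCheckValid && !c.ReadOnly {
			return c.Addr(), true
		}
	}
	return "", false
}

func main() {
	candidates := []Instance{
		{Host: "192.168.73.128", Port: 4308, LastCheckValid: false, ReadOnly: false}, // down
		{Host: "192.168.73.128", Port: 3307, LastCheckValid: true, ReadOnly: false},  // real master
		{Host: "192.168.73.128", Port: 4307, LastCheckValid: true, ReadOnly: true},   // replica
	}
	if addr, ok := masterForKV(candidates); ok {
		fmt.Println("publish master:", addr)
	} else {
		fmt.Println("no healthy writable instance; leave KV unchanged")
	}
}
```

The point of the sketch is the ordering of the checks: reachability is tested before an address is ever considered for publication, so an unreachable instance cannot end up in Consul even if it still looks like a master in stale metadata.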

If you need additional information from me, please contact me.
If this is a bug, please help fix it. Thank you.


alnet commented Feb 14, 2023

I believe I have the same issue using MariaDB. I shut down the server via systemd to simulate a crash. The orchestrator UI shows the replica in red and unable to connect, and the database table shows the new master value, but Consul still shows the dead server as the master. Even if I delete the Consul values and force a master repopulation via the CLI, or just wait long enough, orc populates the dead server back into Consul as the current master. So it's not a static data issue; it seems orc is simply wrong about which box should be the master, but only when writing to the KV store.
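One way to confirm this symptom is to compare what the KV store holds against the set of instances known to be reachable. The sketch below is only an illustration under assumptions: `staleMasterKeys` is a hypothetical helper, the key `mysql/master/mycluster` and the host names are made-up sample data, and the KV contents are passed in as a plain map rather than read from a live Consul.

```go
package main

import (
	"fmt"
	"sort"
)

// staleMasterKeys returns, in sorted order, the KV keys whose value
// (a host:port string) is not in the set of reachable addresses.
func staleMasterKeys(kv map[string]string, reachable map[string]bool) []string {
	var stale []string
	for key, addr := range kv {
		if !reachable[addr] {
			stale = append(stale, key)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// Sample data only: the KV store still names the dead server.
	kv := map[string]string{"mysql/master/mycluster": "db1:3306"}
	reachable := map[string]bool{"db2:3306": true}

	for _, key := range staleMasterKeys(kv, reachable) {
		fmt.Printf("%s points at unreachable %s\n", key, kv[key])
	}
}
```

If a key keeps reappearing in this list after it has been deleted and repopulated, that supports the conclusion that the wrong value is being generated at write time rather than left over from an earlier state.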

This is unfortunate, as this was the exact use case I was hoping to use it for. It also clearly didn't use to be a problem, as there are many blog posts from many companies online about using it in this capacity, so I assume this bug was mistakenly introduced in a later version. The project also seems to have gone stale since the primary maintainer stepped back from development. I hope this project does not die; it seems wildly useful.
