Failover is delayed by waiting for a topology with more than one instance #1046

Open
danielbaniel opened this issue Jun 26, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@danielbaniel
Describe the bug

The bug lies in the code here:

I don't know the history of this check, but it's problematic in a few situations.

Take a two-instance cluster with instances Foo and Bar, where Foo is the writer. Foo crashes and Bar is promoted to writer. When Bar becomes available, the driver gets stuck in this loop until Foo comes back up as a reader and brings the topology size to two, which may not happen in any bounded time if Foo has other problems. However, as soon as the driver is connected to Bar it has a writer connection and can complete the failover, so all of the additional downtime is unnecessary.

Expected Behavior

I expect the driver to restore availability to clients looking for a writer as soon as it connects to a new writer, regardless of how many readers the rest of the topology contains or how healthy they are.

What plugins are used? What other connection properties were set?

aurora-mysql

Current Behavior

When connected to a two-instance Aurora MySQL cluster, calling the failover-db-cluster API triggers a failover that the driver won't complete until both instances restart (the reader is promoted and restarts as a writer, and the old writer restarts as a reader). It should complete as soon as the new writer is up.

Reproduction Steps

Create a two-instance MySQL cluster. Connect and send queries with the driver. Trigger failover with the API. Wait for the FailoverSuccessSQLException. Note that it arrives later than the moment the new writer comes up, which you can get from the DB CloudWatch logs, for example.

Possible Solution

No response

Additional Information/Context

No response

The AWS Advanced JDBC Driver version used

latest

JDK version used

11

Operating System and version

osx

@danielbaniel danielbaniel added the bug Something isn't working label Jun 26, 2024
@ucjonathan
@danielbaniel I don't use MySQL, but since you pointed out the exact line of problematic code, I believe that statement should be changed to:

if (topology.size() == 1 && getWriter(topology) == null) {

If the topology has a single instance and no writer, log that message; otherwise, connect to that writer.
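To make the difference concrete, here is a minimal, self-contained sketch comparing the two conditions. It does not use the driver's real classes: `Host`, `keepWaitingSizeOnly`, and `keepWaitingProposed` are hypothetical names, and the "existing" check is only assumed to be a bare `topology.size() == 1` test, inferred from the change suggested above.

```java
import java.util.List;

class TopologyCheckSketch {
    // Hypothetical stand-in for a host entry in the topology; not the driver's own class.
    static final class Host {
        final String name;
        final boolean writer;

        Host(String name, boolean writer) {
            this.name = name;
            this.writer = writer;
        }
    }

    // Returns the first host marked as writer, or null if the topology has none.
    static Host getWriter(List<Host> topology) {
        for (Host h : topology) {
            if (h.writer) {
                return h;
            }
        }
        return null;
    }

    // Assumed shape of the existing check: keep waiting while only one instance is visible.
    static boolean keepWaitingSizeOnly(List<Host> topology) {
        return topology.size() == 1;
    }

    // Check proposed in this thread: keep waiting only if that single instance is not a writer.
    static boolean keepWaitingProposed(List<Host> topology) {
        return topology.size() == 1 && getWriter(topology) == null;
    }

    public static void main(String[] args) {
        // Scenario from the issue: Foo is still down, Bar has been promoted to writer.
        List<Host> onlyBar = List.of(new Host("Bar", true));
        System.out.println("size-only check keeps waiting: " + keepWaitingSizeOnly(onlyBar));
        System.out.println("proposed check keeps waiting:  " + keepWaitingProposed(onlyBar));
    }
}
```

In the Foo/Bar scenario the size-only condition keeps the loop waiting even though a writer is already visible, while the proposed condition lets the failover proceed. A single visible instance that is not a writer still waits under both.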

@danielbaniel
Author

danielbaniel commented Jul 8, 2024

Hey @ucjonathan, this issue isn't MySQL-specific and applies to PostgreSQL too. I filled in the issue incorrectly: I only listed the aurora-mysql plugin in the description, but it affects both.

In either case, your suggested fix seems appropriate. As soon as the driver is connected to a writer it should go ahead and serve requests; there is no reason to wait for other instances.

I expect it will apply to Multi-AZ (MAZ) clusters too, not just Aurora. Whatever the context, as soon as you have a writer there's no need to wait for another instance to be up if you're looking for a writer endpoint.

@sergiyvamz
Contributor

Hi, @danielbaniel @ucjonathan

A new version of the failover plugin was merged recently. It is a reworked and re-architected plugin for cluster failover support. In general, the new failover2 plugin shows better stability, and we hope it may solve the issue you reported.

The new plugin is available in the latest snapshot build. Could you kindly check out our snapshot build and let us know if the issue still persists with the new failover2 plugin?

https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheFailover2Plugin.md

https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/UsingTheJdbcDriver.md#using-a-snapshot-of-the-driver

Thank you!
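For anyone trying this out, a minimal sketch of the connection properties might look like the following. It assumes the plugin is enabled by listing the plugin code `failover2` in the `wrapperPlugins` connection property, as the linked documentation describes; please verify that against the docs for the snapshot you use. The class name, user, and password are placeholders.

```java
import java.util.Properties;

class Failover2PropsSketch {
    // Builds connection properties that request the failover2 plugin.
    static Properties buildProps(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Assumption: "failover2" is the plugin code and "wrapperPlugins" the property
        // that selects plugins; check the linked failover2 documentation.
        props.setProperty("wrapperPlugins", "failover2");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProps("app_user", "placeholder-password");
        System.out.println("wrapperPlugins = " + props.getProperty("wrapperPlugins"));
    }
}
```

These properties would then be passed to `DriverManager.getConnection` together with the wrapper's JDBC URL for your cluster endpoint, with the snapshot build of the driver on the classpath.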
