Add SubnetAddressTranslator #2013

Open: wants to merge 9 commits into base branch 4.x

@jahstreet (Contributor) commented Feb 10, 2025

When running Cassandra in a private network and accessing it from outside that network through some kind of proxy, one option is FixedHostNameAddressTranslator. However, that is not enough when the setup needs to be highly available (HA) and when we want more control over latencies in multi-datacenter deployments.

This PR proposes a SubnetAddressTranslator, which translates Cassandra node IP addresses based on which configured subnet IP range (in CIDR notation) they match. The assumption is that the nodes of each Cassandra datacenter belong to a distinct subnet and that these subnets' IP ranges do not overlap, which is the usual configuration for multi-DC Kubernetes deployments, for example with K8ssandra.
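To make the idea above concrete, here is a minimal sketch, not the PR's actual code: each configured CIDR block maps to a proxy address, and a node's IP is translated to the proxy of the first subnet that contains it. The names `SubnetTranslationSketch` and `inSubnet` are hypothetical, the sketch handles IPv4 only for brevity, and the fallback (a default address if configured, otherwise the untranslated address) is an assumption about the behavior.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of subnet-based address translation (IPv4 only). */
final class SubnetTranslationSketch {
  private final Map<String, InetSocketAddress> subnetToProxy; // CIDR -> proxy address
  private final InetSocketAddress defaultAddress; // may be null

  SubnetTranslationSketch(Map<String, InetSocketAddress> subnetToProxy,
                          InetSocketAddress defaultAddress) {
    this.subnetToProxy = new LinkedHashMap<>(subnetToProxy);
    this.defaultAddress = defaultAddress;
  }

  InetSocketAddress translate(InetSocketAddress nodeAddress) {
    InetAddress ip = nodeAddress.getAddress(); // null if the address is unresolved
    if (ip != null) {
      for (Map.Entry<String, InetSocketAddress> entry : subnetToProxy.entrySet()) {
        if (inSubnet(entry.getKey(), ip)) {
          return entry.getValue();
        }
      }
    }
    // Assumption: use the default address if configured, else keep the node's own address.
    return defaultAddress != null ? defaultAddress : nodeAddress;
  }

  /** True if the IPv4 address lies inside the IPv4 CIDR block, e.g. "10.1.0.0/16". */
  static boolean inSubnet(String cidr, InetAddress ip) {
    String[] parts = cidr.split("/");
    byte[] net = networkBytes(parts[0]);
    byte[] addr = ip.getAddress();
    if (addr.length != 4 || net.length != 4) {
      return false; // this sketch handles IPv4 only
    }
    int prefix = Integer.parseInt(parts[1]);
    // Java takes shift distances mod 32, so -1 << 32 == -1; /0 needs its own case.
    int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
    return (toInt(addr) & mask) == (toInt(net) & mask);
  }

  private static byte[] networkBytes(String literal) {
    try {
      return InetAddress.getByName(literal).getAddress(); // an IPv4 literal is parsed, not looked up
    } catch (UnknownHostException e) {
      throw new IllegalArgumentException("Invalid subnet address: " + literal, e);
    }
  }

  private static int toInt(byte[] b) {
    return ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16) | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
  }
}
```

Because the datacenter subnets are assumed not to overlap, at most one entry can match a given node, so iteration order does not change the result.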

@jahstreet (Contributor Author) left a comment

Is there any additional documentation I should update with this change?

@tolbertam (Contributor) left a comment

Excellent work @jahstreet, thank you! Have some suggestions, but I'm +1 either way.

# default-address = "cassandra.datacenter1.com:9042"
# Whether to resolve the addresses once on initialization (if true) or on each node (re-)connection (if false).
# If not configured, defaults to false.
# resolve-addresses = false
Contributor


👍 Since resolve-contact-points defaults to true, I wondered whether we should do the same here. But you will probably use a DNS name whose backing IPs may change from time to time, so it makes sense for this to default to false. Is that what you were thinking as well?

Contributor Author


100%. I expect the proxy to have at least 2 replicas that are subject to "periodic" restarts; that was the intuition behind this decision.
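The two `resolve-addresses` modes discussed in this thread can be illustrated with a hypothetical helper (`ProxyAddressSketch` is not driver API, and how the driver actually defers resolution may differ). Resolving once caches whatever IP the proxy's DNS name pointed to at startup, while resolving on each (re-)connection picks up a new IP after a proxy replica restarts.

```java
import java.net.InetSocketAddress;
import java.util.function.Supplier;

/** Hypothetical helper contrasting the two resolve-addresses modes. */
final class ProxyAddressSketch {

  static Supplier<InetSocketAddress> proxyAddress(String host, int port, boolean resolveOnce) {
    if (resolveOnce) {
      // Resolve once, up front: later DNS changes for `host` are never seen.
      InetSocketAddress resolved = new InetSocketAddress(host, port);
      return () -> resolved;
    }
    // Resolve on every call, i.e. on each (re-)connection, so a proxy replica
    // that comes back with a new IP is picked up the next time it is requested.
    return () -> new InetSocketAddress(host, port);
  }
}
```

In the second mode the JVM's own DNS cache (the `networkaddress.cache.ttl` security property) still applies, so a changed IP is only seen once the cached entry expires.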

@jahstreet (Contributor Author)

> Excellent work @jahstreet, thank you! Have some suggestions, but I'm +1 either way.

Thanks! I will push a commit with the annotations by Monday morning.

@absurdfarce (Contributor) left a comment

I like the overall idea, just a few things I think need to be tweaked here. Moving the DriverOptions around shouldn't be too bad but I am a bit concerned about adding a new dependency just for this. I'm also not sure I love the additional exception handling code that's been added now... but I can be convinced on either point.

<dependency>
<groupId>com.github.seancfoley</groupId>
<artifactId>ipaddress</artifactId>
<optional>true</optional>
</dependency>
Contributor


I really don't love the inclusion of another dependency here, even if it's an optional one. It's only used in one class (near as I can tell)... is there really no way to get the functionality we need without adding this in?

Contributor Author


The first thing that comes to mind is implementing it ourselves (in other words, copying it over from the library). Let me evaluate how much of the utility code is needed.

Contributor Author


We need at least the following functionality to work with subnets here:

  • Validate subnet string is in a prefix block format
  • Check if subnet contains IP address
  • Support both IPv4 and IPv6 for all of the above

The library is quite big, so copying parts of it over would be overkill.
The alternative, then, is to implement these functions ourselves.
Looking into it.

Contributor Author


With a bit of vibe-coding, we can have it in around 100 lines of code. I will work on integrating the change.
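The three requirements listed earlier (prefix-block validation, containment checks, IPv4 and IPv6 support) do fit in roughly that budget. The following is a minimal sketch under our own assumptions, not the code that ended up in the PR: `SubnetSketch`, `parse` and `contains` are hypothetical names, IPv4 literals are parsed by hand, and IPv6 literals are delegated to `java.net.InetAddress`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical CIDR helper for IPv4 and IPv6; not the PR's implementation. */
final class SubnetSketch {
  private final byte[] network;
  private final int prefixLength;

  private SubnetSketch(byte[] network, int prefixLength) {
    this.network = network;
    this.prefixLength = prefixLength;
  }

  /** Parses "address/prefix" and rejects anything that is not a prefix block. */
  static SubnetSketch parse(String cidr) {
    int slash = cidr.indexOf('/');
    if (slash < 0) {
      throw new IllegalArgumentException("Missing prefix length: " + cidr);
    }
    byte[] network = parseAddress(cidr.substring(0, slash));
    int prefixLength;
    try {
      prefixLength = Integer.parseInt(cidr.substring(slash + 1));
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Invalid prefix length: " + cidr, e);
    }
    if (prefixLength < 0 || prefixLength > network.length * 8) {
      throw new IllegalArgumentException("Prefix length out of range: " + cidr);
    }
    // In a prefix block, every bit after the prefix (the host part) must be zero.
    for (int i = prefixLength; i < network.length * 8; i++) {
      if ((network[i / 8] & (0x80 >> (i % 8))) != 0) {
        throw new IllegalArgumentException("Not a prefix block (host bits set): " + cidr);
      }
    }
    return new SubnetSketch(network, prefixLength);
  }

  boolean contains(InetAddress ip) {
    return containsBytes(ip.getAddress());
  }

  boolean contains(String ipLiteral) {
    return containsBytes(parseAddress(ipLiteral));
  }

  private boolean containsBytes(byte[] addr) {
    if (addr.length != network.length) {
      return false; // an IPv4 address never matches an IPv6 subnet and vice versa
    }
    // Compare the first prefixLength bits, one bit at a time.
    for (int i = 0; i < prefixLength; i++) {
      int bit = 0x80 >> (i % 8);
      if ((addr[i / 8] & bit) != (network[i / 8] & bit)) {
        return false;
      }
    }
    return true;
  }

  /** Parses an IP literal; hostnames are rejected rather than resolved. */
  private static byte[] parseAddress(String s) {
    if (s.indexOf(':') >= 0) {
      try {
        // Brackets make the JDK accept only an IPv6 literal here (no DNS lookup).
        return InetAddress.getByName("[" + s + "]").getAddress();
      } catch (UnknownHostException e) {
        throw new IllegalArgumentException("Invalid IPv6 address: " + s, e);
      }
    }
    String[] octets = s.split("\\.", -1);
    if (octets.length != 4) {
      throw new IllegalArgumentException("Invalid IPv4 address: " + s);
    }
    byte[] bytes = new byte[4];
    for (int i = 0; i < 4; i++) {
      if (!octets[i].matches("\\d{1,3}") || Integer.parseInt(octets[i]) > 255) {
        throw new IllegalArgumentException("Invalid IPv4 address: " + s);
      }
      bytes[i] = (byte) Integer.parseInt(octets[i]);
    }
    return bytes;
  }
}
```

Rejecting subnets with host bits set (for example `10.1.0.1/16`) is one way to enforce the "prefix block format" requirement: a mistyped subnet fails fast at startup instead of silently matching a different range.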

addresses = AddressUtils.extract(spec, resolve);
} catch (RuntimeException e) {
LOG.warn("Ignoring invalid contact point {} ({})", spec, e.getMessage(), e);
}
Contributor


You could just continue to the next iteration of the outer for loop here. You know addresses is the empty set at this point, so there's no point iterating over it below.
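A minimal sketch of the suggested control flow, assuming an extractor that throws on invalid input like the `AddressUtils.extract` call in the excerpt above. The stub `extract` here is hypothetical (it only splits `host:port` and never resolves), and `System.err` stands in for the driver's logger; the point is the `continue`, which skips the empty result instead of iterating over it.

```java
import java.net.InetSocketAddress;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ContactPointsSketch {

  /** Hypothetical stand-in for AddressUtils.extract: throws on invalid input. */
  static Set<InetSocketAddress> extract(String spec) {
    int colon = spec.lastIndexOf(':');
    if (colon < 0) {
      throw new IllegalArgumentException("expected host:port");
    }
    // parseInt throws NumberFormatException (an IllegalArgumentException) on a bad port,
    // and createUnresolved throws IllegalArgumentException if the port is out of range.
    int port = Integer.parseInt(spec.substring(colon + 1));
    return Collections.singleton(InetSocketAddress.createUnresolved(spec.substring(0, colon), port));
  }

  static Set<InetSocketAddress> merge(List<String> specs) {
    Set<InetSocketAddress> all = new LinkedHashSet<>();
    for (String spec : specs) {
      Set<InetSocketAddress> addresses;
      try {
        addresses = extract(spec);
      } catch (RuntimeException e) {
        System.err.printf("Ignoring invalid contact point %s (%s)%n", spec, e.getMessage());
        continue; // nothing was extracted, so move straight on to the next spec
      }
      all.addAll(addresses);
    }
    return all;
  }
}
```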

Contributor


As I look at this now it feels like we had to make this a bit more complicated because AddressUtils.extract() now throws exceptions in most cases rather than just logging errors and returning an empty set. Was there a particular reason for this change? It's not immediately clear the exceptions buy you much here.

@jahstreet (Contributor Author) commented Mar 18, 2025

Previously, the #extract code was used only in this class, and we logged errors together with their reasons. Now the code that determines those reasons lives in the utility method, which is called from multiple places. In this class, I aimed to keep logging (as well as other functionality) as close to the original as possible to avoid opinionated refactoring, so I needed a way to get the error reasons out of the utility #extract method in order to log them together with the surrounding context.
Happy to agree on how it should look and change it accordingly.

"Contact point {} resolves to multiple addresses, will use them all ({})",
spec,
addresses);
}
Contributor


Does this log message offer us much useful information?

Contributor Author


Same as above: it was already there, so I kept it as is.
In my view, this log is useful additional information when debugging connection failures. For example, one could be surprised to see connection-failure logs for contact points that do not match the configured ones.
What is your opinion on whether it is needed?

@jahstreet force-pushed the add-subnet-address-translator branch from cf06929 to 85d5931 on March 23, 2025 11:55
@absurdfarce (Contributor)

Apologies, this is on my list but I haven't made it back to reconsider the updated comments in this review. I appreciate your patience @jahstreet!

FWIW I have added this to the 4.19.1 release planning doc under the working assumption that we'll almost certainly get this in in some form we can all agree on.

Labels
None yet
Projects
None yet

4 participants