WIP: Feature/double hbone #1429

Open · wants to merge 17 commits into base: master

Conversation


@Stevenjin8 Stevenjin8 commented Jan 17, 2025

Initial double HBONE implementation.

Right now, the inner HBONE connection will only hold one CONNECT tunnel; once the inner tunnel terminates, so will the outer tunnel (but not the outer HBONE connection). So when ztunnel receives its first connection to a double HBONE host (an E/W gateway), it will perform two TLS handshakes. Subsequent connections to the same host will perform one TLS handshake.

This behavior is not great, but if we put the inner HBONE connection in the connection pool, we pin ourselves to a pod in the remote cluster, since ztunnel performs connection pooling but is not aware of the E/W gateway's routing decisions.

That being said, I think this is a good place to stop, think about the control plane implementation, and get some feedback on how I'm approaching this.

NOTE: The TLS/certificate-related code changes are just for my own testing.

Tasks:

  • Implement double HBONE
  • Implement graceful shutdowns
  • Fix TLS code (I don't think I can do this without more control plane info, since SANs are set for services)
  • Metrics
  • Implement proper inner HBONE connection pooling
  • Tests

Some open questions:

  • Do I need to make any changes to metrics?
  • How should we do inner HBONE pooling? My suggestion is to allow up to N inner HBONE connections per E/W gateway or per remote cluster (see the sketch after this list).
  • What are good ways to test for race conditions in connection termination? Right now, connections seem to end gracefully without race conditions, but that's just on my machine.
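
To make the pooling suggestion concrete, here is a rough sketch of a bounded per-gateway pool (all names and the cap policy are hypothetical, not code from this PR):

```rust
use std::collections::HashMap;

// Hypothetical key: one pool entry per E/W gateway (or per remote cluster).
#[derive(PartialEq, Eq, Hash)]
struct GatewayKey {
    cluster: String,
    gateway_addr: String,
}

// Stand-in for an established inner HBONE connection.
struct InnerConn;

// Hold at most `max_per_gateway` inner HBONE connections per gateway.
// Capping (rather than pooling everything) limits how hard we pin
// traffic to whichever pods the gateway happened to route to.
struct InnerHbonePool {
    max_per_gateway: usize,
    conns: HashMap<GatewayKey, Vec<InnerConn>>,
}

impl InnerHbonePool {
    // Return a pooled connection if one is available.
    fn checkout(&mut self, key: &GatewayKey) -> Option<InnerConn> {
        self.conns.get_mut(key).and_then(|v| v.pop())
    }

    // Return a connection to the pool, or drop (close) it if the
    // per-gateway cap is already reached.
    fn checkin(&mut self, key: GatewayKey, conn: InnerConn) {
        let entry = self.conns.entry(key).or_default();
        if entry.len() < self.max_per_gateway {
            entry.push(conn);
        }
    }
}
```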

References:


@istio-testing (Contributor):

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@istio-testing istio-testing added do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. needs-rebase Indicates a PR needs to be rebased before being merged size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 17, 2025
@Stevenjin8 Stevenjin8 force-pushed the feature/double-hbone branch from e27a4a8 to 40bbcd0 Compare January 17, 2025 17:00
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jan 17, 2025
@Stevenjin8 Stevenjin8 changed the title Feature/double hbone WIP: Feature/double hbone Jan 17, 2025
@Stevenjin8 Stevenjin8 added the do-not-merge/hold Block automatic merging of a PR. label Jan 17, 2025
@Stevenjin8 Stevenjin8 force-pushed the feature/double-hbone branch from 66db8ae to 99d622f Compare January 17, 2025 17:51
@@ -107,7 +112,7 @@ impl Outbound {
debug!(component="outbound", dur=?start.elapsed(), "connection completed");
}).instrument(span);

assertions::size_between_ref(1000, 1750, &serve_outbound_connection);
Stevenjin8 (Contributor, Author):

How did we get these numbers?

Contributor:

By looking at the current size and adding a small amount of buffer.

Member:

I think we covered this live on the WG call, but just to close the loop here: this must not grow (beyond a trivial amount), or else it means every connection will use that much additional memory. Typically the fix here is to Box::pin the futures.
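
For illustration, a minimal sketch of the Box::pin fix (the async body name is hypothetical): boxing moves the future's state machine onto the heap, so the future the size assertion measures stays pointer-sized no matter how large the async body grows.

```rust
use std::future::Future;
use std::pin::Pin;

// Hypothetical stand-in for a large per-connection async body.
async fn proxy_connection() { /* ... */ }

fn compare_sizes() {
    // Unboxed: the whole state machine is inlined into this value,
    // so its size counts against the per-connection size assertion.
    let unboxed = proxy_connection();

    // Boxed: the state machine lives on the heap; this value is just
    // a (fat) pointer, so the measured future size stays constant.
    let boxed: Pin<Box<dyn Future<Output = ()> + Send>> = Box::pin(proxy_connection());

    println!("unboxed: {} bytes", std::mem::size_of_val(&unboxed));
    println!("boxed:   {} bytes", std::mem::size_of_val(&boxed));
}
```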

@Stevenjin8 Stevenjin8 force-pushed the feature/double-hbone branch from 99d622f to 96bb4de Compare January 17, 2025 17:55
@@ -83,6 +83,26 @@ struct ConnSpawner {

// Does nothing but spawn new conns when asked
impl ConnSpawner {
async fn new_unpooled_conn(
Stevenjin8 (Contributor, Author):

Anything here we can do higher up, I think, but things might change if we decide to implement pooling in this PR.

@bleggett (Contributor), Jan 17, 2025:

Yeah, if we want double-HBONE conns to be unpooled and thus need ~none of this surrounding machinery, then I'd be inclined to just start a proxy/double-hbone.rs and use that directly, rather than complicating the purpose of this file.

(Could also just have a common HboneConnMgr trait or something too)
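
For illustration, a rough sketch of what such a common trait could look like (all names here are hypothetical, not code from this PR):

```rust
use std::io;

// Stand-in for a checked-out HTTP/2 CONNECT client.
struct H2Conn;

// One interface behind which pooled (plain HBONE) and unpooled
// (double HBONE) connection management could both live.
trait HboneConnMgr {
    // Key identifying the upstream (workload or E/W gateway).
    type Key;

    // Hand back a ready-to-use client, establishing a new connection
    // (and TLS handshake) if none is available. Pooled implementations
    // may reuse an existing connection; unpooled ones always dial.
    async fn checkout(&self, key: &Self::Key) -> io::Result<H2Conn>;
}
```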

Comment on lines 247 to 256
// This always drops ungracefully
// drop(conn_client);
// tokio::time::sleep(std::time::Duration::from_secs(1)).await;
// drain_tx.send(true).unwrap();
// tokio::time::sleep(std::time::Duration::from_secs(1)).await;
drain_tx.send(true).unwrap();
let _ = driver_task.await;
// this sleep is important, so we have a race condition somewhere
// tokio::time::sleep(std::time::Duration::from_secs(1)).await;
res
Stevenjin8 (Contributor, Author):

Does anybody have any info on how to properly drop/terminate H2 connections over streams with nontrivial drops (e.g. shutting down TLS over HTTP/2 CONNECT)? Right now, I'm just dropping things and aborting tasks until something works.

Contributor:

Are you asking about how to clean up after, for example, a RST_STREAM to the inner tunnel? Or something else?

Stevenjin8 (Contributor, Author):

Kinda. I mostly mean the outer TLS stream, because that's what I've looked at. It seems like if I drop conn_client before terminating driver_task, the TCP connection will close without sending close notifies. So yes, I'm asking if there is a way to do cleanup explicitly rather than relying on implicit drops.
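
For context, a minimal sketch of explicit cleanup, assuming a tokio-rustls/tokio-boring style TLS stream (illustrative, not this PR's code): calling shutdown() drives the close_notify onto the wire before anything gets dropped.

```rust
use tokio::io::{AsyncRead, AsyncWrite, AsyncWriteExt};

// Explicitly shut the TLS stream down before dropping it. For
// tokio-rustls (and tokio-boring) streams, shutdown() sends and
// flushes the TLS close_notify alert, so the peer sees a graceful
// close instead of a bare TCP FIN.
async fn close_gracefully<S>(mut tls: S) -> std::io::Result<()>
where
    S: AsyncRead + AsyncWrite + Unpin,
{
    tls.shutdown().await?;
    // Only after close_notify is on the wire is it safe to drop the
    // stream and then tear down the outer CONNECT stream/H2 driver.
    Ok(())
}
```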

Contributor:

I see the code changed; do you still need help figuring this out?

Stevenjin8 (Contributor, Author):

I'm still not confident in it. It works (on my machine), but I couldn't find any docs on proper connection termination/dropping.

@@ -217,12 +267,12 @@ impl OutboundConnection {
copy::copy_bidirectional(copy::TcpStreamSplitter(stream), upgraded, connection_stats).await
}

async fn send_hbone_request(
fn create_hbone_request(
Stevenjin8 (Contributor, Author):

Git merge is getting confused here

src/config.rs Outdated
@@ -70,7 +70,7 @@ const IPV6_ENABLED: &str = "IPV6_ENABLED";

const UNSTABLE_ENABLE_SOCKS5: &str = "UNSTABLE_ENABLE_SOCKS5";

const DEFAULT_WORKER_THREADS: u16 = 2;
const DEFAULT_WORKER_THREADS: u16 = 40;
Contributor:

I may have missed it in the description, but why the change here?

Stevenjin8 (Contributor, Author):

I was hoping it would make debugging async Rust easier (it didn't).

Contributor:

(if you haven't already found it, tokio-console can be helpful)
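
For anyone following along, wiring tokio-console up looks roughly like this (assuming the console-subscriber crate and a tokio_unstable build; not part of this PR):

```rust
// Requires the `console-subscriber` crate and building with
// RUSTFLAGS="--cfg tokio_unstable".
#[tokio::main]
async fn main() {
    // Starts the instrumentation layer the `tokio-console` CLI attaches
    // to (default 127.0.0.1:6669) to show live task and resource state.
    console_subscriber::init();

    // ... application code ...
}
```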

@Stevenjin8 Stevenjin8 force-pushed the feature/double-hbone branch from 1ea75fb to f1cc535 Compare January 21, 2025 17:07


// Inner HBONE
let upgraded = TokioH2Stream::new(upgraded);
// TODO: dst should take a hostname? and upstream_sans currently contains E/W Gateway certs
let inner_workload = pool::WorkloadKey {
Stevenjin8 (Contributor, Author):

Will reorganize later.

@Stevenjin8 Stevenjin8 force-pushed the feature/double-hbone branch from a8856a4 to 565f41f Compare January 21, 2025 21:19
Protocol::TCP => None,
};
let (upstream_sans, final_sans) = match us.workload.protocol {
@Stevenjin8 (Contributor, Author), Jan 21, 2025:

My understanding from talking to @keithmattix is that Upstream.service_sans will be repurposed to contain the identities of remote pods/waypoints, so I should change the logic of the other protocols to only use us.workload.identity instead of us.workload_and_services_san.

Contributor:

Yes, I think this is correct; only the double HBONE codepath needs to be added/changed, because there are two SANs being considered: the E/W gateway SAN and the SANs of the backends. So what you have looks right to me.

@@ -511,10 +578,10 @@ impl WorkloadHBONEPool {
#[derive(Debug, Clone)]
// A sort of faux-client, that represents a single checked-out 'request sender' which might
// send requests over some underlying stream using some underlying http/2 client
struct ConnClient {
sender: H2ConnectClient,
pub struct ConnClient {
Stevenjin8 (Contributor, Author):

fixme

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Feb 26, 2025

@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Mar 7, 2025
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Mar 7, 2025
@keithmattix (Contributor):

Not stale

@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Mar 7, 2025
@Stevenjin8 Stevenjin8 marked this pull request as ready for review March 7, 2025 17:14
@Stevenjin8 Stevenjin8 requested a review from a team as a code owner March 7, 2025 17:14

@Stevenjin8 Stevenjin8 force-pushed the feature/double-hbone branch from 64042db to b2a21bd Compare March 7, 2025 17:47
@@ -50,6 +52,7 @@ async fn main() {
.await
.unwrap();
sender.wait_forever().await.unwrap();
tokio::time::sleep(Duration::from_secs(99999999)).await;
Stevenjin8 (Contributor, Author):

remove me

@@ -107,7 +107,7 @@ impl Socks5 {
debug!(component="socks5", dur=?start.elapsed(), "connection completed");
}).instrument(span);

assertions::size_between_ref(1000, 2000, &serve);
assertions::size_between_ref(1000, 10000, &serve);
Stevenjin8 (Contributor, Author):

It was erroring at 5600.

Member:

If we went from 2000 to 5600, we have a problem we need to address. The size here is the per-connection memory overhead, so increases here directly lead to increased memory utilization.

Stevenjin8 (Contributor, Author):

I looked into this a bit more and "solved" it by boxing the futures. However, I don't know if this makes a difference in the resulting binary, because the size of futures can change based on compiler optimizations. For example, if I leave proxy_to_double_hbone unboxed and restore the limit to 2000, the tests will pass if I build them with --release but fail otherwise. So the size of serve seems more like a red herring.

Comment on lines +78 to +81
/// If there is a network gateway, use <service hostname>:<port>. Otherwise, use
/// <ip address>:<port>. Fortunately, for double-hbone, the authority/host is the same
/// for inner and outer CONNECT request, so we can reuse this for inner double hbone,
/// outer double hbone, and normal hbone.
Stevenjin8 (Contributor, Author):

Suggested change
/// If there is a network gateway, use <service hostname>:<port>. Otherwise, use
/// <ip address>:<port>. Fortunately, for double-hbone, the authority/host is the same
/// for inner and outer CONNECT request, so we can reuse this for inner double hbone,
/// outer double hbone, and normal hbone.
/// The authority header value for hbone requests. It is the same for normal hbone, and the inner/outer
/// requests of double hbone

buf: &mut tokio::io::ReadBuf<'_>,
) -> Poll<std::io::Result<()>> {
let pinned = std::pin::Pin::new(&mut self.0.read);
copy::ResizeBufRead::poll_bytes(pinned, cx).map_ok(|bytes| buf.put(bytes))
Member:

This seems like there is no protection against ReadBuf's rules: Panics if self.remaining() is less than buf.len().
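
For illustration, one way to respect that contract is to copy only what fits and retain the remainder for the next poll (a sketch with hypothetical names, not this PR's code):

```rust
use bytes::{Buf, Bytes};
use tokio::io::ReadBuf;

// Copy from `pending` into `buf` without violating ReadBuf's contract:
// put_slice panics if the slice is longer than buf.remaining(). Bytes
// that don't fit stay in `pending` for the next poll_read call.
fn fill_read_buf(pending: &mut Bytes, buf: &mut ReadBuf<'_>) {
    let n = pending.len().min(buf.remaining());
    buf.put_slice(&pending[..n]);
    pending.advance(n); // keep the unconsumed tail
}
```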

Member:

Can we plausibly write a test around this like we have in copy.rs, wrapping in WeirdIO? It detects these types of bugs.

Stevenjin8 (Contributor, Author):

Yes, I'm holding back on unit tests until there is some agreement on the implementation


let _ = drain_tx.send(true);
let _ = driver_task.await;
// Here, there is an implicit, drop(conn_client).
Member:

There is no conn_client in scope, so this is a bit confusing.

.as_ref()
.and_then(|wl| wl.network_gateway.as_ref()),
) {
(_, Some(_)) => {
Member:

I don't think there is a guarantee that something that has a network_gateway supports HBONE, or that something that has a network_gateway is even in a different network.

Stevenjin8 (Contributor, Author):

So we should check that the protocol is HBONE to do double HBONE? What about the inner tunnel? I guess my question is how we should explicitly encode protocol information. Maybe have it be a field in network_gateway?

Stevenjin8 (Contributor, Author):

@howardjohn does WDS/network gateway need a new protocol field? Or are we good just checking that the protocol is HBONE and returning an error in the rest of the cases? cc @keithmattix

Member:

Sorry, GitHub hid this comment, so I ended up saying something similar, with slightly more context, in #1429 (comment).

I think the WDS protocol has enough information, we just need to use the context of (src,dst) to determine whether a cross network traversal is required

Stevenjin8 (Contributor, Author):

So just looking at the source/destination networks should give enough info on whether to use the network gateway, and I can put this logic next to the waypoint logic in build_request (I think that's what the function is called).
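
For illustration, a sketch of that (src, dst) check (field and function names are hypothetical):

```rust
// Hypothetical, simplified view of the workloads involved.
struct Gateway;
struct Workload {
    network: String,
    network_gateway: Option<Gateway>,
}

// Cross-network traversal is required only when the networks differ
// AND the destination advertises a network gateway. A local workload
// may also have a gateway defined, which must not trigger double
// HBONE on its own.
fn gateway_to_use<'a>(src: &Workload, dst: &'a Workload) -> Option<&'a Gateway> {
    if src.network != dst.network {
        dst.network_gateway.as_ref()
    } else {
        None
    }
}
```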

let tls_stream = connector
.connect(
upgraded,
rustls::pki_types::ServerName::IpAddress(wl_key.dst.ip().into()),
Member:

The servername we pass has no meaning in the underlying library, so I think it's confusing to go to the effort of picking a specific IP here vs. always hardcoding a dummy IP and hiding it in the OutboundConnector like we do today.

Stevenjin8 (Contributor, Author):

Right now, we use the peer address of the TCP stream, but if it has no meaning, I agree it's better to just use a dummy value and get rid of the argument.

)
.await;

let _ = drain_tx.send(true);
Member:

Do we need this? Doesn't dropping all references to the connection already make it close?

Stevenjin8 (Contributor, Author):

It closes it, but in my testing it was not graceful (i.e. the proper close notifies weren't sent), for example if the outer HTTP/2 CONNECT stream closes before the inner TLS stream has a chance to send its close_notify.

Stevenjin8 (Contributor, Author):

But since there have been some refactors, I'll take another look.

Stevenjin8 (Contributor, Author):

@howardjohn I took another look, and it seems like connections aren't being closed particularly gracefully. Namely, I'm seeing FIN being sent before close notifies, but I do have a janky setup. How do we generally test for this?

@istio-testing istio-testing added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 11, 2025
Some(network_gateway.hbone_mtls_port),
),
Destination::Hostname(_) => (
self.resolve_workload_address(&wl, source_workload, original_target_address)
Stevenjin8 (Contributor, Author):

This should do a lookup in the service registry

@keithmattix (Contributor) commented Mar 17, 2025:

Fixes #1423


Protocol::HBONE => {
let res = match (
req.protocol,
req.actual_destination_workload
Member:

The presence of a network gateway does not imply we need to reach it through the network gateway. For example, a local workload in a cluster will have a network gateway defined.

The WDS protocol is not contextual. A workload w/ a network gateway does not mean "To reach this workload, always go through this network gateway [and the control plane will dynamically set or unset this for you]", it means "if you need to traverse a network boundary, here is how to do it".

Member:

More generally, I wonder if the code might be simplified to have a new Protocol::DoubleHbone (maybe we make a new enum like RequestProtocol though)

Stevenjin8 (Contributor, Author):

Agreed.
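
For illustration, the suggested enum could look roughly like this (a sketch, not this PR's code):

```rust
// A request-level protocol derived from (src, dst) context, distinct
// from the workload-level Protocol reported by WDS.
enum RequestProtocol {
    Tcp,
    Hbone,
    // Cross-network: an inner HBONE tunnel to the backend, carried
    // inside an outer HBONE connection to the E/W gateway.
    DoubleHbone,
}
```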

@@ -453,7 +551,7 @@ fn baggage(r: &Request, cluster: String) -> String {
)
}

#[derive(Debug)]
#[derive(Debug, Clone)]
Member:

Not needed
