
Conversation

grnmeira
Contributor

@grnmeira grnmeira commented Aug 7, 2025

Problem

We have an issue with containerd and Windows' HCS where pod network namespaces don't yet have a compartment ID when the CNI signals ztunnel about a workload that's just been created. Without that ID we can't create ztunnel proxy sockets inside the pod's network compartment. The compartment ID only becomes available after ztunnel signals the CNI that the workload has been assimilated by ztunnel (which can't happen without a compartment ID 🫤).

What this PR does

This PR mitigates the stated problem by having ztunnel reply with an ACK to the CNI during the ADD operation, even though the workload proxies haven't yet been instantiated inside ztunnel. After a timeout, ztunnel tries again to add the workload; by then the HCS API returns a valid compartment ID, which allows the proxy to be created without any problems.

A more detailed flow looks like this (a rough sketch of step 2 in code follows the list):

  1. CNI sends an ADD command to ztunnel
  2. ztunnel queries HCS and checks whether a compartment ID is available for that pod
    2.1 If the compartment ID is available, we create a proxy for the workload and ACK the CNI.
    2.2 If the compartment ID is not available, we mark the workload as pending inside ztunnel and ACK the CNI.
  3. The CNI receives an ACK and the pod creation proceeds.
  4. After a timeout, ztunnel tries to add the pending workload again.
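
A minimal sketch of the decision in step 2, with assumed names (decide_add, AddOutcome) rather than this PR's exact types:

// Sketch only: names and types here are assumptions, not the PR's exact API.
use std::time::Duration;

enum AddOutcome {
    ProxyCreated,      // 2.1: compartment ID available, proxy created immediately
    Pending(Duration), // 2.2: no compartment ID yet, retry after this delay
}

// The CNI is ACKed in either case (step 3); a Pending workload is retried
// by ztunnel once the timeout expires (step 4).
fn decide_add(compartment_id: Option<u32>) -> AddOutcome {
    match compartment_id {
        Some(_) => AddOutcome::ProxyCreated,
        None => AddOutcome::Pending(Duration::from_secs(1)),
    }
}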

How the PR does it

It introduces a queue of events that runs in the same thread where ZDS commands are processed. These events are handled in a fairly simple way at the moment (currently just the retries of pending workloads), but the mechanism can be expanded in the future for more advanced handling of other internal events. A rough sketch of the idea follows.
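
The sketch below assumes a tokio-style loop; the channel type, the event tuple, and the printed actions are assumptions, not this PR's types:

// Sketch only: internal events share the loop that processes ZDS commands.
use std::time::Duration;
use tokio::time::{sleep_until, Instant};

async fn run_loop(
    mut zds_commands: tokio::sync::mpsc::Receiver<String>,
    mut event_queue: Vec<(Instant, String)>, // (expiration, pending workload uid), kept sorted
) {
    loop {
        // Sleep until the earliest pending event, or "forever" if there is none.
        let next_deadline = event_queue
            .first()
            .map(|(exp, _)| *exp)
            .unwrap_or_else(|| Instant::now() + Duration::from_secs(3600));

        tokio::select! {
            maybe_cmd = zds_commands.recv() => {
                match maybe_cmd {
                    Some(cmd) => println!("handle ZDS command: {cmd}"),
                    None => break, // sender closed, shut the loop down
                }
            }
            _ = sleep_until(next_deadline), if !event_queue.is_empty() => {
                let (_, uid) = event_queue.remove(0);
                println!("retry pending workload: {uid}");
                // On failure the workload is re-enqueued with a larger timeout.
            }
        }
    }
}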

Caveats

We'll keep retrying compartmentless workloads even if they're failing in an unretriable way (to be fixed in a different PR). Once ztunnel has ACKed the ADD command, there's currently no way to signal the CNI that the workload actually failed to be assimilated.
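
One possible shape for that follow-up (purely illustrative, not part of this PR): classify failures so unretriable ones stop being re-enqueued.

// Sketch only: an assumed error classification, not part of this PR.
enum AssimilationError {
    Retriable(String),   // e.g. compartment ID not yet available
    Unretriable(String), // e.g. the namespace no longer exists
}

fn should_retry(err: &AssimilationError) -> bool {
    matches!(err, AssimilationError::Retriable(_))
}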

@grnmeira grnmeira requested a review from a team as a code owner August 7, 2025 14:15
@istio-testing istio-testing added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files) Aug 7, 2025
@grnmeira grnmeira added the windows label (Experimental Windows support) Aug 8, 2025
@@ -339,6 +362,98 @@ impl WorkloadProxyManagerState {
}
}

pub async fn retry_comparmentless(&mut self, poddata: &WorkloadData) -> Result<(), Error> {


compartmentless*

@@ -34,8 +34,7 @@ mod workloadmanager;
#[cfg(any(test, feature = "testing"))]
pub mod test_helpers;


#[derive(Debug)]
#[derive(Debug, Clone)]
Contributor


Are we sure we want to clone this?

Contributor Author


The reason here is just to facilitate persisting this information across requests in our message loop (since we can now retry workloads later). We can make an effort to simplify that struct, or use an Rc if you think that's desirable.

Contributor


Ack, let's optimize later if we feel like we need it. This doesn't affect Linux so I feel good coming back to it
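
For reference, the Rc alternative mentioned above could look roughly like this (Arc appears in the sketch only because the data might cross thread boundaries; that and the placeholder fields are assumptions, not the PR's choice):

// Sketch only: share one WorkloadInfo across retries instead of cloning it.
// Placeholder fields; the real WorkloadInfo differs.
use std::sync::Arc;

#[derive(Debug)]
struct WorkloadInfo {
    name: String,
    namespace: String,
}

fn keep_for_retry(info: &Arc<WorkloadInfo>) -> Arc<WorkloadInfo> {
    // A cheap reference-count bump instead of a deep clone of the Strings.
    Arc::clone(info)
}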

Comment on lines 65 to 66
pub fn proxy_pending(&self, uid: &crate::inpod::WorkloadUid, workload_info: &WorkloadInfo) {
let mut state = self.state.write().unwrap();
Contributor


Does it make sense to merge these?

Contributor Author


Good point, they're pretty much the same.

@@ -36,14 +34,14 @@ impl InPodConfig {
..cfg.socket_config
};
Ok(InPodConfig {
cur_namespace: InpodNamespace::current()?,
cur_namespace: NetworkNamespace::current()?,
Contributor


Why the rename?


+1, the current naming makes no distinction between the specific namespaces. Does this rename add value?

Contributor Author


I think it was more because I removed one layer there, and this was the original "inner" struct. I'll change it back to InpodNamespace 👍

@grnmeira
Contributor Author

grnmeira commented Sep 2, 2025

@keithmattix @MikeZappa87 thanks for the reviews, I've addressed your comments. Could you please give me a hand with a second pass?

Contributor

@keithmattix keithmattix left a comment


Some error handling questions, but the bones look good to me

"network compartment ID not yet available for namespace {}",
netns.namespace_guid
);
self.compartmentless_workloads
Contributor


Do we ever pop this off the list? Could this create duplicates?

compartment_id, netns.namespace_guid
);
let new_netns = InpodNamespace::new(netns.namespace_guid.clone()).map_err(|e| {
self.compartmentless_workloads
Contributor


Why potentially do this again?

match self.add_workload(&uid, info.clone(), new_netns).await {
Ok(()) => {}
Err(e) => {
self.compartmentless_workloads
Contributor


I might be missing something, but isn't there a possibility that we add this guid to the list 3 times?


For compartmentless_workloads? It's a Vec<(WorkloadUid, WorkloadInfo, InpodNamespace)>; nothing would prevent it from having duplicate entries.

Contributor Author


That vector shouldn't be there anymore. It should've been removed after the queue of events was created.

@keithmattix
Contributor

/cc @Stevenjin8 @howardjohn for a closer look

@grnmeira
Contributor Author

grnmeira commented Sep 8, 2025

pinging @Stevenjin8 and @howardjohn for a closer look

@Stevenjin8
Contributor

looking at this now... Going to build and maybe run on an AKS cluster

Cargo.toml Outdated
@@ -112,6 +112,7 @@ tracing-core = "0.1"
tracing-appender = "0.2"
tokio-util = { version = "0.7", features = ["io-util"] }
educe = "0.6"
uuid = "1.17.0"
Contributor


this is Windows only, right?

Contributor Author


Actually I think we don't need that anymore, as we're using windows::core::UUID now.

@Stevenjin8
Contributor

Stevenjin8 commented Sep 11, 2025

istio/istio#57303

};
match self
.event_queue
.binary_search_by_key(&expiration, |event| event.expiration)
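
For context on this pattern: binary_search_by_key over a queue kept sorted by expiration returns Err(idx) with the insertion index when no event has that exact expiration, so an ordered insert looks roughly like the sketch below (the Event shape is assumed).

// Sketch only: ordered insert keyed on expiration; the Event shape is assumed.
use std::time::Instant;

struct Event {
    expiration: Instant,
}

fn insert_sorted(queue: &mut Vec<Event>, event: Event) {
    // Ok(i): some event already has this exact expiration; insert next to it.
    // Err(i): no event with this expiration; i keeps the Vec sorted.
    let idx = match queue.binary_search_by_key(&event.expiration, |e| e.expiration) {
        Ok(i) | Err(i) => i,
    };
    queue.insert(idx, event);
}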

// We can't tell the CNI that a node needs to be removed due
// to an unretriable error. So at the moment we always retry,
// always increasing the the timeout by a factor on each attempt.
warn!("error while retyring workload: {}", e);
Contributor


Suggested change
warn!("error while retyring workload: {}", e);
warn!("error while retrying workload: {}", e);

poddata,
new_timeout.as_secs()
);
self.enqueue_local_event(
Contributor


After some thought, this doesn't matter, but it is a bit sketchy to modify the event queue as we are iterating over it

Contributor


I think it works out, but it feels a bit fragile (e.g. the fact that enqueue_local_event will always enqueue an event with an expiration > now, otherwise the retry event would get deleted immediately). It would be nice to at least have some comments for this.
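
One way to make that invariant less load-bearing (a sketch, not what this PR does) is to drain the due events into a separate list first and only then process them, so a re-enqueue during processing never touches a queue that is being iterated:

// Sketch only: drain due events before processing them; names are assumed.
use std::time::Instant;

struct Event {
    expiration: Instant,
    workload_uid: String,
}

fn process_due_events(queue: &mut Vec<Event>, now: Instant) {
    // The queue is kept sorted by expiration, so due events form a prefix.
    let due_count = queue.partition_point(|e| e.expiration <= now);
    let due: Vec<Event> = queue.drain(..due_count).collect();

    for event in due {
        // A failed retry can safely push a new event (expiration > now)
        // onto `queue` here, because iteration is over the drained `due` list.
        println!("retrying workload {}", event.workload_uid);
    }
}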

// to an unretriable error. So at the moment we always retry,
// always increasing the the timeout by a factor on each attempt.
warn!("error while retyring workload: {}", e);
let new_timeout = previous_timeout.mul(2);
Contributor


we should put a cap on this.

Contributor


and add jitter
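
A rough sketch of a cap plus jitter (the constant and the rand 0.8-style API are assumptions, not part of this PR):

// Sketch only: capped exponential backoff with jitter.
use rand::Rng;
use std::time::Duration;

const MAX_RETRY_TIMEOUT: Duration = Duration::from_secs(60);

fn next_retry_timeout(previous: Duration) -> Duration {
    // Double the previous timeout, but never let it grow past the cap.
    let doubled = previous.saturating_mul(2).min(MAX_RETRY_TIMEOUT);
    // Add up to 10% random jitter so retries across many pods don't line up.
    let jitter_ms = rand::thread_rng().gen_range(0..=doubled.as_millis() as u64 / 10);
    doubled + Duration::from_millis(jitter_ms)
}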
