refactor(server): extract kubernetes compute driver #817

Merged
drew merged 8 commits into main from os-51-extract-kubernetes-compute-driver
Apr 14, 2026

@drew drew commented Apr 13, 2026

Summary

Extract the Kubernetes sandbox implementation out of openshell-server into a dedicated compute-driver crate and introduce shared sandbox protobufs for gateway and driver internals.

Closes OS-51

Changes

  • moved shared sandbox lifecycle types into proto/sandbox.proto and added internal compute driver RPCs in proto/compute_driver.proto
  • added the new openshell-driver-kubernetes crate with the extracted Kubernetes driver library and tonic service binary
  • replaced direct Kubernetes coupling in openshell-server with a gateway-owned compute runtime and routed sandbox creation, deletion, watch handling, and endpoint resolution through it
  • updated the gateway architecture doc to describe the new compute-driver boundary
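
A minimal sketch of the boundary described in the changes above, assuming a plain Rust trait in place of the tonic service. Everything here is hypothetical and for illustration only: `ComputeDriver`, `InMemoryDriver`, `ComputeRuntime`, the `image` field, and the endpoint format are not taken from the PR, and the real request and response types live in proto/compute_driver.proto.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the sandbox spec message.
#[derive(Debug, Clone, PartialEq)]
pub struct SandboxSpec {
    pub image: String,
}

/// Hypothetical error type for driver operations.
#[derive(Debug, PartialEq)]
pub enum DriverError {
    AlreadyExists(String),
    NotFound(String),
}

/// The boundary the gateway talks to. A Kubernetes driver would implement
/// this against the cluster API; the gateway only sees the trait.
pub trait ComputeDriver {
    fn create_sandbox(&mut self, name: &str, spec: SandboxSpec) -> Result<(), DriverError>;
    fn delete_sandbox(&mut self, name: &str) -> Result<(), DriverError>;
    fn resolve_endpoint(&self, name: &str) -> Result<String, DriverError>;
}

/// In-memory driver, used here only to make the sketch runnable.
#[derive(Default)]
pub struct InMemoryDriver {
    sandboxes: HashMap<String, SandboxSpec>,
}

impl ComputeDriver for InMemoryDriver {
    fn create_sandbox(&mut self, name: &str, spec: SandboxSpec) -> Result<(), DriverError> {
        if self.sandboxes.contains_key(name) {
            return Err(DriverError::AlreadyExists(name.to_string()));
        }
        self.sandboxes.insert(name.to_string(), spec);
        Ok(())
    }

    fn delete_sandbox(&mut self, name: &str) -> Result<(), DriverError> {
        match self.sandboxes.remove(name) {
            Some(_) => Ok(()),
            None => Err(DriverError::NotFound(name.to_string())),
        }
    }

    fn resolve_endpoint(&self, name: &str) -> Result<String, DriverError> {
        if self.sandboxes.contains_key(name) {
            // Made-up endpoint format, for illustration only.
            Ok(format!("{}.sandbox.internal:8080", name))
        } else {
            Err(DriverError::NotFound(name.to_string()))
        }
    }
}

/// Gateway-owned runtime: every sandbox operation goes through the driver.
pub struct ComputeRuntime<D: ComputeDriver> {
    driver: D,
}

impl<D: ComputeDriver> ComputeRuntime<D> {
    pub fn new(driver: D) -> Self {
        ComputeRuntime { driver }
    }

    pub fn create_sandbox(&mut self, name: &str, spec: SandboxSpec) -> Result<(), DriverError> {
        self.driver.create_sandbox(name, spec)
    }

    pub fn delete_sandbox(&mut self, name: &str) -> Result<(), DriverError> {
        self.driver.delete_sandbox(name)
    }

    pub fn resolve_endpoint(&self, name: &str) -> Result<String, DriverError> {
        self.driver.resolve_endpoint(name)
    }
}
```

The point of the sketch is that the gateway code depends only on the trait, so a different platform driver could be swapped in without touching the callers.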

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Additional verification:

  • cargo test -p openshell-driver-kubernetes --lib
  • cargo test -p openshell-server --lib

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

Signed-off-by: Drew Newberry <anewberry@nvidia.com>
@drew drew self-assigned this Apr 13, 2026

copy-pr-bot bot commented Apr 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

rpc CreateSandbox(ComputeCreateSandboxRequest) returns (ComputeCreateSandboxResponse);

// Tear down platform resources for a sandbox.
rpc DeleteSandbox(ComputeDeleteSandboxRequest) returns (ComputeDeleteSandboxResponse);

Should there be a separate StopSandbox so there is an opportunity to inspect state before calling DeleteSandbox?

@drew drew Apr 14, 2026

Good call. We probably also want to be able to resume the sandbox. Being able to clone or copy may also be useful down the road.

On second thought, we don't need an explicit resume. We can auto-resume on connect or exec.
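
A rough sketch of the lifecycle discussed in this thread, assuming a stop operation that keeps state around for inspection and a connect path that resumes a stopped sandbox implicitly. The `Phase` variants, `LifecycleError`, and the method names are hypothetical; the PR itself only defines CreateSandbox and DeleteSandbox.

```rust
/// Hypothetical lifecycle phases; the real SandboxPhase values are not shown in this thread.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Phase {
    Running,
    Stopped,
    Deleted,
}

#[derive(Debug, PartialEq)]
pub enum LifecycleError {
    /// The sandbox has been deleted and cannot be used again.
    Deleted,
}

pub struct Sandbox {
    phase: Phase,
}

impl Sandbox {
    pub fn new() -> Self {
        Sandbox { phase: Phase::Running }
    }

    pub fn phase(&self) -> Phase {
        self.phase
    }

    /// Stop without tearing down, so state can be inspected before deletion.
    pub fn stop(&mut self) -> Result<(), LifecycleError> {
        match self.phase {
            Phase::Deleted => Err(LifecycleError::Deleted),
            _ => {
                self.phase = Phase::Stopped;
                Ok(())
            }
        }
    }

    /// Tear down the sandbox; this is terminal.
    pub fn delete(&mut self) {
        self.phase = Phase::Deleted;
    }

    /// Connect (or exec) resumes a stopped sandbox implicitly,
    /// so no separate resume call is needed.
    pub fn connect(&mut self) -> Result<(), LifecycleError> {
        match self.phase {
            Phase::Deleted => Err(LifecycleError::Deleted),
            Phase::Stopped => {
                self.phase = Phase::Running;
                Ok(())
            }
            Phase::Running => Ok(()),
        }
    }
}
```

With this shape, the only explicit transitions a caller needs are stop and delete; resume is a side effect of the next connect.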

string namespace = 3;
SandboxSpec spec = 4;
SandboxStatus status = 5;
SandboxPhase phase = 6;

This could be put under SandboxStatus
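
For illustration, the suggested nesting might look like the following on the Rust side, assuming the phase moves into the status type and the top-level field is dropped. The `Pending` and `Ready` variants, the `message` field, and the `phase()` accessor are hypothetical placeholders, not the actual definitions from proto/sandbox.proto.

```rust
/// Hypothetical phase values; the real enum is not shown in this thread.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SandboxPhase {
    Pending,
    Ready,
}

/// Placeholder for the spec message; its fields are omitted here.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SandboxSpec {}

/// Status carries the phase, so all observed state lives in one place.
#[derive(Debug, Clone, PartialEq)]
pub struct SandboxStatus {
    pub phase: SandboxPhase,
    pub message: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Sandbox {
    pub namespace: String,
    pub spec: SandboxSpec,
    pub status: SandboxStatus,
}

impl Sandbox {
    /// Convenience accessor so callers do not reach through status directly.
    pub fn phase(&self) -> SandboxPhase {
        self.status.phase
    }
}
```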

@drew drew added the test:e2e Requires end-to-end coverage label Apr 14, 2026
@drew drew marked this pull request as ready for review April 14, 2026 16:51
@drew drew requested a review from a team as a code owner April 14, 2026 16:51

mrunalp commented Apr 14, 2026

Thanks for the changes. lgtm!

@maxamillion maxamillion left a comment

LGTM - I think there's room to iterate on a couple small things later but this is really good 👍

@drew drew merged commit 60035c6 into main Apr 14, 2026
11 checks passed
@drew drew deleted the os-51-extract-kubernetes-compute-driver branch April 14, 2026 20:32

4 participants