resource-agents: vSphere K8s cluster provider #613
Conversation
Force-pushed from f6ebba9 to f3e6f11
The "Build/Images" check is failing because I changed the bottlerocket test tools image Dockerfile and the check is using the default tools image in the ECR repository. |
Force-pushed from f3e6f11 to 85f8a18
Force-pushed from 85f8a18 to e51bd15
Taking this to draft until #615 merges and we push out the new tools image.
Force-pushed from b8153ad to a0b024f
Really? You mean latest? That's not right.
This is great work and exciting that we will have this functionality.

My main question is about requiring a CAPI cluster to pre-exist. I think I understand the choice... otherwise you would need a kind cluster running "docker in docker" inside this container, right? And maybe that feels just too horrible?

The other feedback I have is that `vsphere_k8s_cluster_provider.rs` is basically one big, long procedural script. My personal preference would be to look for a bunch of places where you could refactor out some functions (even if they are only called once) just for the sake of readability.
let mgmt_kubeconfig_path = format!("{}/mgmt.kubeconfig", WORKING_DIR);
let encoded_kubeconfig = if do_create {
    // Check CAPI management cluster is accessible and valid
    debug!("Decoding and writing out kubeconfig for the CAPI management cluster");
This agent requires a CAPI management cluster to already exist?
Yes, like you mentioned in the review message, I wanted to avoid bootstrapping with kind since the docker daemon is running with `vfs` as the underlying storage driver, which is incredibly inefficient because each container layer is its own directory, and setting up a kind bootstrap cluster requires a lot of containers to be running. I can try again to see if this is acceptable though.

Setting up overlayfs requires installing docker on the testsys node host and mounting `/var/lib/docker` into the dind container, which I find unappetizing as well.

By having an external management cluster, I don't have to worry about any of that. In addition, the management cluster becomes a source of truth for the state of all provisioned clusters, and I can retrieve a kubeconfig for any provisioned cluster as long as I have access to the mgmt cluster (see the sketch after this list). There are cons with this approach though:

- I believe this makes integrating with `cargo make test` harder since the resource agent is no longer self-contained and requires an external kubeconfig (@ecpullen probably has opinions on this).
- Someone needs to maintain the management cluster and remember to upgrade the CAPI and CAPV CRDs when EKS-A bumps versions.
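On the "source of truth" point: Cluster API stores each workload cluster's kubeconfig in a Secret named `<cluster-name>-kubeconfig` (data key `value`) in the cluster's namespace on the management cluster, so retrieving it could look roughly like the sketch below. This is not code from this PR; the function name and error handling are made up, and it assumes the `kube` and `k8s-openapi` crates.

```rust
use k8s_openapi::api::core::v1::Secret;
use kube::{api::Api, Client};

/// Hypothetical helper: fetch the kubeconfig of a provisioned workload cluster
/// from the CAPI management cluster, where Cluster API keeps it in a Secret
/// named `<cluster-name>-kubeconfig` under the `value` key.
async fn workload_cluster_kubeconfig(
    mgmt_client: Client,
    namespace: &str,
    cluster_name: &str,
) -> Result<String, Box<dyn std::error::Error>> {
    let secrets: Api<Secret> = Api::namespaced(mgmt_client, namespace);
    let secret = secrets.get(&format!("{}-kubeconfig", cluster_name)).await?;
    let bytes = secret
        .data
        .and_then(|mut data| data.remove("value"))
        .ok_or("kubeconfig secret is missing the `value` key")?;
    Ok(String::from_utf8(bytes.0)?)
}
```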
Makes sense. I think this is a great solution and we can consider "Don't require an external CAPI cluster" to be a separate issue/feature-request.
The additional kubeconfig will either need to have a known location in the monorepo (like `testsys.kubeconfig`) or we will have to include it (as a string) in the `Test.toml` file. I prefer the former option, but not having to worry about it would be nice.
I tried without an external management cluster and let EKS-A set up the bootstrap cluster in kind (kind running in docker that's running in docker, btw lol), and it failed due to networking issues:
2022-10-25T20:35:28.490Z V4 ----------------------------------
2022-10-25T20:35:28.490Z V4 Task start {"task_name": "bootstrap-cluster-init"}
2022-10-25T20:35:28.490Z V0 Creating new bootstrap cluster
2022-10-25T20:35:28.491Z V4 Creating kind cluster {"name": "br-eksa-122-eks-a-cluster", "kubeconfig": "br-eksa-122/generated/br-eksa-122.kind.kubeconfig"}
2022-10-25T20:35:28.491Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1666730106969714044 kind create cluster --name br-eksa-122-eks-a-cluster --kubeconfig br-eksa-122/generated/br-eksa-122.kind.kubeconfig --image public.ecr.aws/eks-anywhere/kubernetes-sigs/kind/node:v1.22.15-eks-d-1-22-11-eks-a-19 --config br-eksa-122/generated/kind_tmp.yaml"}
ERRO[2022-10-25T20:35:53.428755287Z] Could not add route to IPv6 network fc00:f853:ccd:e793::1/64 via device br-3443489d9de0: network is down
time="2022-10-25T20:35:56.630430522Z" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
time="2022-10-25T20:35:56.630486720Z" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
time="2022-10-25T20:35:56.630493149Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2022-10-25T20:35:56.630582130Z" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/99c8a19b51e1e85305f365229030c3b6b7e58f7423f120b88f02d0162ee1de7e pid=2752 runtime=io.containerd.runc.v2
So unless we want to invest time investigating how to get kind to work under three layers of container abstraction, I think we should stick with the external mgmt cluster for now.
🐢 🐢 🐢 🐢 🐢
Force-pushed from a0b024f to ffc6221
Push above addresses @webern's comments. Repeated testing and the results are the same as before, as described in the PR description.
Force-pushed from ffc6221 to 703b9f5
Push above rebases onto develop and fixes conflicts in the EKS resource agent.
Force-pushed from 703b9f5 to 775331f
nice
Can you add a tutorial for using this agent somewhere?
Force-pushed from 775331f to 2e991c3
Push above adds a README for the vSphere K8s cluster resource agent covering how to configure the agent, plus more info on how to set up the initial management cluster.
This refactors out some functions needed for implementing the vSphere K8s cluster provider so they can be shared between the two crates.
Force-pushed from 2e991c3 to 0ea1092
Push above rebases onto develop and fixes conflicts with #537.
This adds a new resource agent for provisioning vSphere K8s clusters via EKS-A.
This refactors out the duplicated logic between the EKS cluster provider and the vSphere K8s cluster provider for determining whether cluster creation is necessary according to the CreationPolicy.
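To illustrate the kind of shared decision being described, here is a sketch only: the real `CreationPolicy` type and the providers' actual existence checks live in the testsys model and agent crates and may differ from the variant names and signature assumed here.

```rust
/// Stand-in for the model's creation policy; the variant names are assumptions.
enum CreationPolicy {
    /// Always create; fail if the cluster already exists.
    Create,
    /// Create only when the cluster does not already exist.
    IfNotExists,
    /// Never create; fail if the cluster does not exist.
    Never,
}

/// Shared decision both cluster providers could call: given the policy and whether
/// the cluster already exists, decide whether the agent should create it.
fn creation_required(policy: &CreationPolicy, cluster_exists: bool) -> Result<bool, String> {
    match (policy, cluster_exists) {
        (CreationPolicy::Create, true) => Err("cluster already exists".to_string()),
        (CreationPolicy::Create, false) => Ok(true),
        (CreationPolicy::IfNotExists, exists) => Ok(!exists),
        (CreationPolicy::Never, true) => Ok(false),
        (CreationPolicy::Never, false) => Err("cluster does not exist".to_string()),
    }
}
```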
Force-pushed from 0ea1092 to c3b7b28
Push above addresses lints caught by
Just a couple questions.
Talked with @mjsterckx and we're ok with merging.
Issue number:
Resolves #35
Description of changes:
Testing done:
Applied the following resource spec for a vSphere K8s cluster:
Creation pod runs to completion without problem:
Destruction runs to completion without problem after deleting the resource:
With `testsys run vmware`, the cluster can be used to run migration tests/conformance tests:
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.