Shared network for vfkit driver using vmnet-helper #20501
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: nirs. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Hi @nirs. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Can one of the admins verify this patch?
@afbjorklund can you review this?
Force-pushed from 86b449f to 6bf1d56
Example machine config when vmnet-shared is used:
{
"ConfigVersion": 3,
"Driver": {
"IPAddress": "192.168.105.21",
"MachineName": "minikube",
"SSHUser": "docker",
"SSHPort": 22,
"SSHKeyPath": "",
"StorePath": "/Users/nir/.minikube",
"SwarmMaster": false,
"SwarmHost": "",
"SwarmDiscovery": "",
"Boot2DockerURL": "file:///Users/nir/.minikube/cache/iso/arm64/minikube-v1.35.0-arm64.iso",
"DiskSize": 20000,
"CPU": 2,
"Memory": 6000,
"Cmdline": "",
"ExtraDisks": 0,
"Network": "vmnet-shared",
"MACAddress": "de:a2:9c:71:3c:f7",
"VmnetHelper": {
"MachineDir": "/Users/nir/.minikube/machines/minikube",
"InterfaceID": "a0c43efb-2dcc-4abb-a310-568782c5dc7a"
}
},
"DriverName": "vfkit",
"HostOptions": {
"Driver": "",
"Memory": 0,
"Disk": 0,
"EngineOptions": {
"ArbitraryFlags": null,
"Dns": null,
"GraphDir": "",
"Env": null,
"Ipv6": false,
"InsecureRegistry": [
"10.96.0.0/12"
],
"Labels": null,
"LogLevel": "",
"StorageDriver": "",
"SelinuxEnabled": false,
"TlsVerify": false,
"RegistryMirror": [],
"InstallURL": "https://get.docker.com"
},
"SwarmOptions": {
"IsSwarm": false,
"Address": "",
"Discovery": "",
"Agent": false,
"Master": false,
"Host": "",
"Image": "",
"Strategy": "",
"Heartbeat": 0,
"Overcommit": 0,
"ArbitraryFlags": null,
"ArbitraryJoinFlags": null,
"Env": null,
"IsExperimental": false
},
"AuthOptions": {
"CertDir": "/Users/nir/.minikube",
"CaCertPath": "/Users/nir/.minikube/certs/ca.pem",
"CaPrivateKeyPath": "/Users/nir/.minikube/certs/ca-key.pem",
"CaCertRemotePath": "",
"ServerCertPath": "/Users/nir/.minikube/machines/server.pem",
"ServerKeyPath": "/Users/nir/.minikube/machines/server-key.pem",
"ClientKeyPath": "/Users/nir/.minikube/certs/key.pem",
"ServerCertRemotePath": "",
"ServerKeyRemotePath": "",
"ClientCertPath": "/Users/nir/.minikube/certs/cert.pem",
"ServerCertSANs": null,
"StorePath": "/Users/nir/.minikube"
}
},
"Name": "minikube"
}
Example multi-node cluster:
% minikube start --network vmnet-shared --nodes 2 --cni auto
😄 minikube v1.35.0 on Darwin 15.3.1 (arm64)
✨ Using the vfkit (experimental) driver based on user configuration
❗ --network flag is only valid with the docker/podman, KVM and Qemu drivers, it will be ignored
👍 Starting "minikube" primary control-plane node in "minikube" cluster
🔥 Creating vfkit VM (CPUs=2, Memory=4050MB, Disk=20000MB) ...
📦 Preparing Kubernetes v1.32.2 on containerd 1.7.23 ...
▪ Generating certificates and keys ...
▪ Booting up control plane ...
▪ Configuring RBAC rules ...
🔗 Configuring CNI (Container Networking Interface) ...
🔎 Verifying Kubernetes components...
▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟 Enabled addons: default-storageclass, storage-provisioner
👍 Starting "minikube-m02" worker node in "minikube" cluster
🔥 Creating vfkit VM (CPUs=2, Memory=4050MB, Disk=20000MB) ...
🌐 Found network options:
▪ NO_PROXY=192.168.105.22
📦 Preparing Kubernetes v1.32.2 on containerd 1.7.23 ...
▪ env NO_PROXY=192.168.105.22
> kubelet.sha256: 64 B / 64 B [-------------------------] 100.00% ? p/s 0s
> kubectl.sha256: 64 B / 64 B [-------------------------] 100.00% ? p/s 0s
> kubeadm.sha256: 64 B / 64 B [-------------------------] 100.00% ? p/s 0s
> kubectl: 53.25 MiB / 53.25 MiB [------------] 100.00% 37.13 MiB p/s 1.6s
> kubelet: 71.75 MiB / 71.75 MiB [------------] 100.00% 12.43 MiB p/s 6.0s
> kubeadm: 66.81 MiB / 66.81 MiB [------------] 100.00% 10.11 MiB p/s 6.8s
🔎 Verifying Kubernetes components...
🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
% kubectl get node
NAME STATUS ROLES AGE VERSION
minikube Ready control-plane 5m1s v1.32.2
minikube-m02 Ready <none> 4m40s v1.32.2
% kubectl get node -o jsonpath='{.items[*].status.addresses[0].address}{"\n"}'
192.168.105.22 192.168.105.23
Each node gets its own vmnet-helper process:
% ps au | grep vmnet-helper | grep -v grep
nir 60260 0.0 0.0 410743984 3792 s020 S 1:07AM 0:00.71 /opt/vmnet-helper/bin/vmnet-helper --fd 21 --interface-id c9ae713b-5937-42de-9018-bbb95a425506
root 60259 0.0 0.0 410737168 6448 s020 S 1:07AM 0:00.01 sudo --non-interactive --close-from 22 /opt/vmnet-helper/bin/vmnet-helper --fd 21 --interface-id c9ae713b-5937-42de-9018-bbb95a425506
nir 60254 0.0 0.0 410735792 3728 s020 S 1:07AM 0:00.73 /opt/vmnet-helper/bin/vmnet-helper --fd 13 --interface-id cf62fb28-2533-4c25-8d35-1eef1736f707
root 60253 0.0 0.0 410754576 6992 s020 S 1:07AM 0:00.01 sudo --non-interactive --close-from 14 /opt/vmnet-helper/bin/vmnet-helper --fd 13 --interface-id cf62fb28-2533-4c25-8d35-1eef1736f707
I could take a look at it later perhaps, but one of the minikube maintainers will still need to "take over" the vfkit driver. There should probably be an issue for this as well.
I also noticed several vment typos in the commit logs.
// startVfkit starts the vfkit child process. If vfkitSock is non-nil, vfkit is
// connected to the vmnet network via the socket instead of the "nat" network.
func (d *Driver) startVfkit(vfkitSock *os.File) error {
It might be clearer to rename vfkitSock to vmnetSock in this method.
The name comes from Start(), where we want to make it clear that one socket is for the helper and one is for vfkit. Keeping the same name in startVfkit() makes it easier to follow the code. It is very confusing when different names are used for the same thing in different functions.
Maybe we can name both vmnetHelperSock and vmnetVfkitSock?
This is minor, fine with me if you want to keep the current naming. vmnetHelperSock/vmnetVfkitSock is good too, even if a bit on the "long" side.
if vfkitState, err := d.getVfkitState(); err != nil {
	return state.Error, err
} else if vfkitState == state.Running {
	return state.Running, nil
If getVfkitState returns running, is it guaranteed that vmnet will also be running? Or should it be s/Running/Stopped in this block?
Yes, the semantics are to return Running if at least one process is running, since we use it only in the stop and delete flows. If some process is running, we need to continue polling, and maybe retry the stop or kill.
If it's not used in Start to wait until the state is running, then this should be fine. My worry was whether something expects the VM to be running and usable when only one of the two processes is up.
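The "Running if at least one process is up" semantics discussed above can be sketched as follows. This is an illustrative sketch, not the PR's code; the real driver uses libmachine's state package, which is reduced here to a two-value type to keep the example self-contained.

```go
package main

import "fmt"

// State mirrors the tiny subset of libmachine's state package needed here.
type State int

const (
	Stopped State = iota
	Running
)

func (s State) String() string {
	if s == Running {
		return "Running"
	}
	return "Stopped"
}

// combinedState reports Running if at least one of the two child processes
// (vfkit, vmnet-helper) is still up, so stop and delete flows keep polling,
// and retry the stop or kill, until both processes are gone.
func combinedState(vfkit, vmnetHelper State) State {
	if vfkit == Running || vmnetHelper == Running {
		return Running
	}
	return Stopped
}

func main() {
	fmt.Println(combinedState(Running, Stopped)) // Running
	fmt.Println(combinedState(Stopped, Stopped)) // Stopped
}
```

With these semantics, a caller that expects a usable VM must not treat Running as "fully up"; as noted above, this is safe only because the value is consumed by stop and delete flows.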
@@ -0,0 +1,79 @@
/* |
Maybe check whether it is installed, and suggest that people install it if not.
I plan to use NotFoundVmnetHelper in the same way we handle NotFoundSocketVmnet.
If you ask for --network vmnet-shared and vmnet-helper is not installed or not configured properly, you will get a link to the docs on how to install and configure it.
This will be useful for implementing stop and kill with signals. Remove the unneeded os.Stat() call: os.ReadFile() returns an error wrapping os.ErrNotExist if the file does not exist, so we don't need the extra syscall. This also fixes a theoretical race where the pidfile is created after we check whether the file exists. The new helper is generic, like checkPid(), and can be reused later by all drivers managing a pidfile.
For consistency with readPidfile(), and cleaner code. This helper is also generic and can be used by any driver managing a pidfile.
GetState() had several issues:

- When accessing the vfkit HTTP API, we handled only "running", "VirtualMachineStateRunning", "stopped", and "VirtualMachineStateStopped", but there are 10 other possible states, which we handled as state.None even when vfkit is running and needs to be stopped. This can lead to wrong handling in the caller.
- When handling "stopped" and "VirtualMachineStateStopped" we returned state.Stopped but did not remove the pidfile. This can cause termination of an unrelated process, or reporting the wrong status, when the pid is reused.
- Accessing the HTTP API fails after we stop or kill vfkit. This causes GetState() to fail when the process is actually stopped, which can lead to unnecessary retries and long delays (kubernetes#20503).
- When returning state.None during Remove(), we tried to do a graceful shutdown, which does not make sense in the minikube delete flow and is not consistent with the state.Running handling.

Accessing the vfkit API to check the state does not add much value for our use case (checking if the vfkit process is running), and it is not reliable. Fix all the issues by not using the HTTP API in GetState(), using only the process state. We still use the API for stopping and killing vfkit to do a graceful shutdown. This also simplifies Remove(), since we need to handle only the state.Running state.

With this change we consider vfkit stopped only when the process does not exist, which takes about 3 seconds after the state is reported as "stopped".

Example stop flow:
I0309 18:15:40.260249 18857 main.go:141] libmachine: Stopping "minikube"...
I0309 18:15:40.263225 18857 main.go:141] libmachine: set state: {State:Stop}
I0309 18:15:46.266902 18857 main.go:141] libmachine: Machine "minikube" was stopped.
I0309 18:15:46.267122 18857 stop.go:75] duration metric: took 6.127761459s to stop

Example delete flow:
I0309 17:00:49.483078 18127 out.go:177] * Deleting "minikube" in vfkit ...
I0309 17:00:49.499252 18127 main.go:141] libmachine: set state: {State:HardStop}
I0309 17:00:49.569938 18127 lock.go:35] WriteFile acquiring /Users/nir/.kube/config: ...
I0309 17:00:49.573977 18127 out.go:177] * Removed all traces of the "minikube" cluster.
If setting the vfkit state to "Stop" fails, we used to return an error, and retrying the operation may never succeed. Fix by falling back to terminating vfkit with SIGTERM. This should be handled as a graceful shutdown by vfkit if it is implemented correctly. We can still fail if the pidfile is corrupted. Getting a process from the pid can return an error, but it is documented to "always succeed on unix". In the case when we are sure the vfkit process does not exist, we remove the pidfile immediately, avoiding a leftover pidfile if the caller does not call GetState() after Stop(). Sending a signal using a pidfile can work for any driver using a pidfile, so it is implemented as a reusable function.
We know that setting the state to `HardStop` typically fails:

I0309 19:19:42.378591 21795 out.go:177] 🔥 Deleting "minikube" in vfkit ...
W0309 19:19:42.397472 21795 delete.go:106] remove failed, will retry: kill: Post "http://_/vm/state": EOF

This may lead to unnecessary retries and delays. Fix by falling back to sending a SIGKILL signal. Example delete flow when setting the vfkit state fails:

I0309 20:07:41.688259 25540 out.go:177] 🔥 Deleting "minikube" in vfkit ...
I0309 20:07:41.712017 25540 main.go:141] libmachine: Failed to set vfkit state to 'HardStop': Post "http://_/vm/state": EOF
I0309 20:07:41.712082 25540 main.go:141] libmachine: Sending signal "killed" to pid 25482
Remove the temporary and unneeded mac variable. It is easier to follow the code when we use d.MACAddress.
System state changes should be more visible to make debugging easier.
They can be used by any driver that needs to manage a pidfile, and they will also be used for the vmnet package.
The package manages the vmnet-helper[1] child process, providing a connection to the vmnet network without running the guest as root. We will use vmnet-helper for the vfkit driver, which does not have a way to use a shared network, where guests can access other guests in the network. We can use it later with the qemu driver as an alternative to socket_vmnet.
[1] https://github.com/nirs/vmnet-helper
Add a new network option for vfkit, "vmnet-shared", connecting vfkit to the vmnet shared network. Clusters using this network can access other clusters in the same network, similar to socket_vmnet with the QEMU driver.

If the network is not specified, we default to the "nat" network, keeping the previous behavior. If the network is "vmnet-shared", the vfkit driver manages 2 processes: vfkit and vmnet-helper.

Like vfkit, vmnet-helper is started in the background, in a new process group, so it is not terminated if the minikube process group is terminated. Since vmnet-helper requires root to start the vmnet interface, we start it with sudo, creating 2 child processes. vmnet-helper drops privileges immediately after starting the vmnet interface, and runs as the user and group running minikube.

Stopping the cluster will stop sudo, which will stop the vmnet-helper process. Deleting the cluster kills both sudo and vmnet-helper by killing the process group.

This change is not complete, but it is good enough to play with the new shared network.

Example usage:

1. Install vmnet-helper:
   https://github.com/nirs/vmnet-helper?tab=readme-ov-file#installation

2. Set up the vmnet-helper sudoers rule:
   https://github.com/nirs/vmnet-helper?tab=readme-ov-file#granting-permission-to-run-vmnet-helper

3. Start 2 clusters with the vmnet-shared network:

   % minikube start -p c1 --driver vfkit --network vmnet-shared
   ...
   % minikube start -p c2 --driver vfkit --network vmnet-shared
   ...
   % minikube ip -p c1
   192.168.105.18
   % minikube ip -p c2
   192.168.105.19

4. Both clusters can access the other cluster:

   % minikube -p c1 ssh -- ping -c 3 192.168.105.19
   PING 192.168.105.19 (192.168.105.19): 56 data bytes
   64 bytes from 192.168.105.19: seq=0 ttl=64 time=0.621 ms
   64 bytes from 192.168.105.19: seq=1 ttl=64 time=0.989 ms
   64 bytes from 192.168.105.19: seq=2 ttl=64 time=0.490 ms
   --- 192.168.105.19 ping statistics ---
   3 packets transmitted, 3 packets received, 0% packet loss
   round-trip min/avg/max = 0.490/0.700/0.989 ms
   % minikube -p c2 ssh -- ping -c 3 192.168.105.18
   PING 192.168.105.18 (192.168.105.18): 56 data bytes
   64 bytes from 192.168.105.18: seq=0 ttl=64 time=0.289 ms
   64 bytes from 192.168.105.18: seq=1 ttl=64 time=0.798 ms
   64 bytes from 192.168.105.18: seq=2 ttl=64 time=0.993 ms
   --- 192.168.105.18 ping statistics ---
   3 packets transmitted, 3 packets received, 0% packet loss
   round-trip min/avg/max = 0.289/0.693/0.993 ms
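The launch described in the commit message (sudo, because the helper needs root to create the vmnet interface, and a new process group, so the helper survives if minikube's process group is terminated) might be built roughly as below. The function name is hypothetical, and the binary path and flags are assumptions taken from the ps output shown earlier in this PR, not the actual implementation.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// buildVmnetHelperCmd constructs the sudo command line for launching
// vmnet-helper. Setpgid puts the child in a new process group, so it is
// not terminated when the minikube process group is terminated; killing
// that group later stops both sudo and vmnet-helper. Starting the command
// is left to the caller (it requires a configured sudoers rule).
func buildVmnetHelperCmd(interfaceID string) *exec.Cmd {
	cmd := exec.Command(
		"sudo", "--non-interactive",
		"/opt/vmnet-helper/bin/vmnet-helper",
		"--interface-id", interfaceID,
	)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true} // new process group
	return cmd
}

func main() {
	cmd := buildVmnetHelperCmd("c9ae713b-5937-42de-9018-bbb95a425506")
	fmt.Println(cmd.Args)
}
```

The real driver also passes the vmnet socket over an inherited file descriptor (the --fd and --close-from flags in the ps output), which is omitted here to keep the sketch short.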
Add a new network option for vfkit, "vmnet-shared", connecting vfkit to the
vmnet shared network. Clusters using this network can access other
clusters in the same network, similar to socket_vmnet with the QEMU driver.
If the network is not specified, we default to the "nat" network, keeping
the previous behavior. If the network is "vmnet-shared", the vfkit driver
manages 2 processes: vfkit and vmnet-helper.
Like vfkit, vmnet-helper is started in the background, in a new process
group, so it is not terminated if the minikube process group is terminated.
Since vmnet-helper requires root to start the vmnet interface, we start
it with sudo, creating 2 child processes. vmnet-helper drops privileges
immediately after starting the vmnet interface, and runs as the user and
group running minikube.
Stopping the cluster will stop sudo, which will stop the vmnet-helper
process. Deleting the cluster kills both sudo and vmnet-helper by killing
the process group.
This change is not complete, but it is good enough to play with the new
shared network.
Example usage:
Install vmnet-helper:
https://github.com/nirs/vmnet-helper?tab=readme-ov-file#installation
Set up the vmnet-helper sudoers rule:
https://github.com/nirs/vmnet-helper?tab=readme-ov-file#granting-permission-to-run-vmnet-helper
Start 2 clusters with vmnet-shared network:
To complete this work we need:
Based on #20506