cleanup duplicate ipvlan and macvlan network IDs during createNetwork #2055
… to prevent collisions
- check if the new network uses the same interface as an existing network, as different networks of this type cannot share the same interface
- if the new network ID does not match an existing network ID then we must return with an error, as the new network is not the same and cannot co-exist with the existing
- attempt to delete the old duplicate network, and return an error if unsuccessful, or log info if successful

Signed-off-by: Isaac Rodman <[email protected]>
I mentioned this to @mavenugo quite a while ago. I'm not sure whether cleanup can be done on the shutdown side, but I think it may be safer to clean up on creation instead, if needed.
Codecov Report
@@ Coverage Diff @@
## master #2055 +/- ##
========================================
Coverage ? 40.5%
========================================
Files ? 138
Lines ? 22185
Branches ? 0
========================================
Hits ? 8985
Misses ? 11877
Partials ? 1323
@eyz just a clarification: what do you mean by the existing copy not being deleted on daemon shutdown? I guess the state is persisted in the local store.
Hi @fcrisciani, and thanks for looking at this issue! I can clarify further if needed.

The typical behavior is intermittent: when I stop Docker Engine on Swarm nodes that have instantiated a config-only ipvlan network in global/swarm mode, there is often a collision when the service that uses that network comes back up. When the container assigned to that service (and thus using that network from before) starts, the network is apparently sometimes created again, and it appears there is a collision in createNetwork when this occurs.

The existing behavior, prior to my PR, is simply to return an error if the network is being re-created: the same network ID is presumably already in the store, and the executor is now trying to create the network again, so a fatal error occurs in createNetwork. That surfaces as a task-scoped error, and the container cannot be started from the task.

With this PR, if the existing network ID is found and is being created a second time, the "old" network ID (the duplicate from before) is removed first, and then createNetwork can proceed with creation of the same network, using the same network ID, as confirmed during my testing.

I fully expect that there is another side to this solution, where perhaps there can be proper cleanup at shutdown. I spoke with Madhu (@mavenugo) quite a while ago, and we found a workaround by deleting the network store and re-creating it, but that approach isn't sustainable. I found that the linked PR allows createNetwork to proceed without a fatal error, so the containers can use the existing network once again. A more creative solution may be the best answer, but this PR seems to work as a solution as well.

Is that enough detail to investigate this PR as a potential desired solution?
Generally speaking, I would prefer an approach similar to the bridge driver, which, put simply, just reuses the bridge interface if it has already been created.
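The reuse-if-present alternative suggested above might look something like the sketch below. It is not the bridge driver's code; `reuseDriver`, `ifaceNetwork`, and `createOrReuse` are hypothetical names used only to contrast "reuse the existing record" with "delete and re-create".

```go
package main

import "fmt"

// ifaceNetwork is a hypothetical, simplified record of a driver network.
type ifaceNetwork struct {
	id     string
	parent string // host interface the network is bound to
}

// reuseDriver is a hypothetical in-memory store keyed by network ID.
type reuseDriver struct {
	networks map[string]*ifaceNetwork
}

// createOrReuse reports reused=true when a record with the same ID and
// parent interface already exists, instead of failing or deleting it.
func (d *reuseDriver) createOrReuse(id, parent string) (reused bool, err error) {
	if existing, ok := d.networks[id]; ok {
		if existing.parent == parent {
			// Same network as before: keep the existing state and carry on.
			return true, nil
		}
		// Same ID but a different interface is a genuine conflict.
		return false, fmt.Errorf("network %s already exists on parent %s, requested %s",
			id, existing.parent, parent)
	}
	d.networks[id] = &ifaceNetwork{id: id, parent: parent}
	return false, nil
}

func main() {
	d := &reuseDriver{networks: map[string]*ifaceNetwork{}}
	fmt.Println(d.createOrReuse("net-a", "eth0")) // created
	fmt.Println(d.createOrReuse("net-a", "eth0")) // reused
	fmt.Println(d.createOrReuse("net-a", "eth1")) // conflict
}
```

The trade-off in this sketch is that reuse trusts the persisted state to still be valid, whereas the delete-and-re-create approach in the PR rebuilds it from scratch.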
…ed all changes as follows -
- support sending hostname and domainname in IPAM
- support custom volume naming when using netapp volume driver
- use dashes as separators for container names
- work in-progress DNS domainname changes
- de-duplicate DNS search domains
- add container "host" /etc/hosts entry equal to value of DOCKER_HOST_EXPORT_IP environment variable if defined
- cleanup duplicate ipvlan and macvlan network IDs during createNetwork (moby/libnetwork#2055)
- support optional skip of IPAM pool conflict checking
- set Swarm tasks to orphaned state if node becomes unavailable
- allow orphaned Swarm tasks assigned to a deleted node to start elsewhere
- Swarm tasks in an orphaned state should not be allowed to restart
- allow Swarm tasks in a remove state to be transitioned
- prevent oldTaskTimer from allowing the slot to be prematurely updated
- allow Swarm tasks to be cleaned up if they are in a pending state and are marked for removal
- support service level anti-affinity via label h3o-limitActiveServiceSlotsPerNode
@fcrisciani my organization is probably moving to Kubernetes, so please adopt or close as applicable. Thanks! That said, let me know if you need any more clarification, but I probably won't be submitting code updates to this PR.
@fcrisciani Did you mean something like this? This issue is very predictable. https://gist.github.com/rnataraja/d987b61d8acc8ac738641f55ebe5af0d
@fcrisciani and @mavenugo, is anyone available to take this over?
cleanup duplicate ipvlan and macvlan network IDs during createNetwork to prevent collisions
This resolves the issue shown below, where an existing copy of a network has not been removed on Docker Engine shutdown and still persists when the config-only network is created again on a subsequent createNetwork call.
The symptom of an existing and now-conflicting network is usually as follows, including showing up as a rejected task's error -