Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cleanup duplicate ipvlan and macvlan network IDs during createNetwork #2055

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

eyz
Copy link

@eyz eyz commented Jan 16, 2018

cleanup duplicate ipvlan and macvlan network IDs during createNetwork to prevent collisions

  • check if the new network uses the same interface as an existing network, as different networks of this type cannot share the same interface
  • if the new network ID does not match an existing network ID then we must return with an error, as the new network is not the same and cannot co-exist with the existing
  • attempt to delete the old duplicate network, and return an error if unsuccessful, or log info if successful

This resolves the issue shown below, where an existing copy of a network has not been removed on Docker Engine shutdown, and still persists when the config-only network is created once again on a subsequent createNetwork call

existing network config = (*ipvlan.configuration)(0xc420322e80)({\n
  ID: (string) (len=25) \"hwqzu5i4kmrd2uzi34zpnpy1m\",\n
  Mtu: (int) 0,\n
  dbIndex: (uint64) 6,\n
  dbExists: (bool) true,\n
  Internal: (bool) false,\n
  Parent: (string) (len=4) \"eth0\",\n
  IpvlanMode: (string) (len=2) \"l2\",\n
  CreatedSlaveLink: (bool) false,\n
  Ipv4Subnets: ([]*ipvlan.ipv4Subnet) (len=1 cap=4) {\n
    (*ipvlan.ipv4Subnet)(0xc4206413a0)({\n
      SubnetIP: (string) (len=13) \"10.100.0.0/16\",\n
      GwIP: (string) (len=13) \"10.100.0.1/16\"\n
    })\n
  },\n
  Ipv6Subnets: ([]*ipvlan.ipv6Subnet) <nil>\n})\n

new network config = (*ipvlan.configuration)(0xc4203c5b80)({\n
  ID: (string) (len=25) \"hwqzu5i4kmrd2uzi34zpnpy1m\",\n
  Mtu: (int) 0,\n
  dbIndex: (uint64) 0,\n
  dbExists: (bool) false,\n
  Internal: (bool) false,\n
  Parent: (string) (len=4) \"eth0\",\n
  IpvlanMode: (string) (len=2) \"l2\",\n
  CreatedSlaveLink: (bool) false,\n
  Ipv4Subnets: ([]*ipvlan.ipv4Subnet) (len=1 cap=1) {\n
    (*ipvlan.ipv4Subnet)(0xc42149af00)({\n
      SubnetIP: (string) (len=13) \"10.100.0.0/16\",\n
      GwIP: (string) (len=13) \"10.100.0.1/16\"\n
    })\n
  },\n
  Ipv6Subnets: ([]*ipvlan.ipv6Subnet) <nil>\n})\n

The symptom of an existing and now-conflicting network is usually as follows, including showing up as a rejected task's error -

docker.err.log:time="2018-01-15T23:32:21.993891743Z" level=error msg="fatal task error" error="network di-hwqzu5i4kmrd is already using parent interface eth0

… to prevent collisions

- check if the new network uses the same interface as an existing network, as different networks of this type cannot share the same interface
- if the new network ID does not match an existing network ID then we must return with an error, as the new network is not the same and cannot co-exist with the existing
- attempt to delete the old duplicate network, and return an error if unsuccessful, or log info if successful

Signed-off-by: Isaac Rodman <[email protected]>
@eyz
Copy link
Author

eyz commented Jan 16, 2018

Mentioned this to @mavenugo quite a while ago. Not sure if cleanup can be done on the shutdown side, but I think this may be safer, to cleanup if needed on creation instead.

@codecov-io
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (master@fcf1c3b). Click here to learn what that means.
The diff coverage is 0%.

Impacted file tree graph

@@           Coverage Diff            @@
##             master   #2055   +/-   ##
========================================
  Coverage          ?   40.5%           
========================================
  Files             ?     138           
  Lines             ?   22185           
  Branches          ?       0           
========================================
  Hits              ?    8985           
  Misses            ?   11877           
  Partials          ?    1323
Impacted Files Coverage Δ
drivers/macvlan/macvlan_network.go 0% <0%> (ø)
drivers/ipvlan/ipvlan_network.go 0% <0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fcf1c3b...78df1c0. Read the comment docs.

@fcrisciani
Copy link

@eyz just a clarification, what do you mean with the existing copy not being deleted on daemon shutdown? I guess the state is persisted in the local store.

@eyz
Copy link
Author

eyz commented Jan 23, 2018

Hi, @fcrisciani , and thanks for looking at this issue! I can attempt to clarify further if needed, but the typical behavior is that - intermittently, when I stop Docker Engine with Swarm nodes that have instantiated a config-only ipvlan network in global/swarm mode, there is often a collision when the service which uses that network comes back up again -- when the container assigned to that service (and thus using that network from before) comes up, the network occasionally is apparently being created once again -- it appears there is a collision in createNetwork when this occurs. The existing behavior, prior to my PR, will simply throw an error if the network is being re-created -- the same network ID is already presumably in the store, and now the executor is trying to create the network once again, and so a fatal error occurs in createNetwork, which results in a task-scoped error to be presented, and the container cannot be started from the task.

With this PR, if the existing network ID is found, and is being created a second time, then the "old" network ID -- the duplicate from before -- is removed first, and then createNetwork can proceed with creation of the same network, which is also using the same network ID, as confirmed during my testing.

I fully expect that there is another side of this solution, where perhaps there can be proper cleanup at shutdown. I spoke with Madhu (@mavenugo) quite a while ago, and we found that there was a workaround for the issue by deleting the network store, and re-creating, but that approach isn't sustainable. I found that the linked PR will allow createNetwork to proceed without a fatal error, and thus the containers can use the existing network once again. A more creative solution may be the actual best answer, but this PR seems to work as a solution as well.

Is that enough detail to investigate this PR as a potential desired solution?

@fcrisciani
Copy link

So generally speaking I would prefer an approach similar to bridge driver, where simplistically there, if the bridge ifc is already created, will just simply use it.
Here the same way I would consider to start changing the if condition like this:
if config.Parent == nw.config.Parent && config.ID != nw.config.ID { ==> then error
Else we continue with the normal processing knowing that the network being created is actually the same.
Now arrives the tricky part, we have to check that the methods like: createDummyLink, createVlanLink, are not bailing out if the interface that are creating is already there.

@eyz eyz mentioned this pull request Jan 24, 2018
eyz added a commit to eyz/docker-ce that referenced this pull request Feb 9, 2018
…ed all changes as follows -

- support sending hostname and domainname in IPAM
- support custom volume naming when using netapp volume driver
- use dashes as seperators for container names
- work in-progress DNS domainname changes
- de-duplicate DNS search domains
- add container "host" /etc/hosts entry equal to value of DOCKER_HOST_EXPORT_IP environment variable if defined
- cleanup duplicate ipvlan and macvlan network IDs during createNetwork (moby/libnetwork#2055)
- support optional skip of IPAM pool conflict checking
- set Swarm tasks to orphaned state if node becomes unavailable
- allow orphaned Swarm tasks assigned to a deleted node to start elseware
- Swarm tasks in an orphaned state should not be allowed to restart
- allow Swarm tasks in a remove state to be transitioned
- prevent oldTaskTimer from allowing the slot to be prematurely updated
- allow Swarm tasks to be cleaned up if they are in a pending state and are marked for removal
- support service level anti-affinity via label h3o-limitActiveServiceSlotsPerNode
@eyz
Copy link
Author

eyz commented Feb 9, 2018

@fcrisciani my organization is probably moving to Kubernetes, so please adopt or close as applicable. Thanks! That said, let me know if you need any more clarification, but I probably won't be submitting code updates to this PR

@rnataraja
Copy link

@fcrisciani Did you mean something like this? This isssue is very predictable.

https://gist.github.com/rnataraja/d987b61d8acc8ac738641f55ebe5af0d

@eyz
Copy link
Author

eyz commented Jan 10, 2019

@fcrisciani and @mavenugo, is anyone available to take this over?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants