Bridge proxy arp #1744
Some background on this change: ARP uses broadcast packets when sending who-has requests. In our 10,000-node tests, each node's who-has request is flooded to all 10,000 nodes, so the kernel must process 100,000,000 broadcast packets (all but 10k are dropped)! This can cause the kernel's network ingress queues to overflow, resulting in packet drops and connection-establishment timeouts. I have seen this problem with configurations of only a few thousand nodes as well, even when using multiple bridges. Increasing the size of the ingress queue helps, but results in higher latency when the queue is filled with broadcast packets. Ebtables can be used to eliminate broadcast traffic or to limit delivery to a small number of endpoints. However, doing so breaks ARP, which relies on broadcast packets. This change enables the Linux bridge's ProxyArpWiFi feature, eliminating the need to flood ARP packets. All other broadcast traffic will pass normally, allowing the administrator to manage it using ebtables.
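The packet count quoted above follows from simple arithmetic, sketched here in Go. The function name `broadcastDeliveries` is hypothetical, and the model is the simplified one from the description: every one of n endpoints sends one who-has request, and each request is delivered to all n endpoints, only one of which is the intended target.

```go
package main

import "fmt"

// broadcastDeliveries is a hypothetical helper (not part of libnetwork).
// It assumes each of n endpoints sends one ARP who-has request and that
// every request is flooded to all n endpoints, as in the description above.
// Only the single target of each request needs it; the rest are dropped.
func broadcastDeliveries(n int) (total, dropped int) {
	total = n * n       // n requests, each delivered n times
	dropped = total - n // one useful delivery per request
	return total, dropped
}

func main() {
	total, dropped := broadcastDeliveries(10000)
	fmt.Printf("delivered=%d dropped=%d\n", total, dropped)
}
```

The quadratic growth is why the problem already shows up at a few thousand nodes.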
drivers/bridge/bridge.go
        return fmt.Errorf("could not find interface with destination name %s: %v", config.BridgeName, err)
    }

    HostIF, err := d.nlh.LinkByName(hostIfName)
Please remove this; the `host` variable already contains the host-side veth end link. In fact, you use it later:
err = d.nlh.LinkSetBrProxyArpWiFi(host, true)
drivers/bridge/bridge.go
@@ -60,6 +60,7 @@ type networkConfiguration struct {
    EnableIPv6         bool
    EnableIPMasquerade bool
    EnableICC          bool
    EnableBrProxyArp   bool
Given it is already part of the bridge options, would it make sense to drop the `Br` part from the name of this option?
Thank you for the review. I agree with both of your suggestions and will update the commit shortly.
This change enables the Linux bridge's ProxyArpWiFi feature, eliminating the need to flood ARP packets. When an endpoint is created, the bridge driver already has the data needed to complete the ARP and FDB table entries. Rather than let the kernel discover this information on its own, we populate the ARP and FDB tables when the endpoint is configured. All other broadcast traffic will pass normally, allowing the administrator to manage it using ebtables.
Linux bridge ProxyArpWiFi is enabled with:
--opt com.docker.network.bridge.proxyarp=1
Dependencies:
Linux kernel v4.1-rc1 or later (commit 842a9ae08a25671db3d4f689eed68b4d64be15)
Updated based on review comments from aboch.
Signed-off-by: David Wilder <[email protected]>
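As a rough illustration of how a driver might read the option named in this commit message, here is a minimal sketch. Only the label string `com.docker.network.bridge.proxyarp` comes from the commit message; the helper `proxyArpEnabled` and the use of a plain `map[string]string` for the options are assumptions made for this example and are not taken from the patch itself.

```go
package main

import (
	"fmt"
	"strconv"
)

// proxyArpLabel is the option key quoted in the commit message.
const proxyArpLabel = "com.docker.network.bridge.proxyarp"

// proxyArpEnabled is a hypothetical helper: it looks up the label in a
// generic option map and interprets the value as a boolean. A missing
// label means the feature stays off.
func proxyArpEnabled(opts map[string]string) (bool, error) {
	v, ok := opts[proxyArpLabel]
	if !ok {
		return false, nil
	}
	enabled, err := strconv.ParseBool(v) // accepts "1", "true", "0", "false", ...
	if err != nil {
		return false, fmt.Errorf("invalid value %q for %s: %v", v, proxyArpLabel, err)
	}
	return enabled, nil
}

func main() {
	opts := map[string]string{proxyArpLabel: "1"}
	enabled, err := proxyArpEnabled(opts)
	fmt.Println(enabled, err)
}
```

Rejecting unparsable values, rather than silently treating them as false, makes a mistyped option visible at network creation time.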
Hi, changes have been made based on @aboch's suggestions. Commits have been squashed.
Thanks @djlwilder.
Hi @aboch, the FDB entry will be cleaned up automatically when the endpoint is removed from the bridge (when the bridge port is deleted). However, to anticipate your next question :) Should I remove the permanent neighbor (ARP) entry on DeleteEndpoint()? I gave this some thought: there is no harm in leaving the old entry around, as the MAC is derived from the IP address. If a new endpoint is created reusing the same IPv4 address, the old neighbor entry will be reused. My concern with removing a neighbor entry is the possibility of a race between two threads deleting and creating an endpoint and reusing the same address. Although it may be cleaner to delete the neighbor entry anyway, what do you think?
Thanks, agreed. In this regard, have you checked whether [...] Regarding the MAC-from-IP logic, be aware that it is not used when the user selects the MAC address.
Hi @aboch, thanks for pointing out the --mac option; I missed that. Using NLM_F_REPLACE should handle the case where the MAC changes, but I need to test that. I will let you know the results. Thanks again for the feedback.
Hi @aboch, I verified that the --mac option works correctly with the proxyarp feature. I started a container using the --mac option, then stopped it. Then I started another container using a different MAC address but the same IP address. I verified that the neighbour entry was updated with the new MAC address as expected. BTW: I found a bug with the --mac option (unrelated to my changes): it is possible to have two running containers with different IP addresses and the same MAC address.
Thanks @djlwilder, LGTM.