What problem are you trying to solve?
Recently we faced a failure of our application, which consists of gRPC services, during an ungraceful node termination.
The problem manifested as TCP connections (carrying gRPC traffic) hanging for up to 15 minutes. Neither the application nor Linkerd could identify these connections as dead in any way; over those 15 minutes such connections accumulated and degraded application performance. Others have faced a similar problem in istio/istio#33466 and istio/istio#28865 when using Istio + Envoy.
In Linkerd, `TCP_KEEPALIVE` is set by default, but keepalive probes are not sent on a half-open TCP connection that has unacknowledged data in its send queue (https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die - "Idle ESTAB is forever"), so in this situation the socket option does not help. The root of the problem is that when the peer node is terminated ungracefully, the TCP socket ends up in a half-open state (https://www.excentis.com/blog/tcp-half-close-a-cool-feature-that-is-now-broken). In that case, if the client has packets queued for sending on the connection, the following happens:
- The TCP stack waits for the RTO (retransmission timeout).
- After the RTO fires, the packet is retransmitted.
- The stack then waits for an exponentially growing RTO before each further retransmission.
- By default, the TCP stack allows 15 such retries, which adds up to roughly ~15 minutes of waiting (see the sketch after this list).
- The connection is then considered broken and is closed with `ETIMEDOUT`.
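To see where the ~15 minutes comes from, here is a back-of-the-envelope sketch, assuming Linux's typical minimum RTO of 200 ms, doubling per retry and capped at 120 s (the numbers match the table in the Cloudflare post linked above):

```rust
// Back-of-the-envelope: cumulative wait with exponential RTO backoff.
// Assumes Linux defaults: initial RTO ~200 ms, doubling each retry,
// capped at TCP_RTO_MAX (120 s), net.ipv4.tcp_retries2 = 15.
fn main() {
    let rto_max = 120.0_f64;
    let retries = 15;

    let mut total = 0.0;
    let mut rto = 0.2_f64; // seconds
    // One RTO wait before each retransmission, plus a final wait
    // before the connection is declared dead with ETIMEDOUT.
    for attempt in 1..=retries + 1 {
        total += rto;
        println!("attempt {attempt:2}: rto = {rto:6.1}s, elapsed = {total:6.1}s");
        rto = (rto * 2.0).min(rto_max);
    }
    // Total comes out to ~924.6 s, i.e. roughly 15 minutes.
}
```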
The time to wait before declaring the connection broken can be controlled via `net.ipv4.tcp_retries2` (system-wide) or via the `TCP_USER_TIMEOUT` socket option (`TCP_USER_TIMEOUT` limits the time allotted for packet retransmission).
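For reference, this is roughly what setting the option looks like at the socket level. A minimal sketch using the `socket2` crate (a Linux-only option, gated behind socket2's "all" feature); linkerd-proxy's actual socket setup differs:

```rust
// Minimal sketch: setting TCP_USER_TIMEOUT with the `socket2` crate.
use socket2::{Domain, Protocol, Socket, Type};
use std::net::SocketAddr;
use std::time::Duration;

fn connect_with_user_timeout(addr: SocketAddr) -> std::io::Result<Socket> {
    let socket = Socket::new(Domain::IPV4, Type::STREAM, Some(Protocol::TCP))?;
    // Cap the time unacknowledged data may sit in the send queue:
    // if the peer never ACKs within 30s, the kernel aborts the
    // connection with ETIMEDOUT instead of retransmitting for ~15 min.
    socket.set_tcp_user_timeout(Some(Duration::from_secs(30)))?;
    socket.connect(&addr.into())?;
    Ok(socket)
}
```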
How should the problem be solved?
Set the `TCP_USER_TIMEOUT` option on all TCP connections (both inbound and outbound).
Setting `TCP_USER_TIMEOUT` on all connections prevents linkerd-proxy from exhausting resources on dead connections to clients or to the control plane (which is also susceptible), improving the fault tolerance of each individual linkerd-proxy.
Configuring `TCP_USER_TIMEOUT` is similar to configuring `TCP_KEEPALIVE`, so the following implementation is proposed:
- Make `TCP_USER_TIMEOUT` configurable from the environment, just like `TCP_KEEPALIVE`.
- Default `TCP_USER_TIMEOUT` to 30s: that is enough for ~7 packet retransmissions (with Linux's minimum RTO of 200 ms doubling on each attempt, 7 retransmissions take roughly 0.2 + 0.4 + ... + 12.8 ≈ 25 s), and beyond that the RTO grows so quickly that waiting longer gains little.
- It is safe to default `TCP_USER_TIMEOUT` to 30s: the setting only affects the peer that sets it and requires no coordination with the other peer. In this it resembles `TCP_KEEPALIVE`, which is already always set.
- Only the following parameters will be configured by default, covering traffic leaving the pod: `INBOUND_ACCEPT_USER_TIMEOUT` = 30s and `OUTBOUND_CONNECT_USER_TIMEOUT` = 30s (see the sketch after this list).
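A rough sketch of how the proxy could pick these up from the environment, mirroring how the keepalive settings are read. The variable names come from this proposal; the parsing helper is purely illustrative and is not linkerd's actual config code:

```rust
// Illustrative only: read a TCP_USER_TIMEOUT value from an env var such
// as "30s", falling back to a 30s default. The real proxy has its own
// config-parsing layer.
use std::{env, time::Duration};

fn user_timeout_from_env(var: &str) -> Duration {
    const DEFAULT: Duration = Duration::from_secs(30);
    env::var(var)
        .ok()
        .and_then(|raw| raw.trim_end_matches('s').parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or(DEFAULT)
}

fn main() {
    let inbound = user_timeout_from_env("LINKERD2_PROXY_INBOUND_ACCEPT_USER_TIMEOUT");
    let outbound = user_timeout_from_env("LINKERD2_PROXY_OUTBOUND_CONNECT_USER_TIMEOUT");
    println!("inbound: {inbound:?}, outbound: {outbound:?}");
}
```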
Any alternatives you've considered?
`TCP_KEEPALIVE` - despite what one might assume, it does not work for half-open connections (no keepalive probes are sent while unacknowledged data sits in the send queue) - not suitable.
`net.ipv4.tcp_retries2` - requires running sysctl commands (not available inside linkerd-proxy) and affects every connection on the system, with potentially unpleasant side effects - not suitable.
`LINKERD2_PROXY_{INBOUND,OUTBOUND}_SERVER_HTTP2_KEEP_ALIVE_{INTERVAL,TIMEOUT}` - works only for HTTP/2 connections, leaving plain TCP connections that Linkerd also meshes uncovered; for example, a connection through an opaque port to MySQL or Redis could still hang without `TCP_USER_TIMEOUT` set - not suitable.
How would users interact with this feature?
No response
Would you like to work on this feature?
yes
Implement configuration of `LINKERD2_PROXY_INBOUND_ACCEPT_USER_TIMEOUT` and `LINKERD2_PROXY_OUTBOUND_CONNECT_USER_TIMEOUT` for linkerd-proxy. Default values of 30s are enough for the Linux TCP stack to complete about 7 packet retransmissions; after roughly 7 retransmissions the RTO grows rapidly, and there is little point in waiting longer. Setting `TCP_USER_TIMEOUT` between linkerd-proxy and the outside world is sufficient, since connections to containers in the same pod are more stable and reliable.
Fixes linkerd#13023
Signed-off-by: UsingCoding <[email protected]>
@olix0r for the problem to be fully solved, #13024 also needs to be merged; otherwise the parameters will not be set and `TCP_USER_TIMEOUT` will not take effect, since there is no default configuration.
Should we reopen the issue and wait for #13024 to be completed?