
Implement TCP_USER_TIMEOUT to detect half-open TCP connections leading to 15 min of dangling connections #13023

Closed
UsingCoding opened this issue Sep 4, 2024 · 2 comments · Fixed by linkerd/linkerd2-proxy#3174 · May be fixed by #13024


@UsingCoding

What problem are you trying to solve?

Recently we saw our application, which consists of gRPC-based services, fail during an ungraceful node termination.
The failure manifested as TCP connections (open for gRPC traffic) hanging for up to 15 minutes. Neither the application nor Linkerd could identify these connections as dead in any way; over those 15 minutes the hung connections accumulated and degraded application performance.

Others have faced a similar problem in istio/istio#33466 and istio/istio#28865 when using Istio + Envoy.

Linkerd sets TCP_KEEPALIVE by default, but keepalive probes are only sent on idle connections: on a half-open connection with unacknowledged data in flight, the retransmission timer takes over instead (https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die - "Idle ESTAB is forever"), so this socket option does not help here.

The root of the problem is that when a node is terminated ungracefully, TCP sockets connected to it end up in the half-open state (https://www.excentis.com/blog/tcp-half-close-a-cool-feature-that-is-now-broken). If the client then has packets queued for sending on such a connection, the following happens:

  • The first packet is sent
  • The TCP stack waits for the RTO (retransmission timeout)
  • After the RTO expires, the packet is retransmitted
  • The TCP stack then waits for an exponentially growing RTO
  • By default the TCP stack allows 15 such retries, which adds up to roughly 15 minutes of waiting (see the sketch below)
  • The connection is finally considered broken and is closed with ETIMEDOUT
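
To see where the ~15 minutes comes from, here is a back-of-the-envelope sketch in Rust. It assumes the common Linux defaults (initial RTO around 200 ms, doubled on each retry, capped at TCP_RTO_MAX = 120 s); the real kernel derives the RTO from measured RTTs, so the exact figure varies:

```rust
fn main() {
    // Assumptions: initial RTO ~200 ms, doubled per retry, capped at
    // TCP_RTO_MAX = 120 s. net.ipv4.tcp_retries2 = 15 retransmissions,
    // plus one final RTO wait before the connection is declared dead.
    let mut rto = 0.2_f64;
    let mut total = 0.0;
    for _ in 0..=15 {
        total += rto;
        rto = (rto * 2.0).min(120.0);
    }
    println!("~{total:.1} s before ETIMEDOUT"); // ≈ 924.6 s ≈ 15.4 min
}
```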

The time it takes to declare the connection broken can be controlled system-wide via net.ipv4.tcp_retries2, or per socket via the TCP_USER_TIMEOUT option (TCP_USER_TIMEOUT bounds the total time allotted for packet retransmission).
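
For illustration, this is roughly what setting the option looks like at the socket level on Linux. It is a minimal sketch using the libc crate; the helper name is ours, not linkerd-proxy code, and TCP_USER_TIMEOUT takes its value in milliseconds:

```rust
use std::net::TcpStream;
use std::os::fd::AsRawFd;
use std::time::Duration;

// Illustrative helper (not linkerd-proxy code): bound how long the kernel
// keeps retransmitting unacknowledged data before killing the socket.
fn set_tcp_user_timeout(stream: &TcpStream, timeout: Duration) -> std::io::Result<()> {
    let millis = timeout.as_millis() as libc::c_uint; // option value is in ms
    let rc = unsafe {
        libc::setsockopt(
            stream.as_raw_fd(),
            libc::IPPROTO_TCP,
            libc::TCP_USER_TIMEOUT,
            &millis as *const libc::c_uint as *const libc::c_void,
            std::mem::size_of::<libc::c_uint>() as libc::socklen_t,
        )
    };
    if rc != 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(())
}
```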

How should the problem be solved?

Set the TCP_USER_TIMEOUT option on all TCP connections (both inbound and outbound).

Setting TCP_USER_TIMEOUT on all connections keeps linkerd-proxy from exhausting its resources on dangling connections to clients or to the control plane (which is also susceptible), increasing the fault tolerance of each individual linkerd-proxy.

Configuring TCP_USER_TIMEOUT is similar to configuring TCP_KEEPALIVE, so the following implementation is proposed:

  • Make TCP_USER_TIMEOUT configurable from the environment, just like TCP_KEEPALIVE
  • Default TCP_USER_TIMEOUT to 30s: that is enough for ~7 packet retransmissions (0.2 + 0.4 + 0.8 + 1.6 + 3.2 + 6.4 + 12.8 ≈ 25.4s with default RTO doubling), and after ~7 retransmissions the RTO grows so quickly that there is little point in waiting longer

It is safe to default TCP_USER_TIMEOUT to 30s: the setting only affects the peer that sets it and requires no coordination with the other peer. In that respect it is similar to TCP_KEEPALIVE, which is already always set.

Only the following parameters will be configured by default, covering traffic that leaves the pod (a configuration sketch follows):

  • INBOUND_ACCEPT_USER_TIMEOUT - 30s
  • OUTBOUND_CONNECT_USER_TIMEOUT - 30s
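
As a sketch of how the proxy side might consume these settings (the LINKERD2_PROXY_-prefixed variable names follow the proposed commit below; the parsing helper itself is hypothetical, since the real proxy has its own configuration machinery):

```rust
use std::env;
use std::time::Duration;

// Hypothetical helper: read a "30s"-style value from the proxy environment,
// falling back to the proposed 30s default. Illustrative only.
fn user_timeout(var: &str) -> Duration {
    env::var(var)
        .ok()
        .and_then(|raw| raw.trim().trim_end_matches('s').parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or(Duration::from_secs(30))
}

fn main() {
    let inbound = user_timeout("LINKERD2_PROXY_INBOUND_ACCEPT_USER_TIMEOUT");
    let outbound = user_timeout("LINKERD2_PROXY_OUTBOUND_CONNECT_USER_TIMEOUT");
    println!("inbound={inbound:?} outbound={outbound:?}");
}
```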

Any alternatives you've considered?

  • TCP_KEEPALIVE - contrary to what one might assume, it does not work on half-open connections with data in flight - not suitable.
  • net.ipv4.tcp_retries2 - requires running sysctl (not available inside linkerd-proxy) and affects every connection on the system, with potentially unpleasant side effects - not suitable.
  • LINKERD2_PROXY_{INBOUND,OUTBOUND}_SERVER_HTTP2_KEEP_ALIVE_{INTERVAL,TIMEOUT} - only covers HTTP/2 connections; plain TCP connections that Linkerd also meshes are not covered. For example, connections to MySQL or Redis through opaque ports can still hang without TCP_USER_TIMEOUT - not suitable.

How would users interact with this feature?

No response

Would you like to work on this feature?

yes

UsingCoding added a commit to UsingCoding/linkerd2 that referenced this issue Sep 4, 2024
Implement providing configuration for LINKERD2_PROXY_INBOUND_ACCEPT_USER_TIMEOUT and LINKERD2_PROXY_OUTBOUND_CONNECT_USER_TIMEOUT to linkerd-proxy. The default value of 30s is enough for the Linux TCP stack to complete about 7 packet retransmissions; after that the RTO (retransmission timeout) grows rapidly and waiting longer makes little sense. Setting TCP_USER_TIMEOUT between linkerd-proxy and the outside world is sufficient, since connections to containers in the same pod are more stable and reliable.

Fixes linkerd#13023

Signed-off-by: UsingCoding <[email protected]>
@olix0r (Member) commented Sep 5, 2024

Thank you for the detailed report! It may take us a few days to get to the reviews, but the general proposal makes sense to me.

@UsingCoding (Author)

@olix0r for the problem to be fully solved, #13024 also needs to be merged; otherwise the parameters will not be set and TCP_USER_TIMEOUT will not take effect, since there is no default configuration.
Should we reopen the issue and wait for #13024 to be completed?
