Hi there! I'm currently working on a project that involves redirecting websocket traffic from a primary back-end webserver onto a separate service. The new service doesn't do much with the connections other than send the occasional message.

This setup requires a surprising amount of memory to be allocated to linkerd-proxy. The proxy's memory limit is currently set to 512MiB, and it was getting OOM-killed once we hit ~3K established connections per pod. At that time we were receiving ~200 requests/second across 15 pods (sockets dropping & reconnecting). That feels like a pretty trivial amount of traffic to me, and the service had no problem keeping up.

In contrast, we have another service accepting ~200 normal HTTP requests/second on 5 pods. There, linkerd-proxy only uses about 50MiB of memory, which is more in line with what everybody at my org expects.

I suspect it's the open websocket connections that are consuming all the memory in linkerd-proxy. That makes a degree of sense: unlike normal HTTP requests, websockets stay open indefinitely, so resources aren't recycled as often.

Questions:
- Is this level of proxy memory usage expected for long-lived websocket connections, or could it be a leak?
- Is there a recommended way to raise the proxy's limits for just this workload?

We're OK throwing more pods at the issue, so we aren't blocked or anything. But everybody is surprised by the memory requirements. I'd love to be able to give them a good answer and look smart. :) Thanks!
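For reference, a minimal sketch of where that 512MiB cap lives, assuming the limit is set per-workload through Linkerd's proxy annotations (the workload and image names below are placeholders; in our setup the same number may instead come from the cluster-wide defaults in values.yaml):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: websocket-relay                  # placeholder, not our real workload name
spec:
  replicas: 15
  selector:
    matchLabels:
      app: websocket-relay
  template:
    metadata:
      labels:
        app: websocket-relay
      annotations:
        linkerd.io/inject: enabled
        # The sidecar limit that gets OOM-killed around ~3K open websockets/pod:
        config.linkerd.io/proxy-memory-limit: "512Mi"
    spec:
      containers:
        - name: websocket-relay
          image: example.com/websocket-relay:latest   # placeholder image
```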
---
@mplauman thanks for sharing the metrics and other info in slack.
This report sounds similar to a previously reported issue.
With ~2k TCP connections, we expect memory usage to be higher than normal, especially with long-lived websocket connections.
I spent some time testing this and found that the memory isn't leaking and doesn't continue to grow over time; it grows with the number of connections. So I'd suggest increasing the proxy memory (and possibly CPU) limits, either through the values.yaml file or by using the `config.linkerd.io/proxy-memory-limit` and `config.linkerd.io/proxy-memory-request` annotations. The annotations let you target specific workloads, whereas changing values.yaml will affect all proxies in the cluster. Let us know how it goes!
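To make that concrete, here are rough sketches of both routes. Per-workload, the annotations go on the pod template of just the websocket service, so only its proxies get the larger limits (the 1Gi, 256Mi, and CPU figures below are illustrative placeholders, not recommendations):

```yaml
# Fragment of the workload's pod template; only this service's proxies are affected.
spec:
  template:
    metadata:
      annotations:
        config.linkerd.io/proxy-memory-request: "256Mi"
        config.linkerd.io/proxy-memory-limit: "1Gi"
        # CPU can be raised the same way if needed:
        config.linkerd.io/proxy-cpu-request: "200m"
        config.linkerd.io/proxy-cpu-limit: "1000m"
```

The cluster-wide equivalent lives under proxy.resources in the chart's values.yaml (key layout as in recent linkerd-control-plane charts; double-check against the values.yaml that ships with your Linkerd version):

```yaml
# Default resources for every injected proxy in the cluster.
proxy:
  resources:
    cpu:
      request: 200m
      limit: 1000m
    memory:
      request: 256Mi
      limit: 1Gi
```

One thing to keep in mind with the values.yaml route: the injector only applies these defaults when a pod is created, so already-running pods keep their existing proxy resources until they're restarted.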