Linkerd Proxy memory usage increase & OOM when app responds with ~5MB payload at ~12 requests/sec #11077
-
Kindly asking for some help here 🙏🏼 This issue is plaguing us and there's no clear path forward on how to diagnose and/or fix it, so any advice would be greatly appreciated.
-
Just to confirm, the proxy that's OOMing is the one injected into the application pod that serves the large payload, correct? It would also be useful to know whether the proxy's memory usage remains elevated after traffic stops, or whether it returns to a lower level once traffic is temporarily halted. Regarding your speculations:
This shouldn't be the case: as soon as the proxy receives a chunk of body data from the application, it should forward that chunk to the client immediately. The only time the proxy is supposed to hold an entire body payload in memory is when it's a request body with a …
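To make the distinction concrete on the application side, here's a minimal Go sketch (purely illustrative, not the actual app in question) of the two ways a net/http handler might emit a large body: chunk-by-chunk writes that are flushed as they're produced, versus one big write of a pre-assembled buffer. If the app emits the body in flushed chunks, each chunk can leave the pod as soon as it's written.

```go
// Hypothetical sketch -- not the actual application code.
package main

import (
	"bytes"
	"net/http"
)

func main() {
	payload := bytes.Repeat([]byte("x"), 5<<20) // stand-in for the ~5MB compressed payload

	// Streamed: write in chunks and flush each one, so every chunk can be
	// forwarded downstream as soon as it is produced.
	http.HandleFunc("/streamed", func(w http.ResponseWriter, r *http.Request) {
		flusher, _ := w.(http.Flusher)
		const chunk = 64 << 10 // 64 KiB per write
		for off := 0; off < len(payload); off += chunk {
			end := off + chunk
			if end > len(payload) {
				end = len(payload)
			}
			w.Write(payload[off:end])
			if flusher != nil {
				flusher.Flush()
			}
		}
	})

	// Buffered: one large write; for a body this size net/http will still
	// send it chunked over HTTP/1.1 unless Content-Length is set explicitly.
	http.HandleFunc("/buffered", func(w http.ResponseWriter, r *http.Request) {
		w.Write(payload)
	})

	http.ListenAndServe(":8080", nil)
}
```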
-
Hey @hawkw 👋
Correct, the proxies injected into the application pods in question are the ones getting OOMKilled. Indeed, when we reduce or stop the traffic, the proxies' memory usage drops back to more or less the same levels as before we introduced the traffic.
Ok, in this case I'm out of ideas.
-
Good questions. The response is HTTP/1.1, but I'm not certain about the transfer encoding; let me check and get back to you.
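In the meantime, a rough Go probe along these lines should tell us (a sketch only; the URL is a placeholder for the real in-cluster endpoint): it reports the protocol version, transfer encoding, content length, and actual body size of the large response.

```go
// Rough probe to check how the large response is framed.
// The URL is a placeholder -- substitute the real in-cluster endpoint.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("http://app.example.svc.cluster.local:8080/large-payload")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	fmt.Println("proto:            ", resp.Proto)            // e.g. HTTP/1.1
	fmt.Println("transfer-encoding:", resp.TransferEncoding) // ["chunked"] or empty
	fmt.Println("content-length:   ", resp.ContentLength)    // -1 when unknown/chunked
	fmt.Println("content-encoding: ", resp.Header.Get("Content-Encoding"))

	n, _ := io.Copy(io.Discard, resp.Body) // drain to measure the actual body size
	fmt.Println("body bytes read:  ", n)
}
```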
-
Hey @hawkw 👋🏼 Just wanted to summarize our current status and share some additional observations & tests we've done. Some of these I already shared with you in DM, but I wanted to document them here nonetheless.
We feel confident at this point that there shouldn't be any bottlenecks in the overall transport & components (clients, nginx, application, network, etc.). We're able to scale up the load considerably without any impact on these components when running without the linkerd proxy on the app Pods: at peak load we reached ~850MB/s of throughput, while with linkerd the proxy would OOM at around ~550MB/s. We're not sure how to proceed. We could probably increase the memory allocation for the linkerd proxy, but without understanding why this is happening we don't feel confident in that approach. I also realize we're running a fairly outdated version of Linkerd (2.12.4); do you think there's anything in newer versions that might have an impact on what we're seeing? Any other thoughts or suggestions? 🙏🏼
-
Hello friends 👋🏼
I'm hoping to get some guidance/direction on how to troubleshoot something we're experiencing on one of our services.
Environment: linkerd 2.12.4 on K8s 1.24.9
We have a fairly simple application that serves HTTP requests and sits behind an nginx ingress controller; both the ingress controllers and the application Pods are meshed. The requests hitting this app usually have a small response size; however, from time to time it needs to serve a larger (~5MB, compressed) payload as a response.
Here's the very basic flow: clients → nginx ingress controller (meshed) → application Pods (meshed).
The application is written in Go, using net/http to serve requests. It runs on 8 Pods and is sufficiently sized (based on CPU/memory usage observations); each Pod receives ~10 req/s. However, when the app responds with a ~5MB payload per request, we notice that the linkerd proxy sidecar's memory utilization increases quite rapidly, and if we increase the load a little further (to ~12 req/s per Pod) the linkerd proxy eventually OOMs. There are roughly ~70 inbound connections on each Pod.
Bandwidth-wise, each Pod is responding at ~20MB/s (at ~10 req/s), which drives memory usage on the proxy to ~200MB. When we increase the load slightly, each Pod is sending ~25MB/s, and that's when the linkerd proxy eventually OOMs.
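For reference, the load pattern can be sketched roughly like this (hypothetical Go code and placeholder URL, not our actual load tooling): ~12 req/s against a single Pod, with every response fully drained so the reported MB/s reflects the whole payload.

```go
// Minimal load sketch: ~12 req/s against one pod, each ~5MB response drained.
// URL and rate are placeholders, not our actual tooling.
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync/atomic"
	"time"
)

func main() {
	const target = "http://app-pod.example:8080/large-payload" // placeholder
	const rps = 12

	var bytesRead int64
	ticker := time.NewTicker(time.Second / rps)
	defer ticker.Stop()

	go func() {
		for range ticker.C {
			go func() {
				resp, err := http.Get(target)
				if err != nil {
					return
				}
				defer resp.Body.Close()
				n, _ := io.Copy(io.Discard, resp.Body) // drain the full payload
				atomic.AddInt64(&bytesRead, n)
			}()
		}
	}()

	// Report observed throughput once a second.
	for range time.Tick(time.Second) {
		mb := float64(atomic.SwapInt64(&bytesRead, 0)) / (1 << 20)
		fmt.Printf("~%.1f MB/s\n", mb)
	}
}
```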
We're trying to understand where the bottleneck here might be.
We understand we can increase the proxy memory requests & limits, but when we tried that it just seemed to shift the issue downstream (to the nginx ingress controller's linkerd proxies), which caused a much bigger problem, as it impacted all services behind the ingress.
(speculation starts here 😄)
It appears as if the proxy buffers the response data in memory while waiting for the application to finish responding, and releases it only after the full response has been received? The application's inbound latency increases only at p99, to around 500~700ms, when it's serving the larger payload; otherwise we're not seeing anything abnormal and we're not quite sure how to troubleshoot this further.
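One way we could test this speculation (a rough sketch, assuming the proxy's admin endpoint has been port-forwarded from its default port 4191, and without assuming any particular metric names): poll the sidecar's /metrics while the larger responses are in flight and watch whether the memory-related lines track the payload size.

```go
// Sketch: poll the linkerd-proxy admin endpoint and print memory-related
// metric lines while the large responses are being served.
// Assumes the admin port (4191 by default) is port-forwarded to localhost:4191;
// no specific metric names are assumed, we just filter for "memory".
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
	"time"
)

func main() {
	for range time.Tick(2 * time.Second) {
		resp, err := http.Get("http://localhost:4191/metrics")
		if err != nil {
			fmt.Println("scrape failed:", err)
			continue
		}
		sc := bufio.NewScanner(resp.Body)
		sc.Buffer(make([]byte, 0, 1<<20), 1<<20) // metric lines can be long
		for sc.Scan() {
			line := sc.Text()
			if !strings.HasPrefix(line, "#") && strings.Contains(line, "memory") {
				fmt.Println(line)
			}
		}
		resp.Body.Close()
		fmt.Println("----")
	}
}
```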
I reviewed other issues/discussions I found on this topic. This one suggests that memory would increase as the proxy needs to handle more connections, which correlates with what we see on the ingress controllers in general, and we have increased the linkerd proxy memory allocations there. However, this doesn't seem to explain what we're seeing on the application Pods: they each have a steady ~70 inbound connections, and this only happens when they are serving the larger (~5MB) payload as a response.
Any guidance/assistance would be greatly appreciated!