Receive: netlink receive: recvmsg: no buffer space available #103

danopia · 2020-07-13T21:41:30Z

Hello, I've been using this library to repeatedly 'replace' a table with ~50 rules, and one thing I've noticed is that certain patterns seem to trigger this cryptic message:

Receive: netlink receive: recvmsg: no buffer space available

The cryptic part is that it doesn't seem related to the overall size of the table, but rather the presence of certain rules (jumps, etc).

Even so, the flush seems to take effect anyway; the changes are reflected in the kernel after receiving the message.

I found some info on ENOBUFS here: https://netfilter.vger.kernel.narkive.com/cwzlgk8d/why-no-buffer-space-available and noticed that the golang netlink library has an option exposed to help the error: https://github.com/mdlayher/netlink/blob/c5f8ab79aa345dcfcf7f14d746659ca1b80a0ecc/conn.go#L437

I patched my local checkout of google/nftables/conn.go to include this line:

        conn.SetOption(netlink.NoENOBUFS, true)

after:

        defer conn.Close()

and it seems to fix the error.. I don't know enough at this point to understand the implications of using that option, or why it might not be desired, so I decided to start with a ticket instead of a PR.

Has anyone else seen this message?

The text was updated successfully, but these errors were encountered:

stapelberg · 2020-07-14T08:56:15Z

@mdlayher any thoughts?

mdlayher · 2020-07-14T11:06:28Z

Unfortunately I don't have any more info. I've never run into this error in my work and only chose to implement the option because it appeared in a list in the man pages.

danopia · 2020-07-14T12:50:29Z

Ok thanks, I'll see if I can capture and minify a repro for this. I assume without that there's basically nothing to look at 😄

I am running my code on two different Debian releases, 10 and sid, and see the error frequently on both. (I've modified my calling code to consider ENOBUFS as a successful Flush(), instead of patching NoENOBUFS on, which might be a more reasonable workaround. Really not sure though.)

mdlayher · 2020-07-14T15:38:39Z

I took a look at the mailing list thread you mentioned above and it seems to indicate that userspace cannot keep up with the kernel's messages.

When I wrote my netlink code, simplicity and correctness were the primary goals. I know there are some outstanding work items related to OS thread locking and etc that can slow things down, but since I haven't run into them in my use cases, it's low priority work for me.

You could try DisableNSLockThread in https://godoc.org/github.com/mdlayher/netlink#Config but I have no other advice at the moment unfortunately.

danopia · 2020-07-14T18:15:51Z

I took a look at the mailing list thread you mentioned above and it seems to indicate that userspace cannot keep up with the kernel's messages.

That's the part that I don't really understand--it makes sense to fall behind when processing individual packets in userspace as the thread indicates, but I'm just sending ~100 rules to the kernel in one batch and waiting for an ACK batch back. It seems like it should be hard to fall behind on that? I suppose it's related to how the ACK includes handles for rules that are created.

You could try DisableNSLockThread in https://godoc.org/github.com/mdlayher/netlink#Config but I have no other advice at the moment unfortunately.

This is the case already:

nftables/conn.go

Lines 92 to 98 in 26bcabf

    
           disableNSLockThread := false 
        
           if cc.NetNS == 0 { 
        
           	// Since process is operating in main namespace, there is no reason to have NSLockThread lock, 
        
           	// as it causes some performance penalties. 
        
           	disableNSLockThread = true 
        
           } 
        
           return netlink.Dial(unix.NETLINK_NETFILTER, &netlink.Config{NetNS: cc.NetNS, DisableNSLockThread: disableNSLockThread})

and I've confirmed that it's passing true for me.

Here's a super simplified reproduction that just puts a bunch of accept rules into one table/chain: https://gist.github.com/danopia/437173efc0046962fafb746eccfe830e

On my laptop it reaches ENOBUFS at 138 accept rules. I also tried tcp dport 80 ip saddr 1.1.1.1 accept and ENOBUFS appeared at 104 rules. (So rule complexity is relevant but not as relevant as I'd expect.)

> go build .; and sudo setcap cap_net_admin+ep firewall-buf-repro; and ./firewall-buf-repro
2020/07/14 19:54:39 Making ever more rules...
2020/07/14 19:54:39 Trying 75 rules
2020/07/14 19:54:39 Trying 80 rules
2020/07/14 19:54:39 Trying 85 rules
2020/07/14 19:54:39 Trying 90 rules
2020/07/14 19:54:39 Trying 95 rules
2020/07/14 19:54:39 Trying 100 rules
2020/07/14 19:54:39 Trying 105 rules
2020/07/14 19:54:39 Trying 110 rules
2020/07/14 19:54:39 Trying 115 rules
2020/07/14 19:54:39 Trying 120 rules
2020/07/14 19:54:39 Trying 125 rules
2020/07/14 19:54:39 Trying 130 rules
2020/07/14 19:54:39 Trying 135 rules
2020/07/14 19:54:39 Trying 140 rules
2020/07/14 19:54:39 nftables error: Receive: netlink receive: recvmsg: no buffer space available

Does it make sense that this library is unable to acknowledge 150 accept rules in one flush? Is everyone who manages a couple hundred rules creating them piecemeal? I didn't think that this would be too many rules!

# one too many rules to receive Flush() confirmation on my laptop
> sudo nft list table demo-table
table ip demo-table {
        chain cluster-ips {
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
                accept
        }
}

stapelberg · 2020-07-22T21:39:21Z

Thanks for the repro program.

On my machine, it prints:

2020/07/22 23:26:08 Making ever more rules...
2020/07/22 23:26:08 Trying 75 rules
2020/07/22 23:26:09 Trying 80 rules
2020/07/22 23:26:09 Trying 85 rules
2020/07/22 23:26:09 Trying 90 rules
2020/07/22 23:26:09 Trying 95 rules
2020/07/22 23:26:09 Trying 100 rules
2020/07/22 23:26:09 Trying 105 rules
2020/07/22 23:26:09 Trying 110 rules
2020/07/22 23:26:09 Trying 115 rules
2020/07/22 23:26:09 Trying 120 rules
2020/07/22 23:26:09 Trying 125 rules
2020/07/22 23:26:09 Trying 130 rules
2020/07/22 23:26:09 Trying 135 rules
2020/07/22 23:26:09 Trying 140 rules
2020/07/22 23:26:09 nftables error: Receive: netlink receive: recvmsg: no buffer space available

My guess would be that due to the NLM_F_ACK flag, each request generates a reply, but the combination of replies exceeds the socket buffer size, and hence the kernel returns -ENOBUFS.

I’m not yet sure what the best way to fix this is:

Do nothing at the library level, and instead rely on users not causing too large reply packets.
Stop requesting acknowledgements to keep reply sizes small? Maybe this only reduces the likelihood of triggering the issue, in case there are replies which are always big.
Reading https://patchwork.ozlabs.org/project/netdev/patch/20090323141858.6808.50788.stgit@Decadence/, I think setting the NoENOBUFS option results in dropping acknowledgements instead of returning -ENOBUFS. Instead of setting NoENOBUFS, we should probably stop requesting acknowledgements.

mdlayher · 2020-07-22T22:41:00Z

Instead of setting NoENOBUFS, we should probably stop requesting acknowledgements.

Seems reasonable to me.

danopia · 2020-07-24T17:51:59Z

Thanks for looking into the netdev source. Since we're dealing with the kernel I suppose it makes sense that packets would get dropped instead of buffered when using NoENOBUFS.

I found the specific message for adding rules and noticed that it has both ECHO and ACK flags masked in:

nftables/rule.go

Line 132 in 7127d9d

    
           flags = netlink.Request | netlink.Acknowledge | netlink.Create | unix.NLM_F_ECHO | unix.NLM_F_APPEND

So I decided to do a little bit of testing with the above accept repro:

With ECHO | ACK, my loop crashes at 140 rules.
With only ACK, it crashes at 280 rules.
With only ECHO, it also crashes at 280 rules.
With neither, it seems error free up to / past 1000 rules.

Then I adjusted the repro to include 8 verdicts per rule, accept accept accept accept accept accept accept accept:

With ECHO | ACK, it crashes at 105 rules
With only ACK, it crashes at 280 rules.
With only ECHO, it crashes at 170 rules.

So it seems like ACK isn't the only problem, especially as rule complexity is increased. ECHO is a larger contributor towards read buffer space. Frustratingly, I've noticed this unrelated work that depends on the ECHO responses: #88

Maybe there's need for configurability if the masking is being changed? So folks that want ECHO on each frame can keep it. I only care about knowing if the whole batch was accepted as a unit.

(I also toyed with adding calls like conn.SetReadBuffer(200*1000) in conn.go which directly changed the ENOBUFS boundary and could be made configurable, however rmem_max is still at play without resorting to SO_RCVBUFFORCE. So that's a separate hole to go down.)

stapelberg · 2020-08-08T08:17:22Z

Thanks for the details!

How does the nft(8) tool handle this issue?

danopia · 2020-08-08T18:04:08Z

When using nft --debug mnl -f ..., I see neither ECHO nor ACK flags on any of the packets. Thus it shouldn't have this problem at all.

When using nft --debug mnl --echo -f ..., I see only the ECHO flag added to all the packets. With this arg I actually see the nft(8) tool fail silently, after around 170 lines of tcp dport http ip saddr 1.1.1.1 accept repeated:

> sudo nft -f nft.txt
> echo $status
0

> sudo nft -f nft.txt --echo
> echo $status
1

I don't fully understand when nft(8) uses ACK. Passing --handle by itself seems to be a no-op. Passing --handle --echo starting using ACK for some packets, but the main rules packets still seem to only have ECHO masked in.

In summary, nft(8) doesn't handle this error at all, but it doesn't use ACK or ECHO by default either, so it generally avoids the issue overall.

stapelberg · 2020-08-10T06:35:43Z

Thanks for the details! Sounds like removing the ACK and ECHO flags is the best choice to make, then. Is there any downside to that I’m not seeing? Would you be interested in sending a PR that removes them?

danopia · 2021-08-25T21:45:20Z

I'm closing this as stale. I ignore the error in my program and have had no further issues since then.

stapelberg mentioned this issue Apr 1, 2021

garlic stops working after a few days: ENOBUFS error fearful-symmetry/garlic#6

Closed

danopia closed this as completed Aug 25, 2021

Reiner030 mentioned this issue Nov 15, 2021

ipverse.net UA (Ukraine) bug rpthms/nft-geo-filter#23

Open

turekt mentioned this issue Sep 25, 2022

Netlink conn retry logic #191

Closed

maxmoehl mentioned this issue Aug 10, 2023

Not all response messages are received causing the receive buffer to overflow #235

Closed

patryk4815 mentioned this issue Jan 13, 2024

add config.SkipErrorENOBUFS option florianl/go-nflog#36

Closed

christoph-zededa mentioned this issue Jul 25, 2024

Usbmanager uevent reader handle ENOBUFSYS and similar errors lf-edge/eve#4107

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Receive: netlink receive: recvmsg: no buffer space available #103

Receive: netlink receive: recvmsg: no buffer space available #103

danopia commented Jul 13, 2020

stapelberg commented Jul 14, 2020

mdlayher commented Jul 14, 2020

danopia commented Jul 14, 2020

mdlayher commented Jul 14, 2020

danopia commented Jul 14, 2020 •

edited

Loading

stapelberg commented Jul 22, 2020

mdlayher commented Jul 22, 2020

danopia commented Jul 24, 2020

stapelberg commented Aug 8, 2020

danopia commented Aug 8, 2020

stapelberg commented Aug 10, 2020

danopia commented Aug 25, 2021

Receive: netlink receive: recvmsg: no buffer space available #103

Receive: netlink receive: recvmsg: no buffer space available #103

Comments

danopia commented Jul 13, 2020

stapelberg commented Jul 14, 2020

mdlayher commented Jul 14, 2020

danopia commented Jul 14, 2020

mdlayher commented Jul 14, 2020

danopia commented Jul 14, 2020 • edited Loading

stapelberg commented Jul 22, 2020

mdlayher commented Jul 22, 2020

danopia commented Jul 24, 2020

stapelberg commented Aug 8, 2020

danopia commented Aug 8, 2020

stapelberg commented Aug 10, 2020

danopia commented Aug 25, 2021

danopia commented Jul 14, 2020 •

edited

Loading