linux: use rtnetlink directly #64

MarcoPolo · 2025-10-28T19:45:21Z

This change uses rtnetlink directly per route query instead of relying on the routing information base.

This fixes a routing issue where, with some VPNs (WireGuard-based ones), this library would return an incorrect source IP when queried.

This is because WireGuard creates a new table and routing rules that are not reflected in the routing information base.

Instead of trying to recreate this logic, we can query the kernel directly via the rtnetlink socket. This is a bit painful from Go, but doable.

Consult `man 7 rtnetlink` for more information on the rtnetlink interface.

AI Disclosure

For full transparency, I used an LLM to generate much of the bit fiddling/C FFI details here. I've reviewed all the changes and referenced the relevant man pages. I've changed much of the code from the original LLM output. All comments are my own. This is not vibe coded. That said, bit fiddling with C FFI is always tricky and it's possible both I and the LLM have missed something. A careful review would be appreciated.

willscott · 2025-10-28T19:58:17Z

I'm pretty sure wireguard RIBs are visible through the syscall interface.

There's a bit of lack of definition in what this library is trying to do, which i think boils down to whether we want to cache the RIB snapshot for efficiency (see also #60 for someone wanting to optimize against the snapshot)

The question then is whether we want to extend the routing logic to cases with more complex / multiple tables, or if we want to instead / also support a way to do a direct syscall as you're PR'ing here.

MarcoPolo · 2025-10-28T20:36:42Z

> I'm pretty sure wireguard RIBs are visible through the syscall interface.

The interface is visible, but two issues:

1. Since we only fetch this once, we fail to update if the routing decisions change, e.g. when someone turns on a VPN while the application is running.
2. The default routing table isn't changed by WireGuard. Instead, it creates a new table and routes packets to that new table via the fwmark attribute (on my machine via nftables).

The RIB we get from the stdlib's NetlinkRIB doesn't let us route correctly. We're missing the routing rules.

We could maybe import the routing rules being used and try to copy the kernel behavior, or we could ask the kernel.

> There's a bit of lack of definition in what this library is trying to do, which i think boils down to whether we want to cache the RIB snapshot for efficiency (see also #60 for someone wanting to optimize against the snapshot)

Hmm, I think this approach might perform fine in the linked use-case as we are sending minimal data through the syscall interface. On my machine this is ~150us per .Route call.

> The question then is whether we want to extend the routing logic to cases with more complex / multiple tables, or if we want to instead / also support a way to do a direct syscall as you're PR'ing here.

I don't want to manage the code that tries to replicate nftables and kernel routing decisions. This change is also in line with the recent change we made to BSD variants.

willscott · 2025-10-29T03:53:47Z

Other thing that i'm a bit worried about is it looks like there may be some swath of things that won't have permission to bind to AF_NETLINK - in particular it looks like this is true on android and some linux distros.

we should probably get confident that arch/nyx/debian at least can do this - i was able to do it when testing on debian just now.
I wonder if we want the current method as a fallback that will work with less permissions?

MarcoPolo · 2025-10-29T04:00:17Z

> Other thing that i'm a bit worried about is it looks like there may be some swath of things that won't have permission to bind to AF_NETLINK - in particular it looks like this is true on android and some linux distros.
>
> we should probably get confident that arch/nyx/debian at least can do this - i was able to do it when testing on debian just now.
>
> I wonder if we want the current method as a fallback that will work with less permissions?

The current method uses syscall.NetlinkRIB which also opens the AF_NETLINK socket.

fwiw, I didn't have an issue on Fedora.
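For anyone who wants to check a particular distro or sandbox, a small Linux-only probe along these lines may help. It is a sketch, not part of this PR: `probeNetlinkRoute` is a hypothetical name, and it only checks that an `AF_NETLINK` / `NETLINK_ROUTE` socket can be created and bound, which both the old `syscall.NetlinkRIB` path and the new per-query path need.

```go
package main

import (
	"fmt"
	"syscall"
)

// probeNetlinkRoute (hypothetical helper) reports whether this process may
// create and bind a NETLINK_ROUTE socket. Linux only.
func probeNetlinkRoute() error {
	fd, err := syscall.Socket(syscall.AF_NETLINK, syscall.SOCK_RAW, syscall.NETLINK_ROUTE)
	if err != nil {
		return fmt.Errorf("socket(AF_NETLINK): %w", err)
	}
	defer syscall.Close(fd)

	// Bind with a zero Pid so the kernel assigns the port ID.
	sa := &syscall.SockaddrNetlink{Family: syscall.AF_NETLINK}
	if err := syscall.Bind(fd, sa); err != nil {
		return fmt.Errorf("bind(AF_NETLINK): %w", err)
	}
	return nil
}

func main() {
	if err := probeNetlinkRoute(); err != nil {
		fmt.Println("netlink route socket not usable:", err)
		return
	}
	fmt.Println("netlink route socket OK")
}
```

If the probe fails on a platform, the snapshot approach would be expected to fail there too, since it opens the same kind of socket.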

MarcoPolo requested a review from willscott October 28, 2025 19:45

MarcoPolo force-pushed the use-rtnetlink branch from b6432dc to 7a24dc9 Compare October 29, 2025 03:56

MarcoPolo and others added 2 commits October 28, 2025 21:04

remove unused routeSlice

ee80ee3

MarcoPolo force-pushed the use-rtnetlink branch from 7a24dc9 to ee80ee3 Compare October 29, 2025 04:04

ignore unused structs in common

0d22acb

These structs are used under various different build tags, and it's a bit complicated to have staticcheck realize this.

willscott merged commit 85c7afb into master Oct 29, 2025
8 of 9 checks passed
