@MarcoPolo
Contributor

This change uses rtnetlink directly per route query instead of relying on the routing information base.

This fixes a routing issue where, with some VPNs (WireGuard-based ones), this library would return the incorrect source IP when queried.

This is because WireGuard creates a new table and routing rules that are not reflected in the routing information base.

Instead of trying to recreate this logic, we can query the kernel directly via the rtnetlink socket. This is a bit painful from Go, but doable.

Consult `man 7 rtnetlink` for more information on the rtnetlink interface.

AI Disclosure

For full transparency, I used an LLM to generate much of the bit fiddling/C FFI details here. I've reviewed all the changes and referenced the relevant man pages. I've changed much of the code from the original LLM output. All comments are my own. This is not vibe coded. That said, bit fiddling with C FFI is always tricky and it's possible both I and the LLM have missed something. A careful review would be appreciated.

@MarcoPolo MarcoPolo requested a review from willscott October 28, 2025 19:45
@willscott
Contributor

I'm pretty sure wireguard RIBs are visible through the syscall interface.

What this library is trying to do is a bit underdefined, which I think boils down to whether we want to cache the RIB snapshot for efficiency (see also #60, where someone wants to optimize against the snapshot).

The question then is whether we want to extend the routing logic to cases with more complex / multiple tables, or if we want to instead / also support a way to do a direct syscall, as you're proposing in this PR.

@MarcoPolo
Contributor Author

MarcoPolo commented Oct 28, 2025

> I'm pretty sure wireguard RIBs are visible through the syscall interface.

The interface is visible, but there are two issues:

  1. Since we only fetch the RIB once, we fail to update if the routing decisions change, e.g. when someone turns on a VPN while the application is running.
  2. The default routing table isn't changed by WireGuard. Instead, it creates a new table and routes packets to that new table via the fwmark attribute (on my machine via nftables).

The RIB we get from the stdlib's NetlinkRIB doesn't let us route correctly. We're missing the routing rules.
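To illustrate why a route dump without the rules can give the wrong answer, here is a toy model of policy routing. It is a deliberately simplified sketch, not the kernel's algorithm and not this library's code: the `rule` type, `resolve`, the table names, the mark value, and the addresses are all made up for the example, and each table is reduced to a single source address.

```go
package main

import "fmt"

// rule is a simplified policy-routing rule (hypothetical type).
type rule struct {
	all    bool   // match every packet regardless of mark
	fwmark uint32 // mark to compare against
	invert bool   // true means "match when the mark differs"
	table  string // table consulted when the rule matches
}

// matches reports whether a packet carrying mark is selected by the rule.
func (r rule) matches(mark uint32) bool {
	return r.all || (mark == r.fwmark) != r.invert
}

// resolve walks the rules in order and returns the source address from the
// table of the first matching rule that has an entry.
func resolve(rules []rule, tables map[string]string, mark uint32) (string, bool) {
	for _, r := range rules {
		if !r.matches(mark) {
			continue
		}
		if src, ok := tables[r.table]; ok {
			return src, true
		}
	}
	return "", false
}

func main() {
	// Illustrative data only: one source address per table.
	tables := map[string]string{
		"main": "192.168.1.10",
		"vpn":  "10.8.0.2",
	}
	rules := []rule{
		{fwmark: 51820, invert: true, table: "vpn"}, // unmarked traffic goes to the VPN table
		{all: true, table: "main"},                  // everything else falls through to main
	}

	withRules, _ := resolve(rules, tables, 0)
	fmt.Println("main table only:", tables["main"])
	fmt.Println("rules applied:  ", withRules)
}
```

In this model, consulting only the main table yields the LAN address, while applying the rules yields the VPN address for ordinary (unmarked) traffic. Asking the kernel per query sidesteps having to reimplement the second path.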

We could maybe import the routing rules being used and try to copy the kernel behavior, or we could ask the kernel.

> What this library is trying to do is a bit underdefined, which I think boils down to whether we want to cache the RIB snapshot for efficiency (see also #60, where someone wants to optimize against the snapshot).

Hmm, I think this approach might perform fine in the linked use case, as we are sending minimal data through the syscall interface. On my machine this takes ~150 µs per `.Route` call.

> The question then is whether we want to extend the routing logic to cases with more complex / multiple tables, or if we want to instead / also support a way to do a direct syscall, as you're proposing in this PR.

I don't want to maintain code that tries to replicate nftables and kernel routing decisions. This change is also in line with the recent change we made to the BSD variants.

@willscott
Contributor

The other thing I'm a bit worried about is that there may be a swath of environments that won't have permission to bind to AF_NETLINK. In particular, this looks to be true on Android and some Linux distros.

  • We should probably get confident that arch/nyx/debian at least can do this. I was able to do it when testing on Debian just now.
  • I wonder if we want the current method as a fallback that will work with fewer permissions?

@MarcoPolo
Contributor Author

MarcoPolo commented Oct 29, 2025

> The other thing I'm a bit worried about is that there may be a swath of environments that won't have permission to bind to AF_NETLINK. In particular, this looks to be true on Android and some Linux distros.
>
>   • We should probably get confident that arch/nyx/debian at least can do this. I was able to do it when testing on Debian just now.
>   • I wonder if we want the current method as a fallback that will work with fewer permissions?

The current method uses `syscall.NetlinkRIB`, which also opens an AF_NETLINK socket, so it needs the same permission.

fwiw, I didn't have an issue on Fedora.

MarcoPolo and others added 2 commits October 28, 2025 21:04
This change uses rtnetlink directly per route query instead of relying
on the routing information base.

This fixes a routing issue where, with some VPNs (wireguard based ones),
this library would return the incorrect source IP when queried.

This is because Wireguard creates a new table and routing rules that are
not reflected in the routing information base.

Instead of trying to recreate this logic, we can query kernel directly
via the rtnetlink socket. This is a bit painful from Go, but doable.

Consult `man 7 rtnetlink` for more information on the rtnetlink
interface.
As they are used by various different build tags, and it's a bit
complicated to have staticcheck realize this.
@willscott willscott merged commit 85c7afb into master Oct 29, 2025
8 of 9 checks passed