Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(kad): implement automatic client mode #3877

Merged
merged 123 commits into from
May 31, 2023

Conversation

thomaseizinger
Copy link
Contributor

@thomaseizinger thomaseizinger commented May 4, 2023

Description

Currently, the kademlia behaviour can only learn that the remote node supports kademlia on a particular connection if we successfully negotiate a stream to them.

Using the newly introduced abstractions from #3651, we don't have to attempt to establish a stream to the remote to learn whether they support kademlia on a connection but we can directly learn it from the ConnectionEvent::RemoteProtocolsChange event. This happens directly once a connection is established which should overall benefit the DHT.

Clients do not advertise the kademlia protocol and thus we will immediately learn that a given connection is not suitable for kadmelia requests. We may receive inbound messages from it but this does not affect the routing table.

Resolves: #2032.

Notes & open questions

Draft because it is stacked on top of:

Is Mode::Server a good default? Should it be Mode::Client to be "safer"?
Should we even have an overall mode? With how we are now thinking of supported protocols, we could be a server on one connection and a client on another.

Depends-On: #3954.

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@thomaseizinger thomaseizinger requested a review from mxinden May 30, 2023 13:24
@thomaseizinger
Copy link
Contributor Author

@mxinden All comments addressed.

@thomaseizinger
Copy link
Contributor Author

@mxinden Feel free to merge if this is ready in your eyes.

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready to go apart from the smaller suggestion below.

swarm/src/behaviour/external_addresses.rs Outdated Show resolved Hide resolved
@mxinden
Copy link
Member

mxinden commented May 31, 2023

@rkuhn in https://github.com/Actyx/:

  1. Do you use libp2p-kad?
  2. Do you use it with private IPv4 addresses?
  3. Do you call Swarm::add_external_address?

Feel free to ignore if !one || !two || three.

@mergify mergify bot merged commit 92c8cc4 into master May 31, 2023
@mergify mergify bot deleted the feat/kademlia-automatic-client-mode branch May 31, 2023 09:02
@mxinden
Copy link
Member

mxinden commented Jun 1, 2023

Commit updating kademlia-exporter project: mxinden/kademlia-exporter@b958a02

@mxinden
Copy link
Member

mxinden commented Jun 3, 2023

@thomaseizinger
Copy link
Contributor Author

Overall results look promising @thomaseizinger:

image

https://kademlia-exporter.max-inden.de/d/Pfr0Fj6Mk/rust-libp2p?orgId=1&refresh=30s&from=now-7d&to=now

Cool! So essentially, the improvement we are seeing here by just deploying it to one node is that we are learning about the other node supporting kademlia via identify and add it to our routing table immediately instead of waiting for an inbound/outbound request.

Consequently, we have a better local routing table and can answer queries more quickly.

Is that analysis correct?

@dariusc93
Copy link
Member

Was Kademlia::set_mode removed? From what it looks like, it looks like the mode is determined based on if the node has an external address. If so, how would one set the mode to advertise the kad protocol without adding an external address, or say if a node only wants to stay as a client and query the DHT, how would it do that while also having an external address?

@thomaseizinger
Copy link
Contributor Author

Was Kademlia::set_mode removed?

Because we didn't want to place the burden on configuring this on the user.

From what it looks like, it looks like the mode is determined based on if the node has an external address.

Yes, that is correct.

If so, how would one set the mode to advertise the kad protocol without adding an external address

If you don't have an external address, how would your node be reachable and thus of use to the kademlia network?

or say if a node only wants to stay as a client and query the DHT, how would it do that while also having an external address?

Why would you want to do that?

@dariusc93
Copy link
Member

If you don't have an external address, how would your node be reachable and thus of use to the kademlia network?

More under the assumption that the address that is listened on would be its external such as a public ip as some nodes may not have autonat in place or want to add/remove external addresses manually on each change at runtime, although I did see the commit removing the scoring system so that make sense on that.

Why would you want to do that?

Some nodes may only want to receive information from DHT without being queried themselves. Eg, a node may want to query the DHT for records without the node being a server itself and receiving request while allowing it to change to a server later on in its runtime, while the same can also be said on if the node wish to switch from being a server to a client later on if it wishes not to handle anymore queries. Under the implementation here, the node dont have a choice but to be a client until its reachable with no option to set it back later on unless it remove its external address (assuming it does switch back to client), but then that may trigger other behaviours to act on those events too if they rely on an external address event.

@thomaseizinger
Copy link
Contributor Author

Why would you want to do that?

Some nodes may only want to receive information from DHT without being queried themselves. Eg, a node may want to query the DHT for records without the node being a server itself and receiving request while allowing it to change to a server later on in its runtime, while the same can also be said on if the node wish to switch from being a server to a client later on if it wishes not to handle anymore queries. Under the implementation here, the node dont have a choice but to be a client until its reachable with no option to set it back later on unless it remove its external address (assuming it does switch back to client), but then that may trigger other behaviours to act on those events too if they rely on an external address event.

All this sounds very constructed to me. Do you have a concrete usecase for which the current implementation does not work?

I understand that we are making some assumptions here with how things are now implemented. I don't mind if those get invalidated and we have to adapt the implementation. But the vision I want to work towards is that things work out of the box and people don't have to become experts at the library to make use of its (advanced) features.

@dariusc93
Copy link
Member

dariusc93 commented Jun 11, 2023

All this sounds very constructed to me. Do you have a concrete usecase for which the current implementation does not work?

I do maintain a fork of rust-ipfs and one case would be that nodes may wish to only find what peer is providing what over dht, or may wish to fetch records (eg resolving ipns records). Such nodes may not wish to operate as a dht server at that moment especially if the intention is to locate contents or records provided by peers, and may later want to switch over and operate as a server.

Another case would be in embedded/mobile environments. Under these environments, they may be reachable by other nodes directly (ip being public or upnp is enabled), or via relay, however the nodes may not have the resources to operate as a dht server. This could be memory related, network restrictions, etc. Client mode would be suitable under those environments while under the current implementation the nodes would have no choice but to operate as a server upon having an external address or dont have kad enabled at all.

I understand that we are making some assumptions here with how things are now implemented. I don't mind if those get invalidated and we have to adapt the implementation. But the vision I want to work towards is that things work out of the box and people don't have to become experts at the library to make use of its (advanced) features.

I dont believe having the option to set the mode to client or server would be overly complex internally. Maybe have a flag in the configuration to rely on external addresses to determine its mode or to be determined manually by setting the mode to client or server (with the default being client) and send an event to the handlers when the mode is changed at runtime, either as it does in Kademlia::on_swarm_event or in Kademlia::set_mode.

@mxinden
Copy link
Member

mxinden commented Jun 22, 2023

Overall results look promising @thomaseizinger:
image
https://kademlia-exporter.max-inden.de/d/Pfr0Fj6Mk/rust-libp2p?orgId=1&refresh=30s&from=now-7d&to=now

Cool! So essentially, the improvement we are seeing here by just deploying it to one node is that we are learning about the other node supporting kademlia via identify and add it to our routing table immediately instead of waiting for an inbound/outbound request.

Consequently, we have a better local routing table and can answer queries more quickly.

Is that analysis correct?

I don't think that is the reason. Instead I think that the routing table is significantly more healthy now. I.e. previously we would add go-libp2p based Kademlia client nodes to our routing table once they did an inbound request to us. Given that they are behind a firewall or NAT (thus Kademlia clients), reaching out to them for requests would fail and thus delay outbound query durations. Now we no longer add them to our routing table and thus, for outbound requests, only reach out to the Kademlia server nodes, thus significantly reducing outbound request latency.

@thomaseizinger
Copy link
Contributor Author

Cool! So essentially, the improvement we are seeing here by just deploying it to one node is that we are learning about the other node supporting kademlia via identify and add it to our routing table immediately instead of waiting for an inbound/outbound request.

Consequently, we have a better local routing table and can answer queries more quickly.

Is that analysis correct?

I don't think that is the reason. Instead I think that the routing table is significantly more healthy now. I.e. previously we would add go-libp2p based Kademlia client nodes to our routing table once they did an inbound request to us.

I don't think that is true, we never add inbound connections to our routing table.

@mxinden
Copy link
Member

mxinden commented Jun 22, 2023

You are right. I have to dig deeper. Your suggestion makes the most sense.

we are learning about the other node supporting kademlia via identify and add it to our routing table immediately instead of waiting for an inbound/outbound request.

@folex
Copy link
Contributor

folex commented Jul 7, 2023

Do I understand correctly, that a remote peer can basically force my peer to switch from Client mode to Server mode by triggering NetworkBehaviourAction::ReportObservedAddr via Identify? Is that a desired behaviour?

On that note, what do you think about allowing to configure mode manually via configuration parameters? For example, as a part of KademliaConfig.

I think that makes a lot of sense, because if I know my peers are high uptime and on powerful intances, I want to be sure that they operate in Server mode, controlling it directly, without having to think about calling add_external_address, depending on the Identify and other non-direct side effects.

Controlling non-direct side effects in a big codebases is hard.

@folex
Copy link
Contributor

folex commented Jul 7, 2023

To clarify. I have external_addresses as a configuration parameter in my backend's configuration file.

So administrator of the backend is free to set external_address to be empty, I have no control over that.

And that's totally OK, right? If a peer doesn't know its own external address – they can still accept connections, other peers still can see their socket addresses and try to use them.

But with this change, peer now behaves entirely different based on the user configuration. Until Identify comes, which happens unpredictable.

I want to have more control over how Kademlia behaves, at least I think I do :)

@@ -368,6 +365,9 @@ pub enum KademliaHandlerIn {
/// for the query on the remote.
Reset(KademliaRequestId),

/// Change the connection to the specified mode.
Copy link
Contributor

@folex folex Jul 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the effect of that? Is there a network message sent all my connections when I learn about external_addresses?

I think it makes sense to elaborate on that in the comment. Describe effects of other peers receiving this message.

Also, isn't that a lot of overhead? What if my addresses go on and off often? Not sure why that might happen, but now how do I make sure that doesn't happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nobody receives any messages. Client vs server mode essentially just means, "do we accept inbound kademlia streams" which translates to rust-libp2p as "do we offer kademlia via ConnectionHandler::listen_protocols".

.map_upgrade(Either::Left)
} else {
SubstreamProtocol::new(Either::Right(upgrade::DeniedUpgrade), ())
match self.mode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add some comments on what happens here?

That sounds like a rather impactful thing. Now Peer can allow and disallow accepting connections on its own, and I as a developer have no control over that.

Copy link
Contributor

@folex folex Jul 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to outline my understanding of the effects of this change. I'm certainly can be wrong here, so forgive me if I am :)

So, previously if I had allow_listening = true, my peer would accept incoming connections even before Identify happened.

Now Kademlia handshake behaviour depends on whether Identify already happened or not. So instead of 1 handshake on Kademlia protocol and 1 handshake on Identify, we now have (depending on handshakes order) several scenarios:

PeerA connects to PeerB. PeerA is in Server mode, PeerB has not yet learned her external addresses, so it is in Client mode.

Scenario 1: failed connection

  1. PeerA tries to connect to PeerB
  2. PeerA attempts Kademlia handhsake with PeerB, PeerB rejects because Mode::Client => upgrade::DeniedUpgrade
  3. PeerA performs a succesful handshake via Identify with PeerB
  4. PeerB learns its external address, switches to Server mode
  5. PeerB sends ReconfigureMode to all its kademlia connections, but PeerA is not among them
  6. Kademlia handhsake between A and B has failed
  7. Connection failed

If PeerA tries again, connection will succeed, even though nothing has changed in the setup. Technically, nothing prevents PeerB of accepting PeerA's connection the first time. But now PeerA has to retry if she's sure she brought external_addresses to PeerB.

Scenario 2: succesful connection

  1. PeerA tries to connect to PeerB
  2. PeerA performs a succesful handshake via Identify with PeerB
  3. PeerB switches to Server mode
  4. PeerA attempts Kademlia handshake with PeerB
  5. PeerB accepts
  6. Connection succesful

If I understand correctly, there's no strict order in which handshakes happen, right? So it is basically a race condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PeerA attempts Kademlia handhsake with PeerB, PeerB rejects because Mode::Client => upgrade::DeniedUpgrade

Can you elaborate on how that handshake is triggered? At least rust-libp2p does not attempt to speak the kademlia protocol to a peer that did not advertise it first. This is the whole point of this PR: By filtering peers based on whether the advertise kademlia, we get healthier routing tables and as a result, kademlia queries finish a lot quicker. See #3877 (comment).

@dariusc93
Copy link
Member

To clarify. I have external_addresses as a configuration parameter in my backend's configuration file.

So administrator of the backend is free to set external_address to be empty, I have no control over that.

And that's totally OK, right? If a peer doesn't know its own external address – they can still accept connections, other peers still can see their socket addresses and try to use them.

But with this change, peer now behaves entirely different based on the user configuration. Until Identify comes, which happens unpredictable.

I want to have more control over how Kademlia behaves, at least I think I do :)

This was resolved in #4132 to allow setting the mode

@thomaseizinger
Copy link
Contributor Author

Do I understand correctly, that a remote peer can basically force my peer to switch from Client mode to Server mode by triggering NetworkBehaviourAction::ReportObservedAddr via Identify?

That is not the way to think about it. Whether or not your node operates in server or client mode solely depends on whether you have an external address. Using other nodes to learn you external address is one way to do that although identify by itself does not result in a verified external address, only a candidate. It is your own node's responsibility to turn such candidates in the verified external addresses and only that will trigger a switch from client to server mode.

@thomaseizinger
Copy link
Contributor Author

thomaseizinger commented Jul 9, 2023

I think that makes a lot of sense, because if I know my peers are high uptime and on powerful intances, I want to be sure that they operate in Server mode, controlling it directly, without having to think about calling add_external_address, depending on the Identify and other non-direct side effects.

Controlling non-direct side effects in a big codebases is hard.

It is a trade-off, right? I appreciate you sharing your view! We have received quite a bit of feedback on this actually but I do still think that this is the correct design. In the past, rust-libp2p was designed with many explicit config switches. This required (and still requires) people to become experts at the library before they can use it effectively.

To clarify. I have external_addresses as a configuration parameter in my backend's configuration file.

So administrator of the backend is free to set external_address to be empty, I have no control over that.

Perhaps your configuration system is too powerful? Have you considered to changing your config file to mean additional external addresses? With protocols like identify and AutoNAT, discovering and verifying an external address should work flawlessly.

Let's assume we are a talking about an always-online bootnode and other nodes are hard-wired to connect to that one on startup. We definitely want that node to speak kademlia, right?

Without the above design, we'd essentially have to rely on other nodes "probing" for kademlia. They would have to just try whether the node they connect to speaks kademlia. Given that the kademlia protocol itself doesn't really know where a connection came from, we'd have to probe every connection for kademlia support.

With this design, we can instead make a very good educated guesses that a node supports kademlia by relying on its (self-reported) supported protocols. A bootnode that is configured with a static, external IP address would advertise kademlia from the moment it boots and any new peer that connects would immediately put it in its routing table once identify reported the supported protocols (which also happens immediately one every new connection).

I still believe this is a much cleaner design because it is deterministic in behaviour which makes it easier to understand despite the many moving parts compared to random of protocols.

That being said, I am open to alternative designs if you have any ideas!

umgefahren pushed a commit to umgefahren/rust-libp2p that referenced this pull request Mar 8, 2024
Previously, a `NetworkBehaviour` could report an `AddressScore` for an external address. This score was a `u32` and addresses would be ranked amongst those.

In reality, an address is either confirmed to be publicly reachable (via a protocol such as AutoNAT) or merely represents a candidate that might be an external address. In a way, addresses are guilty (private) until proven innocent (publicly reachable).

When a `NetworkBehaviour` reports an address candidate, we perform address translation on it to potentially correct for ephemeral ports of TCP. These candidates are then injected back into the `NetworkBehaviour`. Protocols such as AutoNAT can use these addresses as a source for probing their NAT status. Once confirmed, they can emit a `ToSwarm::ExternalAddrConfirmed` event which again will be passed to all `NetworkBehaviour`s.

This simplified approach will allow us implement Kademlia's client-mode (libp2p#2032) without additional configuration options: As soon as an address is reported as publicly reachable, we can activate server-mode for that connection.

Related: libp2p#3877.
Related: libp2p#3953.
Related: libp2p#2032.
Related: libp2p/go-libp2p#2229.

Co-authored-by: Max Inden <[email protected]>

Pull-Request: libp2p#3954.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

protocols/kad: Support client mode
5 participants