
fix: another bundle of small fixes for how we select peers #904

Merged
Davidson-Souza merged 9 commits into getfloresta:master from Davidson-Souza:fix-stuff
Mar 25, 2026


@Davidson-Souza
Member

Description and Notes

This PR bundles a couple of small changes intended to make our node more effective at finding Utreexo peers. See each commit message for more details about what they are changing and why.

Changelog:

fix(addr_man): don't return connected addresses in get_address_by_service
fix: update update_set_service_flag to require NETWORK_LIMITED
test: add a new test for adding fixed peers on addr_man
fix(sync_ctx): make it more aggressive at opening utreexo connections
fix: update excess peer's data
fix: fix service bits for some hardcoded addresses

@Davidson-Souza Davidson-Souza added this to the v0.9.0 milestone Mar 18, 2026
@Davidson-Souza Davidson-Souza self-assigned this Mar 18, 2026
@Davidson-Souza Davidson-Souza added the reliability Related to runtime reliability, stability and production readiness label Mar 18, 2026
      "Tried": 0
  },
- "services": 1037,
+ "services": 1033,
Member

@JoseSK999 JoseSK999 Mar 18, 2026

Currently we can't connect to these if they don't support P2Pv2. Just noting this.

Member Author

We need to test all those peers to check if they are still alive. That's a little off-topic for this PR, though.

Member

Opened #907 for this
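
For context on the two `services` values in the snippet above, the bitfield can be decoded by hand. This is only an illustrative sketch: the constants are the standard Bitcoin P2P service bits (NETWORK is bit 0, BLOOM bit 2, WITNESS bit 3, NETWORK_LIMITED bit 10, P2P_V2 bit 11), while `service_names` is a hypothetical helper that does not exist in Floresta.

```rust
// Standard Bitcoin P2P service bits.
const NETWORK: u64 = 1 << 0;
const BLOOM: u64 = 1 << 2;
const WITNESS: u64 = 1 << 3;
const NETWORK_LIMITED: u64 = 1 << 10;
const P2P_V2: u64 = 1 << 11;

/// Hypothetical helper: lists the names of the known bits set in `services`.
fn service_names(services: u64) -> Vec<&'static str> {
    const KNOWN: [(u64, &str); 5] = [
        (NETWORK, "NETWORK"),
        (BLOOM, "BLOOM"),
        (WITNESS, "WITNESS"),
        (NETWORK_LIMITED, "NETWORK_LIMITED"),
        (P2P_V2, "P2P_V2"),
    ];

    KNOWN
        .iter()
        .filter(|(bit, _)| services & bit != 0)
        .map(|(_, name)| *name)
        .collect()
}
```

Under these definitions, `service_names(1037)` lists NETWORK, BLOOM, WITNESS and NETWORK_LIMITED, and `service_names(1033)` lists the same set without BLOOM. Neither value has the P2P_V2 bit set.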

@JoseSK999
Member

In cf61fbb you say "don't try to create a normal connection if we are short on utreexo ones.", but I only see one code change, which is the interval reduction.

Comment on lines +231 to +233
// Don't allow our node to have more than T::MAX_OUTGOING_PEERS, unless this is a
// manual peer, those can exceed our quota.

Member

Nit: remove these lines; you already added an updated comment below.


// We allow utreexo and manual peers to bypass our connection limis
let is_utreexo_peer =
version.kind == ConnectionKind::Regular(service_flags::UTREEXO.into());
Member

I think this is incorrect since you can have other service bits, so this will not be equal even if this is a bridge node
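
The difference between comparing the whole bitfield and testing a single bit can be sketched as follows. `Services` is a hypothetical stand-in for the real `ServiceFlags` type, and the `UTREEXO` bit position is an assumption made purely for illustration.

```rust
/// Hypothetical stand-in for a service-flags bitfield.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Services(u64);

impl Services {
    const NETWORK: Services = Services(1 << 0);
    const WITNESS: Services = Services(1 << 3);
    // Assumed bit position, used only for this illustration.
    const UTREEXO: Services = Services(1 << 24);

    /// True if every bit in `other` is also set in `self`.
    fn has(self, other: Services) -> bool {
        (self.0 & other.0) == other.0
    }

    /// Combines two sets of flags.
    fn union(self, other: Services) -> Services {
        Services(self.0 | other.0)
    }
}

/// Strict equality: only matches a bitfield that is UTREEXO and nothing else.
fn is_utreexo_by_equality(advertised: Services) -> bool {
    advertised == Services::UTREEXO
}

/// Bit test: matches any bitfield that includes UTREEXO.
fn is_utreexo_by_bit(advertised: Services) -> bool {
    advertised.has(Services::UTREEXO)
}
```

With a bitfield such as `NETWORK | WITNESS | UTREEXO`, the equality check returns `false` while the bit test returns `true`, which is the mismatch described in the comment above.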


  let idx = rand::random::<usize>() % peers.len();
- let utreexo_peer = peers.get(idx)?;
+ let utreexo_peer = *peers.get(idx)?;
Member

Nit: this variable name shouldn't assume the peer is utreexo; we can use other service bits.

@Davidson-Souza Davidson-Souza mentioned this pull request Mar 18, 2026
20 tasks
@luisschwab luisschwab self-requested a review March 19, 2026 18:53
Comment on lines 740 to 763
  fn get_address_by_service(&self, service: ServiceFlags) -> Option<(usize, LocalAddress)> {
-     let peers = self.good_peers_by_service.get(&service)?;
+     let peers: Vec<_> = self
+         .good_peers_by_service
+         .get(&service)?
+         .iter()
+         .filter_map(|address| {
+             let local_address = self.addresses.get(address)?;
+             if local_address.state == AddressState::Connected {
+                 return None;
+             }
+
+             Some(local_address.id)
+         })
+         .collect();

      if peers.is_empty() {
          return None;
      }

      let idx = rand::random::<usize>() % peers.len();
-     let utreexo_peer = peers.get(idx)?;
+     let address_id = *peers.get(idx)?;

-     Some((*utreexo_peer, self.addresses.get(utreexo_peer)?.to_owned()))
+     Some((peer_id, self.addresses.get(&address_id)?.to_owned()))
  }
Member

Missing documentation

  pub fn update_set_service_flag(&mut self, idx: usize, flags: ServiceFlags) -> &mut Self {
      // if this peer turns out to not have the minimum required services, we remove it
-     if !flags.has(ServiceFlags::NETWORK) || !flags.has(ServiceFlags::WITNESS) {
+     if !flags.has(ServiceFlags::NETWORK_LIMITED) || !flags.has(ServiceFlags::WITNESS) {
Member

Does NETWORK also imply NETWORK_LIMITED? If not, NETWORK should remain there and NETWORK_LIMITED should be added.

Member Author

If we require NETWORK we will drop all pruned nodes from addr_man
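
The effect of the two predicates can be sketched like this. The constants are the standard NETWORK (bit 0), WITNESS (bit 3) and NETWORK_LIMITED (bit 10) service bits. The function names are hypothetical, and the example assumes that a pruned peer advertises NETWORK_LIMITED without NETWORK.

```rust
const NETWORK: u64 = 1 << 0;
const WITNESS: u64 = 1 << 3;
const NETWORK_LIMITED: u64 = 1 << 10;

/// Old rule (hypothetical name): require NETWORK and WITNESS.
fn meets_old_minimum(services: u64) -> bool {
    services & NETWORK != 0 && services & WITNESS != 0
}

/// New rule (hypothetical name): require NETWORK_LIMITED and WITNESS.
fn meets_new_minimum(services: u64) -> bool {
    services & NETWORK_LIMITED != 0 && services & WITNESS != 0
}
```

A pruned peer (`NETWORK_LIMITED | WITNESS`) fails the old rule and passes the new one. A peer advertising only `NETWORK | WITNESS` does the opposite, which is the case the review question is asking about.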

Comment on lines +1334 to +1335
let signet_address =
load_addresses_from_json("./src/p2p_wire/seeds/signet_seeds.json").unwrap();
Member

signet_addresses

ServiceFlags::NETWORK | ServiceFlags::WITNESS | service_flags::UTREEXO_ARCHIVE.into()
}

const TRY_NEW_CONNECTION: u64 = 30; // 30 seconds
Member

On 9be7162 you just removed TRY_NEW_CONNECTION, but where is the new 10 second value coming from?

Member

In node_context.rs it is set to 10s by default
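
One way a default like this can live on a context trait is an associated constant with a default value. The sketch below is an assumption about the general shape, not a copy of `node_context.rs`: the trait body, `DefaultContext`, `CustomContext` and `should_try_new_connection` are made up for illustration.

```rust
use std::time::Duration;

/// Simplified stand-in for a node context trait with a defaulted constant.
trait NodeContext {
    /// Seconds between attempts to open a new connection (default: 10).
    const TRY_NEW_CONNECTION: u64 = 10;
}

/// A context that does not override the constant inherits the default.
struct DefaultContext;
impl NodeContext for DefaultContext {}

/// A context that sets its own value, mirroring the 30-second constant
/// shown in the diff.
struct CustomContext;
impl NodeContext for CustomContext {
    const TRY_NEW_CONNECTION: u64 = 30;
}

/// True once enough time has passed since the last connection attempt.
fn should_try_new_connection<T: NodeContext>(since_last_attempt: Duration) -> bool {
    since_last_attempt >= Duration::from_secs(T::TRY_NEW_CONNECTION)
}
```

In this model, deleting the override from an `impl` block is enough to change the effective interval to the trait's default, without any new literal appearing in the diff.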

self.inflight
.insert(InflightRequests::GetAddresses, (peer, Instant::now()));

let good_peers_count = self.connected_peers();
Member

self.connected_peers() should return the actual peers, not the count.

Member Author

Or be renamed to reflect that it returns a count; we use it in several places already. Feels a bit off-topic for this PR, though.

luisschwab

This comment was marked as outdated.

@luisschwab luisschwab requested a review from JoseSK999 March 19, 2026 23:23
@JoseSK999
Member

Trying this tomorrow morning🫡

@Davidson-Souza
Member Author

I forgot to push some fixes suggested by @luisschwab

luisschwab

This comment was marked as outdated.

Comment on lines 743 to 766
  fn get_address_by_service(&self, service: ServiceFlags) -> Option<(usize, LocalAddress)> {
-     let peers = self.good_peers_by_service.get(&service)?;
+     let peers: Vec<_> = self
+         .good_peers_by_service
+         .get(&service)?
+         .iter()
+         .filter_map(|address| {
+             let local_address = self.addresses.get(address)?;
+             if local_address.state == AddressState::Connected {
+                 return None;
+             }
+
+             Some(local_address.id)
+         })
+         .collect();

      if peers.is_empty() {
          return None;
      }

      let idx = rand::random::<usize>() % peers.len();
-     let utreexo_peer = peers.get(idx)?;
+     let address_id = *peers.get(idx)?;

-     Some((*utreexo_peer, self.addresses.get(utreexo_peer)?.to_owned()))
+     Some((address_id, self.addresses.get(&address_id)?.to_owned()))
  }
Member

Compact version, avoiding calling self.addresses.get twice

    // Needs `use rand::seq::IteratorRandom;` in scope for `.choose`.
    fn get_address_by_service(&self, service: ServiceFlags) -> Option<(usize, LocalAddress)> {
        let candidates = self.good_peers_by_service.get(&service)?;

        candidates
            .iter()
            .filter_map(|id| {
                let addr = self.addresses.get(id)?;
                (addr.state != AddressState::Connected).then_some((id, addr))
            })
            .choose(&mut rand::thread_rng())
            .map(|(id, addr)| (*id, addr.to_owned()))
    }

Comment on lines +244 to 250
self.peers.entry(peer).and_modify(|p| {
p.kind = ConnectionKind::Feeler;
});
}
}

if version.kind == ConnectionKind::Feeler {
Member

I think you didn't actually fix it: we modify the kind to be feeler on self.peers, but then we check against the Version message, which won't be feeler.

Also, what if this is an extra connection? We shouldn't convert it into a feeler even if we have excess peers.
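
The first point can be reproduced in isolation: updating the entry stored in the peer map does not change a value that was read from the `Version` message earlier. This is a simplified sketch with hypothetical types (`Peer`, `Version`) and a hypothetical `demote_and_check` function, not the real `handle_peer_ready` code.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ConnectionKind {
    Regular,
    Feeler,
}

/// Hypothetical per-peer record kept by the node.
struct Peer {
    kind: ConnectionKind,
}

/// Hypothetical handshake message; carries the kind the peer was opened with.
struct Version {
    kind: ConnectionKind,
}

/// Demotes `peer_id` to a feeler in the map, then reports what each source
/// says: (map says feeler, version message says feeler).
fn demote_and_check(
    peers: &mut HashMap<u32, Peer>,
    peer_id: u32,
    version: &Version,
) -> (bool, bool) {
    peers.entry(peer_id).and_modify(|p| {
        p.kind = ConnectionKind::Feeler;
    });

    let map_says_feeler = peers
        .get(&peer_id)
        .map_or(false, |p| p.kind == ConnectionKind::Feeler);
    // `version` is untouched by the update above.
    let version_says_feeler = version.kind == ConnectionKind::Feeler;

    (map_says_feeler, version_says_feeler)
}
```

For a peer that was opened as `Regular`, the function returns `(true, false)`: a later branch that tests `version.kind` would not see the demotion.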

…vice

Currently, `get_address_by_service` will return addresses that we are already
connected to. Since the address selection logic calls this function first, and
only tries a random (not known-to-be-good) peer if it returns `None`, we
end up getting only connected peers in the main loop of `get_address_to_connect`.
In turn, since we don't return connected addresses from that function, we return
a large number of `None`s if we try to get an address for a poorly populated service.

This commit now filters out connected peers, so we don't even consider them. If
there are no peers besides the connected ones, then we fall back to trying the "not
good" addresses.
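
The selection order described in this commit message can be sketched as below. Everything here is a simplified, hypothetical model (`AddrBook`, `pick_by_service`, `pick_to_connect`, `pick_index`), not the real address manager. In particular, the random choice is passed in as a closure so that the example depends only on the standard library.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AddressState {
    Tried,
    Connected,
    NeverTried,
}

/// Hypothetical, simplified address book.
struct AddrBook {
    /// Every known address, keyed by id.
    addresses: HashMap<usize, AddressState>,
    /// Ids of known-good addresses, keyed by a service bit.
    good_by_service: HashMap<u64, Vec<usize>>,
}

impl AddrBook {
    /// Picks a known-good address for `service`, skipping connected ones.
    /// `pick_index(n)` must return an index in `0..n`.
    fn pick_by_service(
        &self,
        service: u64,
        pick_index: impl FnOnce(usize) -> usize,
    ) -> Option<usize> {
        let candidates: Vec<usize> = self
            .good_by_service
            .get(&service)?
            .iter()
            .copied()
            .filter(|id| {
                matches!(self.addresses.get(id), Some(s) if *s != AddressState::Connected)
            })
            .collect();

        if candidates.is_empty() {
            return None;
        }
        candidates.get(pick_index(candidates.len())).copied()
    }

    /// Tries the known-good list first, then any address we are not connected to.
    fn pick_to_connect(&self, service: u64, pick_index: impl Fn(usize) -> usize) -> Option<usize> {
        if let Some(id) = self.pick_by_service(service, &pick_index) {
            return Some(id);
        }

        let mut others: Vec<usize> = self
            .addresses
            .iter()
            .filter(|(_, state)| **state != AddressState::Connected)
            .map(|(id, _)| *id)
            .collect();
        // Sorted only to make this example deterministic.
        others.sort_unstable();

        if others.is_empty() {
            return None;
        }
        others.get(pick_index(others.len())).copied()
    }
}
```

Because connected addresses never enter `candidates`, an exhausted known-good list yields `None` right away and the caller moves on to the fallback pool instead of repeatedly drawing peers it cannot use.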

We previously required NETWORK, which excludes pruned nodes.
`push_addresses` also requires NETWORK_LIMITED.

This test adds fixed addresses to the address manager and checks
whether they have actually been added.

This commit reduces the interval between connection attempts to 10 seconds
(the default for NodeContext).

@Davidson-Souza
Member Author

Pushed a6c831d:

  • Changed version in handle_peer_ready to fully fix the bug in e91f329
  • Cherry picked f6f37ce from [WIP] feat: assume-valid swift sync #837, since it helps with cases where redo inflight requests fail
  • Increased timeout to two minutes on SyncNode; I had several cases where I banned all my utreexo peers due to timeouts 😢

}

#[test]
fn test_adding_fixed_peer() {
Member

I noticed this supersedes test_add_fixed_addresses, right? We should just add the non-empty check here to keep this strictly a superset of checks.

Member Author

Not really: test_add_fixed_addresses uses AddressMan::add_fixed_addresses, while this new test does it manually to have access to the raw data.


let good_peers_count = self.connected_peers();
if good_peers_count > T::MAX_OUTGOING_PEERS {
// We allow utreexo and manual peers to bypass our connection limis
Member

Suggested change
- // We allow utreexo and manual peers to bypass our connection limis
+ // We allow utreexo and manual peers to bypass our connection limits

let is_utreexo_peer = matches!(version.kind, ConnectionKind::Regular(services) if services.has(service_flags::UTREEXO.into()));
let is_manual_peer = version.kind == ConnectionKind::Manual;

if !(is_utreexo_peer || is_manual_peer) {
Member

My comment on the extra peer is unresolved. I think we will convert any extra peer into a feeler here; we should exclude those as well.

Member Author

Now it's fixed
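
The rule this thread converges on (utreexo, manual and extra peers may exceed the outgoing limit, while other peers are demoted to feelers) might be modelled as below. This is a hedged sketch: the enum shape, the `UTREEXO` bit position and `kind_after_limit_check` are assumptions made for illustration, not the merged code.

```rust
// Assumed bit position, for illustration only.
const UTREEXO: u64 = 1 << 24;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ConnectionKind {
    /// Regular connection, carrying the services we asked for.
    Regular(u64),
    Manual,
    Extra,
    Feeler,
}

/// Returns the kind a newly ready peer should end up with.
fn kind_after_limit_check(
    kind: ConnectionKind,
    connected_peers: usize,
    max_outgoing_peers: usize,
) -> ConnectionKind {
    // Within the quota: nothing to decide.
    if connected_peers <= max_outgoing_peers {
        return kind;
    }

    let may_exceed_limit = match kind {
        // Bit test rather than equality, so extra service bits don't matter.
        ConnectionKind::Regular(services) => services & UTREEXO != 0,
        ConnectionKind::Manual | ConnectionKind::Extra => true,
        ConnectionKind::Feeler => false,
    };

    if may_exceed_limit {
        kind
    } else {
        ConnectionKind::Feeler
    }
}
```

Returning the resulting kind, instead of mutating one copy and testing another, keeps a single source of truth for the checks that follow.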

Davidson-Souza and others added 5 commits March 24, 2026 10:40
Before this commit, we would turn peers that exceed our maximum peer
count into Feeler connections. However, we didn't update the services
and state for that address, making our node believe that it was a `Failed`
connection rather than a `Tried` one. This commit fixes this by removing
the early return and moving that code to before the logic that handles
feeler connections.
… peers

This will make sure we can create more than one utreexo connection. Currently,
if we have only one utreexo peer, we won't use hardcoded addresses and will
potentially have only one for the whole IBD, slowing things down.
@Davidson-Souza
Member Author

Pushed 857a80b:

  • Fixed a typo on docs
  • Don't mutate Extra peers into Feeler if they are above the limit
  • Use hardcoded addresses if AddressMan::enough_addresses returns false

Member

@JoseSK999 JoseSK999 left a comment

ACK 857a80b; works well on signet

Member

@luisschwab luisschwab left a comment

ACK 857a80b

@Davidson-Souza Davidson-Souza merged commit 249a32c into getfloresta:master Mar 25, 2026
11 checks passed

Labels

reliability Related to runtime reliability, stability and production readiness


4 participants