Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gossip sub attestation subnet scoring #3029

Draft
wants to merge 7 commits into
base: unstable
Choose a base branch
from
Draft

Gossip sub attestation subnet scoring #3029

wants to merge 7 commits into from

Conversation

Menduist
Copy link
Contributor

Completely reworked scoring in gossipsub

First goal was to also score peers in the attestation subnet.
Previously, we only scored peers in global topics which was an issue when running with many peers, since we would only have a few of them in global topics, and the majority of peers wouldn't have a score, which would break, amongst other things, the opportunistic grafting.

The scoring system in gossipsub 1.1 is heavily convoluted, I tried my best to explain it in layman terms, and add helpers functions which allows to input "understable" data to generate the params.
I've not tweaked this yet, so WIP

period = slotPeriod,
averageOverNPeriods = ATTESTATION_SUBNET_COUNT, # Smooth out empty committees
peersPerTopic = peersPerTopic,
expectedMessagesPerPeriod = TARGET_COMMITTEE_SIZE.int, #TODO use current number
Copy link
Contributor Author

@Menduist Menduist Oct 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone has a ~1-liner to get the current average committee size (or number of active validators) from Eth2Node, I'll take it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have it in here (network doesn't know about dag), but it changes only once per epoch so onSlotEnd could update it by grabbing an getEpochRef for (head, wallslot) similar to how it sets up attestation subnets

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more to the point - we might need to update the scoring while nimbus is running - for example during sync, we'll go from 20k "known" validators to 250k

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, didn't think about sync.
But we only subscribe to gossip once synced, no?

My intuition was that the number grows/reduce slowly enough to avoid updating it OTF, but we can do it if needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't assume really that users restart nimbus regularly (they shouldn't have to) - ie there are still users that haven't upgraded since genesis

@github-actions
Copy link

github-actions bot commented Oct 26, 2021

Unit Test Results

     12 files  ±0     764 suites  ±0   33m 4s ⏱️ - 1m 19s
1 540 tests ±0  1 524 ✔️ ±0  16 💤 ±0  0 ±0 
9 256 runs  ±0  9 192 ✔️ ±0  64 💤 ±0  0 ±0 

Results for commit ad74b5d. ± Comparison against base commit d32c25e.

♻️ This comment has been updated with latest results.

# In order of priority:
# - A badly behaving peer (eg, sending bad messages) will be heavily sanctionned
# - A good peer will gain good scores
# - Inactive/slow peers will be mildly sanctionned.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# - Inactive/slow peers will be mildly sanctionned.
# - Inactive/slow peers will be mildly sanctioned.

@arnetheduck
Copy link
Member

Before we merge this, we should likely do 2 things:

  • add an option in nimbus to enable/disable it
  • add metrics in libp2p to monitor the outcome of the parameter settings

slotPeriod = chronos.seconds(12),
peersPerTopic = 8,
TopicType(topicType)
).validateParameters().tryGet()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validateParameters should probably be replaced by a TopicParameters.init that returns a Result

timeToEndValue = period * averageOverNPeriods.int * 2,
heartbeatPeriod)

# Start to remove up to 30 points when peer send less
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can bring a peer into negative score territory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still tweaking this, but my intuition is that it should be
FirstMessageDeliveriesBonus > InactivePenalty > TimeInMeshBonus

Currently, it's 80 > -30 > 20
A perfect peer would have 100 per topic. If it suddenly becomes inactive, he would gradually (in averageOverNPeriods) drop to -10 (InactivePenalty + TimeInMeshBonus), so it would be negative, meaning you can't grind for points, then become inactive.

So yeah, being inactive could bring to negative score, if you don't do good things aside. Now, what we do when they are into negative score is a different question. As the moment, this PR is fairly conservative on actions (to reach -40 you would need to be inactive on a lot of topics, or more probably, send a lot of invalid messages), since the primary goal as of now is to restore the opportunistic grafting, which helps a lot with inactive peers.

ATM i'm not very happy on the topicWeight computation, need to rework that

@Menduist
Copy link
Contributor Author

Menduist commented Nov 3, 2021

I've been running this for a while, and it seem to give reasonable scores.

Only weird thing I saw, we often give invalid message penalties to peer on beacon_aggregate_and_proof, which could indicate a incompatibility somewhere (or malicious peer, maybe)

add an option in nimbus to enable/disable it

I'm not sure we need it in this case, because the impact is very minimal (we almost never kick peers because of it, we just prune inactive peers for a while)

I'll continue to monitor it, add the dynamic validator count, and should be GTG.
Scoring metrics in libp2p indeed need to be improved, I've added it to vacp2p/nim-libp2p#633

@arnetheduck arnetheduck force-pushed the unstable branch 2 times, most recently from 657f9d5 to a4667d1 Compare January 6, 2022 16:14
Base automatically changed from unstable to stable February 15, 2022 20:57
@zah zah changed the base branch from stable to unstable February 15, 2022 22:08
@Menduist Menduist mentioned this pull request Jun 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants