Fix sliding sync performance slow down for long lived connections. #19206
base: develop
We then filter them out before sending to the client, but it is unnecessary to do so and interferes with later changes.
This is so that clients know if they can use a cached `/members` response or not.
f67e114 to 0d6ccbe
This ensures that the set of required state doesn't keep growing as we add and remove member state. We then only load them from the DB when needed, rather than all state for all rooms when we get a request.
It was thinking the table name was `IN`, as it matched `connection_positi(on IS) NULL`.
0d6ccbe to 4984858
MadLittleMods
left a comment
I haven't fully onboarded onto the concept and details to be confident in the approach.
synapse/storage/schema/main/delta/93/02_sliding_sync_members.sql
| Attributes: | ||
| required_state_map_change: The updated required state map to store in | ||
| the room config, or None if there is no change. | ||
| added_state_filter: The state filter to use to fetch any additional | ||
| current state that needs to be returned to the client. | ||
| lazy_members_previously_returned: The set of user IDs we should add to | ||
| the lazy members cache that we had previously returned. | ||
| lazy_members_invalidated: The set of user IDs whose membership has | ||
| changed but we didn't send down, so we need to invalidate them from | ||
| the cache. |
I know this is the standard way for the docstring but some of these attributes are a bit tricky and I'd rather see the docstring when I hover the attributes.
Potential to convert to the """ variant below the property itself
It's really annoying that VSCode doesn't pick these up, it does for functions.
I'm not sure about changing style, especially since it's a bit confusing that the docstrings for attributes go after the attribute.
I think it does for functions because the spec says it should do that.
For attributes, it feels like unfortunate Python decisions but we get this from https://peps.python.org/pep-0257/ (2001)
String literals occurring immediately after a simple assignment at the top level of a module, class, or `__init__` method are called “attribute docstrings”.
And expanded upon in https://peps.python.org/pep-0258/#attribute-docstrings (2001)
A string literal immediately following an assignment statement is interpreted by the docstring extraction machinery as the docstring of the target of the assignment statement, under the following conditions:
Discussed in the backend team lobby.
Seems to be a strong preference for using """ documentation for attributes:
- Able to write large docstrings for an attribute without the whole function docstring getting unreasonably large
- LSP support > weird Python decisions on placement
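For reference, a minimal sketch of the attribute-docstring style being discussed. The attribute names come from the docstring under review, but the class name, field types, and defaults are assumptions made for illustration; this is not the actual Synapse class.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class RequiredStateChanges:  # hypothetical name, for illustration only
    """Illustrative container; the types and defaults below are assumptions."""

    required_state_map_change: Optional[dict] = None
    """The updated required state map to store in the room config, or None
    if there is no change."""

    lazy_members_previously_returned: frozenset = field(default_factory=frozenset)
    """The set of user IDs we should add to the lazy members cache that we
    had previously returned."""

    lazy_members_invalidated: frozenset = field(default_factory=frozenset)
    """The set of user IDs whose membership has changed but we didn't send
    down, so we need to invalidate them from the cache."""
```

The string literals after each assignment are plain expression statements, so they have no effect at runtime. Their value is for tooling that follows the PEP 258 convention, which is the hover support mentioned above.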
| else: | ||
| # For non-limited timelines we always return all | ||
| # membership changes. This is so that clients | ||
| # who have fetched the full membership list | ||
| # already can continue to maintain it for | ||
| # non-limited syncs. | ||
| # | ||
| # This assumes that for non-limited syncs there | ||
| # won't be many membership changes that wouldn't | ||
| # have been included already (this can only | ||
| # happen if membership state was rolled back due | ||
| # to state resolution anyway). | ||
| required_state_types.append((EventTypes.Member, None)) |
This seems like a bigger behavioral change.
I think this fixes #18782 🤔 - If so, we should add a test.
Ah, did mean to factor that out but it sneaked in as it needs to be accounted for in the lazy loading stuff.
Added with test_lazy_load_state_reset ✅
Actually, this only fixes it for non-limited syncs. I think we should also return state reset membership in limited timeline scenarios as well.
We should at least leave a FIXME with a link to the issue in the if-block above.
I don't think we want to return all membership changes when it is limited? Only the ones for users that appear in the timeline / required_state?
I'm not sure (we should better clarify these semantics in the MSC once we decide and have reasoning).
If we've previously sent down the membership, feels like we should give them an update.
But to compare with a normal membership update in the limited scenario, it would only be relevant if it was part of the timeline.
So I guess the same would apply. If the state reset/rollback happened in the timeline range, we should give an update. Instead of trying to figure out that intricacy (although we do have delta.stream_id if we wanted to), we could just always assume that state rollbacks are relevant. Check for delta.event_type == EventTypes.Member and delta.event_id is None
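A rough sketch of the check suggested above, assuming a simplified delta shape. `StateDelta` here is a hypothetical stand-in with only the fields the check needs, and `MEMBER_EVENT_TYPE` stands in for `EventTypes.Member`; the real types in the codebase may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

MEMBER_EVENT_TYPE = "m.room.member"  # stands in for EventTypes.Member


@dataclass(frozen=True)
class StateDelta:  # hypothetical, simplified shape
    event_type: str
    state_key: str
    event_id: Optional[str]  # assumed to be None when the state entry was removed


def rolled_back_member_user_ids(deltas: Iterable[StateDelta]) -> Set[str]:
    """Return user IDs whose membership entry was removed (delta has no event)."""
    return {
        delta.state_key
        for delta in deltas
        if delta.event_type == MEMBER_EVENT_TYPE and delta.event_id is None
    }
```

Treating every such delta as relevant avoids having to work out whether the rollback fell inside the timeline range, at the cost of occasionally sending an update the client did not strictly need.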
…iously_returned in tests
Co-authored-by: Eric Eastwood <[email protected]>
When fetching previously sent lazy members we didn't filter by room, which meant that we didn't send down member events in a room if we'd previously sent that user's member event in another room.
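To illustrate the bug described in this commit message, here is a small sketch that keys previously sent members by room as well as by user. The function name and the in-memory set of `(room_id, user_id)` pairs are assumptions for illustration, not the actual storage code.

```python
from typing import Iterable, Set, Tuple


def previously_sent_members(
    sent: Set[Tuple[str, str]],  # hypothetical store of (room_id, user_id) pairs
    room_id: str,
    user_ids: Iterable[str],
) -> Set[str]:
    """Return the subset of `user_ids` already sent down for `room_id`."""
    return {user_id for user_id in user_ids if (room_id, user_id) in sent}


# Alice's member event was only ever sent for room 1.
sent = {("!room1:test", "@alice:test")}
in_room1 = previously_sent_members(sent, "!room1:test", ["@alice:test"])
in_room2 = previously_sent_members(sent, "!room2:test", ["@alice:test"])
```

With the room included in the lookup, `in_room2` is empty, so Alice's member event would still be sent for the second room.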
fe94608 to ec45e00
6c2cf0d to 2e844aa
…embership_storage2
Co-authored-by: Eric Eastwood <[email protected]>
Currently we always persist a new position when using lazy loading, which is needless.
| if prev_room_sync_config is not None: | ||
| # Define `required_user_state` as all user state we want, which | ||
| # is the explicitly requested members, any needed for lazy | ||
| # loading, and users whose membership has changed.s |
| # loading, and users whose membership has changed.s | |
| # loading, and users whose membership has changed. |
| state_filter=StateFilter.from_types(hero_room_state), | ||
| to_token=to_token, | ||
| ) | ||
| room_state.update(hero_membership_state) |
I think this is good as-is ⏩
This was just redundant data that the client already had.
| # Normalize to proper user ID | ||
| state_key = user_id | ||
|
|
||
| # We remember the user if either they haven't been invalidated |
| # We remember the user if either they haven't been invalidated | |
| # We remember the user if they haven't been invalidated |
No longer an "either a) or b)" scenario since #19206 (comment)
| # down. | ||
| self.assertIsNone(response_body["rooms"][room_id1].get("required_state")) | ||
|
|
||
| def test_lazy_loaded_last_seen_ts(self) -> None: |
| def test_lazy_loaded_last_seen_ts(self) -> None: | |
| def test_lazy_loading_room_members_last_seen_ts(self) -> None: |
| exact=True, | ||
| ) | ||
|
|
||
| def test_lazy_members_forked_position(self) -> None: |
| def test_lazy_members_forked_position(self) -> None: | |
| def test_lazy_loading_room_members_forked_position(self) -> None: |
| exact=True, | ||
| ) | ||
|
|
||
| def test_lazy_members_across_multiple_connections(self) -> None: |
| def test_lazy_members_across_multiple_connections(self) -> None: | |
| def test_lazy_loading_room_members_across_multiple_connections(self) -> None: |
| exact=True, | ||
| ) | ||
|
|
||
| def test_lazy_members_across_multiple_rooms(self) -> None: |
| def test_lazy_members_across_multiple_rooms(self) -> None: | |
| def test_lazy_loading_room_members_across_multiple_rooms(self) -> None: |
| exact=True, | ||
| ) | ||
|
|
||
| def test_lazy_members_limited_sync(self) -> None: |
| def test_lazy_members_limited_sync(self) -> None: | |
| def test_lazy_loading_room_members_limited_sync(self) -> None: |
Fixes #19175
This PR moves the tracking of which lazy-loaded memberships we've sent for each room out of the required state table. This stops that table from growing continuously, which massively helps performance, as we pull out all matching rows for the connection when we receive a request.
The new table is only read when we have data in a room to send, so we end up reading far fewer rows from the DB. The trade-off is that we now read from that table for every room we have events to return in, rather than once at the start of the request.
For an explanation of how the new table works, see the comment on the table schema.
The table is designed so that we can later prune old entries if we wish, but that is not implemented in this PR.
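As a loose illustration of how pruning could work later, here is an in-memory sketch. Everything in it is an assumption: the class name, the per-entry last-seen timestamp, and the pruning rule are hypothetical and are not taken from the real table schema, which is documented in the schema comment.

```python
from typing import Dict, Set, Tuple


class LazyMemberTracker:
    """Hypothetical in-memory stand-in for the new table (not the real schema)."""

    def __init__(self) -> None:
        # (room_id, user_id) -> last time (ms) the entry was recorded as used
        self._last_seen_ts: Dict[Tuple[str, str], int] = {}

    def mark_sent(self, room_id: str, user_id: str, now_ms: int) -> None:
        """Record that the user's member event was sent down for the room."""
        self._last_seen_ts[(room_id, user_id)] = now_ms

    def invalidate(self, room_id: str, user_id: str) -> None:
        """Forget an entry, e.g. after a membership change we did not send."""
        self._last_seen_ts.pop((room_id, user_id), None)

    def sent_in_room(self, room_id: str) -> Set[str]:
        """Return the user IDs currently remembered for one room."""
        return {user for (room, user) in self._last_seen_ts if room == room_id}

    def prune(self, older_than_ms: int) -> int:
        """Drop entries last seen before the cutoff; return how many were dropped."""
        stale = [key for key, ts in self._last_seen_ts.items() if ts < older_than_ms]
        for key in stale:
            del self._last_seen_ts[key]
        return len(stale)
```

In this sketch, pruning an entry only means the member event would be treated as not yet sent and could be sent again, so dropping old rows trades a little redundant data for a bounded table.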
Reviewable commit-by-commit.