Member

@erikjohnston erikjohnston commented Nov 20, 2025

Fixes #19175

This PR moves the tracking of which lazy-loaded memberships we've sent for each room out of the required state table. This stops that table from growing continuously, which massively helps performance because we pull out all matching rows for the connection whenever we receive a request.

The new table is only read when we have data in a room to send, so we end up reading far fewer rows from the DB. The trade-off is that we now read from that table once for every room we have events to return in, rather than once at the start of the request.

For an explanation of how the new table works, see the comment on the table schema.

The table is designed so that we can later prune old entries if we wish, but that is not implemented in this PR.
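
A minimal sketch of the idea, for readers skimming the description. It is not the schema from this PR (see the comment on the table schema for that): the table name `lazy_loaded_members`, its columns, and both helper functions are hypothetical, and SQLite stands in for the real database. It only illustrates recording which members were sent per connection and room, and reading them back for just the rooms that have something to send.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative, not from the PR.
SCHEMA = """
CREATE TABLE lazy_loaded_members (
    connection_key INTEGER NOT NULL,
    room_id TEXT NOT NULL,
    user_id TEXT NOT NULL,
    last_seen_ts INTEGER NOT NULL,
    PRIMARY KEY (connection_key, room_id, user_id)
);
"""


def record_sent_members(conn, connection_key, room_id, user_ids, now_ms):
    """Remember that these members were sent down for this room."""
    conn.executemany(
        "INSERT OR REPLACE INTO lazy_loaded_members "
        "(connection_key, room_id, user_id, last_seen_ts) VALUES (?, ?, ?, ?)",
        [(connection_key, room_id, user_id, now_ms) for user_id in user_ids],
    )


def get_sent_members(conn, connection_key, room_id):
    """Fetch previously sent members for a single room only."""
    rows = conn.execute(
        "SELECT user_id FROM lazy_loaded_members "
        "WHERE connection_key = ? AND room_id = ?",
        (connection_key, room_id),
    )
    return {row[0] for row in rows}
```

Because lookups are keyed by room, a request that only has events in one room touches only that room's rows. The `last_seen_ts` column is included on the assumption that a timestamp is what would make later pruning possible.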

Reviewable commit-by-commit.

We then filter them out before sending to the client, but it is
unnecessary to do so and interferes with later changes.
This is so that clients know if they can use a cached `/members`
response or not.
@erikjohnston erikjohnston force-pushed the erikj/sss_better_membership_storage2 branch from f67e114 to 0d6ccbe Compare November 20, 2025 13:43
This ensures that the set of required state doesn't keep growing as we
add and remove member state. We then only load them from the DB when
needed, rather than all state for all rooms when we get a request.
It was thinking the table name was `IN`, as it matched
`connection_positi(on IS) NULL`.
@erikjohnston erikjohnston force-pushed the erikj/sss_better_membership_storage2 branch from 0d6ccbe to 4984858 Compare November 20, 2025 13:52
@erikjohnston erikjohnston marked this pull request as ready for review November 20, 2025 15:52
@erikjohnston erikjohnston requested a review from a team as a code owner November 20, 2025 15:52
Contributor

@MadLittleMods MadLittleMods left a comment

I haven't fully onboarded onto the concept and details to be confident in the approach.

Comment on lines +1539 to +1548
Attributes:
    required_state_map_change: The updated required state map to store in
        the room config, or None if there is no change.
    added_state_filter: The state filter to use to fetch any additional
        current state that needs to be returned to the client.
    lazy_members_previously_returned: The set of user IDs we should add to
        the lazy members cache that we had previously returned.
    lazy_members_invalidated: The set of user IDs whose membership has
        changed but we didn't send down, so we need to invalidate them from
        the cache.
Contributor

@MadLittleMods MadLittleMods Nov 21, 2025

I know this is the standard way to write the docstring, but some of these attributes are a bit tricky and I'd rather see the docstring when I hover over the attributes.

We could convert these to the `"""` variant placed below each attribute.

Member Author

It's really annoying that VSCode doesn't pick these up; it does for functions.

I'm not sure about changing style, especially since it's a bit confusing that the docstring for an attribute goes after the attribute.

Contributor

I think it does for functions because the spec says it should do that.

For attributes, it feels like unfortunate Python decisions but we get this from https://peps.python.org/pep-0257/ (2001)

> String literals occurring immediately after a simple assignment at the top level of a module, class, or `__init__` method are called “attribute docstrings”.

And expanded upon in https://peps.python.org/pep-0258/#attribute-docstrings (2001)

> A string literal immediately following an assignment statement is interpreted by the docstring extraction machinery as the docstring of the target of the assignment statement, under the following conditions:

Contributor

Discussed in the backend team lobby.

Seems to be a strong preference for using `"""` documentation for attributes:

  • Able to write large docstrings for an attribute without the whole function docstring getting unreasonably large
  • LSP support > weird Python decisions on placement
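
For reference, a rough sketch of what the attribute-docstring style looks like. The class name `RequiredStateChanges`, the use of a stdlib dataclass, and the simplified field types are assumptions for illustration; only the attribute names and descriptions come from the docstring under review.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class RequiredStateChanges:
    """Hypothetical stand-in for the class discussed in this thread."""

    required_state_map_change: Optional[dict] = None
    """The updated required state map to store in the room config, or None
    if there is no change."""

    # The real type is a state filter; `object` is a placeholder here.
    added_state_filter: Optional[object] = None
    """The state filter to use to fetch any additional current state that
    needs to be returned to the client."""

    lazy_members_previously_returned: set = field(default_factory=set)
    """The set of user IDs we should add to the lazy members cache that we
    had previously returned."""

    lazy_members_invalidated: set = field(default_factory=set)
    """The set of user IDs whose membership has changed but we didn't send
    down, so we need to invalidate them from the cache."""
```

The string literals are plain expression statements, so they have no effect at runtime; tooling that follows the PEP 258 convention associates each one with the assignment directly above it.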

Comment on lines 1089 to 1101
else:
    # For non-limited timelines we always return all
    # membership changes. This is so that clients
    # who have fetched the full membership list
    # already can continue to maintain it for
    # non-limited syncs.
    #
    # This assumes that for non-limited syncs there
    # won't be many membership changes that wouldn't
    # have been included already (this can only
    # happen if membership state was rolled back due
    # to state resolution anyway).
    required_state_types.append((EventTypes.Member, None))
Contributor

This seems like a bigger behavioral change.

I think this fixes #18782 🤔 - If so, we should add a test.

Member Author

Ah, did mean to factor that out but it sneaked in as it needs to be accounted for in the lazy loading stuff.

Contributor

Added with test_lazy_load_state_reset

Contributor

Actually, this only fixes it for non-limited syncs. I think we should also return state reset membership in limited timeline scenarios as well.

We should at least leave a FIXME with a link to the issue in the if-block above.

Member Author

I don't think we want to return all membership changes when it is limited? Only the ones for users that appear in the timeline / required_state?

Contributor

I'm not sure (we should better clarify these semantics in the MSC once we decide and have reasoning).

If we've previously sent down the membership, feels like we should give them an update.

But to compare with a normal membership update in the limited scenario, it would only be relevant if it was part of the timeline.

So I guess the same would apply. If the state reset/rollback happened in the timeline range, we should give an update. Instead of trying to figure out that intricacy (although we do have `delta.stream_id` if we wanted to), we could just always assume that state rollbacks are relevant. Check for `delta.event_type == EventTypes.Member and delta.event_id is None`
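
Roughly, that check could be sketched as follows. The `Delta` record and the helper name are hypothetical simplifications for illustration, not Synapse's actual types; `"m.room.member"` is the value of `EventTypes.Member`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

MEMBER_EVENT_TYPE = "m.room.member"  # the value of EventTypes.Member


@dataclass(frozen=True)
class Delta:
    """Hypothetical, simplified state delta for this sketch."""

    stream_id: int
    event_type: str
    state_key: str
    event_id: Optional[str]  # None means the state entry was removed


def users_with_rolled_back_membership(deltas: Iterable[Delta]) -> Set[str]:
    """Return user IDs whose membership state was removed by a reset."""
    return {
        delta.state_key
        for delta in deltas
        if delta.event_type == MEMBER_EVENT_TYPE and delta.event_id is None
    }
```

This treats every membership rollback as relevant and deliberately ignores `stream_id`, which matches the "just always assume" option rather than working out whether the reset fell inside the timeline range.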

@erikjohnston erikjohnston force-pushed the erikj/sss_better_membership_storage2 branch from fe94608 to ec45e00 Compare November 25, 2025 11:12
@erikjohnston erikjohnston force-pushed the erikj/sss_better_membership_storage2 branch from 6c2cf0d to 2e844aa Compare November 25, 2025 14:28
erikjohnston and others added 20 commits December 2, 2025 13:16
Currently we always persist a new position when using lazy loading,
which is needless.
@MadLittleMods MadLittleMods added A-3PID Issues affecting third-party identifiers and invites and removed A-3PID Issues affecting third-party identifiers and invites labels Dec 3, 2025
if prev_room_sync_config is not None:
# Define `required_user_state` as all user state we want, which
# is the explicitly requested members, any needed for lazy
# loading, and users whose membership has changed.s
Contributor

Suggested change
- # loading, and users whose membership has changed.s
+ # loading, and users whose membership has changed.

state_filter=StateFilter.from_types(hero_room_state),
to_token=to_token,
)
room_state.update(hero_membership_state)
Contributor

I think this is good as-is ⏩

This was just redundant data that the client already had.

# Normalize to proper user ID
state_key = user_id

# We remember the user if either they haven't been invalidated
Contributor

Suggested change
- # We remember the user if either they haven't been invalidated
+ # We remember the user if they haven't been invalidated

No longer an "either a) or b)" scenario since #19206 (comment)

# down.
self.assertIsNone(response_body["rooms"][room_id1].get("required_state"))

def test_lazy_loaded_last_seen_ts(self) -> None:
Contributor

Suggested change
- def test_lazy_loaded_last_seen_ts(self) -> None:
+ def test_lazy_loading_room_members_last_seen_ts(self) -> None:

exact=True,
)

def test_lazy_members_forked_position(self) -> None:
Contributor

Suggested change
- def test_lazy_members_forked_position(self) -> None:
+ def test_lazy_loading_room_members_forked_position(self) -> None:

exact=True,
)

def test_lazy_members_across_multiple_connections(self) -> None:
Contributor

Suggested change
- def test_lazy_members_across_multiple_connections(self) -> None:
+ def test_lazy_loading_room_members_across_multiple_connections(self) -> None:

exact=True,
)

def test_lazy_members_across_multiple_rooms(self) -> None:
Contributor

Suggested change
- def test_lazy_members_across_multiple_rooms(self) -> None:
+ def test_lazy_loading_room_members_across_multiple_rooms(self) -> None:

exact=True,
)

def test_lazy_members_limited_sync(self) -> None:
Contributor

Suggested change
- def test_lazy_members_limited_sync(self) -> None:
+ def test_lazy_loading_room_members_limited_sync(self) -> None:

Development

Successfully merging this pull request may close these issues.

Slow sliding sync when connection metadata gets large

3 participants