Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sliding Sync: Pre-populate room data for quick filtering/sorting #17512

Merged

Conversation

MadLittleMods
Copy link
Contributor

@MadLittleMods MadLittleMods commented Jul 31, 2024

Pre-populate room data for quick filtering/sorting in the Sliding Sync API

Spawning from #17450 (comment)

This PR is acting as the Synapse version N+1 step in the gradual migration being tracked by #17623

Adding two new database tables:

  • sliding_sync_joined_rooms: A table for storing room meta data that the local server is still participating in. The info here can be shared across all Membership.JOIN. Keyed on (room_id) and updated when the relevant room current state changes or a new event is sent in the room.
  • sliding_sync_membership_snapshots: A table for storing a snapshot of room meta data at the time of the local user's membership. Keyed on (room_id, user_id) and only updated when a user's membership in a room changes.

Also adds background updates to populate these tables with all of the existing data.

We want to have the guarantee that if a row exists in the sliding sync tables, we are able to rely on it (accurate data). And if a row doesn't exist, we use a fallback to get the same info until the background updates fill in the rows or a new event comes in triggering it to be fully inserted. This means we need a couple extra things in place until we bump SCHEMA_COMPAT_VERSION and run the foreground update in the N+2 part of the gradual migration. For context on why we can't rely on the tables without these things see [1].

  1. On start-up, block until we clear out any rows for the rooms that have had events since the max-stream_ordering of the sliding_sync_joined_rooms table (compare to max-stream_ordering of the events table). For sliding_sync_membership_snapshots, we can compare to the max-stream_ordering of local_current_membership
    • This accounts for when someone downgrades their Synapse version and then upgrades it again. This will ensure that we don't have any stale/out-of-date data in the sliding_sync_joined_rooms/sliding_sync_membership_snapshots tables since any new events sent in rooms would have also needed to be written to the sliding sync tables. For example a new event needs to bump event_stream_ordering in sliding_sync_joined_rooms table or some state in the room changing (like the room name). Or another example of someone's membership changing in a room affecting sliding_sync_membership_snapshots.
  2. Add another background update that will catch-up with any rows that were just deleted from the sliding sync tables (based on the activity in the events/local_current_membership). The rooms that need recalculating are added to the sliding_sync_joined_rooms_to_recalculate table.
  3. Making sure rows are fully inserted. Instead of partially inserting, we need to check if the row already exists and fully insert all data if not.

All of this extra functionality can be removed once the SCHEMA_COMPAT_VERSION is bumped with support for the new sliding sync tables so people can no longer downgrade (the N+2 part of the gradual migration).

[1]

For sliding_sync_joined_rooms, since we partially insert rows as state comes in, we can't rely on the existence of the row for a given room_id. We can't even rely on looking at whether the background update has finished. There could still be partial rows from when someone reverted their Synapse version after the background update finished, had some state changes (or new rooms), then upgraded again and more state changes happen leaving a partial row.

For sliding_sync_membership_snapshots, we insert items as a whole except for the forgotten column so we can rely on rows existing and just need to always use a fallback for the forgotten data. We can't use the forgotten column in the table for the same reasons above about sliding_sync_joined_rooms. We could have an out-of-date membership from when someone reverted their Synapse version. (same problems as outlined for sliding_sync_joined_rooms above)

Discussed in an internal meeting

TODO

  • Update stream_ordering/bump_stamp
  • Handle remote invites
  • Handle state resets
  • Consider adding sender so we can filter LEAVE memberships and distinguish from kicks.
    • We should add it to be able to tell leaves from kicks
  • Consider adding tombstone state to help address Sliding Sync: Add is_tombstoned: false filter in the defaults #17540
    • We should add it tombstone_successor_room_id
  • Consider adding forgotten status to avoid extra lookup/table-join on room_memberships
    • We should add it
  • Background update to fill in values for all joined rooms and non-join membership
  • Clean-up tables when room is deleted
  • Make sure tables are useful to our use case
  • Plan for how can we use this with a fallback
    • See plan discussed above in main area of the issue description
    • Discussed in an internal meeting
  • Plan for how we can rely on this new table without a fallback
    • Synapse version N+1: (this PR) Bump SCHEMA_VERSION to 87. Add new tables and background update to backfill all rows. Since this is a new table, we don't have to add any NOT VALID constraints and validate them when the background update completes. Read from new tables with a fallback in cases where the rows aren't filled in yet.
    • Synapse version N+2: Bump SCHEMA_VERSION to 88 and bump SCHEMA_COMPAT_VERSION to 87 because we don't want people to downgrade and miss writes while they are on an older version. Add a foreground update to finish off the backfill so we can read from new tables without the fallback. Application code can now rely on the new tables being populated.
    • Discussed in an internal meeting

Dev notes

SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase

SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_sliding_sync.FilterRoomsTestCase

Reference:


persist_event (adds to queue)
_persist_event_batch
_persist_events_and_state_updates (assigns `stream_ordering` to events)
_persist_events_txn
	_store_event_txn
        _update_metadata_tables_txn
            _store_room_members_txn
	_update_current_state_txn

Concatenated Indexes [...] (also known as multi-column, composite or combined index)

[...] key consists of multiple columns.

We can take advantage of the fact that the first index column is always usable for searching

-- https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys


Dealing with portdb (synapse/_scripts/synapse_port_db.py), #17512 (comment)


SQL queries:

Both of these are equivalent and work in SQLite and Postgres

Options 1:

WITH data_table (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) AS (
    VALUES (
        ?, ?, ?,
        (SELECT membership FROM room_memberships WHERE event_id = ?),
        (SELECT stream_ordering FROM events WHERE event_id = ?),
        {", ".join("?" for _ in insert_values)}
    )
)
INSERT INTO sliding_sync_non_join_memberships
    (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT * FROM data_table
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
    membership_event_id = EXCLUDED.membership_event_id,
    membership = EXCLUDED.membership,
    event_stream_ordering = EXCLUDED.event_stream_ordering,
    {", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}

Option 2:

INSERT INTO sliding_sync_non_join_memberships
    (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT 
    column1 as room_id,
    column2 as user_id,
    column3 as membership_event_id,
    column4 as membership,
    column5 as event_stream_ordering,
    {", ".join("column" + str(i) for i in range(6, 6 + len(insert_keys)))}
FROM (
    VALUES (
        ?, ?, ?,
        (SELECT membership FROM room_memberships WHERE event_id = ?),
        (SELECT stream_ordering FROM events WHERE event_id = ?),
        {", ".join("?" for _ in insert_values)}
    )
) as v
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
    membership_event_id = EXCLUDED.membership_event_id,
    membership = EXCLUDED.membership,
    event_stream_ordering = EXCLUDED.event_stream_ordering,
    {", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}

If we don't need the membership condition, we could use:

INSERT INTO sliding_sync_non_join_memberships
    (room_id, membership_event_id, user_id, membership, event_stream_ordering, {", ".join(insert_keys)})
VALUES (
    ?, ?, ?,
    (SELECT membership FROM room_memberships WHERE event_id = ?),
    (SELECT stream_ordering FROM events WHERE event_id = ?),
    {", ".join("?" for _ in insert_values)}
)
ON CONFLICT (room_id, user_id)
DO UPDATE SET
    membership_event_id = EXCLUDED.membership_event_id,
    membership = EXCLUDED.membership,
    event_stream_ordering = EXCLUDED.event_stream_ordering,
    {", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct
    (run the linters)

This way if the row exists, we can rely on the information in it.
And only use a fallback for rows that don't exist.
…oom-meta-data

Conflicts:
	synapse/handlers/sliding_sync/__init__.py
@erikjohnston
Copy link
Member

Woo 🎉 🚀 🎉 🥳 🎈

@erikjohnston erikjohnston merged commit 1a6b718 into develop Aug 29, 2024
39 checks passed
@erikjohnston erikjohnston deleted the madlittlemods/sliding-sync-pre-populate-room-meta-data branch August 29, 2024 15:09
@MadLittleMods
Copy link
Contributor Author

@erikjohnston Super grateful that you were able to try this PR out multiple times on your own server to find all of the weird old corners in the data that we still need to deal with today 🙇

Thank you @erikjohnston and @reivilibre for all of the discussion so we can ensure that the data is reliable to use before and after the gradual migration is fully complete!

erikjohnston added a commit that referenced this pull request Aug 29, 2024
erikjohnston pushed a commit that referenced this pull request Aug 30, 2024
… sync tables (#17635)

Fix outlier re-persisting causing problems with sliding sync tables

Follow-up to #17512

When running on `matrix.org`, we discovered that a remote invite is
first persisted as an `outlier` and then re-persisted again where it is
de-outliered. The first the time, the `outlier` is persisted with one
`stream_ordering` but when persisted again and de-outliered, it is
assigned a different `stream_ordering` that won't end up being used.
Since we call `_calculate_sliding_sync_table_changes()` before
`_update_outliers_txn()` which fixes this discrepancy (always use the
`stream_ordering` from the first time it was persisted), we're working
with an unreliable `stream_ordering` value that will possibly be unused
and not make it into the `events` table.
erikjohnston added a commit that referenced this pull request Sep 1, 2024
erikjohnston added a commit that referenced this pull request Sep 1, 2024
Based on #17629

Utilizing the new sliding sync tables added in
#17512 for fast acquisition of
rooms for the user and filtering/sorting.

---------

Co-authored-by: Eric Eastwood <[email protected]>
erikjohnston added a commit that referenced this pull request Sep 5, 2024
erikjohnston pushed a commit that referenced this pull request Sep 9, 2024
…t's faster (#17658)

Get `bump_stamp` from [new sliding sync
tables](#17512) which should
be faster (performance) than flipping through the latest events in the
room.
erikjohnston added a commit that referenced this pull request Sep 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants