Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(meta): fix notification when dropping tables #20010

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

zwang28
Copy link
Contributor

@zwang28 zwang28 commented Jan 3, 2025

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

When dropping a table from catalog, the corresponding state tables will be later unregistered from Hummock either after the drop-stream-job barrier succeeds or during recovery. However there is a corner case that can cause a deadlock situation:

  1. The cluster encounters backpressure originating from Hummock. So the earliest barrier becomes stuck. It is expected to be resolved via Hummock compaction.
  2. User drops the table. Meta removes the table from catalog immediately on receiving the drop command. But Hummock won't remove the table until the barrier finishes, which is stuck already.
  3. At the moment, compaction task related to this dropped table will always fail due to the inconsistency between catalog and Hummock. So the backpressure will never recover. It's a deadlock situation. Neither the barrier or the compaction can make any progress.

This PR addresses the issue by ensuring that notifications of table drops are sent to the HummockObserver and CompactorObserver only after the barrier corresponding to the drop command has been completed.

This PR also fixes the issue that notifications of catalog deletion are never sent to HummockObserver and CompactorObserver, which is likely a bug introduced in SQL backend.

  • For catalog additions and updates, those observers will maintain their current behavior, which is to fetch data on demand. The rationale behind this decision is to simplify meta node notifications, as there are many scenarios that can trigger catalog additions and updates. Not all of these notifications are necessary for the HummockObserver and CompactorObserver, as they are primarily concerned with one catalog type, Table.

related #15144

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

  • My PR needs documentation updates.
Release note

@zwang28 zwang28 requested a review from hzxa21 January 6, 2025 03:32
@zwang28 zwang28 enabled auto-merge January 6, 2025 03:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants