Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(iceberg): only convert iceberg table to iceberg source for batch dql #20045

Conversation

chenzl25
Copy link
Contributor

@chenzl25 chenzl25 commented Jan 7, 2025

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • We should only convert the iceberg table to iceberg source for batch dql, since other batch sql (especially dml update and delete) should read the hummock table and use indexes to improve the performance and freshness.

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

  • My PR needs documentation updates.
Release note

@chenzl25 chenzl25 requested a review from fuyufjh January 7, 2025 07:14
@github-actions github-actions bot added the type/fix Bug fix label Jan 7, 2025
@chenzl25 chenzl25 requested review from Li0k and kwannoel January 7, 2025 07:14
Copy link
Contributor

@kwannoel kwannoel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between using iceberg table for DQL and iceberg source for DQL?
When running DQL against iceberg table does it use the row-based replica rather than the actual iceberg table, and hence we convert it to iceberg source?

I suppose we may also want to let user configure this down the road, since their SQL could have a combination of DQL and DML.

@chenzl25
Copy link
Contributor Author

chenzl25 commented Jan 7, 2025

What's the difference between using iceberg table for DQL and iceberg source for DQL? When running DQL against iceberg table does it use the row-based replica rather than the actual iceberg table, and hence we convert it to iceberg source?

I suppose we may also want to let user configure this down the road, since their SQL could have a combination of DQL and DML.

Let me change the ambiguous word iceberg table and iceberg source to row store and column store. The column store and the row store, which is the best one we want to use. As we don't have a cost-based optimizer to help us compare the cost among different plans, we should choose a default option for users. For DML, it seems choosing the row store is more beneficial since we can utilize indexes to accelerate point update/delete and more importantly there is a freshness lag between row store and column store right now, if we read column store to performance dml, which means we can't manipulate a row until it has been committed to iceberg, which might be a bad use experience for a demo. Last, it's fair to provide a session variable to allow users to control whether read row store or column store.

@chenzl25 chenzl25 added this pull request to the merge queue Jan 7, 2025
@chenzl25 chenzl25 mentioned this pull request Jan 7, 2025
16 tasks
Merged via the queue into main with commit ec36f40 Jan 7, 2025
30 of 31 checks passed
@chenzl25 chenzl25 deleted the dylan/only_convert_iceberg_table_to_iceberg_source_for_batch_dql branch January 7, 2025 09:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants