
[Perf] Add Relationship Index Storing Caveat #1988

Open
wants to merge 3 commits into base: main
Conversation

lalalalatt

No description provided.

@github-actions github-actions bot added the area/datastore Affects the storage system label Jul 18, 2024

github-actions bot commented Jul 18, 2024

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@lalalalatt

This comment was marked as resolved.

@lalalalatt lalalalatt force-pushed the perf/crdb-caveat-index branch from 908fd62 to 821f59b Compare July 18, 2024 17:25
authzedbot added a commit to authzed/cla that referenced this pull request Jul 18, 2024
@lalalalatt lalalalatt marked this pull request as ready for review July 19, 2024 15:05
@lalalalatt lalalalatt requested a review from a team July 19, 2024 15:05
authzedbot added a commit to authzed/cla that referenced this pull request Jul 23, 2024
@vroldanbet
Contributor

I believe this PR was opened to address #1923

Contributor

@vroldanbet vroldanbet left a comment


I have concerns about this PR. The reason the issue was created in the first place is that the CRDB UI shows this index as recommended, likely so that the query can be served entirely from the index instead of having to go back to disk to fetch the rest of the columns.

I have three concerns about the proposal:

  • it drops the previous indexes. This means that until the new indexes are created, the system is left without an index. This has the potential to cause an incident on a production system, as it can push the database over saturation level.
  • further increases the cost on the write path, as the system now has to write more information to disk, potentially making writes slower.
  • increases the amount of storage needed. While it is reasonable to trade disk for query speed, customers running CRDB Dedicated can see their bill explode, as Cockroach Labs puts a hefty price tag on the disk capacity of their managed offerings.

I don't think we can move forward with this without:

  • running a load test to assess how much queries improve and if it's worth the tradeoff
  • determining the % increase in disk used
  • finding a way to add those indexes without causing an incident (the easiest solution being to consider running 2 migrations, or exploring ALTER the indexes)
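For context, the change under discussion swaps a plain secondary index for a covering one via CockroachDB's `STORING` clause, so the caveat columns can be read from the index itself. A minimal sketch of the difference (index, table, and column names here are illustrative, not necessarily SpiceDB's exact schema):

```sql
-- Current style: lookups served by this index must go back to the
-- primary index to fetch the caveat columns.
CREATE INDEX ix_relation_tuple_by_subject
    ON relation_tuple (userset_object_id, userset_namespace, userset_relation);

-- Proposed style: the caveat columns ride along in the index, so the
-- query can be served entirely from it, at the cost of extra disk and
-- extra work on every write.
CREATE INDEX ix_relation_tuple_by_subject_with_caveat
    ON relation_tuple (userset_object_id, userset_namespace, userset_relation)
    STORING (caveat_name, caveat_context);
```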

@lalalalatt
Author

I believe this PR was opened to address #1923

Yes, you're right, thanks. I forgot to add the description.

@lalalalatt
Author

I have concerns about this PR. […]

I don't think we can move forward with this without:

  • running a load test to assess how much queries improve and if it's worth the tradeoff
  • determining the % increase in disk used
  • finding a way to add those indexes without causing an incident (the easiest solution being to consider running 2 migrations, or exploring ALTER the indexes)

I'll work on a stress test to measure the two metrics you mentioned.

As for the index drop-and-create issue: from what I've found, it is currently not possible to ALTER an existing standard index into a STORING-column index, so that capability would likely need to be contributed to CockroachDB upstream.
As a workaround, I think we can create the STORING-column index first and, once it has finished building, safely drop the original index.
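The two-step workaround described above might look like the following in DDL (a hedged sketch; CockroachDB builds secondary indexes online, and `CREATE INDEX` does not return until the backfill completes, so the drop only runs once the replacement exists; names are illustrative):

```sql
-- Step 1: build the covering index online; the old index keeps
-- serving reads while the backfill runs.
CREATE INDEX ix_relation_tuple_by_subject_with_caveat
    ON relation_tuple (userset_object_id, userset_namespace, userset_relation)
    STORING (caveat_name, caveat_context);

-- Step 2 (ideally a later migration, after validation): drop the
-- superseded index.
DROP INDEX relation_tuple@ix_relation_tuple_by_subject;
```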

@vroldanbet
Contributor

As a workaround, I think we can create the STORING-column index first and, once it has finished building, safely drop the original index.

I'm not entirely sure this would work either; there are some caveats.

This is unfortunately a difficult migration to run without some careful planning. E.g. we have customers with terabytes' worth of SpiceDB relationships in their Cockroach clusters. This migration would:

  • increase the load on the cluster for a non-trivial amount of time (proportional to the amount of data stored)
  • increase write transaction latency
  • increase the disk capacity required to operate, as a new index covering all tuple columns is added.

I'd like to see these optimizations make it into main, but it's a risky change as-is. In an ideal world:

  • The code can estimate if there is enough disk available to run the migration
  • the old indexes are only dropped when the new indexes have been validated to be selected by the query planner and provide a visible performance gain
  • the index creation runs with low priority and limits the impact on the cluster

Perhaps a conservative approach would be:

  • introduce logic that determines the size of the indexes to be dropped, and checks that at least the same amount plus some headroom (to account for the newly stored caveat columns) is available on the cluster
  • create the new indexes as NOT VISIBLE to begin with, gated by a feature flag in SpiceDB that enables/disables their use — enabled by default, but able to be disabled if users detect performance regressions
  • explore whether transaction priority buys us anything when it comes to running DDL statements like CREATE INDEX or DROP INDEX
  • in a subsequent SpiceDB release, make the new indexes visible and drop the old indexes safely
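Sketching this conservative rollout in DDL terms (assuming CockroachDB v22.2+, which supports invisible indexes; index and table names are illustrative, and the exact columns returned by `SHOW RANGES` vary across CockroachDB versions):

```sql
-- Estimate the footprint of the index being replaced before starting,
-- to check headroom for the new, larger index.
SELECT sum(range_size_mb)
    FROM [SHOW RANGES FROM INDEX relation_tuple@ix_relation_tuple_by_subject];

-- Release N: create the covering index, hidden from the optimizer
-- until it has been validated; a SpiceDB feature flag would gate its use.
CREATE INDEX ix_relation_tuple_by_subject_with_caveat
    ON relation_tuple (userset_object_id, userset_namespace, userset_relation)
    STORING (caveat_name, caveat_context)
    NOT VISIBLE;

-- Release N+1: once validated, surface it to the planner and retire
-- the old index.
ALTER INDEX relation_tuple@ix_relation_tuple_by_subject_with_caveat VISIBLE;
DROP INDEX relation_tuple@ix_relation_tuple_by_subject;
```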

Would like to hear your thoughts on this issue @josephschorr

@lalalalatt
Author

{
    numTuples:        256,
    bufferTimeout:    1 * time.Nanosecond,
    expectFallBehind: true,
},

If I change numTuples to 512, the 256-true test passes, though its test name then becomes 512-true.

@vroldanbet
Contributor

If I change tuple num to 512, then the test 256-true would pass, while its testing name would be 512-true

I think that's fine. The goal of the test is to push so many changes that the consumer cannot retrieve them from the channel fast enough; a timeout is triggered when the channel is at capacity, and if enough time elapses the Watch API connection is aborted. Feel free to increase it; I'd say this is likely a flake.
