# Sharding strategy and recommendations #355

base: main
## Conversation
**Walkthrough**

Documentation updates that retitle and restructure sharding/partitioning guidance, raise the recommended minimum shard size to 10 GB, replace prior strategy text with benchmark-driven sizing and shard-count guidance (10–50 GB per shard; up to 1,000 shards per node by default), and reorganize sharding performance content into sizing, ingestion, and allocation considerations.

**Sequence diagram(s):** omitted — changes are documentation-only and do not introduce or alter control flow.

**Estimated code review effort:** 🎯 3 (Moderate) | ⏱️ ~20–30 minutes
**Pre-merge checks and finishing touches:** ✅ 3 checks passed.
**docs/performance/sharding.md** (outdated)
> Smaller shards also result in reduced Lucene index efficiency, which can adversely
> affect computed search term relevance.
@seut mentioned:
> That's not true as it stands. The number of segments within a shard affects query performance because more segments have to be viewed.
"Segments" are currently not yet part of this discussion / document. Maybe we'd introduce them first? For now, I am deleting this statement? If you can up with a better one, or if you think we should swap this sentence verbatim at this very spot without ado as proposed, let me know:
> The number of segments within a shard affects query performance because more segments have to be viewed.
Removed per aaf9c53, but later added the statement in a dedicated section per 874339f. Let me know if you think we should expand that, or if you would like to see it deleted altogether.
-- https://cratedb-guide--355.org.readthedocs.build/performance/sharding.html#segments
**docs/performance/sharding.md** (outdated)
> If you are looking for an intro to sharding, see {ref}`sharding
> <crate-reference:ddl-sharding>`.
> The optimal approach balances shard count with shard size. Individual shards should
> typically contain 3-70 GB of data, with 10-50 GB being the sweet spot for most […]
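As an aside, the shard count that realizes such target sizes is fixed at table creation time. A minimal sketch, using a hypothetical table whose expected volume of ~180 GB, divided by a ~30 GB target (the middle of the recommended band), yields six shards:

```sql
-- Hypothetical table: ~180 GB expected volume / ~30 GB target per shard
-- gives 6 shards.
CREATE TABLE sensor_readings (
    ts TIMESTAMP WITH TIME ZONE,
    sensor_id TEXT,
    reading DOUBLE PRECISION
) CLUSTERED INTO 6 SHARDS;
```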
5b28873 brings in relevant adjustments.
> To avoid _oversharding_, CrateDB by default limits the number of shards per node to
> 1_000 as a critical stability boundary. Any operation that would exceed that limit
> leads to an exception.
@seut mentioned about this:
> Actually, it's more of a protection than a ‘critical stability boundary’ in my view, as it depends on the environment, i.e. what kind of hardware I'm running.
Elsewhere, it is called "soft limit". Shall we also use that term here?
> [...] there is a soft limit of 1000 shards per node;
I think `protection limit` is much more helpful than `soft limit`. Otherwise I'd question what this `soft limit` actually means.
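For what it's worth, whichever term lands in the docs, the limit is an adjustable cluster setting rather than a hard product cap. A minimal sketch, assuming the setting keeps its Elasticsearch-inherited name `cluster.max_shards_per_node` (verify against your CrateDB version):

```sql
-- Count shards currently allocated per node, to see how close each node
-- is to the protection limit (sys.shards holds one row per shard copy).
SELECT node['name'] AS node_name, count(*) AS shards
FROM sys.shards
GROUP BY node['name'];

-- Raise the limit persistently on hardware that can sustain more shards.
-- Assumed setting name: cluster.max_shards_per_node.
SET GLOBAL PERSISTENT "cluster.max_shards_per_node" = 2000;
```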
**docs/performance/sharding.md** (outdated)
> :::
>
> ## Optimising for query performance
> ### Avoid imbalances
About detecting imbalances, let's quickly introduce and refer to the XMover utility at this spot as soon as it is released.
Please review, so we can release the utility and mention it here.
> CrateDB also has replicas of data and this results in additional shards in the
> cluster.
Why do we remove this, isn't this interesting to note?
This statement has been refactored to the "advanced" page, to avoid scattering valuable pro-tip information across different pages.
-- https://cratedb-guide--355.org.readthedocs.build/performance/sharding.html#partitions
> :::{danger}
> **Over-sharding and over-partitioning**
>
> Sharding can drastically improve the performance on large datasets.
> However, having too many small shards will most likely degrade performance.
> Over-sharding and over-partitioning are common flaws leading to an overall
> poor performance.
>
> **As a rule of thumb, a single shard should hold somewhere between 5 - 50
> GB of data.**
>
> To avoid oversharding, CrateDB by default limits the number of shards per
> node to 1000. Any operation that would exceed that limit, leads to an
> exception.
> :::
Why remove this? Isn't it good to have as a top highlighted note? Also, it correctly mentions `by default ... 1000 shards`, while the text below is missing this `default` part.
> Why remove this?
The valuable fragments here have also been refactored to the "advanced" page, exactly to avoid those kinds of slightly deviating redundancies across different documentation pages. It includes the "default" keyword, which is certainly important, thanks!
> To avoid oversharding, CrateDB by default limits the number of shards per node to 1_000 as a protection limit.
-- https://cratedb-guide--355.org.readthedocs.build/performance/sharding.html#shards-per-node-limit
> Isn't it good to have as a top highlighted note?
The section we are looking at here on the "basic" page about sharding is still an admonition (a highlighted section, now tuned down from "danger" to "caution"). It still includes the relevant teaser text, but then links to the "advanced" page, so we don't need to manage the valuable pro-tip information bits redundantly.

-- https://cratedb-guide--355.org.readthedocs.build/admin/sharding-partitioning.html#strategy
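As an aside, whichever wording lands on the page, readers can check their own shard sizes against the rule of thumb directly. A minimal sketch using the `sys.shards` system table (thresholds follow the quoted 5 - 50 GB band; verify column availability against your CrateDB version):

```sql
-- List primary shards whose size falls outside the 5-50 GB rule of thumb;
-- sys.shards reports sizes in bytes.
SELECT schema_name, table_name, id,
       size / 1024.0 / 1024 / 1024 AS size_gb
FROM sys.shards
WHERE "primary" = true
  AND (size < 5 * 1024::bigint * 1024 * 1024
       OR size > 50 * 1024::bigint * 1024 * 1024)
ORDER BY size_gb DESC;
```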
**docs/performance/sharding.md** (outdated)
> - Too many shards can degrade search performance and make the cluster unstable.
>   This is referred to as _oversharding_.
>
> - Very large shards can slow down cluster operations and prolong recovery times
>   after failures.
>
> Finding the right balance when it comes to sharding will vary on a lot of
> things. While it's generally advisable to slightly over-allocate, we
> recommend to benchmark your particular setup to find the sweet spot to
> implement an appropriate sharding strategy.
>
> Figuring out how many shards to use for your tables requires you to think about
> the type of data you're processing, the types of queries you're running, and
> the type of hardware you're using.
Wrong section? This isn't related to shard imbalance. Shouldn't this rather stay in the general section?
1d64064 relocates this to the "General recommendations" section, thanks. Afterwards, 882a947 cleans up the naming and layout of the bottom-most section.
-- https://cratedb-guide--355.org.readthedocs.build/performance/sharding.html#general-recommendations
-- https://cratedb-guide--355.org.readthedocs.build/performance/sharding.html#notes
> For the purposes of calculating how many shards a table should be clustered
> into, you can typically ignore replica partitions as these are not usually
> queried across for reads.
Great that we remove this as it is not valid. For balancing reads, replicas are used as well (randomly choosing primaries and replicas).
I hadn't removed it; this statement was also relocated, but has now been deleted per your suggestion in c28811f. Maybe we should reverse the statement instead, to emphasize that replica partitions MUST be taken into consideration when calculating shards, to add further clarity?
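If the statement gets reversed as proposed, a worked instance of the arithmetic it would call for (the figures below are illustrative) might read:

$$
\text{total shards} = \text{shards per partition} \times \text{partitions} \times (1 + \text{replicas})
$$

For example, a table defined with `CLUSTERED INTO 6 SHARDS`, holding four partitions with `number_of_replicas = 1`, occupies 6 × 4 × (1 + 1) = 48 shards in the cluster.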
Actionable comments posted: 2
🧹 Nitpick comments (1)
docs/performance/sharding.md (1)
Lines 53-55: **Tighten the grammar in the sizing introduction.** The sentence "General principles requires careful consideration …" trips over subject–verb agreement. Please reword it for clarity.

```diff
-General principles requires careful consideration for cluster
-sizing and architecture.
+General principles require careful consideration of cluster
+sizing and architecture.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- docs/admin/sharding-partitioning.md (4 hunks)
- docs/feature/cluster/index.md (1 hunk)
- docs/performance/sharding.md (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build docs
**docs/admin/sharding-partitioning.md** (outdated)
> shard should roughly be between 10 - 50 GB, and that each node can only manage
> up to 1000 shards.
**Mention that the 1,000-shard cap is a default.**
Stating that “each node can only manage up to 1000 shards” reads like a hard product limit, but the setting is configurable. Please call out that the 1_000 value is the default protection limit so readers know they can tune it (within reason).
```diff
-Then, to calculate the number of shards, you should consider that the size of each
-shard should roughly be between 10 - 50 GB, and that each node can only manage
-up to 1000 shards.
+Then, to calculate the number of shards, consider that each shard should roughly
+be between 10 – 50 GB, and that each node can manage up to 1 000 shards by default.
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```
Then, to calculate the number of shards, consider that each shard should roughly
be between 10 – 50 GB, and that each node can manage up to 1 000 shards by default.
```
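As an aside, a worked instance of that calculation, with illustrative figures: targeting the middle of the 10 – 50 GB band,

$$
\frac{1000~\text{GB expected table size}}{30~\text{GB per shard}} \approx 33~\text{shards},
$$

which one might round up to 36 so that the shard count divides evenly across, say, a six-node cluster.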
🤖 Prompt for AI Agents
In docs/admin/sharding-partitioning.md around lines 107–108, the sentence "each
node can only manage up to 1000 shards" implies a hard product limit; update it
to state that 1,000 is a default protection limit and is configurable via the
node's configuration settings, and add a short note that operators can tune this
value based on available RAM/CPU/disk while respecting practical resource
constraints.
Improved per 95dba4d.
> If most nodes have more shards per table than they have CPUs, the cluster can
> experience performance degradations.
> For example, on clusters with substantial CPU resources (e.g., 8 nodes × 32 CPUs
> = 256 total CPUs), creating 256+ shards per table often proves counterproductive.
> If you don't manually set the number of shards per table, CrateDB will make a
> best guess, based on the assumption that your nodes have two CPUs each.
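As an aside, the per-node CPU counts that feed into such calculations can be read from the `sys.nodes` system table. A minimal sketch, assuming the `os_info['available_processors']` column path (verify against your CrateDB version):

```sql
-- Report the number of processors available to each node, as a basis
-- for choosing per-table shard counts.
SELECT name, os_info['available_processors'] AS cpus
FROM sys.nodes
ORDER BY name;
```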
**Clarify the CPU-to-shard guidance.**
The new “CPU-to-shard ratio” paragraph says that having as many (or more) shards per table as CPUs causes degradation—your example calls out 256+ shards on a 256‑CPU cluster as counterproductive. A few paragraphs later, “Avoid under-allocation” tells readers they should have at least as many shards as CPUs in the cluster to maximize parallelism. Both rules can’t be true simultaneously, and the contradiction leaves operators without actionable direction. Please reconcile the sections (e.g., by distinguishing per-node vs. per-cluster guidance, or by narrowing the recommended ratios) so they deliver a single, coherent recommendation.
Also applies to: 125-135
🤖 Prompt for AI Agents
In docs/performance/sharding.md around lines 69 to 75 (and similarly lines 125
to 135), reconcile the contradictory guidance by explicitly distinguishing
per-node vs per-cluster CPU-to-shard recommendations: state that shard counts
should be considered per-node (aim for ~1 shard per CPU per node as a starting
point) and avoid configuring total shards per table that exceed the total
cluster CPU count by a large margin (e.g., don’t create >1.5–2× shards per total
cluster CPUs), update the example (replace “256+ shards on a 256‑CPU cluster”
with a clarified example showing why many small shards spread across nodes hurts
locality and overhead), and add a short cross-reference between the
“CPU-to-shard ratio” and “Avoid under-allocation” sections so they present a
single coherent rule set (per-node baseline, guarded upper bound per cluster and
suggested multiplier range).
@coderabbitai: Thank you for providing a suggestion to resolve this. @seut and @WalBeh: Can you verify if this is actually applicable or just nonsense?
Lines 69-146: **Resolve conflicting CPU-to-shard guidance.** Lines 69-75 warn that matching or exceeding CPUs with shards per table degrades performance, yet lines 126-146 instruct readers to provision at least as many shards as cluster CPUs and even “slightly more.” Both can’t hold true simultaneously, and the contradiction leaves operators without actionable direction. Please reconcile the sections—e.g., clarify the per-node baseline (≈1 primary shard per CPU core on that node) versus the cluster-wide upper bound (avoid total shards per table greatly exceeding total CPUs), then cross-reference them so they form a single rule set. A possible rewrite:

```diff
-If most nodes have more shards per table than they have CPUs, the cluster can experience performance degradations.
-For example, on clusters with substantial CPU resources (e.g., 8 nodes × 32 CPUs = 256 total CPUs), creating 256+ shards per table often proves counterproductive.
+Aim for about one primary shard per CPU core on each node, which keeps shard work co-located with available compute. Substantially exceeding that per-node ratio increases scheduling overhead.
+For example, on an 8-node × 32-CPU cluster (256 total CPUs), creating 256+ shards per table spreads many tiny shards across nodes and raises coordination costs without real gains.
 …
-To increase the chances that a query can be parallelized and distributed maximally, there should be at least as many shards for a table than there are CPUs in the cluster.
+Use the per-node guideline above as your baseline, and only grow total shards per table modestly beyond the cluster’s CPU count (≈1.5–2×) when you have measured that the extra shards actually improve concurrency.
```
@amotl: Thank you for looping in the domain experts! I'll stand by for @seut and @WalBeh's verification on whether the apparent contradiction is real or if I've misunderstood the intended guidance.
Once they provide their input on the technical accuracy, I'm happy to help refine the wording to ensure the documentation clearly conveys the correct sharding strategy—whether that means adjusting my suggested changes, taking a different approach entirely, or clarifying that the two sections address different scenarios.
In the meantime, if there are any specific aspects of the current text or my suggestion you'd like me to research (e.g., CrateDB's actual shard allocation behavior, query parallelization mechanics, or documented best practices), just let me know!
## About
Thanks for making a start on improving the sharding guidelines, @WalBeh. We just added a few copy-edits.
## Preview

## References

## Review
Please also add your comments and suggestions when applicable. 🙏
/cc @karynzv, @hammerhead, @WalBeh, @surister, @kneth