Conversation

Member

@amotl amotl commented Sep 30, 2025

About

Thanks for making a start on improving the sharding guidelines, @WalBeh. We just added a few copy-edits.

Preview

References

Review

Please also add your comments and suggestions when applicable. 🙏

/cc @karynzv, @hammerhead, @WalBeh, @surister, @kneth

@amotl amotl requested review from seut and WalBeh September 30, 2025 22:16

coderabbitai bot commented Sep 30, 2025

Walkthrough

Documentation updates that retitle and restructure sharding/partitioning guidance, raise the recommended minimum shard size to 10 GB, replace prior strategy text with benchmark-driven sizing and shard-count guidance (10–50 GB per shard; up to 1,000 shards/node default), and reorganize sharding performance content into sizing, ingestion, and allocation considerations.

Changes

  • Admin: sharding & partitioning (docs/admin/sharding-partitioning.md)
    Retitled to "Sharding and Partitioning 101"; removed the over-sharding admonition; added a benchmark-driven "Strategy" section with updated shard-size guidance (10–50 GB per shard) and shard-count calculations (up to 1,000 shards/node default); revised examples (time-series example uses 6 shards, updated CREATE TABLE/CLUSTERED INTO usage) and added a caution referencing an external sharding guide.
  • Performance: scaling wording (docs/performance/scaling.md)
    Clarified shard terminology and punctuation; removed the inline statement linking replicas to additional shards; simplified wording and referenced the sharding guide for detailed guidance; minor readability edits without semantic change.
  • Performance: sharding guide, restructured (docs/performance/sharding.md)
    Major restructure and rewrite: normalized heading casing, added intro/see-also, introduced "Sizing considerations" with subsections (shard size vs. number, CPU-to-shard ratio, 1000-shard/node limit, partitions, replicas, segments), added "Notes" and "Avoid under-allocation", and reworked ingestion/benchmarking guidance.
  • Feature: cluster synopsis (docs/feature/cluster/index.md)
    Increased recommended shard-size lower bound from 5 GB to 10 GB (range now 10–50 GB); no other semantic changes.

Sequence Diagram(s)

(omitted — changes are documentation-only and do not introduce or alter control flow)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

Possibly related issues

  • Sharding and partitioning guidelines #301 — both update shard-sizing and partitioning guidance; this PR raises the shard-size lower bound to 10 GB and aligns strategy guidance toward benchmarks, which overlaps the objectives of that issue.

Suggested labels

guidance

Suggested reviewers

  • kneth
  • bmunkholm

Poem

I nibble docs beneath moon's glow,
Counting shards in neat, calm rows.
Benchmarks guide my careful hop,
Six small piles — then I stop.
A carrot for cautious growth! 🥕

Pre-merge checks

✅ Passed checks (3 passed)
  • Title Check: ✅ Passed. The title "Sharding strategy and recommendations" accurately reflects the primary focus of the pull request, which is to update and refine sharding strategy guidance and recommendations across the documentation.
  • Docstring Coverage: ✅ Passed. No functions found in the changes; docstring coverage check skipped.
  • Description Check: ✅ Passed. The pull request description directly references improvements to the sharding guidelines and notes copy-edits that align with the summarized changes to sharding and partitioning content.


Comment on lines 52 to 53
Smaller shards also result in reduced Lucene index efficiency, which can adversely
affect computed search term relevance.
Member Author

@seut mentioned:

That's not true as it stands. The number of segments within a shard affects query performance because more segments have to be viewed.

Member Author
@amotl amotl Oct 1, 2025

"Segments" are currently not yet part of this discussion / document. Maybe we'd introduce them first? For now, I am deleting this statement? If you can up with a better one, or if you think we should swap this sentence verbatim at this very spot without ado as proposed, let me know:

The number of segments within a shard affects query performance because more segments have to be viewed.
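
For reference, segment counts per shard can be inspected directly, and segments on read-mostly tables can be merged. A minimal sketch, assuming the `sys.segments` table and the `OPTIMIZE TABLE` statement available in recent CrateDB versions; the table name is illustrative:

```sql
-- Count Lucene segments per shard for one table.
SELECT table_name, shard_id, count(*) AS segments
FROM sys.segments
WHERE table_name = 'sensor_readings'
GROUP BY table_name, shard_id
ORDER BY segments DESC;

-- Merging segments reduces the number of structures each query has to visit.
OPTIMIZE TABLE sensor_readings WITH (max_num_segments = 1);
```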

Member Author
@amotl amotl Oct 1, 2025

Removed per aaf9c53, but later added the statement at a dedicated section per 874339f. Let me know if you think we should expand that, or if you would like to see it deleted altogether.

-- https://cratedb-guide--355.org.readthedocs.build/performance/sharding.html#segments

If you are looking for an intro to sharding, see {ref}`sharding
<crate-reference:ddl-sharding>`.
The optimal approach balances shard count with shard size. Individual shards should
typically contain 3-70 GB of data, with 10-50 GB being the sweet spot for most

Comment on lines 66 to 68
To avoid _oversharding_, CrateDB by default limits the number of shards per node to
1_000 as a critical stability boundary. Any operation that would exceed that limit
leads to an exception.
Member Author

@seut mentioned about this:

Actually, it's more of a protection than a ‘critical stability boundary’ in my view, as it depends on the environment, i.e. what kind of hardware I'm running.

Member Author
@amotl amotl Sep 30, 2025

Elsewhere, it is called "soft limit". Shall we also use that term here?

[...] there is a soft limit of 1000 shards per node;

Member

I think protection limit is much more helpful than soft limit. Otherwise I'd question what this soft limit actually means.


:::

## Optimising for query performance
### Avoid imbalances
Member Author
@amotl amotl Sep 30, 2025

About detecting imbalances, let's quickly introduce and refer to the XMover utility at this spot as soon as it is released.


@amotl amotl marked this pull request as ready for review September 30, 2025 22:32
Comment on lines -57 to -58
CrateDB also has replicas of data and this results in additional shards in the
cluster.
Member

Why do we remove this? Isn't this interesting to note?

Member Author
@amotl amotl Oct 1, 2025

This statement has been refactored to the "advanced" page to avoid scattering valuable pro-tip information across different pages.

-- https://cratedb-guide--355.org.readthedocs.build/performance/sharding.html#partitions

Comment on lines -95 to -109
:::{danger}
**Over-sharding and over-partitioning**

Sharding can drastically improve the performance on large datasets.
However, having too many small shards will most likely degrade performance.
Over-sharding and over-partitioning are common flaws leading to an overall
poor performance.

**As a rule of thumb, a single shard should hold somewhere between 5 - 50
GB of data.**

To avoid oversharding, CrateDB by default limits the number of shards per
node to 1000. Any operation that would exceed that limit, leads to an
exception.
:::
Member

Why remove this? Isn't it good to have as a top highlighted note? Also, it correctly mentions "by default ... 1000 shards", while the text below is missing this "default" part.

Member Author
@amotl amotl Oct 1, 2025

Why remove this?

The valuable fragments here have also been refactored to the "advanced" page, precisely to avoid those kinds of slightly deviating redundancies across different documentation pages. It now includes the "default" keyword, which is certainly important, thanks!

To avoid oversharding, CrateDB by default limits the number of shards per node to 1_000 as a protection limit.

-- https://cratedb-guide--355.org.readthedocs.build/performance/sharding.html#shards-per-node-limit
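
As a side note for the docs: the protection limit is adjustable at runtime. A minimal sketch, assuming the `cluster.max_shards_per_node` setting exposed by recent CrateDB versions:

```sql
-- Inspect the current protection limit (defaults to 1000 shards per node).
SELECT settings['cluster']['max_shards_per_node'] FROM sys.cluster;

-- Raise it cluster-wide, but only after verifying the hardware can sustain more shards.
SET GLOBAL PERSISTENT "cluster.max_shards_per_node" = 2000;
```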

Member Author
@amotl amotl Oct 1, 2025

Isn't it good to have as a top highlighted note?

The section we are looking at here on the "basic" page about sharding is still an admonition (highlighted section, but now tuned down from "danger" to "caution"), still includes the relevant teaser text, but then links to the "advanced" page, so we don't need to manage the pro-tip-like valuable information bits redundantly.

(Screenshot: the rendered "Strategy" caution admonition on the basic sharding page.)

-- https://cratedb-guide--355.org.readthedocs.build/admin/sharding-partitioning.html#strategy


Comment on lines 107 to 120
- Too many shards can degrade search performance and make the cluster unstable.
This is referred to as _oversharding_.

- Very large shards can slow down cluster operations and prolong recovery times
after failures.

Finding the right balance when it comes to sharding will vary on a lot of
things. While it's generally advisable to slightly over-allocate, we
recommend to benchmark your particular setup to find the sweet spot to
implement an appropriate sharding strategy.

Figuring out how many shards to use for your tables requires you to think about
the type of data you're processing, the types of queries you're running, and
the type of hardware you're using.
Member

Wrong section? This isn't related to shard imbalance. Shouldn't this rather stay in the general section?

Member Author
@amotl amotl Oct 1, 2025

1d64064 relocates this to the "General recommendations" section, thanks. Afterwards, 882a947 cleans up the naming and layout of the bottom-most section.

-- https://cratedb-guide--355.org.readthedocs.build/performance/sharding.html#general-recommendations
-- https://cratedb-guide--355.org.readthedocs.build/performance/sharding.html#notes

Comment on lines -79 to -81
For the purposes of calculating how many shards a table should be clustered
into, you can typically ignore replica partitions as these are not usually
queried across for reads.
Member

Great that we remove this, as it is not valid. For balancing reads, replicas are used as well (randomly choosing among primaries and replicas).

Member Author
@amotl amotl Oct 1, 2025

I hadn't removed it; this statement was also relocated, but it has now been deleted as you suggested, per c28811f. Maybe we should reverse the statement instead, in order to emphasize that replica partitions MUST be taken into consideration when calculating shards, to add further clarity?
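
To make the arithmetic concrete, a sketch with illustrative numbers: with one replica configured, each primary shard is mirrored once, so the table below occupies 6 × (1 + 1) = 12 shards in the cluster:

```sql
-- 6 primary shards plus 1 replica each: 12 shards in total to allocate.
CREATE TABLE sensor_readings (
    ts TIMESTAMP WITH TIME ZONE,
    sensor_id TEXT,
    reading DOUBLE PRECISION
) CLUSTERED INTO 6 SHARDS
WITH (number_of_replicas = 1);
```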

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
docs/performance/sharding.md (1)

53-55: Tighten the grammar in the sizing introduction

The sentence “General principles requires careful consideration …” trips over subject–verb agreement. Please reword it for clarity.

-General principles requires careful consideration for cluster
-sizing and architecture.
+General principles require careful consideration of cluster
+sizing and architecture.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d7d6cf3 and 874339f.

📒 Files selected for processing (3)
  • docs/admin/sharding-partitioning.md (4 hunks)
  • docs/feature/cluster/index.md (1 hunks)
  • docs/performance/sharding.md (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs

Comment on lines 107 to 108
shard should roughly be between 10 - 50 GB, and that each node can only manage
up to 1000 shards.

⚠️ Potential issue | 🟠 Major

Mention that the 1,000-shard cap is a default

Stating that “each node can only manage up to 1000 shards” reads like a hard product limit, but the setting is configurable. Please call out that the 1_000 value is the default protection limit so readers know they can tune it (within reason).

-Then, to calculate the number of shards, you should consider that the size of each
-shard should roughly be between 10 - 50 GB, and that each node can only manage
-up to 1000 shards.
+Then, to calculate the number of shards, consider that each shard should roughly
+be between 10 – 50 GB, and that each node can manage up to 1 000 shards by default.
📝 Committable suggestion


Suggested change
shard should roughly be between 10 - 50 GB, and that each node can only manage
up to 1000 shards.
Then, to calculate the number of shards, consider that each shard should roughly
be between 10 – 50 GB, and that each node can manage up to 1 000 shards by default.
🤖 Prompt for AI Agents
In docs/admin/sharding-partitioning.md around lines 107–108, the sentence "each
node can only manage up to 1000 shards" implies a hard product limit; update it
to state that 1,000 is a default protection limit and is configurable via the
node's configuration settings, and add a short note that operators can tune this
value based on available RAM/CPU/disk while respecting practical resource
constraints.

Member Author

Improved per 95dba4d.
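
For the record, actual shard sizes can be checked against the 10–50 GB guidance at any time. A minimal sketch, assuming the `sys.shards` table with its `size` column in bytes, as in current CrateDB versions:

```sql
-- Largest primary shards first; sizes reported in GB.
SELECT schema_name, table_name, id AS shard_id,
       size / 1073741824.0 AS size_gb
FROM sys.shards
WHERE "primary" = true
ORDER BY size DESC
LIMIT 20;
```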

Comment on lines +69 to +75
If most nodes have more shards per table than they have CPUs, the cluster can
experience performance degradations.
For example, on clusters with substantial CPU resources (e.g., 8 nodes × 32 CPUs
= 256 total CPUs), creating 256+ shards per table often proves counterproductive.
If you don't manually set the number of shards per table, CrateDB will make a
best guess, based on the assumption that your nodes have two CPUs each.

@coderabbitai coderabbitai bot Oct 1, 2025

⚠️ Potential issue | 🟠 Major

Clarify the CPU-to-shard guidance

The new “CPU-to-shard ratio” paragraph says that having as many (or more) shards per table as CPUs causes degradation—your example calls out 256+ shards on a 256‑CPU cluster as counterproductive. A few paragraphs later, “Avoid under-allocation” tells readers they should have at least as many shards as CPUs in the cluster to maximize parallelism. Both rules can’t be true simultaneously, and the contradiction leaves operators without actionable direction. Please reconcile the sections (e.g., by distinguishing per-node vs. per-cluster guidance, or by narrowing the recommended ratios) so they deliver a single, coherent recommendation.

Also applies to: 125-135

🤖 Prompt for AI Agents
In docs/performance/sharding.md around lines 69 to 75 (and similarly lines 125
to 135), reconcile the contradictory guidance by explicitly distinguishing
per-node vs per-cluster CPU-to-shard recommendations: state that shard counts
should be considered per-node (aim for ~1 shard per CPU per node as a starting
point) and avoid configuring total shards per table that exceed the total
cluster CPU count by a large margin (e.g., don’t create >1.5–2× shards per total
cluster CPUs), update the example (replace “256+ shards on a 256‑CPU cluster”
with a clarified example showing why many small shards spread across nodes hurts
locality and overhead), and add a short cross-reference between the
“CPU-to-shard ratio” and “Avoid under-allocation” sections so they present a
single coherent rule set (per-node baseline, guarded upper bound per cluster and
suggested multiplier range).

Member Author

@coderabbitai: Thank you for providing a suggestion to resolve this. @seut and @WalBeh: Can you verify if this is actually applicable or just nonsense?

69-146: Resolve conflicting CPU-to-shard guidance

Lines 69-75 warn that matching or exceeding CPUs with shards per table degrades performance, yet lines 126-146 instruct readers to provision at least as many shards as cluster CPUs and even “slightly more.” Both can’t hold true simultaneously, and the contradiction leaves operators without actionable direction. Please reconcile the sections—e.g., clarify the per-node baseline (≈1 primary shard per CPU core on that node) versus the cluster-wide upper bound (avoid total shards per table greatly exceeding total CPUs), then cross-reference them so they form a single rule set. A possible rewrite:

-If most nodes have more shards per table than they have CPUs, the cluster can experience performance degradations.
-For example, on clusters with substantial CPU resources (e.g., 8 nodes × 32 CPUs = 256 total CPUs), creating 256+ shards per table often proves counterproductive.
+Aim for about one primary shard per CPU core on each node, which keeps shard work co-located with available compute. Substantially exceeding that per-node ratio increases scheduling overhead.
+For example, on an 8-node × 32-CPU cluster (256 total CPUs), creating 256+ shards per table spreads many tiny shards across nodes and raises coordination costs without real gains.
-To increase the chances that a query can be parallelized and distributed maximally, there should be at least as many shards for a table than there are CPUs in the cluster.
+Use the per-node guideline above as your baseline, and only grow total shards per table modestly beyond the cluster’s CPU count (≈1.5–2×) when you have measured that the extra shards actually improve concurrency.

@coderabbitai coderabbitai bot

@amotl: Thank you for looping in the domain experts! I'll stand by for @seut and @WalBeh's verification on whether the apparent contradiction is real or if I've misunderstood the intended guidance.

Once they provide their input on the technical accuracy, I'm happy to help refine the wording to ensure the documentation clearly conveys the correct sharding strategy—whether that means adjusting my suggested changes, taking a different approach entirely, or clarifying that the two sections address different scenarios.

In the meantime, if there are any specific aspects of the current text or my suggestion you'd like me to research (e.g., CrateDB's actual shard allocation behavior, query parallelization mechanics, or documented best practices), just let me know!
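
In the meantime as well, a read-only diagnostic can make the current shard-to-CPU ratio visible per node without taking sides. A hedged sketch, assuming `sys.shards` exposes a `node` object and `sys.nodes` exposes `os_info['available_processors']`, as recent CrateDB versions do:

```sql
-- Shards currently allocated per node, to compare against each node's CPU count.
SELECT node['name'] AS node_name, count(*) AS shards
FROM sys.shards
GROUP BY node['name']
ORDER BY shards DESC;

-- CPU count per node.
SELECT name, os_info['available_processors'] AS cpus
FROM sys.nodes
ORDER BY name;
```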


@amotl amotl requested review from seut and matriv and removed request for WalBeh October 1, 2025 21:55