Conversation

Member

@amotl amotl commented Sep 30, 2025

About

Thanks for making a start on improving the sharding guidelines, @WalBeh. We just added a few copy-edits.

Preview

References

Review

Please also add your comments and suggestions when applicable. 🙏

/cc @karynzv, @hammerhead, @WalBeh, @surister, @kneth

@amotl amotl requested review from seut and WalBeh September 30, 2025 22:16

coderabbitai bot commented Sep 30, 2025

Walkthrough

Documentation updates that retitle and restructure sharding/partitioning guidance, raise the recommended minimum shard size to 10 GB, replace prior strategy text with benchmark-driven sizing and shard-count guidance (10–50 GB per shard; up to 1,000 shards/node default), and reorganize sharding performance content into sizing, ingestion, and allocation considerations.

Changes

  • Admin: sharding & partitioning (docs/admin/sharding-partitioning.md)
    Retitled to "Sharding and Partitioning 101"; removed the over-sharding admonition; added a benchmark-driven "Strategy" section with updated shard-size guidance (10–50 GB per shard) and shard-count calculations (up to 1,000 shards/node default); revised examples (time-series example uses 6 shards, updated CREATE TABLE/CLUSTERED INTO usage) and added a caution referencing an external sharding guide.
  • Performance: scaling wording (docs/performance/scaling.md)
    Clarified shard terminology and punctuation; removed the inline statement linking replicas to additional shards; simplified wording and referenced the sharding guide for detailed guidance; minor readability edits without semantic change.
  • Performance: sharding guide, restructured (docs/performance/sharding.md)
    Major restructure and rewrite: normalized heading casing, added intro/see-also, introduced "Sizing considerations" with subsections (shard size vs. number, CPU-to-shard ratio, 1000-shard/node limit, partitions, replicas, segments), added "Notes" and "Avoid under-allocation", and reworked ingestion/benchmarking guidance.
  • Feature: cluster synopsis (docs/feature/cluster/index.md)
    Increased recommended shard-size lower bound from 5 GB to 10 GB (range now 10–50 GB); no other semantic changes.

Sequence Diagram(s)

(omitted — changes are documentation-only and do not introduce or alter control flow)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

Possibly related issues

  • Sharding and partitioning guidelines #301 — both update shard-sizing and partitioning guidance; this PR raises the shard-size lower bound to 10 GB and aligns strategy guidance toward benchmarks, which overlaps the objectives of that issue.

Suggested labels

guidance

Suggested reviewers

  • kneth
  • bmunkholm

Poem

I nibble docs beneath moon's glow,
Counting shards in neat, calm rows.
Benchmarks guide my careful hop,
Six small piles — then I stop.
A carrot for cautious growth! 🥕

Pre-merge checks

✅ Passed checks (3 passed)
  • Title Check: ✅ Passed. The title "Sharding strategy and recommendations" accurately reflects the primary focus of the pull request, which is to update and refine sharding strategy guidance and recommendations across the documentation.
  • Docstring Coverage: ✅ Passed. No functions found in the changes; docstring coverage check skipped.
  • Description Check: ✅ Passed. The pull request description directly references improvements to the sharding guidelines and notes copy-edits that align with the summarized changes to sharding and partitioning content.


Comment on lines 52 to 53
Smaller shards also result in reduced Lucene index efficiency, which can adversely
affect computed search term relevance.
Member Author

@seut mentioned:

That's not true as it stands. The number of segments within a shard affects query performance because more segments have to be viewed.

Member Author
@amotl amotl Oct 1, 2025

"Segments" are currently not yet part of this discussion / document. Maybe we'd introduce them first? For now, I am deleting this statement? If you can up with a better one, or if you think we should swap this sentence verbatim at this very spot without ado as proposed, let me know:

The number of segments within a shard affects query performance because more segments have to be viewed.
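
For reference, segment counts per shard can be inspected directly, and segments on read-mostly tables can be merged. A minimal sketch, assuming the `sys.segments` table and the `OPTIMIZE TABLE` statement available in recent CrateDB versions; the table name is illustrative:

```sql
-- Count Lucene segments per shard for one table.
SELECT table_name, shard_id, count(*) AS segments
FROM sys.segments
WHERE table_name = 'sensor_readings'
GROUP BY table_name, shard_id
ORDER BY segments DESC;

-- Merging segments reduces the number of structures each query has to visit.
OPTIMIZE TABLE sensor_readings WITH (max_num_segments = 1);
```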

Member Author
@amotl amotl Oct 1, 2025

Removed per aaf9c53, but later added the statement at a dedicated section per 874339f. Let me know if you think we should expand that, or if you would like to see it deleted altogether.

-- https://cratedb-guide--355.org.readthedocs.build/performance/sharding.html#segments

If you are looking for an intro to sharding, see {ref}`sharding
<crate-reference:ddl-sharding>`.
The optimal approach balances shard count with shard size. Individual shards should
typically contain 3-70 GB of data, with 10-50 GB being the sweet spot for most

Comment on lines 66 to 68
To avoid _oversharding_, CrateDB by default limits the number of shards per node to
1_000 as a critical stability boundary. Any operation that would exceed that limit
leads to an exception.
Member Author

@seut mentioned about this:

Actually, it's more of a protection than a ‘critical stability boundary’ in my view, as it depends on the environment, i.e. what kind of hardware I'm running.

Member Author
@amotl amotl Sep 30, 2025

Elsewhere, it is called "soft limit". Shall we also use that term here?

[...] there is a soft limit of 1000 shards per node;

Member

I think protection limit is much more helpful than soft limit. Otherwise I'd question what this soft limit actually means.


:::

## Optimising for query performance
### Avoid imbalances
Member Author
@amotl amotl Sep 30, 2025

About detecting imbalances, let's quickly introduce and refer to the XMover utility at this spot as soon as it is released.


@amotl amotl marked this pull request as ready for review September 30, 2025 22:32
Comment on lines -57 to -58
CrateDB also has replicas of data and this results in additional shards in the
cluster.
Member

Why do we remove this? Isn't this interesting to note?

Member Author
@amotl amotl Oct 1, 2025

This statement has been refactored to the "advanced" page to avoid scattering valuable pro-tip information across different pages.

-- https://cratedb-guide--355.org.readthedocs.build/performance/sharding.html#partitions

Comment on lines -95 to -109
:::{danger}
**Over-sharding and over-partitioning**

Sharding can drastically improve the performance on large datasets.
However, having too many small shards will most likely degrade performance.
Over-sharding and over-partitioning are common flaws leading to an overall
poor performance.

**As a rule of thumb, a single shard should hold somewhere between 5 - 50
GB of data.**

To avoid oversharding, CrateDB by default limits the number of shards per
node to 1000. Any operation that would exceed that limit, leads to an
exception.
:::
Member

Why remove this? Isn't it good to have as a top highlighted note? Also, it correctly mentions "by default ... 1000 shards", while the text below is missing this "default" part.

Member Author
@amotl amotl Oct 1, 2025

Why remove this?

The valuable fragments here have also been refactored to the "advanced" page, precisely to avoid those kinds of slightly deviating redundancies across different documentation pages. It now includes the "default" keyword, which is certainly important, thanks!

To avoid oversharding, CrateDB by default limits the number of shards per node to 1_000 as a protection limit.

-- https://cratedb-guide--355.org.readthedocs.build/performance/sharding.html#shards-per-node-limit
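
As a side note for the docs: the protection limit is adjustable at runtime. A minimal sketch, assuming the `cluster.max_shards_per_node` setting exposed by recent CrateDB versions:

```sql
-- Inspect the current protection limit (defaults to 1000 shards per node).
SELECT settings['cluster']['max_shards_per_node'] FROM sys.cluster;

-- Raise it cluster-wide, but only after verifying the hardware can sustain more shards.
SET GLOBAL PERSISTENT "cluster.max_shards_per_node" = 2000;
```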

Member Author
@amotl amotl Oct 1, 2025

Isn't it good to have as a top highlighted note?

The section we are looking at here on the "basic" page about sharding is still an admonition (highlighted section, but now tuned down from "danger" to "caution"), still includes the relevant teaser text, but then links to the "advanced" page, so we don't need to manage the pro-tip-like valuable information bits redundantly.

(Screenshot: the rendered "Strategy" caution admonition on the basic sharding page.)

-- https://cratedb-guide--355.org.readthedocs.build/admin/sharding-partitioning.html#strategy


Comment on lines 107 to 120
- Too many shards can degrade search performance and make the cluster unstable.
This is referred to as _oversharding_.

- Very large shards can slow down cluster operations and prolong recovery times
after failures.

Finding the right balance when it comes to sharding will vary on a lot of
things. While it's generally advisable to slightly over-allocate, we
recommend to benchmark your particular setup to find the sweet spot to
implement an appropriate sharding strategy.

Figuring out how many shards to use for your tables requires you to think about
the type of data you're processing, the types of queries you're running, and
the type of hardware you're using.
Member

Wrong section? This isn't related to shard imbalance. Shouldn't this rather stay in the general section?

Member Author
@amotl amotl Oct 1, 2025

1d64064 relocates this to the "General recommendations" section, thanks. Afterwards, 882a947 cleans up the naming and layout of the bottom-most section.

-- https://cratedb-guide--355.org.readthedocs.build/performance/sharding.html#general-recommendations
-- https://cratedb-guide--355.org.readthedocs.build/performance/sharding.html#notes

Comment on lines -79 to -81
For the purposes of calculating how many shards a table should be clustered
into, you can typically ignore replica partitions as these are not usually
queried across for reads.
Member

Great that we remove this, as it is not valid. For balancing reads, replicas are used as well (randomly choosing among primaries and replicas).

Member Author
@amotl amotl Oct 1, 2025

I hadn't removed it; this statement was also relocated, but it has now been deleted as you suggested, per c28811f. Maybe we should reverse the statement instead, in order to emphasize that replica partitions MUST be taken into consideration when calculating shards, to add further clarity?
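
To make the arithmetic concrete, a sketch with illustrative numbers: with one replica configured, each primary shard is mirrored once, so the table below occupies 6 × (1 + 1) = 12 shards in the cluster:

```sql
-- 6 primary shards plus 1 replica each: 12 shards in total to allocate.
CREATE TABLE sensor_readings (
    ts TIMESTAMP WITH TIME ZONE,
    sensor_id TEXT,
    reading DOUBLE PRECISION
) CLUSTERED INTO 6 SHARDS
WITH (number_of_replicas = 1);
```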

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
docs/performance/sharding.md (1)

53-55: Tighten the grammar in the sizing introduction

The sentence “General principles requires careful consideration …” trips over subject–verb agreement. Please reword it for clarity.

-General principles requires careful consideration for cluster
-sizing and architecture.
+General principles require careful consideration of cluster
+sizing and architecture.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d7d6cf3 and 874339f.

📒 Files selected for processing (3)
  • docs/admin/sharding-partitioning.md (4 hunks)
  • docs/feature/cluster/index.md (1 hunks)
  • docs/performance/sharding.md (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs

Comment on lines 107 to 108
shard should roughly be between 10 - 50 GB, and that each node can only manage
up to 1000 shards.

⚠️ Potential issue | 🟠 Major

Mention that the 1,000-shard cap is a default

Stating that “each node can only manage up to 1000 shards” reads like a hard product limit, but the setting is configurable. Please call out that the 1_000 value is the default protection limit so readers know they can tune it (within reason).

-Then, to calculate the number of shards, you should consider that the size of each
-shard should roughly be between 10 - 50 GB, and that each node can only manage
-up to 1000 shards.
+Then, to calculate the number of shards, consider that each shard should roughly
+be between 10 – 50 GB, and that each node can manage up to 1 000 shards by default.
📝 Committable suggestion


Suggested change
shard should roughly be between 10 - 50 GB, and that each node can only manage
up to 1000 shards.
Then, to calculate the number of shards, consider that each shard should roughly
be between 10 – 50 GB, and that each node can manage up to 1 000 shards by default.
🤖 Prompt for AI Agents
In docs/admin/sharding-partitioning.md around lines 107–108, the sentence "each
node can only manage up to 1000 shards" implies a hard product limit; update it
to state that 1,000 is a default protection limit and is configurable via the
node's configuration settings, and add a short note that operators can tune this
value based on available RAM/CPU/disk while respecting practical resource
constraints.

Member Author

Improved per 95dba4d.
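
For the record, actual shard sizes can be checked against the 10–50 GB guidance at any time. A minimal sketch, assuming the `sys.shards` table with its `size` column in bytes, as in current CrateDB versions:

```sql
-- Largest primary shards first; sizes reported in GB.
SELECT schema_name, table_name, id AS shard_id,
       size / 1073741824.0 AS size_gb
FROM sys.shards
WHERE "primary" = true
ORDER BY size DESC
LIMIT 20;
```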

Comment on lines +69 to +75
If most nodes have more shards per table than they have CPUs, the cluster can
experience performance degradations.
For example, on clusters with substantial CPU resources (e.g., 8 nodes × 32 CPUs
= 256 total CPUs), creating 256+ shards per table often proves counterproductive.
If you don't manually set the number of shards per table, CrateDB will make a
best guess, based on the assumption that your nodes have two CPUs each.

@coderabbitai coderabbitai bot Oct 1, 2025

⚠️ Potential issue | 🟠 Major

Clarify the CPU-to-shard guidance

The new “CPU-to-shard ratio” paragraph says that having as many (or more) shards per table as CPUs causes degradation—your example calls out 256+ shards on a 256‑CPU cluster as counterproductive. A few paragraphs later, “Avoid under-allocation” tells readers they should have at least as many shards as CPUs in the cluster to maximize parallelism. Both rules can’t be true simultaneously, and the contradiction leaves operators without actionable direction. Please reconcile the sections (e.g., by distinguishing per-node vs. per-cluster guidance, or by narrowing the recommended ratios) so they deliver a single, coherent recommendation.

Also applies to: 125-135

🤖 Prompt for AI Agents
In docs/performance/sharding.md around lines 69 to 75 (and similarly lines 125
to 135), reconcile the contradictory guidance by explicitly distinguishing
per-node vs per-cluster CPU-to-shard recommendations: state that shard counts
should be considered per-node (aim for ~1 shard per CPU per node as a starting
point) and avoid configuring total shards per table that exceed the total
cluster CPU count by a large margin (e.g., don’t create >1.5–2× shards per total
cluster CPUs), update the example (replace “256+ shards on a 256‑CPU cluster”
with a clarified example showing why many small shards spread across nodes hurts
locality and overhead), and add a short cross-reference between the
“CPU-to-shard ratio” and “Avoid under-allocation” sections so they present a
single coherent rule set (per-node baseline, guarded upper bound per cluster and
suggested multiplier range).

Member Author

@coderabbitai: Thank you for providing a suggestion to resolve this. @seut and @WalBeh: Can you verify if this is actually applicable or just nonsense?

69-146: Resolve conflicting CPU-to-shard guidance

Lines 69-75 warn that matching or exceeding CPUs with shards per table degrades performance, yet lines 126-146 instruct readers to provision at least as many shards as cluster CPUs and even “slightly more.” Both can’t hold true simultaneously, and the contradiction leaves operators without actionable direction. Please reconcile the sections—e.g., clarify the per-node baseline (≈1 primary shard per CPU core on that node) versus the cluster-wide upper bound (avoid total shards per table greatly exceeding total CPUs), then cross-reference them so they form a single rule set. A possible rewrite:

-If most nodes have more shards per table than they have CPUs, the cluster can experience performance degradations.
-For example, on clusters with substantial CPU resources (e.g., 8 nodes × 32 CPUs = 256 total CPUs), creating 256+ shards per table often proves counterproductive.
+Aim for about one primary shard per CPU core on each node, which keeps shard work co-located with available compute. Substantially exceeding that per-node ratio increases scheduling overhead.
+For example, on an 8-node × 32-CPU cluster (256 total CPUs), creating 256+ shards per table spreads many tiny shards across nodes and raises coordination costs without real gains.
-To increase the chances that a query can be parallelized and distributed maximally, there should be at least as many shards for a table than there are CPUs in the cluster.
+Use the per-node guideline above as your baseline, and only grow total shards per table modestly beyond the cluster’s CPU count (≈1.5–2×) when you have measured that the extra shards actually improve concurrency.

@coderabbitai coderabbitai bot

@amotl: Thank you for looping in the domain experts! I'll stand by for @seut and @WalBeh's verification on whether the apparent contradiction is real or if I've misunderstood the intended guidance.

Once they provide their input on the technical accuracy, I'm happy to help refine the wording to ensure the documentation clearly conveys the correct sharding strategy—whether that means adjusting my suggested changes, taking a different approach entirely, or clarifying that the two sections address different scenarios.

In the meantime, if there are any specific aspects of the current text or my suggestion you'd like me to research (e.g., CrateDB's actual shard allocation behavior, query parallelization mechanics, or documented best practices), just let me know!
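
In the meantime as well, a read-only diagnostic can make the current shard-to-CPU ratio visible per node without taking sides. A hedged sketch, assuming `sys.shards` exposes a `node` object and `sys.nodes` exposes `os_info['available_processors']`, as recent CrateDB versions do:

```sql
-- Shards currently allocated per node, to compare against each node's CPU count.
SELECT node['name'] AS node_name, count(*) AS shards
FROM sys.shards
GROUP BY node['name']
ORDER BY shards DESC;

-- CPU count per node.
SELECT name, os_info['available_processors'] AS cpus
FROM sys.nodes
ORDER BY name;
```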


@amotl amotl requested review from seut and matriv and removed request for WalBeh October 1, 2025 21:55