amotl (Member) commented Sep 16, 2025

About

  • "Machine learning with CrateDB" is now located within the "Solutions and use cases" section.
  • The new page exclusively references existing content and provides teaser texts, intending to provide better guidance.
  • The "Advanced time series analysis" section has been pulled in, because it is also using machine learning technologies.

Outline

  • Vector store
  • Text-to-SQL
  • Time series analysis
  • MLOps and model training

Preview

References

amotl added the following labels on Sep 16, 2025:

  • refactoring: Changing shape or layout, or moving content around.
  • cross linking: Linking to different locations of the documentation.
  • guidance: Matters of layout, shape, and structure.

coderabbitai bot commented Sep 16, 2025

Warning

Rate limit exceeded

@amotl has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 18 minutes and 16 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 4ca75d7 and f669aee.

📒 Files selected for processing (3)
  • docs/feature/query/index.md (1 hunks)
  • docs/feature/search/vector/index.md (2 hunks)
  • docs/solution/machine-learning/index.md (1 hunks)

Walkthrough

Adds a Machine Learning solution page and solution grid entry; removes the old Machine Learning topic page and card; updates multiple Time Series docs (titles, intros, tags, and removed advanced sections); adds a Text-to-SQL subsection under Search; rewords vector search overview; and adds a new external link entry.

Changes

Cohort, file(s), and summary:

  • Machine Learning solution & topic changes
    Files: docs/solution/index.md, docs/solution/machine-learning/index.md, docs/topic/index.md, docs/topic/ml/index.md
    Summary: Added a Machine Learning solution grid-item and new ML landing page; updated solution TOC; removed the ML topic grid-item and deleted the old ML topic page.
  • Time series documentation edits
    Files: docs/_include/card/timeseries-intro.md, docs/solution/machine-learning/time-series.md, docs/topic/timeseries/fundamentals.md, docs/topic/timeseries/index.md
    Summary: Renamed and reworded Time Series cards and callouts, added/removed tags and muted intros, inserted datashader include, and removed multiple advanced/time-series sections and the "Advanced" toctree entry.
  • Search / query docs
    Files: docs/feature/query/index.md, docs/feature/search/vector/index.md
    Summary: Added a "Text-to-SQL" subsection under Search; clarified FLOAT_VECTOR role and kNN/HNSW description and added note on sources of feature vectors.
  • Links include
    Files: docs/_include/links.md
    Summary: Inserted a new public link entry pointing to an MLOps resource.

Sequence Diagram(s)

No sequence diagrams necessary for these documentation-only structural and content changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

new content

Suggested reviewers

  • kneth
  • hammerhead
  • bmunkholm
  • surister

Poem

A rabbit hops through pages bright,
New ML paths bloom in morning light.
Vectors hum and queries sing,
Time-series tales take wing.
Links refreshed — hop on, delight! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title Check: ✅ Passed. The title "Machine learning: Rework and relocate section" concisely and accurately summarizes the primary change in this PR: reworking and relocating the Machine Learning documentation into the Solutions section. It corresponds directly to the added solution page and the related removals and reworks shown in the changeset, and it is clear for a teammate scanning history.
  • Description Check: ✅ Passed. The PR description clearly states the intent to move "Machine learning with CrateDB" into the Solutions section, notes that the new page references existing content and pulls in the advanced time series analysis, and provides an outline plus a preview URL. This aligns with the file-level changes in the summary and is on-topic for this changeset.
  • Docstring Coverage: ✅ Passed. No functions found in the changes. Docstring coverage check skipped.

amotl added the label sanding-500 (Sanding medium-sized details.) on Sep 16, 2025
amotl force-pushed the ml-ng branch 3 times, most recently from d8db9ff to 22c7390, on September 16, 2025 11:36
amotl marked this pull request as ready for review on September 16, 2025 11:37
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (7)
docs/feature/query/index.md (1)

169-171: Avoid an orphan subsection: add 1–2 context sentences.

Consider a one‑liner under “Text‑to‑SQL” so the section isn’t just a single link.

 ## Text-to-SQL
-- {ref}`text-to-sql`
+Natural language to SQL conversions using adapters and frameworks.
+- {ref}`text-to-sql`
docs/feature/search/vector/index.md (2)

25-29: Consistent US spelling for kNN.

Mix of “neighbour”/“Neighbors” in this page. Standardize to “nearest neighbor (kNN)”.

- neighbour (kNN)
+ neighbor (kNN)
- functions, effectively conducting HNSW semantic similarity searches on them,
+ functions to perform HNSW-based semantic similarity search,

42-45: Minor wording polish.

Tighten the sentence and avoid “may be”.

-Feature vectors may be computed from raw data using machine learning
-methods such as feature extraction algorithms, word embeddings, or deep
-learning networks.
+Feature vectors are computed from raw data via ML methods such as feature
+extraction, word embeddings, or deep neural networks.
docs/solution/machine-learning/index.md (4)

17-25: Fix markdownlint MD052: explicit reference label.

markdownlint often can’t resolve link definitions from includes. Use explicit reference to avoid “Missing link definition: vector database”.

-[Vector databases][Vector Database] can be used for similarity search,
+[Vector databases][Vector Database] can be used for similarity search,

If MD052 persists, switch to an inline link or duplicate the definition locally.

- [Vector databases][Vector Database] can be used for similarity search,
+ [Vector databases](https://en.wikipedia.org/wiki/Vector_database) can be used for similarity search,
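The MD052 finding above can be reproduced with a minimal sketch, assuming only the full `[text][label]` reference form matters and that labels compare case-insensitively with collapsed whitespace. The function name `undefined_labels` and the sample strings are hypothetical, and this is not how markdownlint itself is implemented.

```python
import re

# Full reference links of the form [text][label].
FULL_REF = re.compile(r"\[[^\]]+\]\[([^\]]+)\]")
# Link reference definitions of the form [label]: destination.
DEFINITION = re.compile(r"^ {0,3}\[([^\]]+)\]:\s*\S+", re.MULTILINE)


def normalize(label):
    """Compare labels case-insensitively with collapsed whitespace."""
    return " ".join(label.casefold().split())


def undefined_labels(text):
    """Return the sorted labels that are referenced but never defined in text."""
    defined = {normalize(label) for label in DEFINITION.findall(text)}
    used = {normalize(label) for label in FULL_REF.findall(text)}
    return sorted(used - defined)


# The page alone has no definition, so the label is reported as missing.
page = "[Vector databases][Vector Database] can be used for similarity search,\n"
# Duplicating the definition locally, as suggested above, resolves it.
page_with_definition = (
    page + "\n[vector database]: https://en.wikipedia.org/wiki/Vector_database\n"
)

print(undefined_labels(page))
print(undefined_labels(page_with_definition))
```

A linter that reads one file at a time sees only `page`, which is why a definition living in an include can trigger the warning even when the built documentation resolves the link.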

33-38: kNN spelling consistency.

Use “neighbor” to match the rest of the docs.

- k-nearest neighbour (kNN)
+ k-nearest neighbor (kNN)

51-61: LangChain adapter wording: minor tightening.

Shorten and remove “also”.

-The LangChain adapter for CrateDB provides support to use CrateDB as a vector
-store database, to load documents using LangChain’s DocumentLoader, and also
-supports LangChain’s conversational memory subsystem.
+The LangChain adapter lets you use CrateDB as a vector store database, load
+documents via DocumentLoader, and use LangChain’s conversational memory.

162-170: MLflow storage statement: consider adding link to setup guide.

If there’s a page on configuring MLflow Tracking with CrateDB, link it here.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8209e12 and 7395fb1.

📒 Files selected for processing (11)
  • docs/_include/card/timeseries-intro.md (1 hunks)
  • docs/_include/links.md (1 hunks)
  • docs/feature/query/index.md (1 hunks)
  • docs/feature/search/vector/index.md (2 hunks)
  • docs/solution/index.md (2 hunks)
  • docs/solution/machine-learning/index.md (1 hunks)
  • docs/solution/machine-learning/time-series.md (1 hunks)
  • docs/topic/index.md (0 hunks)
  • docs/topic/ml/index.md (0 hunks)
  • docs/topic/timeseries/fundamentals.md (2 hunks)
  • docs/topic/timeseries/index.md (0 hunks)
💤 Files with no reviewable changes (3)
  • docs/topic/timeseries/index.md
  • docs/topic/ml/index.md
  • docs/topic/index.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/solution/machine-learning/index.md

17-17: Reference links and images should use a label that is defined
Missing link or image reference definition: "vector database"

(MD052, reference-links-images)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs
🔇 Additional comments (10)
docs/_include/links.md (1)

51-51: Add MLOps link: looks good.

No issues. Link label and URL format match the file’s conventions.

docs/solution/index.md (2)

72-83: New ML card: LGTM.

Copy, icon, and link role look consistent with the grid.


94-94: TOC entry: LGTM.

The hidden toctree entry will surface the page in nav; path looks correct.

docs/topic/timeseries/fundamentals.md (2)

5-12: Intro and tags: LGTM.

Tone and tag usage align with style used elsewhere.


30-32: Include exists — no action required.
Found docs/_include/card/timeseries-datashader.md; referenced in docs/topic/timeseries/fundamentals.md (line 30) and docs/integrate/pyviz/index.md (line 62).

docs/_include/card/timeseries-intro.md (1)

25-41: Card copy tweaks: LGTM.

Clear upgrade; MAX_BY note is helpful. No action needed.

docs/solution/machine-learning/time-series.md (2)

6-9: Intro block: LGTM.

Matches phrasing used in Fundamentals; consistent voice.


1-1: Anchor token (ml-timeseries) is unique — no collisions found.
Only occurrence: docs/solution/machine-learning/time-series.md:1; nearby anchors (timeseries-advanced) and (timeseries-analysis) appear on lines 2–3 and are distinct.

docs/solution/machine-learning/index.md (2)

66-72: Good: cross-links for Text‑to‑SQL.

Nice addition; clear positioning with MCP and enterprise data.


77-96: Resolved — referenced anchors exist: llamaindex, mcp, mindsdb

Anchors found at docs/integrate/llamaindex/index.md, docs/connect/mcp/index.md, docs/integrate/mindsdb/index.md.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
docs/solution/machine-learning/index.md (4)

17-25: Fix missing reference link: replace [Vector Database] with a valid :ref: or plain text

markdownlint reports an undefined reference label. Prefer linking to the Vector Search doc via :ref:, or drop the link.

Apply one of:

-[Vector Database] can be used for similarity search,
+:ref:`Vector search <vector-search>` can be used for similarity search,

or

-[Vector Database] can be used for similarity search,
+Vector databases can be used for similarity search,

33-35: k-NN phrasing consistency

Use the standard plural and hyphenation.

-CrateDB's FLOAT_VECTOR data type implements a vector store and the k-nearest
-neighbor (kNN) search algorithm to find vectors that are similar to a query
+CrateDB's FLOAT_VECTOR data type implements a vector store and the k‑nearest
+neighbors (k‑NN) search algorithm to find vectors that are similar to a query
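For readers unfamiliar with the term, the idea of k-nearest neighbors search can be illustrated with a minimal brute-force sketch in plain Python. This is an assumption-laden illustration only: the function `knn` and the sample vectors are hypothetical, and it does not show CrateDB's implementation, which the review elsewhere describes as HNSW-based.

```python
import heapq
import math


def knn(query, vectors, k):
    """Return the k (distance, index) pairs closest to query, nearest first."""
    # Exhaustively score every stored vector by Euclidean distance.
    scored = ((math.dist(query, vector), index) for index, vector in enumerate(vectors))
    return heapq.nsmallest(k, scored)


# Hypothetical two-dimensional feature vectors.
vectors = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
nearest = knn((1.0, 1.0), vectors, k=2)
print([index for _, index in nearest])  # prints [1, 3]
```

The exhaustive scan costs one distance computation per stored vector, which is the cost that approximate index structures such as HNSW are designed to avoid.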

86-88: Remove stray comma after the acronym

Grammar nit.

-The Model Context Protocol (MCP), is an open protocol that enables seamless
+The Model Context Protocol (MCP) is an open protocol that enables seamless

51-60: Minor wording: LangChain “Document Loaders”

LangChain generally refers to “Document Loaders” (plural). Adjust for accuracy.

-The LangChain adapter lets you use CrateDB as a vector store database, load
-documents via DocumentLoader, and use LangChain’s conversational memory.
+The LangChain adapter lets you use CrateDB as a vector store database, load
+documents via document loaders, and use LangChain’s conversational memory.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7395fb1 and 4ca75d7.

📒 Files selected for processing (3)
  • docs/feature/query/index.md (1 hunks)
  • docs/feature/search/vector/index.md (2 hunks)
  • docs/solution/machine-learning/index.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/feature/query/index.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/feature/search/vector/index.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/solution/machine-learning/index.md

17-17: Reference links and images should use a label that is defined
Missing link or image reference definition: "vector database"

(MD052, reference-links-images)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs
🔇 Additional comments (3)
docs/solution/machine-learning/index.md (3)

128-136: Duplicate “Advanced time series analysis with MLflow and PyCaret” card

This card appears in both “Time series analysis” and “MLOps and model training.” If intentional cross-linking, ignore; otherwise dedupe to reduce repetition.

Also applies to: 180-189


194-198: Confirm toctree target exists — OK
docs/solution/machine-learning/time-series.md exists and includes a top-level heading ("# Advanced Time Series Analysis") and the expected section ("## Anomaly Detection and Forecasting").


1-3: Labels are unique — no duplicates found

Repo-wide search for Markdown "(ml)=", "(ml-tools)=", "(machine-learning)=" and RST ".. _label:" found only docs/solution/machine-learning/index.md (lines 1–3).
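A uniqueness check of this kind could be sketched as follows, assuming labels use the MyST `(label)=` form on a line of their own. The function name `duplicate_labels`, the in-memory `files` mapping, and the second file path are hypothetical, and RST-style `.. _label:` targets are not covered.

```python
import re
from collections import defaultdict

# A MyST target: "(label)=" alone on a line.
TARGET = re.compile(r"^\(([^)\s]+)\)=\s*$")


def duplicate_labels(files):
    """Map each label defined more than once to its (path, line number) locations.

    files maps a path to the Markdown text of that file.
    """
    seen = defaultdict(list)
    for path, text in files.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            match = TARGET.match(line)
            if match:
                seen[match.group(1)].append((path, line_no))
    return {label: places for label, places in seen.items() if len(places) > 1}


files = {
    "docs/solution/machine-learning/index.md": "(ml)=\n(ml-tools)=\n(machine-learning)=\n# Heading\n",
    # Hypothetical second file that would introduce a collision.
    "docs/example/other.md": "(ml)=\n# Other\n",
}
print(duplicate_labels(files))
```

With only the first file present, the result is an empty mapping, which corresponds to the "no duplicates found" outcome reported above.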

amotl merged commit 88e8d0b into main on Sep 16, 2025
1 of 3 checks passed
amotl deleted the ml-ng branch on September 16, 2025 12:58