Machine learning: Rework and relocate section #314
Warning: Rate limit exceeded
@amotl has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 18 minutes and 16 seconds before requesting another review.
⌛ How to resolve this issue?
After the wait time has elapsed, a review can be triggered again. We recommend that you space out your commits to avoid hitting the rate limit.
🚦 How do rate limits work?
CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.
Walkthrough
Adds a Machine Learning solution page and solution grid entry; removes the old Machine Learning topic page and card; updates multiple Time Series docs (titles, intros, tags, and removed advanced sections); adds a Text-to-SQL subsection under Search; rewords vector search overview; and adds a new external link entry.
Sequence Diagram(s)
No sequence diagrams necessary for these documentation-only structural and content changes.
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
d8db9ff to 22c7390
Actionable comments posted: 0
🧹 Nitpick comments (7)
docs/feature/query/index.md (1)
169-171: Avoid an orphan subsection: add 1–2 context sentences.
Consider a one‑liner under “Text‑to‑SQL” so the section isn’t just a single link.
     ## Text-to-SQL
    -- {ref}`text-to-sql`
    +Natural language to SQL conversions using adapters and frameworks.
    +- {ref}`text-to-sql`
docs/feature/search/vector/index.md (2)
25-29: Consistent US spelling for kNN.
Mix of “neighbour”/“Neighbors” in this page. Standardize to “nearest neighbor (kNN)”.
    - neighbour (kNN)
    + neighbor (kNN)
    - functions, effectively conducting HNSW semantic similarity searches on them,
    + functions to perform HNSW-based semantic similarity search,
42-45: Minor wording polish.
Tighten the sentence and avoid “may be”.
    -Feature vectors may be computed from raw data using machine learning
    -methods such as feature extraction algorithms, word embeddings, or deep
    -learning networks.
    +Feature vectors are computed from raw data via ML methods such as feature
    +extraction, word embeddings, or deep neural networks.
docs/solution/machine-learning/index.md (4)
17-25: Fix markdownlint MD052: explicit reference label.
markdownlint often can’t resolve link definitions from includes. Use explicit reference to avoid “Missing link definition: vector database”.
    -[Vector databases][Vector Database] can be used for similarity search,
    +[Vector databases][Vector Database] can be used for similarity search,
If MD052 persists, switch to an inline link or duplicate the definition locally.
    - [Vector databases][Vector Database] can be used for similarity search,
    + [Vector databases](https://en.wikipedia.org/wiki/Vector_database) can be used for similarity search,
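To illustrate why the label fails to resolve when the page is linted on its own, here is a minimal Python sketch of a reference-label check. It is a simplified approximation, not markdownlint's implementation: the function name `undefined_labels` and both regular expressions are assumptions made for this example, and they cover only full and collapsed reference links.

```python
import re

# Matches reference links such as [text][label] or the collapsed form [text][].
REFERENCE_LINK = re.compile(r"\[([^\]]+)\]\[([^\]]*)\]")
# Matches link definitions such as "[label]: https://example.org".
DEFINITION = re.compile(r"^[ ]{0,3}\[([^\]]+)\]:[ \t]*\S+", re.MULTILINE)


def undefined_labels(markdown: str) -> list[str]:
    """Return the labels used by reference links that have no definition."""
    # Labels are compared case-insensitively, so normalize to lower case.
    defined = {label.strip().lower() for label in DEFINITION.findall(markdown)}
    missing = []
    for text, label in REFERENCE_LINK.findall(markdown):
        # A collapsed link like [text][] uses its text as the label.
        key = (label or text).strip().lower()
        if key not in defined and key not in missing:
            missing.append(key)
    return missing


# The page alone, as the linter sees it when the include is not expanded.
page = "[Vector databases][Vector Database] can be used for similarity search,\n"
# The definition as it might appear in the shared links include (assumed form).
links = "[Vector Database]: https://en.wikipedia.org/wiki/Vector_database\n"

print(undefined_labels(page))          # label reported as missing
print(undefined_labels(page + links))  # resolved once the definition is visible
```

The second call corresponds to duplicating the definition locally, which is one of the two fixes suggested above.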
33-38: kNN spelling consistency.
Use “neighbor” to match the rest of the docs.
    - k-nearest neighbour (kNN)
    + k-nearest neighbor (kNN)
51-61: LangChain adapter wording: minor tightening.
Shorten and remove “also”.
    -The LangChain adapter for CrateDB provides support to use CrateDB as a vector
    -store database, to load documents using LangChain’s DocumentLoader, and also
    -supports LangChain’s conversational memory subsystem.
    +The LangChain adapter lets you use CrateDB as a vector store database, load
    +documents via DocumentLoader, and use LangChain’s conversational memory.
162-170: MLflow storage statement: consider adding link to setup guide.
If there’s a page on configuring MLflow Tracking with CrateDB, link it here.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
docs/_include/card/timeseries-intro.md (1 hunks)
docs/_include/links.md (1 hunks)
docs/feature/query/index.md (1 hunks)
docs/feature/search/vector/index.md (2 hunks)
docs/solution/index.md (2 hunks)
docs/solution/machine-learning/index.md (1 hunks)
docs/solution/machine-learning/time-series.md (1 hunks)
docs/topic/index.md (0 hunks)
docs/topic/ml/index.md (0 hunks)
docs/topic/timeseries/fundamentals.md (2 hunks)
docs/topic/timeseries/index.md (0 hunks)
💤 Files with no reviewable changes (3)
- docs/topic/timeseries/index.md
- docs/topic/ml/index.md
- docs/topic/index.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/solution/machine-learning/index.md
17-17: Reference links and images should use a label that is defined
Missing link or image reference definition: "vector database"
(MD052, reference-links-images)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build docs
🔇 Additional comments (10)
docs/_include/links.md (1)
51-51: Add MLOps link: looks good.
No issues. Link label and URL format match the file’s conventions.
docs/solution/index.md (2)
72-83: New ML card: LGTM.
Copy, icon, and link role look consistent with the grid.
94-94: TOC entry: LGTM.
The hidden toctree entry will surface the page in nav; path looks correct.
docs/topic/timeseries/fundamentals.md (2)
5-12: Intro and tags: LGTM.
Tone and tag usage align with style used elsewhere.
30-32: Include exists; no action required.
Found docs/_include/card/timeseries-datashader.md; referenced in docs/topic/timeseries/fundamentals.md (line 30) and docs/integrate/pyviz/index.md (line 62).
docs/_include/card/timeseries-intro.md (1)
25-41: Card copy tweaks: LGTM.
Clear upgrade; MAX_BY note is helpful. No action needed.
docs/solution/machine-learning/time-series.md (2)
6-9: Intro block: LGTM.
Matches phrasing used in Fundamentals; consistent voice.
1-1: Anchor token (ml-timeseries) is unique; no collisions found.
Only occurrence: docs/solution/machine-learning/time-series.md:1; nearby anchors (timeseries-advanced) and (timeseries-analysis) appear on lines 2–3 and are distinct.
docs/solution/machine-learning/index.md (2)
66-72: Good: cross-links for Text‑to‑SQL.
Nice addition; clear positioning with MCP and enterprise data.
77-96: Resolved: referenced anchors exist (llamaindex, mcp, mindsdb).
Anchors found at docs/integrate/llamaindex/index.md, docs/connect/mcp/index.md, docs/integrate/mindsdb/index.md.
Actionable comments posted: 0
🧹 Nitpick comments (4)
docs/solution/machine-learning/index.md (4)
17-25: Fix missing reference link: replace [Vector Database] with a valid :ref: or plain text
markdownlint reports an undefined reference label. Prefer linking to the Vector Search doc via :ref:, or drop the link.
Apply one of:
    -[Vector Database] can be used for similarity search,
    +:ref:`Vector search <vector-search>` can be used for similarity search,
or
    -[Vector Database] can be used for similarity search,
    +Vector databases can be used for similarity search,
33-35: k-NN phrasing consistency
Use the standard plural and hyphenation.
    -CrateDB's FLOAT_VECTOR data type implements a vector store and the k-nearest
    -neighbor (kNN) search algorithm to find vectors that are similar to a query
    +CrateDB's FLOAT_VECTOR data type implements a vector store and the k‑nearest
    +neighbors (k‑NN) search algorithm to find vectors that are similar to a query
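For readers of the page, the idea behind "find vectors that are similar to a query" could be illustrated with a small Python sketch. This is an exhaustive (brute-force) scan using Euclidean distance, chosen only for clarity; it does not implement the HNSW-based search mentioned elsewhere in this review, and the names `knn` and `store` are hypothetical.

```python
import math


def knn(query: list[float], vectors: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Return the k stored vectors closest to `query` by Euclidean distance."""
    # Compute the distance from the query to every stored vector.
    distances = [
        (name, math.dist(query, vector))
        for name, vector in vectors.items()
    ]
    # Smallest distance first, then keep the top k.
    distances.sort(key=lambda item: item[1])
    return distances[:k]


# A toy in-memory "vector store" with two-dimensional vectors.
store = {
    "a": [0.0, 0.0],
    "b": [1.0, 1.0],
    "c": [5.0, 5.0],
}

nearest = knn([0.9, 1.2], store, k=2)
print([name for name, _ in nearest])
```

A full scan costs time proportional to the number of stored vectors per query, which is why production systems typically rely on an approximate index instead.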
86-88: Remove stray comma after the acronym
Grammar nit.
    -The Model Context Protocol (MCP), is an open protocol that enables seamless
    +The Model Context Protocol (MCP) is an open protocol that enables seamless
51-60: Minor wording: LangChain “Document Loaders”
LangChain generally refers to “Document Loaders” (plural). Adjust for accuracy.
    -The LangChain adapter lets you use CrateDB as a vector store database, load
    -documents via DocumentLoader, and use LangChain’s conversational memory.
    +The LangChain adapter lets you use CrateDB as a vector store database, load
    +documents via document loaders, and use LangChain’s conversational memory.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
docs/feature/query/index.md (1 hunks)
docs/feature/search/vector/index.md (2 hunks)
docs/solution/machine-learning/index.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/feature/query/index.md
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/feature/search/vector/index.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/solution/machine-learning/index.md
17-17: Reference links and images should use a label that is defined
Missing link or image reference definition: "vector database"
(MD052, reference-links-images)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build docs
🔇 Additional comments (3)
docs/solution/machine-learning/index.md (3)
128-136: Duplicate “Advanced time series analysis with MLflow and PyCaret” card
This card appears in both “Time series analysis” and “MLOps and model training.” If intentional cross-linking, ignore; otherwise dedupe to reduce repetition.
Also applies to: 180-189
194-198: Confirm toctree target exists: OK
docs/solution/machine-learning/time-series.md exists and includes a top-level heading ("# Advanced Time Series Analysis") and the expected section ("## Anomaly Detection and Forecasting").
1-3: Labels are unique; no duplicates found
Repo-wide search for Markdown "(ml)=", "(ml-tools)=", "(machine-learning)=" and RST ".. _label:" found only docs/solution/machine-learning/index.md (lines 1–3).
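A label-uniqueness check of this kind could be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name `duplicate_labels` is hypothetical, file contents are passed as in-memory strings rather than read from the repository, and only the Markdown "(label)=" form is handled, not the RST ".. _label:" form.

```python
import re
from collections import defaultdict

# A Markdown target label sits alone on a line, for example "(ml)=".
MYST_LABEL = re.compile(r"^\(([^)\s]+)\)=[ \t]*$", re.MULTILINE)


def duplicate_labels(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each label defined more than once to the files that define it."""
    seen = defaultdict(list)
    for path, text in files.items():
        for label in MYST_LABEL.findall(text):
            seen[label].append(path)
    # Keep only labels that occur in more than one place.
    return {label: paths for label, paths in seen.items() if len(paths) > 1}


# Stand-in contents built from the labels named in this review.
docs = {
    "docs/solution/machine-learning/index.md":
        "(ml)=\n(ml-tools)=\n(machine-learning)=\n",
    "docs/solution/machine-learning/time-series.md":
        "(ml-timeseries)=\n(timeseries-advanced)=\n(timeseries-analysis)=\n"
        "# Advanced Time Series Analysis\n",
}

print(duplicate_labels(docs))
```

An empty result matches the finding above; any label defined twice would be listed together with the paths that define it.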