Skip to content

Conversation

amotl
Copy link
Member

@amotl amotl commented Sep 15, 2025

About

Continue adding tutorials from the community forum.

Preview

References

/cc @seut

@amotl amotl added the refactoring Changing shape or layout, or moving content around. label Sep 15, 2025
Copy link

coderabbitai bot commented Sep 15, 2025

Walkthrough

Adds a TensorFlow integration overview and tutorial under docs/integrate/tensorflow, inserts the new entry into the integrate toctree, introduces a tensorflow-tutorial Sphinx anchor and several editorial/code-snippet adjustments in the tutorial, and replaces embedded TensorFlow content in the ML topic with a seealso reference to the new page.

Changes

Cohort / File(s) Summary
Integrations ToC update
docs/integrate/index.md
Inserts tensorflow/index into the integrations toctree after telegraf/index.
New TensorFlow integration page
docs/integrate/tensorflow/index.md
Adds an integration overview page with hero/logo block, About and Learn rubrics, a 2-column grid containing a grid-item-card linking to the tutorial (hidden toctree entry), and an external link to TensorFlow.org.
Tutorial content & anchor
docs/integrate/tensorflow/tutorial.rst
Adds Sphinx anchor .. _tensorflow-tutorial:, restructures headings (Introduction, Setup, Usage), renames/cases headings, changes code-block language for AWS creds to ini and path to ~/.aws/credentials, minor code edits (append .ravel() to label binarize result, change test_split→train_percent and split calc, fix import alias tf), and other editorial refinements.
ML topic index refactor
docs/topic/ml/index.md
Removes inlined TensorFlow content and hidden toctree entry; replaces with a seealso reference pointing to the new TensorFlow integration page and updates related links.

Sequence Diagram(s)

(omitted — documentation additions and editorial/code-snippet adjustments; no runtime/control-flow modifications warranting a sequence diagram)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • hammerhead
  • kneth

Poem

I nibble at docs, hop through each line,
A TensorFlow burrow — tidy and fine.
Anchors and cards, a tutorial trail,
I polish the meadow and leave a soft trail. 🐇

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title accurately and concisely summarizes the primary change: refactoring the TensorFlow documentation into a dedicated page, which matches the added docs/integrate/tensorflow/index.md, toctree updates, and related tutorial files in the changeset. It is specific, on-topic, and clear for a teammate scanning history.
Description Check ✅ Passed The PR description is on-topic: it states the intent to add community tutorials, provides a preview link, and references the related issue GH-313, all of which directly relate to the documentation changes in this PR. Although brief, it meets the lenient requirement of being related to the changeset.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch tensorflow

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c58be5 and 74387ff.

📒 Files selected for processing (2)
  • docs/integrate/tensorflow/tutorial.rst (11 hunks)
  • docs/topic/ml/index.md (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/integrate/tensorflow/tutorial.rst
  • docs/topic/ml/index.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@amotl amotl marked this pull request as ready for review September 15, 2025 21:43
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (9)
docs/integrate/tensorflow/tutorial.rst (9)

192-199: Bug: wrong pandas import module (pandas.io.aql).

Use pandas.io.sql; current import will fail at runtime.

Apply:

-import pandas.io.aql as sqlio
+import pandas.io.sql as sqlio
 from psycopg2 import connect

306-310: Fix TensorFlow import alias and avoid private API.

  • Import TensorFlow as tf (you use tf.* below).
  • Import EarlyStopping from public tensorflow.keras.callbacks.
-import tensorflow as f
+import tensorflow as tf
 import matplotlib.pyplot as plt
-from tensorflow.python.keras.callbacks import EarlyStopping
+from tensorflow.keras.callbacks import EarlyStopping
 from tensorflow.keras import layers

318-321: Classification head should use sigmoid.

Final layer lacks activation; downstream uses 0.5 threshold -> binary classification.

-mlp_model.add(layers.Dense(1))
+mlp_model.add(layers.Dense(1, activation="sigmoid"))

325-328: Loss/metric mismatch for binary classification.

Use binary_crossentropy (not MSE) with accuracy.

-mlp_model.compile(optimizer="adam",
-                  loss=tf.keras.losses.MeanSquaredError(),
-                  metrics=["accuracy"])
+mlp_model.compile(optimizer="adam",
+                  loss="binary_crossentropy",
+                  metrics=["accuracy"])

459-467: Filename mismatch: add extensions when downloading.

Earlier you saved model as .h5 and scaler as .joblib; download uses names without extensions.

-# These name should be the same as you used for saving the model, including the file extension
-model_name = "some-model-name"
-scaler_name = "some-scaler-name"
+# Use the same names (including extensions) used when saving
+model_name = "some-model-name.h5"
+scaler_name = "some-scaler-name.joblib"

471-477: Repeat pandas import typo and missing numpy import.

Fix pandas import and add numpy for later array conversion.

-    import pandas.io.aql as sqlio
+    import pandas.io.sql as sqlio
     import tensorflow as tf
+    import numpy as np
     from joblib import load
     from psycopg2 import connect

497-503: Bug: referencing df_train in prediction phase.

Use the current dataset length to bound the loop.

-        if row + timestep < len(df_train.index):
+        if row + timestep <= len(norm_data_cols):
             data_window = norm_data_cols[row:row + timestep]
             data_vector = data_window.flatten()
             x_pred.append(data_vector)
             pred_saved_step = row + timestep
         else:
             break

505-508: Shape/typing errors when predicting.

Convert list to array; don’t reshape to a single giant row; flatten result properly.

-    predictions = model.predict(x_pred.reshape(1, -1))
-    predictions.flatten()
+    x_pred = np.asarray(x_pred, dtype=float)
+    predictions = model.predict(x_pred).flatten()

523-533: Use a timezone-aware datetime (replace epoch float).

Pass a timezone-aware datetime (UTC) instead of time.time(); consider batching to reduce DB round-trips.

File: docs/integrate/tensorflow/tutorial.rst (lines 523-533)

-    import time
+    from datetime import datetime, timezone
@@
-                cur.execute(query, (float(entry), time.time(), model_name))
+                cur.execute(
+                    query,
+                    (float(entry), datetime.now(timezone.utc), model_name),
+                )
🧹 Nitpick comments (5)
docs/integrate/tensorflow/tutorial.rst (4)

215-219: Label binarize returns 2D array; flatten to 1D.

Avoid assigning a 2D array into a single column.

-df["machine_status"] = label_binarize(df["machine_status"], classes=["BROKEN", "NORMAL"])
+df["machine_status"] = label_binarize(
+    df["machine_status"], classes=["BROKEN", "NORMAL"]
+).ravel()

223-227: Variable naming: test_split actually sets the train share.

Minor clarity tweak.

-test_split = 70
-df_size = len(df.index)
-split = int((df_size / 100) * test_split)
+train_percent = 70
+split = int((len(df.index) * train_percent) / 100)

99-156: Schema: store timestamps as TIMESTAMP, not TEXT.

Column "timestamp" should be TIMESTAMP [WITH TIME ZONE] for correct typing and functions.

-   "timestamp" TEXT,
+   "timestamp" TIMESTAMP WITH TIME ZONE,

61-84: AWS credentials path note.

You point to ~/.aws/config; typically credentials live in ~/.aws/credentials with profiles configured in ~/.aws/config. SDK will search both; consider mentioning ~/.aws/credentials for clarity.

docs/integrate/tensorflow/index.md (1)

4-8: Avoid hotlinking a hashed gstatic logo.

That path may change; prefer a vendored asset or an official stable logo URL.

I can add the SVG under docs/_assets and switch the src if you want.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 427d701 and addcdff.

📒 Files selected for processing (4)
  • docs/integrate/index.md (1 hunks)
  • docs/integrate/tensorflow/index.md (1 hunks)
  • docs/integrate/tensorflow/tutorial.rst (1 hunks)
  • docs/topic/ml/index.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs
🔇 Additional comments (6)
docs/integrate/tensorflow/tutorial.rst (1)

3-3: Anchor addition looks good.

New target tensorflow-tutorial matches the card link in the integration page.

docs/integrate/tensorflow/index.md (3)

1-3: Label and title LGTM.

Provides a stable {ref}tensorflow target used elsewhere.


20-27: Card-to-tutorial link resolves.

:link-type: ref to tensorflow-tutorial matches tutorial.rst anchor.


32-36: Hidden toctree entry is correct.

Ensures sidebar shows the tutorial under the TensorFlow section.

docs/topic/ml/index.md (1)

268-270: Good refactor — approve replacing inline section with cross‑ref.
Cross‑ref target (tensorflow)= exists in docs/integrate/tensorflow/index.md (1 match); no broken refs found.

docs/integrate/index.md (1)

71-72: Navigation entry added correctly — verified.
Single exact match found at docs/integrate/index.md:71 (tensorflow/index).

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
docs/integrate/tensorflow/tutorial.rst (8)

193-202: Fix pandas import: pandas.io.aql is invalid.

Use pandas.io.sql or top‑level pandas.read_sql_query.

Apply this diff:

-.. code-block:: python
-
-    import pandas.io.aql as sqlio
-    from psycopg2 import connect
+.. code-block:: python
+
+    import pandas.io.sql as sqlio
+    from psycopg2 import connect

308-317: TensorFlow alias mismatch and internal import.

  • You import tensorflow as f but use tf later → NameError.
  • Avoid tensorflow.python.* internals; import callbacks from the public API.

Apply this diff:

-    import tensorflow as f
+    import tensorflow as tf
     import matplotlib.pyplot as plt
-    from tensorflow.python.keras.callbacks import EarlyStopping
-    from tensorflow.keras import layers
+    from tensorflow.keras.callbacks import EarlyStopping
+    from tensorflow.keras import layers
@@
-    mlp_model = tf.keras.Sequential()
+    mlp_model = tf.keras.Sequential()

Also applies to: 324-336


318-330: Use sigmoid + binary cross‑entropy for binary classification.

A linear output with MSE is suboptimal for a 0/1 target. Prefer a sigmoid output and binary_crossentropy.

Apply this diff:

-    mlp_model.add(layers.Dense(1))
+    mlp_model.add(layers.Dense(1, activation="sigmoid"))
@@
-    mlp_model.compile(optimizer="adam",
-                      loss=tf.keras.losses.MeanSquaredError(),
-                      metrics=["accuracy"])
+    mlp_model.compile(
+        optimizer="adam",
+        loss="binary_crossentropy",
+        metrics=["accuracy"]
+    )

421-431: Filename/extension mismatch between save and load.

You save model_name = "some-model-name" + ".h5" and scaler_name = "some-scaler-name" + ".joblib", but later download without extensions. This will 404.

Apply this diff to keep names consistent and prefer the modern .keras format:

-    model_name = "some-model-name" + ".h5"
+    model_name = "some-model-name.keras"
@@
-    scaler_name = "some-scaler-name" + ".joblib"
+    scaler_name = "some-scaler-name.joblib"

And here:

-    model_name = "some-model-name"
-    scaler_name = "some-scaler-name"
+    model_name = "some-model-name.keras"
+    scaler_name = "some-scaler-name.joblib"

Also applies to: 461-469


473-484: Fix pandas import (again) and ensure conn_str is documented.

Repeat of the invalid pandas.io.aql. Also, either define conn_str or add a short note showing its format.

Apply this diff:

-    import pandas.io.aql as sqlio
+    import pandas.io.sql as sqlio
@@
-    with connect(conn_str) as conn:
+    with connect(conn_str) as conn:  # e.g., "postgresql://crate@localhost:5432/doc"

494-509: Inference window generation references df_train and mis-shapes input.

  • df_train is undefined in this script.
  • Build x_pred as a 2‑D array of shape (num_windows, input_len).
  • Ensure timestep is defined in this file.

Apply this diff:

-    # Create the x_pred data set, containing the data of the specified time window as a vector
-    x_pred = []
-    pred_saved_step = 0
-    for i in norm_data_cols.index:
-        row = pred_saved_step
-        if row + timestep < len(df_train.index):
-            data_window = norm_data_cols[row:row + timestep]
-            data_vector = data_window.flatten()
-            x_pred.append(data_vector)
-            pred_saved_step = row + timestep
-        else:
-            break
+    # Define timestep if not already defined in this script
+    timestep = 60
+    # Build prediction windows
+    x_pred = []
+    pred_saved_step = 0
+    n = len(norm_data_cols.index)
+    while pred_saved_step + timestep <= n:
+        row = pred_saved_step
+        data_window = norm_data_cols[row:row + timestep]
+        data_vector = data_window.flatten()
+        x_pred.append(data_vector)
+        pred_saved_step = row + timestep
+    x_pred = np.asarray(x_pred)

507-510: Flatten not assigned; prediction on wrong shape.

Call model.predict on a 2‑D array and assign the flattened result.

Apply this diff:

-    predictions = model.predict(x_pred.reshape(1, -1))
-    predictions.flatten()
+    predictions = model.predict(x_pred)
+    predictions = predictions.flatten()

528-535: Insert proper timestamp type instead of time.time().

time.time() returns seconds float; use an actual UTC timestamp for TIMESTAMP WITH TIME ZONE.

Apply this diff:

-    import time
+    from datetime import datetime, timezone
@@
-                cur.execute(query, (float(entry), time.time(), model_name))
+                cur.execute(query, (float(entry), datetime.now(timezone.utc), model_name))
🧹 Nitpick comments (2)
docs/topic/ml/index.md (1)

280-284: Remove or use unused link reference definitions (MD053).

These references are defined but not used in this file. Either link to them in the content or remove to satisfy markdownlint.

Apply this diff if you choose to remove:

-[AutoML with PyCaret and CrateDB]: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/pycaret
-[automl-classify-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/pycaret/automl_classification_with_pycaret.py
-[automl-classify-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/pycaret/automl_classification_with_pycaret.ipynb
-[automl-forecasting-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/pycaret/automl_timeseries_forecasting_with_pycaret.ipynb
-[automl-forecasting-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/pycaret/automl_timeseries_forecasting_with_pycaret.ipynb
docs/integrate/tensorflow/tutorial.rst (1)

433-436: Credentials path comment inconsistent with prerequisites.

Earlier you instruct to use ~/.aws/credentials, but this comment says ~/.aws/config. Align the comment.

Apply this diff:

-    # will be read from the ~/.aws/config file and then the files will be
+    # will be read from the ~/.aws/credentials file (via the default boto3 chain) and then the files will be
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between addcdff and 55f8f58.

📒 Files selected for processing (4)
  • docs/integrate/index.md (1 hunks)
  • docs/integrate/tensorflow/index.md (1 hunks)
  • docs/integrate/tensorflow/tutorial.rst (9 hunks)
  • docs/topic/ml/index.md (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/integrate/index.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/topic/ml/index.md

280-280: Link and image reference definitions should be needed
Unused link or image reference definition: "automl with pycaret and cratedb"

(MD053, link-image-reference-definitions)


281-281: Link and image reference definitions should be needed
Unused link or image reference definition: "automl-classify-github"

(MD053, link-image-reference-definitions)


282-282: Link and image reference definitions should be needed
Unused link or image reference definition: "automl-classify-colab"

(MD053, link-image-reference-definitions)


283-283: Link and image reference definitions should be needed
Unused link or image reference definition: "automl-forecasting-github"

(MD053, link-image-reference-definitions)


284-284: Link and image reference definitions should be needed
Unused link or image reference definition: "automl-forecasting-colab"

(MD053, link-image-reference-definitions)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs
🔇 Additional comments (6)
docs/integrate/tensorflow/index.md (3)

4-8: External logo embed: confirm license and stability.

You're hotlinking the TensorFlow lockup SVG from gstatic. Please confirm usage is permitted and that the asset URL is stable for long‑term docs. Consider vendoring the SVG or using an officially documented CDN path.


32-36: Toctree looks good.

Hidden toctree entry correctly points to tutorial in the same folder. No issues.


22-27: Cross‑ref target present — run Sphinx build to confirm no warnings.
Anchor '.. _tensorflow-tutorial:' located at docs/integrate/tensorflow/tutorial.rst:3; link at docs/integrate/tensorflow/index.md:23.

docs/topic/ml/index.md (1)

204-206: Good deferral to dedicated page.

Replacing inline content with a seealso to {ref} keeps this topic concise and reduces duplication.

docs/integrate/tensorflow/tutorial.rst (2)

216-222: Label binarization update LGTM.

.ravel() is appropriate to get a 1‑D target array.


240-276: Sliding window generation skips most rows; confirm intent.

You advance by timestep each iteration (saved_step = row + timestep), which creates non‑overlapping windows. If you intended overlapping windows, increment by 1 instead. Otherwise, leave as is.

Also applies to: 353-368

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
docs/topic/ml/index.md (1)

204-206: Good refactor to a dedicated TensorFlow page.

Switching to a seealso keeps this page focused and reduces duplication. One minor copy edit for concision.

Apply this diff:

-:::{seealso}
-Please navigate to the dedicated page about {ref}`tensorflow`.
-:::
+:::{seealso}
+See the dedicated page: {ref}`tensorflow`.
+:::
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55f8f58 and 0c58be5.

📒 Files selected for processing (2)
  • docs/integrate/tensorflow/tutorial.rst (10 hunks)
  • docs/topic/ml/index.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/integrate/tensorflow/tutorial.rst
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs
🔇 Additional comments (1)
docs/topic/ml/index.md (1)

204-206: Resolved — 'tensorflow' cross-reference target exists.
Label (tensorflow)= is defined at docs/integrate/tensorflow/index.md:1; the {ref}tensorflow in docs/topic/ml/index.md resolves.

@amotl amotl merged commit ee1d878 into main Sep 16, 2025
3 checks passed
@amotl amotl deleted the tensorflow branch September 16, 2025 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

refactoring Changing shape or layout, or moving content around.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant