-
Notifications
You must be signed in to change notification settings - Fork 1
TensorFlow: Refactor section to dedicated page #307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds a TensorFlow integration overview and tutorial under docs/integrate/tensorflow, inserts the new entry into the integrate toctree, introduces a tensorflow-tutorial Sphinx anchor and several editorial/code-snippet adjustments in the tutorial, and replaces embedded TensorFlow content in the ML topic with a seealso reference to the new page. Changes
Sequence Diagram(s)(omitted — documentation additions and editorial/code-snippet adjustments; no runtime/control-flow modifications warranting a sequence diagram) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (9)
docs/integrate/tensorflow/tutorial.rst (9)
192-199
: Bug: wrong pandas import module (pandas.io.aql
).Use pandas.io.sql; current import will fail at runtime.
Apply:
-import pandas.io.aql as sqlio +import pandas.io.sql as sqlio from psycopg2 import connect
306-310
: Fix TensorFlow import alias and avoid private API.
- Import TensorFlow as
tf
(you usetf.*
below).- Import EarlyStopping from public
tensorflow.keras.callbacks
.-import tensorflow as f +import tensorflow as tf import matplotlib.pyplot as plt -from tensorflow.python.keras.callbacks import EarlyStopping +from tensorflow.keras.callbacks import EarlyStopping from tensorflow.keras import layers
318-321
: Classification head should use sigmoid.Final layer lacks activation; downstream uses 0.5 threshold -> binary classification.
-mlp_model.add(layers.Dense(1)) +mlp_model.add(layers.Dense(1, activation="sigmoid"))
325-328
: Loss/metric mismatch for binary classification.Use binary_crossentropy (not MSE) with accuracy.
-mlp_model.compile(optimizer="adam", - loss=tf.keras.losses.MeanSquaredError(), - metrics=["accuracy"]) +mlp_model.compile(optimizer="adam", + loss="binary_crossentropy", + metrics=["accuracy"])
459-467
: Filename mismatch: add extensions when downloading.Earlier you saved model as .h5 and scaler as .joblib; download uses names without extensions.
-# These name should be the same as you used for saving the model, including the file extension -model_name = "some-model-name" -scaler_name = "some-scaler-name" +# Use the same names (including extensions) used when saving +model_name = "some-model-name.h5" +scaler_name = "some-scaler-name.joblib"
471-477
: Repeat pandas import typo and missing numpy import.Fix pandas import and add numpy for later array conversion.
- import pandas.io.aql as sqlio + import pandas.io.sql as sqlio import tensorflow as tf + import numpy as np from joblib import load from psycopg2 import connect
497-503
: Bug: referencing df_train in prediction phase.Use the current dataset length to bound the loop.
- if row + timestep < len(df_train.index): + if row + timestep <= len(norm_data_cols): data_window = norm_data_cols[row:row + timestep] data_vector = data_window.flatten() x_pred.append(data_vector) pred_saved_step = row + timestep else: break
505-508
: Shape/typing errors when predicting.Convert list to array; don’t reshape to a single giant row; flatten result properly.
- predictions = model.predict(x_pred.reshape(1, -1)) - predictions.flatten() + x_pred = np.asarray(x_pred, dtype=float) + predictions = model.predict(x_pred).flatten()
523-533
: Use a timezone-aware datetime (replace epoch float).Pass a timezone-aware datetime (UTC) instead of time.time(); consider batching to reduce DB round-trips.
File: docs/integrate/tensorflow/tutorial.rst (lines 523-533)
- import time + from datetime import datetime, timezone @@ - cur.execute(query, (float(entry), time.time(), model_name)) + cur.execute( + query, + (float(entry), datetime.now(timezone.utc), model_name), + )
🧹 Nitpick comments (5)
docs/integrate/tensorflow/tutorial.rst (4)
215-219
: Label binarize returns 2D array; flatten to 1D.Avoid assigning a 2D array into a single column.
-df["machine_status"] = label_binarize(df["machine_status"], classes=["BROKEN", "NORMAL"]) +df["machine_status"] = label_binarize( + df["machine_status"], classes=["BROKEN", "NORMAL"] +).ravel()
223-227
: Variable naming:test_split
actually sets the train share.Minor clarity tweak.
-test_split = 70 -df_size = len(df.index) -split = int((df_size / 100) * test_split) +train_percent = 70 +split = int((len(df.index) * train_percent) / 100)
99-156
: Schema: store timestamps as TIMESTAMP, not TEXT.Column "timestamp" should be TIMESTAMP [WITH TIME ZONE] for correct typing and functions.
- "timestamp" TEXT, + "timestamp" TIMESTAMP WITH TIME ZONE,
61-84
: AWS credentials path note.You point to ~/.aws/config; typically credentials live in ~/.aws/credentials with profiles configured in ~/.aws/config. SDK will search both; consider mentioning ~/.aws/credentials for clarity.
docs/integrate/tensorflow/index.md (1)
4-8
: Avoid hotlinking a hashed gstatic logo.That path may change; prefer a vendored asset or an official stable logo URL.
I can add the SVG under docs/_assets and switch the src if you want.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
docs/integrate/index.md
(1 hunks)docs/integrate/tensorflow/index.md
(1 hunks)docs/integrate/tensorflow/tutorial.rst
(1 hunks)docs/topic/ml/index.md
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build docs
🔇 Additional comments (6)
docs/integrate/tensorflow/tutorial.rst (1)
3-3
: Anchor addition looks good.New target
tensorflow-tutorial
matches the card link in the integration page.docs/integrate/tensorflow/index.md (3)
1-3
: Label and title LGTM.Provides a stable {ref}
tensorflow
target used elsewhere.
20-27
: Card-to-tutorial link resolves.:link-type: ref to
tensorflow-tutorial
matches tutorial.rst anchor.
32-36
: Hidden toctree entry is correct.Ensures sidebar shows the tutorial under the TensorFlow section.
docs/topic/ml/index.md (1)
268-270
: Good refactor — approve replacing inline section with cross‑ref.
Cross‑ref target (tensorflow)= exists in docs/integrate/tensorflow/index.md (1 match); no broken refs found.docs/integrate/index.md (1)
71-72
: Navigation entry added correctly — verified.
Single exact match found at docs/integrate/index.md:71 (tensorflow/index).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
docs/integrate/tensorflow/tutorial.rst (8)
193-202
: Fix pandas import:pandas.io.aql
is invalid.Use pandas.io.sql or top‑level
pandas.read_sql_query
.Apply this diff:
-.. code-block:: python - - import pandas.io.aql as sqlio - from psycopg2 import connect +.. code-block:: python + + import pandas.io.sql as sqlio + from psycopg2 import connect
308-317
: TensorFlow alias mismatch and internal import.
- You import
tensorflow as f
but usetf
later → NameError.- Avoid
tensorflow.python.*
internals; import callbacks from the public API.Apply this diff:
- import tensorflow as f + import tensorflow as tf import matplotlib.pyplot as plt - from tensorflow.python.keras.callbacks import EarlyStopping - from tensorflow.keras import layers + from tensorflow.keras.callbacks import EarlyStopping + from tensorflow.keras import layers @@ - mlp_model = tf.keras.Sequential() + mlp_model = tf.keras.Sequential()Also applies to: 324-336
318-330
: Use sigmoid + binary cross‑entropy for binary classification.A linear output with MSE is suboptimal for a 0/1 target. Prefer a sigmoid output and
binary_crossentropy
.Apply this diff:
- mlp_model.add(layers.Dense(1)) + mlp_model.add(layers.Dense(1, activation="sigmoid")) @@ - mlp_model.compile(optimizer="adam", - loss=tf.keras.losses.MeanSquaredError(), - metrics=["accuracy"]) + mlp_model.compile( + optimizer="adam", + loss="binary_crossentropy", + metrics=["accuracy"] + )
421-431
: Filename/extension mismatch between save and load.You save
model_name = "some-model-name" + ".h5"
andscaler_name = "some-scaler-name" + ".joblib"
, but later download without extensions. This will 404.Apply this diff to keep names consistent and prefer the modern
.keras
format:- model_name = "some-model-name" + ".h5" + model_name = "some-model-name.keras" @@ - scaler_name = "some-scaler-name" + ".joblib" + scaler_name = "some-scaler-name.joblib"And here:
- model_name = "some-model-name" - scaler_name = "some-scaler-name" + model_name = "some-model-name.keras" + scaler_name = "some-scaler-name.joblib"Also applies to: 461-469
473-484
: Fix pandas import (again) and ensureconn_str
is documented.Repeat of the invalid
pandas.io.aql
. Also, either defineconn_str
or add a short note showing its format.Apply this diff:
- import pandas.io.aql as sqlio + import pandas.io.sql as sqlio @@ - with connect(conn_str) as conn: + with connect(conn_str) as conn: # e.g., "postgresql://crate@localhost:5432/doc"
494-509
: Inference window generation referencesdf_train
and mis-shapes input.
df_train
is undefined in this script.- Build
x_pred
as a 2‑D array of shape (num_windows, input_len).- Ensure
timestep
is defined in this file.Apply this diff:
- # Create the x_pred data set, containing the data of the specified time window as a vector - x_pred = [] - pred_saved_step = 0 - for i in norm_data_cols.index: - row = pred_saved_step - if row + timestep < len(df_train.index): - data_window = norm_data_cols[row:row + timestep] - data_vector = data_window.flatten() - x_pred.append(data_vector) - pred_saved_step = row + timestep - else: - break + # Define timestep if not already defined in this script + timestep = 60 + # Build prediction windows + x_pred = [] + pred_saved_step = 0 + n = len(norm_data_cols.index) + while pred_saved_step + timestep <= n: + row = pred_saved_step + data_window = norm_data_cols[row:row + timestep] + data_vector = data_window.flatten() + x_pred.append(data_vector) + pred_saved_step = row + timestep + x_pred = np.asarray(x_pred)
507-510
: Flatten not assigned; prediction on wrong shape.Call
model.predict
on a 2‑D array and assign the flattened result.Apply this diff:
- predictions = model.predict(x_pred.reshape(1, -1)) - predictions.flatten() + predictions = model.predict(x_pred) + predictions = predictions.flatten()
528-535
: Insert proper timestamp type instead oftime.time()
.
time.time()
returns seconds float; use an actual UTC timestamp forTIMESTAMP WITH TIME ZONE
.Apply this diff:
- import time + from datetime import datetime, timezone @@ - cur.execute(query, (float(entry), time.time(), model_name)) + cur.execute(query, (float(entry), datetime.now(timezone.utc), model_name))
🧹 Nitpick comments (2)
docs/topic/ml/index.md (1)
280-284
: Remove or use unused link reference definitions (MD053).These references are defined but not used in this file. Either link to them in the content or remove to satisfy markdownlint.
Apply this diff if you choose to remove:
-[AutoML with PyCaret and CrateDB]: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/pycaret -[automl-classify-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/pycaret/automl_classification_with_pycaret.py -[automl-classify-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/pycaret/automl_classification_with_pycaret.ipynb -[automl-forecasting-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/pycaret/automl_timeseries_forecasting_with_pycaret.ipynb -[automl-forecasting-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/pycaret/automl_timeseries_forecasting_with_pycaret.ipynbdocs/integrate/tensorflow/tutorial.rst (1)
433-436
: Credentials path comment inconsistent with prerequisites.Earlier you instruct to use
~/.aws/credentials
, but this comment says~/.aws/config
. Align the comment.Apply this diff:
- # will be read from the ~/.aws/config file and then the files will be + # will be read from the ~/.aws/credentials file (via the default boto3 chain) and then the files will be
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
docs/integrate/index.md
(1 hunks)docs/integrate/tensorflow/index.md
(1 hunks)docs/integrate/tensorflow/tutorial.rst
(9 hunks)docs/topic/ml/index.md
(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/integrate/index.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/topic/ml/index.md
280-280: Link and image reference definitions should be needed
Unused link or image reference definition: "automl with pycaret and cratedb"
(MD053, link-image-reference-definitions)
281-281: Link and image reference definitions should be needed
Unused link or image reference definition: "automl-classify-github"
(MD053, link-image-reference-definitions)
282-282: Link and image reference definitions should be needed
Unused link or image reference definition: "automl-classify-colab"
(MD053, link-image-reference-definitions)
283-283: Link and image reference definitions should be needed
Unused link or image reference definition: "automl-forecasting-github"
(MD053, link-image-reference-definitions)
284-284: Link and image reference definitions should be needed
Unused link or image reference definition: "automl-forecasting-colab"
(MD053, link-image-reference-definitions)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build docs
🔇 Additional comments (6)
docs/integrate/tensorflow/index.md (3)
4-8
: External logo embed: confirm license and stability.You're hotlinking the TensorFlow lockup SVG from gstatic. Please confirm usage is permitted and that the asset URL is stable for long‑term docs. Consider vendoring the SVG or using an officially documented CDN path.
32-36
: Toctree looks good.Hidden toctree entry correctly points to
tutorial
in the same folder. No issues.
22-27
: Cross‑ref target present — run Sphinx build to confirm no warnings.
Anchor '.. _tensorflow-tutorial:' located at docs/integrate/tensorflow/tutorial.rst:3; link at docs/integrate/tensorflow/index.md:23.docs/topic/ml/index.md (1)
204-206
: Good deferral to dedicated page.Replacing inline content with a seealso to
{ref}
keeps this topic concise and reduces duplication.docs/integrate/tensorflow/tutorial.rst (2)
216-222
: Label binarization update LGTM.
.ravel()
is appropriate to get a 1‑D target array.
240-276
: Sliding window generation skips most rows; confirm intent.You advance by
timestep
each iteration (saved_step = row + timestep
), which creates non‑overlapping windows. If you intended overlapping windows, increment by 1 instead. Otherwise, leave as is.Also applies to: 353-368
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/topic/ml/index.md (1)
204-206
: Good refactor to a dedicated TensorFlow page.Switching to a seealso keeps this page focused and reduces duplication. One minor copy edit for concision.
Apply this diff:
-:::{seealso} -Please navigate to the dedicated page about {ref}`tensorflow`. -::: +:::{seealso} +See the dedicated page: {ref}`tensorflow`. +:::
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
docs/integrate/tensorflow/tutorial.rst
(10 hunks)docs/topic/ml/index.md
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/integrate/tensorflow/tutorial.rst
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build docs
🔇 Additional comments (1)
docs/topic/ml/index.md (1)
204-206
: Resolved — 'tensorflow' cross-reference target exists.
Label(tensorflow)=
is defined at docs/integrate/tensorflow/index.md:1; the {ref}tensorflow
in docs/topic/ml/index.md resolves.
About
Continue adding tutorials from the community forum.
Preview
References
/cc @seut