feat: convert anomaly demo to spark-connect #209
base: main
* make token a stack parameter
* use 8080
* start in /notebook
name: notebook
initContainers:
- name: download-notebook
  image: oci.stackable.tech/sdp/spark-connect-client:3.5.5-stackable0.0.0-dev
This might change depending on the outcome mentioned in the dependent PR:
Yes, let's hold off on merging this one.
docs/modules/demos/pages/jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc
NOTE: Using a custom image requires access to a repository where the image can be made available.
The Python notebook uses libraries such as `pandas` and `scikit-learn` to analyze the data.
In addition, since the model training is delegated to a Spark Connect server, some of these dependencies, most notably `scikit-learn`, must also be made available on the Spark Connect pods.
For convenience, a custom image is used in this demo that bundles all the required libraries for both the notebook and the Spark Connect server.
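To illustrate the requirement that the libraries exist in both places, here is a minimal sketch of a dependency check that could be run in the notebook before any training is attempted. The helper name `missing_packages` and the `REQUIRED` list are hypothetical and not part of this PR; the list only names the two libraries mentioned in the text above.

```python
from importlib.util import find_spec


def missing_packages(module_names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in module_names if find_spec(name) is None]


# Import names, not PyPI names: scikit-learn is imported as `sklearn`.
# This list is an assumption based on the libraries named in the docs text.
REQUIRED = ["pandas", "sklearn"]

if __name__ == "__main__":
    print("missing in this environment:", missing_packages(REQUIRED))
```

This only reports on the environment it runs in. Checking the Spark Connect pods would mean running the same function on the server side, for example inside a UDF, which is not shown here.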
optional: We could link to the Dockerfile (so that others can take the next steps for their use case).
Were the comments supposed to be in there?
#SCL = spark.sparkContext.broadcast(scaler)
#CLF = spark.sparkContext.broadcast(clf)
# No broadcast variables when using Spark Connect
# x_test = SCL.value.transform(x_test)
# prediction = CLF.value.predict(x_test)[0]
Yeah, I left them in as an explanation and reminder that Spark Connect only supports a subset of the Spark API.
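For readers wondering what replaces the broadcast variables: a minimal sketch, assuming the fitted scaler and model are small enough to travel with the UDF itself. `StubScaler`, `StubClassifier`, `Predictor`, and `as_spark_udf` are hypothetical names for illustration and are not taken from the notebook; the stubs stand in for fitted scikit-learn objects so the example runs without a cluster.

```python
class StubScaler:
    """Hypothetical stand-in for a fitted scikit-learn scaler."""

    def __init__(self, mean, scale):
        self.mean = mean
        self.scale = scale

    def transform(self, rows):
        return [[(v - self.mean) / self.scale for v in row] for row in rows]


class StubClassifier:
    """Hypothetical stand-in for a fitted outlier model (-1 anomaly, 1 normal)."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, rows):
        return [-1 if max(abs(v) for v in row) > self.threshold else 1
                for row in rows]


class Predictor:
    """Callable that carries the fitted objects instead of reading broadcasts."""

    def __init__(self, scaler, clf):
        self.scaler = scaler
        self.clf = clf

    def __call__(self, *features):
        x_test = self.scaler.transform([list(features)])
        return int(self.clf.predict(x_test)[0])


def as_spark_udf(predictor):
    """Wrap the predictor as a PySpark UDF; pyspark is imported lazily."""
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType
    return udf(predictor, IntegerType())


predict = Predictor(StubScaler(mean=10.0, scale=2.0),
                    StubClassifier(threshold=3.0))
```

Called locally, `predict(11.0, 9.0)` scales both values to within the threshold and returns `1`, while `predict(30.0, 10.0)` returns `-1`. With a real session, `as_spark_udf(predict)` would be applied to the feature columns; the scaler and model are then serialized together with the UDF, which is why `scikit-learn` has to be importable on the server side as well.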
image:
  # Using an image that includes scikit-learn (among other things)
  # because this package needs to be available on the executors.
  custom: oci.stackable.tech/sdp/spark-connect-client:3.5.5-stackable0.0.0-dev
This might change depending on the outcome mentioned in the dependent PR:
Updated according to stackabletech/docker-images#1072.
…tion-taxi-data.adoc Co-authored-by: Nick <[email protected]>
Part of stackabletech/spark-k8s-operator#284
Depends on stackabletech/docker-images#1071