
DataScienceCourse

This notebook shows how to implement k-means clustering in Spark.

This example requires a Spark installation at the location $SPARK_HOME and uses scikit-learn (sklearn) to create a random dataset. Start PySpark with the notebook as follows:

PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook $SPARK_HOME/bin/pyspark