
DataScienceCourse

This notebook shows how to implement k-means clustering in Spark.

This example requires a Spark installation at the location $SPARK_HOME and uses scikit-learn (sklearn) to create a random dataset. Start PySpark with the notebook as follows:

PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook $SPARK_HOME/bin/pyspark