This repository contains the materials for the PySpark workshop at AMLD2019.
To install on Unix, see INSTALLATION_UNIX.md in the docs folder.
To install on Windows, see INSTALLATION_WINDOWS.md in the docs folder.
To use Google Colab, see GOOGLECOLAB_README.md in the docs folder.
For the data processing part, if you run PySpark on your laptop, start with the notebook data_processing_start.ipynb in the src folder.
For the data processing part, if you run PySpark on Google Colab, start with the notebook data_processing_gc_start.ipynb in the src folder.
For the Spark MLlib part, if you run PySpark on your laptop, start with the notebook spark_mllib_start.ipynb in the src folder.
For the Spark MLlib part, if you run PySpark on Google Colab, start with the notebook spark_mllib_gc_start.ipynb in the src folder.
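The choice of starting notebook can be summarized in a small lookup. The sketch below is illustrative only and is not part of the workshop materials: the function names `running_on_colab` and `starting_notebook` and the part keys `data_processing` and `spark_mllib` are hypothetical, and the Colab check assumes that the `google.colab` module is already loaded in a Colab kernel.

```python
import sys
from typing import Optional

# Notebook file names are the ones listed in this README.
# The dictionary keys are hypothetical labels chosen for this sketch.
NOTEBOOKS = {
    "data_processing": {
        "laptop": "data_processing_start.ipynb",
        "colab": "data_processing_gc_start.ipynb",
    },
    "spark_mllib": {
        "laptop": "spark_mllib_start.ipynb",
        "colab": "spark_mllib_gc_start.ipynb",
    },
}


def running_on_colab() -> bool:
    """Heuristic (assumption): a Colab kernel has google.colab loaded."""
    return "google.colab" in sys.modules


def starting_notebook(part: str, on_colab: Optional[bool] = None) -> str:
    """Return the path of the notebook to open first for a workshop part."""
    if part not in NOTEBOOKS:
        raise ValueError(f"unknown part: {part!r}")
    if on_colab is None:
        on_colab = running_on_colab()
    environment = "colab" if on_colab else "laptop"
    # All notebooks live in the src folder.
    return "src/" + NOTEBOOKS[part][environment]
```

For example, `starting_notebook("spark_mllib", on_colab=True)` returns the path of the Colab variant of the MLlib notebook.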
To run on AWS, see AWS_README.md in the docs folder.