Azure Machine Learning designer (preview) gives you a cloud-based, interactive visual workspace for quickly preparing data and for training and deploying machine learning models. It runs on Azure Machine Learning compute, either GPU or CPU. The designer also supports publishing models as web services on Azure Kubernetes Service, where other applications can easily consume them.
In this lab, we will use the Weather Dataset, which contains weather data for 66 airports in the USA from April to October 2013. We will cluster the dataset into 5 distinct clusters based on key weather metrics such as visibility, temperature, dew point, and wind speed. The goal is to group airports with similar weather conditions. We will do all of this in Azure Machine Learning designer without writing a single line of code.
-
In the Azure portal, open the available machine learning workspace.
-
Select Launch now under the Try the new Azure Machine Learning studio message.
-
When you first launch the studio, you may need to set the directory and subscription. If so, the studio prompts you to choose them.
For the directory, select Udacity and for the subscription, select Azure Sponsorship. For the machine learning workspace, you may see multiple options listed. Select any of these (it doesn't matter which) and then click Get started.
-
From the studio, select Designer, then +. This opens a visual pipeline authoring editor.
-
In the settings panel on the right, select Select compute target.
-
In the Set up compute target editor, select the available compute, and then select Save.
Note: If you are facing difficulties in accessing pop-up windows or buttons in the user interface, please refer to the Help section in the lab environment.
-
Select the Datasets section in the left navigation. Next, select Samples, Weather Dataset, and drag and drop the dataset onto the canvas.
-
Select the Modules, Data Transformation section in the left navigation, and then follow these steps:
-
Select the Select Columns in Dataset prebuilt module
-
Drag and drop the selected module onto the canvas
-
Connect the Weather Dataset module to the Select Columns in Dataset module
-
Select the Edit column link to open the Select columns editor
Note that you can submit the pipeline at any point to peek at the outputs and activities. Running the pipeline also generates metadata that downstream activities can use, such as selecting column names from a list in selection dialogs.
-
-
In the Select columns editor, follow these steps:
-
Include: Column indices
-
Provide column indices: 8, 10-17, 20, 26
-
Select Save
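The lab itself needs no code, but a short optional sketch can show how an index list such as 8, 10-17, 20, 26 expands. It assumes the designer treats column indices as 1-based and ranges as inclusive; under that assumption the list selects 11 columns. `parse_column_indices` and `select_columns` are hypothetical helpers, not Azure Machine Learning APIs.

```python
def parse_column_indices(spec: str) -> list[int]:
    """Expand a spec such as "8, 10-17, 20, 26" into sorted 1-based indices (ranges inclusive, assumed)."""
    indices: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            indices.update(range(start, end + 1))  # inclusive range
        else:
            indices.add(int(part))
    return sorted(indices)


def select_columns(header: list[str], spec: str) -> list[str]:
    """Return the column names picked by a 1-based index spec (hypothetical helper)."""
    return [header[i - 1] for i in parse_column_indices(spec)]


selected = parse_column_indices("8, 10-17, 20, 26")
print(selected)       # [8, 10, 11, 12, 13, 14, 15, 16, 17, 20, 26]
print(len(selected))  # 11
```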
-
-
Select the Data Transformation section in the left navigation, and then follow these steps:
-
Select the Split Data prebuilt module
-
Drag and drop the selected module onto the canvas
-
Connect the Select Columns in Dataset module to the Split Data module
-
Fraction of rows in the first output dataset: 0.1
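With a fraction of 0.1, Split Data sends about 10% of the rows to its first (left) output and the rest to its second output; only the left output is connected to the rest of this pipeline. The optional sketch below shows a randomized row split in Python. `split_rows` is a hypothetical helper, and the designer's own shuffling and rounding rules may differ from this one.

```python
import random


def split_rows(rows, fraction, seed=0):
    """Shuffle the rows, then send `fraction` of them to the first output and the rest to the second."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_first = int(round(len(shuffled) * fraction))
    return shuffled[:n_first], shuffled[n_first:]


first, second = split_rows(range(1000), fraction=0.1)
print(len(first), len(second))  # 100 900
```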
-
-
Select the Data Transformation section in the left navigation, and then follow these steps:
-
Select the Normalize Data prebuilt module
-
Drag and drop the selected module onto the canvas
-
Connect the left port of the Split Data module to the Normalize Data module
-
Select the Edit column link to open the Columns to transform editor
-
-
In the Columns to transform editor, follow these steps:
-
Include: All columns
-
Select Save
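Normalization matters because K-Means compares rows by distance: without it, a column with a large numeric range would dominate columns with small ranges. The optional sketch below assumes a z-score transformation (subtract each column's mean, divide by its standard deviation), which we believe is the Normalize Data module's default method; the use of the population standard deviation is also an assumption, and `zscore_columns` is a hypothetical helper.

```python
import statistics


def zscore_columns(rows):
    """Rescale every column to mean 0 and standard deviation 1 (z-score)."""
    stats = []
    for column in zip(*rows):
        mean = statistics.fmean(column)
        std = statistics.pstdev(column)  # population standard deviation (an assumption)
        stats.append((mean, std if std > 0 else 1.0))  # constant columns are centered but not scaled
    return [[(value - mean) / std for value, (mean, std) in zip(row, stats)] for row in rows]


rows = [[10.0, 0.5], [20.0, 1.5], [30.0, 1.0]]
for row in zscore_columns(rows):
    print([round(v, 3) for v in row])
# [-1.225, -1.225]
# [0.0, 1.225]
# [1.225, 0.0]
```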
-
-
Select the Machine Learning Algorithms section in the left navigation, and then follow these steps:
-
Select the K-Means Clustering prebuilt module
-
Drag and drop the selected module onto the canvas
-
Number of centroids: 5
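K-Means with 5 centroids partitions the rows into 5 clusters by alternating two steps: assign each row to its nearest centroid, then move each centroid to the mean of its assigned rows, stopping once the centroids no longer move. The optional Python sketch below is plain Lloyd's algorithm with Euclidean distance and random initialization. It illustrates the technique and is not the designer module's implementation, which has its own initialization and iteration settings; `kmeans` is a hypothetical helper.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm with Euclidean distance; returns the final centroids."""
    rng = random.Random(seed)
    # Initialization (an assumption for this sketch): pick k rows at random as starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(old)  # an empty cluster keeps its previous centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(sorted(kmeans(points, k=2)))  # [[0.0, 0.5], [10.0, 10.5]]
```

The lab sets 5 centroids; the usage line uses 2 only so the result is easy to check by hand.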
-
-
Select the Model Training section in the left navigation, and then follow these steps:
-
Select the Train Clustering Model prebuilt module
-
Drag and drop the selected module onto the canvas
-
Connect the K-Means Clustering module to the first input of the Train Clustering Model module
-
Connect the first output of the Normalize Data module to the second input of the Train Clustering Model module
-
Select the Edit column link to open the Column set editor
-
-
In the Column set editor, follow these steps:
-
Include: All columns
-
Select Save
-
-
Select the Model Scoring & Evaluation section in the left navigation, and then follow these steps:
-
Select the Assign Data to Clusters prebuilt module
-
Drag and drop the selected module onto the canvas
-
Connect the first output of the Train Clustering Model module to the first input of the Assign Data to Clusters module
-
Connect the first output of the Normalize Data module to the second input of the Assign Data to Clusters module
-
-
Select Submit to open the Setup pipeline run editor.
-
In the Setup pipeline run editor, select Experiment, Create new, and provide New experiment name: cluster-weather. Then select Submit.
-
Wait for the pipeline run to complete. The run takes around 10 minutes.
-
While you wait for the model training to complete, you can learn more about the K-Means Clustering algorithm used in this lab by selecting K-Means Clustering.
-
Select Assign Data to Clusters, Outputs + logs, Visualize to open the Assign Data to Clusters result visualization dialog.
-
Scroll to the right and select the Assignments column.
-
In the right-hand pane, scroll down to the Visualizations section.
-
The results show that each row (input) in the dataset is assigned to one of the 5 clusters: 0, 1, 2, 3, or 4. For each input, you can also see its distance from each of the centroids. Each input is assigned to the cluster whose centroid is closest to it. The bar graph shows the frequency distribution of all the inputs across the 5 clusters.
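The assignment rule can be sketched in a few lines of optional Python: compute each row's distance to every centroid, pick the closest one, and count how many rows land in each cluster to get the numbers behind the bar graph. The centroids and points here are made-up values, with three clusters instead of five to keep the example short, and `assign_to_clusters` is a hypothetical helper rather than the module's code.

```python
import math
from collections import Counter


def assign_to_clusters(points, centroids):
    """For each point, return (cluster id, distances to every centroid); the id is the closest centroid."""
    results = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        cluster = distances.index(min(distances))  # shortest distance wins
        results.append((cluster, distances))
    return results


centroids = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]  # hypothetical trained centroids
points = [[0.5, 0.2], [4.8, 5.1], [9.7, 0.3], [0.1, -0.4], [5.2, 4.9]]
assignments = assign_to_clusters(points, centroids)
counts = Counter(cluster for cluster, _ in assignments)
print([cluster for cluster, _ in assignments])  # [0, 1, 2, 0, 1]
print(sorted(counts.items()))  # [(0, 2), (1, 2), (2, 1)]
```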
Congratulations! You have trained and evaluated your first clustering algorithm. You can continue to experiment in the environment, or close the lab environment tab and return to the Udacity portal to continue with the lesson.