Changes from all commits · 34 commits
ccd3dbe
Committing part of interpretability script
johnwu0604 Oct 27, 2019
39142b5
Updating MLOps readme
Oct 28, 2019
fd29c84
Updated interpretability notebook
Oct 28, 2019
3bd2ae7
Merge pull request #44 from microsoft/john-branch
johnwu0604 Oct 28, 2019
1d9c35d
Minor update
Oct 28, 2019
92c19bd
Merge pull request #45 from microsoft/maxluk3
Oct 28, 2019
b1a8698
Minor clean up in training
Oct 28, 2019
e1e5f9d
Merge pull request #46 from microsoft/maxluk3
Oct 28, 2019
3d62b85
Minor updates training
Oct 28, 2019
7d61f8c
Merge pull request #47 from microsoft/maxluk3
Oct 28, 2019
f122e1e
Training picture
maxluk Oct 28, 2019
dc70492
Merge branch 'maxluk3' of https://github.com/microsoft/bert-stack-ove…
maxluk Oct 28, 2019
f4ce091
Updating AKS Configuration
vaidya-s Oct 28, 2019
6c4444d
Add picture
Oct 28, 2019
d6b6336
Merge pull request #48 from microsoft/maxluk3
Oct 28, 2019
6f32748
Added updates to interpret
Oct 28, 2019
f3664c2
Merge pull request #49 from microsoft/john-branch
johnwu0604 Oct 28, 2019
d0ba6f5
Using SAS token instead of DS_KEY, adding QA instructions (#50)
Oct 28, 2019
23ff489
Finished interpretability workshop
Oct 29, 2019
7e9cd58
Merge branch 'master' of https://github.com/microsoft/bert-stack-over…
Oct 29, 2019
ebd56f9
Removing old file
Oct 29, 2019
0d2111e
Cleared cells
Oct 29, 2019
4edfc80
Updating the evaluation model step to avoid errors (#51)
Oct 29, 2019
b95ae67
Updated storage sas token
Nov 5, 2019
69016d7
Clearing kernel
Nov 5, 2019
677c69a
Merge pull request #52 from microsoft/john
johnwu0604 Nov 5, 2019
3d44dd1
Fixed dataprep version issue
Nov 5, 2019
9f6c865
Fixed dataprep version issue
Nov 5, 2019
e23ba7d
Merge pull request #53 from microsoft/john
johnwu0604 Nov 5, 2019
0fb848e
Updated SAS token for storage
AbeOmor Nov 8, 2019
c4738b7
Changed datastore to personal johndatasets storage account and update…
johnwu0604 Mar 9, 2020
c8fca22
Merge pull request #58 from microsoft/john
johnwu0604 Mar 17, 2020
94df6de
Fixed hyperlink to debugging instructions
johnwu0604 Oct 13, 2020
0cb87b3
Merge pull request #61 from microsoft/hyperlink-fix
johnwu0604 Oct 13, 2020
39 changes: 21 additions & 18 deletions 1-Training/AzureServiceClassifier_Training.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@
"- Introduction to Transformer and BERT deep learning models\n",
"- Introduction to Azure Machine Learning service\n",
"- Preparing raw data for training using Apache Spark\n",
"- Registering cleanup training data as a Dataset\n",
"- Registering cleaned up training data as a Dataset\n",
"- Debugging the model in Tensorflow 2.0 Eager Mode\n",
"- Training the model on GPU cluster\n",
"- Monitoring training progress with built-in Tensorboard dashboard \n",
Expand Down Expand Up @@ -152,7 +152,10 @@
"<img src=\"http://jalammar.github.io/images/bert-classifier.png\" alt=\"Drawing\" style=\"width: 700px;\"/>\n",
"\n",
"_Taken from [5](http://jalammar.github.io/illustrated-bert/)_\n",
" "
"\n",
"The end-to-end training process of the stackoverflow question tagging model looks like this:\n",
"\n",
"![](images/model-training-e2e.png)\n"
]
},
{
Expand Down Expand Up @@ -314,9 +317,9 @@
"from azureml.core import Datastore, Dataset\n",
"\n",
"datastore_name = 'tfworld'\n",
"container_name = 'azureml-blobstore-7c6bdd88-21fa-453a-9c80-16998f02935f'\n",
"account_name = 'tfworld6818510241'\n",
"sas_token = '?sv=2019-02-02&ss=bfqt&srt=sco&sp=rl&se=2019-11-08T05:12:15Z&st=2019-10-23T20:12:15Z&spr=https&sig=eDqnc51TkqiIklpQfloT5vcU70pgzDuKb5PAGTvCdx4%3D'\n",
"container_name = 'azure-service-classifier'\n",
"account_name = 'johndatasets'\n",
"sas_token = '?sv=2019-02-02&ss=bfqt&srt=sco&sp=rl&se=2021-06-02T03:40:25Z&st=2020-03-09T19:40:25Z&spr=https&sig=bUwK7AJUj2c%2Fr90Qf8O1sojF0w6wRFgL2c9zMVCWNPA%3D'\n",
"\n",
"datastore = Datastore.register_azure_blob_container(workspace=workspace, \n",
" datastore_name=datastore_name, \n",
Expand Down Expand Up @@ -404,7 +407,7 @@
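The hunk above swaps in a fresh SAS token for the `johndatasets` account (the previous token had expired in November 2019). As a quick sanity check, a token's validity window can be read off its `st` (start) and `se` (expiry) query fields. A minimal pure-Python sketch — `sas_window` is a hypothetical helper, and the token below only mirrors the notebook's format with a redacted signature:

```python
from urllib.parse import parse_qs
from datetime import datetime, timezone

def sas_window(sas_token: str):
    """Return the (start, expiry) datetimes encoded in an Azure SAS token."""
    fields = parse_qs(sas_token.lstrip('?'))
    fmt = '%Y-%m-%dT%H:%M:%SZ'
    start = datetime.strptime(fields['st'][0], fmt).replace(tzinfo=timezone.utc)
    expiry = datetime.strptime(fields['se'][0], fmt).replace(tzinfo=timezone.utc)
    return start, expiry

# Same shape as the notebook's token; signature redacted.
token = '?sv=2019-02-02&ss=bfqt&srt=sco&sp=rl&se=2021-06-02T03:40:25Z&st=2020-03-09T19:40:25Z&spr=https&sig=REDACTED'
start, expiry = sas_window(token)
print(expiry > start)  # True
```

Comparing `expiry` against `datetime.now(timezone.utc)` before registering the datastore avoids the silent authentication failures an expired token causes.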
"\n",
"In addition to UI we can register datasets using SDK. In this workshop we will register second type of Datasets using code - File Dataset. File Dataset allows specific folder in our datastore that contains our data files to be registered as a Dataset.\n",
"\n",
"There is a folder within our datastore called **azure-service-data** that contains all our training and testing data. We will register this as a dataset."
"There is a folder within our datastore called **data** that contains all our training and testing data. We will register this as a dataset."
]
},
{
Expand All @@ -415,7 +418,7 @@
},
"outputs": [],
"source": [
"azure_dataset = Dataset.File.from_files(path=(datastore, 'azure-service-classifier/data'))\n",
"azure_dataset = Dataset.File.from_files(path=(datastore, 'data'))\n",
"\n",
"azure_dataset = azure_dataset.register(workspace=workspace,\n",
" name='Azure Services Dataset',\n",
Expand Down Expand Up @@ -474,7 +477,7 @@
"metadata": {},
"outputs": [],
"source": [
"%%pip install transformers==2.0.0"
"%pip install transformers==2.0.0"
]
},
{
Expand Down Expand Up @@ -534,7 +537,7 @@
"\n",
"* **ACTION**: Install [Microsoft VS Code](https://code.visualstudio.com/) on your local machine.\n",
"\n",
"* **ACTION**: Follow this [configuration guide](https://github.com/danielsc/azureml-debug-training/blob/master/Setting%20up%20VSCode%20Remote%20on%20an%20AzureML%20Notebook%20VM.md) to setup VS Code Remote connection to Notebook VM.\n",
"* **ACTION**: Follow this [configuration guide](https://github.com/danielsc/azureml-debug-training/blob/master/Setting%20up%20VSCode%20Remote%20on%20an%20AzureML%20Compute%20Instance.md) to setup VS Code Remote connection to Notebook VM.\n",
"\n",
"#### Debug training code using step-by-step debugger\n",
"\n",
Expand Down Expand Up @@ -610,7 +613,7 @@
" },\n",
" framework_version='2.0',\n",
" use_gpu=True,\n",
" pip_packages=['transformers==2.0.0', 'azureml-dataprep[fuse,pandas]==1.1.22'])"
" pip_packages=['transformers==2.0.0', 'azureml-dataprep[fuse,pandas]==1.3.0'])"
]
},
{
Expand Down Expand Up @@ -757,7 +760,7 @@
" },\n",
" framework_version='2.0',\n",
" use_gpu=True,\n",
" pip_packages=['transformers==2.0.0', 'azureml-dataprep[fuse,pandas]==1.1.22'])\n",
" pip_packages=['transformers==2.0.0', 'azureml-dataprep[fuse,pandas]==1.3.0'])\n",
"\n",
"run2 = experiment.submit(estimator2)"
]
Expand Down Expand Up @@ -865,7 +868,7 @@
"run2.download_files(prefix='outputs/model')\n",
"\n",
"# If you haven't finished training the model then just download pre-made model from datastore\n",
"datastore.download('./',prefix=\"azure-service-classifier/model\")"
"datastore.download('./',prefix=\"model\")"
]
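The download cell above changes because the model folder moved from `azure-service-classifier/model` to `model` at the datastore root; `Datastore.download(prefix=...)` keeps only blobs whose path starts with the prefix. A toy illustration of that prefix semantics (the helper name and blob paths are illustrative, not from the notebook):

```python
def select_by_prefix(paths, prefix):
    """Mimic Datastore.download(prefix=...): keep blobs whose path starts with prefix."""
    return [p for p in paths if p.startswith(prefix)]

blobs = ['model/config.json', 'model/weights.h5', 'data/train.csv']
print(select_by_prefix(blobs, 'model'))  # ['model/config.json', 'model/weights.h5']
```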
},
{
Expand Down Expand Up @@ -906,7 +909,7 @@
" \n",
"labels = ['azure-web-app-service', 'azure-storage', 'azure-devops', 'azure-virtual-machine', 'azure-functions']\n",
"# Load model and tokenizer\n",
"loaded_model = TFBertForMultiClassification.from_pretrained('azure-service-classifier/model', num_labels=len(labels))\n",
"loaded_model = TFBertForMultiClassification.from_pretrained('model', num_labels=len(labels))\n",
"tokenizer = BertTokenizer.from_pretrained('bert-base-cased')\n",
"print(\"Model loaded from disk.\")"
]
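Once `TFBertForMultiClassification` returns a row of scores, picking the predicted tag is an argmax over the `labels` list defined above. A hypothetical scoring helper — the model call itself is omitted, and the score values are made up:

```python
labels = ['azure-web-app-service', 'azure-storage', 'azure-devops',
          'azure-virtual-machine', 'azure-functions']

def predict_label(scores):
    """Map one row of per-class scores to its tag via argmax."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]

print(predict_label([0.1, 0.2, 3.4, 0.05, 0.3]))  # azure-devops
```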
Expand Down Expand Up @@ -1023,7 +1026,7 @@
" node_count=1,\n",
" distributed_training=Mpi(process_count_per_node=2),\n",
" use_gpu=True,\n",
" pip_packages=['transformers==2.0.0', 'azureml-dataprep[fuse,pandas]==1.1.22'])\n",
" pip_packages=['transformers==2.0.0', 'azureml-dataprep[fuse,pandas]==1.3.0'])\n",
"\n",
"run3 = experiment.submit(estimator3)"
]
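The estimator above launches MPI-based distributed training with `Mpi(process_count_per_node=2)`; the effective number of workers is `node_count × process_count_per_node`. A one-line helper to make that arithmetic explicit (the function name is illustrative):

```python
def total_workers(node_count: int, process_count_per_node: int) -> int:
    """Total MPI worker processes launched for a distributed run."""
    return node_count * process_count_per_node

print(total_workers(1, 2))  # 2 workers, matching the run above
```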
Expand Down Expand Up @@ -1144,7 +1147,7 @@
" },\n",
" framework_version='2.0',\n",
" use_gpu=True,\n",
" pip_packages=['transformers==2.0.0', 'azureml-dataprep[fuse,pandas]==1.1.22'])"
" pip_packages=['transformers==2.0.0', 'azureml-dataprep[fuse,pandas]==1.3.0'])"
]
},
{
Expand Down Expand Up @@ -1262,9 +1265,9 @@
"metadata": {
"file_extension": ".py",
"kernelspec": {
"display_name": "Python 3.6 - AzureML",
"display_name": "Python 3",
"language": "python",
"name": "python3-azureml"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
Expand All @@ -1276,7 +1279,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
"version": "3.7.3"
},
"mimetype": "text/x-python",
"name": "python",
Expand Down
Binary file removed 1-Training/databricks/stackoverflow-data-prep.dbc
Binary file not shown.
42 changes: 0 additions & 42 deletions 1-Training/databricks/stackoverflow-data-prep.html

This file was deleted.

Binary file added 1-Training/images/model-training-e2e.png
Binary file removed 1-Training/spark/stackoverflow-data-prep.dbc
Binary file not shown.
42 changes: 0 additions & 42 deletions 1-Training/spark/stackoverflow-data-prep.html

This file was deleted.

2 changes: 1 addition & 1 deletion 1-Training/train_eager.py
Original file line number Diff line number Diff line change
Expand Up @@ -135,8 +135,8 @@ def main(_):
optimizer = tf.keras.optimizers.Adam(learning_rate=FLAGS.learning_rate, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy()
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
#model.compile(optimizer=optimizer, loss=loss, metrics=[metric])

# Train and evaluate model
for item, label in train_dataset:
with tf.GradientTape() as tape:
prediction, = model(item, training=True)
Expand Down
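The `train_eager.py` hunk above removes the commented-out `model.compile` call in favor of a manual eager loop with `tf.GradientTape`: forward pass, loss, gradients, parameter update. That loop shape can be sketched without TensorFlow using a tiny NumPy linear model with hand-derived gradients — a stand-in for `GradientTape`, not the notebook's actual code:

```python
import numpy as np

# Toy data: y = 3x + 1, no noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 3.0 * x + 1.0

w, b = 0.0, 0.0   # "model" parameters
lr = 0.1          # learning rate

for _ in range(200):
    pred = w * x + b
    err = pred - y
    # Gradients of mean squared error, written out by hand
    # (in the real script, GradientTape computes these automatically).
    grad_w = 2 * np.mean(err * x)
    grad_b = 2 * np.mean(err)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # 3.0 1.0
```

The real training step differs only in scale: `model(item, training=True)` replaces the linear predictor, and `optimizer.apply_gradients` replaces the manual updates.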