Updates to docs
brightsparc committed May 6, 2021
1 parent ded8eed commit 526bca1
Showing 11 changed files with 178 additions and 225 deletions.
221 changes: 72 additions & 149 deletions README.md

Large diffs are not rendered by default.

25 changes: 25 additions & 0 deletions docs/API_CONFIGURATION.md
@@ -0,0 +1,25 @@
# API and Testing Infrastructure Configuration

The API and Testing infrastructure stack reads configuration from context values in `cdk.json`. These values can also be overridden by passing arguments to the `cdk deploy` command, e.g.:

```
cdk deploy ab-testing-api -c stage_name=dev -c endpoint_prefix=sagemaker-ab-testing-pipeline
```

Following is a list of the context parameters and their defaults.

| Property | Description | Default |
|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|
| `api_name` | The API Gateway Name | "ab-testing" |
| `stage_name` | The stage namespace for resource and API Gateway path | "dev" |
| `endpoint_prefix` | A prefix to filter Amazon SageMaker endpoints the API can invoke. | "sagemaker-" |
| `api_lambda_memory`       | The [Lambda memory](https://docs.aws.amazon.com/lambda/latest/dg/configuration-memory.html) allocated for the API endpoint.                                       | 768                                |
| `api_lambda_timeout`      | The Lambda timeout for the API endpoint.                                                                                                                          | 10                                 |
| `metrics_lambda_memory`   | The [Lambda memory](https://docs.aws.amazon.com/lambda/latest/dg/configuration-memory.html) allocated for the metrics processing Lambda.                          | 768                                |
| `metrics_lambda_timeout`  | The Lambda timeout for the metrics processing Lambda.                                                                                                             | 10                                 |
| `dynamodb_read_capacity`  | The [read capacity](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html) for the DynamoDB tables.              | 5                                  |
| `dynamodb_write_capacity` | The [write capacity](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html) for the DynamoDB tables.             | 5                                  |
| `delivery_sync`           | When `true`, metrics are written directly to DynamoDB instead of to Amazon Kinesis Data Firehose for processing.                                                  | false                              |
| `firehose_interval`       | The [buffering](https://docs.aws.amazon.com/firehose/latest/dev/create-configure.html) interval in seconds after which Kinesis Data Firehose flushes events to S3.| 60                                 |
| `firehose_mb_size`        | The buffering size in MB before Kinesis Data Firehose flushes its events to S3.                                                                                   | 1                                  |
| `log_level`               | The logging level for the AWS Lambda functions.                                                                                                                   | "INFO"                             |
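As a hypothetical illustration of the precedence rule (CLI `-c` values override the `cdk.json` defaults), here is a minimal sketch; `resolve_context` and the subset of defaults shown are assumptions for illustration, not the stack's actual code:

```python
# Illustrative only: mimics how -c overrides take precedence over cdk.json defaults.
DEFAULTS = {
    "api_name": "ab-testing",
    "stage_name": "dev",
    "endpoint_prefix": "sagemaker-",
    "api_lambda_memory": 768,
    "delivery_sync": False,
}

def resolve_context(cli_args):
    """Merge -c overrides (a list of 'key=value' strings) over the defaults.

    Note: values supplied on the CLI arrive as strings; the real stack is
    responsible for any type conversion.
    """
    resolved = dict(DEFAULTS)
    for arg in cli_args:
        key, _, value = arg.partition("=")
        resolved[key] = value
    return resolved

config = resolve_context(["stage_name=prod", "endpoint_prefix=sagemaker-ab-testing-pipeline"])
```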
59 changes: 0 additions & 59 deletions docs/CONTRIBUTING.md

This file was deleted.

15 changes: 15 additions & 0 deletions docs/CUSTOM_TEMPLATE.md
@@ -0,0 +1,15 @@
# Customize the Deployment Pipeline

The [ab-testing-pipeline.yml](../ab-testing-pipeline.yml) is included as part of this distribution, and doesn't require updating unless you change the `infra/pipeline_stack.py` implementation.

To generate a new pipeline you can run the following command.

```
cdk synth ab-testing-pipeline --path-metadata=false > ab-testing-pipeline.yml
```

This template will output a new Policy to attach to the `AmazonSageMakerServiceCatalogProductsUseRole` service role. This policy is not required, as the managed role already has these permissions, and you will need to remove it for the template to run within Amazon SageMaker Studio. I recommend diffing against the original to see where changes need to be made. If additional roles or policies are present, the project might not validate when used inside Amazon SageMaker Studio.

```
git diff ab-testing-pipeline.yml
```
File renamed without changes.
4 changes: 2 additions & 2 deletions OPERATIONS.md → docs/OPERATIONS.md
@@ -49,7 +49,7 @@ In addition to the above, you must specify the `champion` and `challenger` model

These will be loaded from the two Model Package Groups in the registry that include the project name, suffixed with `champion` or `challenger`. For example, for the project name `ab-testing-pipeline`, the sample notebook shows these model package groups:

![\[Model Registry\]](docs/ab-testing-pipeline-model-registry.png)
![\[Model Registry\]](ab-testing-pipeline-model-registry.png)

**Latest Approved Versions**

@@ -238,4 +238,4 @@ The API Lambda functions are instrumented with [AWS X-Ray](https://aws.amazon.co
* Amazon SageMaker
* Kinesis Firehose

![\[AB Testing Pipeline X-Ray\]](docs/ab-testing-pipeline-xray.png)
![\[AB Testing Pipeline X-Ray\]](ab-testing-pipeline-xray.png)
42 changes: 42 additions & 0 deletions docs/SERVICE_CATALOG.md
@@ -0,0 +1,42 @@
# AWS Service Catalog Provisioning

If you have an existing AWS Service Catalog Portfolio, or would like to create the Product manually, follow these steps:

1. Sign in to the console with the data science account.
2. On the AWS Service Catalog console, under **Administration**, choose **Portfolios**.
3. Choose **Create a new portfolio**.
4. Name the portfolio `SageMaker Organization Templates`.
5. Download the [AB testing template](../ab-testing-pipeline.yml) to your computer.
6. Choose the new portfolio.
7. Choose **Upload a new product.**
8. For **Product name**, enter `A/B Testing Deployment Pipeline`.
9. For **Description**, enter `Amazon SageMaker Project for A/B Testing models`.
10. For **Owner**, enter your name.
11. Under **Version details**, for **Method**, choose **Use a template file**.
12. Choose **Upload a template**.
13. Upload the template you downloaded.
14. For **Version title**, enter `1.0`.

The remaining parameters are optional.

15. Choose **Review**.
16. Review your settings and choose **Create product**.
17. Choose **Refresh** to list the new product.
18. Choose the product you just created.
19. On the **Tags** tab, add the following tag to the product:
    - **Key**: `sagemaker:studio-visibility`
    - **Value**: `True`

Finally, we need to add a launch constraint and role permissions.

20. On the **Constraints** tab, choose **Create constraint**.
21. For **Product**, choose **AB Testing Pipeline** (the product you just created).
22. For **Constraint type**, choose **Launch**.
23. Under **Launch Constraint**, for **Method**, choose **Select IAM role**.
24. Choose **AmazonSageMakerServiceCatalogProductsLaunchRole**.
25. Choose **Create**.
26. On the **Groups, roles, and users** tab, choose **Add groups, roles, users**.
27. On the **Roles** tab, select the role you used when configuring your SageMaker Studio domain.
28. Choose **Add access**.

If you don't remember which role you selected, go to the SageMaker console in your data science account and choose **Amazon SageMaker Studio**. In the Studio **Summary** section, locate the **Execution role** attribute; use this role name in the previous step.
Binary file modified docs/ab-testing-pipeline-architecture.png
Binary file modified docs/ab-testing-pipeline-deployment.png
7 changes: 4 additions & 3 deletions lambda/api/algorithm.py
@@ -1,8 +1,9 @@
import random
import math

# Consider modifying these algorithms to be random operators that could be used with PlanOut
# see: https://facebook.github.io/planout/docs/random-operators.html
# Contains pure python class implementations for WeightedSampling, EpsilonGreedy, UCB1 and ThompsonSampling.
# For maths and theory behind these algorithms see the following resource:
# https://lilianweng.github.io/lil-log/2018/01/23/the-multi-armed-bandit-problem-and-its-solutions.html#ucb1


class AlgorithmBase:
@@ -13,7 +14,7 @@ class AlgorithmBase:
3. Thompson Sampling
"""

def __init__(self, variant_metrics):
def __init__(self, variant_metrics: list):
pass

@staticmethod
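The comment above lists the bandit algorithms implemented in `lambda/api/algorithm.py`. As a rough illustration of the idea (not the repository's actual implementation; the metric field names below are assumptions), an epsilon-greedy variant selection might look like:

```python
import random

def epsilon_greedy(variant_metrics, epsilon=0.1):
    """Pick a variant: explore uniformly with probability epsilon,
    otherwise exploit the variant with the best observed reward rate.

    variant_metrics: list of dicts with 'name', 'invocation' and 'reward'
    keys (these field names are illustrative, not the repo's schema).
    """
    if random.random() < epsilon:
        # Explore: choose any variant uniformly at random
        return random.choice(variant_metrics)["name"]

    # Exploit: choose the variant with the highest observed reward rate
    def rate(v):
        return v["reward"] / v["invocation"] if v["invocation"] else 0.0

    return max(variant_metrics, key=rate)["name"]
```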
30 changes: 18 additions & 12 deletions notebook/mab-reviews-helpfulness.ipynb
@@ -14,7 +14,7 @@
"\n",
"Prior to running this notebook, you will have:\n",
"1. Created the A/B Testing API and Infrastructure\n",
"2. Created an MLOps A/B Testing Deployment project \n"
"2. Created an MLOps A/B Testing Deployment project \n",
"\n",
"This notebook will take you through a number of steps:\n",
"1. [Data Prep](#Data-Prep)\n",
"2. [Run SageMaker Pipeline](#Run-SageMaker-Pipeline)\n",
"3. [Run Tuning Job](#Run-Tuning-Job)\n",
"4. [Test Endpoint](#Test-Endpoint)\n",
"5. [Run A/B Testing Simulation](#Run-A/B-Testing-Simulation)\n",
"6. [Calling the winner](#Calling-the-winner)"
]
},
{
@@ -24,7 +32,7 @@
"outputs": [],
"source": [
"%%capture\n",
"!pip install -U sagemaker seaborn\n",
"!pip install -U sagemaker pandas seaborn\n",
"!pip install spacy\n",
"!python -m spacy download en_core_web_sm"
]
@@ -64,7 +72,7 @@
"import pandas as pd\n",
"\n",
"# Load a sample of the rows\n",
"df_reviews = pd.read_csv('reviews.tsv.gz', compression='gzip', error_bad_lines=False, #nrows=100000,\n",
"df_reviews = pd.read_csv('reviews.tsv.gz', compression='gzip', error_bad_lines=False, #nrows=1000,\n",
" sep='\\t', usecols=['product_id', 'product_title',\n",
" 'review_headline', 'review_body', 'star_rating',\n",
" 'helpful_votes', 'total_votes']).dropna()\n",
@@ -633,9 +641,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Run Pipeline\n",
"### Start Pipeline\n",
"\n",
"And then, run it"
"And then, start the pipeline and wait for it to complete."
]
},
{
@@ -734,7 +742,7 @@
"pd.set_option('display.max_colwidth', 100) # Increase column width to show full component name\n",
"cols = ['TrialComponentName', 'SageMaker.InstanceType', \n",
" 'train:accuracy - Last', 'validation:accuracy - Last'] # return the last accuracy for training and validation\n",
"analytics_df[analytics_df.columns & cols].head(2)"
"analytics_df[cols].head(2)"
]
},
{
@@ -1104,7 +1112,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test endpoint\n",
"## Test Endpoint\n",
"\n",
"### Invoke endpoint\n",
"\n",
@@ -1336,7 +1344,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## A/B Simulation\n",
"## Run A/B Testing Simulation\n",
"\n",
"### Define API Methods\n",
"\n",
@@ -1638,7 +1646,7 @@
" v = vg.max()\n",
" alpha, beta = 1+v['reward'], 1+v['invocation']-v['reward']\n",
" axs[i].plot(x, stats.beta.pdf(x, alpha, beta), label=variant_name, color=colors[j])\n",
" axs[i].set_title(f'Trial #{int(b.right)}')\n",
" axs[i].set_title(f'Batch #{int(b.right)}')\n",
" axs[i].grid(False)\n",
"\n",
" handles, labels = axs[0].get_legend_handles_labels()\n",
@@ -1657,8 +1665,6 @@
"source": [
"## Calling the winner\n",
"\n",
"### Evaluate if statistically significant\n",
"\n",
"Assuming a normal distribution, let's evaluate a confidence score for the best performing variant."
]
},
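As context for the winner evaluation above, a confidence score under a normal approximation to the difference of two proportions can be sketched as follows. This is illustrative only: `confidence_best` and its signature are hypothetical, and the notebook's exact computation may differ.

```python
import math

def confidence_best(conv_a, n_a, conv_b, n_b):
    """Approximate confidence that variant B's conversion rate exceeds A's,
    using a normal approximation to the difference of two proportions.
    (Hypothetical sketch; not necessarily the notebook's computation.)"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Standard error of the difference in proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    if se == 0:
        return 0.5
    z = (p_b - p_a) / se
    # Standard normal CDF expressed via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))
```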
@@ -1824,4 +1830,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
}
