Updates to docs
brightsparc committed May 6, 2021
1 parent ded8eed commit 526bca1
Showing 11 changed files with 178 additions and 225 deletions.
221 changes: 72 additions & 149 deletions README.md

Large diffs are not rendered by default.

25 changes: 25 additions & 0 deletions docs/API_CONFIGURATION.md
@@ -0,0 +1,25 @@
# API and Testing Infrastructure Configuration

The API and Testing infrastructure stack reads configuration from context values in `cdk.json`. These values can also be overridden by passing arguments to the `cdk deploy` command, e.g.:

```
cdk deploy ab-testing-api -c stage_name=dev -c endpoint_prefix=sagemaker-ab-testing-pipeline
```

Following is a list of the context parameters and their defaults.

| Property | Description | Default |
|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|
| `api_name` | The API Gateway Name | "ab-testing" |
| `stage_name` | The stage namespace for resource and API Gateway path | "dev" |
| `endpoint_prefix` | A prefix to filter Amazon SageMaker endpoints the API can invoke. | "sagemaker-" |
| `api_lambda_memory`       | The [Lambda memory](https://docs.aws.amazon.com/lambda/latest/dg/configuration-memory.html) allocated for the API endpoint.                                       | 768                                |
| `api_lambda_timeout`      | The Lambda timeout for the API endpoint.                                                                                                                          | 10                                 |
| `metrics_lambda_memory`   | The [Lambda memory](https://docs.aws.amazon.com/lambda/latest/dg/configuration-memory.html) allocated for the metrics processing Lambda.                          | 768                                |
| `metrics_lambda_timeout`  | The Lambda timeout for the metrics processing Lambda.                                                                                                             | 10                                 |
| `dynamodb_read_capacity`  | The [read capacity](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html) for the DynamoDB tables.              | 5                                  |
| `dynamodb_write_capacity` | The [write capacity](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html) for the DynamoDB tables.             | 5                                  |
| `delivery_sync`           | When `true`, metrics are written directly to DynamoDB instead of to Amazon Kinesis Data Firehose for processing.                                                  | false                              |
| `firehose_interval`       | The [buffering](https://docs.aws.amazon.com/firehose/latest/dev/create-configure.html) interval in seconds after which Kinesis Data Firehose flushes events to S3.| 60                                 |
| `firehose_mb_size`        | The buffering size in MB before Kinesis Data Firehose flushes its events to S3.                                                                                   | 1                                  |
| `log_level`               | The logging level for the AWS Lambda functions.                                                                                                                   | "INFO"                             |
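As a hypothetical illustration of the precedence rule (CLI `-c` values override the `cdk.json` defaults), here is a minimal sketch; `resolve_context` and the subset of defaults shown are assumptions for illustration, not the stack's actual code:

```python
# Illustrative only: mimics how -c overrides take precedence over cdk.json defaults.
DEFAULTS = {
    "api_name": "ab-testing",
    "stage_name": "dev",
    "endpoint_prefix": "sagemaker-",
    "api_lambda_memory": 768,
    "delivery_sync": False,
}

def resolve_context(cli_args):
    """Merge -c overrides (a list of 'key=value' strings) over the defaults.

    Note: values supplied on the CLI arrive as strings; the real stack is
    responsible for any type conversion.
    """
    resolved = dict(DEFAULTS)
    for arg in cli_args:
        key, _, value = arg.partition("=")
        resolved[key] = value
    return resolved

config = resolve_context(["stage_name=prod", "endpoint_prefix=sagemaker-ab-testing-pipeline"])
```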
59 changes: 0 additions & 59 deletions docs/CONTRIBUTING.md

This file was deleted.

15 changes: 15 additions & 0 deletions docs/CUSTOM_TEMPLATE.md
@@ -0,0 +1,15 @@
# Customize the Deployment Pipeline

The [ab-testing-pipeline.yml](../ab-testing-pipeline.yml) is included as part of this distribution, and doesn't require updating unless you change the `infra/pipeline_stack.py` implementation.

To generate a new pipeline you can run the following command.

```
cdk synth ab-testing-pipeline --path-metadata=false > ab-testing-pipeline.yml
```

This template will output a new Policy to attach to the `AmazonSageMakerServiceCatalogProductsUseRole` service role. This policy is not required, as the managed role already has these permissions, and you will need to remove it for the template to run within Amazon SageMaker Studio. I recommend diffing against the original to see where changes need to be made. If additional roles or policies are present, the project might not validate when used inside Amazon SageMaker Studio.

```
git diff ab-testing-pipeline.yml
```
File renamed without changes.
4 changes: 2 additions & 2 deletions OPERATIONS.md → docs/OPERATIONS.md
@@ -49,7 +49,7 @@ In addition to the above, you must specify the `champion` and `challenger` model

These will be loaded from the two Model Package Groups in the registry that include the project name, suffixed with `champion` or `challenger`. For example, for the project name `ab-testing-pipeline`, the sample notebook shows these model package groups:

![\[Model Registry\]](docs/ab-testing-pipeline-model-registry.png)
![\[Model Registry\]](ab-testing-pipeline-model-registry.png)

**Latest Approved Versions**

@@ -238,4 +238,4 @@ The API Lambda functions are instrumented with [AWS X-Ray](https://aws.amazon.co
* Amazon SageMaker
* Kinesis Firehose

![\[AB Testing Pipeline X-Ray\]](docs/ab-testing-pipeline-xray.png)
![\[AB Testing Pipeline X-Ray\]](ab-testing-pipeline-xray.png)
42 changes: 42 additions & 0 deletions docs/SERVICE_CATALOG.md
@@ -0,0 +1,42 @@
# AWS Service Catalog Provisioning

If you have an existing AWS Service Catalog Portfolio, or would like to create the Product manually, follow these steps:

1. Sign in to the console with the data science account.
2. On the AWS Service Catalog console, under **Administration**, choose **Portfolios**.
3. Choose **Create a new portfolio**.
4. Name the portfolio `SageMaker Organization Templates`.
5. Download the [AB testing template](../ab-testing-pipeline.yml) to your computer.
6. Choose the new portfolio.
7. Choose **Upload a new product.**
8. For **Product name**, enter `A/B Testing Deployment Pipeline`.
9. For **Description**, enter `Amazon SageMaker Project for A/B Testing models`.
10. For **Owner**, enter your name.
11. Under **Version details**, for **Method**, choose **Use a template file**.
12. Choose **Upload a template**.
13. Upload the template you downloaded.
14. For **Version title**, enter `1.0`.

The remaining parameters are optional.

15. Choose **Review**.
16. Review your settings and choose **Create product**.
17. Choose **Refresh** to list the new product.
18. Choose the product you just created.
19. On the **Tags** tab, add the following tag to the product:
    - **Key**: `sagemaker:studio-visibility`
    - **Value**: `True`

Finally, we need to add a launch constraint and role permissions.

20. On the **Constraints** tab, choose **Create constraint**.
21. For **Product**, choose **AB Testing Pipeline** (the product you just created).
22. For **Constraint type**, choose **Launch**.
23. Under **Launch Constraint**, for **Method**, choose **Select IAM role**.
24. Choose **AmazonSageMakerServiceCatalogProductsLaunchRole**.
25. Choose **Create**.
26. On the **Groups, roles, and users** tab, choose **Add groups, roles, users**.
27. On the **Roles** tab, select the role you used when configuring your SageMaker Studio domain.
28. Choose **Add access**.

If you don't remember which role you selected, go to the SageMaker console in your data science account and choose **Amazon SageMaker Studio**. In the Studio **Summary** section, locate the **Execution role** attribute; use this role name in the previous step.
Binary file modified docs/ab-testing-pipeline-architecture.png
Binary file modified docs/ab-testing-pipeline-deployment.png
7 changes: 4 additions & 3 deletions lambda/api/algorithm.py
@@ -1,8 +1,9 @@
import random
import math

# Consider modifying these algorithms to be random operators that could be used with PlanOut
# see: https://facebook.github.io/planout/docs/random-operators.html
# Contains pure python class implementations for WeightedSampling, EpsilonGreedy, UCB1 and ThompsonSampling.
# For maths and theory behind these algorithms see the following resource:
# https://lilianweng.github.io/lil-log/2018/01/23/the-multi-armed-bandit-problem-and-its-solutions.html#ucb1


class AlgorithmBase:
@@ -13,7 +14,7 @@ class AlgorithmBase:
3. Thompson Sampling
"""

def __init__(self, variant_metrics):
def __init__(self, variant_metrics: list):
pass

@staticmethod
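The comment above lists the bandit algorithms implemented in `lambda/api/algorithm.py`. As a rough illustration of the idea (not the repository's actual implementation; the metric field names below are assumptions), an epsilon-greedy variant selection might look like:

```python
import random

def epsilon_greedy(variant_metrics, epsilon=0.1):
    """Pick a variant: explore uniformly with probability epsilon,
    otherwise exploit the variant with the best observed reward rate.

    variant_metrics: list of dicts with 'name', 'invocation' and 'reward'
    keys (these field names are illustrative, not the repo's schema).
    """
    if random.random() < epsilon:
        # Explore: choose any variant uniformly at random
        return random.choice(variant_metrics)["name"]

    # Exploit: choose the variant with the highest observed reward rate
    def rate(v):
        return v["reward"] / v["invocation"] if v["invocation"] else 0.0

    return max(variant_metrics, key=rate)["name"]
```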
30 changes: 18 additions & 12 deletions notebook/mab-reviews-helpfulness.ipynb
@@ -14,7 +14,7 @@
"\n",
"Prior to running this notebook, you will have:\n",
"1. Created the A/B Testing API and Infrastructure\n",
"2. Created an MLOps A/B Testing Deployment project \n"
"2. Created an MLOps A/B Testing Deployment project \n",
"\n",
"This notebook will take you through a number of steps:\n",
"1. [Data Prep](#Data-Prep)\n",
"2. [Run SageMaker Pipeline](#Run-SageMaker-Pipeline)\n",
"3. [Run Tuning Job](#Run-Tuning-Job)\n",
"4. [Test Endpoint](#Test-Endpoint)\n",
"5. [Run A/B Testing Simulation](#Run-A/B-Testing-Simulation)\n",
"6. [Calling the winner](#Calling-the-winner)"
]
},
{
@@ -24,7 +32,7 @@
"outputs": [],
"source": [
"%%capture\n",
"!pip install -U sagemaker seaborn\n",
"!pip install -U sagemaker pandas seaborn\n",
"!pip install spacy\n",
"!python -m spacy download en_core_web_sm"
]
@@ -64,7 +72,7 @@
"import pandas as pd\n",
"\n",
"# Load a sample of the rows\n",
"df_reviews = pd.read_csv('reviews.tsv.gz', compression='gzip', error_bad_lines=False, #nrows=100000,\n",
"df_reviews = pd.read_csv('reviews.tsv.gz', compression='gzip', error_bad_lines=False, #nrows=1000,\n",
" sep='\\t', usecols=['product_id', 'product_title',\n",
" 'review_headline', 'review_body', 'star_rating',\n",
" 'helpful_votes', 'total_votes']).dropna()\n",
@@ -633,9 +641,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Run Pipeline\n",
"### Start Pipeline\n",
"\n",
"And then, run it"
"And then, start the pipeline and wait for it to complete."
]
},
{
@@ -734,7 +742,7 @@
"pd.set_option('display.max_colwidth', 100) # Increase column width to show full component name\n",
"cols = ['TrialComponentName', 'SageMaker.InstanceType', \n",
" 'train:accuracy - Last', 'validation:accuracy - Last'] # return the last accuracy for training and validation\n",
"analytics_df[analytics_df.columns & cols].head(2)"
"analytics_df[cols].head(2)"
]
},
{
@@ -1104,7 +1112,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test endpoint\n",
"## Test Endpoint\n",
"\n",
"### Invoke endpoint\n",
"\n",
@@ -1336,7 +1344,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## A/B Simulation\n",
"## Run A/B Testing Simulation\n",
"\n",
"### Define API Methods\n",
"\n",
@@ -1638,7 +1646,7 @@
" v = vg.max()\n",
" alpha, beta = 1+v['reward'], 1+v['invocation']-v['reward']\n",
" axs[i].plot(x, stats.beta.pdf(x, alpha, beta), label=variant_name, color=colors[j])\n",
" axs[i].set_title(f'Trial #{int(b.right)}')\n",
" axs[i].set_title(f'Batch #{int(b.right)}')\n",
" axs[i].grid(False)\n",
"\n",
" handles, labels = axs[0].get_legend_handles_labels()\n",
@@ -1657,8 +1665,6 @@
"source": [
"## Calling the winner\n",
"\n",
"### Evaluate if statistically significant\n",
"\n",
"Assuming a normal distribution, let's evaluate a confidence score for the best performing variant."
]
},
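As context for the winner evaluation above, a confidence score under a normal approximation to the difference of two proportions can be sketched as follows. This is illustrative only: `confidence_best` and its signature are hypothetical, and the notebook's exact computation may differ.

```python
import math

def confidence_best(conv_a, n_a, conv_b, n_b):
    """Approximate confidence that variant B's conversion rate exceeds A's,
    using a normal approximation to the difference of two proportions.
    (Hypothetical sketch; not necessarily the notebook's computation.)"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Standard error of the difference in proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    if se == 0:
        return 0.5
    z = (p_b - p_a) / se
    # Standard normal CDF expressed via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))
```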
@@ -1824,4 +1830,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
}
