
Onboarding SageWorks to AWS

Brian Wylie edited this page Dec 29, 2023 · 46 revisions

Welcome to SageWorks

SageWorks pushes and pulls metadata to and from the services in your AWS Account (S3, Data Catalog, Feature Store, etc.).

Please join our Discord for questions and issues; we provide free support and setup.

Two main options when using SageWorks

  1. Spin up a new AWS Account for the SageWorks Stacks (Make a New Account)
  2. Deploy SageWorks Stacks into your existing AWS Account

Both options are fully supported, but we strongly recommend a NEW account because it gives you the following benefits:

  • AWS Data Isolation: Data Scientists will feel empowered to play in the sandbox without impacting production services.
  • AWS Cost Accounting: Monitor and Track all those new ML Pipelines that your team creates with SageWorks :)

Setting up Users and Groups

If your AWS Account already has users and groups set up, you can skip this step. Otherwise, here are our recommendations on setting up SSO Users and Groups.

Onboarding SageWorks to your AWS Account

Pulling down the SageWorks Repo

git clone https://github.com/SuperCowPowers/sageworks.git

SageWorks uses AWS Python CDK for Deployments into AWS

If you don't already have the AWS CDK installed, follow these steps:

Mac

brew install node 
npm install -g aws-cdk

Linux

sudo apt install nodejs
sudo npm install -g aws-cdk

For more information on Linux installs see Digital Ocean NodeJS
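Before moving on to the deployment, it can help to confirm that the tools installed above are actually on your PATH. The following is a minimal sketch, not part of SageWorks: the function name `missing_tools` and the `REQUIRED_TOOLS` tuple are illustrative assumptions.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


# node and npm come from the Node.js install; cdk comes from `npm install -g aws-cdk`.
# This tuple is an illustrative assumption, not a list defined by SageWorks.
REQUIRED_TOOLS = ("node", "npm", "cdk")
```

Calling `missing_tools(REQUIRED_TOOLS)` returns an empty list when everything is found; any names it returns are the ones to install before running the cdk commands.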

Deploying the SageWorks Stack into your AWS Account

Note: Activate the AWS Account that's used for the SageWorks deployment.

Note 2: For this one-time install you should use an Admin Account (or an account that has permissions to create/update AWS Stacks).

Note 3: The bucket name below MUST BE globally unique (we often use <company_name>-sageworks).

cd sageworks/aws_setup/sageworks_core
export AWS_PROFILE=<aws_admin_account>
export SAGEWORKS_BUCKET=<name of your S3 bucket>
# Optional: set the SSO group, e.g. DataScientist (or whatever SSO group you use)
# export SAGEWORKS_SSO_GROUP=DataScientist
pip install -r requirements.txt
cdk bootstrap
cdk deploy
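A typo in an environment variable or an invalid bucket name only shows up once the deploy is underway, so a quick local check before running the cdk commands can save a round trip. The sketch below is an assumption-laden helper, not part of SageWorks: the function names are hypothetical, and it covers only a subset of the S3 bucket naming rules (length, allowed characters, adjacent dots, IP-address format). It cannot tell you whether the name is globally unique; only AWS can.

```python
import os
import re
from typing import List, Mapping

# Subset of the S3 naming rules: lowercase letters, digits, dots, and hyphens,
# starting and ending with a letter or digit.
_ALLOWED = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]")
_IPV4 = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def bucket_name_problems(name: str) -> List[str]:
    """Return a list of rule violations for an S3 bucket name (empty if none found)."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be between 3 and 63 characters")
    if not _ALLOWED.fullmatch(name):
        problems.append(
            "use only lowercase letters, digits, dots, and hyphens, "
            "starting and ending with a letter or digit"
        )
    if ".." in name:
        problems.append("must not contain two adjacent dots")
    if _IPV4.fullmatch(name):
        problems.append("must not be formatted like an IP address")
    return problems


def deploy_env_problems(env: Mapping[str, str] = os.environ) -> List[str]:
    """Check the variables exported in the steps above (hypothetical helper)."""
    problems = []
    for var in ("AWS_PROFILE", "SAGEWORKS_BUCKET"):
        if not env.get(var):
            problems.append(f"{var} is not set")
    bucket = env.get("SAGEWORKS_BUCKET")
    if bucket:
        problems += [f"SAGEWORKS_BUCKET {p}" for p in bucket_name_problems(bucket)]
    return problems
```

One thing this catches: underscores and uppercase letters are not allowed in bucket names, so a name like acme-sageworks passes while Acme_SageWorks does not.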

AWS Account Setup Check

After setting up the SageWorks config and AWS Account, you can run this check script. If the output ends with INFO AWS Account Clamp: AOK! you're in good shape. If not, feel free to contact us on Discord and we'll get it straightened out for you :)

pip install sageworks
cd sageworks/aws_setup
python aws_account_check.py
<lots of printouts for various checks>
2023-04-12 11:17:09 (aws_account_check.py:48) INFO AWS Account Clamp: AOK!
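If you want to run the check from a script (in CI, for example) rather than reading the output by eye, the success condition described above can be tested programmatically. This is a rough sketch: the helper names are hypothetical, and because it does not assume which stream the script logs to, it inspects stdout and stderr together.

```python
import subprocess
import sys

# The marker text is taken from the sample output above.
SUCCESS_MARKER = "AWS Account Clamp: AOK!"


def account_check_passed(output: str) -> bool:
    """True if the last non-blank line of the output ends with the success marker."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return bool(lines) and lines[-1].endswith(SUCCESS_MARKER)


def run_account_check(script: str = "aws_account_check.py") -> bool:
    """Run the check script with the current interpreter and inspect its output."""
    result = subprocess.run(
        [sys.executable, script], capture_output=True, text=True
    )
    # Combine both streams, since loggers often write to stderr.
    return account_check_passed(result.stdout + result.stderr)
```

Run `run_account_check()` from the sageworks/aws_setup directory; a False result means the output is worth reading in full (or sharing on Discord).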

Building our first ML Pipeline

Okay, now for the more significant test: we're going to build an entire AWS ML Pipeline. The script build_ml_pipeline.py uses the SageWorks API to quickly and easily build an AWS Modeling Pipeline.

  • DataLoader(abalone.csv) --> DataSource
  • DataToFeatureSet Transform --> FeatureSet
  • FeatureSetToModel Transform --> Model
  • ModelToEndpoint Transform --> Endpoint

This script will take a LONG TIME to run; most of that time is spent waiting on AWS to finalize FeatureGroup population.

python build_ml_pipeline.py
<lots of output from building an ML pipeline>

After the script completes you will see that it's built out an AWS ML Pipeline and testing artifacts.
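The long wait mentioned above comes from polling AWS until a resource is ready. If you script follow-up steps of your own, the same pattern applies. The sketch below is a generic polling helper, not code from build_ml_pipeline.py: `wait_until` and its parameters are hypothetical, and what counts as "ready" is left to the caller. One possible check (an assumption, not taken from the script) is comparing the FeatureGroupStatus field returned by boto3's SageMaker describe_feature_group call against "Created".

```python
import time
from typing import Callable


def wait_until(
    is_ready: Callable[[], bool],
    timeout_s: float = 1800.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `is_ready` until it returns True or `timeout_s` elapses.

    Returns True if the resource became ready, False on timeout.
    `sleep` and `clock` are injectable so the helper can be tested
    without actually waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A fixed interval keeps the sketch short; for long waits against AWS APIs, a generous interval (tens of seconds) avoids unnecessary calls.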

How to Start the SageWorks Dashboard (Locally)

Note: Right now you must run the dashboard locally; an official AWS Deployment is in the works (see: https://github.com/SuperCowPowers/sageworks/issues/197).

cd sageworks/application/aws_dashboard
python aws_dashboard.py

Note: Open your browser to http://localhost:8080
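If the page does not load, a quick first diagnostic is whether anything is listening on port 8080 at all. This is a minimal sketch with a hypothetical function name; it only confirms that the TCP port accepts connections, not that the process behind it is the SageWorks dashboard.

```python
import socket


def dashboard_is_listening(
    host: str = "localhost", port: int = 8080, timeout_s: float = 2.0
) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

A False result usually means aws_dashboard.py is not running (or exited on startup), so check its terminal output first.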


Congratulations: SageWorks is now deployed to your AWS Account

If you ran into any issues with this procedure, please contact us via Discord or email [email protected]. The SCP team provides free setup and support for new SageWorks users.