[CLOUD-769] Improve guide on AKS setup #451

Merged (3 commits) on Feb 27, 2025
156 changes: 110 additions & 46 deletions docs/setup_installation/azure/getting_started.md
@@ -24,78 +24,129 @@ To run all the commands on this page the user needs to have at least the followi

You will also need to have a role such as *Application Administrator* on the Azure Active Directory to be able to create the hopsworks.ai service principal.

## Step 1: Azure AKS Setup
## Step 1: Azure Kubernetes Service (AKS) Setup
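
The commands in the following steps assume that a number of shell variables are already set. A minimal sketch of defining them is shown here. Every value is a hypothetical placeholder rather than a name taken from this guide, and the `valid_storage_account_name` helper is an illustrative addition that reflects Azure's storage account naming rule (3 to 24 lowercase letters and digits).

```bash
# Placeholder values (assumptions): replace each one with your own.
export RESOURCE_GROUP="my-hopsworks-rg"
export REGION="westeurope"
export STORAGE_ACCOUNT_NAME="myhopsworksstorage"
export CONTAINER_NAME="hopsfs"
export CONTAINER_REGISTRY_NAME="myhopsworksacr"
export UA_IDENTITY_NAME="hopsworks-identity"
export SP_NAME="hopsworks-acr-sp"
export KUBERNETES_CLUSTER_NAME="hopsworks-aks"

# Hypothetical helper: storage account names must be 3-24 characters,
# lowercase letters and digits only.
valid_storage_account_name() {
  case "$1" in
    *[!a-z0-9]*|"") return 1 ;;
  esac
  [ "${#1}" -ge 3 ] && [ "${#1}" -le 24 ]
}

if ! valid_storage_account_name "$STORAGE_ACCOUNT_NAME"; then
  echo "Invalid storage account name: $STORAGE_ACCOUNT_NAME" >&2
fi
```

Checking the name locally avoids a round trip to Azure for a request that would be rejected anyway.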

### Step 1.1: Create an Azure Blob Storage Account

Create a storage account to host project data. Ensure that the storage account is in the same region as the AKS cluster for performance and cost reasons:

```bash
az storage account create --name $storage_account_name --resource-group $resource_group --location $region
az storage account create --name $STORAGE_ACCOUNT_NAME --resource-group $RESOURCE_GROUP --location $REGION
```

Also create a corresponding container:
Also, create the corresponding container:

```bash
az storage container create --account-name $storage_account_name --name $container_name
az storage container create --account-name $STORAGE_ACCOUNT_NAME --name $CONTAINER_NAME
```


### Step 1.2: Create an Azure Container Registry (ACR)

Create an ACR to store the images used by Hopsworks:

```bash
az acr create --resource-group $resource_group --name $registry_name --sku Basic --location $region
az acr create --resource-group $RESOURCE_GROUP --name $CONTAINER_REGISTRY_NAME --sku Basic --location $REGION

export ACR_ID=`az acr show --name $CONTAINER_REGISTRY_NAME --resource-group $RESOURCE_GROUP --query "id" --output tsv`
```

### Step 1.3: Create an AKS Kubernetes Cluster
### Step 1.3: Create a User-Assigned Managed Identity

Provision an AKS cluster with a number of nodes:
Create a user-assigned managed identity to grant AKS access to the storage account and container registry:

```bash
az aks create --resource-group $resource_group --name $cluster_name --enable-cluster-autoscaler --min-count 1 --max-count 4 --node-count 3 --node-vm-size Standard_D16_v4 --network-plugin azure --enable-managed-identity --generate-ssh-keys
az identity create --name $UA_IDENTITY_NAME --resource-group $RESOURCE_GROUP

export UA_IDENTITY_PRINCIPAL_ID=`az identity show --name $UA_IDENTITY_NAME --resource-group $RESOURCE_GROUP --query principalId --output tsv`
export UA_IDENTITY_CLIENT_ID=`az identity show --name $UA_IDENTITY_NAME --resource-group $RESOURCE_GROUP --query clientId --output tsv`
export UA_IDENTITY_RESOURCE_ID=`az identity show --name $UA_IDENTITY_NAME --resource-group $RESOURCE_GROUP --query id --output tsv`
```

### Step 1.4: Retrieve setup Identifiers
### Step 1.4: Grant permissions to the User-Assigned Managed Identity

Create a set of environment variables for use in later steps.
Create a custom role definition with the minimum permissions needed to read and write to the storage account:

```bash
export managed_id=`az aks show --resource-group $resource_group --name $cluster_name --query "identity.principalId" --output tsv`

export storage_id=`az storage account show --name $storage_account_name --resource-group $resource_group --query "id" --output tsv`

export acr_id=`az acr show --name $registry_name --resource-group $resource_group --query "id" --output tsv`
export STORAGE_ID=`az storage account show --name $STORAGE_ACCOUNT_NAME --resource-group $RESOURCE_GROUP --query "id" --output tsv`

az role definition create --role-definition '{
"Name": "hopsfs-storage-permissions",
"IsCustom": true,
"Description": "Allow HopsFS to access the storage container",
"Actions": [
"Microsoft.Storage/storageAccounts/blobServices/containers/write",
"Microsoft.Storage/storageAccounts/blobServices/containers/read",
"Microsoft.Storage/storageAccounts/blobServices/write",
"Microsoft.Storage/storageAccounts/blobServices/read",
"Microsoft.Storage/storageAccounts/listKeys/action"
],
"NotActions": [],
"DataActions": [
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write"
],
"AssignableScopes": [
"'$STORAGE_ID'"
]
}'

sleep 30 # give Azure some time to persist the new role

az role assignment create --role hopsfs-storage-permissions --assignee $UA_IDENTITY_PRINCIPAL_ID --scope $STORAGE_ID
```
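
The fixed `sleep 30` is a simple way to wait for the new role to propagate. As an alternative sketch, the wait can be replaced by polling until the role definition becomes visible at the storage account scope. The `retry` and `role_exists` helpers are hypothetical (they are not part of the Azure CLI), and the attempt count and delay in the example are arbitrary assumptions.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds, or gives up after ATTEMPTS tries.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# role_exists ROLE_NAME SCOPE
# Succeeds once at least one role definition with that name is listed.
role_exists() {
  count=$(az role definition list --name "$1" --scope "$2" \
    --query "length(@)" --output tsv) || return 1
  [ "${count:-0}" -gt 0 ]
}

# Example: poll up to 10 times, 6 seconds apart, before assigning the role.
# retry 10 6 role_exists hopsfs-storage-permissions "$STORAGE_ID"
```

Polling returns as soon as the role is visible and fails loudly if it never appears, whereas a fixed sleep does neither.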

### Step 1.5: Assign Roles to Managed Identity
### Step 1.5: Create Service Principal for Hopsworks services

```bash
az role assignment create --assignee $managed_id --role "Storage Blob Data Contributor" --scope $storage_id
Create a service principal to grant Hopsworks applications access to the container registry. For example, Hopsworks uses this service principal to push new Python environments created via the Hopsworks UI.

az role assignment create --assignee $managed_id --role AcrPull --scope $acr_id
az role assignment create --assignee $managed_id --role "AcrPush" --scope $acr_id
az role assignment create --assignee $managed_id --role "AcrDelete" --scope $acr_id
```bash
export SP_PASSWORD=`az ad sp create-for-rbac --name $SP_NAME --scopes $ACR_ID --role acrpush --years 1 --query "password" --output tsv`
export SP_USER_NAME=`az ad sp list --display-name $SP_NAME --query "[].appId" --output tsv`
```

### Step 1.6: Allow AKS cluster access to ACR repository.
### Step 1.6: Create an AKS Kubernetes Cluster

Provision an AKS cluster with a number of nodes:

```bash
az aks update --resource-group $resource_group --name $cluster_name --attach-acr $registry_name
az aks create --resource-group $RESOURCE_GROUP --name $KUBERNETES_CLUSTER_NAME --network-plugin azure \
--enable-cluster-autoscaler --min-count 1 --max-count 4 --node-count 3 --node-vm-size Standard_D8_v4 \
--attach-acr $CONTAINER_REGISTRY_NAME \
--assign-identity $UA_IDENTITY_RESOURCE_ID --assign-kubelet-identity $UA_IDENTITY_RESOURCE_ID \
--enable-managed-identity --generate-ssh-keys
```

## Step 2: Configure kubectl

```bash
az aks get-credentials --resource-group $resource_group --name $cluster_name --file ~/my-aks-kubeconfig.yaml
az aks get-credentials --resource-group $RESOURCE_GROUP --name $KUBERNETES_CLUSTER_NAME --file ~/my-aks-kubeconfig.yaml
export KUBECONFIG=~/my-aks-kubeconfig.yaml
kubectl config current-context
```

## Step 3: Setup Hopsworks for Deployment
## Step 3: Create Secret for the Service Principal

### Step 3.1: Create Hopsworks namespace

```bash
kubectl create namespace hopsworks
```

### Step 3.2: Create secret

### Step 3.1: Add the Hopsworks Helm repository
```bash
kubectl create secret docker-registry azregcred \
--namespace hopsworks \
--docker-server=$CONTAINER_REGISTRY_NAME.azurecr.io \
--docker-username=$SP_USER_NAME \
--docker-password=$SP_PASSWORD
```

## Step 4: Setup Hopsworks for Deployment

### Step 4.1: Add the Hopsworks Helm repository

To obtain access to the Hopsworks helm chart repository, please obtain
an evaluation/startup licence [here](https://www.hopsworks.ai/try).
@@ -108,34 +159,49 @@ helm repo add hopsworks $HOPSWORKS_REPO
helm repo update hopsworks
```

### Step 3.2: Create Hopsworks namespace

```bash
kubectl create namespace hopsworks
```

### Step 3.3: Create helm values file
### Step 4.2: Create helm values file

Below is a simplified values.azure.yaml file to get started, which can be updated for improved performance and further customisation.

```bash
```yaml
global:
_hopsworks:
storageClassName: null
cloudProvider: "AWS"
managedDockerRegistry:
cloudProvider: "AZURE"
managedDockerRegistery:
enabled: true
domain: "rchopsworksrepo.azurecr.io"
domain: "CONTAINER_REGISTRY_NAME.azurecr.io"
namespace: "hopsworks"

managedObjectStorage:
enabled: true
endpoint: "https://rchopsworksbucket.blob.core.windows.net"
credHelper:
enabled: false
secretName: ""

minio:
enabled: false

hopsworks:
variables:
docker_operations_managed_docker_secrets: &azregcred "azregcred"
docker_operations_image_pull_secrets: *azregcred
dockerRegistry:
preset:
usePullPush: false
secrets:
- *azregcred

hopsfs:
objectStorage:
enabled: true
provider: "AZURE"
azure:
storage:
account: "STORAGE_ACCOUNT_NAME"
container: "STORAGE_ACCOUNT_CONTAINER_NAME"
identityClientId: "UA_IDENTITY_CLIENT_ID"

```
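
The values file contains literal placeholders such as `CONTAINER_REGISTRY_NAME` and `UA_IDENTITY_CLIENT_ID`. One way to fill them in from the variables exported in the earlier steps is a small `sed` pass. This is only a sketch: the `render_values` helper and the template file name are hypothetical, and it assumes the substituted values contain no `|` or `&` characters.

```bash
# render_values TEMPLATE
# Prints TEMPLATE with the quoted placeholders replaced by the values of
# the corresponding environment variables.
render_values() {
  sed \
    -e "s|\"CONTAINER_REGISTRY_NAME.azurecr.io\"|\"$CONTAINER_REGISTRY_NAME.azurecr.io\"|" \
    -e "s|\"STORAGE_ACCOUNT_NAME\"|\"$STORAGE_ACCOUNT_NAME\"|" \
    -e "s|\"STORAGE_ACCOUNT_CONTAINER_NAME\"|\"$CONTAINER_NAME\"|" \
    -e "s|\"UA_IDENTITY_CLIENT_ID\"|\"$UA_IDENTITY_CLIENT_ID\"|" \
    "$1"
}

# Example (file names are assumptions):
# render_values values.azure.template.yaml > values.azure.yaml
```

Matching the placeholders together with their surrounding quotes keeps the substitution from touching YAML keys that happen to contain similar text.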

## Step 4: Deploy Hopsworks
## Step 5: Deploy Hopsworks

Deploy Hopsworks in the created namespace.

@@ -157,9 +223,7 @@ Upon completion (circa 20 minutes), setup a load balancer to access Hopsworks:
kubectl expose deployment hopsworks --type=LoadBalancer --name=hopsworks-service --namespace <namespace>
```
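
Once the service is exposed, the cloud provider assigns it an external IP address, which can take a little while. As a rough sketch, the address can be read back with a `jsonpath` query. The `get_lb_ip` helper is a hypothetical convenience wrapper, and `hopsworks` in the example is assumed to be the namespace used above.

```bash
# get_lb_ip NAMESPACE
# Prints the external IP of hopsworks-service, or nothing while the
# load balancer is still being provisioned.
get_lb_ip() {
  kubectl get service hopsworks-service --namespace "$1" \
    --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Example:
# get_lb_ip hopsworks
```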



## Step 5: Next steps
## Step 6: Next steps

Check out our other guides for how to get started with Hopsworks and the Feature Store:
