Hopsworks and its Feature Store are installed and run in the cloud using Kubernetes and Helm. Both integrate with third-party platforms such as Databricks, SageMaker and KubeFlow. This guide shows how to set up the Hopsworks platform in your organization's Google Cloud Platform (GCP) account.
To follow the instructions on this page you will need the following:
- Kubernetes Version: Hopsworks can be deployed on GKE clusters running Kubernetes >= 1.27.0.
- gcloud CLI to provision the GCP resources
- gke-gcloud-auth-plugin to manage authentication with the GKE cluster
- helm to deploy Hopsworks
- The deployment requires cluster admin access to create ClusterRoles, ServiceAccounts, and ClusterRoleBindings.
- A namespace is required to deploy the Hopsworks stack. If you don’t have permissions to create a namespace, ask your GKE administrator to provision one.
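The prerequisites above can be checked from a terminal before starting. The following is a minimal sketch, not part of the Hopsworks tooling: the helper names `version_ge` and `missing_tools` are hypothetical, `kubectl` is added to the tool list because later steps use it, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils).

```shell
#!/bin/sh
# Succeeds if version $1 is greater than or equal to version $2.
# A GKE suffix such as "-gke.1589000" is stripped from $1 first.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    have=${1%%-*}
    want=$2
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Prints the name of every tool in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report anything that still needs to be installed.
missing=$(missing_tools gcloud gke-gcloud-auth-plugin helm kubectl)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
fi

# Example: compare a (hypothetical) cluster version against the minimum.
if version_ge "1.29.1-gke.1589000" "1.27.0"; then
    echo "Kubernetes version is recent enough"
fi
```

The version string passed to `version_ge` is a placeholder; in practice it would come from your own cluster.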
Create a bucket to store project data. Ensure the bucket is in the same region as your GKE cluster for performance and cost optimization.
gsutil mb -l $region gs://$bucket_name
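Since the bucket should live in the same region as the cluster, it can help to derive `$region` from the cluster's zone rather than typing it twice. The sketch below assumes zone names of the form `<region>-<letter>`; `zone_to_region`, the zone, and the bucket name are illustrative values, not ones prescribed by Hopsworks.

```shell
#!/bin/sh
# Derives the region from a zone name by dropping the final "-<suffix>",
# e.g. "europe-north1-a" becomes "europe-north1".
zone_to_region() {
    printf '%s\n' "${1%-*}"
}

zone="europe-north1-a"              # hypothetical zone of the GKE cluster
region=$(zone_to_region "$zone")
bucket_name="my-hopsworks-bucket"   # hypothetical; bucket names are globally unique

# Print the command instead of running it, so it can be reviewed first.
echo "gsutil mb -l $region gs://$bucket_name"
```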
Create a file named hopsworksai_role.yaml with the following content:
title: Hopsworks AI Instances
description: Role that allows Hopsworks AI Instances to access resources
stage: GA
includedPermissions:
- storage.buckets.get
- storage.buckets.update
- storage.multipartUploads.abort
- storage.multipartUploads.create
- storage.multipartUploads.list
- storage.multipartUploads.listParts
- storage.objects.create
- storage.objects.delete
- storage.objects.get
- storage.objects.list
- storage.objects.update
- artifactregistry.repositories.create
- artifactregistry.repositories.get
- artifactregistry.repositories.uploadArtifacts
- artifactregistry.repositories.downloadArtifacts
- artifactregistry.tags.list
- artifactregistry.tags.delete
Execute the following gcloud command to create a custom role from the file. Replace $PROJECT_ID with your GCP project id:
gcloud iam roles create hopsworksai_instances \
--project=$PROJECT_ID \
--file=hopsworksai_role.yaml
Execute the following gcloud command to create a service account for Hopsworks AI instances. Replace $PROJECT_ID with your GCP project id:
gcloud iam service-accounts create hopsworks-ai-instances \
--project=$PROJECT_ID \
--description="Service account for Hopsworks AI instances" \
--display-name="Hopsworks AI instances"
Execute the following gcloud command to bind the custom role to the service account. Replace all occurrences of $PROJECT_ID with your GCP project id:
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:hopsworks-ai-instances@$PROJECT_ID.iam.gserviceaccount.com" \
--role="projects/$PROJECT_ID/roles/hopsworksai_instances"
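To confirm that the binding took effect, the roles attached to the service account can be listed. This is a sketch under assumptions: `sa_email` and `list_sa_roles` are hypothetical helper names, `my-project` is a placeholder project id, and the `--flatten`/`--filter` combination is the commonly documented gcloud idiom for filtering an IAM policy by member.

```shell
#!/bin/sh
# Builds the e-mail address of a service account.
# $1 = service account name, $2 = project id
sa_email() {
    printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Lists the roles bound to the service account.
# Requires gcloud and valid credentials, so it is only defined here, not called.
list_sa_roles() {
    gcloud projects get-iam-policy "$2" \
        --flatten="bindings[].members" \
        --filter="bindings.members:serviceAccount:$(sa_email "$1" "$2")" \
        --format="value(bindings.role)"
}

PROJECT_ID="my-project"   # hypothetical project id
sa_email hopsworks-ai-instances "$PROJECT_ID"

# Once authenticated:
#   list_sa_roles hopsworks-ai-instances "$PROJECT_ID"
```

If the binding succeeded, the listing should include `projects/$PROJECT_ID/roles/hopsworksai_instances`.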
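Create a GKE cluster that runs with the service account created above. Replace <cluster-name>, <zone> and $PROJECT_ID with your own values:
gcloud container clusters create <cluster-name> \
--zone <zone> \
--machine-type n2-standard-8 \
--num-nodes 1 \
--enable-ip-alias \
--service-account hopsworks-ai-instances@$PROJECT_ID.iam.gserviceaccount.com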
Once the creation process is completed, you should be able to access the cluster using the kubectl CLI tool:
kubectl get nodes
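If `kubectl get nodes` cannot reach the cluster, the kubeconfig entry may be missing; `gcloud container clusters get-credentials` writes it. The sketch below wraps both steps in a hypothetical `connect_to_cluster` helper with a dry-run switch, so the commands can be inspected before anything is executed. The cluster name and zone are placeholders.

```shell
#!/bin/sh
# Runs the given command, or only prints it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Fetches kubeconfig credentials for the cluster, then lists its nodes.
# $1 = cluster name, $2 = zone
connect_to_cluster() {
    run gcloud container clusters get-credentials "$1" --zone "$2" &&
    run kubectl get nodes
}

# Dry run with placeholder values; set DRY_RUN=0 to execute for real.
DRY_RUN=1
connect_to_cluster my-cluster europe-north1-a
```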
Hopsworks allows users to customize images for Python jobs, Jupyter Notebooks, and (Py)Spark applications. These images should be stored in Google Artifact Registry. The GKE cluster needs access to an Artifact Registry repository to push project images.
Enable Artifact Registry and create a Docker repository to store images:
gcloud artifacts repositories create <repo-name> \
--repository-format=docker \
--location=<region>
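The repository created here is addressed as `<region>-docker.pkg.dev/<project>/<repo-name>`, which appears to be what the `domain` and `namespace` fields of the values file later in this guide are built from. The sketch below computes both parts; the helper names and sample values are hypothetical, and the API-enabling function is only defined, not called, because it needs gcloud and credentials.

```shell
#!/bin/sh
# Host name of the regional Docker endpoint of Artifact Registry.
# $1 = region
registry_domain() {
    printf '%s-docker.pkg.dev\n' "$1"
}

# Path below the host: "<project>/<repository>".
# $1 = project id, $2 = repository name
registry_namespace() {
    printf '%s/%s\n' "$1" "$2"
}

# Enables the Artifact Registry API for a project (one-time step).
enable_artifact_registry() {
    gcloud services enable artifactregistry.googleapis.com --project "$1"
}

# Placeholder values for illustration.
echo "domain:    $(registry_domain europe-north1)"
echo "namespace: $(registry_namespace my-project hopsworks)"
```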
Access to the Hopsworks helm chart repository requires an evaluation/startup licence, which you can obtain here.
Once you have the helm chart repository URL, replace the environment variable $HOPSWORKS_REPO in the following command with this URL.
helm repo add hopsworks $HOPSWORKS_REPO
helm repo update hopsworks
kubectl create namespace hopsworks
Below is a simplified values.gcp.yaml file to get started, which can be updated for improved performance and further customisation.
global:
  _hopsworks:
    storageClassName: null
    cloudProvider: "GCP"
    managedDockerRegistery:
      enabled: true
      domain: "europe-north1-docker.pkg.dev"
      namespace: "PROJECT_ID/hopsworks"
      credHelper:
        enabled: true
        secretName: &gcpregcred "gcpregcred"
    managedObjectStorage:
      enabled: true
      s3:
        bucket:
          name: &bucket "hopsworks"
        region: &region "europe-north1"
        endpoint: &gcpendpoint "https://storage.cloud.google.com"
        secret:
          name: &gcpcredentials "gcp-credentials"
          acess_key_id: &gcpaccesskey "access-key-id"
          secret_key_id: &gcpsecretkey "secret-access-key"
    minio:
      enabled: false
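The `PROJECT_ID` placeholder in the values file has to be replaced with a real project id before deploying. One way to do that without editing the file by hand is a small `sed` substitution, sketched below. `render_values` is a hypothetical helper, and the demonstration uses a one-line temporary file rather than the full values.gcp.yaml.

```shell
#!/bin/sh
# Prints the template with every PROJECT_ID placeholder replaced.
# $1 = template file, $2 = project id
render_values() {
    sed -e "s|PROJECT_ID|$2|g" "$1"
}

# Demonstration on a temporary one-line template.
tmp=$(mktemp)
printf 'namespace: "PROJECT_ID/hopsworks"\n' > "$tmp"
render_values "$tmp" my-project
rm -f "$tmp"

# With the real file, something like:
#   render_values values.gcp.yaml "$PROJECT_ID" > values.rendered.yaml
```

The other sample values (bucket name, region, registry domain) would need the same treatment if they differ from your setup.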
Deploy Hopsworks in the created namespace.
helm install hopsworks hopsworks/hopsworks \
--namespace hopsworks \
--values values.gcp.yaml \
--timeout=600s
Check that Hopsworks is installing on your provisioned GKE cluster.
kubectl get pods --namespace=hopsworks
kubectl get svc -n hopsworks -o wide
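While the installation is in progress, it can be convenient to list only the pods that are not yet up. The sketch below filters the output of `kubectl get pods --no-headers`, assuming the default column order (NAME, READY, STATUS, ...). `pending_pods` is a hypothetical helper and the pod names in the sample input are made up for illustration.

```shell
#!/bin/sh
# Reads `kubectl get pods --no-headers` output on stdin and prints the
# names of pods whose STATUS (third column) is neither Running nor Completed.
pending_pods() {
    awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Illustrative input; real input would come from:
#   kubectl get pods --namespace=hopsworks --no-headers | pending_pods
pending_pods <<'EOF'
example-pod-a   1/1   Running             0   5m
example-pod-b   0/1   ContainerCreating   0   5m
example-job-c   0/1   Completed           0   4m
EOF
```

An empty result means every pod is either running or has finished.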
Upon completion (circa 20 minutes), set up a load balancer to access Hopsworks:
kubectl expose deployment hopsworks --type=LoadBalancer --name=hopsworks-service --namespace <namespace>
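The load balancer usually takes a short while to receive an external IP address. The sketch below polls for it. `wait_for_output` and `external_ip` are hypothetical helpers, and the jsonpath expression assumes the load balancer reports an IP address rather than a hostname.

```shell
#!/bin/sh
# Retries a command until it prints something.
# $1 = maximum attempts, $2 = seconds between attempts, rest = command
wait_for_output() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        out=$("$@" 2>/dev/null)
        if [ -n "$out" ]; then
            printf '%s\n' "$out"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Prints the external IP of the hopsworks-service load balancer, if assigned.
# $1 = namespace
external_ip() {
    kubectl get svc hopsworks-service --namespace "$1" \
        -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Usage, once the service exists (30 attempts, 10 seconds apart):
#   wait_for_output 30 10 external_ip hopsworks
```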
Check out our other guides for how to get started with Hopsworks and the Feature Store:
- Get started with the Hopsworks Feature Store
- Follow one of our tutorials
- Follow one of our guides