* Hopsworks Python library installation documentation improvements
    - Remove references to `pip install hsfs` and `hsfs.connection()`
    - Improve the documentation for the installation of the Python library (including profiles)
    - Add documentation for the installation of the Java library
* Typo
* Fix for review
`docs/user_guides/client_installation/index.md` (+87 −28)
````diff
@@ -1,56 +1,115 @@
 ---
-description: Documentation on how to install the Hopsworks and HSFS Python libraries, including the specific requirements for Mac OSX and Windows.
+description: Documentation on how to install the Hopsworks Python and Java library.
 ---
 # Client Installation Guide

-## Hopsworks (including Feature Store and MLOps)
-The Hopsworks client library is required to connect to the Hopsworks Feature Store and MLOps services from your local machine or any other Python environment such as Google Colab or AWS Sagemaker. Execute the following command to install the full Hopsworks client library in your Python environment:
+## Hopsworks Python library
+
+The Hopsworks Python client library is required to connect to Hopsworks from your local machine or any other Python environment such as Google Colab or AWS Sagemaker. Execute the following command to install the Hopsworks client library in your Python environment:

 !!! note "Virtual environment"
     It is recommended to use a virtual python environment instead of the system environment used by your operating system, in order to avoid any side effects regarding interfering dependencies.
 Hopsworks latest version should work on OSX systems without any additional requirements. However if installing an older version of the Hopsworks SDK you might need to install `librdkafka` manually. Checkout the documentation for the specific version you are installing.
-
 !!! attention "Windows/Conda Installation"

     On Windows systems you might need to install twofish manually before installing hopsworks, if you don't have the Microsoft Visual C++ Build Tools installed. In that case, it is recommended to use a conda environment and run the following commands:

     ```bash
     conda install twofish
-    pip install hopsworks
+    pip install hopsworks[python]
     ```

-## Feature Store only
-To only install the Hopsworks Feature Store client library, execute the following command:
+The Hopsworks library has several profiles that bring additional dependencies and enable additional functionalities:
+
+| Profile Name | Description |
+| ------------------ | ------------- |
+| No Profile | This is the base installation. Supports interacting with the feature store metadata, model registry and deployments. It also supports reading and writing from the feature store from PySpark environments. |
+|`python`| This profile enables reading and writing from/to the feature store from a Python environment |
+|`great-expectations`| This profile installs the [Great Expectations](https://greatexpectations.io/) Python library and enables data validation on feature pipelines |
+|`polars`| This profile installs the [Polars](https://pola.rs/) library and enables reading and writing Polars DataFrames |
+
+You can install all the above profiles with the following command:
````
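The install command referenced by that last added line is not captured in the diff rows above. A minimal sketch of what the base and profile installs presumably look like, using standard pip extras syntax and the profile names from the table (the exact combination shown in the published docs is an assumption):

```bash
# Base installation: feature store metadata, model registry and deployments, plus PySpark read/write
pip install hopsworks

# Single profile, e.g. reading and writing from a plain Python environment
pip install "hopsworks[python]"

# Several profiles at once (assumed combination of the profiles listed in the table above)
pip install "hopsworks[python,great-expectations,polars]"
```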
````diff
+## HSFS Java Library:

-!!! attention "Windows/Conda Installation"
+If you want to interact with the Hopsworks Feature Store from environments such as Spark, Flink or Beam, you can use the Hopsworks Feature Store (HSFS) Java library.

-    On Windows systems you might need to install twofish manually before installing hsfs, if you don't have the Microsoft Visual C++ Build Tools installed. In that case, it is recommended to use a conda environment and run the following commands:
-
-    ```bash
-    conda install twofish
-    pip install hsfs[python]
-    ```
+!!!note "Feature Store Only"
+
+    The Java library only allows interaction with the Feature Store component of the Hopsworks platform. Additionally each environment might restrict the supported API operation. You can see which API operation is supported by which environment [here](../fs/compute_engines)
+
+The HSFS library is available on the Hopsworks' Maven repository. If you are using Maven as build tool, you can add the following in your `pom.xml` file:
+The library has different builds targeting different environments:
+
+### Spark
+
+The `artifactId` for the Spark build is `hsfs-spark-spark{spark.version}`, if you are using Maven as build tool, you can add the following dependency:
+
+```
+<dependency>
+    <groupId>com.logicalclocks</groupId>
+    <artifactId>hsfs-spark-spark3.1</artifactId>
+    <version>${hsfs.version}</version>
+</dependency>
+```
+
+Hopsworks provides builds for Spark 3.1, 3.3 and 3.5. The builds are also provided as JAR files which can be downloaded from [Hopsworks repository](https://repo.hops.works/master/hsfs)
+
+### Flink
+
+The `artifactId` for the Flink build is `hsfs-flink`, if you are using Maven as build tool, you can add the following dependency:
+
+```
+<dependency>
+    <groupId>com.logicalclocks</groupId>
+    <artifactId>hsfs-flink</artifactId>
+    <version>${hsfs.version}</version>
+</dependency>
+```
+
+### Beam
+
+The `artifactId` for the Beam build is `hsfs-beam`, if you are using Maven as build tool, you can add the following dependency:
+
+```
+<dependency>
+    <groupId>com.logicalclocks</groupId>
+    <artifactId>hsfs-beam</artifactId>
+    <version>${hsfs.version}</version>
+</dependency>
+```

 ## Next Steps

-If you are using a local python environment and want to connect to the Hopsworks Feature Store, you can follow the [Python Guide](../integrations/python.md#generate-an-api-key) section to create an API Key and to get started.
+If you are using a local python environment and want to connect to Hopsworks, you can follow the [Python Guide](../integrations/python.md#generate-an-api-key) section to create an API Key and to get started.
````
`docs/user_guides/integrations/databricks/api_key.md`

````diff
-In order for the Databricks cluster to be able to communicate with the Hopsworks Feature Store, the clients running on Databricks need to be able to access a Hopsworks API key.
+In order for the Databricks cluster to be able to communicate with Hopsworks, clients running on Databricks need to be able to access a Hopsworks API key.

 ## Generate an API key

@@ -15,127 +15,19 @@ For instructions on how to generate an API key follow this [user guide](../../pr

 !!! hint "API key as Argument"
     To get started quickly, without saving the Hopsworks API in a secret storage, you can simply supply it as an argument when instantiating a connection:
-    ```python hl_lines="6"
-    import hsfs
-    conn = hsfs.connection(
-        host='my_instance', # DNS of your Feature Store instance
-        port=443, # Port to reach your Hopsworks instance, defaults to 443
-        project='my_project', # Name of your Hopsworks Feature Store project
-        api_key_value='apikey', # The API key to authenticate with Hopsworks
-        hostname_verification=True # Disable for self-signed certificates
-    )
-    fs = conn.get_feature_store() # Get the project's default feature store
-    ```

-## Store the API key

-### AWS
-
-#### Step 1: Create an instance profile to attach to your Databricks clusters
-
-Go to the *AWS IAM* choose *Roles* and click on *Create Role*. Select *AWS Service* as the type of trusted entity and *EC2* as the use case as shown below:
-
-<p align="center">
-<figure>
-<img src="../../../../assets/images/guides/integrations/create-instance-profile.png" alt="Create an instance profile">
-<figcaption>Create an instance profile</figcaption>
-</figure>
-</p>
-
-Click on *Next: Permissions*, *Next:Tags*, and then *Next: Review*. Name the instance profile role and then click *Create role*.
-
-#### Step 2: Storing the API Key
-
-**Option 1: Using the AWS Systems Manager Parameter Store**
-
-In the AWS Management Console, ensure that your active region is the region you use for Databricks.
-Go to the *AWS Systems Manager* choose *Parameter Store* and select *Create Parameter*.
-As name enter `/hopsworks/role/[MY_DATABRICKS_ROLE]/type/api-key` replacing `[MY_DATABRICKS_ROLE]` with the name of the AWS role you have created in [Step 1](#step-1-create-an-instance-profile-to-attach-to-your-databricks-clusters). Select *Secure String* as type and create the parameter.
-
-<p align="center">
-<figure>
-<img src="../../../../assets/images/guides/integrations/databricks/aws/databricks_parameter_store.png" alt="Storing the Feature Store API key in the Parameter Store">
-<figcaption>Storing the Feature Store API key in the Parameter Store</figcaption>
-</figure>
-</p>
-
-
-Once the API Key is stored, you need to grant access to it from the AWS role that you have created in [Step 1](#step-1-create-an-instance-profile-to-attach-to-your-databricks-clusters).
-In the AWS Management Console, go to *IAM*, select *Roles* and then search for the role that you have created in [Step 1](#step-1-create-an-instance-profile-to-attach-to-your-databricks-clusters).
-Select *Add inline policy*. Choose *Systems Manager* as service, expand the *Read* access level and check *GetParameter*.
-Expand Resources and select *Add ARN*.
-Enter the region of the *Systems Manager* as well as the name of the parameter **WITHOUT the leading slash** e.g. *hopsworks/role/[MY_DATABRICKS_ROLE]/type/api-key* and click *Add*.
-Click on *Review*, give the policy a name and click on *Create policy*.
-
-<p align="center">
-<figure>
-<img src="../../../../assets/images/guides/integrations/databricks/aws/databricks_parameter_store_policy.png" alt="Configuring the access policy for the Parameter Store">
-<figcaption>Configuring the access policy for the Parameter Store</figcaption>
-</figure>
-</p>
-
-
-**Option 2: Using the AWS Secrets Manager**
-
-In the AWS management console ensure that your active region is the region you use for Databricks.
-Go to the *AWS Secrets Manager* and select *Store new secret*. Select *Other type of secrets* and add *api-key*
-as the key and paste the API key created in the previous step as the value. Click next.
-
-<p align="center">
-<figure>
-<img src="../../../../assets/images/guides/integrations/databricks/aws/databricks_secrets_manager_step_1.png" alt="Storing a Feature Store API key in the Secrets Manager Step 1">
-<figcaption>Storing a Feature Store API key in the Secrets Manager Step 1</figcaption>
-</figure>
-</p>
-
-As secret name, enter *hopsworks/role/[MY_DATABRICKS_ROLE]* replacing [MY_DATABRICKS_ROLE] with the AWS role you have created in [Step 1](#step-1-create-an-instance-profile-to-attach-to-your-databricks-clusters). Select next twice and finally store the secret.
-Then click on the secret in the secrets list and take note of the *Secret ARN*.
-
-<p align="center">
-<figure>
-<img src="../../../../assets/images/guides/integrations/databricks/aws/databricks_secrets_manager_step_2.png" alt="Storing a Feature Store API key in the Secrets Manager Step 2">
-<figcaption>Storing a Feature Store API key in the Secrets Manager Step 2</figcaption>
-</figure>
-</p>
-
-Once the API Key is stored, you need to grant access to it from the AWS role that you have created in [Step 1](#step-1-create-an-instance-profile-to-attach-to-your-databricks-clusters).
-In the AWS Management Console, go to *IAM*, select *Roles* and then the role that that you have created in [Step 1](#step-1-create-an-instance-profile-to-attach-to-your-databricks-clusters).
-Select *Add inline policy*. Choose *Secrets Manager* as service, expand the *Read* access level and check *GetSecretValue*.
-Expand Resources and select *Add ARN*. Paste the ARN of the secret created in the previous step.
-Click on *Review*, give the policy a name and click on *Create policy*.
-
-<p align="center">
-<figure>
-<img src="../../../../assets/images/guides/integrations/databricks/aws/databricks_secrets_manager_policy.png" alt="Configuring the access policy for the Secrets Manager">
-<figcaption>Configuring the access policy for the Secrets Manager</figcaption>
-</figure>
-</p>
-
-#### Step 3: Allow Databricks to use the AWS role created in Step 1
-
-First you need to get the AWS role used by Databricks for deployments as described in [this step](https://docs.databricks.com/administration-guide/cloud-configurations/aws/instance-profiles.html#step-3-note-the-iam-role-used-to-create-the-databricks-deployment). Once you get the role name, go to *AWS IAM*, search for the role, and click on it. Then, select the *Permissions* tab, click on *Add inline policy*, select the *JSON* tab, and paste the following snippet. Replace *[ACCOUNT_ID]* with your AWS account id, and *[MY_DATABRICKS_ROLE]* with the AWS role name created in [Step 1](#step-1-create-an-instance-profile-to-attach-to-your-databricks-clusters).
+        host='my_instance', # DNS of your Feature Store instance
+        port=443, # Port to reach your Hopsworks instance, defaults to 443
+        project='my_project', # Name of your Hopsworks Feature Store project
+        api_key_value='apikey', # The API key to authenticate with Hopsworks
+    )
+    fs = project.get_feature_store() # Get the project's default feature store
     ```

-Click *Review Policy*, name the policy, and click *Create Policy*. Then, go to your Databricks workspace and follow [this step](https://docs.databricks.com/administration-guide/cloud-configurations/aws/instance-profiles.html#step-5-add-the-instance-profile-to-databricks) to add the instance profile to your workspace. Finally, when launching Databricks clusters, select *Advanced* settings and choose the instance profile you have just added.
-
-
-### Azure
-
-On Azure we currently do not support storing the API key in a secret storage. Instead just store the API key in a file in your Databricks workspace so you can access it when connecting to the Feature Store.
-
 ## Next Steps

 Continue with the [configuration guide](configuration.md) to finalize the configuration of the Databricks Cluster to communicate with the Hopsworks Feature Store.
````
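The opening lines of the added snippet in the hint above (the `import` and `hopsworks.login(` lines) are not captured in the diff rows. Based on the captured argument lines and the matching snippet in `configuration.md` below, the complete replacement example presumably reads as follows; a sketch, not the exact committed text:

```python
import hopsworks

project = hopsworks.login(
    host='my_instance',       # DNS of your Feature Store instance
    port=443,                 # Port to reach your Hopsworks instance, defaults to 443
    project='my_project',     # Name of your Hopsworks Feature Store project
    api_key_value='apikey',   # The API key to authenticate with Hopsworks
)
fs = project.get_feature_store()  # Get the project's default feature store
```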
`docs/user_guides/integrations/databricks/configuration.md` (+10 −32)
````diff
@@ -90,38 +90,16 @@ When a cluster is configured for a specific project user, all the operations wit
 At the end of the configuration, Hopsworks will start the cluster.
 Once the cluster is running users can establish a connection to the Hopsworks Feature Store from Databricks:

-!!! note "API key on Azure"
-    Please note, for Azure it is necessary to store the Hopsworks API key locally on the cluster as a file. As we currently do not support storing the API key on an Azure Secret Management Service as we do for AWS. Consult the [API key guide for Azure](api_key.md#azure), for more information.
-
-=== "AWS"
-
-    ```python
-    import hsfs
-    conn = hsfs.connection(
-        'my_instance', # DNS of your Feature Store instance
-        443, # Port to reach your Hopsworks instance, defaults to 443
-        'my_project', # Name of your Hopsworks Feature Store project
-        secrets_store='secretsmanager', # Either parameterstore or secretsmanager
-        hostname_verification=True # Disable for self-signed certificates
-    )
-    fs = conn.get_feature_store() # Get the project's default feature store
-    ```
-
-=== "Azure"
-
-    ```python
-    import hsfs
-    conn = hsfs.connection(
-        'my_instance', # DNS of your Feature Store instance
-        443, # Port to reach your Hopsworks instance, defaults to 443
-        'my_project', # Name of your Hopsworks Feature Store project
-        secrets_store='local',
-        api_key_file="featurestore.key", # For Azure, store the API key locally
-        secrets_store = "local",
-        hostname_verification=True # Disable for self-signed certificates
-    )
-    fs = conn.get_feature_store() # Get the project's default feature store
-    ```
+```python
+import hopsworks
+project = hopsworks.login(
+    host='my_instance', # DNS of your Hopsworks instance
+    port=443, # Port to reach your Hopsworks instance, defaults to 443
+    project='my_project', # Name of your Hopsworks project
+    api_key_value='apikey', # The API key to authenticate with Hopsworks
+)
+fs = project.get_feature_store() # Get the project's default feature store
````
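Once the connection in the new snippet succeeds, `fs` is a handle to the project's feature store. A minimal follow-up sketch of reading from it on the Databricks cluster; the feature group name and version are placeholder values for illustration, not taken from the docs:

```python
import hopsworks

project = hopsworks.login(
    host='my_instance',        # DNS of your Hopsworks instance
    project='my_project',      # Name of your Hopsworks project
    api_key_value='apikey',    # The API key to authenticate with Hopsworks
)
fs = project.get_feature_store()

# "transactions" and version 1 are hypothetical; use an existing feature group in your project
fg = fs.get_feature_group("transactions", version=1)
df = fg.read()   # On Databricks this returns a Spark DataFrame
df.show(5)
```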