Commit 7a0ff91

Refactor Feast Helm charts for better end user install experience (#533)
* Refactor Feast Helm charts:
  1. Use `--spring.config.additional-location` to specify the Spring application config, so the default `application.yaml` bundled in the jar can be used and users only need to override the config they need. This prevents chart users from being overwhelmed by the amount of config to set.
  2. Use less templating, if/else, and functions in `configmap.yaml` in the Feast charts. Instead, set reasonable defaults and let users override them in `application-override.yaml`. This makes the charts easier to manage when the `application.yaml` structure changes, and documentation for `application.yaml` can be delegated to:
     - https://github.com/gojek/feast/blob/master/core/src/main/resources/application.yml
     - https://github.com/gojek/feast/blob/master/serving/src/main/resources/application.yml
  3. Remove double nesting of charts. Previously the Postgresql and Kafka subcharts were bundled in the Feast Core subchart; now all 3 subcharts are sibling charts. This reduces coupling between charts, and it is easier to configure chart values one at a time than in one big values YAML. Documentation for third-party subcharts can also be delegated to the original authors.
  4. Use [helm-docs](https://github.com/norwoodj/helm-docs) to generate the README for the Feast charts, so the documentation is reproducible and easier to keep up to date when `values.yaml` changes. It is also easier to follow best practices when writing `values.yaml`.
  5. Add prometheus and grafana charts to support out-of-the-box metrics collection and visualization when users install Feast with this chart.
  6. Add tests for users to verify the installation.
  7. Add examples to install Feast with 3 different profiles:
     - online serving only with DirectRunner
     - online and batch serving with DirectRunner
     - online and batch serving with DataflowRunner
* Update docs
* Remove unneeded OWNERS file for 3rd party charts
* Move .helmdocsignore to the correct location; helm-docs looks for it in the repo root folder
* Add test prefix for projects created in helm test
* Add missing kafka port in example feast stream config; fix incorrectly swapped resources requests and limits
* Fix typo
* Add application-secret.yaml; make it optional to use feast-core.postgresql.existingSecret; use JAVA_TOOL_OPTIONS instead of JAVA_OPTS. Also adjust CPU request and memory limit for test pods to fit the actual usage
* Update README and configuration so it works with Feast v0.5, but not Feast v0.4
* Fix typo
* Add 4 different application configs with a flag toggle for Feast Core and Serving:
  - application.yaml: the default application.yaml bundled in the jar
  - application-generated.yaml: additional config generated by Helm that is valid when dependencies like Postgres, Kafka and Redis are installed with their default config
  - application-secret.yaml: config to override the default and Helm-generated config
  - application-override.yaml: same as application-secret.yaml, but the config is created as a ConfigMap instead of a Secret and has higher precedence than application-secret.yaml
* Update apiVersion for the deployment in prometheus-statsd-exporter so it works with the latest Kubernetes version
* Clean up unused template functions and add a checksum to the configmap and secret, so the deployment is updated whenever they change
* Remove 3rd party charts from the charts folder
* Update example values for compatibility with Feast v0.5; add helm documentation for .Values.gcpProjectId
1 parent 07b8bdf commit 7a0ff91

52 files changed

Lines changed: 1667 additions & 1142 deletions

.helmdocsignore

Lines changed: 6 additions & 0 deletions
```diff
@@ -0,0 +1,6 @@
+infra/charts/feast/charts/postgresql
+infra/charts/feast/charts/kafka
+infra/charts/feast/charts/redis
+infra/charts/feast/charts/prometheus-statsd-exporter
+infra/charts/feast/charts/prometheus
+infra/charts/feast/charts/grafana
```

infra/charts/feast/Chart.yaml

Lines changed: 2 additions & 2 deletions
```diff
@@ -1,4 +1,4 @@
 apiVersion: v1
-description: A Helm chart to install Feast on kubernetes
+description: Feature store for machine learning.
 name: feast
-version: 0.4.4
+version: 0.5.0-alpha.1
```

infra/charts/feast/README.md

Lines changed: 356 additions & 228 deletions
Large diffs are not rendered by default.
Lines changed: 354 additions & 0 deletions
{{ template "chart.header" . }}

{{ template "chart.description" . }} {{ template "chart.versionLine" . }}

## TL;DR;

```bash
# Add the Feast Helm chart repository
helm repo add feast-charts https://feast-charts.storage.googleapis.com
helm repo update

# Create a secret for the Feast database, replacing <your-password> with the desired value
kubectl create secret generic feast-postgresql \
  --from-literal=postgresql-password=<your-password>

# Install Feast with Online Serving and Beam DirectRunner
helm install --name myrelease feast-charts/feast \
  --set feast-core.postgresql.existingSecret=feast-postgresql \
  --set postgresql.existingSecret=feast-postgresql
```
## Introduction
This chart installs Feast on a Kubernetes cluster using the [Helm](https://v2.helm.sh/docs/using_helm/#installing-helm) package manager.

## Prerequisites
- Kubernetes 1.12+
- Helm 2.15+ (not tested with Helm 3)
- Persistent Volume support on the underlying infrastructure

{{ template "chart.requirementsSection" . }}

{{ template "chart.valuesSection" . }}
## Configuration and installation details

The default configuration installs Feast with Online Serving. Ingestion
of features uses the Beam [DirectRunner](https://beam.apache.org/documentation/runners/direct/),
which runs in the same container as Feast Core.

```bash
# Create a secret for the Feast database, replacing <your-password> accordingly
kubectl create secret generic feast-postgresql \
  --from-literal=postgresql-password=<your-password>

# Install Feast with Online Serving and Beam DirectRunner
helm install --name myrelease feast-charts/feast \
  --set feast-core.postgresql.existingSecret=feast-postgresql \
  --set postgresql.existingSecret=feast-postgresql
```

To verify that the installation is successful:
```bash
helm test myrelease

# If the installation is successful, the following should be printed
RUNNING: myrelease-feast-online-serving-test
PASSED: myrelease-feast-online-serving-test
RUNNING: myrelease-grafana-test
PASSED: myrelease-grafana-test
RUNNING: myrelease-test-topic-create-consume-produce
PASSED: myrelease-test-topic-create-consume-produce

# Once the tests complete, check the logs
kubectl logs myrelease-feast-online-serving-test
```

> The test pods can be safely deleted after the tests finish.
> Check the YAML files in the `templates/tests/` folder to see the processes
> the test pods execute.

### Feast metrics

The default Feast installation includes Grafana, a StatsD exporter and Prometheus. Request
metrics from Feast Core and Feast Serving, as well as ingestion statistics from
Feast Ingestion, are accessible from Prometheus and the Grafana dashboard. The following
shows a quick example of how to access the metrics.

```bash
# Forward local port 9090 to the Prometheus server pod
kubectl port-forward svc/myrelease-prometheus-server 9090:80
```

Visit http://localhost:9090 to access the Prometheus server:

![Prometheus Server](files/img/prometheus-server.png?raw=true)
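
Besides the web UI, metrics can be pulled programmatically over Prometheus's HTTP API through the same forwarded port. A minimal sketch, assuming the port-forward above is active; `up` is a built-in metric (1 per healthy scrape target), and any PromQL expression can be substituted:

```shell
# Query the Prometheus HTTP API through the forwarded port;
# "up" returns one sample per scrape target, value 1 = healthy.
curl -s 'http://localhost:9090/api/v1/query?query=up'

# List all metric names Prometheus has scraped so far
curl -s 'http://localhost:9090/api/v1/label/__name__/values'
```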

### Enable Batch Serving

To install Feast Batch Serving for retrieval of historical features in offline
training, access to BigQuery is required. First, create a [service account](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) key that
will provide the credentials to access BigQuery. Grant the service account the `editor`
role so it has write permissions to BigQuery and Cloud Storage.

> In production, it is advised to grant the service account only the required [permissions](foo-feast-batch-serving-test),
> rather than the very permissive `editor` role.

Create a Kubernetes secret for the service account JSON file:
```bash
# By default Feast expects the secret to be named "feast-gcp-service-account"
# and the JSON file to be named "credentials.json"
kubectl create secret generic feast-gcp-service-account --from-file=credentials.json
```

Create a new Cloud Storage bucket (if it does not exist) and make sure the service
account has write access to the bucket:
```bash
gsutil mb gs://<bucket_name>
```
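
The write-access step can also be granted explicitly. A minimal sketch; the service account e-mail placeholder and the `roles/storage.objectAdmin` role choice are assumptions, not chart defaults, so adjust them to your setup:

```shell
# Grant the service account object read/write access on the staging bucket.
# <service_account_email> is a placeholder; the role choice is an assumption.
gsutil iam ch \
  serviceAccount:<service_account_email>:roles/storage.objectAdmin \
  gs://<bucket_name>
```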

Use the following Helm values to enable Batch Serving:
```yaml
# values-batch-serving.yaml
feast-core:
  gcpServiceAccount:
    enabled: true
  postgresql:
    existingSecret: feast-postgresql

feast-batch-serving:
  enabled: true
  gcpServiceAccount:
    enabled: true
  application-override.yaml:
    feast:
      active_store: historical
      stores:
      - name: historical
        type: BIGQUERY
        config:
          project_id: <google_project_id>
          dataset_id: <bigquery_dataset_id>
          staging_location: gs://<bucket_name>/feast-staging-location
          initial_retry_delay_seconds: 3
          total_timeout_seconds: 21600
        subscriptions:
        - name: "*"
          project: "*"
          version: "*"

postgresql:
  existingSecret: feast-postgresql
```

> To delete the previous release, run `helm delete --purge myrelease`.
> Note this will not delete the persistent volumes that have been claimed (PVCs).
> In a test cluster, run `kubectl delete pvc --all` to delete all claimed PVCs.

```bash
# Install a new release
helm install --name myrelease -f values-batch-serving.yaml feast-charts/feast

# Wait until all pods are created and running/completed (can take about 5m)
kubectl get pods

# Batch Serving is installed, so `helm test` will also test batch retrieval
helm test myrelease
```

### Use DataflowRunner for ingestion

Apache Beam [DirectRunner](https://beam.apache.org/documentation/runners/direct/)
is not suitable for production use because it is not easy to scale the
number of workers and there is no convenient API to monitor and manage them.
Feast also supports [DataflowRunner](https://beam.apache.org/documentation/runners/dataflow/), a managed service on Google Cloud.

> Make sure the `feast-gcp-service-account` Kubernetes secret containing the
> service account has been created, and that the service account has permissions
> to manage Dataflow jobs.

Since Dataflow workers run outside the Kubernetes cluster and need to interact
with the Kafka brokers, Redis stores and StatsD server installed in the cluster,
these services need to be exposed for access outside the cluster by setting
`service.type: LoadBalancer`.

In a typical use case, 5 `LoadBalancer` (internal) IP addresses are required by
Feast when running with `DataflowRunner`. In Google Cloud, these internal IP
addresses should be reserved first:
```bash
# Check with your network configuration which IP addresses are available for use
gcloud compute addresses create \
  feast-kafka-1 feast-kafka-2 feast-kafka-3 feast-redis feast-statsd \
  --region <region> --subnet <subnet> \
  --addresses 10.128.0.11,10.128.0.12,10.128.0.13,10.128.0.14,10.128.0.15
```
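
Before copying the addresses into the Helm values, the reservations can be double-checked. A sketch; the `name~'^feast-'` filter simply matches the names created above:

```shell
# List the reserved internal addresses and their IPs
gcloud compute addresses list --filter="name~'^feast-'" --regions <region>
```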

Use the following Helm values to enable DataflowRunner (and Batch Serving),
replacing the `<*load_balancer_ip*>` tags with the IP addresses reserved above:

```yaml
# values-dataflow-runner.yaml
feast-core:
  gcpServiceAccount:
    enabled: true
  postgresql:
    existingSecret: feast-postgresql
  application-override.yaml:
    feast:
      stream:
        options:
          bootstrapServers: <kafka_service_load_balancer_ip_address_1:31090>
      jobs:
        active_runner: dataflow
        metrics:
          host: <prometheus_statsd_exporter_load_balancer_ip_address>
        runners:
        - name: dataflow
          type: DataflowRunner
          options:
            project: <google_project_id>
            region: <dataflow_regional_endpoint e.g. asia-east1>
            zone: <google_zone e.g. asia-east1-a>
            tempLocation: <gcs_path_for_temp_files e.g. gs://bucket/tempLocation>
            network: <google_cloud_network_name>
            subnetwork: <google_cloud_subnetwork_path e.g. regions/asia-east1/subnetworks/mysubnetwork>
            maxNumWorkers: 1
            autoscalingAlgorithm: THROUGHPUT_BASED
            usePublicIps: false
            workerMachineType: n1-standard-1
            deadLetterTableSpec: <bigquery_table_spec_for_deadletter e.g. project_id:dataset_id.table_id>

feast-online-serving:
  application-override.yaml:
    feast:
      stores:
      - name: online
        type: REDIS
        config:
          host: <redis_service_load_balancer_ip_address>
          port: 6379
        subscriptions:
        - name: "*"
          project: "*"
          version: "*"

feast-batch-serving:
  enabled: true
  gcpServiceAccount:
    enabled: true
  application-override.yaml:
    feast:
      active_store: historical
      stores:
      - name: historical
        type: BIGQUERY
        config:
          project_id: <google_project_id>
          dataset_id: <bigquery_dataset_id>
          staging_location: gs://<bucket_name>/feast-staging-location
          initial_retry_delay_seconds: 3
          total_timeout_seconds: 21600
        subscriptions:
        - name: "*"
          project: "*"
          version: "*"

postgresql:
  existingSecret: feast-postgresql

kafka:
  external:
    enabled: true
    type: LoadBalancer
    annotations:
      cloud.google.com/load-balancer-type: Internal
    loadBalancerSourceRanges:
    - 10.0.0.0/8
    - 172.16.0.0/12
    - 192.168.0.0/16
    firstListenerPort: 31090
    loadBalancerIP:
    - <kafka_service_load_balancer_ip_address_1>
    - <kafka_service_load_balancer_ip_address_2>
    - <kafka_service_load_balancer_ip_address_3>
  configurationOverrides:
    "advertised.listeners": |-
      EXTERNAL://${LOAD_BALANCER_IP}:31090
    "listener.security.protocol.map": |-
      PLAINTEXT:PLAINTEXT,EXTERNAL:PLAINTEXT
    "log.retention.hours": 1

redis:
  master:
    service:
      type: LoadBalancer
      loadBalancerIP: <redis_service_load_balancer_ip_address>
      annotations:
        cloud.google.com/load-balancer-type: Internal
      loadBalancerSourceRanges:
      - 10.0.0.0/8
      - 172.16.0.0/12
      - 192.168.0.0/16

prometheus-statsd-exporter:
  service:
    type: LoadBalancer
    annotations:
      cloud.google.com/load-balancer-type: Internal
    loadBalancerSourceRanges:
    - 10.0.0.0/8
    - 172.16.0.0/12
    - 192.168.0.0/16
    loadBalancerIP: <prometheus_statsd_exporter_load_balancer_ip_address>
```

```bash
# Install a new release
helm install --name myrelease -f values-dataflow-runner.yaml feast-charts/feast

# Wait until all pods are created and running/completed (can take about 5m)
kubectl get pods

# Test the installation
helm test myrelease
```

If the tests are successful, Dataflow jobs running feature ingestion should
appear in the Google Cloud console: https://console.cloud.google.com/dataflow

![Dataflow Jobs](files/img/dataflow-jobs.png)

### Production configuration

#### Resources requests

The `resources` field in the deployment spec is left empty in the examples. In
production these should be set according to the load each service is expected
to handle and the service level objectives (SLOs). Also, Feast Core and Serving
are Java applications, and it is [good practice](https://stackoverflow.com/a/6916718/3949303)
to set the minimum and maximum heap sizes. The following is an example of reasonable values for Feast Serving:

```yaml
feast-online-serving:
  javaOpts: "-Xms2048m -Xmx2048m"
  resources:
    limits:
      memory: "2048Mi"
    requests:
      memory: "2048Mi"
      cpu: "1"
```
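
Concrete numbers are best derived from observed usage. A minimal sketch, assuming the cluster runs metrics-server (not installed by this chart); the pod name is a placeholder:

```shell
# Show current CPU/memory usage of the pods as a baseline for requests/limits
kubectl top pods

# Inspect the requests and limits currently applied to a pod
kubectl describe pod <feast-online-serving-pod-name> | grep -A4 -i limits
```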

#### High availability

The default Feast installation only configures a single Redis server
instance. If Redis goes down, due to network failures or out-of-memory errors,
Feast Serving will fail to respond to requests. Soon, Feast will support
highly available Redis via [Redis Cluster](https://redis.io/topics/cluster-tutorial),
Sentinel or additional proxies.

### Documentation development

This `README.md` is generated using [helm-docs](https://github.com/norwoodj/helm-docs/).
Please run `helm-docs` to regenerate the `README.md` every time `README.md.gotmpl`
or `values.yaml` is updated.

infra/charts/feast/charts/feast-core/.helmignore

Lines changed: 0 additions & 22 deletions
This file was deleted.
Lines changed: 2 additions & 2 deletions
```diff
@@ -1,4 +1,4 @@
 apiVersion: v1
-description: A Helm chart for core component of Feast
+description: Feast Core registers feature specifications and manages ingestion jobs.
 name: feast-core
-version: 0.4.4
+version: 0.5.0-alpha.1
```
