This repo showcases how Geode can be used in a simple streaming data pipeline. The source and sink apps are Spring Cloud Stream apps, and the processor is a simple Spring Boot app that enriches the data flowing through the pipeline with values pulled from a lookup against Geode.
In this demo we will set up a local cluster with minikube and deploy a pipeline that reads from a file source and enriches each payload with data pulled from Geode. We use Micrometer to capture throughput and count metrics.
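To make the enrichment step concrete, here is a minimal sketch of what such a processor could look like. This is not the repo's actual code: the CSV layout, the key position, and the Region bean wiring (via Spring Data for Apache Geode) are assumptions; only the telemetryRegion name matches the region created later in this guide.

import java.util.function.Function;

import org.apache.geode.cache.Region;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class EnrichmentProcessorApplication {

    public static void main(String[] args) {
        SpringApplication.run(EnrichmentProcessorApplication.class, args);
    }

    // Spring Cloud Stream binds a Function bean to the input and output
    // Kafka topics; every inbound record is passed through this function.
    @Bean
    public Function<String, String> enrich(Region<String, String> telemetryRegion) {
        return line -> {
            String key = line.split(",")[0];          // assume the first CSV field is the lookup key
            String lookup = telemetryRegion.get(key); // side lookup against the Geode region
            return line + "," + lookup;               // append the enrichment to the payload
        };
    }
}

A function like this would also be a natural place to hang the Micrometer counters behind the throughput and count metrics mentioned above.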
You will need the following installed on your local workstation:
On macOS:
$ brew install kubernetes-cli
$ brew install --cask minikube
Note that you will need to have a hypervisor installed; if you don't, check out the Kubernetes guide to installing a hypervisor.
We also recommend k9s for exploring the cluster: https://github.com/derailed/k9s
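k9s is optional, but it makes poking around the cluster much easier. On macOS it can typically be installed with Homebrew:
$ brew install k9s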
$ minikube start --cpus 4 --memory 8096 --vm-driver=hyperkit
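Once the command finishes, you can verify that the cluster is up with:
$ minikube status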
Explore the k8s cluster with:
$ k9s
Note that selecting a container in k9s and pressing s will ssh you into that container, while pressing l will display its logs.
$ cd k8s
$ kubectl apply -f geode
$ kubectl apply -f kafka
$ kubectl apply -f prometheus
$ kubectl apply -f grafana
$ kubectl apply -f geode-stream.yml
After each step, confirm that the pods are deployed. This may take a while, as each container image needs to be downloaded.
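One convenient way to do this without k9s is to watch the pods until every one reports Running:
$ kubectl get pods --watch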
Congratulations! Your infrastructure should now be up and running in Kubernetes. Now let's populate Geode with some lookup data...
$ kubectl cp geode/data/1mil.gfd server-0:/tmp/1mil.gfd
If you're using k9s, select the locator and press s. If you are not using k9s, run the following command, replacing the value in curly braces with the actual name of your locator pod:
$ kubectl exec -ti {locator-0-pod-name} bash
$ gfsh
gfsh > connect --locator=locator-0[10334] --jmx-manager=locator-0[1099]
gfsh > create region --name=telemetryRegion --type=REPLICATE
gfsh > import data --region=telemetryRegion --file=/tmp/1mil.gfd --member=server-0
Note that the file has to be on the member you're pointing to; in this case, /tmp/1mil.gfd was copied onto server-0.
gfsh > query --query='select count(*) from /telemetryRegion'
$ kubectl port-forward {grafana-pod-name} 3000
Then open http://localhost:3000 in your browser and log in with these credentials:
user: admin
password: password
Look for an "Import" option on the menu. For a quickstart, import our sample json dashboard, which you can find under the /k8s/grafana/dashboards/
path.
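If the dashboard stays empty, it can help to confirm that Prometheus is scraping the apps. Port-forward it the same way (9090 is Prometheus's default port) and check http://localhost:9090/targets:
$ kubectl port-forward {prometheus-pod-name} 9090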
The file source app is a Spring Cloud Stream app that:
- looks for files in the /tmp folder
- reads files dropped into /tmp line by line
- publishes each line to a Kafka topic
We will copy a file that contains 1 million sample telemetry records into the file source app. The file source app will begin streaming these records into a Kafka topic, where they will be consumed by our Geode processor application, enriched with data stored in Geode, and published to another Kafka topic. You should begin seeing the total count and throughput in Grafana as the data streams through the pipeline.
$ kubectl cp geode/data/1mil_telemetry.txt {file-source-pod-name}:/tmp/foo/1.txt
You can also view the data going through the processor and sink by watching the logs.
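For example, assuming you substitute your actual pod name for the placeholder, you can follow the processor's logs with:
$ kubectl logs -f {processor-pod-name}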
Since the pipeline runs on Kubernetes, you can try it in your preferred cloud provider. Try scaling the Geode locators and servers and see how that affects your throughput. Have fun with it!
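For instance, assuming the server StatefulSet is simply named server (as the server-0 pod name above suggests), you could scale it with:
$ kubectl scale statefulset server --replicas=3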
$ kubectl delete deployment --all
$ kubectl delete statefulset --all
To tear down minikube:
$ minikube delete