
Commit bf9d717

Docs on DBeaver

1 parent 64ca156 commit bf9d717

File tree

8 files changed: +49 −1 lines changed
Diff for: docs/modules/ROOT/pages/demos/data-warehouse-iceberg-trino-spark.adoc

+49 −1
@@ -3,7 +3,7 @@
[WARNING]
====
This demo uses a significant amount of resources. It will most likely not run on your workstation.
-It was developed and tested on 9 nodes with 4 cores (8 threads), 20GB RAM and 30GB hdd disks.
+It was developed and tested on 10 nodes with 4 cores (8 threads), 20GB RAM and 30GB hdd disks.
Additionally, persistent volumes with a total size of approx. 500GB will be created.
A smaller version of this demo might be created in the future.
====
@@ -147,3 +147,51 @@ On the right side are three strands, that
3. Fetch the current shared bike status

For details on the NiFi workflow ingesting water-level data, please read the xref:demos/nifi-kafka-druid-water-level-data.adoc#_nifi[nifi-kafka-druid-water-level-data documentation on NiFi].

== Spark

https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html[Spark Structured Streaming] is used to stream data from Kafka into the warehouse.
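
As a rough illustration of the technique, the following is a minimal sketch of a Structured Streaming job reading from Kafka and appending to an Iceberg table. It is not the demo's actual code; the bootstrap servers, topic name, schema, checkpoint location and table name are all assumptions chosen for illustration.

[source,python]
----
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

# The Iceberg catalog configuration (extensions, catalog implementation,
# warehouse location) is assumed to be supplied via spark-defaults.
spark = SparkSession.builder.appName("ingest-into-warehouse").getOrCreate()

# Hypothetical schema of the JSON measurements on the Kafka topic.
schema = StructType([
    StructField("station", StringType()),
    StructField("value", DoubleType()),
    StructField("timestamp", TimestampType()),
])

# Subscribe to the topic as an unbounded stream and parse the JSON payload.
measurements = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")  # assumed service address
    .option("subscribe", "water_levels")              # assumed topic name
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("m"))
    .select("m.*")
)

# Continuously append the parsed records to an Iceberg table;
# the checkpoint location lets the query resume after restarts.
query = (
    measurements.writeStream
    .format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "/checkpoints/water-levels")  # assumed path
    .toTable("warehouse.water_levels")                          # assumed table
)
query.awaitTermination()
----

In the demo itself such jobs are not run by hand but submitted through the Stackable Spark operator, hence the `spark-ingest-into-warehouse-...-driver` pod used in the port-forward command below.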

To access the Spark WebUI, run the following command to forward port 4040 to your local machine:

[source,console]
----
# The grep looks up the driver pod of the ingestion job, which serves the WebUI
kubectl port-forward $(kubectl get pod -o name | grep 'spark-ingest-into-warehouse-.*-driver') 4040
----

Afterwards you can reach the web interface at http://localhost:4040.

image::demo-data-warehouse-iceberg-trino-spark/spark_1.png[]

The UI shows the most recent Spark jobs.
Each running Structured Streaming job internally creates many Spark jobs.

Click on the `Structured Streaming` tab to see the running streaming jobs.

image::demo-data-warehouse-iceberg-trino-spark/spark_2.png[]

Five streaming jobs are currently running.
The job with the highest throughput is the `ingest water_level measurements` job.
Click on its `Run ID`, highlighted in blue.

image::demo-data-warehouse-iceberg-trino-spark/spark_3.png[]

== Trino
Trino is used to enable SQL access to the data.

=== View WebUI
Open the `coordinator-https` endpoint of the `trino` service from your `stackablectl services list` command output (https://212.227.224.138:30876 in this case).

image::demo-data-warehouse-iceberg-trino-spark/trino_1.png[]

Log in with the username `admin` and password `admin`.

image::demo-data-warehouse-iceberg-trino-spark/trino_2.png[]
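
The WebUI is for monitoring queries; to actually run SQL you can use DBeaver (see below) or any Trino client. As a minimal sketch, here is how a query could be issued with the https://github.com/trinodb/trino-python-client[trino-python-client], using the endpoint and credentials from above; skipping TLS verification is an assumption owed to the demo's self-signed certificate.

[source,python]
----
import trino

# Connect with the demo credentials; verify=False skips TLS verification,
# assuming the endpoint uses a self-signed certificate.
conn = trino.dbapi.connect(
    host="212.227.224.138",
    port=30876,
    user="admin",
    http_scheme="https",
    auth=trino.auth.BasicAuthentication("admin", "admin"),
    verify=False,
)

cursor = conn.cursor()
cursor.execute("SHOW CATALOGS")  # list the catalogs Trino exposes
for (catalog,) in cursor.fetchall():
    print(catalog)
----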

=== Connect with DBeaver
https://dbeaver.io/[DBeaver] is a free, multi-platform database tool that can be used to connect to Trino.
Please have a look at the <TODO> trino-operator documentation on how to connect DBeaver to Trino; in short, this boils down to creating a Trino connection with a JDBC URL of the form `jdbc:trino://<host>:<port>`.

image::demo-data-warehouse-iceberg-trino-spark/dbeaver_1.png[]

image::demo-data-warehouse-iceberg-trino-spark/dbeaver_2.png[]
