
Commit 9538515: Docs on DBeaver
1 parent 64ca156

9 files changed: +61 -1 lines changed
docs/modules/ROOT/pages/demos/data-warehouse-iceberg-trino-spark.adoc (+61 -1)
@@ -3,7 +3,7 @@
[WARNING]
====
This demo uses a significant amount of resources. It will most likely not run on your workstation.
-It was developed and tested on 9 nodes with 4 cores (8 threads), 20GB RAM and 30GB HDD disks.
+It was developed and tested on 10 nodes with 4 cores (8 threads), 20GB RAM and 30GB HDD disks.
Additionally, persistent volumes with a total size of approx. 500GB will be created.
A smaller version of this demo might be created in the future.
====
@@ -147,3 +147,63 @@ On the right side are three strands, that
3. Fetch the current shared bike status

For details on the NiFi workflow ingesting water-level data, please read the xref:demos/nifi-kafka-druid-water-level-data.adoc#_nifi[nifi-kafka-druid-water-level-data documentation on NiFi].
== Spark

https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html[Spark Structured Streaming] is used to stream data from Kafka into the warehouse.
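The demo's actual ingest job is not reproduced in this document, but as a rough orientation, a minimal PySpark sketch of such a Kafka-to-Iceberg streaming ingest is shown below. The broker address, topic, and table names are hypothetical placeholders, not the demo's real configuration:

[source,python]
----
# Minimal sketch of a Kafka -> Iceberg streaming ingest.
# NOT the demo's actual job; broker, topic and table names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("ingest-into-warehouse").getOrCreate()

# Read records from Kafka as an unbounded stream.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")  # placeholder broker
    .option("subscribe", "measurements")              # placeholder topic
    .load()
)

# Kafka delivers the payload as binary; cast it to a string for parsing.
parsed = raw.select(col("value").cast("string").alias("json"))

# Append each micro-batch to an Iceberg table; the checkpoint location
# lets the query resume where it left off after a restart.
query = (
    parsed.writeStream.format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/measurements")
    .toTable("warehouse.measurements")                # placeholder table
)
query.awaitTermination()
----

Each such streaming query shows up as a separate job in the WebUI described next.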
To access the Spark WebUI, run the following command to port-forward port 4040 to your local machine:

[source,console]
----
kubectl port-forward $(kubectl get pod -o name | grep 'spark-ingest-into-warehouse-.*-driver') 4040
----

Afterwards you can reach the web interface at http://localhost:4040.
image::demo-data-warehouse-iceberg-trino-spark/spark_1.png[]

The UI shows the most recent jobs.
Each running Structured Streaming job internally creates a large number of Spark jobs.

Click on the `Structured Streaming` tab to see the running streaming jobs.
image::demo-data-warehouse-iceberg-trino-spark/spark_2.png[]

Five streaming jobs are currently running.
The job with the highest throughput is the `ingest water_level measurements` job.
Click on its blue highlighted `Run ID`.

image::demo-data-warehouse-iceberg-trino-spark/spark_3.png[]
== Trino

Trino is used to enable SQL access to the data.

=== View WebUI

Open up the given `trino` endpoint `coordinator-https` from your `stackablectl services list` command output (https://212.227.224.138:30876 in this case).
image::demo-data-warehouse-iceberg-trino-spark/trino_1.png[]

Log in with the username `admin` and password `admin`.

image::demo-data-warehouse-iceberg-trino-spark/trino_2.png[]
=== Connect with DBeaver

https://dbeaver.io/[DBeaver] is a free, multi-platform database tool that can be used to connect to Trino.
Please have a look at the <TODO> trino-operator documentation on how to connect DBeaver to Trino.

image::demo-data-warehouse-iceberg-trino-spark/dbeaver_1.png[]

image::demo-data-warehouse-iceberg-trino-spark/dbeaver_2.png[]
You need to modify the setting `TLS` to `true`.
Additionally, you need to add the setting `SSLVerification` and set it to `NONE`.
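If you prefer configuring the connection via the JDBC URL instead of DBeaver's settings dialog, the https://trino.io/docs/current/client/jdbc.html[Trino JDBC driver] accepts the equivalent parameters `SSL` and `SSLVerification` directly. A sketch, reusing the example endpoint from above (whether your DBeaver version labels the first option `TLS` or `SSL` may differ):

[source,text]
----
jdbc:trino://212.227.224.138:30876?SSL=true&SSLVerification=NONE
----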
image::demo-data-warehouse-iceberg-trino-spark/dbeaver_3.png[]

Here you can see all the available Trino catalogs (a quick query to verify them follows after the list):
* `staging`: The staging area containing raw data in various data formats such as CSV or Parquet
* `system`: Internal catalog to retrieve Trino internals
* `tpcds`: https://trino.io/docs/current/connector/tpcds.html[TPCDS connector] providing a set of schemas to support the http://www.tpc.org/tpcds/[TPC Benchmark™ DS]
* `tpch`: https://trino.io/docs/current/connector/tpch.html[TPCH connector] providing a set of schemas to support the http://www.tpc.org/tpch/[TPC Benchmark™ H]
* `warehouse`: The warehouse area containing the enriched, performantly accessible data
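As a quick sanity check from DBeaver's SQL editor (or any other Trino client), you can list these catalogs and query one of the built-in benchmark schemas. The statements below only use standard connector objects; since the `tpch` connector generates its data on the fly, they need no prior ingestion:

[source,sql]
----
-- List the catalogs described above.
SHOW CATALOGS;

-- The tpch connector serves generated data, so this works without setup.
SELECT name, regionkey
FROM tpch.tiny.nation
LIMIT 5;
----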
