|
[WARNING]
====
This demo uses a significant amount of resources. It will most likely not run on your workstation.

It was developed and tested on 10 nodes with 4 cores (8 threads), 20GB RAM and 30GB HDD disks.
Additionally, persistent volumes with a total size of approx. 500GB will be created.

A smaller version of this demo might be created in the future.
====
3. Fetch the current shared bike status

For details on the NiFi workflow ingesting water-level data please read the xref:demos/nifi-kafka-druid-water-level-data.adoc#_nifi[nifi-kafka-druid-water-level-data documentation on NiFi].

== Spark

https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html[Spark Structured Streaming] is used to stream data from Kafka into the warehouse.
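
As a quick sanity check, you can verify that the streaming application is running. This is a sketch: it assumes the demo deploys the ingestion job as a Stackable `SparkApplication` resource whose driver pod is named `spark-ingest-into-warehouse-...-driver`, matching the pod name used in the port-forward command below.

[source,console]
----
# List the Spark application running the streaming ingestion
# (assumes the Stackable SparkApplication custom resource is installed)
kubectl get sparkapplications

# Locate its driver pod, which also serves the WebUI used below
kubectl get pods | grep 'spark-ingest-into-warehouse'
----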

To access the Spark WebUI, run the following command to port-forward port 4040 to your local machine:

[source,console]
----
kubectl port-forward $(kubectl get pod -o name | grep 'spark-ingest-into-warehouse-.*-driver') 4040
----

Afterwards you can reach the web interface at http://localhost:4040.

image::demo-data-warehouse-iceberg-trino-spark/spark_1.png[]

The UI shows the most recent jobs.
Each running Structured Streaming job internally creates lots of Spark jobs.

Click on the `Structured Streaming` tab to see the running streaming jobs.

image::demo-data-warehouse-iceberg-trino-spark/spark_2.png[]

Five streaming jobs are currently running.
The job with the highest throughput is the `ingest water_level measurements` job.
Click on the blue highlighted `Run ID` of that job.

image::demo-data-warehouse-iceberg-trino-spark/spark_3.png[]

== Trino

Trino is used to enable SQL access to the data.

=== View WebUI

Open up the given `trino` endpoint `coordinator-https` from your `stackablectl services list` command output (https://212.227.224.138:30876 in this case).

image::demo-data-warehouse-iceberg-trino-spark/trino_1.png[]

Log in with the username `admin` and password `admin`.

image::demo-data-warehouse-iceberg-trino-spark/trino_2.png[]
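
Alternatively, you can query Trino from your terminal. A minimal sketch, assuming the https://trino.io/docs/current/client/cli.html[Trino CLI] is installed locally; substitute the endpoint from your own `stackablectl services list` output:

[source,console]
----
# --password makes the CLI prompt for the password (admin)
# --insecure skips TLS certificate verification (the demo uses a self-signed certificate)
trino --server https://212.227.224.138:30876 --user admin --password --insecure
----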

=== Connect with DBeaver

https://dbeaver.io/[DBeaver] is a free multi-platform database tool that can be used to connect to Trino.
Please have a look at the <TODO> trino-operator documentation on how to connect DBeaver to Trino.

image::demo-data-warehouse-iceberg-trino-spark/dbeaver_1.png[]

image::demo-data-warehouse-iceberg-trino-spark/dbeaver_2.png[]

You need to modify the setting `TLS` to `true`.
Additionally, you need to add the setting `SSLVerification` and set it to `NONE`.
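
For reference, these settings correspond to properties of the Trino JDBC driver; the resulting JDBC URL should look roughly like the following (host and port taken from the endpoint above; the exact property labels in the DBeaver UI may differ by driver version):

----
jdbc:trino://212.227.224.138:30876?SSL=true&SSLVerification=NONE
----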

image::demo-data-warehouse-iceberg-trino-spark/dbeaver_3.png[]

Here you can see all the available Trino catalogs.

* `staging`: The staging area containing raw data in various data formats such as CSV or Parquet
* `system`: Internal catalog to retrieve Trino internals
* `tpcds`: https://trino.io/docs/current/connector/tpcds.html[TPCDS connector] providing a set of schemas to support the http://www.tpc.org/tpcds/[TPC Benchmark™ DS]
* `tpch`: https://trino.io/docs/current/connector/tpch.html[TPCH connector] providing a set of schemas to support the http://www.tpc.org/tpch/[TPC Benchmark™ H]
* `warehouse`: The warehouse area containing the enriched data, stored for performant access
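
Beyond browsing in DBeaver, you can also list the catalogs via SQL. A minimal sketch using the Trino CLI connection from above (`SHOW CATALOGS` and `SHOW SCHEMAS` are standard Trino statements):

[source,console]
----
# List all configured catalogs
trino --server https://212.227.224.138:30876 --user admin --password --insecure \
  --execute 'SHOW CATALOGS'

# Drill into the warehouse catalog
trino --server https://212.227.224.138:30876 --user admin --password --insecure \
  --execute 'SHOW SCHEMAS FROM warehouse'
----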