
Commit 64ca156: Add MinIO and NiFi to docs
1 parent e70a65f
9 files changed, +111 -0 lines changed
Diff for: docs/modules/ROOT/pages/demos/data-warehouse-iceberg-trino-spark.adoc

@@ -36,3 +36,114 @@ This demo will
You can see the deployed products, as well as their relationships, in the following diagram:

image::demo-data-warehouse-iceberg-trino-spark/overview.png[]

== List deployed Stackable services

To list the installed Stackable services, run the following command:

[source,console]
----
$ stackablectl services list --all-namespaces
PRODUCT    NAME          NAMESPACE  ENDPOINTS                                           EXTRA INFOS

hive       hive          default    hive                 212.227.224.138:31022
                                    metrics              212.227.224.138:30459

hive       hive-iceberg  default    hive                 212.227.233.131:31511
                                    metrics              212.227.233.131:30003

kafka      kafka         default    metrics              217.160.118.190:32160
                                    kafka                217.160.118.190:31736

nifi       nifi          default    https                https://217.160.120.117:31499  Admin user: admin, password: adminadmin

opa        opa           default    http                 http://217.160.222.211:31767

superset   superset      default    external-superset    http://212.227.233.47:32393    Admin user: admin, password: admin

trino      trino         default    coordinator-metrics  212.227.224.138:30610
                                    coordinator-https    https://212.227.224.138:30876

zookeeper  zookeeper     default    zk                   212.227.224.138:32321

minio      minio         default    http                 http://217.160.222.211:32031   Third party service
                                    console-http         http://217.160.222.211:31429   Admin user: admin, password: adminadmin
----

[NOTE]
====
If a product instance has not finished starting yet, its service will have no endpoint.
Starting all of the product instances might take a considerable amount of time depending on your internet connectivity.
In case a product is not ready yet, a warning might be shown.
====
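
If you want to follow the startup progress yourself, you can watch the pods directly. A minimal sketch, assuming `kubectl` is configured against the demo cluster:

[source,console]
----
# Products are ready once all of their pods report Running and Ready
$ kubectl get pods --all-namespaces
----
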
== MinIO

=== List buckets

The S3-compatible storage provided by MinIO is used as persistent storage for all of the data in this demo.
Open the `minio` endpoint `console-http` retrieved by `stackablectl services list` in your browser (http://217.160.222.211:31429 in this case).

image::demo-data-warehouse-iceberg-trino-spark/minio_1.png[]

Log in with the username `admin` and password `adminadmin`.

image::demo-data-warehouse-iceberg-trino-spark/minio_2.png[]

Here you can see the two buckets the storage is split into (you can also list them from the command line, as sketched after this list):

1. `staging`: The demo loads static datasets into this bucket. The data is stored in different formats, such as CSV and Parquet. It contains the actual data tables as well as lookup tables.
2. `warehouse`: This bucket is where the cleaned and/or aggregated data resides. The data is stored in the https://iceberg.apache.org/[Apache Iceberg] table format.
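
A minimal command-line sketch, assuming the MinIO client `mc` is installed locally; the endpoint and credentials are taken from the `stackablectl services list` output above, and `demo-minio` is just a hypothetical alias name:

[source,console]
----
# Register the demo endpoint under a local alias
$ mc alias set demo-minio http://217.160.222.211:32031 admin adminadmin

# List the buckets
$ mc ls demo-minio
----
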
=== Inspect warehouse

Click on the blue `Browse` button on the bucket `warehouse`.

image::demo-data-warehouse-iceberg-trino-spark/minio_3.png[]

You can see multiple folders (called prefixes in S3), each containing a different dataset.

Click on the folder `house-sales`, then on the folder starting with `house-sales-*`, and then on `data`.

image::demo-data-warehouse-iceberg-trino-spark/minio_4.png[]

As you can see, the table `house-sales` is partitioned by year.
Go ahead and click on any folder.

image::demo-data-warehouse-iceberg-trino-spark/minio_5.png[]

You can see that Trino has placed a single file here containing all the house sales of that particular year.
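
The same layout can be inspected from the command line. A minimal sketch, reusing the hypothetical `demo-minio` alias from above; the exact `house-sales-*` prefix contains a generated suffix, so the path below is illustrative:

[source,console]
----
# Recursively list the data files of the house-sales table
$ mc ls --recursive demo-minio/warehouse/house-sales/
----
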
== NiFi

NiFi is used to fetch multiple data sources from the internet and ingest the data into Kafka in near real-time.
Some of the data sources are downloaded statically (e.g. as CSV files), while others are fetched via APIs, such as REST APIs.
This includes the following data sources (see the curl sketch after this list for a way to sample one of them):

* https://www.pegelonline.wsv.de/webservice/guideRestapi[Water level measurements in Germany] (real-time)
* https://mobidata-bw.de/dataset/bikesh[Shared bikes in Germany] (real-time)
* https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads[House sales in the UK] (static)
* https://www.usgs.gov/programs/earthquake-hazards/earthquakes[Registered earthquakes worldwide] (static)
* https://mobidata-bw.de/dataset/e-ladesaulen[E-charging stations in Germany] (static)
* https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page[New York taxi data] (static)
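
To get a feel for what NiFi ingests, you can query the water-level source directly. A minimal sketch; the endpoint path is an assumption based on the PEGELONLINE REST API guide linked above and may differ in detail:

[source,console]
----
# Fetch the list of measuring stations as JSON
$ curl -s https://www.pegelonline.wsv.de/webservices/rest-api/v2/stations.json
----
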
=== View ingestion jobs

You can have a look at the ingestion jobs running in NiFi by opening the `nifi` endpoint `https` from your `stackablectl services list` command output (https://217.160.120.117:31499 in this case).
If you get a warning regarding the self-signed certificate generated by the xref:secret-operator::index.adoc[Secret Operator] (e.g. `Warning: Potential Security Risk Ahead`), you have to tell your browser to trust the website and continue.

image::demo-data-warehouse-iceberg-trino-spark/nifi_1.png[]

Log in with the username `admin` and password `adminadmin`.

image::demo-data-warehouse-iceberg-trino-spark/nifi_2.png[]

As you can see, the NiFi workflow consists of lots of components.
You can zoom in using your mouse and mouse wheel.
On the left side are two strands that

1. Fetch the list of known water-level stations and ingest it into Kafka.
2. Continuously run a loop fetching the measurements of the last 30 for every measuring station and ingesting them into Kafka.

On the right side are three strands that

1. Fetch the current station information of the shared bikes.
2. Fetch the current station status of the shared bikes.
3. Fetch the current bike status of the shared bikes.

For details on the NiFi workflow ingesting water-level data, please read the xref:demos/nifi-kafka-druid-water-level-data.adoc#_nifi[nifi-kafka-druid-water-level-data documentation on NiFi].
