- Clone the repository: `git clone https://github.com/spicoflorin/trg-assessment.git`
- Extract the archive https://github.com/spicoflorin/trg-assessment/blob/main/resources/data/2019-01.7z into the folder trg-assessment/resources/data
- Run the command `./start-homework.sh`
- Step 3 starts the Apache Zeppelin notebook server, which is available on port 9080. If that port is not available, modify docker-compose.yaml to expose a free port instead. Then go to http://localhost:9080
- In the Zeppelin UI, open the trg-assessment notebook (http://localhost:9080/#/notebook/2GPY2C1BN)
- The first paragraph contains the ETL: it loads the police data from the CSV file and writes it in the requested Parquet file format. Run it with Shift+Enter
- The second paragraph loads the Parquet file into a Spark DataFrame. Run it.
- The KPIs are in the following paragraphs, each of which describes its purpose
- After the analysis, you can destroy the stack by running ./detroy-homework.sh
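
Before running the start script, it can help to confirm that port 9080 is actually free. The snippet below is a minimal sketch rather than part of the repository: the helper name `port_is_free` is hypothetical, and it only detects a port that something is already listening on, which is assumed to be enough for the Zeppelin case described above.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port.

    Hypothetical helper for illustration; not part of the repository.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when a listener accepted the connection,
        # which means the port is already taken.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    port = 9080  # port used by the Zeppelin server in the steps above
    state = "free" if port_is_free(port) else "already in use"
    print(f"Port {port} is {state}")
```

If the port is reported as already in use, pick another one and expose it in docker-compose.yaml as described in the steps.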