Calculating supermarket profit from transaction data using MapReduce and Apache Spark
Steps:
-
Install Apache Spark and build it with sbt, Maven, or a similar build tool. The project can be run in standalone mode or on a cluster. In my project, Apache Spark was built with sbt.
https://www.youtube.com/watch?v=eQ0nPdfVfc0
Video tutorials on YouTube are a good way to get familiar with Apache Spark.
-
Create a database of transactions for a day. To obtain a large number of random transactions, Python's random module can be used inside a loop.
-
Calculate the profit for the day and append it to a separate database. The profit is calculated by submitting the Python file to Spark (for example with spark-submit) for faster processing.
-
Repeat steps 2 and 3 in a loop for the number of days required.
-
Plot the graph of profit per day.
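The steps above can be sketched in plain Python as follows. This is only a minimal illustration, not the project's actual code: the catalogue, its prices, and the function names (generate_transactions, profit_of, daily_profit, simulate) are all assumptions made for the example, and an in-memory list stands in for the profit database. The map and reduce stages use the built-in map and functools.reduce so the sketch runs without a Spark installation.

```python
import random
from functools import reduce
from operator import add

# Hypothetical catalogue: item -> (cost price, selling price).
# These items and prices are made up for illustration only.
CATALOGUE = {
    "milk": (0.60, 0.90),
    "bread": (0.80, 1.50),
    "eggs": (1.20, 2.00),
}


def generate_transactions(count, rng):
    """Step 2: return `count` random (item, quantity) transactions for one day."""
    items = list(CATALOGUE)
    return [(rng.choice(items), rng.randint(1, 5)) for _ in range(count)]


def profit_of(transaction):
    """Map stage: profit contributed by a single transaction."""
    item, quantity = transaction
    cost, price = CATALOGUE[item]
    return (price - cost) * quantity


def daily_profit(transactions):
    """Reduce stage: sum the per-transaction profits for the day."""
    return reduce(add, map(profit_of, transactions), 0.0)


def simulate(days, transactions_per_day, seed=0):
    """Steps 2 to 4: generate and process one batch of transactions per day."""
    rng = random.Random(seed)
    profits = []  # stands in for the separate profit database
    for _ in range(days):
        transactions = generate_transactions(transactions_per_day, rng)
        profits.append(daily_profit(transactions))
    return profits


if __name__ == "__main__":
    results = simulate(days=7, transactions_per_day=1000)
    for day, profit in enumerate(results, start=1):
        print(f"day {day}: profit {profit:.2f}")
```

With PySpark, the same two stages would typically be written against an RDD, for example sc.parallelize(transactions).map(profit_of).reduce(add), where sc is a SparkContext. The list returned by simulate holds one value per day, so it can be passed directly to a plotting library for the final step.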
Reference links:
-
Apache Spark tutorials: https://www.tutorialspoint.com/apache_spark/
-
MapReduce tutorials: https://www.tutorialspoint.com/map_reduce/
-
MapReduce algorithm example: http://www.journaldev.com/8848/mapreduce-algorithm-example