It helps you learn the following data engineering topics and technologies:
- Hadoop
- HDFS (Hadoop Distributed File System)
- MapReduce (the processing unit of Hadoop)
- YARN (Yet Another Resource Negotiator, the resource management unit of Hadoop)
- Hive (data warehouse software built on top of Apache Hadoop that provides data query and analysis using SQL)
- Spark (an open-source engine for large-scale data processing; it provides an interface for programming clusters with implicit data parallelism and fault tolerance)
- Cassandra (It is an open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many servers, providing high availability with no single point of failure)
- HBase (an open-source, non-relational, distributed database modeled after Google's Bigtable, which runs on top of HDFS)
- Scala
- PySpark
- Kafka
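
To make the MapReduce entry above a little more concrete, here is a minimal sketch of the map, shuffle, and reduce phases in plain Python, using a word count. It runs on a single machine and does not use Hadoop at all; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, mimicking the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Spark and Hadoop", "Hadoop and Hive"]
    print(word_count(sample))
```

In a real Hadoop job the framework performs the shuffle across machines and the input comes from HDFS; this sketch only illustrates how the work is split between the phases.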