srikanth-iyengar/datahive

Datahive

Datahive is a configuration-driven, end-to-end data pipeline solution that simplifies managing data workflows. It combines Kafka, Hadoop, Apache Spark, Elasticsearch, and Kibana with an intuitive UI, so users can manage and monitor their data stacks from one place.

Features

  • 🚀 Streamlined data pipeline setup
  • ☕ Automated data processing while you enjoy your coffee
  • 📊 Utilizes Kafka, Hadoop, Apache Spark, Elasticsearch, and Kibana
  • 🛠️ Easy configuration through YAML files
  • 🔄 Supports both stream and batch processing

How it Works

Define your data pipeline effortlessly using a simple YAML configuration file. Specify input and output schemas for each service, and let Datahive handle the rest. Below is a sample configuration for stream processing:

type: stream
kafka:
    - inTopic: <your-topic-name>
      outTopic: <your-topic-name>
      hdfs: false
      transform: | 
        def transform(record) {
            def jsonObject = record
            // do your transformation logic in a groovy script
            return jsonObject
        }
    - inTopic: <your-topic-name>
      hdfsFileName: <your-hdfs-filename>
      hdfs: true

spark:
    - app-resource: <path-for-your-spark-build-file>
      driver.memory: 1g
      executor.memory: 2g
    - app-resource: <path-for-your-second-spark-build-file>
      driver-memory: 1g
      executor-memory: 2g
      res-location: <path-for-the-spark-job-code>
      main-class: <main-class-of-your-spark-job>
      job-name: <name-of-your-job>

elasticsearch:
    - 

kibana:
    dashboard-config:
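
The `transform` entry in the sample above is a Groovy function that receives a record and returns the transformed record. The following is a minimal sketch of what such a function might look like, not Datahive's own implementation. It assumes the record arrives as a map-like parsed JSON object (the type is not documented here), and the field names `name` and `processed` are hypothetical, chosen only for illustration.

```groovy
// Hypothetical transform: assumes `record` is a Map-like parsed JSON object.
// The field names `name` and `processed` are illustrative, not part of Datahive.
def transform(record) {
    // Copy the record so the input object is left untouched
    def jsonObject = new LinkedHashMap(record)

    // Normalize a text field if it is present
    if (jsonObject.name instanceof CharSequence) {
        jsonObject.name = jsonObject.name.toString().trim().toLowerCase()
    }

    // Tag the record so downstream consumers can tell it was transformed
    jsonObject.processed = true
    return jsonObject
}
```

To try it in a pipeline, the function body would go under the `transform: |` key of a `kafka` entry, indented as in the sample configuration.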

Getting Started

  1. Clone the repository.
  2. Install the required dependencies.
  3. Configure Datahive using the provided YAML files.
  4. Run the application.

Screenshots

  • Home Page
  • Features
  • Highlights
  • Login Page
  • Dashboard
  • WorkerStats
  • Datahive Stack Stats
  • Alerts