Kafka Ecommerce Analytics Platform

A comprehensive real-time ecommerce analytics platform built with Confluent Kafka, run locally via Docker Compose, with data visualised in a Streamlit dashboard.

Architecture Diagram

Created using eraser.io: https://www.eraser.io/git-diagrammer

[Architecture diagram: docs/architecture.png]

Architecture Components

  • Confluent Kafka: Message streaming platform
  • Producer: Generates mock ecommerce events
  • Consumer: Processes individual events
  • Spark Processor: Real-time stream processing and analytics
  • Dashboard: Streamlit-based real-time visualisation
  • Control Center: Confluent's management interface

Project Structure

kafka-ecommerce-platform/
├── docker-compose.yml              # Docker Compose configuration for all services
├── Makefile                        # Makefile for project orchestration
├── requirements.txt                # Python dependencies
├── .env                            # Environment variables for development
├── .gitignore                      
├── docs/
│   ├── README.md                   # Project README
│   └── architecture.png            # Architecture diagram
├── src/
│   ├── config.py                   # Shared configuration
│   ├── kafka_producer.py           # Kafka producer script
│   ├── kafka_consumer.py           # Kafka consumer script
│   ├── spark_processor.py          # Spark batch processor script
│   ├── dashboard.py                # Streamlit dashboard app
│   └── ...                         # Other source files
├── docker/
│   ├── Dockerfile.producer         # Dockerfile for producer
│   └── Dockerfile.consumer         # Dockerfile for consumer
├── kafka-connect-jars/             # Custom Kafka Connect plugins
└── venv/                           # Local Python virtual environment (create it yourself; see setup below)

Features

  • Ecommerce event generation (orders, page views, cart actions)
  • Stream processing with Apache Spark
  • Real-time dashboard visualisation with Streamlit
  • Docker containerisation
  • Configuration management
  • Confluent Control Center monitoring

Quick Start

Prerequisites

  • Docker Desktop for Mac
  • Docker Compose
  • Make

Setup

  1. Clone the repository

    git clone <repository-url>
    cd kafka-ecommerce-platform
  2. Start the platform

    make up
  3. Access services

    Confluent Control Center is usually exposed at http://localhost:9021 in the standard Confluent Docker Compose setup; check docker-compose.yml for the exact port mappings.

.env file setup

# AWS credentials for S3 access (required for consumer, spark, etc.)
AWS_ACCESS_KEY_ID=your-aws-access-key-id
AWS_SECRET_ACCESS_KEY=your-aws-secret-access-key
AWS_REGION=your-aws-region

# Kafka settings
KAFKA_BOOTSTRAP_SERVERS=broker:29092
KAFKA_TOPIC=<topic_name>

# S3 bucket name (if referenced in your code)
S3_BUCKET_NAME=<your_bucket_name>

# (Optional)
LOG_LEVEL=INFO
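
As a rough sketch of how src/config.py might load these variables (assuming python-dotenv is among the Python dependencies; the actual names and defaults in the repository may differ):

import os

from dotenv import load_dotenv  # assumption: python-dotenv is in requirements.txt

load_dotenv()  # read .env from the project root

KAFKA_BOOTSTRAP_SERVERS = os.getenv("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092")
KAFKA_TOPIC = os.getenv("KAFKA_TOPIC", "ecommerce-events")
S3_BUCKET_NAME = os.getenv("S3_BUCKET_NAME")
AWS_REGION = os.getenv("AWS_REGION")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")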

Component Status

  1. Check all services

    docker-compose ps -a
  2. View logs

    make logs

Components

Data Generator

  • Generates realistic ecommerce events
  • Configurable event rates and patterns
  • The generator can be customised to change the volume and shape of the mock data (see the sketch below)
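
A minimal sketch of what a mock event generator could look like (the field names and event types here are illustrative, not the repository's actual schema):

import random
import time
import uuid

EVENT_TYPES = ["page_view", "add_to_cart", "order"]  # hypothetical event types

def generate_event():
    """Return one mock ecommerce event; adapt the fields to your schema."""
    return {
        "event_id": str(uuid.uuid4()),
        "event_type": random.choice(EVENT_TYPES),
        "user_id": f"user-{random.randint(1, 1000)}",
        "price": round(random.uniform(5.0, 500.0), 2),
        "timestamp": time.time(),
    }

if __name__ == "__main__":
    while True:
        print(generate_event())
        time.sleep(1)  # configurable event rate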

Kafka Producer

  • Streams events to Kafka topics
  • Schema Registry integration
  • Configurable batch sizes and intervals
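
A hedged sketch of the producing loop using the confluent-kafka client (the repository's kafka_producer.py and its Schema Registry integration may differ; plain JSON is used here instead of Avro for brevity):

import json
import os

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": os.getenv("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092")})
topic = os.getenv("KAFKA_TOPIC", "ecommerce-events")

def on_delivery(err, msg):
    # Called once per message after the broker acknowledges (or rejects) it
    if err is not None:
        print(f"Delivery failed: {err}")

event = {"event_type": "page_view", "user_id": "user-42"}  # illustrative payload
producer.produce(topic, value=json.dumps(event).encode("utf-8"), callback=on_delivery)
producer.flush()  # block until all outstanding messages are delivered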

Kafka Consumer

  • Real-time event processing
  • Data is ingested into an S3 bucket via Kafka Connect
  • Error handling and retry logic
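
A minimal poll loop with the confluent-kafka client, assuming JSON-encoded values (the S3 delivery happens separately via Kafka Connect, and the real consumer's error handling is likely more involved):

import json
import os

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": os.getenv("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092"),
    "group.id": "ecommerce-consumer",  # hypothetical consumer group id
    "auto.offset.reset": "earliest",
})
consumer.subscribe([os.getenv("KAFKA_TOPIC", "ecommerce-events")])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")  # simplest retry: skip and keep polling
            continue
        event = json.loads(msg.value())
        print(event)  # replace with real processing
finally:
    consumer.close()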

Spark Processor

  • Stream processing with structured streaming
  • Real-time analytics calculations
  • Integration with Kafka and Schema Registry
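
A sketch of a Structured Streaming job reading from Kafka (assumes the spark-sql-kafka package is on the classpath; the schema and aggregation are illustrative, not the repository's actual logic):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, window
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("ecommerce-processor").getOrCreate()

# Illustrative schema; match it to the producer's actual event fields
schema = StructType([
    StructField("event_type", StringType()),
    StructField("user_id", StringType()),
    StructField("price", DoubleType()),
    StructField("timestamp", TimestampType()),
])

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:29092")
          .option("subscribe", "ecommerce-events")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Example analytic: per-minute revenue from order events
revenue = (events.filter(col("event_type") == "order")
           .groupBy(window(col("timestamp"), "1 minute"))
           .sum("price"))

revenue.writeStream.outputMode("update").format("console").start().awaitTermination()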

Dashboard

  • Interactive charts and graphs
  • Auto-refresh capability
  • The data shown on the dashboard can be downloaded as a CSV (see the sketch below)
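
The CSV export can be implemented with Streamlit's built-in download button; a minimal sketch (the DataFrame here is a placeholder for the real analytics output):

import pandas as pd
import streamlit as st

# Placeholder: in the real app this would come from the processed analytics data
df = pd.DataFrame({"event_type": ["order", "page_view"], "count": [10, 120]})

st.dataframe(df)
st.download_button(
    label="Download data as CSV",
    data=df.to_csv(index=False).encode("utf-8"),
    file_name="ecommerce_analytics.csv",
    mime="text/csv",
)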

Dashboard Example

[Screenshot: example Streamlit dashboard]

Configuration

Environment Variables

# For local development (outside Docker)
KAFKA_BOOTSTRAP_SERVERS=localhost:9092
KAFKA_TOPIC=ecommerce-events

# For Docker containers (set in docker-compose.yml)
KAFKA_BOOTSTRAP_SERVERS=broker:29092
KAFKA_TOPIC=ecommerce-events

Python Virtual Environment Setup

It is recommended to use a Python virtual environment for local development:

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Development

Local Development

# Build docker images for kafka producer and kafka consumer (if you haven't already)
make build

# Start all services (Kafka broker, producer, consumer)
make up

# Run the spark processor
make processor

# Run the streamlit dashboard
make dashboard 

# View logs
make logs

# Stop services
make down

# Clean the setup 
make clean

Running Producer and Consumer Locally

# In one terminal
python src/kafka_producer.py

# In another terminal
python src/kafka_consumer.py

Run Spark Processor

Besides the Makefile target, the Spark processor can also be run directly with spark-submit and command-line arguments:

spark-submit spark_processor.py --bucket your-bucket --input-prefix kafka-consumer-logs --output-folder kafka-consumer-logs-output
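
One plausible way spark_processor.py could parse these flags with argparse (the repository's actual argument handling may differ):

import argparse

parser = argparse.ArgumentParser(description="Process Kafka consumer logs from S3")
parser.add_argument("--bucket", required=True, help="S3 bucket to read from and write to")
parser.add_argument("--input-prefix", default="kafka-consumer-logs", help="Prefix of the input objects")
parser.add_argument("--output-folder", default="kafka-consumer-logs-output", help="Prefix for the processed output")
args = parser.parse_args()

# Dashes in flag names become underscores on the parsed namespace
print(f"Reading s3://{args.bucket}/{args.input_prefix}/, writing to s3://{args.bucket}/{args.output_folder}/")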

Monitoring

  1. To list Kafka topics:

    docker exec broker kafka-topics --bootstrap-server localhost:9092 --list
  2. To view messages in a topic:

    docker exec -it broker kafka-console-consumer --bootstrap-server localhost:9092 --topic ecommerce-events --from-beginning
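
If you prefer checking from Python, confluent-kafka's AdminClient can list topics as well (run from outside Docker, hence localhost:9092):

from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "localhost:9092"})
metadata = admin.list_topics(timeout=10)  # fetch cluster metadata
print(sorted(metadata.topics))            # keys of the topic-name -> TopicMetadata mapping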

Troubleshooting

  • If you see errors about replication factor or internal topics, you can ignore them for single-broker local development.
  • Ensure your Python scripts use localhost:9092 when running outside Docker, and broker:29092 when running inside Docker containers.

Further Troubleshooting Steps

  1. Exec into the container

    docker exec -it ecommerce-consumer bash
  2. Install the AWS CLI inside the container so you can verify connectivity to AWS and list your buckets

    apt-get update && apt-get install -y awscli
  3. Test the connection to AWS

    a. Using AWS cli

    aws sts get-caller-identity

    b. or with boto3

    python -c "import boto3, os; s3 = boto3.client('s3'); s3.put_object(Bucket=os.environ['BUCKET_NAME'], Key='test.txt', Body=b'hello world')"

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make changes with tests
  4. Submit a pull request
