A comprehensive real-time ecommerce analytics platform built with Confluent Kafka, running locally with Docker Compose and visualising data with Streamlit.
Created using eraser.io: https://www.eraser.io/git-diagrammer
- Confluent Kafka: Message streaming platform
- Producer: Generates mock ecommerce events
- Consumer: Processes individual events
- Spark Processor: Real-time stream processing and analytics
- Dashboard: Streamlit-based real-time visualization
- Control Center: Confluent's management interface
```
kafka-ecommerce-platform/
├── docker-compose.yml       # Docker Compose configuration for all services
├── Makefile                 # Makefile for project orchestration
├── requirements.txt         # Python dependencies
├── .env                     # Environment variables for development
├── .gitignore
├── docs/
│   ├── README.md            # Project README
│   └── architecture.png     # Architecture diagram
├── src/
│   ├── config.py            # Shared configuration
│   ├── kafka_producer.py    # Kafka producer script
│   ├── kafka_consumer.py    # Kafka consumer script
│   ├── spark_processor.py   # Spark batch processor script
│   ├── dashboard.py         # Streamlit dashboard app
│   └── ...                  # Other source files
├── docker/
│   ├── Dockerfile.producer  # Dockerfile for producer
│   └── Dockerfile.consumer  # Dockerfile for consumer
├── kafka-connect-jars/      # Custom Kafka Connect plugins
└── venv/                    # You will need to create this venv
```

- Ecommerce event generation (orders, page views, cart actions)
- Stream processing with Apache Spark
- Dashboard with visualisation
- Docker containerization
- Configuration management
- Confluent Control Center monitoring
- Docker Desktop for Mac
- Docker Compose
- Make
- Clone the repository:

  ```shell
  git clone <repository-url>
  cd kafka-ecommerce-platform
  ```

- Start the platform:

  ```shell
  make up
  ```

- Access services:

  - Control Center: http://localhost:9021
  - Dashboard: http://localhost:8501
```
# AWS credentials for S3 access (required for consumer, spark, etc.)
AWS_ACCESS_KEY_ID=your-aws-access-key-id
AWS_SECRET_ACCESS_KEY=your-aws-secret-access-key
AWS_REGION=your-aws-region

# Kafka settings
KAFKA_BOOTSTRAP_SERVERS=broker:29092
KAFKA_TOPIC=<topic_name>

# S3 bucket name (if referenced in your code)
S3_BUCKET_NAME=<your_bucket_name>

# (Optional)
LOG_LEVEL=INFO
```

- Check all services:

  ```shell
  docker-compose ps -a
  ```

- View logs:

  ```shell
  make logs
  ```
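As a sketch of how `src/config.py` might centralise the environment variables from `.env` (the helper name, dictionary keys, and defaults here are assumptions for illustration, not the project's actual code):

```python
import os


def get_settings() -> dict:
    """Collect the environment variables documented in .env, with local-dev defaults."""
    return {
        "bootstrap_servers": os.getenv("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092"),
        "topic": os.getenv("KAFKA_TOPIC", "ecommerce-events"),
        # No sensible default: must be set for the S3 sink to work.
        "s3_bucket": os.getenv("S3_BUCKET_NAME"),
        "log_level": os.getenv("LOG_LEVEL", "INFO"),
    }
```

Reading configuration through one shared helper keeps the producer, consumer, and dashboard consistent about defaults.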
- Generates realistic ecommerce events
- Configurable event rates and patterns
- This file can be customised to control the mock data that is generated
- Streams events to Kafka topics
- Schema Registry integration
- Configurable batch sizes and intervals
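A minimal sketch of what the event generation might look like (the event fields, product names, and topic are illustrative assumptions; the real schema lives in `src/kafka_producer.py`):

```python
import json
import random
import time
import uuid

EVENT_TYPES = ["order", "page_view", "cart_action"]
PRODUCTS = ["laptop", "phone", "headphones", "monitor"]


def generate_event() -> dict:
    """Build one mock ecommerce event (hypothetical schema)."""
    return {
        "event_id": str(uuid.uuid4()),
        "event_type": random.choice(EVENT_TYPES),
        "product": random.choice(PRODUCTS),
        "price": round(random.uniform(5.0, 500.0), 2),
        "timestamp": time.time(),
    }


if __name__ == "__main__":
    # Requires confluent-kafka and a running broker (see docker-compose.yml).
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "localhost:9092"})
    for _ in range(10):
        producer.produce("ecommerce-events", json.dumps(generate_event()).encode())
    producer.flush()
```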
- Real-time event processing
- Data is ingested via Kafka Connect into an S3 bucket
- Error handling and retry logic
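The processing and retry behaviour above could be sketched like this (the validation rule and group id are assumptions; the S3 ingestion itself is handled by Kafka Connect, not by this loop):

```python
import json


def process_event(raw: bytes) -> dict:
    """Decode and lightly validate one event payload."""
    event = json.loads(raw)
    if "event_type" not in event:
        raise ValueError("missing event_type")
    return event


if __name__ == "__main__":
    # Requires confluent-kafka and a running broker.
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "ecommerce-consumer",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["ecommerce-events"])
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None:
                continue
            if msg.error():
                # Simple retry strategy: log and keep polling.
                print(f"Consumer error: {msg.error()}")
                continue
            print(process_event(msg.value()))
    finally:
        consumer.close()
```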
- Stream processing with structured streaming
- Real-time analytics calculations
- Integration with Kafka and Schema Registry
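A sketch of the processor's overall shape, matching the `spark-submit` arguments documented later in this README; the `event_type` column and the aggregation are assumptions (note this reads a static batch from S3, whereas Structured Streaming would use `spark.readStream`):

```python
import argparse


def parse_args(argv=None):
    """CLI arguments matching the documented spark-submit invocation."""
    parser = argparse.ArgumentParser(description="Spark processor for ecommerce events")
    parser.add_argument("--bucket", required=True)
    parser.add_argument("--input-prefix", required=True)
    parser.add_argument("--output-folder", required=True)
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Requires pyspark and AWS credentials in the environment.
    from pyspark.sql import SparkSession

    args = parse_args()
    spark = SparkSession.builder.appName("ecommerce-processor").getOrCreate()
    events = spark.read.json(f"s3a://{args.bucket}/{args.input_prefix}/")
    # Example analytics: event counts per type (assumed column name).
    counts = events.groupBy("event_type").count()
    counts.write.mode("overwrite").parquet(f"s3a://{args.bucket}/{args.output_folder}/")
```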
- Interactive charts and graphs
- Auto-refresh capability
- The data shown on the dashboard can be downloaded as a .csv file
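The CSV download could be wired up roughly like this (the table contents and widget labels are placeholders; the real app is `src/dashboard.py`):

```python
import csv
import io


def rows_to_csv(rows: list[dict]) -> bytes:
    """Serialise the dashboard's current rows for the CSV download button."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


if __name__ == "__main__":
    # Run with: streamlit run dashboard_sketch.py
    import streamlit as st

    st.title("Ecommerce Analytics")
    # Placeholder data; the real dashboard reads the Spark processor's output.
    rows = [{"event_type": "order", "count": 42}]
    st.dataframe(rows)
    st.download_button("Download CSV", rows_to_csv(rows), file_name="dashboard_data.csv")
```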
```
# For local development (outside Docker)
KAFKA_BOOTSTRAP_SERVERS=localhost:9092
KAFKA_TOPIC=ecommerce-events

# For Docker containers (set in docker-compose.yml)
KAFKA_BOOTSTRAP_SERVERS=broker:29092
KAFKA_TOPIC=ecommerce-events
```

It is recommended to use a Python virtual environment for local development:
```shell
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

```shell
# Build docker images for kafka producer and kafka consumer (if you haven't already)
make build

# Start all services (Kafka broker, producer, consumer)
make up

# Run the spark processor
make processor

# Run the streamlit dashboard
make dashboard

# View logs
make logs

# Stop services
make down

# Clean the setup
make clean
```
```shell
# In one terminal
python src/kafka_producer.py

# In another terminal
python src/kafka_consumer.py
```

Other than using the Makefile to run the Spark processor, you can also run it directly with command-line arguments:

```shell
spark-submit spark_processor.py --bucket your-bucket --input-prefix kafka-consumer-logs --output-folder kafka-consumer-logs-output
```
- To list Kafka topics:

  ```shell
  docker exec broker kafka-topics --bootstrap-server localhost:9092 --list
  ```

- To view messages in a topic:

  ```shell
  docker exec -it broker kafka-console-consumer --bootstrap-server localhost:9092 --topic ecommerce-events --from-beginning
  ```
- If you see errors about replication factor or internal topics, you can ignore them for single-broker local development.
- Ensure your Python scripts use `localhost:9092` when running outside Docker, and `broker:29092` when running inside Docker containers.
- Exec into the container:

  ```shell
  docker exec -it ecommerce-consumer bash
  ```

- Install the AWS CLI inside the container to check that it can connect to AWS and list your buckets:

  ```shell
  apt-get update && apt-get install -y awscli
  ```

- Test the connection to AWS:

  a. Using the AWS CLI:

     ```shell
     aws sts get-caller-identity
     ```

  b. Or with boto3:

     ```shell
     python -c "import boto3, os; s3 = boto3.client('s3'); s3.put_object(Bucket=os.environ['BUCKET_NAME'], Key='test.txt', Body=b'hello world')"
     ```
- Fork the repository
- Create a feature branch
- Make changes with tests
- Submit a pull request

