This project sets up a real-time Kafka & Flink pipeline that categorizes customer data by age:
- Customers under 25 → sent to `target-audience-topic`
- Customers 25 and older → sent to `old-folks-topic`
- Raw data remains unchanged in `customer-input`
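The age rule above boils down to a single predicate. As a minimal sketch of the routing decision (the function name and record shape are illustrative assumptions, not taken from the actual `flink_job.py`):

```python
def route_topic(customer: dict) -> str:
    """Pick the output topic for a customer record, per the age rule above."""
    # Under 25 goes to the target audience; everyone else to old-folks.
    return "target-audience-topic" if customer["age"] < 25 else "old-folks-topic"
```

The Flink job applies this decision to every record streaming through `customer-input`.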
Prerequisites:

- Docker & Docker Compose installed
- Python 3.7+ with the client libraries:

```bash
pip install kafka-python apache-flink
```
Start the services:

```bash
docker-compose up -d
```

- Kafka broker: `localhost:9092`
- Flink UI: http://localhost:8081
Create the Kafka topics:

```bash
docker exec -it kafka kafka-topics --create --topic customer-input --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
docker exec -it kafka kafka-topics --create --topic target-audience-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
docker exec -it kafka kafka-topics --create --topic old-folks-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
```

Run the producer:

```bash
python3 producer.py
```

Submit the Flink job:

```bash
docker cp flink_job.py jobmanager:/flink_job.py
docker exec -it jobmanager flink run -py /flink_job.py
```

Run the consumers:

```bash
python3 consumer-target.py
python3 consumer-old.py
```

To stop the services:
```bash
docker-compose down
```

To remove the Kafka topics:

```bash
docker exec -it kafka kafka-topics --delete --topic customer-input --bootstrap-server localhost:9092
docker exec -it kafka kafka-topics --delete --topic target-audience-topic --bootstrap-server localhost:9092
docker exec -it kafka kafka-topics --delete --topic old-folks-topic --bootstrap-server localhost:9092
```

✅ Kafka Producer sends customer data to `customer-input`
✅ Flink categorizes the data and routes it to two different topics
✅ Kafka Consumers read the processed messages in real time
🚀 Now you have a real-time streaming pipeline with Kafka & Flink! 🎉
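For reference, `producer.py` (referenced above but not shown) might look like the following sketch using `kafka-python`. The record fields and sample data are assumptions; only the topic name and broker address come from the steps above:

```python
import json


def serialize_customer(customer: dict) -> bytes:
    """Encode a customer record as UTF-8 JSON bytes for the Kafka message value."""
    return json.dumps(customer).encode("utf-8")


def main() -> None:
    # Requires the broker started via docker-compose on localhost:9092.
    from kafka import KafkaProducer  # imported lazily; from `pip install kafka-python`

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    # Hypothetical sample records; the real script's fields may differ.
    for customer in [{"name": "Alice", "age": 22}, {"name": "Bob", "age": 41}]:
        producer.send("customer-input", serialize_customer(customer))
    producer.flush()
    producer.close()


if __name__ == "__main__":
    main()
```

Keeping serialization in its own helper makes the encoding testable without a running broker.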
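Similarly, a consumer such as `consumer-target.py` might follow this sketch (the JSON deserializer mirrors the assumed producer encoding; the topic name and broker address come from the steps above):

```python
import json


def deserialize_customer(value: bytes) -> dict:
    """Decode a Kafka message value back into a customer record."""
    return json.loads(value.decode("utf-8"))


def main() -> None:
    # Requires the broker on localhost:9092; from `pip install kafka-python`.
    from kafka import KafkaConsumer  # imported lazily so the helper stays testable

    consumer = KafkaConsumer(
        "target-audience-topic",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",  # read from the start of the topic
    )
    # Blocks and streams new messages as Flink routes them in.
    for message in consumer:
        print("target audience:", deserialize_customer(message.value))


if __name__ == "__main__":
    main()
```

`consumer-old.py` would be identical except for subscribing to `old-folks-topic`.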