# Kafkesc's kafka-dev-cluster: Kafka Development Cluster

A batteries-included BUT development-only pairing of `Makefile` & `docker-compose.yml`, designed to quickly spin up a Kafka cluster for development purposes.

The `docker-compose-infra.yml` launches:

* a Kafka cluster made of 3 Brokers
* a ZooKeeper ensemble, made of a single server

This should be plenty for local development of services and tools designed to interact and operate against Kafka, even when you are offline.

For the `docker-compose-work.yml`, refer to the Workload Generation section below.

## Dependencies

## Getting started

The project provides a Makefile with super-simple targets: single mnemonic commands to manage the cluster lifecycle and perform basic operations against Kafka topics.

Good Makefile autocompletion in your shell will make things even easier.

| Command | Arguments | Description |
|---------|-----------|-------------|
| `init` | | Prepares localhost to launch the cluster |
| `start` | `timeout=SEC` | Launches the cluster |
| `stop` | `timeout=SEC` | Shuts down the cluster: removes every resource for the project |
| `restart` | `timeout=SEC` | Restarts the cluster |
| `kill` | `timeout=SEC` | Forcefully shuts down the cluster (i.e. `SIGKILL`) |
| `logs` | `?service=(zookeeper,kafka-0[1-3])` | Tail-follow logs of the running services (default: all services) |
| `ps` | | Docker status of the running services |
| `pull` | | Pulls all the Docker images necessary to run the cluster |
| `consume` | `topic=TOPIC`, `?offset=(beginning, end, stored, OFFSET, -OFFSET, s@TS, e@TS)`, `?group=GROUP` | Consumes from a topic, from a given offset and using a given `group.id`; use `CTRL+C` to stop |
| `produce` | `topic=TOPIC`, `?key=KEY`, `?value=VALUE` | Produces to a topic, using a given key/value pair |
| `topic.create` | `topic=TOPIC`, `?partitions=PC`, `?repfac=RF` | Creates a new topic |
| `topic.read` | `topic=TOPIC` | Describes a topic |
| `topic.delete` | `topic=TOPIC` | Deletes a topic |
| `meta.brokers` | | Lists cluster brokers |
| `meta.topics` | | Lists cluster topics |
| `meta.groups` | | Lists cluster consumer groups |
| `workload.setup` | | Creates the necessary topics for the workload producers/consumers to use |
| `workload.start` | | Starts `docker-compose-work.yml`: launches producers (ksunami) and consumers (kcat) |
| `workload.stop` | | Stops `docker-compose-work.yml`: removes every resource for the project |
| `workload.restart` | | Restarts `docker-compose-work.yml` |
| `workload.kill` | | Forcefully shuts down `docker-compose-work.yml` (i.e. `SIGKILL`) |
| `workload.ps` | | Docker status of the running services |
| `workload.logs` | `?service=(prod-0[1-3],cons-0[1-3][abc]?)` | Tail-follow logs of the running services (default: all services) |
| `workload.pull` | | Pulls all the Docker images necessary to run `docker-compose-work.yml` |

**NOTE**:

* `?ARG`: the argument is optional
* `SEC`: amount of seconds
* `group` defaults to `kafkesc-devcluster-group-id`
* `offset` defaults to `end` - see kcat for more details
  * `s@TS`: timestamp in milliseconds to start at
  * `e@TS`: timestamp in milliseconds to end at (not included)
  * `OFFSET`: integer of the absolute offset of a record
  * `-OFFSET`: integer of the relative offset of a record from the end
* `key` defaults to a random alphanumeric string of 12 characters (ex. `K-a21d38311c`)
* `value` defaults to a random alphanumeric string of 22 characters (ex. `V-6bbeba0cf4d0d5c2de36`)
* `partitions` defaults to `3`
  * `PC`: partitions count for the topic
* `repfac` defaults to `3`
  * `RF`: replication factor for the topic
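
As a quick end-to-end example, a typical session could look like this (a rough sketch: `my-topic` and `my-group` are illustrative names, and the `timeout` values are arbitrary):

```shell
# Prepare the host and pre-fetch all Docker images
make init
make pull

# Launch the 3-broker cluster + single-server ZooKeeper ensemble
make start timeout=60

# Create a topic, then move a couple of records through it
make topic.create topic=my-topic partitions=3 repfac=3
make produce topic=my-topic key=hello value=world
make consume topic=my-topic offset=beginning group=my-group   # CTRL+C to stop

# Tear everything down when done
make stop timeout=60
```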

## Connecting

Once started, you can connect to the services using:

| | Configuration strings |
|---|---|
| Kafka bootstrap brokers | `localhost:19091,localhost:19092,localhost:19093` |
| ZooKeeper server | `localhost:2181` |
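
For example, assuming kcat is installed on your host (not required, since the Makefile already wraps it), you can query the cluster metadata directly:

```shell
# List brokers, topics and partitions via the ports mapped on localhost
kcat -b localhost:19091,localhost:19092,localhost:19093 -L
```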

## Supported environment variables

docker-compose accepts environment variables that will be applied to the configuration, following the rules documented here.

The default values are defined in `.env`.
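
For instance, since variables set in your shell take precedence over the `.env` file, you can override a value for a single invocation. Note that `SOME_IMAGE_TAG` below is a purely hypothetical variable name, used only to illustrate the mechanism; check `.env` for the variables actually supported:

```shell
# Hypothetical variable name: substitute one actually defined in .env
SOME_IMAGE_TAG=latest make start
```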

## Network structure

The `docker-compose-infra.yml` is in charge of creating a single, default bridge network: `kafkesc-devcluster-network`.

The `docker-compose-work.yml` depends on it: it uses the network created by `docker-compose-infra.yml` as its own default network, so that producers and consumers can connect to the Kafka brokers.

The containers launched by the `-infra` docker-compose file are reachable from your localhost. The ports are mapped like this:

| Service | localhost client port | kafka-devcluster_default client port |
|---------|-----------------------|--------------------------------------|
| `zookeeper` | `2181` | `2181` |
| `kafka-01` | `19091` | `9091` |
| `kafka-02` | `19092` | `9092` |
| `kafka-03` | `19093` | `9093` |

The Kafka brokers communicate with each other using `kafka-0[1-3]:909[1-3]`; to communicate with ZooKeeper, they use `zookeeper:2181`.
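
If you want to verify the wiring yourself, a quick way (assuming the Docker CLI is available on your host) is to inspect the shared network by the name given above:

```shell
# Show the bridge network shared by the -infra and -work compose files,
# including which containers are currently attached to it
docker network inspect kafkesc-devcluster-network
```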

## Running commands inside a container

This is probably obvious from the section above, but it's worth calling out explicitly: when `docker exec`-ing into one of the containers, you can address the sibling containers using the Service name as the target hostname.
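
A minimal sketch (the exact container names depend on how docker-compose prefixes them, so check `make ps` first):

```shell
# Find the exact container names
make ps

# Exec into one of the brokers (container name assumed to match the Service name)
docker exec -it kafka-01 /bin/bash

# From inside, sibling containers resolve by their Service name, e.g.
#   kafka-02:9092, kafka-03:9093, zookeeper:2181
```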

## Workload Generation

In addition to the "infrastructure", this project can also spawn a workload. It's probably not realistic enough for benchmarking, but having producers and consumers in place, moving records across the cluster, can be a time saver during development.

This is realised by the `docker-compose-work.yml` setup. Of course, everything is controlled via the Makefile, so you only need to care about these details if you want to make changes (or if something is broken).
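
In practice, driving the workload looks like this (all targets come straight from the Makefile table above):

```shell
make workload.setup     # create the topics used by the workload producers/consumers
make workload.start     # launch the ksunami producers and kcat consumers
make workload.logs      # tail-follow all workload logs (CTRL+C to detach)
make workload.stop      # tear the workload down again
```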

### Producers included in the Workload

Instances of ksunami are set up, each with a specific producer behaviour:

| Produced Topic Name | Container name | Ksunami Behaviour |
|---------------------|----------------|-------------------|
| `workload01` | `prod-01` | spikes once a day; 100x traffic at spike |
| `workload02` | `prod-02` | no traffic most of the time; massive spike with 10000x traffic for 30s every 5 minutes |
| `workload03` | `prod-03` | pretty stable; min and max phase almost identical |

### Consumers included in the Workload

Instances of kcat are set up, each with a specific consumer grouping:

| Consumed Topic | Consumer Group | Container name |
|----------------|----------------|----------------|
| `workload01` | `cons-01` | `cons-01a` |
| `workload01` | `cons-01` | `cons-01b` |
| `workload02` | `cons-02` | `cons-02` |
| `workload03` | `cons-03` | `cons-03a` |
| `workload03` | `cons-03` | `cons-03b` |
| `workload03` | `cons-03` | `cons-03c` |

Please see `docker-compose-work.yml` for details.

## Storage and Persistence

TODO

At the current stage, all data is lost at shutdown.

Maybe this is a chance for Your contribution? 😉 😇

## Using offline

You might need to use kafka-dev-cluster while offline, or when you simply don't have enough bandwidth to pull hundreds of megabytes of Docker images.

For those cases, use the provided pull commands to pre-fetch all the Docker images before you get going.
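
For example, while you still have connectivity:

```shell
# Pre-fetch the images for both the cluster and the workload generators
make pull
make workload.pull
```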

## License

Apache License 2.0