---
layout: post
title: "Kafka Infrastructure as Code(IaC)"
author: dasasathyan
categories: [ Platform Engineering, Data, Infrastructure, Kafka ]
featured: true
hidden: true
teaser: Provision Kafka Infrastructure with JulieOps, Pulumi, Terraform
toc: true
---

# Infrastructure as Code (IaC)

Organizations need automated infrastructure provisioning, but the tooling should not impose too many restrictions on development teams; developers typically do their best work with a high degree of autonomy. Managing infrastructure by hand is error-prone, and replicating configurations for additional clusters is tedious. This is where Infrastructure as Code (IaC) comes in: by describing infrastructure as source code, IaC provisions the same infrastructure consistently across environments. Its advantages include faster provisioning, reduced risk of human error, idempotency, fewer manual configuration steps, and the elimination of configuration drift.

Popular general-purpose IaC tools include Terraform and Pulumi; in the Kafka ecosystem there are also purpose-built tools such as JulieOps.

Apache Kafka is a real-time data streaming technology capable of handling trillions of events. It is a distributed system of servers and clients that communicate over a TCP network protocol. A few terms to keep in mind:

1. Brokers - Brokers are the servers in Kafka that store event streams from various sources. A Kafka cluster typically comprises several brokers. Every broker in a cluster is also a bootstrap server: if you can connect to one broker in a cluster, you can connect to every broker.

2. Topics - Data is written to Kafka by client processes called producers and read by consumers. Events are organized into named categories called topics, and each topic is split into partitions that are distributed across the brokers in the cluster.

3. Kafka Connect - Kafka Connect copies messages between Kafka topics and external applications and data systems. There are two types of connectors: source connectors and sink connectors.

4. Schema Registry - Schema Registry is a centralized repository that facilitates the management and validation of schemas for messages in Kafka topics. With Schema Registry, producers and consumers of Kafka topics can ensure that data remains consistent and compatible as schemas evolve over time.

5. Kafka Streams - The Kafka Streams library provides real-time stream processing capabilities, built on top of the Kafka producer and consumer APIs. It is used to process data in real time, apply transformations, and perform aggregations on messages.

6. KSQL - Similar to Kafka Streams, KSQL is used to perform filtering, aggregations, joins, and windowing operations, and to generate real-time analytics and data transformations against Kafka topics through a SQL-like interface.

All of the above infrastructure, such as topics, connectors, schemas, and ACLs, can be configured with IaC tools like JulieOps, Terraform, and Pulumi. In this post, we provision Kafka resources on Confluent Cloud with each of the three tools and compare them on language support, supported resources, secrets handling, and state management.

## JulieOps

JulieOps, formerly known as Kafka Topology Builder, is an open-source project licensed under the MIT License, with over 350 stars on GitHub. It is designed to simplify the configuration of topics, role-based access control (RBAC), Schema Registry, and other Kafka components. JulieOps is declarative: developers specify what is needed, and the tool takes care of the implementation details. Its interface is a YAML file, which keeps the configuration user-friendly and straightforward. With JulieOps, developers describe their configuration requirements and delegate the rest of the work to the tool.

The [julie-ops][julie-ops] tool helps us provision Kafka-related resources in Confluent Cloud as code. The resources are typically [Topics][Topics], [Access Control][Access Control], [schemas][Handling schemas], [ksql artifacts][ksql artifacts], etc. All of these are configured as [topologies][topologies] in julie-ops.

### Prerequisites

- julie-ops installed locally or available via Docker
- A topology descriptor file listing the resources to provision (a sample follows below)
- The following configuration in a `.properties` file to connect to the Kafka cluster:
```
bootstrap.servers="<BOOTSTRAP_SERVER_URL>"
security.protocol=SASL_SSL
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<SASL_USERNAME>" password="<SASL_PASSWORD>";
ssl.endpoint.identification.algorithm=https
sasl.mechanism=PLAIN
# Required for correctness in Apache Kafka clients prior to 2.6
client.dns.lookup=use_all_dns_ips
# Confluent Cloud Schema Registry
schema.registry.url="<SCHEMA_REGISTRY_URL>"
basic.auth.credentials.source=USER_INFO
schema.registry.basic.auth.user.info="<SCHEMA_REGISTRY_API_KEY>":"<SCHEMA_REGISTRY_API_SECRET>"
```
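
The topology descriptor is the core of JulieOps' declarative model: it describes, per project, the topics, ACLs, and schemas you want, and JulieOps reconciles the cluster to match on every run. A minimal sketch of a descriptor (all names and values below are illustrative placeholders):

```
context: "acme"
projects:
  - name: "analytics"
    consumers:
      - principal: "User:analytics-app"
    producers:
      - principal: "User:ingest-app"
    topics:
      - name: "pageviews"
        config:
          replication.factor: "3"
          num.partitions: "6"
```

By default, JulieOps composes the full topic name from the context, project, and topic name, so the topic above would be created as `acme.analytics.pageviews`. Since the descriptor is the source of truth for the cluster's state, it is best kept under version control and applied through CI.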

### How to run

```
julie-ops --broker <BROKERS> --clientConfig <PROPERTIES_FILE> --topology <TOPOLOGY_FILE>
```

Once the run completes without errors, the output will look like:

```
log4j:WARN No appenders could be found for logger (org.apache.kafka.clients.admin.AdminClientConfig).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
List of Topics:
<topics that are created>
List of ACLs:
<acls that are created>
List of Principles:
List of Connectors:
List of KSQL Artifacts:
Kafka Topology updated
```

Want a quick start? Check out our sample JulieOps repo [here].

[julie-ops]: https://julieops.readthedocs.io/en/latest/#
[Topics]: https://julieops.readthedocs.io/en/latest/futures/what-topic-management.html
[Handling schemas]: https://julieops.readthedocs.io/en/latest/futures/what-schema-management.html
[Access Control]: https://julieops.readthedocs.io/en/latest/futures/what-acl-management.html
[ksql artifacts]: https://julieops.readthedocs.io/en/latest/futures/what-ksql-management.html
[topologies]: https://julieops.readthedocs.io/en/latest/the-descriptor-files.html?highlight=topology
[here]: https://github.com/Platformatory/kafka-cd-julie

## Pulumi

Selecting the appropriate Infrastructure as Code (IaC) tool is crucial, as each tool has its own advantages and disadvantages. As discussed earlier, IaC automates infrastructure provisioning and eliminates a class of human errors. In this section, we use Pulumi to provision Confluent Cloud topics and connectors. While Pulumi supports several programming languages, such as Python, TypeScript, Go, C#, Java, and YAML, we use TypeScript in this post. The Confluent Cloud provider for Pulumi is built on top of the official Terraform provider from Confluent Inc. and is available across all of Pulumi's supported languages.

### Provisioning Kafka Topics

A few configs are mandatory for creating Kafka topics. To begin with, we need a reference to the cluster on which the topics will be provisioned:

```
// cluster_id, rest_endpoint, the API credentials, and topicNames are assumed to come from Pulumi config
import * as confluent from "@pulumi/confluentcloud";
import { KafkaTopicKafkaCluster, KafkaTopicCredentials } from "@pulumi/confluentcloud/types/input";

let clusterArgs: KafkaTopicKafkaCluster = {
  id: cluster_id,
};
```

Next, we need the Kafka cluster credentials: an API key and secret.

```
let clusterCredentials: KafkaTopicCredentials = {
  key: kafka_api_key,
  secret: kafka_api_secret,
};
```
Then come the topic configs. There are many topic-level configurations; read about them [here](https://kafka.apache.org/documentation/#topicconfigs). Note that the partition count is set through the provider's `partitionsCount` argument rather than as a topic config:

```
// Build the arguments for one topic; the topic name is supplied per topic in the loop below
const topicArgsFor = (topicName: string): confluent.KafkaTopicArgs => ({
  kafkaCluster: clusterArgs,
  topicName: topicName.toLowerCase(),
  restEndpoint: rest_endpoint,
  credentials: clusterCredentials,
  partitionsCount: 6, // a resource argument, not a "num.partitions" config entry
  config: {
    ["retention.ms"]: "-1",
    ["retention.bytes"]: "-1",
  },
});
```

Finally, create the topics by looping over the topic names:
```
const topics = topicNames.map(
  (topicName) =>
    new confluent.KafkaTopic(topicName.toLowerCase(), topicArgsFor(topicName))
);
```
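
Optionally, the created resources can be exported as stack outputs so that `pulumi up` prints them; this is standard Pulumi behavior rather than anything specific to the Confluent provider:

```
// Export the provisioned topic names as stack outputs
export const provisionedTopics = topics.map((t) => t.topicName);
```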

Save the above to an `index.ts` and set the Confluent Cloud credentials with `pulumi config set confluentcloud:cloudApiKey <cloud api key> --secret` and `pulumi config set confluentcloud:cloudApiSecret <cloud api secret> --secret`. Passing the `--secret` flag is important; without it, the secrets are stored in plain text in the Pulumi stack configuration files. When setting the configuration, Pulumi prompts for a stack: select an existing stack or create a new one.
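
For reference, values set with `--secret` appear encrypted in the stack's configuration file rather than in plain text. A hypothetical `Pulumi.dev.yaml` might look like this (ciphertext shortened for illustration):

```
config:
  confluentcloud:cloudApiKey:
    secure: AAABAJ8x...   # encrypted by the stack's secrets provider, not the raw key
  confluentcloud:cloudApiSecret:
    secure: AAABAN2w...
```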

Once the credentials are set, run `pulumi up` to provision the topics in Confluent Cloud.


### Provisioning Kafka Connectors

Kafka Connect allows for the seamless integration of messages between Kafka topics and external applications and data systems. Connectors come in two types:

- Source connectors pull data from an external system and feed it into a Kafka topic.
- Sink connectors take data from a Kafka topic and deliver it to an external system.

Let's provision a Kafka sink connector that writes data from a Kafka topic to Azure Data Lake Storage (ADLS).

The mandatory configs for provisioning a Kafka connector are:

- The name of the resource
- The environment
- The cluster
- A few connector-specific configs, such as the connector class and source topics

The configs can contain secrets such as passwords, API keys, and tokens; Pulumi masks these by default. Sensitive configs must be placed under the `configSensitive` block and non-sensitive configs under the `configNonsensitive` block. The `connector.class` config determines which of the [supported connectors](https://docs.confluent.io/cloud/current/connectors/index.html#supported-connectors) is deployed:

```
let connector_args: confluent.ConnectorArgs = {
  configNonsensitive: {
    ["connector.class"]: "AzureDataLakeGen2Sink",
    ["name"]: "Connector Name",
    ["kafka.auth.mode"]: "KAFKA_API_KEY",
    ["topics"]: topicNames.join(","), // topics are passed as a comma-separated string
    ["input.data.format"]: "JSON",
    ["output.data.format"]: "JSON",
    ["time.interval"]: "HOURLY",
    ["tasks.max"]: "2",
    ["flush.size"]: "1000",
    ["rotate.schedule.interval.ms"]: "3600000",
    ["rotate.interval.ms"]: "3600000",
    ["path.format"]: "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH",
    ["topics.dir"]: "<Directory in ADLS>",
  },
  configSensitive: {
    ["kafka.api.key"]: kafka_api_key,
    ["kafka.api.secret"]: kafka_api_secret,
    ["azure.datalake.gen2.account.name"]: azure_data_lake_account_name,
    ["azure.datalake.gen2.access.key"]: azure_data_lake_access_key,
  },
  // cluster_environment and cluster reference the target Confluent environment and cluster
  environment: cluster_environment,
  kafkaCluster: cluster,
};

new confluent.Connector("pulumi-connector", connector_args);
```

If the Confluent Cloud credentials are already set up, go ahead and run `pulumi up` to provision the connector. If they are not, set them up following the steps from the topic provisioning section above.


## Terraform

Terraform is a widely used IaC tool that supports a broad range of cloud, datacenter, and service providers. It can provision infrastructure on popular platforms such as AWS, Azure, Google Cloud, and Oracle Cloud, as well as orchestrate Kubernetes resources. The full list of supported providers can be found on the official Terraform Registry. It's worth noting that the Pulumi provider is built on top of the Confluent Terraform provider. Terraform uses the HashiCorp Configuration Language (HCL) to describe and provision infrastructure.

### Provisioning Topics

First, declare the Confluent provider:

```
terraform {
  required_providers {
    confluent = {
      source  = "confluentinc/confluent"
      version = "1.13.0"
    }
  }
}
```
This block declares the Confluent Cloud provider; `terraform init` downloads and installs it.

Next, configure the Confluent credentials:

```
provider "confluent" {
  cloud_api_key    = var.confluent_cloud_api_key    # optionally use the CONFLUENT_CLOUD_API_KEY env var
  cloud_api_secret = var.confluent_cloud_api_secret # optionally use the CONFLUENT_CLOUD_API_SECRET env var
}
```
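
As the comments indicate, the credentials can instead be supplied through environment variables, which keeps them out of `.tfvars` files:

```
export CONFLUENT_CLOUD_API_KEY="<cloud api key>"
export CONFLUENT_CLOUD_API_SECRET="<cloud api secret>"
```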

With the provider configured, declare the topics to provision. The `for_each` meta-argument iterates over the list of topic names and creates one `confluent_kafka_topic` resource per entry:

```
resource "confluent_kafka_topic" "dev_topics" {
  for_each = toset(var.topics)

  kafka_cluster {
    id = var.cluster_id
  }
  topic_name       = each.value
  rest_endpoint    = data.confluent_kafka_cluster.dev_cluster.rest_endpoint
  partitions_count = 6
  config = {
    "retention.ms" = "604800000"
  }
  credentials {
    key    = var.api_key
    secret = var.api_secret
  }
}
```
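
The resource above references variables and a `data.confluent_kafka_cluster` lookup that are not shown. Here is a sketch of what those declarations might look like, assuming the variable names used in the snippet (the `environment_id` variable is an assumption needed by the cluster lookup):

```
variable "cluster_id" { type = string }
variable "environment_id" { type = string }
variable "topics" { type = list(string) }
variable "api_key" {
  type      = string
  sensitive = true
}
variable "api_secret" {
  type      = string
  sensitive = true
}

# Look up the existing cluster to obtain its REST endpoint
data "confluent_kafka_cluster" "dev_cluster" {
  id = var.cluster_id
  environment {
    id = var.environment_id
  }
}
```

With the provider and resources defined, run `terraform init` to install the provider and `terraform apply` to provision the topics.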


Here is a side-by-side comparison of the three tools:

| Features | JulieOps | Terraform | Pulumi |
| -------- | --------- | -------- | -------- |
| Language Support | YAML | HashiCorp Configuration Language (HCL) | Python, TypeScript, JavaScript, Go, C#, F#, Java, YAML |
| Supported Resources to Provision | Topics, RBACs (for Kafka consumers, Kafka producers, Kafka Connect, Kafka Streams applications (microservices), KSQL applications, Schema Registry instances, Confluent Control Center, KSQL server instances), Schemas, ACLs | confluent_api_key, confluent_byok_key, confluent_cluster_link, confluent_connector, confluent_environment, confluent_identity_pool, confluent_identity_provider, confluent_invitation, confluent_kafka_acl, confluent_kafka_client_quota, confluent_kafka_cluster, confluent_kafka_cluster_config, confluent_kafka_mirror_topic, confluent_kafka_topic, confluent_ksql_cluster, confluent_network, confluent_peering, confluent_private_link_access, confluent_role_binding, confluent_schema, confluent_schema_registry_cluster, confluent_schema_registry_cluster_config, confluent_schema_registry_cluster_mode, confluent_service_account, confluent_subject_config, confluent_subject_mode, confluent_transit_gateway_attachment | ApiKey, ByokKey, ClusterLink, Connector, Environment, IdentityPool, IdentityProvider, Invitation, KafkaAcl, KafkaClientQuota, KafkaCluster, KafkaClusterConfig, KafkaMirrorTopic, KafkaTopic, KsqlCluster, Network, Peering, PrivateLinkAccess, Provider, RoleBinding, Schema, SchemaRegistryCluster, SchemaRegistryClusterConfig, SchemaRegistryClusterMode, ServiceAccount, SubjectConfig, SubjectMode, TransitGatewayAttachment |
| Import code from other IaC tools | No | No | Yes |
| Secrets Encryption | Secrets are read from a `.properties` file | Secrets can be sourced from Vault or variables, but are not encrypted in the state file | Secrets are encrypted, in both stack config and state |
| Open Sourced | Yes | Yes | Yes |
| GitHub Stars | 350+ | 81 (Confluent provider) | 6 (Confluent provider) |
| State Store | Stored in a `.cluster-state` file | Stored in a `.tfstate` file or a backend of the user's choice | Managed by the Pulumi Service, or a backend of the user's choice |

# Conclusion

Although Terraform remains the dominant Infrastructure as Code (IaC) tool in the industry, Pulumi is rapidly gaining traction. Each tool has its strengths and weaknesses: Terraform is more established and provides a wider range of resources, while Pulumi is known for its ease of use and a growing community that is continuously improving its functionality. Someone with coding experience but little familiarity with IaC tools may find Pulumi more approachable, thanks to its support for general-purpose languages such as Python, TypeScript, JavaScript, Go, C#, F#, Java, and YAML.

The right tool ultimately depends on your specific needs. If you prioritize stability and access to a vast ecosystem and knowledge base, Terraform may be the better option; if you value developer efficiency and working in a familiar language, Pulumi might be the ideal fit. For teams whose scope is limited to Kafka topics, ACLs, and schemas, JulieOps offers a lightweight, declarative alternative. Whichever you choose, all three can effectively manage Kafka infrastructure as code.