This repo contains the code for an AWS Lambda function that receives JSON events via HTTP POST and forwards them to a Kafka topic, to be eventually ingested into a data lake. The function is written in Rust using the AWS Lambda Rust Runtime, and is meant to be deployed with the `provided.al2023` OS-only runtime.
This was released as part of the "Real-time data lakes with the LOAD stack" blog post, which shows how to use this function as a component, along with Arroyo and DuckDB, to build a simple and cost-effective near-real-time data lake.
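This repo doesn't prescribe how the HTTP endpoint is exposed. Assuming the function sits behind a Lambda function URL (one common option; API Gateway works similarly), an event could be posted like this — the URL and payload below are illustrative only, not part of this repo:

```bash
# Illustrative only: substitute your own function URL (or API Gateway endpoint)
# and whatever JSON event schema your pipeline expects.
curl -X POST 'https://<function-url-id>.lambda-url.us-east-1.on.aws/' \
  -H 'Content-Type: application/json' \
  -d '{"event": "page_view", "user_id": 42, "ts": "2024-06-01T12:00:00Z"}'
```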
Building this lambda is somewhat complex, as it uses the librdkafka C library under the hood. We've provided a Dockerfile to make the build process easier, particularly on non-Linux systems.
To build the zip file, you can use the provided build script:

```bash
./build.sh
```

which will produce a `lambda.zip` file in the current directory.
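As a sketch of one way to deploy the resulting zip with the AWS CLI (the function name, role ARN, and environment values below are placeholders, not part of this repo):

```bash
# Placeholders throughout: use your own function name, IAM role, and brokers.
aws lambda create-function \
  --function-name kafka-event-ingest \
  --runtime provided.al2023 \
  --handler bootstrap \
  --zip-file fileb://lambda.zip \
  --role arn:aws:iam::123456789012:role/kafka-event-ingest-role \
  --environment 'Variables={KAFKA_BROKERS=broker-1:9092,KAFKA_TOPIC=events}'
```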
This function relies on several pieces of configuration, using AWS SSM (for secrets) and environment variables (for everything else):
| Configuration Option | Type | Description |
|---|---|---|
| `KAFKA_BROKERS` | Required | Specifies the Kafka brokers to connect to, provided as a comma-separated list of broker addresses. |
| `KAFKA_TOPIC` | Required | Defines the Kafka topic where messages will be published. |
| `KAFKA_USERNAME_PARAM` | Optional | Specifies the SSM parameter name containing the Kafka username for SASL authentication. |
| `KAFKA_PASSWORD_PARAM` | Optional | Specifies the SSM parameter name containing the Kafka password for SASL authentication. |
| `SECURITY_PROTOCOL` | Optional | Specifies the `security.protocol` used to construct the Kafka producer. |
| `SASL_MECHANISMS` | Optional | Specifies the `sasl.mechanisms` used to construct the Kafka producer. |
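If you're using SASL authentication, the SSM parameters can be created ahead of time. The parameter names below are examples; whatever names you choose must match the values you set in `KAFKA_USERNAME_PARAM` and `KAFKA_PASSWORD_PARAM`, and the Lambda's execution role will need permission to read (and decrypt) them:

```bash
# Example parameter names; match them to KAFKA_USERNAME_PARAM / KAFKA_PASSWORD_PARAM.
aws ssm put-parameter --name /kafka/username --type SecureString --value 'my-user'
aws ssm put-parameter --name /kafka/password --type SecureString --value 'my-password'
```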
For a full guide to deploying the lambda, see the instructions here.