diff --git a/CHANGELOG.md b/CHANGELOG.md index 57ca4cef89..f92623bad1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -15,27 +15,50 @@ All notable changes to this project will be documented in this file. ## 4.45.0 - TBD +### Fixed + +- The `code` and `file` fields on the `javascript` processor docs no longer erroneously mention interpolation support. (@mihaitodor) +- The `postgres_cdc` input now correctly handles `null` values. (@rockwotj) +- The `redpanda_migrator` output no longer rejects messages if it can't perform schema ID translation. (@mihaitodor) +- The `redpanda_migrator` input no longer converts the kafka key to string. (@mihaitodor) + ### Added -- `aws_sqs` now has a `max_outstanding` field to prevent unbounded memory usage. (@rockwotj) +- `aws_sqs` input now has a `max_outstanding` field to prevent unbounded memory usage. (@rockwotj) - `avro` scanner now emits metadata for the Avro schema it used along with the schema fingerprint. (@rockwotj) - Field `content_type` added to the `amqp_1` output. (@timo102) -- `kafka_franz`, `ockam_kafka`, `redpanda`, `redpanda_common`, `redpanda_migrator` now support `fetch_max_wait` configuration field. -- `snowpipe_streaming` now supports interpolating table names. (@rockwotj) -- `snowpipe_streaming` now supports interpolating channel names. (@rockwotj) -- `snowpipe_streaming` now supports exactly once delivery using `offset_token`. (@rockwotj) -- `ollama_chat` now supports tool calling. (@rockwotj) -- New `ollama_moderation` which allows using LlamaGuard or ShieldGemma to check if LLM responses are safe. (@rockwotj) +- Field `fetch_max_wait` added to the `kafka_franz`, `ockam_kafka`, `redpanda`, `redpanda_common` and `redpanda_migrator` inputs. (@birdayz) +- `snowpipe_streaming` output now supports interpolating table names. (@rockwotj) +- `snowpipe_streaming` output now supports interpolating channel names. (@rockwotj) +- `snowpipe_streaming` output now supports exactly once delivery using `offset_token`. (@rockwotj) +- `ollama_chat` processor now supports tool calling. (@rockwotj) +- New `ollama_moderation` processor which allows using LlamaGuard or ShieldGemma to check if LLM responses are safe. (@rockwotj) - Field `queries` added to `sql_raw` processor and output to support running multiple SQL statements transactionally. (@rockwotj) +- New `redpanda_migrator_offsets` input. (@mihaitodor) +- Fields `offset_topic`, `offset_group`, `offset_partition`, `offset_commit_timestamp` and `offset_metadata` added to the `redpanda_migrator_offsets` output. (@mihaitodor) +- Field `topic_lag_refresh_period` added to the `redpanda` and `redpanda_common` inputs. (@mihaitodor) +- Metric `redpanda_lag` now emitted by the `redpanda` and `redpanda_common` inputs. (@mihaitodor) +- Metadata `kafka_lag` now emitted by the `redpanda` and `redpanda_common` inputs. (@mihaitodor) +- The `redpanda_migrator_bundle` input and output now set labels for their subcomponents. (@mihaitodor) +- (Benthos) Field `label` added to the template tests definitions. (@mihaitodor) +- (Benthos) Metadata field `label` can now be utilized within a template's `mapping` field to access the label that is associated with the template instantiation in a config. (@mihaitodor) +- (Benthos) `bloblang` scalar type added to template fields. (@mihaitodor) +- (Benthos) Go API: Method `SetOutputBrokerPattern` added to the `StreamBuilder` type. (@mihaitodor) +- (Benthos) New `error_source_name`, `error_source_label` and `error_source_path` bloblang functions. 
(@mihaitodor) +- (Benthos) Flag `--verbose` added to the `benthos lint` and `benthos template lint` commands. (@mihaitodor) -### Fixed +### Changed -- The `code` and `file` fields on the `javascript` processor docs no longer erroneously mention interpolation support. (@mihaitodor) -- The `postgres_cdc` now correctly handles `null` values. (@rockwotj) - Fix an issue in `aws_sqs` with refreshing in-flight message leases which could prevent acks from being processed. (@rockwotj) - Fix an issue with `postgres_cdc` with TOAST values not being propagated with `REPLICA IDENTITY FULL`. (@rockwotj) - Fix an initial snapshot streaming consistency issue with `postgres_cdc`. (@rockwotj) -- Fix bug in `sftp` input where the last file was not deleted when `watcher` and `delete_on_finish` were enabled (@ooesili) +- Fix bug in `sftp` input where the last file was not deleted when `watcher` and `delete_on_finish` were enabled. (@ooesili) +- Fields `batch_size`, `multi_header`, `replication_factor`, `replication_factor_override` and `output_resource` for the `redpanda_migrator` input are now deprecated. (@mihaitodor) +- Fields `kafka_key` and `max_in_flight` for the `redpanda_migrator_offsets` output are now deprecated. (@mihaitodor) +- Field `batching` for the `redpanda_migrator` output is now deprecated. (@mihaitodor) +- The `redpanda_migrator` input no longer emits tombstone messages. (@mihaitodor) +- (Benthos) The `branch` processor no longer emits an entry in the log at error level when the child processors throw errors. (@mihaitodor) +- (Benthos) Streams and the StreamBuilder API now use `reject` by default when no output is specified in the config and `stdout` isn't registered (for example when the `io` components are not imported). (@mihaitodor) ## 4.44.0 - 2024-12-13 @@ -72,7 +95,7 @@ All notable changes to this project will be documented in this file. - Add support for `spanner` driver to SQL plugins. (@yufeng-deng) - Add support for complex database types (JSONB, TEXT[], INET, TSVECTOR, TSRANGE, POINT, INTEGER[]) for `pg_stream` input. (@le-vlad) -- Add support for Parquet files to `bigquery` output (@rockwotj) +- Add support for Parquet files to `bigquery` output. (@rockwotj) - (Benthos) New `exists` operator added to the `cache` processor. (@mihaitodor) - New CLI flag `redpanda-license` added as an alternative way to specify a Redpanda license. (@Jeffail) diff --git a/docs/modules/components/pages/inputs/redpanda.adoc b/docs/modules/components/pages/inputs/redpanda.adoc index af0dff4358..935f4b52a4 100644 --- a/docs/modules/components/pages/inputs/redpanda.adoc +++ b/docs/modules/components/pages/inputs/redpanda.adoc @@ -76,6 +76,7 @@ input: consumer_group: "" # No default (optional) commit_period: 5s partition_buffer_bytes: 1MB + topic_lag_refresh_period: 5s auto_replay_nacks: true ``` @@ -115,6 +116,10 @@ output: Records are processed and delivered from each partition in batches as received from brokers. These batch sizes are therefore dynamically sized in order to optimise throughput, but can be tuned with the config fields `fetch_max_partition_bytes` and `fetch_max_bytes`. Batches can be further broken down using the xref:components:processors/split.adoc[`split`] processor. +== Metrics + +Emits a `redpanda_lag` metric with `topic` and `partition` labels for each consumed topic. 
+ == Metadata This input adds the following metadata fields to each message: @@ -124,6 +129,7 @@ This input adds the following metadata fields to each message: - kafka_topic - kafka_partition - kafka_offset +- kafka_lag - kafka_timestamp_ms - kafka_timestamp_unix - kafka_tombstone_message @@ -645,6 +651,15 @@ A buffer size (in bytes) for each consumed partition, allowing records to be que *Default*: `"1MB"` +=== `topic_lag_refresh_period` + +The period of time between each topic lag refresh cycle. + + +*Type*: `string` + +*Default*: `"5s"` + === `auto_replay_nacks` Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. diff --git a/docs/modules/components/pages/inputs/redpanda_common.adoc b/docs/modules/components/pages/inputs/redpanda_common.adoc index ea11508757..ac23e21541 100644 --- a/docs/modules/components/pages/inputs/redpanda_common.adoc +++ b/docs/modules/components/pages/inputs/redpanda_common.adoc @@ -64,6 +64,7 @@ input: consumer_group: "" # No default (optional) commit_period: 5s partition_buffer_bytes: 1MB + topic_lag_refresh_period: 5s auto_replay_nacks: true ``` @@ -101,6 +102,10 @@ output: Records are processed and delivered from each partition in batches as received from brokers. These batch sizes are therefore dynamically sized in order to optimise throughput, but can be tuned with the config fields `fetch_max_partition_bytes` and `fetch_max_bytes`. Batches can be further broken down using the xref:components:processors/split.adoc[`split`] processor. +== Metrics + +Emits a `redpanda_lag` metric with `topic` and `partition` labels for each consumed topic. + == Metadata This input adds the following metadata fields to each message: @@ -110,6 +115,7 @@ This input adds the following metadata fields to each message: - kafka_topic - kafka_partition - kafka_offset +- kafka_lag - kafka_timestamp_ms - kafka_timestamp_unix - kafka_tombstone_message @@ -245,6 +251,15 @@ A buffer size (in bytes) for each consumed partition, allowing records to be que *Default*: `"1MB"` +=== `topic_lag_refresh_period` + +The period of time between each topic lag refresh cycle. + + +*Type*: `string` + +*Default*: `"5s"` + === `auto_replay_nacks` Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. 
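For illustration only (this snippet is not part of the generated docs in this diff), a minimal sketch of how the new `topic_lag_refresh_period` field and the `kafka_lag` metadata documented above might be used together with the `redpanda_common` input. The topic name, consumer group and mapping target are hypothetical, and the shared `redpanda` connection block is omitted:

```yml
# Hypothetical sketch: refresh topic lag every 10s and copy it onto each message.
input:
  redpanda_common:
    topics: [ orders ]               # hypothetical topic
    consumer_group: orders_consumer  # hypothetical consumer group
    topic_lag_refresh_period: 10s    # new field documented above (default 5s)

pipeline:
  processors:
    - mapping: |
        root = this
        root.lag = @kafka_lag # new metadata field documented above
```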
diff --git a/docs/modules/components/pages/inputs/redpanda_migrator.adoc b/docs/modules/components/pages/inputs/redpanda_migrator.adoc index 94acea9fab..a2724a8dac 100644 --- a/docs/modules/components/pages/inputs/redpanda_migrator.adoc +++ b/docs/modules/components/pages/inputs/redpanda_migrator.adoc @@ -77,13 +77,9 @@ input: fetch_max_partition_bytes: 1MiB consumer_group: "" # No default (optional) commit_period: 5s - multi_header: false - batch_size: 1024 - auto_replay_nacks: true + partition_buffer_bytes: 1MB topic_lag_refresh_period: 5s - output_resource: redpanda_migrator_output - replication_factor_override: true - replication_factor: 3 + auto_replay_nacks: true ``` -- @@ -91,11 +87,11 @@ input: Reads a batch of messages from a Kafka broker and waits for the output to acknowledge the writes before updating the Kafka consumer group offset. -This input should be used in combination with a `redpanda_migrator` output which it can query for existing topics. +This input should be used in combination with a `redpanda_migrator` output. When a consumer group is specified this input consumes one or more topics where partitions will automatically balance across any other connected clients with the same consumer group. When a consumer group is not specified topics can either be consumed in their entirety or with explicit partitions. -It attempts to create all selected topics it along with their associated ACLs in the broker that the `redpanda_migrator` output points to identified by the label specified in `output_resource`. +It provides the same delivery guarantees and ordering semantics as the `redpanda` input. == Metrics @@ -623,32 +619,14 @@ The period of time between each commit of the current partition offsets. Offsets *Default*: `"5s"` -=== `multi_header` - -Decode headers into lists to allow handling of multiple values with the same key - - -*Type*: `bool` - -*Default*: `false` - -=== `batch_size` - -The maximum number of messages that should be accumulated into each batch. - - -*Type*: `int` +=== `partition_buffer_bytes` -*Default*: `1024` - -=== `auto_replay_nacks` - -Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. +A buffer size (in bytes) for each consumed partition, allowing records to be queued internally before flushing. Increasing this may improve throughput at the cost of higher memory utilisation. Note that each buffer can grow slightly beyond this value. -*Type*: `bool` +*Type*: `string` -*Default*: `true` +*Default*: `"1MB"` === `topic_lag_refresh_period` @@ -659,31 +637,13 @@ The period of time between each topic lag refresh cycle. *Default*: `"5s"` -=== `output_resource` - -The label of the redpanda_migrator output in which the currently selected topics need to be created before attempting to read messages. - - -*Type*: `string` - -*Default*: `"redpanda_migrator_output"` - -=== `replication_factor_override` +=== `auto_replay_nacks` -Use the specified replication factor when creating topics. +Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. 
If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. *Type*: `bool` *Default*: `true` -=== `replication_factor` - -Replication factor for created topics. This is only used when `replication_factor_override` is set to `true`. - - -*Type*: `int` - -*Default*: `3` - diff --git a/docs/modules/components/pages/inputs/redpanda_migrator_offsets.adoc b/docs/modules/components/pages/inputs/redpanda_migrator_offsets.adoc new file mode 100644 index 0000000000..1090c192d9 --- /dev/null +++ b/docs/modules/components/pages/inputs/redpanda_migrator_offsets.adoc @@ -0,0 +1,577 @@ += redpanda_migrator_offsets +:type: input +:status: beta +:categories: ["Services"] + + + +//// + THIS FILE IS AUTOGENERATED! + + To make changes, edit the corresponding source file under: + + https://github.com/redpanda-data/connect/tree/main/internal/impl/. + + And: + + https://github.com/redpanda-data/connect/tree/main/cmd/tools/docs_gen/templates/plugin.adoc.tmpl +//// + +// © 2024 Redpanda Data Inc. + + +component_type_dropdown::[] + + +Redpanda Migrator consumer group offsets input using the https://github.com/twmb/franz-go[Franz Kafka client library^]. + +Introduced in version 4.45.0. + + +[tabs] +====== +Common:: ++ +-- + +```yml +# Common config fields, showing default values +input: + label: "" + redpanda_migrator_offsets: + seed_brokers: [] # No default (required) + topics: [] # No default (required) + regexp_topics: false + consumer_group: "" # No default (optional) + auto_replay_nacks: true +``` + +-- +Advanced:: ++ +-- + +```yml +# All config fields, showing default values +input: + label: "" + redpanda_migrator_offsets: + seed_brokers: [] # No default (required) + client_id: benthos + tls: + enabled: false + skip_cert_verify: false + enable_renegotiation: false + root_cas: "" + root_cas_file: "" + client_certs: [] + sasl: [] # No default (optional) + metadata_max_age: 5m + topics: [] # No default (required) + regexp_topics: false + rack_id: "" + consumer_group: "" # No default (optional) + commit_period: 5s + partition_buffer_bytes: 1MB + topic_lag_refresh_period: 5s + auto_replay_nacks: true +``` + +-- +====== + +TODO: Description + +== Metadata + +This input adds the following metadata fields to each message: + +```text +- kafka_key +- kafka_topic +- kafka_partition +- kafka_offset +- kafka_timestamp_unix +- kafka_timestamp_ms +- kafka_tombstone_message +- kafka_offset_topic +- kafka_offset_group +- kafka_offset_partition +- kafka_offset_commit_timestamp +- kafka_offset_metadata +``` + + +== Fields + +=== `seed_brokers` + +A list of broker addresses to connect to in order to establish connections. If an item of the list contains commas it will be expanded into multiple addresses. + + +*Type*: `array` + + +```yml +# Examples + +seed_brokers: + - localhost:9092 + +seed_brokers: + - foo:9092 + - bar:9092 + +seed_brokers: + - foo:9092,bar:9092 +``` + +=== `client_id` + +An identifier for the client connection. + + +*Type*: `string` + +*Default*: `"benthos"` + +=== `tls` + +Custom TLS settings can be used to override system defaults. + + +*Type*: `object` + + +=== `tls.enabled` + +Whether custom TLS settings are enabled. + + +*Type*: `bool` + +*Default*: `false` + +=== `tls.skip_cert_verify` + +Whether to skip server side certificate verification. 
+ + +*Type*: `bool` + +*Default*: `false` + +=== `tls.enable_renegotiation` + +Whether to allow the remote server to repeatedly request renegotiation. Enable this option if you're seeing the error message `local error: tls: no renegotiation`. + + +*Type*: `bool` + +*Default*: `false` +Requires version 3.45.0 or newer + +=== `tls.root_cas` + +An optional root certificate authority to use. This is a string, representing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. +[CAUTION] +==== +This field contains sensitive information that usually shouldn't be added to a config directly, read our xref:configuration:secrets.adoc[secrets page for more info]. +==== + + + +*Type*: `string` + +*Default*: `""` + +```yml +# Examples + +root_cas: |- + -----BEGIN CERTIFICATE----- + ... + -----END CERTIFICATE----- +``` + +=== `tls.root_cas_file` + +An optional path of a root certificate authority file to use. This is a file, often with a .pem extension, containing a certificate chain from the parent trusted root certificate, to possible intermediate signing certificates, to the host certificate. + + +*Type*: `string` + +*Default*: `""` + +```yml +# Examples + +root_cas_file: ./root_cas.pem +``` + +=== `tls.client_certs` + +A list of client certificates to use. For each certificate either the fields `cert` and `key`, or `cert_file` and `key_file` should be specified, but not both. + + +*Type*: `array` + +*Default*: `[]` + +```yml +# Examples + +client_certs: + - cert: foo + key: bar + +client_certs: + - cert_file: ./example.pem + key_file: ./example.key +``` + +=== `tls.client_certs[].cert` + +A plain text certificate to use. + + +*Type*: `string` + +*Default*: `""` + +=== `tls.client_certs[].key` + +A plain text certificate key to use. +[CAUTION] +==== +This field contains sensitive information that usually shouldn't be added to a config directly, read our xref:configuration:secrets.adoc[secrets page for more info]. +==== + + + +*Type*: `string` + +*Default*: `""` + +=== `tls.client_certs[].cert_file` + +The path of a certificate to use. + + +*Type*: `string` + +*Default*: `""` + +=== `tls.client_certs[].key_file` + +The path of a certificate key to use. + + +*Type*: `string` + +*Default*: `""` + +=== `tls.client_certs[].password` + +A plain text password for when the private key is password encrypted in PKCS#1 or PKCS#8 format. The obsolete `pbeWithMD5AndDES-CBC` algorithm is not supported for the PKCS#8 format. + +Because the obsolete pbeWithMD5AndDES-CBC algorithm does not authenticate the ciphertext, it is vulnerable to padding oracle attacks that can let an attacker recover the plaintext. +[CAUTION] +==== +This field contains sensitive information that usually shouldn't be added to a config directly, read our xref:configuration:secrets.adoc[secrets page for more info]. +==== + + + +*Type*: `string` + +*Default*: `""` + +```yml +# Examples + +password: foo + +password: ${KEY_PASSWORD} +``` + +=== `sasl` + +Specify one or more methods of SASL authentication. SASL is tried in order; if the broker supports the first mechanism, all connections will use that mechanism. If the first mechanism fails, the client will pick the first supported mechanism. If the broker does not support any client mechanisms, connections will fail. + + +*Type*: `array` + + +```yml +# Examples + +sasl: + - mechanism: SCRAM-SHA-512 + password: bar + username: foo +``` + +=== `sasl[].mechanism` + +The SASL mechanism to use. 
+ + +*Type*: `string` + + +|=== +| Option | Summary + +| `AWS_MSK_IAM` +| AWS IAM based authentication as specified by the 'aws-msk-iam-auth' java library. +| `OAUTHBEARER` +| OAuth Bearer based authentication. +| `PLAIN` +| Plain text authentication. +| `SCRAM-SHA-256` +| SCRAM based authentication as specified in RFC5802. +| `SCRAM-SHA-512` +| SCRAM based authentication as specified in RFC5802. +| `none` +| Disable sasl authentication + +|=== + +=== `sasl[].username` + +A username to provide for PLAIN or SCRAM-* authentication. + + +*Type*: `string` + +*Default*: `""` + +=== `sasl[].password` + +A password to provide for PLAIN or SCRAM-* authentication. +[CAUTION] +==== +This field contains sensitive information that usually shouldn't be added to a config directly, read our xref:configuration:secrets.adoc[secrets page for more info]. +==== + + + +*Type*: `string` + +*Default*: `""` + +=== `sasl[].token` + +The token to use for a single session's OAUTHBEARER authentication. + + +*Type*: `string` + +*Default*: `""` + +=== `sasl[].extensions` + +Key/value pairs to add to OAUTHBEARER authentication requests. + + +*Type*: `object` + + +=== `sasl[].aws` + +Contains AWS specific fields for when the `mechanism` is set to `AWS_MSK_IAM`. + + +*Type*: `object` + + +=== `sasl[].aws.region` + +The AWS region to target. + + +*Type*: `string` + +*Default*: `""` + +=== `sasl[].aws.endpoint` + +Allows you to specify a custom endpoint for the AWS API. + + +*Type*: `string` + +*Default*: `""` + +=== `sasl[].aws.credentials` + +Optional manual configuration of AWS credentials to use. More information can be found in xref:guides:cloud/aws.adoc[]. + + +*Type*: `object` + + +=== `sasl[].aws.credentials.profile` + +A profile from `~/.aws/credentials` to use. + + +*Type*: `string` + +*Default*: `""` + +=== `sasl[].aws.credentials.id` + +The ID of credentials to use. + + +*Type*: `string` + +*Default*: `""` + +=== `sasl[].aws.credentials.secret` + +The secret for the credentials being used. +[CAUTION] +==== +This field contains sensitive information that usually shouldn't be added to a config directly, read our xref:configuration:secrets.adoc[secrets page for more info]. +==== + + + +*Type*: `string` + +*Default*: `""` + +=== `sasl[].aws.credentials.token` + +The token for the credentials being used, required when using short term credentials. + + +*Type*: `string` + +*Default*: `""` + +=== `sasl[].aws.credentials.from_ec2_role` + +Use the credentials of a host EC2 machine configured to assume https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html[an IAM role associated with the instance^]. + + +*Type*: `bool` + +*Default*: `false` +Requires version 4.2.0 or newer + +=== `sasl[].aws.credentials.role` + +A role ARN to assume. + + +*Type*: `string` + +*Default*: `""` + +=== `sasl[].aws.credentials.role_external_id` + +An external ID to provide when assuming a role. + + +*Type*: `string` + +*Default*: `""` + +=== `metadata_max_age` + +The maximum age of metadata before it is refreshed. + + +*Type*: `string` + +*Default*: `"5m"` + +=== `topics` + +A list of topics to consume from. Multiple comma separated topics can be listed in a single element. When a `consumer_group` is specified partitions are automatically distributed across consumers of a topic, otherwise all partitions are consumed. 
+ + +*Type*: `array` + + +```yml +# Examples + +topics: + - foo + - bar + +topics: + - things.* + +topics: + - foo,bar +``` + +=== `regexp_topics` + +Whether listed topics should be interpreted as regular expression patterns for matching multiple topics. + + +*Type*: `bool` + +*Default*: `false` + +=== `rack_id` + +A rack specifies where the client is physically located and changes fetch requests to consume from the closest replica as opposed to the leader replica. + + +*Type*: `string` + +*Default*: `""` + +=== `consumer_group` + +An optional consumer group to consume as. When specified the partitions of specified topics are automatically distributed across consumers sharing a consumer group, and partition offsets are automatically committed and resumed under this name. Consumer groups are not supported when specifying explicit partitions to consume from in the `topics` field. + + +*Type*: `string` + + +=== `commit_period` + +The period of time between each commit of the current partition offsets. Offsets are always committed during shutdown. + + +*Type*: `string` + +*Default*: `"5s"` + +=== `partition_buffer_bytes` + +A buffer size (in bytes) for each consumed partition, allowing records to be queued internally before flushing. Increasing this may improve throughput at the cost of higher memory utilisation. Note that each buffer can grow slightly beyond this value. + + +*Type*: `string` + +*Default*: `"1MB"` + +=== `topic_lag_refresh_period` + +The period of time between each topic lag refresh cycle. + + +*Type*: `string` + +*Default*: `"5s"` + +=== `auto_replay_nacks` + +Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to `false` these messages will instead be deleted. Disabling auto replays can greatly improve memory efficiency of high throughput streams as the original shape of the data can be discarded immediately upon consumption and mutation. + + +*Type*: `bool` + +*Default*: `true` + + diff --git a/docs/modules/components/pages/outputs/redpanda_migrator.adoc b/docs/modules/components/pages/outputs/redpanda_migrator.adoc index 5966836055..3a022e819e 100644 --- a/docs/modules/components/pages/outputs/redpanda_migrator.adoc +++ b/docs/modules/components/pages/outputs/redpanda_migrator.adoc @@ -46,12 +46,7 @@ output: metadata: include_prefixes: [] include_patterns: [] - max_in_flight: 10 - batching: - count: 0 - byte_size: 0 - period: "" - check: "" + max_in_flight: 256 ``` -- @@ -82,13 +77,7 @@ output: include_prefixes: [] include_patterns: [] timestamp_ms: ${! timestamp_unix_milli() } # No default (optional) - max_in_flight: 10 - batching: - count: 0 - byte_size: 0 - period: "" - check: "" - processors: [] # No default (optional) + max_in_flight: 256 input_resource: redpanda_migrator_input replication_factor_override: true replication_factor: 3 @@ -107,11 +96,12 @@ output: Writes a batch of messages to a Kafka broker and waits for acknowledgement before propagating it back to the input. -This output should be used in combination with a `redpanda_migrator` input which it can query for topic and ACL configurations. +This output should be used in combination with a `redpanda_migrator` input identified by the label specified in +`input_resource` which it can query for topic and ACL configurations. Once connected, the output will attempt to +create all topics which the input consumes from along with their ACLs. 
-If the configured broker does not contain the current message topic, it attempts to create it along with the topic -ACLs which are read automatically from the `redpanda_migrator` input identified by the label specified in -`input_resource`. +If the configured broker does not contain the current message topic, this output attempts to create it along with its +ACLs. ACL migration adheres to the following principles: @@ -641,109 +631,7 @@ The maximum number of batches to be sending in parallel at any given time. *Type*: `int` -*Default*: `10` - -=== `batching` - -Allows you to configure a xref:configuration:batching.adoc[batching policy]. - - -*Type*: `object` - - -```yml -# Examples - -batching: - byte_size: 5000 - count: 0 - period: 1s - -batching: - count: 10 - period: 1s - -batching: - check: this.contains("END BATCH") - count: 0 - period: 1m -``` - -=== `batching.count` - -A number of messages at which the batch should be flushed. If `0` disables count based batching. - - -*Type*: `int` - -*Default*: `0` - -=== `batching.byte_size` - -An amount of bytes at which the batch should be flushed. If `0` disables size based batching. - - -*Type*: `int` - -*Default*: `0` - -=== `batching.period` - -A period in which an incomplete batch should be flushed regardless of its size. - - -*Type*: `string` - -*Default*: `""` - -```yml -# Examples - -period: 1s - -period: 1m - -period: 500ms -``` - -=== `batching.check` - -A xref:guides:bloblang/about.adoc[Bloblang query] that should return a boolean value indicating whether a message should end a batch. - - -*Type*: `string` - -*Default*: `""` - -```yml -# Examples - -check: this.type == "end_of_transaction" -``` - -=== `batching.processors` - -A list of xref:components:processors/about.adoc[processors] to apply to a batch as it is flushed. This allows you to aggregate and archive the batch however you see fit. Please note that all resulting messages are flushed as a single batch, therefore splitting the batch into smaller batches using these processors is a no-op. - - -*Type*: `array` - - -```yml -# Examples - -processors: - - archive: - format: concatenate - -processors: - - archive: - format: lines - -processors: - - archive: - format: json_array -``` +*Default*: `256` === `input_resource` diff --git a/docs/modules/components/pages/outputs/redpanda_migrator_offsets.adoc b/docs/modules/components/pages/outputs/redpanda_migrator_offsets.adoc index b76f32f6fb..a235113196 100644 --- a/docs/modules/components/pages/outputs/redpanda_migrator_offsets.adoc +++ b/docs/modules/components/pages/outputs/redpanda_migrator_offsets.adoc @@ -40,8 +40,11 @@ output: label: "" redpanda_migrator_offsets: seed_brokers: [] # No default (required) - kafka_key: ${! @kafka_key } - max_in_flight: 1 + offset_topic: ${! @kafka_offset_topic } + offset_group: ${! @kafka_offset_group } + offset_partition: ${! @kafka_offset_partition } + offset_commit_timestamp: ${! @kafka_offset_commit_timestamp } + offset_metadata: ${! @kafka_offset_metadata } ``` -- @@ -65,8 +68,11 @@ output: client_certs: [] sasl: [] # No default (optional) metadata_max_age: 5m - kafka_key: ${! @kafka_key } - max_in_flight: 1 + offset_topic: ${! @kafka_offset_topic } + offset_group: ${! @kafka_offset_group } + offset_partition: ${! @kafka_offset_partition } + offset_commit_timestamp: ${! @kafka_offset_commit_timestamp } + offset_metadata: ${! @kafka_offset_metadata } timeout: 10s max_message_bytes: 1MB broker_write_max_bytes: 100MB @@ -470,24 +476,55 @@ The maximum age of metadata before it is refreshed. 
*Default*: `"5m"` -=== `kafka_key` +=== `offset_topic` -Kafka key. +Kafka offset topic. This field supports xref:configuration:interpolation.adoc#bloblang-queries[interpolation functions]. *Type*: `string` -*Default*: `"${! @kafka_key }"` +*Default*: `"${! @kafka_offset_topic }"` -=== `max_in_flight` +=== `offset_group` -The maximum number of batches to be sending in parallel at any given time. +Kafka offset group. +This field supports xref:configuration:interpolation.adoc#bloblang-queries[interpolation functions]. -*Type*: `int` +*Type*: `string` + +*Default*: `"${! @kafka_offset_group }"` + +=== `offset_partition` + +Kafka offset partition. +This field supports xref:configuration:interpolation.adoc#bloblang-queries[interpolation functions]. + + +*Type*: `string` + +*Default*: `"${! @kafka_offset_partition }"` + +=== `offset_commit_timestamp` + +Kafka offset commit timestamp. +This field supports xref:configuration:interpolation.adoc#bloblang-queries[interpolation functions]. + + +*Type*: `string` + +*Default*: `"${! @kafka_offset_commit_timestamp }"` + +=== `offset_metadata` + +Kafka offset metadata value. +This field supports xref:configuration:interpolation.adoc#bloblang-queries[interpolation functions]. + + +*Type*: `string` -*Default*: `1` +*Default*: `"${! @kafka_offset_metadata }"` === `timeout` diff --git a/docs/modules/configuration/pages/templating.adoc b/docs/modules/configuration/pages/templating.adoc index 6fe12f610c..6227fed865 100644 --- a/docs/modules/configuration/pages/templating.adoc +++ b/docs/modules/configuration/pages/templating.adoc @@ -223,6 +223,8 @@ The scalar type of the field. | standard float type | `bool` | a boolean true/false +| `bloblang` +| a bloblang mapping | `unknown` | allows for nesting arbitrary configuration inside of a field @@ -309,6 +311,15 @@ A name to identify the test. *Type*: `string` +=== `tests[].label` + +A label to assign to this template when running the test. + + +*Type*: `string` + +*Default*: `""` + === `tests[].config` A configuration to run this test with, the config resulting from applying the template with this config will be linted. diff --git a/docs/modules/guides/pages/bloblang/functions.adoc b/docs/modules/guides/pages/bloblang/functions.adoc index 8558a04f33..6bb40c66a7 100644 --- a/docs/modules/guides/pages/bloblang/functions.adoc +++ b/docs/modules/guides/pages/bloblang/functions.adoc @@ -391,6 +391,39 @@ If an error has occurred during the processing of a message this function return root.doc.error = error() ``` +=== `error_source_label` + +Returns the label of the source component which raised the error during the processing of a message or an empty string if not set. `null` is returned when the error is null or no source component is associated with it. For more information about error handling patterns read xref:configuration:error_handling.adoc[]. + +==== Examples + + +```coffeescript +root.doc.error_source_label = error_source_label() +``` + +=== `error_source_name` + +Returns the name of the source component which raised the error during the processing of a message. `null` is returned when the error is null or no source component is associated with it. For more information about error handling patterns read xref:configuration:error_handling.adoc[]. + +==== Examples + + +```coffeescript +root.doc.error_source_name = error_source_name() +``` + +=== `error_source_path` + +Returns the path of the source component which raised the error during the processing of a message. 
`null` is returned when the error is null or no source component is associated with it. For more information about error handling patterns read xref:configuration:error_handling.adoc[]. + +==== Examples + + +```coffeescript +root.doc.error_source_path = error_source_path() +``` + === `errored` Returns a boolean value indicating whether an error has occurred during the processing of a message. For more information about error handling patterns read xref:configuration:error_handling.adoc[]. diff --git a/go.mod b/go.mod index 30ff892c95..73bed77273 100644 --- a/go.mod +++ b/go.mod @@ -109,7 +109,7 @@ require ( github.com/rabbitmq/amqp091-go v1.10.0 github.com/rcrowley/go-metrics v0.0.0-20201227073835-cf1acfcdf475 github.com/redis/go-redis/v9 v9.7.0 - github.com/redpanda-data/benthos/v4 v4.42.0 + github.com/redpanda-data/benthos/v4 v4.43.0 github.com/redpanda-data/common-go/secrets v0.1.2 github.com/redpanda-data/connect/public/bundle/free/v4 v4.31.0 github.com/rs/xid v1.5.0 @@ -118,15 +118,15 @@ require ( github.com/smira/go-statsd v1.3.3 github.com/snowflakedb/gosnowflake v1.11.0 github.com/sourcegraph/conc v0.3.0 - github.com/stretchr/testify v1.9.0 + github.com/stretchr/testify v1.10.0 github.com/testcontainers/testcontainers-go/modules/ollama v0.32.0 github.com/testcontainers/testcontainers-go/modules/qdrant v0.32.0 github.com/tetratelabs/wazero v1.7.3 github.com/timeplus-io/proton-go-driver/v2 v2.0.17 github.com/trinodb/trino-go-client v0.315.0 - github.com/twmb/franz-go v1.17.1 + github.com/twmb/franz-go v1.18.0 github.com/twmb/franz-go/pkg/kadm v1.13.0 - github.com/twmb/franz-go/pkg/kmsg v1.8.0 + github.com/twmb/franz-go/pkg/kmsg v1.9.0 github.com/twmb/franz-go/pkg/sr v1.2.0 github.com/vmihailenco/msgpack/v5 v5.4.1 github.com/xdg-go/scram v1.1.2 @@ -136,13 +136,13 @@ require ( github.com/youmark/pkcs8 v0.0.0-20240726163527-a2c0da244d78 go.mongodb.org/mongo-driver/v2 v2.0.0 go.nanomsg.org/mangos/v3 v3.4.2 - go.opentelemetry.io/otel v1.29.0 + go.opentelemetry.io/otel v1.33.0 go.opentelemetry.io/otel/exporters/jaeger v1.17.0 go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.28.0 go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.28.0 go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.28.0 go.opentelemetry.io/otel/sdk v1.29.0 - go.opentelemetry.io/otel/trace v1.29.0 + go.opentelemetry.io/otel/trace v1.33.0 go.uber.org/multierr v1.11.0 golang.org/x/crypto v0.32.0 golang.org/x/exp v0.0.0-20250106191152-7588d65b2ba8 @@ -179,7 +179,6 @@ require ( github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect github.com/modern-go/reflect2 v1.0.2 // indirect github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect - github.com/onsi/gomega v1.34.2 // indirect github.com/pingcap/errors v0.11.5-0.20240311024730-e056997136bb // indirect github.com/pingcap/failpoint v0.0.0-20240528011301-b51a646c7c86 // indirect github.com/pingcap/log v1.1.1-0.20230317032135-a0d097d16e22 // indirect @@ -191,6 +190,7 @@ require ( github.com/tidwall/gjson v1.18.0 // indirect github.com/tidwall/match v1.1.1 // indirect github.com/tidwall/pretty v1.2.1 // indirect + go.opentelemetry.io/auto/sdk v1.1.0 // indirect go.opentelemetry.io/contrib/detectors/gcp v1.29.0 // indirect go.opentelemetry.io/otel/sdk/metric v1.29.0 // indirect gonum.org/v1/gonum v0.15.1 // indirect @@ -203,7 +203,7 @@ require ( cloud.google.com/go/compute/metadata v0.5.2 // indirect cloud.google.com/go/iam v1.2.2 // indirect cloud.google.com/go/trace v1.11.2 // indirect - 
cuelang.org/go v0.9.2 // indirect + cuelang.org/go v0.11.1 // indirect dario.cat/mergo v1.0.0 // indirect filippo.io/edwards25519 v1.1.0 // indirect github.com/99designs/go-keychain v0.0.0-20191008050251-8e49817e8af4 // indirect @@ -265,7 +265,7 @@ require ( github.com/couchbase/goprotostellar v1.0.2 // indirect github.com/couchbaselabs/gocbconnstr/v2 v2.0.0-20240607131231-fb385523de28 // indirect github.com/cpuguy83/dockercfg v0.3.1 // indirect - github.com/cpuguy83/go-md2man/v2 v2.0.4 // indirect + github.com/cpuguy83/go-md2man/v2 v2.0.5 // indirect github.com/danieljoos/wincred v1.2.0 // indirect github.com/davecgh/go-spew v1.1.1 // indirect github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect @@ -279,10 +279,10 @@ require ( github.com/eapache/go-resiliency v1.7.0 // indirect github.com/eapache/go-xerial-snappy v0.0.0-20230731223053-c322873962e3 // indirect github.com/eapache/queue v1.1.0 // indirect - github.com/fatih/color v1.17.0 // indirect + github.com/fatih/color v1.18.0 // indirect github.com/felixge/httpsnoop v1.0.4 // indirect github.com/frankban/quicktest v1.14.6 // indirect - github.com/fsnotify/fsnotify v1.7.0 // indirect + github.com/fsnotify/fsnotify v1.8.0 // indirect github.com/gabriel-vasile/mimetype v1.4.7 // indirect github.com/go-faster/city v1.0.1 // indirect github.com/go-faster/errors v0.7.1 // indirect @@ -309,7 +309,7 @@ require ( github.com/gorilla/mux v1.8.1 // indirect github.com/gorilla/websocket v1.5.3 // indirect github.com/gosimple/unidecode v1.0.1 // indirect - github.com/govalues/decimal v0.1.29 // indirect + github.com/govalues/decimal v0.1.32 // indirect github.com/grpc-ecosystem/go-grpc-middleware v1.4.0 // indirect github.com/grpc-ecosystem/grpc-gateway/v2 v2.22.0 // indirect github.com/hailocab/go-hostpool v0.0.0-20160125115350-e80d13ce29ed // indirect @@ -322,7 +322,7 @@ require ( github.com/hashicorp/golang-lru/arc/v2 v2.0.7 // indirect github.com/hashicorp/golang-lru/v2 v2.0.7 // indirect github.com/influxdata/go-syslog/v3 v3.0.0 // indirect - github.com/itchyny/gojq v0.12.16 // indirect + github.com/itchyny/gojq v0.12.17 // indirect github.com/itchyny/timefmt-go v0.1.6 // indirect github.com/jackc/chunkreader/v2 v2.0.1 // indirect github.com/jackc/pgconn v1.14.3 @@ -372,7 +372,7 @@ require ( github.com/paulmach/orb v0.11.1 // indirect github.com/pgvector/pgvector-go v0.2.2 github.com/pierrec/lz4 v2.6.1+incompatible // indirect - github.com/pierrec/lz4/v4 v4.1.21 // indirect + github.com/pierrec/lz4/v4 v4.1.22 // indirect github.com/pkg/browser v0.0.0-20240102092130-5ac0b6a4141c // indirect github.com/pkg/errors v0.9.1 // indirect github.com/pmezard/go-difflib v1.0.0 // indirect @@ -381,7 +381,7 @@ require ( github.com/prometheus/procfs v0.15.1 // indirect github.com/quipo/dependencysolver v0.0.0-20170801134659-2b009cb4ddcc // indirect github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect - github.com/rickb777/period v1.0.6 // indirect + github.com/rickb777/period v1.0.7 // indirect github.com/rickb777/plural v1.4.2 // indirect github.com/rivo/uniseg v0.4.7 // indirect github.com/robfig/cron/v3 v3.0.1 // indirect @@ -399,7 +399,7 @@ require ( github.com/tilinna/z85 v1.0.0 // indirect github.com/tklauser/go-sysconf v0.3.13 // indirect github.com/tklauser/numcpus v0.7.0 // indirect - github.com/urfave/cli/v2 v2.27.4 + github.com/urfave/cli/v2 v2.27.5 github.com/vmihailenco/tagparser/v2 v2.0.0 // indirect github.com/xdg-go/pbkdf2 v1.0.0 // indirect github.com/xdg-go/stringprep v1.0.4 // indirect 
@@ -411,7 +411,7 @@ require ( go.opencensus.io v0.24.0 // indirect go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.54.0 // indirect go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.54.0 // indirect - go.opentelemetry.io/otel/metric v1.29.0 // indirect + go.opentelemetry.io/otel/metric v1.33.0 // indirect go.opentelemetry.io/proto/otlp v1.3.1 // indirect go.uber.org/atomic v1.11.0 // indirect go.uber.org/zap v1.27.0 // indirect diff --git a/go.sum b/go.sum index 9ff7d48811..f34214bc14 100644 --- a/go.sum +++ b/go.sum @@ -630,10 +630,10 @@ cloud.google.com/go/workflows v1.7.0/go.mod h1:JhSrZuVZWuiDfKEFxU0/F1PQjmpnpcoIS cloud.google.com/go/workflows v1.8.0/go.mod h1:ysGhmEajwZxGn1OhGOGKsTXc5PyxOc0vfKf5Af+to4M= cloud.google.com/go/workflows v1.9.0/go.mod h1:ZGkj1aFIOd9c8Gerkjjq7OW7I5+l6cSvT3ujaO/WwSA= cloud.google.com/go/workflows v1.10.0/go.mod h1:fZ8LmRmZQWacon9UCX1r/g/DfAXx5VcPALq2CxzdePw= -cuelabs.dev/go/oci/ociregistry v0.0.0-20240404174027-a39bec0462d2 h1:BnG6pr9TTr6CYlrJznYUDj6V7xldD1W+1iXPum0wT/w= -cuelabs.dev/go/oci/ociregistry v0.0.0-20240404174027-a39bec0462d2/go.mod h1:pK23AUVXuNzzTpfMCA06sxZGeVQ/75FdVtW249de9Uo= -cuelang.org/go v0.9.2 h1:pfNiry2PdRBr02G/aKm5k2vhzmqbAOoaB4WurmEbWvs= -cuelang.org/go v0.9.2/go.mod h1:qpAYsLOf7gTM1YdEg6cxh553uZ4q9ZDWlPbtZr9q1Wk= +cuelabs.dev/go/oci/ociregistry v0.0.0-20240906074133-82eb438dd565 h1:R5wwEcbEZSBmeyg91MJZTxfd7WpBo2jPof3AYjRbxwY= +cuelabs.dev/go/oci/ociregistry v0.0.0-20240906074133-82eb438dd565/go.mod h1:5A4xfTzHTXfeVJBU6RAUf+QrlfTCW+017q/QiW+sMLg= +cuelang.org/go v0.11.1 h1:pV+49MX1mmvDm8Qh3Za3M786cty8VKPWzQ1Ho4gZRP0= +cuelang.org/go v0.11.1/go.mod h1:PBY6XvPUswPPJ2inpvUozP9mebDVTXaeehQikhZPBz0= dario.cat/mergo v1.0.0 h1:AGCNq9Evsj31mOgNPcLyXc+4PNABt905YmuqPYYpBWk= dario.cat/mergo v1.0.0/go.mod h1:uNxQE+84aUszobStD9th8a29P2fMDhsBdgRYvZOxGmk= dmitri.shuralyov.com/gpu/mtl v0.0.0-20190408044501-666a987793e9/go.mod h1:H6x//7gZCb22OMCxBHrMx7a5I7Hp++hsVxbQ4BYO7hU= @@ -991,8 +991,8 @@ github.com/couchbaselabs/gocbconnstr/v2 v2.0.0-20240607131231-fb385523de28 h1:lh github.com/couchbaselabs/gocbconnstr/v2 v2.0.0-20240607131231-fb385523de28/go.mod h1:o7T431UOfFVHDNvMBUmUxpHnhivwv7BziUao/nMl81E= github.com/cpuguy83/dockercfg v0.3.1 h1:/FpZ+JaygUR/lZP2NlFI2DVfrOEMAIKP5wWEJdoYe9E= github.com/cpuguy83/dockercfg v0.3.1/go.mod h1:sugsbF4//dDlL/i+S+rtpIWp+5h0BHJHfjj5/jFyUJc= -github.com/cpuguy83/go-md2man/v2 v2.0.4 h1:wfIWP927BUkWJb2NmU/kNDYIBTh/ziUX91+lVfRxZq4= -github.com/cpuguy83/go-md2man/v2 v2.0.4/go.mod h1:tgQtvFlXSQOSOSIRvRPT7W67SCa46tRHOmNcaadrF8o= +github.com/cpuguy83/go-md2man/v2 v2.0.5 h1:ZtcqGrnekaHpVLArFSe4HK5DoKx1T0rq2DwVB0alcyc= +github.com/cpuguy83/go-md2man/v2 v2.0.5/go.mod h1:tgQtvFlXSQOSOSIRvRPT7W67SCa46tRHOmNcaadrF8o= github.com/creack/pty v1.1.7/go.mod h1:lj5s0c3V2DBrqTV7llrYr5NG6My20zk30Fl46Y7DoTY= github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E= github.com/creack/pty v1.1.21 h1:1/QdRyBaHHJP61QkWMXlOIBfsgdDeeKfK8SYVUWJKf0= @@ -1043,8 +1043,8 @@ github.com/eapache/queue v1.1.0 h1:YOEu7KNc61ntiQlcEeUIoDTJ2o8mQznoNvUhiigpIqc= github.com/eapache/queue v1.1.0/go.mod h1:6eCeP0CKFpHLu8blIFXhExK/dRa7WDZfr6jVFPTqq+I= github.com/eclipse/paho.mqtt.golang v1.5.0 h1:EH+bUVJNgttidWFkLLVKaQPGmkTUfQQqjOsyvMGvD6o= github.com/eclipse/paho.mqtt.golang v1.5.0/go.mod h1:du/2qNQVqJf/Sqs4MEL77kR8QTqANF7XU7Fk0aOTAgk= -github.com/emicklei/proto v1.10.0 h1:pDGyFRVV5RvV+nkBK9iy3q67FBy9Xa7vwrOTE+g5aGw= -github.com/emicklei/proto v1.10.0/go.mod 
h1:rn1FgRS/FANiZdD2djyH7TMA9jdRDcYQ9IEN9yvjX0A= +github.com/emicklei/proto v1.13.2 h1:z/etSFO3uyXeuEsVPzfl56WNgzcvIr42aQazXaQmFZY= +github.com/emicklei/proto v1.13.2/go.mod h1:rn1FgRS/FANiZdD2djyH7TMA9jdRDcYQ9IEN9yvjX0A= github.com/envoyproxy/go-control-plane v0.9.0/go.mod h1:YTl/9mNaCwkRvm6d1a2C3ymFceY/DCBVvsKhRF0iEA4= github.com/envoyproxy/go-control-plane v0.9.1-0.20191026205805-5f8ba28d4473/go.mod h1:YTl/9mNaCwkRvm6d1a2C3ymFceY/DCBVvsKhRF0iEA4= github.com/envoyproxy/go-control-plane v0.9.4/go.mod h1:6rpuAdCZL397s3pYoYcLgu1mIlRU8Am5FuJP05cCM98= @@ -1066,8 +1066,8 @@ github.com/envoyproxy/protoc-gen-validate v0.10.1/go.mod h1:DRjgyB0I43LtJapqN6Ni github.com/envoyproxy/protoc-gen-validate v1.1.0 h1:tntQDh69XqOCOZsDz0lVJQez/2L6Uu2PdjCQwWCJ3bM= github.com/envoyproxy/protoc-gen-validate v1.1.0/go.mod h1:sXRDRVmzEbkM7CVcM06s9shE/m23dg3wzjl0UWqJ2q4= github.com/fatih/color v1.7.0/go.mod h1:Zm6kSWBoL9eyXnKyktHP6abPY2pDugNf5KwzbycvMj4= -github.com/fatih/color v1.17.0 h1:GlRw1BRJxkpqUCBKzKOw098ed57fEsKeNjpTe3cSjK4= -github.com/fatih/color v1.17.0/go.mod h1:YZ7TlrGPkiz6ku9fK3TLD/pl3CpsiFyu8N92HLgmosI= +github.com/fatih/color v1.18.0 h1:S8gINlzdQ840/4pfAwic/ZE0djQEH3wM94VfqLTZcOM= +github.com/fatih/color v1.18.0/go.mod h1:4FelSpRwEGDpQ12mAdzqdOukCy4u8WUtOY6lkT/6HfU= github.com/felixge/httpsnoop v1.0.4 h1:NFTV2Zj1bL4mc9sqWACXbQFVBBg2W3GPvqp8/ESS2Wg= github.com/felixge/httpsnoop v1.0.4/go.mod h1:m8KPJKqk1gH5J9DgRY2ASl2lWCfGKXixSwevea8zH2U= github.com/fogleman/gg v1.2.1-0.20190220221249-0403632d5b90/go.mod h1:R/bRT+9gY/C5z7JzPU0zXsXHKM4/ayA+zqcVNZzPa1k= @@ -1077,8 +1077,8 @@ github.com/fortytw2/leaktest v1.3.0 h1:u8491cBMTQ8ft8aeV+adlcytMZylmA5nnwwkRZjI8 github.com/fortytw2/leaktest v1.3.0/go.mod h1:jDsjWgpAGjm2CA7WthBh/CdZYEPF31XHquHwclZch5g= github.com/frankban/quicktest v1.14.6 h1:7Xjx+VpznH+oBnejlPUj8oUpdxnVs4f8XU8WnHkI4W8= github.com/frankban/quicktest v1.14.6/go.mod h1:4ptaffx2x8+WTWXmUCuVU6aPUX1/Mz7zb5vbUoiM6w0= -github.com/fsnotify/fsnotify v1.7.0 h1:8JEhPFa5W2WU7YfeZzPNqzMP6Lwt7L2715Ggo0nosvA= -github.com/fsnotify/fsnotify v1.7.0/go.mod h1:40Bi/Hjc2AVfZrqy+aj+yEI+/bRxZnMJyTJwOpGvigM= +github.com/fsnotify/fsnotify v1.8.0 h1:dAwr6QBTBZIkG8roQaJjGof0pp0EeF+tNV7YBP3F/8M= +github.com/fsnotify/fsnotify v1.8.0/go.mod h1:8jBTzvmWwFyi3Pb8djgCCO5IBqzKJ/Jwo8TRcHyHii0= github.com/gabriel-vasile/mimetype v1.4.7 h1:SKFKl7kD0RiPdbht0s7hFtjl489WcQ1VyPW8ZzUMYCA= github.com/gabriel-vasile/mimetype v1.4.7/go.mod h1:GDlAgAyIRT27BhFl53XNAFtfjzOkLaF35JdEG0P7LtU= github.com/gdamore/optopia v0.2.0/go.mod h1:YKYEwo5C1Pa617H7NlPcmQXl+vG6YnSSNB44n8dNL0Q= @@ -1308,8 +1308,8 @@ github.com/gosimple/slug v1.14.0 h1:RtTL/71mJNDfpUbCOmnf/XFkzKRtD6wL6Uy+3akm4Es= github.com/gosimple/slug v1.14.0/go.mod h1:UiRaFH+GEilHstLUmcBgWcI42viBN7mAb818JrYOeFQ= github.com/gosimple/unidecode v1.0.1 h1:hZzFTMMqSswvf0LBJZCZgThIZrpDHFXux9KeGmn6T/o= github.com/gosimple/unidecode v1.0.1/go.mod h1:CP0Cr1Y1kogOtx0bJblKzsVWrqYaqfNOnHzpgWw4Awc= -github.com/govalues/decimal v0.1.29 h1:GKC5g9y9oWxKIy51czdHTShOABwHm/shVuOVPwG415M= -github.com/govalues/decimal v0.1.29/go.mod h1:LUlHHucpCmA4rJfNrDvMgrWibDpYnDNWqJuNU1/gxW8= +github.com/govalues/decimal v0.1.32 h1:jsZHwjLKteAlG5nGjlqvhtkGBq7/4SKkk6yGTluwPk0= +github.com/govalues/decimal v0.1.32/go.mod h1:Ee7eI3Llf7hfqDZtpj8Q6NCIgJy1iY3kH1pSwDrNqlM= github.com/grpc-ecosystem/go-grpc-middleware v1.4.0 h1:UH//fgunKIs4JdUbpDl1VZCDaL56wXCB/5+wF6uHfaI= github.com/grpc-ecosystem/go-grpc-middleware v1.4.0/go.mod h1:g5qyo/la0ALbONm6Vbp88Yd8NsDy6rZz+RcrMPxvld8= github.com/grpc-ecosystem/grpc-gateway v1.16.0/go.mod 
h1:BDjrQk3hbvj6Nolgz8mAMFbcEtjT1g+wF4CSlocrBnw= @@ -1365,8 +1365,8 @@ github.com/influxdata/go-syslog/v3 v3.0.0 h1:jichmjSZlYK0VMmlz+k4WeOQd7z745YLsvG github.com/influxdata/go-syslog/v3 v3.0.0/go.mod h1:tulsOp+CecTAYC27u9miMgq21GqXRW6VdKbOG+QSP4Q= github.com/influxdata/influxdb1-client v0.0.0-20220302092344-a9ab5670611c h1:qSHzRbhzK8RdXOsAdfDgO49TtqC1oZ+acxPrkfTxcCs= github.com/influxdata/influxdb1-client v0.0.0-20220302092344-a9ab5670611c/go.mod h1:qj24IKcXYK6Iy9ceXlo3Tc+vtHo9lIhSX5JddghvEPo= -github.com/itchyny/gojq v0.12.16 h1:yLfgLxhIr/6sJNVmYfQjTIv0jGctu6/DgDoivmxTr7g= -github.com/itchyny/gojq v0.12.16/go.mod h1:6abHbdC2uB9ogMS38XsErnfqJ94UlngIJGlRAIj4jTM= +github.com/itchyny/gojq v0.12.17 h1:8av8eGduDb5+rvEdaOO+zQUjA04MS0m3Ps8HiD+fceg= +github.com/itchyny/gojq v0.12.17/go.mod h1:WBrEMkgAfAGO1LUcGOckBl5O726KPp+OlkKug0I/FEY= github.com/itchyny/timefmt-go v0.1.6 h1:ia3s54iciXDdzWzwaVKXZPbiXzxxnv1SPGFfM/myJ5Q= github.com/itchyny/timefmt-go v0.1.6/go.mod h1:RRDZYC5s9ErkjQvTvvU7keJjxUYzIISJGxm9/mAERQg= github.com/jackc/chunkreader v1.0.0/go.mod h1:RT6O25fNZIuasFJRyZ4R/Y2BbhasbmZXF9QQ7T3kePo= @@ -1631,8 +1631,8 @@ github.com/ollama/ollama v0.5.4 h1:CzsHBNDeli5hiqe8yj7M4cg8X7qnFg2B3fFNhaUmHw0= github.com/ollama/ollama v0.5.4/go.mod h1:etr//7OWrZeFfWnnx5QHeH435jHBBsNtjntDP7WVxco= github.com/onsi/ginkgo v1.16.5 h1:8xi0RTUf59SOSfEtZMvwTvXYMzG4gV23XVHOZiXNtnE= github.com/onsi/ginkgo v1.16.5/go.mod h1:+E8gABHa3K6zRBolWtd+ROzc/U5bkGt0FwiG042wbpU= -github.com/onsi/gomega v1.34.2 h1:pNCwDkzrsv7MS9kpaQvVb1aVLahQXyJ/Tv5oAZMI3i8= -github.com/onsi/gomega v1.34.2/go.mod h1:v1xfxRgk0KIsG+QOdm7p8UosrOzPYRo60fd3B/1Dukc= +github.com/onsi/gomega v1.35.0 h1:xuM1M/UvMp9BCdS4hojhS9/4jEuVqS9Er3bqupeaoPM= +github.com/onsi/gomega v1.35.0/go.mod h1:PvZbdDc8J6XJEpDK4HCuRBm8a6Fzp9/DmhC9C7yFlog= github.com/opencontainers/go-digest v1.0.0 h1:apOUWs51W5PlhuyGyz9FCeeBIOUDA/6nW8Oi/yOhh5U= github.com/opencontainers/go-digest v1.0.0/go.mod h1:0JzlMkj0TRzQZfJkVvzbP0HBR3IKzErnv2BNG4W4MAM= github.com/opencontainers/image-spec v1.1.0 h1:8SG7/vwALn54lVB/0yZ/MMwhFrPYtpEHQb2IpWsCzug= @@ -1657,6 +1657,9 @@ github.com/paulmach/protoscan v0.2.1/go.mod h1:SpcSwydNLrxUGSDvXvO0P7g7AuhJ7lcKf github.com/pborman/getopt v0.0.0-20180729010549-6fdd0a2c7117/go.mod h1:85jBQOZwpVEaDAr341tbn15RS4fCAsIst0qp7i8ex1o= github.com/pebbe/zmq4 v1.2.11 h1:Ua5mgIaZeabUGnH7tqswkUcjkL7JYGai5e8v4hpEU9Q= github.com/pebbe/zmq4 v1.2.11/go.mod h1:nqnPueOapVhE2wItZ0uOErngczsJdLOGkebMxaO8r48= +github.com/pelletier/go-toml v1.9.5 h1:4yBQzkHv+7BHq2PQUZF3Mx0IYxG7LsP222s7Agd3ve8= +github.com/pelletier/go-toml/v2 v2.2.3 h1:YmeHyLY8mFWbdkNWwpr+qIL2bEqT0o95WSdkNHvL12M= +github.com/pelletier/go-toml/v2 v2.2.3/go.mod h1:MfCQTFTvCcUyyvvwm1+G6H/jORL20Xlb6rzQu9GuUkc= github.com/pgvector/pgvector-go v0.2.2 h1:Q/oArmzgbEcio88q0tWQksv/u9Gnb1c3F1K2TnalxR0= github.com/pgvector/pgvector-go v0.2.2/go.mod h1:u5sg3z9bnqVEdpe1pkTij8/rFhTaMCMNyQagPDLK8gQ= github.com/phpdave11/gofpdf v1.4.2/go.mod h1:zpO6xFn9yxo3YLyMvW8HcKWVdbNqgIfOOp2dXMnm1mY= @@ -1666,8 +1669,8 @@ github.com/pierrec/lz4 v2.6.1+incompatible h1:9UY3+iC23yxF0UfGaYrGplQ+79Rg+h/q9F github.com/pierrec/lz4 v2.6.1+incompatible/go.mod h1:pdkljMzZIN41W+lC3N2tnIh5sFi+IEE17M5jbnwPHcY= github.com/pierrec/lz4/v4 v4.1.8/go.mod h1:gZWDp/Ze/IJXGXf23ltt2EXimqmTUXEy0GFuRQyBid4= github.com/pierrec/lz4/v4 v4.1.15/go.mod h1:gZWDp/Ze/IJXGXf23ltt2EXimqmTUXEy0GFuRQyBid4= -github.com/pierrec/lz4/v4 v4.1.21 h1:yOVMLb6qSIDP67pl/5F7RepeKYu/VmTyEXvuMI5d9mQ= -github.com/pierrec/lz4/v4 v4.1.21/go.mod h1:gZWDp/Ze/IJXGXf23ltt2EXimqmTUXEy0GFuRQyBid4= 
+github.com/pierrec/lz4/v4 v4.1.22 h1:cKFw6uJDK+/gfw5BcDL0JL5aBsAFdsIT18eRtLj7VIU= +github.com/pierrec/lz4/v4 v4.1.22/go.mod h1:gZWDp/Ze/IJXGXf23ltt2EXimqmTUXEy0GFuRQyBid4= github.com/pinecone-io/go-pinecone v1.0.0 h1:90euw+0EKSgdeE9q7iGSTVmdx9r9+x3mxWkrCCLab+o= github.com/pinecone-io/go-pinecone v1.0.0/go.mod h1:KfJhn4yThX293+fbtrZLnxe2PJYo8557Py062W4FYKk= github.com/pingcap/errors v0.11.0/go.mod h1:Oi8TUi2kEtXXLMJk9l1cGmz20kV3TaQ0usTwv5KuLY8= @@ -1724,8 +1727,8 @@ github.com/prometheus/procfs v0.0.8/go.mod h1:7Qr8sr6344vo1JqZ6HhLceV9o3AJ1Ff+Gx github.com/prometheus/procfs v0.7.3/go.mod h1:cz+aTbrPOrUb4q7XlbU9ygM+/jj0fzG6c1xBZuNvfVA= github.com/prometheus/procfs v0.15.1 h1:YagwOFzUgYfKKHX6Dr+sHT7km/hxC76UB0learggepc= github.com/prometheus/procfs v0.15.1/go.mod h1:fB45yRUv8NstnjriLhBQLuOUt+WW4BsoGhij/e3PBqk= -github.com/protocolbuffers/txtpbfmt v0.0.0-20230328191034-3462fbc510c0 h1:sadMIsgmHpEOGbUs6VtHBXRR1OHevnj7hLx9ZcdNGW4= -github.com/protocolbuffers/txtpbfmt v0.0.0-20230328191034-3462fbc510c0/go.mod h1:jgxiZysxFPM+iWKwQwPR+y+Jvo54ARd4EisXxKYpB5c= +github.com/protocolbuffers/txtpbfmt v0.0.0-20240823084532-8e6b51fa9bef h1:ej+64jiny5VETZTqcc1GFVAPEtaSk6U1D0kKC2MS5Yc= +github.com/protocolbuffers/txtpbfmt v0.0.0-20240823084532-8e6b51fa9bef/go.mod h1:jgxiZysxFPM+iWKwQwPR+y+Jvo54ARd4EisXxKYpB5c= github.com/pusher/pusher-http-go v4.0.1+incompatible h1:4u6tomPG1WhHaST7Wi9mw83Y+MS/j2EplR2YmDh8Xp4= github.com/pusher/pusher-http-go v4.0.1+incompatible/go.mod h1:XAv1fxRmVTI++2xsfofDhg7whapsLRG/gH/DXbF3a18= github.com/qdrant/go-client v1.11.1 h1:kla7n21wSEWWZLrvpttTOnCppDm6jluYDZEFe2kJ8zs= @@ -1742,8 +1745,8 @@ github.com/rcrowley/go-metrics v0.0.0-20201227073835-cf1acfcdf475 h1:N/ElC8H3+5X github.com/rcrowley/go-metrics v0.0.0-20201227073835-cf1acfcdf475/go.mod h1:bCqnVzQkZxMG4s8nGwiZ5l3QUCyqpo9Y+/ZMZ9VjZe4= github.com/redis/go-redis/v9 v9.7.0 h1:HhLSs+B6O021gwzl+locl0zEDnyNkxMtf/Z3NNBMa9E= github.com/redis/go-redis/v9 v9.7.0/go.mod h1:f6zhXITC7JUJIlPEiBOTXxJgPLdZcA93GewI7inzyWw= -github.com/redpanda-data/benthos/v4 v4.42.0 h1:3sKmHhdC1t/IH63oTzlYurfJaO0TsEWSEKeiE6FIvG8= -github.com/redpanda-data/benthos/v4 v4.42.0/go.mod h1:T5Nb0hH1Sa1ChlH4hLW7+nA1+jQ/3CP/cVFI73z6ZIM= +github.com/redpanda-data/benthos/v4 v4.43.0 h1:DO/LsiBmbb/fykI8RtPhmFz1TsGKUWDnr/ImOTSiuHs= +github.com/redpanda-data/benthos/v4 v4.43.0/go.mod h1:2dUKS2373CZQ5lzp1qm6XflDhlm1mbhMGXBVoEuZMFQ= github.com/redpanda-data/common-go/secrets v0.1.2 h1:UCDLN/yL8yjSIYhS5MB+2Am1Jy4XZMZPtuuCRL/82Rw= github.com/redpanda-data/common-go/secrets v0.1.2/go.mod h1:WjaDI39reE/GPRPHTsaYmiMjhHj+qsSJLe+kHsPKsXk= github.com/redpanda-data/connect/public/bundle/free/v4 v4.31.0 h1:Qiz4Q8ZO17n8797hgDdJ2f1XN7wh6J2hIRgeeSw4F24= @@ -1751,8 +1754,8 @@ github.com/redpanda-data/connect/public/bundle/free/v4 v4.31.0/go.mod h1:ISgO+/k github.com/remyoudompheng/bigfft v0.0.0-20200410134404-eec4a21b6bb0/go.mod h1:qqbHyh8v60DhA7CoWK5oRCqLrMHRGoxYCSS9EjAz6Eo= github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec h1:W09IVJc94icq4NjY3clb7Lk8O1qJ8BdBEF8z0ibU0rE= github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec/go.mod h1:qqbHyh8v60DhA7CoWK5oRCqLrMHRGoxYCSS9EjAz6Eo= -github.com/rickb777/period v1.0.6 h1:f4TcHBtL/4qa4D44eqgxs7785/kfLKUjRI7XYI2HCvk= -github.com/rickb777/period v1.0.6/go.mod h1:TKkPHI/WSyjjVdeVCyqwBoQg0Cdb/jRvnc8FFdq2cgw= +github.com/rickb777/period v1.0.7 h1:IERfp7bk5a1w8rZ6A98CLScQbRQX+p/O5x3HwytKBwI= +github.com/rickb777/period v1.0.7/go.mod h1:M13FB5SGZf4zJmF/zfLDqwfQ0XafHxgOsw6DAL0EFw0= github.com/rickb777/plural v1.4.2 
h1:Kl/syFGLFZ5EbuV8c9SVud8s5HI2HpCCtOMw2U1kS+A= github.com/rickb777/plural v1.4.2/go.mod h1:kdmXUpmKBJTS0FtG/TFumd//VBWsNTD7zOw7x4umxNw= github.com/rivo/uniseg v0.2.0/go.mod h1:J6wj4VEh+S6ZtnVlnTBMWIodfgj8LQOQFoIToxlJtxc= @@ -1764,8 +1767,8 @@ github.com/rogpeppe/fastuuid v1.2.0/go.mod h1:jVj6XXZzXRy/MSR5jhDC/2q6DgLz+nrA6L github.com/rogpeppe/go-internal v1.3.0/go.mod h1:M8bDsm7K2OlrFYOpmOWEs/qY81heoFRclV5y23lUDJ4= github.com/rogpeppe/go-internal v1.6.1/go.mod h1:xXDCJY+GAPziupqXw64V24skbSoqbTEfhy4qGm1nDQc= github.com/rogpeppe/go-internal v1.9.0/go.mod h1:WtVeX8xhTBvf0smdhujwtBcq4Qrzq/fJaraNFVN+nFs= -github.com/rogpeppe/go-internal v1.12.0 h1:exVL4IDcn6na9z1rAb56Vxr+CgyK3nn3O+epU5NdKM8= -github.com/rogpeppe/go-internal v1.12.0/go.mod h1:E+RYuTGaKKdloAfM02xzb0FW3Paa99yedzYV+kq4uf4= +github.com/rogpeppe/go-internal v1.13.1 h1:KvO1DLK/DRN07sQ1LQKScxyZJuNnedQ5/wKSR38lUII= +github.com/rogpeppe/go-internal v1.13.1/go.mod h1:uMEvuHeurkdAXX61udpOXGD/AzZDWNMNyH2VO9fmH0o= github.com/rs/xid v1.2.1/go.mod h1:+uKXf+4Djp6Md1KODXJxgGQPKngRmWyn10oCKFzNHOQ= github.com/rs/xid v1.5.0 h1:mKX4bl4iPYJtEIxp6CYiUuLQ/8DYMoz0PUdtGgMFRVc= github.com/rs/xid v1.5.0/go.mod h1:trrq9SKmegXys3aeAKXMUTdJsYXVwGY3RLcfgqegfbg= @@ -1843,8 +1846,8 @@ github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4= github.com/stretchr/testify v1.8.2/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4= github.com/stretchr/testify v1.8.3/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo= -github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg= -github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY= +github.com/stretchr/testify v1.10.0 h1:Xv5erBjTwe/5IxqUQTdXv5kgmIvbHo3QQyRwhJsOfJA= +github.com/stretchr/testify v1.10.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY= github.com/testcontainers/testcontainers-go v0.33.0 h1:zJS9PfXYT5O0ZFXM2xxXfk4J5UMw/kRiISng037Gxdw= github.com/testcontainers/testcontainers-go v0.33.0/go.mod h1:W80YpTa8D5C3Yy16icheD01UTDu+LmXIA2Keo+jWtT8= github.com/testcontainers/testcontainers-go/modules/ollama v0.32.0 h1:nuYlIE4zOGd8m+TzjY0v41kyfYre3inp/iw1p4qn2eU= @@ -1880,12 +1883,12 @@ github.com/trivago/grok v1.0.0/go.mod h1:9t59xLInhrncYq9a3J7488NgiBZi5y5yC7bss+w github.com/trivago/tgo v1.0.7 h1:uaWH/XIy9aWYWpjm2CU3RpcqZXmX2ysQ9/Go+d9gyrM= github.com/trivago/tgo v1.0.7/go.mod h1:w4dpD+3tzNIIiIfkWWa85w5/B77tlvdZckQ+6PkFnhc= github.com/tv42/httpunix v0.0.0-20150427012821-b75d8614f926/go.mod h1:9ESjWnEqriFuLhtthL60Sar/7RFoluCcXsuvEwTV5KM= -github.com/twmb/franz-go v1.17.1 h1:0LwPsbbJeJ9R91DPUHSEd4su82WJWcTY1Zzbgbg4CeQ= -github.com/twmb/franz-go v1.17.1/go.mod h1:NreRdJ2F7dziDY/m6VyspWd6sNxHKXdMZI42UfQ3GXM= +github.com/twmb/franz-go v1.18.0 h1:25FjMZfdozBywVX+5xrWC2W+W76i0xykKjTdEeD2ejw= +github.com/twmb/franz-go v1.18.0/go.mod h1:zXCGy74M0p5FbXsLeASdyvfLFsBvTubVqctIaa5wQ+I= github.com/twmb/franz-go/pkg/kadm v1.13.0 h1:bJq4C2ZikUE2jh/wl9MtMTQ/kpmnBgVFh8XMQBEC+60= github.com/twmb/franz-go/pkg/kadm v1.13.0/go.mod h1:VMvpfjz/szpH9WB+vGM+rteTzVv0djyHFimci9qm2C0= -github.com/twmb/franz-go/pkg/kmsg v1.8.0 h1:lAQB9Z3aMrIP9qF9288XcFf/ccaSxEitNA1CDTEIeTA= -github.com/twmb/franz-go/pkg/kmsg v1.8.0/go.mod h1:HzYEb8G3uu5XevZbtU0dVbkphaKTHk0X68N5ka4q6mU= +github.com/twmb/franz-go/pkg/kmsg v1.9.0 h1:JojYUph2TKAau6SBtErXpXGC7E3gg4vGZMv9xFU/B6M= +github.com/twmb/franz-go/pkg/kmsg v1.9.0/go.mod h1:CMbfazviCyY6HM0SXuG5t9vOwYDHRCSrJJyBAe5paqg= 
github.com/twmb/franz-go/pkg/sr v1.2.0 h1:zYr0Ly7KLFfeCGaSr8teN6LvAVeYVrZoUsyyPHTYB+M= github.com/twmb/franz-go/pkg/sr v1.2.0/go.mod h1:gpd2Xl5/prkj3gyugcL+rVzagjaxFqMgvKMYcUlrpDw= github.com/uptrace/bun v1.1.12 h1:sOjDVHxNTuM6dNGaba0wUuz7KvDE1BmNu9Gqs2gJSXQ= @@ -1894,8 +1897,8 @@ github.com/uptrace/bun/dialect/pgdialect v1.1.12 h1:m/CM1UfOkoBTglGO5CUTKnIKKOAp github.com/uptrace/bun/dialect/pgdialect v1.1.12/go.mod h1:Ij6WIxQILxLlL2frUBxUBOZJtLElD2QQNDcu/PWDHTc= github.com/uptrace/bun/driver/pgdriver v1.1.12 h1:3rRWB1GK0psTJrHwxzNfEij2MLibggiLdTqjTtfHc1w= github.com/uptrace/bun/driver/pgdriver v1.1.12/go.mod h1:ssYUP+qwSEgeDDS1xm2XBip9el1y9Mi5mTAvLoiADLM= -github.com/urfave/cli/v2 v2.27.4 h1:o1owoI+02Eb+K107p27wEX9Bb8eqIoZCfLXloLUSWJ8= -github.com/urfave/cli/v2 v2.27.4/go.mod h1:m4QzxcD2qpra4z7WhzEGn74WZLViBnMpb1ToCAKdGRQ= +github.com/urfave/cli/v2 v2.27.5 h1:WoHEJLdsXr6dDWoJgMq/CboDmyY/8HMMH1fTECbih+w= +github.com/urfave/cli/v2 v2.27.5/go.mod h1:3Sevf16NykTbInEnD0yKkjDAeZDS0A6bzhBH5hrMvTQ= github.com/vmihailenco/bufpool v0.1.11 h1:gOq2WmBrq0i2yW5QJ16ykccQ4wH9UyEsgLm6czKAd94= github.com/vmihailenco/bufpool v0.1.11/go.mod h1:AFf/MOy3l2CFTKbxwt0mp2MwnqjNEs5H/UxrkA5jxTQ= github.com/vmihailenco/msgpack/v5 v5.3.5/go.mod h1:7xyJ9e+0+9SaZT0Wt1RGleJXzli6Q/V5KbhBonMG9jc= @@ -1969,14 +1972,16 @@ go.opencensus.io v0.22.5/go.mod h1:5pWMHQbX5EPX2/62yrJeAkowc+lfs/XD7Uxpq3pI6kk= go.opencensus.io v0.23.0/go.mod h1:XItmlyltB5F7CS4xOC1DcqMoFqwtC6OG2xF7mCv7P7E= go.opencensus.io v0.24.0 h1:y73uSU6J157QMP2kn2r30vwW1A2W2WFwSCGnAVxeaD0= go.opencensus.io v0.24.0/go.mod h1:vNK8G9p7aAivkbmorf4v+7Hgx+Zs0yY+0fOtgBfjQKo= +go.opentelemetry.io/auto/sdk v1.1.0 h1:cH53jehLUN6UFLY71z+NDOiNJqDdPRaXzTel0sJySYA= +go.opentelemetry.io/auto/sdk v1.1.0/go.mod h1:3wSPjt5PWp2RhlCcmmOial7AvC4DQqZb7a7wCow3W8A= go.opentelemetry.io/contrib/detectors/gcp v1.29.0 h1:TiaiXB4DpGD3sdzNlYQxruQngn5Apwzi1X0DRhuGvDQ= go.opentelemetry.io/contrib/detectors/gcp v1.29.0/go.mod h1:GW2aWZNwR2ZxDLdv8OyC2G8zkRoQBuURgV7RPQgcPoU= go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.54.0 h1:r6I7RJCN86bpD/FQwedZ0vSixDpwuWREjW9oRMsmqDc= go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.54.0/go.mod h1:B9yO6b04uB80CzjedvewuqDhxJxi11s7/GtiGa8bAjI= go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.54.0 h1:TT4fX+nBOA/+LUkobKGW1ydGcn+G3vRw9+g5HwCphpk= go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.54.0/go.mod h1:L7UH0GbB0p47T4Rri3uHjbpCFYrVrwc1I25QhNPiGK8= -go.opentelemetry.io/otel v1.29.0 h1:PdomN/Al4q/lN6iBJEN3AwPvUiHPMlt93c8bqTG5Llw= -go.opentelemetry.io/otel v1.29.0/go.mod h1:N/WtXPs1CNCUEx+Agz5uouwCba+i+bJGFicT8SR4NP8= +go.opentelemetry.io/otel v1.33.0 h1:/FerN9bax5LoK51X/sI0SVYrjSE0/yUL7DpxW4K3FWw= +go.opentelemetry.io/otel v1.33.0/go.mod h1:SUUkR6csvUQl+yjReHu5uM3EtVV7MBm5FHKRlNx4I8I= go.opentelemetry.io/otel/exporters/jaeger v1.17.0 h1:D7UpUy2Xc2wsi1Ras6V40q806WM07rqoCWzXu7Sqy+4= go.opentelemetry.io/otel/exporters/jaeger v1.17.0/go.mod h1:nPCqOnEH9rNLKqH/+rrUjiMzHJdV1BlpKcTwRTyKkKI= go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.28.0 h1:3Q/xZUyC1BBkualc9ROb4G8qkH90LXEIICcs5zv1OYY= @@ -1985,14 +1990,14 @@ go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.28.0 h1:R3X6Z go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.28.0/go.mod h1:QWFXnDavXWwMx2EEcZsf3yxgEKAqsxQ+Syjp+seyInw= go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.28.0 h1:j9+03ymgYhPKmeXGk5Zu+cIZOlVzd9Zv7QIiyItjFBU= 
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.28.0/go.mod h1:Y5+XiUG4Emn1hTfciPzGPJaSI+RpDts6BnCIir0SLqk= -go.opentelemetry.io/otel/metric v1.29.0 h1:vPf/HFWTNkPu1aYeIsc98l4ktOQaL6LeSoeV2g+8YLc= -go.opentelemetry.io/otel/metric v1.29.0/go.mod h1:auu/QWieFVWx+DmQOUMgj0F8LHWdgalxXqvp7BII/W8= +go.opentelemetry.io/otel/metric v1.33.0 h1:r+JOocAyeRVXD8lZpjdQjzMadVZp2M4WmQ+5WtEnklQ= +go.opentelemetry.io/otel/metric v1.33.0/go.mod h1:L9+Fyctbp6HFTddIxClbQkjtubW6O9QS3Ann/M82u6M= go.opentelemetry.io/otel/sdk v1.29.0 h1:vkqKjk7gwhS8VaWb0POZKmIEDimRCMsopNYnriHyryo= go.opentelemetry.io/otel/sdk v1.29.0/go.mod h1:pM8Dx5WKnvxLCb+8lG1PRNIDxu9g9b9g59Qr7hfAAok= go.opentelemetry.io/otel/sdk/metric v1.29.0 h1:K2CfmJohnRgvZ9UAj2/FhIf/okdWcNdBwe1m8xFXiSY= go.opentelemetry.io/otel/sdk/metric v1.29.0/go.mod h1:6zZLdCl2fkauYoZIOn/soQIDSWFmNSRcICarHfuhNJQ= -go.opentelemetry.io/otel/trace v1.29.0 h1:J/8ZNK4XgR7a21DZUAsbF8pZ5Jcw1VhACmnYt39JTi4= -go.opentelemetry.io/otel/trace v1.29.0/go.mod h1:eHl3w0sp3paPkYstJOmAimxhiFXPg+MMTlEh3nsQgWQ= +go.opentelemetry.io/otel/trace v1.33.0 h1:cCJuF7LRjUFso9LPnEAHJDB2pqzp+hbO8eu1qqW2d/s= +go.opentelemetry.io/otel/trace v1.33.0/go.mod h1:uIcdVUZMpTAmz0tI1z04GoVSezK37CbGV4fr1f2nBck= go.opentelemetry.io/proto/otlp v0.7.0/go.mod h1:PqfVotwruBrMGOCsRd/89rSnXhoiJIqeYNgFYFoEGnI= go.opentelemetry.io/proto/otlp v0.15.0/go.mod h1:H7XAot3MsfNsj7EXtrA2q5xSNQ10UqI405h3+duxN4U= go.opentelemetry.io/proto/otlp v0.19.0/go.mod h1:H7XAot3MsfNsj7EXtrA2q5xSNQ10UqI405h3+duxN4U= diff --git a/internal/asyncroutine/periodic.go b/internal/asyncroutine/periodic.go index 53a922e5e0..1d278d85ee 100644 --- a/internal/asyncroutine/periodic.go +++ b/internal/asyncroutine/periodic.go @@ -54,7 +54,7 @@ func NewPeriodicWithContext(duration time.Duration, work func(context.Context)) // Start starts the `Periodic` work. // -// It does not do work immedately, only after the time has passed. +// It does not do work immediately, only after the time has passed. 
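For context, a minimal standalone sketch of the timing behaviour this comment describes, using `time.Ticker` rather than the internal `asyncroutine.Periodic` implementation: the work function is never invoked at start, only after each full interval has elapsed.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// startPeriodic mirrors the documented semantics: nothing runs at t=0;
// the first invocation of work happens only after d has passed.
func startPeriodic(ctx context.Context, d time.Duration, work func(context.Context)) {
	go func() {
		ticker := time.NewTicker(d) // first tick fires after d, not immediately
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				work(ctx)
			}
		}
	}()
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 350*time.Millisecond)
	defer cancel()

	startPeriodic(ctx, 100*time.Millisecond, func(context.Context) {
		fmt.Println("tick") // prints roughly three times; nothing is printed at startup
	})
	<-ctx.Done()
}
```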
func (p *Periodic) Start() { if p.cancel != nil { return diff --git a/internal/impl/kafka/enterprise/integration_test.go b/internal/impl/kafka/enterprise/integration_test.go index f5a077f533..96976fa28d 100644 --- a/internal/impl/kafka/enterprise/integration_test.go +++ b/internal/impl/kafka/enterprise/integration_test.go @@ -38,6 +38,7 @@ import ( "github.com/redpanda-data/connect/v4/internal/impl/kafka/enterprise" "github.com/redpanda-data/connect/v4/internal/license" "github.com/redpanda-data/connect/v4/internal/protoconnect" + _ "github.com/redpanda-data/connect/v4/public/components/confluent" ) func createKafkaTopic(ctx context.Context, address, id string, partitions int32) error { @@ -97,57 +98,24 @@ func TestKafkaEnterpriseIntegration(t *testing.T) { pool, err := dockertest.NewPool("") require.NoError(t, err) - - kafkaPort, err := integration.GetFreePort() - require.NoError(t, err) - - kafkaPortStr := strconv.Itoa(kafkaPort) - options := &dockertest.RunOptions{ - Repository: "redpandadata/redpanda", - Tag: "latest", - Hostname: "redpanda", - ExposedPorts: []string{"9092/tcp"}, - PortBindings: map[docker.Port][]docker.PortBinding{ - "9092/tcp": {{HostIP: "", HostPort: kafkaPortStr + "/tcp"}}, - }, - Cmd: []string{ - "redpanda", - "start", - "--node-id 0", - "--mode dev-container", - "--set rpk.additional_start_flags=[--reactor-backend=epoll]", - "--kafka-addr 0.0.0.0:9092", - fmt.Sprintf("--advertise-kafka-addr localhost:%v", kafkaPort), - }, - } - - brokerAddr := "localhost:" + kafkaPortStr - pool.MaxWait = time.Minute - resource, err := pool.RunWithOptions(options) + + container, err := startRedpanda(t, pool, true, true) require.NoError(t, err) - t.Cleanup(func() { - assert.NoError(t, pool.Purge(resource)) - }) ctx, done := context.WithTimeout(context.Background(), time.Minute*3) defer done() - _ = resource.Expire(900) - require.NoError(t, pool.Retry(func() error { - return createKafkaTopic(ctx, brokerAddr, "testingconnection", 1) - })) - t.Run("test_logs_happy", func(t *testing.T) { - testLogsHappy(ctx, t, brokerAddr) + testLogsHappy(ctx, t, container.brokerAddr) }) t.Run("test_status_happy", func(t *testing.T) { - testStatusHappy(ctx, t, brokerAddr) + testStatusHappy(ctx, t, container.brokerAddr) }) t.Run("test_logs_close_flush", func(t *testing.T) { - testLogsCloseFlush(ctx, t, brokerAddr) + testLogsCloseFlush(ctx, t, container.brokerAddr) }) } @@ -293,74 +261,20 @@ max_message_bytes: 1MB assert.Equal(t, "buz", string(outRecords[1].Key)) } -func startSchemaRegistry(t *testing.T, pool *dockertest.Pool) int { - // TODO: Generalise this helper for the other Kafka tests here which use Redpanda... 
+func createSchema(t *testing.T, url string, subject string, schema string, references []franz_sr.SchemaReference) { t.Helper() - options := &dockertest.RunOptions{ - Repository: "redpandadata/redpanda", - Tag: "latest", - Hostname: "redpanda", - ExposedPorts: []string{"8081/tcp"}, - Cmd: []string{ - "redpanda", - "start", - "--node-id 0", - "--mode dev-container", - "--set rpk.additional_start_flags=[--reactor-backend=epoll]", - "--schema-registry-addr 0.0.0.0:8081", - }, - } - - resource, err := pool.RunWithOptions(options) - require.NoError(t, err) - t.Cleanup(func() { - assert.NoError(t, pool.Purge(resource)) - }) - - port, err := strconv.Atoi(resource.GetPort("8081/tcp")) - require.NoError(t, err) - - _ = resource.Expire(900) - require.NoError(t, pool.Retry(func() error { - ctx, done := context.WithTimeout(context.Background(), 3*time.Second) - defer done() - - req, err := http.NewRequestWithContext(ctx, http.MethodGet, fmt.Sprintf("http://localhost:%d/subjects", port), nil) - if err != nil { - return err - } - - resp, err := http.DefaultClient.Do(req) - if err != nil { - return err - } - defer resp.Body.Close() - - if resp.StatusCode != http.StatusOK { - return errors.New("invalid status") - } - - return nil - })) - - return port -} - -func createSchema(t *testing.T, port int, subject string, schema string, references []franz_sr.SchemaReference) { - t.Helper() - - client, err := franz_sr.NewClient(franz_sr.URLs(fmt.Sprintf("http://localhost:%d", port))) + client, err := franz_sr.NewClient(franz_sr.URLs(url)) require.NoError(t, err) _, err = client.CreateSchema(context.Background(), subject, franz_sr.Schema{Schema: schema, References: references}) require.NoError(t, err) } -func deleteSubject(t *testing.T, port int, subject string, hardDelete bool) { +func deleteSubject(t *testing.T, url string, subject string, hardDelete bool) { t.Helper() - client, err := franz_sr.NewClient(franz_sr.URLs(fmt.Sprintf("http://localhost:%d", port))) + client, err := franz_sr.NewClient(franz_sr.URLs(url)) require.NoError(t, err) deleteMode := franz_sr.SoftDelete @@ -412,8 +326,10 @@ func TestSchemaRegistryIntegration(t *testing.T) { }, } - sourcePort := startSchemaRegistry(t, pool) - destinationPort := startSchemaRegistry(t, pool) + source, err := startRedpanda(t, pool, false, true) + require.NoError(t, err) + destination, err := startRedpanda(t, pool, false, true) + require.NoError(t, err) for _, test := range tests { t.Run(test.name, func(t *testing.T) { @@ -424,55 +340,55 @@ func TestSchemaRegistryIntegration(t *testing.T) { t.Cleanup(func() { // Clean up the extraSubject first since it may contain schemas with references. 
if test.extraSubject != "" { - deleteSubject(t, sourcePort, test.extraSubject, false) - deleteSubject(t, sourcePort, test.extraSubject, true) + deleteSubject(t, source.schemaRegistryURL, test.extraSubject, false) + deleteSubject(t, source.schemaRegistryURL, test.extraSubject, true) if test.subjectFilter == "" { - deleteSubject(t, destinationPort, test.extraSubject, false) - deleteSubject(t, destinationPort, test.extraSubject, true) + deleteSubject(t, destination.schemaRegistryURL, test.extraSubject, false) + deleteSubject(t, destination.schemaRegistryURL, test.extraSubject, true) } } if !test.includeSoftDeletedSubjects { - deleteSubject(t, sourcePort, subject, false) + deleteSubject(t, source.schemaRegistryURL, subject, false) } - deleteSubject(t, sourcePort, subject, true) + deleteSubject(t, source.schemaRegistryURL, subject, true) - deleteSubject(t, destinationPort, subject, false) - deleteSubject(t, destinationPort, subject, true) + deleteSubject(t, destination.schemaRegistryURL, subject, false) + deleteSubject(t, destination.schemaRegistryURL, subject, true) }) - createSchema(t, sourcePort, subject, test.schema, nil) + createSchema(t, source.schemaRegistryURL, subject, test.schema, nil) if test.subjectFilter != "" { - createSchema(t, sourcePort, test.extraSubject, test.schema, nil) + createSchema(t, source.schemaRegistryURL, test.extraSubject, test.schema, nil) } if test.includeSoftDeletedSubjects { - deleteSubject(t, sourcePort, subject, false) + deleteSubject(t, source.schemaRegistryURL, subject, false) } if test.schemaWithReference != "" { - createSchema(t, sourcePort, test.extraSubject, test.schemaWithReference, []franz_sr.SchemaReference{{Name: "foo", Subject: subject, Version: 1}}) + createSchema(t, source.schemaRegistryURL, test.extraSubject, test.schemaWithReference, []franz_sr.SchemaReference{{Name: "foo", Subject: subject, Version: 1}}) } streamBuilder := service.NewStreamBuilder() require.NoError(t, streamBuilder.SetYAML(fmt.Sprintf(` input: schema_registry: - url: http://localhost:%d + url: %s include_deleted: %t subject_filter: %s fetch_in_order: %t output: fallback: - schema_registry: - url: http://localhost:%d + url: %s subject: ${! @schema_registry_subject } # Preserve schema order. 
max_in_flight: 1 # Don't retry the same message multiple times so we do fail if schemas with references are sent in the wrong order - drop: {} -`, sourcePort, test.includeSoftDeletedSubjects, test.subjectFilter, test.schemaWithReference != "", destinationPort))) +`, source.schemaRegistryURL, test.includeSoftDeletedSubjects, test.subjectFilter, test.schemaWithReference != "", destination.schemaRegistryURL))) require.NoError(t, streamBuilder.SetLoggerYAML(`level: OFF`)) stream, err := streamBuilder.Build() @@ -486,7 +402,7 @@ output: err = stream.Run(ctx) require.NoError(t, err) - resp, err := http.DefaultClient.Get(fmt.Sprintf("http://localhost:%d/subjects", destinationPort)) + resp, err := http.DefaultClient.Get(fmt.Sprintf("%s/subjects", destination.schemaRegistryURL)) require.NoError(t, err) body, err := io.ReadAll(resp.Body) require.NoError(t, err) @@ -497,7 +413,7 @@ output: assert.NotContains(t, string(body), test.extraSubject) } - resp, err = http.DefaultClient.Get(fmt.Sprintf("http://localhost:%d/subjects/%s/versions/1", destinationPort, subject)) + resp, err = http.DefaultClient.Get(fmt.Sprintf("%s/subjects/%s/versions/1", destination.schemaRegistryURL, subject)) require.NoError(t, err) body, err = io.ReadAll(resp.Body) require.NoError(t, err) @@ -511,7 +427,7 @@ output: assert.JSONEq(t, test.schema, sd.Schema.Schema) if test.schemaWithReference != "" { - resp, err = http.DefaultClient.Get(fmt.Sprintf("http://localhost:%d/subjects/%s/versions/1", destinationPort, test.extraSubject)) + resp, err = http.DefaultClient.Get(fmt.Sprintf("%s/subjects/%s/versions/1", destination.schemaRegistryURL, test.extraSubject)) require.NoError(t, err) body, err = io.ReadAll(resp.Body) require.NoError(t, err) @@ -534,22 +450,25 @@ func TestSchemaRegistryIDTranslationIntegration(t *testing.T) { pool, err := dockertest.NewPool("") require.NoError(t, err) + pool.MaxWait = time.Minute - sourcePort := startSchemaRegistry(t, pool) - destinationPort := startSchemaRegistry(t, pool) + source, err := startRedpanda(t, pool, false, true) + require.NoError(t, err) + destination, err := startRedpanda(t, pool, false, true) + require.NoError(t, err) // Create two schemas under subject `foo`. - createSchema(t, sourcePort, "foo", `{"name":"foo", "type": "record", "fields":[{"name":"str", "type": "string"}]}`, nil) - createSchema(t, sourcePort, "foo", `{"name":"foo", "type": "record", "fields":[{"name":"str", "type": "string"}, {"name":"num", "type": "int", "default": 42}]}`, nil) + createSchema(t, source.schemaRegistryURL, "foo", `{"name":"foo", "type": "record", "fields":[{"name":"str", "type": "string"}]}`, nil) + createSchema(t, source.schemaRegistryURL, "foo", `{"name":"foo", "type": "record", "fields":[{"name":"str", "type": "string"}, {"name":"num", "type": "int", "default": 42}]}`, nil) // Create a schema under subject `bar` which references the second schema under `foo`. - createSchema(t, sourcePort, "bar", `{"name":"bar", "type": "record", "fields":[{"name":"data", "type": "foo"}]}`, + createSchema(t, source.schemaRegistryURL, "bar", `{"name":"bar", "type": "record", "fields":[{"name":"data", "type": "foo"}]}`, []franz_sr.SchemaReference{{Name: "foo", Subject: "foo", Version: 2}}, ) // Create a schema at the destination which will have ID 1 so we can check that the ID translation works // correctly. 
- createSchema(t, destinationPort, "baz", `{"name":"baz", "type": "record", "fields":[{"name":"num", "type": "int"}]}`, nil) + createSchema(t, destination.schemaRegistryURL, "baz", `{"name":"baz", "type": "record", "fields":[{"name":"num", "type": "int"}]}`, nil) // Use a Stream with a mapping filter to send only the schema with the reference to the destination in order // to force the output to backfill the rest of the schemas. @@ -557,20 +476,20 @@ func TestSchemaRegistryIDTranslationIntegration(t *testing.T) { require.NoError(t, streamBuilder.SetYAML(fmt.Sprintf(` input: schema_registry: - url: http://localhost:%d + url: %s processors: - mapping: | if this.id != 3 { root = deleted() } output: fallback: - schema_registry: - url: http://localhost:%d + url: %s subject: ${! @schema_registry_subject } # Preserve schema order. max_in_flight: 1 # Don't retry the same message multiple times so we do fail if schemas with references are sent in the wrong order - drop: {} -`, sourcePort, destinationPort))) +`, source.schemaRegistryURL, destination.schemaRegistryURL))) require.NoError(t, streamBuilder.SetLoggerYAML(`level: OFF`)) stream, err := streamBuilder.Build() @@ -611,7 +530,7 @@ output: for _, test := range tests { t.Run("", func(t *testing.T) { - resp, err := http.DefaultClient.Get(fmt.Sprintf("http://localhost:%d/subjects/%s/versions/%d", destinationPort, test.subject, test.version)) + resp, err := http.DefaultClient.Get(fmt.Sprintf("%s/subjects/%s/versions/%d", destination.schemaRegistryURL, test.subject, test.version)) require.NoError(t, err) body, err := io.ReadAll(resp.Body) require.NoError(t, err) @@ -626,3 +545,370 @@ output: }) } } + +type redpandaEndpoints struct { + brokerAddr string + schemaRegistryURL string +} + +// TODO: Generalise this helper for the other Kafka tests here which use Redpanda. +func startRedpanda(t *testing.T, pool *dockertest.Pool, exposeBroker bool, autocreateTopics bool) (redpandaEndpoints, error) { + t.Helper() + + cmd := []string{ + "redpanda", + "start", + "--node-id 0", + "--mode dev-container", + "--set rpk.additional_start_flags=[--reactor-backend=epoll]", + "--schema-registry-addr 0.0.0.0:8081", + } + + if !autocreateTopics { + cmd = append(cmd, "--set redpanda.auto_create_topics_enabled=false") + } + + // Expose Schema Registry and Admin API by default. The Admin API is required for health checks. + exposedPorts := []string{"8081/tcp", "9644/tcp"} + var portBindings map[docker.Port][]docker.PortBinding + var kafkaPort string + if exposeBroker { + brokerPort, err := integration.GetFreePort() + if err != nil { + return redpandaEndpoints{}, fmt.Errorf("failed to start container: %s", err) + } + + // Note: Schema Registry uses `--advertise-kafka-addr` to talk to the broker, so we need to use the same port for `--kafka-addr`. + // TODO: Ensure we don't stomp over some ports which are already in use inside the container. 
+ cmd = append(cmd, fmt.Sprintf("--kafka-addr 0.0.0.0:%d", brokerPort), fmt.Sprintf("--advertise-kafka-addr localhost:%d", brokerPort)) + + kafkaPort = fmt.Sprintf("%d/tcp", brokerPort) + exposedPorts = append(exposedPorts, kafkaPort) + portBindings = map[docker.Port][]docker.PortBinding{docker.Port(kafkaPort): {{HostPort: kafkaPort}}} + } + + options := &dockertest.RunOptions{ + Repository: "redpandadata/redpanda", + Tag: "latest", + Hostname: "redpanda", + Cmd: cmd, + ExposedPorts: exposedPorts, + PortBindings: portBindings, + } + + resource, err := pool.RunWithOptions(options) + if err != nil { + return redpandaEndpoints{}, fmt.Errorf("failed to start container: %s", err) + } + + if err := resource.Expire(900); err != nil { + return redpandaEndpoints{}, fmt.Errorf("failed to set container expiry period: %s", err) + } + + t.Cleanup(func() { + assert.NoError(t, pool.Purge(resource)) + }) + + require.NoError(t, pool.Retry(func() error { + ctx, done := context.WithTimeout(context.Background(), 3*time.Second) + defer done() + + req, err := http.NewRequestWithContext(ctx, http.MethodGet, fmt.Sprintf("http://localhost:%s/v1/cluster/health_overview", resource.GetPort("9644/tcp")), nil) + if err != nil { + return fmt.Errorf("failed to create request: %s", err) + } + + resp, err := http.DefaultClient.Do(req) + if err != nil { + return fmt.Errorf("failed to execute request: %s", err) + } + defer resp.Body.Close() + + if resp.StatusCode != http.StatusOK { + return errors.New("invalid status") + } + + body, err := io.ReadAll(resp.Body) + if err != nil { + return fmt.Errorf("failed to read response body: %s", err) + } + + var res struct { + IsHealthy bool `json:"is_healthy"` + } + + if err := json.Unmarshal(body, &res); err != nil { + return fmt.Errorf("failed to unmarshal response body: %s", err) + } + + if !res.IsHealthy { + return errors.New("unhealthy") + } + + return nil + })) + + return redpandaEndpoints{ + brokerAddr: fmt.Sprintf("localhost:%s", resource.GetPort(kafkaPort)), + schemaRegistryURL: fmt.Sprintf("http://localhost:%s", resource.GetPort("8081/tcp")), + }, nil +} + +func produceMessage(t *testing.T, rpe redpandaEndpoints, topic, message string) { + streamBuilder := service.NewStreamBuilder() + require.NoError(t, streamBuilder.SetYAML(fmt.Sprintf(` +pipeline: + processors: + - schema_registry_encode: + url: %s + subject: %s + avro_raw_json: true +output: + kafka_franz: + seed_brokers: [ %s ] + topic: %s +`, rpe.schemaRegistryURL, topic, rpe.brokerAddr, topic))) + require.NoError(t, streamBuilder.SetLoggerYAML(`level: OFF`)) + + inFunc, err := streamBuilder.AddProducerFunc() + require.NoError(t, err) + + stream, err := streamBuilder.Build() + require.NoError(t, err) + + license.InjectTestService(stream.Resources()) + + ctx, done := context.WithTimeout(context.Background(), 5*time.Second) + t.Cleanup(done) + + go func() { + require.NoError(t, inFunc(ctx, service.NewMessage([]byte(message)))) + + require.NoError(t, stream.StopWithin(3*time.Second)) + }() + + err = stream.Run(ctx) + require.NoError(t, err) +} + +func readMessageWithCG(t *testing.T, rpe redpandaEndpoints, topic, consumerGroup, message string) { + streamBuilder := service.NewStreamBuilder() + require.NoError(t, streamBuilder.SetYAML(fmt.Sprintf(` +input: + kafka_franz: + seed_brokers: [ %s ] + topics: [ %s ] + consumer_group: %s + start_from_oldest: true + processors: + - schema_registry_decode: + url: %s + avro_raw_json: true +output: + # Need to use drop explicitly with SetYAML(). 
Otherwise, the output will be inproc + # (or stdout if we import github.com/redpanda-data/benthos/v4/public/components/io) + drop: {} +`, rpe.brokerAddr, topic, consumerGroup, rpe.schemaRegistryURL))) + require.NoError(t, streamBuilder.SetLoggerYAML(`level: OFF`)) + + recvChan := make(chan struct{}) + err := streamBuilder.AddConsumerFunc(func(ctx context.Context, m *service.Message) error { + b, err := m.AsBytes() + require.NoError(t, err) + + assert.Equal(t, message, string(b)) + + close(recvChan) + return nil + }) + require.NoError(t, err) + + stream, err := streamBuilder.Build() + require.NoError(t, err) + + license.InjectTestService(stream.Resources()) + + ctx, done := context.WithTimeout(context.Background(), 5*time.Second) + t.Cleanup(done) + + go func() { + require.NoError(t, stream.Run(ctx)) + }() + + <-recvChan + require.NoError(t, stream.StopWithin(3*time.Second)) +} + +func runMigratorBundle(t *testing.T, source, destination redpandaEndpoints, topic string, callback func(*service.Message)) { + streamBuilder := service.NewStreamBuilder() + require.NoError(t, streamBuilder.SetYAML(fmt.Sprintf(` +input: + redpanda_migrator_bundle: + redpanda_migrator: + seed_brokers: [ %s ] + topics: [ %s ] + consumer_group: migrator_cg + start_from_oldest: true + replication_factor_override: true + replication_factor: -1 + schema_registry: + url: %s + processors: + - switch: + - check: '@input_label == "redpanda_migrator_offsets_input"' + processors: + - log: + message: Migrating Kafka offset + fields: + kafka_offset_topic: ${! @kafka_offset_topic } + kafka_offset_group: ${! @kafka_offset_group } + kafka_offset_partition: ${! @kafka_offset_partition } + kafka_offset_commit_timestamp: ${! @kafka_offset_commit_timestamp } + kafka_offset_metadata: ${! @kafka_offset_metadata } + kafka_offset: ${! @kafka_offset } # This is just the offset of the __consumer_offsets topic + - check: '@input_label == "redpanda_migrator_input"' + processors: + - branch: + processors: + - schema_registry_decode: + url: %s + avro_raw_json: true + - log: + message: 'Migrating Kafka message: ${! content() }' + - check: '@input_label == "schema_registry_input"' + processors: + - branch: + processors: + - log: + message: 'Migrating Schema Registry schema: ${! 
content() }' + +output: + redpanda_migrator_bundle: + redpanda_migrator: + seed_brokers: [ %s ] + replication_factor_override: true + replication_factor: -1 + schema_registry: + url: %s +`, source.brokerAddr, topic, source.schemaRegistryURL, source.schemaRegistryURL, destination.brokerAddr, destination.schemaRegistryURL))) + require.NoError(t, streamBuilder.SetLoggerYAML(`level: INFO`)) + + require.NoError(t, streamBuilder.AddConsumerFunc(func(_ context.Context, m *service.Message) error { + callback(m) + return nil + })) + + // Ensure the callback function is called after the output wrote the message + streamBuilder.SetOutputBrokerPattern(service.OutputBrokerPatternFanOutSequential) + + stream, err := streamBuilder.Build() + require.NoError(t, err) + + license.InjectTestService(stream.Resources()) + + // Run stream in the background and shut it down when the test is finished + closeChan := make(chan struct{}) + go func() { + err = stream.Run(context.Background()) + require.NoError(t, err) + + t.Log("Migrator shut down") + + close(closeChan) + }() + t.Cleanup(func() { + require.NoError(t, stream.StopWithin(3*time.Second)) + + <-closeChan + }) +} + +func TestRedpandaMigratorIntegration(t *testing.T) { + integration.CheckSkip(t) + t.Parallel() + + pool, err := dockertest.NewPool("") + require.NoError(t, err) + pool.MaxWait = time.Minute + + source, err := startRedpanda(t, pool, true, true) + require.NoError(t, err) + destination, err := startRedpanda(t, pool, true, false) + require.NoError(t, err) + + t.Logf("Source broker: %s", source.brokerAddr) + t.Logf("Destination broker: %s", destination.brokerAddr) + + dummyTopic := "test" + + // Create a schema associated with the test topic + createSchema(t, source.schemaRegistryURL, dummyTopic, fmt.Sprintf(`{"name":"%s", "type": "record", "fields":[{"name":"test", "type": "string"}]}`, dummyTopic), nil) + + // Produce one message + dummyMessage := `{"test":"foo"}` + produceMessage(t, source, dummyTopic, dummyMessage) + t.Log("Finished producing first message in source") + + // Run the Redpanda Migrator bundle + msgChan := make(chan *service.Message) + checkMigrated := func(label string, validate func(string, map[string]string)) { + loop: + for { + select { + case m := <-msgChan: + l, ok := m.MetaGet("input_label") + require.True(t, ok) + if l != label { + goto loop + } + + b, err := m.AsBytes() + require.NoError(t, err) + + meta := map[string]string{} + require.NoError(t, m.MetaWalk(func(k, v string) error { + meta[k] = v + return nil + })) + + validate(string(b), meta) + case <-time.After(5 * time.Second): + t.Error("timed out waiting for migrator transfer") + } + + break loop + } + + } + runMigratorBundle(t, source, destination, dummyTopic, func(m *service.Message) { + msgChan <- m + }) + + checkMigrated("redpanda_migrator_input", func(msg string, _ map[string]string) { + assert.Equal(t, "\x00\x00\x00\x00\x01\x06foo", msg) + }) + t.Log("Migrator started") + + dummyCG := "foobar_cg" + // Read the message from source using a consumer group + readMessageWithCG(t, source, dummyTopic, dummyCG, dummyMessage) + checkMigrated("redpanda_migrator_offsets_input", func(_ string, meta map[string]string) { + assert.Equal(t, dummyTopic, meta["kafka_offset_topic"]) + }) + t.Logf("Finished reading first message from source with consumer group %q", dummyCG) + + // Produce one more message in the source + secondDummyMessage := `{"test":"bar"}` + produceMessage(t, source, dummyTopic, secondDummyMessage) + checkMigrated("redpanda_migrator_input", func(msg string, _ 
map[string]string) { + assert.Equal(t, "\x00\x00\x00\x00\x01\x06bar", msg) + }) + t.Log("Finished producing second message in source") + + // Read the new message from the destination using a consumer group + readMessageWithCG(t, destination, dummyTopic, dummyCG, secondDummyMessage) + checkMigrated("redpanda_migrator_offsets_input", func(_ string, meta map[string]string) { + assert.Equal(t, dummyTopic, meta["kafka_offset_topic"]) + }) + t.Logf("Finished reading second message from destination with consumer group %q", dummyCG) +} diff --git a/internal/impl/kafka/enterprise/redpanda_common_input.go b/internal/impl/kafka/enterprise/redpanda_common_input.go index c9e852d9e5..80ff46bf8e 100644 --- a/internal/impl/kafka/enterprise/redpanda_common_input.go +++ b/internal/impl/kafka/enterprise/redpanda_common_input.go @@ -65,6 +65,10 @@ output: Records are processed and delivered from each partition in batches as received from brokers. These batch sizes are therefore dynamically sized in order to optimise throughput, but can be tuned with the config fields ` + "`fetch_max_partition_bytes` and `fetch_max_bytes`" + `. Batches can be further broken down using the ` + "xref:components:processors/split.adoc[`split`] processor" + `. +== Metrics + +Emits a ` + "`redpanda_lag`" + ` metric with ` + "`topic`" + ` and ` + "`partition`" + ` labels for each consumed topic. + == Metadata This input adds the following metadata fields to each message: @@ -74,6 +78,7 @@ This input adds the following metadata fields to each message: - kafka_topic - kafka_partition - kafka_offset +- kafka_lag - kafka_timestamp_ms - kafka_timestamp_unix - kafka_tombstone_message diff --git a/internal/impl/kafka/enterprise/redpanda_common_output.go b/internal/impl/kafka/enterprise/redpanda_common_output.go index 8b3fdcc109..308e74a826 100644 --- a/internal/impl/kafka/enterprise/redpanda_common_output.go +++ b/internal/impl/kafka/enterprise/redpanda_common_output.go @@ -79,9 +79,15 @@ func init() { if batchPolicy, err = conf.FieldBatchPolicy(roFieldBatching); err != nil { return } - output, err = kafka.NewFranzWriterFromConfig(conf, func(fn kafka.FranzSharedClientUseFn) error { - return kafka.FranzSharedClientUse(sharedGlobalRedpandaClientKey, mgr, fn) - }, func(context.Context) error { return nil }) + output, err = kafka.NewFranzWriterFromConfig( + conf, + kafka.NewFranzWriterHooks( + func(_ context.Context, fn kafka.FranzSharedClientUseFn) error { + return kafka.FranzSharedClientUse(sharedGlobalRedpandaClientKey, mgr, fn) + }). 
+ WithYieldClientFn( + func(context.Context) error { return nil }), + ) return }) if err != nil { diff --git a/internal/impl/kafka/enterprise/redpanda_migrator_bundle_input.tmpl.yaml b/internal/impl/kafka/enterprise/redpanda_migrator_bundle_input.tmpl.yaml index b21046833d..b06559b224 100644 --- a/internal/impl/kafka/enterprise/redpanda_migrator_bundle_input.tmpl.yaml +++ b/internal/impl/kafka/enterprise/redpanda_migrator_bundle_input.tmpl.yaml @@ -38,7 +38,11 @@ fields: mapping: | #!blobl - let redpandaMigratorOffsets = this.redpanda_migrator.with("seed_brokers", "consumer_group", "client_id", "rack_id", "tls", "sasl").assign({"topics": ["__consumer_offsets"]}) + let labelPrefix = @label.not_empty().or("redpanda_migrator_bundle") + + let redpandaMigrator = this.redpanda_migrator.assign({"output_resource": "%s_redpanda_migrator_output".format($labelPrefix)}) + + let redpandaMigratorOffsets = this.redpanda_migrator.with("seed_brokers", "topics", "regexp_topics", "consumer_group", "topic_lag_refresh_period", "client_id", "rack_id", "tls", "sasl") root = if this.redpanda_migrator.length() == 0 { throw("the redpanda_migrator input must be configured") @@ -48,9 +52,10 @@ mapping: | inputs: - sequence: inputs: - - schema_registry: %s + - label: %s_schema_registry_input + schema_registry: %s processors: - - mapping: meta input_label = "schema_registry" + - mapping: meta input_label = "schema_registry_input" - generate: count: 1 mapping: root = "" @@ -60,22 +65,25 @@ mapping: | - mapping: root = deleted() - broker: inputs: - - redpanda_migrator: %s + - label: %s_redpanda_migrator_input + redpanda_migrator: %s processors: - - mapping: meta input_label = "redpanda_migrator" - - kafka_franz: %s + - mapping: meta input_label = "redpanda_migrator_input" + - label: %s_redpanda_migrator_offsets_input + redpanda_migrator_offsets: %s processors: - - mapping: meta input_label = "redpanda_migrator_offsets" - """.format(this.schema_registry.string(), this.redpanda_migrator.string(), $redpandaMigratorOffsets.string()).parse_yaml() + - mapping: meta input_label = "redpanda_migrator_offsets_input" + """.format($labelPrefix, this.schema_registry.string(), $labelPrefix, $redpandaMigrator.string(), $labelPrefix, $redpandaMigratorOffsets.string()).parse_yaml() } else if this.schema_registry.length() > 0 { """ broker: inputs: - sequence: inputs: - - schema_registry: %s + - label: %s_schema_registry_input + schema_registry: %s processors: - - mapping: meta input_label = "schema_registry" + - mapping: meta input_label = "schema_registry_input" - generate: count: 1 mapping: root = "" @@ -83,24 +91,28 @@ mapping: | - log: message: Finished importing schemas - mapping: root = deleted() - - redpanda_migrator: %s + - label: %s_redpanda_migrator_input + redpanda_migrator: %s processors: - - mapping: meta input_label = "redpanda_migrator" - - kafka_franz: %s + - mapping: meta input_label = "redpanda_migrator_input" + - label: %s_redpanda_migrator_offsets_input + redpanda_migrator_offsets: %s processors: - - mapping: meta input_label = "redpanda_migrator_offsets" - """.format(this.schema_registry.string(), this.redpanda_migrator.string(), $redpandaMigratorOffsets.string()).parse_yaml() + - mapping: meta input_label = "redpanda_migrator_offsets_input" + """.format($labelPrefix, this.schema_registry.string(), $labelPrefix, $redpandaMigrator.string(), $labelPrefix, $redpandaMigratorOffsets.string()).parse_yaml() } else { """ broker: inputs: - - redpanda_migrator: %s + - label: %s_redpanda_migrator_input + redpanda_migrator: %s 
processors: - - mapping: meta input_label = "redpanda_migrator" - - kafka_franz: %s + - mapping: meta input_label = "redpanda_migrator_input" + - label: %s_redpanda_migrator_offsets_input + redpanda_migrator_offsets: %s processors: - - mapping: meta input_label = "redpanda_migrator_offsets" - """.format(this.redpanda_migrator.string(), $redpandaMigratorOffsets.string()).parse_yaml() + - mapping: meta input_label = "redpanda_migrator_offsets_input" + """.format($labelPrefix, $redpandaMigrator.string(), $labelPrefix, $redpandaMigratorOffsets.string()).parse_yaml() } tests: @@ -121,7 +133,8 @@ tests: - sequence: inputs: - processors: - - mapping: meta input_label = "schema_registry" + - mapping: meta input_label = "schema_registry_input" + label: redpanda_migrator_bundle_schema_registry_input schema_registry: url: http://localhost:8081 - generate: @@ -131,18 +144,21 @@ tests: - log: message: Finished importing schemas - mapping: root = deleted() - - redpanda_migrator: + - label: redpanda_migrator_bundle_redpanda_migrator_input + redpanda_migrator: seed_brokers: [ "127.0.0.1:9092" ] topics: [ "foobar" ] consumer_group: "migrator" + output_resource: redpanda_migrator_bundle_redpanda_migrator_output processors: - - mapping: meta input_label = "redpanda_migrator" - - kafka_franz: + - mapping: meta input_label = "redpanda_migrator_input" + - label: redpanda_migrator_bundle_redpanda_migrator_offsets_input + redpanda_migrator_offsets: seed_brokers: [ "127.0.0.1:9092" ] - topics: [ "__consumer_offsets" ] + topics: [ "foobar" ] consumer_group: "migrator" processors: - - mapping: meta input_label = "redpanda_migrator_offsets" + - mapping: meta input_label = "redpanda_migrator_offsets_input" - name: Migrate schemas first, then messages and offsets config: @@ -159,7 +175,8 @@ tests: - sequence: inputs: - processors: - - mapping: meta input_label = "schema_registry" + - mapping: meta input_label = "schema_registry_input" + label: redpanda_migrator_bundle_schema_registry_input schema_registry: url: http://localhost:8081 - generate: @@ -171,18 +188,21 @@ tests: - mapping: root = deleted() - broker: inputs: - - redpanda_migrator: + - label: redpanda_migrator_bundle_redpanda_migrator_input + redpanda_migrator: seed_brokers: [ "127.0.0.1:9092" ] topics: [ "foobar" ] consumer_group: "migrator" + output_resource: redpanda_migrator_bundle_redpanda_migrator_output processors: - - mapping: meta input_label = "redpanda_migrator" - - kafka_franz: + - mapping: meta input_label = "redpanda_migrator_input" + - label: redpanda_migrator_bundle_redpanda_migrator_offsets_input + redpanda_migrator_offsets: seed_brokers: [ "127.0.0.1:9092" ] - topics: [ "__consumer_offsets" ] + topics: [ "foobar" ] consumer_group: "migrator" processors: - - mapping: meta input_label = "redpanda_migrator_offsets" + - mapping: meta input_label = "redpanda_migrator_offsets_input" - name: Migrate messages and offsets without schemas config: @@ -194,15 +214,18 @@ tests: expected: broker: inputs: - - redpanda_migrator: + - label: redpanda_migrator_bundle_redpanda_migrator_input + redpanda_migrator: seed_brokers: [ "127.0.0.1:9092" ] topics: [ "foobar" ] consumer_group: "migrator" + output_resource: redpanda_migrator_bundle_redpanda_migrator_output processors: - - mapping: meta input_label = "redpanda_migrator" - - kafka_franz: + - mapping: meta input_label = "redpanda_migrator_input" + - label: redpanda_migrator_bundle_redpanda_migrator_offsets_input + redpanda_migrator_offsets: seed_brokers: [ "127.0.0.1:9092" ] - topics: [ "__consumer_offsets" ] 
+ topics: [ "foobar" ] consumer_group: "migrator" processors: - - mapping: meta input_label = "redpanda_migrator_offsets" + - mapping: meta input_label = "redpanda_migrator_offsets_input" diff --git a/internal/impl/kafka/enterprise/redpanda_migrator_bundle_output.tmpl.yaml b/internal/impl/kafka/enterprise/redpanda_migrator_bundle_output.tmpl.yaml index 3e04ed91d1..b2d54275b2 100644 --- a/internal/impl/kafka/enterprise/redpanda_migrator_bundle_output.tmpl.yaml +++ b/internal/impl/kafka/enterprise/redpanda_migrator_bundle_output.tmpl.yaml @@ -33,6 +33,8 @@ fields: mapping: | #!blobl + let labelPrefix = @label.not_empty().or("redpanda_migrator_bundle") + if ["topic", "key", "partition", "partitioner", "timestamp"].any(f -> this.redpanda_migrator.keys().contains(f)) { root = throw("The topic, key, partition, partitioner and timestamp fields of the redpanda_migrator output must be left empty") } @@ -51,10 +53,17 @@ mapping: | "^(?:[^k].*|k[^a].*|ka[^f].*|kaf[^k].*|kafk[^a].*|kafka[^_].*)" ] }, - "translate_schema_ids": this.redpanda_migrator.translate_schema_ids.or(true) && this.schema_registry.length() != 0 + "translate_schema_ids": this.redpanda_migrator.translate_schema_ids.or(true) && this.schema_registry.length() != 0, + "input_resource": "%s_redpanda_migrator_input".format($labelPrefix) } ) + if this.schema_registry.length() != 0 { + let redpandaMigrator = $redpandaMigrator.assign({ + "schema_registry_output_resource": "%s_schema_registry_output".format($labelPrefix) + }) + } + let redpandaMigratorOffsets = this.redpanda_migrator.with("seed_brokers", "consumer_group", "client_id", "rack_id", "max_message_bytes", "broker_write_max_bytes", "tls", "sasl") if this.schema_registry.keys().contains("subject") { @@ -65,7 +74,8 @@ mapping: | let schemaRegistry = if this.schema_registry.length() > 0 { this.schema_registry.assign({ "subject": "${! @schema_registry_subject }", - "max_in_flight": $srMaxInFlight + "max_in_flight": $srMaxInFlight, + "input_resource": "%s_schema_registry_input".format($labelPrefix) }) } @@ -75,10 +85,11 @@ mapping: | """ switch: cases: - - check: metadata("input_label") == "redpanda_migrator" + - check: metadata("input_label") == "redpanda_migrator_input" output: fallback: - - redpanda_migrator: %s + - label: %s_redpanda_migrator_output + redpanda_migrator: %s processors: - mapping: | meta input_label = deleted() @@ -88,20 +99,22 @@ mapping: | - log: message: | Dropping message: ${! content() } / ${! metadata() } - - check: metadata("input_label") == "redpanda_migrator_offsets" + - check: metadata("input_label") == "redpanda_migrator_offsets_input" output: fallback: - - redpanda_migrator_offsets: %s + - label: %s_redpanda_migrator_offsets_output + redpanda_migrator_offsets: %s # TODO: Use a DLQ - drop: {} processors: - log: message: | Dropping message: ${! content() } / ${! metadata() } - - check: metadata("input_label") == "schema_registry" + - check: metadata("input_label") == "schema_registry_input" output: fallback: - - schema_registry: %s + - label: %s_schema_registry_output + schema_registry: %s - switch: cases: - check: '@fallback_error == "request returned status: 422"' @@ -114,15 +127,16 @@ mapping: | Subject '${! @schema_registry_subject }' version ${! @schema_registry_version } already has schema: ${! content() } - output: reject: ${! 
@fallback_error } - """.format($redpandaMigrator.string(), $redpandaMigratorOffsets.string(), $schemaRegistry.string()).parse_yaml() + """.format($labelPrefix, $redpandaMigrator.string(), $labelPrefix, $redpandaMigratorOffsets.string(), $labelPrefix, $schemaRegistry.string()).parse_yaml() } else { """ switch: cases: - - check: metadata("input_label") == "redpanda_migrator" + - check: metadata("input_label") == "redpanda_migrator_input" output: fallback: - - redpanda_migrator: %s + - label: %s_redpanda_migrator_output + redpanda_migrator: %s processors: - mapping: | meta input_label = deleted() @@ -132,17 +146,18 @@ mapping: | - log: message: | Dropping message: ${! content() } / ${! metadata() } - - check: metadata("input_label") == "redpanda_migrator_offsets" + - check: metadata("input_label") == "redpanda_migrator_offsets_input" output: fallback: - - redpanda_migrator_offsets: %s + - label: %s_redpanda_migrator_offsets_output + redpanda_migrator_offsets: %s # TODO: Use a DLQ - drop: {} processors: - log: message: | Dropping message: ${! content() } / ${! metadata() } - """.format($redpandaMigrator.string(), $redpandaMigratorOffsets.string()).parse_yaml() + """.format($labelPrefix, $redpandaMigrator.string(), $labelPrefix, $redpandaMigratorOffsets.string()).parse_yaml() } tests: @@ -158,10 +173,11 @@ tests: expected: switch: cases: - - check: metadata("input_label") == "redpanda_migrator" + - check: metadata("input_label") == "redpanda_migrator_input" output: fallback: - - redpanda_migrator: + - label: redpanda_migrator_bundle_redpanda_migrator_output + redpanda_migrator: key: ${! metadata("kafka_key") } max_in_flight: 1 partition: ${! metadata("kafka_partition").or(throw("missing kafka_partition metadata")) } @@ -174,6 +190,8 @@ tests: include_patterns: - ^(?:[^k].*|k[^a].*|ka[^f].*|kaf[^k].*|kafk[^a].*|kafka[^_].*) translate_schema_ids: true + input_resource: redpanda_migrator_bundle_redpanda_migrator_input + schema_registry_output_resource: redpanda_migrator_bundle_schema_registry_output processors: - mapping: | meta input_label = deleted() @@ -182,10 +200,11 @@ tests: - log: message: | Dropping message: ${! content() } / ${! metadata() } - - check: metadata("input_label") == "redpanda_migrator_offsets" + - check: metadata("input_label") == "redpanda_migrator_offsets_input" output: fallback: - - redpanda_migrator_offsets: + - label: redpanda_migrator_bundle_redpanda_migrator_offsets_output + redpanda_migrator_offsets: seed_brokers: - 127.0.0.1:9092 - drop: {} @@ -193,13 +212,15 @@ tests: - log: message: | Dropping message: ${! content() } / ${! metadata() } - - check: metadata("input_label") == "schema_registry" + - check: metadata("input_label") == "schema_registry_input" output: fallback: - - schema_registry: + - label: redpanda_migrator_bundle_schema_registry_output + schema_registry: subject: ${! @schema_registry_subject } url: http://localhost:8081 max_in_flight: 1 + input_resource: redpanda_migrator_bundle_schema_registry_input - switch: cases: - check: '@fallback_error == "request returned status: 422"' @@ -225,10 +246,11 @@ tests: expected: switch: cases: - - check: metadata("input_label") == "redpanda_migrator" + - check: metadata("input_label") == "redpanda_migrator_input" output: fallback: - - redpanda_migrator: + - label: redpanda_migrator_bundle_redpanda_migrator_output + redpanda_migrator: key: ${! metadata("kafka_key") } max_in_flight: 1 partition: ${! 
metadata("kafka_partition").or(throw("missing kafka_partition metadata")) } @@ -241,6 +263,8 @@ tests: include_patterns: - ^(?:[^k].*|k[^a].*|ka[^f].*|kaf[^k].*|kafk[^a].*|kafka[^_].*) translate_schema_ids: false + input_resource: redpanda_migrator_bundle_redpanda_migrator_input + schema_registry_output_resource: redpanda_migrator_bundle_schema_registry_output processors: - mapping: | meta input_label = deleted() @@ -249,10 +273,11 @@ tests: - log: message: | Dropping message: ${! content() } / ${! metadata() } - - check: metadata("input_label") == "redpanda_migrator_offsets" + - check: metadata("input_label") == "redpanda_migrator_offsets_input" output: fallback: - - redpanda_migrator_offsets: + - label: redpanda_migrator_bundle_redpanda_migrator_offsets_output + redpanda_migrator_offsets: seed_brokers: - 127.0.0.1:9092 - drop: {} @@ -260,13 +285,15 @@ tests: - log: message: | Dropping message: ${! content() } / ${! metadata() } - - check: metadata("input_label") == "schema_registry" + - check: metadata("input_label") == "schema_registry_input" output: fallback: - - schema_registry: + - label: redpanda_migrator_bundle_schema_registry_output + schema_registry: subject: ${! @schema_registry_subject } url: http://localhost:8081 max_in_flight: 1 + input_resource: redpanda_migrator_bundle_schema_registry_input - switch: cases: - check: '@fallback_error == "request returned status: 422"' @@ -292,10 +319,11 @@ tests: expected: switch: cases: - - check: metadata("input_label") == "redpanda_migrator" + - check: metadata("input_label") == "redpanda_migrator_input" output: fallback: - - redpanda_migrator: + - label: redpanda_migrator_bundle_redpanda_migrator_output + redpanda_migrator: key: ${! metadata("kafka_key") } max_in_flight: 1 partition: ${! metadata("kafka_partition").or(throw("missing kafka_partition metadata")) } @@ -308,6 +336,8 @@ tests: include_patterns: - ^(?:[^k].*|k[^a].*|ka[^f].*|kaf[^k].*|kafk[^a].*|kafka[^_].*) translate_schema_ids: true + input_resource: redpanda_migrator_bundle_redpanda_migrator_input + schema_registry_output_resource: redpanda_migrator_bundle_schema_registry_output processors: - mapping: | meta input_label = deleted() @@ -316,10 +346,11 @@ tests: - log: message: | Dropping message: ${! content() } / ${! metadata() } - - check: metadata("input_label") == "redpanda_migrator_offsets" + - check: metadata("input_label") == "redpanda_migrator_offsets_input" output: fallback: - - redpanda_migrator_offsets: + - label: redpanda_migrator_bundle_redpanda_migrator_offsets_output + redpanda_migrator_offsets: seed_brokers: - 127.0.0.1:9092 - drop: {} @@ -327,13 +358,15 @@ tests: - log: message: | Dropping message: ${! content() } / ${! metadata() } - - check: metadata("input_label") == "schema_registry" + - check: metadata("input_label") == "schema_registry_input" output: fallback: - - schema_registry: + - label: redpanda_migrator_bundle_schema_registry_output + schema_registry: subject: ${! @schema_registry_subject } url: http://localhost:8081 max_in_flight: 1 + input_resource: redpanda_migrator_bundle_schema_registry_input - switch: cases: - check: '@fallback_error == "request returned status: 422"' @@ -355,10 +388,11 @@ tests: expected: switch: cases: - - check: metadata("input_label") == "redpanda_migrator" + - check: metadata("input_label") == "redpanda_migrator_input" output: fallback: - - redpanda_migrator: + - label: redpanda_migrator_bundle_redpanda_migrator_output + redpanda_migrator: key: ${! metadata("kafka_key") } max_in_flight: 1 partition: ${! 
metadata("kafka_partition").or(throw("missing kafka_partition metadata")) } @@ -371,6 +405,7 @@ tests: include_patterns: - ^(?:[^k].*|k[^a].*|ka[^f].*|kaf[^k].*|kafk[^a].*|kafka[^_].*) translate_schema_ids: false + input_resource: redpanda_migrator_bundle_redpanda_migrator_input processors: - mapping: | meta input_label = deleted() @@ -379,10 +414,11 @@ tests: - log: message: | Dropping message: ${! content() } / ${! metadata() } - - check: metadata("input_label") == "redpanda_migrator_offsets" + - check: metadata("input_label") == "redpanda_migrator_offsets_input" output: fallback: - - redpanda_migrator_offsets: + - label: redpanda_migrator_bundle_redpanda_migrator_offsets_output + redpanda_migrator_offsets: seed_brokers: - 127.0.0.1:9092 - drop: {} diff --git a/internal/impl/kafka/enterprise/redpanda_migrator_input.go b/internal/impl/kafka/enterprise/redpanda_migrator_input.go index e66b9ab138..5cfdc1e9c2 100644 --- a/internal/impl/kafka/enterprise/redpanda_migrator_input.go +++ b/internal/impl/kafka/enterprise/redpanda_migrator_input.go @@ -10,16 +10,8 @@ package enterprise import ( "context" - "errors" - "fmt" - "regexp" "slices" - "strconv" - "sync" - "time" - "github.com/Jeffail/shutdown" - "github.com/twmb/franz-go/pkg/kadm" "github.com/twmb/franz-go/pkg/kgo" "github.com/redpanda-data/benthos/v4/public/service" @@ -29,11 +21,9 @@ import ( ) const ( - rmiFieldConsumerGroup = "consumer_group" - rmiFieldCommitPeriod = "commit_period" + // Deprecated fields rmiFieldMultiHeader = "multi_header" rmiFieldBatchSize = "batch_size" - rmiFieldTopicLagRefreshPeriod = "topic_lag_refresh_period" rmiFieldOutputResource = "output_resource" rmiFieldReplicationFactorOverride = "replication_factor_override" rmiFieldReplicationFactor = "replication_factor" @@ -50,11 +40,11 @@ func redpandaMigratorInputConfig() *service.ConfigSpec { Description(` Reads a batch of messages from a Kafka broker and waits for the output to acknowledge the writes before updating the Kafka consumer group offset. -This input should be used in combination with a ` + "`redpanda_migrator`" + ` output which it can query for existing topics. +This input should be used in combination with a ` + "`redpanda_migrator`" + ` output. When a consumer group is specified this input consumes one or more topics where partitions will automatically balance across any other connected clients with the same consumer group. When a consumer group is not specified topics can either be consumed in their entirety or with explicit partitions. -It attempts to create all selected topics it along with their associated ACLs in the broker that the ` + "`redpanda_migrator`" + ` output points to identified by the label specified in ` + "`output_resource`" + `. +It provides the same delivery guarantees and ordering semantics as the ` + "`redpanda`" + ` input. == Metrics @@ -76,7 +66,7 @@ This input adds the following metadata fields to each message: - All record headers ` + "```" + ` `). - Fields(RedpandaMigratorInputConfigFields()...). + Fields(redpandaMigratorInputConfigFields()...). LintRule(` let has_topic_partitions = this.topics.any(t -> t.contains(":")) root = if $has_topic_partitions { @@ -93,45 +83,40 @@ root = if $has_topic_partitions { `) } -// RedpandaMigratorInputConfigFields returns the full suite of config fields for a `redpanda_migrator` input using the -// franz-go client library. 
-func RedpandaMigratorInputConfigFields() []*service.ConfigField { +func redpandaMigratorInputConfigFields() []*service.ConfigField { return slices.Concat( kafka.FranzConnectionFields(), kafka.FranzConsumerFields(), + kafka.FranzReaderOrderedConfigFields(), []*service.ConfigField{ - service.NewStringField(rmiFieldConsumerGroup). - Description("An optional consumer group to consume as. When specified the partitions of specified topics are automatically distributed across consumers sharing a consumer group, and partition offsets are automatically committed and resumed under this name. Consumer groups are not supported when specifying explicit partitions to consume from in the `topics` field."). - Optional(), - service.NewDurationField(rmiFieldCommitPeriod). - Description("The period of time between each commit of the current partition offsets. Offsets are always committed during shutdown."). - Default("5s"). - Advanced(), - service.NewBoolField(rmiFieldMultiHeader). - Description("Decode headers into lists to allow handling of multiple values with the same key"). - Default(false). - Advanced(), - service.NewIntField(rmiFieldBatchSize). - Description("The maximum number of messages that should be accumulated into each batch."). - Default(1024). - Advanced(), service.NewAutoRetryNacksToggleField(), - service.NewDurationField(rmiFieldTopicLagRefreshPeriod). - Description("The period of time between each topic lag refresh cycle."). - Default("5s"). - Advanced(), + + // Deprecated fields service.NewStringField(rmiFieldOutputResource). Description("The label of the redpanda_migrator output in which the currently selected topics need to be created before attempting to read messages."). Default(rmoResourceDefaultLabel). - Advanced(), + Advanced(). + Deprecated(), service.NewBoolField(rmiFieldReplicationFactorOverride). Description("Use the specified replication factor when creating topics."). Default(true). - Advanced(), + Advanced(). + Deprecated(), service.NewIntField(rmiFieldReplicationFactor). Description("Replication factor for created topics. This is only used when `replication_factor_override` is set to `true`."). Default(3). - Advanced(), + Advanced(). + Deprecated(), + service.NewBoolField(rmiFieldMultiHeader). + Description("Decode headers into lists to allow handling of multiple values with the same key"). + Default(false). + Advanced(). + Deprecated(), + service.NewIntField(rmiFieldBatchSize). + Description("The maximum number of messages that should be accumulated into each batch."). + Default(1024). + Advanced(). + Deprecated(), }, ) } @@ -143,363 +128,94 @@ func init() { return nil, err } - rdr, err := NewRedpandaMigratorReaderFromConfig(conf, mgr) + tmpOpts, err := kafka.FranzConnectionOptsFromConfig(conf, mgr.Logger()) if err != nil { return nil, err } - return service.AutoRetryNacksBatchedToggled(conf, rdr) - }) - if err != nil { - panic(err) - } -} - -//------------------------------------------------------------------------------ - -// RedpandaMigratorReader implements a kafka reader using the franz-go library. 
-type RedpandaMigratorReader struct { - clientDetails *kafka.FranzConnectionDetails - consumerDetails *kafka.FranzConsumerDetails - - clientLabel string - - topicPatterns []*regexp.Regexp - - consumerGroup string - commitPeriod time.Duration - multiHeader bool - batchSize int - topicLagRefreshPeriod time.Duration - outputResource string - replicationFactorOverride bool - replicationFactor int - - connMut sync.Mutex - readMut sync.Mutex - client *kgo.Client - topicLagGauge *service.MetricGauge - topicLagCache sync.Map - outputTopicsCreated bool - - mgr *service.Resources - shutSig *shutdown.Signaller -} - -// NewRedpandaMigratorReaderFromConfig attempts to instantiate a new RedpandaMigratorReader -// from a parsed config. -func NewRedpandaMigratorReaderFromConfig(conf *service.ParsedConfig, mgr *service.Resources) (*RedpandaMigratorReader, error) { - r := RedpandaMigratorReader{ - mgr: mgr, - shutSig: shutdown.NewSignaller(), - topicLagGauge: mgr.Metrics().NewGauge("input_redpanda_migrator_lag", "topic", "partition"), - } + clientOpts := append([]kgo.Opt{}, tmpOpts...) - var err error + if tmpOpts, err = kafka.FranzConsumerOptsFromConfig(conf); err != nil { + return nil, err + } + clientOpts = append(clientOpts, tmpOpts...) - if r.clientDetails, err = kafka.FranzConnectionDetailsFromConfig(conf, mgr.Logger()); err != nil { - return nil, err - } - if r.consumerDetails, err = kafka.FranzConsumerDetailsFromConfig(conf); err != nil { - return nil, err - } + clientLabel := mgr.Label() + if clientLabel == "" { + clientLabel = rmiResourceDefaultLabel + } - if r.consumerDetails.RegexPattern { - r.topicPatterns = make([]*regexp.Regexp, 0, len(r.consumerDetails.Topics)) - for _, topic := range r.consumerDetails.Topics { - tp, err := regexp.Compile(topic) + rdr, err := kafka.NewFranzReaderOrderedFromConfig(conf, mgr, + func() ([]kgo.Opt, error) { + return clientOpts, nil + }) if err != nil { - return nil, fmt.Errorf("failed to compile topic regex %q: %s", topic, err) + return nil, err } - r.topicPatterns = append(r.topicPatterns, tp) - } - } - - if conf.Contains(rmiFieldConsumerGroup) { - if r.consumerGroup, err = conf.FieldString(rmiFieldConsumerGroup); err != nil { - return nil, err - } - } - - if r.batchSize, err = conf.FieldInt(rmiFieldBatchSize); err != nil { - return nil, err - } - - if r.commitPeriod, err = conf.FieldDuration(rmiFieldCommitPeriod); err != nil { - return nil, err - } - - if r.multiHeader, err = conf.FieldBool(rmiFieldMultiHeader); err != nil { - return nil, err - } - - if r.topicLagRefreshPeriod, err = conf.FieldDuration(rmiFieldTopicLagRefreshPeriod); err != nil { - return nil, err - } - - if r.replicationFactorOverride, err = conf.FieldBool(rmiFieldReplicationFactorOverride); err != nil { - return nil, err - } - - if r.replicationFactor, err = conf.FieldInt(rmiFieldReplicationFactor); err != nil { - return nil, err - } - - if r.outputResource, err = conf.FieldString(rmiFieldOutputResource); err != nil { - return nil, err - } - if r.clientLabel = mgr.Label(); r.clientLabel == "" { - r.clientLabel = rmiResourceDefaultLabel - } - - return &r, nil -} - -func (r *RedpandaMigratorReader) recordToMessage(record *kgo.Record) *service.Message { - msg := kafka.FranzRecordToMessageV0(record, r.multiHeader) - - lag := int64(0) - if val, ok := r.topicLagCache.Load(fmt.Sprintf("%s_%d", record.Topic, record.Partition)); ok { - lag = val.(int64) + return service.AutoRetryNacksBatchedToggled(conf, &redpandaMigratorInput{ + FranzReaderOrdered: rdr, + clientLabel: clientLabel, + mgr: mgr, + }) + 
}) + if err != nil { + panic(err) } - msg.MetaSetMut("kafka_lag", lag) - - // The record lives on for checkpointing, but we don't need the contents - // going forward so discard these. This looked fine to me but could - // potentially be a source of problems so treat this as sus. - record.Key = nil - record.Value = nil - - return msg } //------------------------------------------------------------------------------ -// Connect to the kafka seed brokers. -func (r *RedpandaMigratorReader) Connect(ctx context.Context) error { - r.connMut.Lock() - defer r.connMut.Unlock() - - if r.client != nil { - return nil - } +type redpandaMigratorInput struct { + *kafka.FranzReaderOrdered - if r.shutSig.IsSoftStopSignalled() { - r.shutSig.TriggerHasStopped() - return service.ErrEndOfInput - } + clientLabel string - clientOpts := append([]kgo.Opt{}, r.clientDetails.FranzOpts()...) - clientOpts = append(clientOpts, r.consumerDetails.FranzOpts()...) - if r.consumerGroup != "" { - clientOpts = append(clientOpts, - // TODO: Do we need to do anything in `kgo.OnPartitionsRevoked()` / `kgo.OnPartitionsLost()` - kgo.ConsumerGroup(r.consumerGroup), - kgo.AutoCommitMarks(), - kgo.BlockRebalanceOnPoll(), - kgo.AutoCommitInterval(r.commitPeriod), - kgo.WithLogger(&kafka.KGoLogger{L: r.mgr.Logger()}), - ) - } + mgr *service.Resources +} - var err error - if r.client, err = kgo.NewClient(clientOpts...); err != nil { +func (rmi *redpandaMigratorInput) Connect(ctx context.Context) error { + if err := rmi.FranzReaderOrdered.Connect(ctx); err != nil { return err } - // Check connectivity to cluster - if err = r.client.Ping(ctx); err != nil { - return fmt.Errorf("failed to connect to cluster: %s", err) + if err := kafka.FranzSharedClientSet(rmi.clientLabel, &kafka.FranzSharedClientInfo{ + Client: rmi.FranzReaderOrdered.Client, + }, rmi.mgr); err != nil { + rmi.mgr.Logger().Warnf("Failed to store client connection for sharing: %s", err) } - if err = kafka.FranzSharedClientSet(r.clientLabel, &kafka.FranzSharedClientInfo{ - Client: r.client, - ConnDetails: r.clientDetails, - }, r.mgr); err != nil { - r.mgr.Logger().With("error", err).Warn("Failed to store client connection for sharing") - } - - go func() { - closeCtx, done := r.shutSig.SoftStopCtx(context.Background()) - defer done() - - adminClient := kadm.NewClient(r.client) - - for { - ctx, done = context.WithTimeout(closeCtx, r.topicLagRefreshPeriod) - var lags kadm.DescribedGroupLags - var err error - if lags, err = adminClient.Lag(ctx, r.consumerGroup); err != nil { - r.mgr.Logger().Errorf("Failed to fetch group lags: %s", err) - } - done() - - lags.Each(func(gl kadm.DescribedGroupLag) { - for _, gl := range gl.Lag { - for _, pl := range gl { - lag := pl.Lag - if lag < 0 { - lag = 0 - } - - r.topicLagGauge.Set(lag, pl.Topic, strconv.Itoa(int(pl.Partition))) - r.topicLagCache.Store(fmt.Sprintf("%s_%d", pl.Topic, pl.Partition), lag) - } - } - }) - - select { - case <-r.shutSig.SoftStopChan(): - return - case <-time.After(r.topicLagRefreshPeriod): - } - } - }() - return nil } -// ReadBatch attempts to extract a batch of messages from the target topics. -func (r *RedpandaMigratorReader) ReadBatch(ctx context.Context) (service.MessageBatch, service.AckFunc, error) { - r.connMut.Lock() - defer r.connMut.Unlock() - - r.readMut.Lock() - defer r.readMut.Unlock() - - if r.client == nil { - return nil, nil, service.ErrNotConnected - } - - // TODO: Is there a way to wait a while until we actually get f.batchSize messages instead of returning as many as - // we have right now? 
Otherwise, maybe switch back to `PollFetches()` and have `batch_byte_size` and `batch_period` - // via `FetchMinBytes`, `FetchMaxBytes` and `FetchMaxWait()`? - - // TODO: Looks like when using `regexp_topics: true`, franz-go takes over a minute to discover topics which were - // created after `PollRecords()` was called for the first time. Might need to adjust the timeout for the internal - // topic cache. - fetches := r.client.PollRecords(ctx, r.batchSize) - if errs := fetches.Errors(); len(errs) > 0 { - // Any non-temporal error sets this true and we close the client - // forcing a reconnect. - nonTemporalErr := false - - for _, kerr := range errs { - // TODO: The documentation from franz-go is top-tier, it - // should be straight forward to expand this to include more - // errors that are safe to disregard. - if errors.Is(kerr.Err, context.DeadlineExceeded) || - errors.Is(kerr.Err, context.Canceled) { - continue - } - - nonTemporalErr = true - - if !errors.Is(kerr.Err, kgo.ErrClientClosed) { - r.mgr.Logger().Errorf("Kafka poll error on topic %v, partition %v: %v", kerr.Topic, kerr.Partition, kerr.Err) - } +func (rmi *redpandaMigratorInput) ReadBatch(ctx context.Context) (service.MessageBatch, service.AckFunc, error) { + for { + batch, ack, err := rmi.FranzReaderOrdered.ReadBatch(ctx) + if err != nil { + return batch, ack, err } - if nonTemporalErr { - r.client.Close() - r.client = nil - return nil, nil, service.ErrNotConnected - } - } + batch = slices.DeleteFunc(batch, func(msg *service.Message) bool { + b, err := msg.AsBytes() - // TODO: Is there a way to get the actual selected topics instead of all of them? - topics := r.client.GetConsumeTopics() - if r.consumerDetails.RegexPattern { - topics = slices.DeleteFunc(topics, func(topic string) bool { - for _, tp := range r.topicPatterns { - if tp.MatchString(topic) { - return false - } + if b == nil { + rmi.mgr.Logger().Debugf("Skipping tombstone message") + return true } - return true - }) - } - if len(topics) > 0 { - r.mgr.Logger().Debugf("Consuming from topics: %s", topics) - } else if r.consumerDetails.RegexPattern { - r.mgr.Logger().Warn("No matching topics found") - } + return err != nil + }) - if !r.outputTopicsCreated { - if err := kafka.FranzSharedClientUse(r.outputResource, r.mgr, func(details *kafka.FranzSharedClientInfo) error { - for _, topic := range topics { - if err := createTopic(ctx, topic, r.replicationFactorOverride, r.replicationFactor, r.client, details.Client); err != nil && err != errTopicAlreadyExists { - // We could end up attempting to create a topic which doesn't have any messages in it, so if that - // fails, we can just log an error and carry on. If it does contain messages, the output will - // attempt to create it again anyway and will trigger and error if it can't. - // The output `topicCache` could be populated here to avoid the redundant call to create topics, but - // it's not worth the complexity. 
- r.mgr.Logger().Errorf("Failed to create topic %q and ACLs: %s", topic, err) - } else { - if err == errTopicAlreadyExists { - r.mgr.Logger().Debugf("Topic %q already exists", topic) - } else { - r.mgr.Logger().Infof("Created topic %q in output cluster", topic) - } - if err := createACLs(ctx, topic, r.client, details.Client); err != nil { - r.mgr.Logger().Errorf("Failed to create ACLs for topic %q: %s", topic, err) - } - } - } - r.outputTopicsCreated = true - return nil - }); err != nil { - r.mgr.Logger().With("error", err, "resource", r.outputResource).Warn("Failed to access shared client for given resource identifier") + if len(batch) == 0 { + _ = ack(ctx, nil) // TODO: Log this error? + continue } + return batch, ack, nil } - - resBatch := make(service.MessageBatch, 0, fetches.NumRecords()) - fetches.EachRecord(func(rec *kgo.Record) { - resBatch = append(resBatch, r.recordToMessage(rec)) - }) - - return resBatch, func(ctx context.Context, res error) error { - r.readMut.Lock() - defer r.readMut.Unlock() - - // TODO: What should happen when `auto_replay_nacks: false` and a batch gets rejected followed by another one - // which gets acked? - // Also see "Res will always be nil because we initialize with service.AutoRetryNacks" comment in - // `input_kafka_franz.go` - if res != nil { - return res - } - - r.client.MarkCommitRecords(fetches.Records()...) - r.client.AllowRebalance() - - return nil - }, nil } -// Close underlying connections. -func (r *RedpandaMigratorReader) Close(ctx context.Context) error { - r.connMut.Lock() - defer r.connMut.Unlock() +func (rmi *redpandaMigratorInput) Close(ctx context.Context) error { + _, _ = kafka.FranzSharedClientPop(rmi.clientLabel, rmi.mgr) - go func() { - r.shutSig.TriggerSoftStop() - if r.client != nil { - _, _ = kafka.FranzSharedClientPop(r.clientLabel, r.mgr) - - r.client.Close() - r.client = nil - - r.shutSig.TriggerHasStopped() - } - }() - - select { - case <-r.shutSig.HasStoppedChan(): - case <-ctx.Done(): - return ctx.Err() - } - return nil + return rmi.FranzReaderOrdered.Close(ctx) } diff --git a/internal/impl/kafka/enterprise/redpanda_migrator_offsets_input.go b/internal/impl/kafka/enterprise/redpanda_migrator_offsets_input.go new file mode 100644 index 0000000000..4abb95c769 --- /dev/null +++ b/internal/impl/kafka/enterprise/redpanda_migrator_offsets_input.go @@ -0,0 +1,233 @@ +// Copyright 2024 Redpanda Data, Inc. +// +// Licensed as a Redpanda Enterprise file under the Redpanda Community +// License (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// https://github.com/redpanda-data/connect/blob/main/licenses/rcl.md + +package enterprise + +import ( + "context" + "errors" + "fmt" + "regexp" + "slices" + + "github.com/twmb/franz-go/pkg/kgo" + "github.com/twmb/franz-go/pkg/kmsg" + + "github.com/redpanda-data/benthos/v4/public/service" + + "github.com/redpanda-data/connect/v4/internal/impl/kafka" +) + +const ( + // Consumer fields + rmoiFieldTopics = "topics" + rmoiFieldRegexpTopics = "regexp_topics" + rmoiFieldRackID = "rack_id" +) + +func redpandaMigratorOffsetsInputConfig() *service.ConfigSpec { + return service.NewConfigSpec(). + Beta(). + Categories("Services"). + Version("4.45.0"). + Summary(`Redpanda Migrator consumer group offsets input using the https://github.com/twmb/franz-go[Franz Kafka client library^].`). 
+ Description(` +TODO: Description + +== Metadata + +This input adds the following metadata fields to each message: + +` + "```text" + ` +- kafka_key +- kafka_topic +- kafka_partition +- kafka_offset +- kafka_timestamp_unix +- kafka_timestamp_ms +- kafka_tombstone_message +- kafka_offset_topic +- kafka_offset_group +- kafka_offset_partition +- kafka_offset_commit_timestamp +- kafka_offset_metadata +` + "```" + ` +`). + Fields(redpandaMigratorOffsetsInputConfigFields()...) +} + +func redpandaMigratorOffsetsInputConfigFields() []*service.ConfigField { + return slices.Concat( + kafka.FranzConnectionFields(), + []*service.ConfigField{ + service.NewStringListField(rmoiFieldTopics). + Description(` +A list of topics to consume from. Multiple comma separated topics can be listed in a single element. When a ` + "`consumer_group`" + ` is specified partitions are automatically distributed across consumers of a topic, otherwise all partitions are consumed.`). + Example([]string{"foo", "bar"}). + Example([]string{"things.*"}). + Example([]string{"foo,bar"}). + LintRule(`if this.length() == 0 { ["at least one topic must be specified"] }`), + service.NewBoolField(rmoiFieldRegexpTopics). + Description("Whether listed topics should be interpreted as regular expression patterns for matching multiple topics."). + Default(false), + service.NewStringField(rmoiFieldRackID). + Description("A rack specifies where the client is physically located and changes fetch requests to consume from the closest replica as opposed to the leader replica."). + Default(""). + Advanced(), + }, + kafka.FranzReaderOrderedConfigFields(), + []*service.ConfigField{ + service.NewAutoRetryNacksToggleField(), + }, + ) +} + +func init() { + err := service.RegisterBatchInput("redpanda_migrator_offsets", redpandaMigratorOffsetsInputConfig(), + func(conf *service.ParsedConfig, mgr *service.Resources) (service.BatchInput, error) { + clientOpts, err := kafka.FranzConnectionOptsFromConfig(conf, mgr.Logger()) + if err != nil { + return nil, err + } + + var topics []string + if topicList, err := conf.FieldStringList(rmoiFieldTopics); err != nil { + return nil, err + } else { + topics, _, err = kafka.ParseTopics(topicList, -1, false) + if err != nil { + return nil, err + } + if len(topics) == 0 { + return nil, errors.New("at least one topic must be specified") + } + } + + var topicPatterns []*regexp.Regexp + if regexpTopics, err := conf.FieldBool(rmoiFieldRegexpTopics); err != nil { + return nil, err + } else if regexpTopics { + topicPatterns = make([]*regexp.Regexp, 0, len(topics)) + for _, topic := range topics { + tp, err := regexp.Compile(topic) + if err != nil { + return nil, fmt.Errorf("failed to compile topic regex %q: %s", topic, err) + } + topicPatterns = append(topicPatterns, tp) + } + } + + var rackID string + if rackID, err = conf.FieldString(rmoiFieldRackID); err != nil { + return nil, err + } + clientOpts = append(clientOpts, kgo.Rack(rackID)) + + // Configure `start_from_oldest: true` + clientOpts = append(clientOpts, kgo.ConsumeResetOffset(kgo.NewOffset().AtStart())) + + // Consume messages from the `__consumer_offsets` topic + clientOpts = append(clientOpts, kgo.ConsumeTopics("__consumer_offsets")) + + rdr, err := kafka.NewFranzReaderOrderedFromConfig(conf, mgr, func() ([]kgo.Opt, error) { + return clientOpts, nil + }) + if err != nil { + return nil, err + } + + return service.AutoRetryNacksBatchedToggled(conf, &redpandaMigratorOffsetsInput{ + FranzReaderOrdered: rdr, + topicPatterns: topicPatterns, + topics: topics, + mgr: mgr, 
+ }) + }) + if err != nil { + panic(err) + } +} + +//------------------------------------------------------------------------------ + +type redpandaMigratorOffsetsInput struct { + *kafka.FranzReaderOrdered + + topicPatterns []*regexp.Regexp + topics []string + + mgr *service.Resources +} + +func (rmoi *redpandaMigratorOffsetsInput) matchesTopic(topic string) bool { + if len(rmoi.topicPatterns) > 0 { + return slices.ContainsFunc(rmoi.topicPatterns, func(tp *regexp.Regexp) bool { + return tp.MatchString(topic) + }) + } + return slices.ContainsFunc(rmoi.topics, func(t string) bool { + return t == topic + }) +} + +func (rmoi *redpandaMigratorOffsetsInput) ReadBatch(ctx context.Context) (service.MessageBatch, service.AckFunc, error) { + for { + batch, ack, err := rmoi.FranzReaderOrdered.ReadBatch(ctx) + if err != nil { + return batch, ack, err + } + + batch = slices.DeleteFunc(batch, func(msg *service.Message) bool { + var recordKey []byte + if key, ok := msg.MetaGetMut("kafka_key"); !ok { + return true + } else { + recordKey = key.([]byte) + } + + // Check the version to ensure that we process only offset commit keys + key := kmsg.NewOffsetCommitKey() + if err := key.ReadFrom(recordKey); err != nil || (key.Version != 0 && key.Version != 1) { + rmoi.mgr.Logger().Debugf("Failed to decode record key: %s", err) + return true + } + + isExpectedTopic := rmoi.matchesTopic(key.Topic) + if !isExpectedTopic { + rmoi.mgr.Logger().Tracef("Skipping updates for topic %q", key.Topic) + return true + } + + recordValue, err := msg.AsBytes() + if err != nil { + return true + } + + offsetCommitValue := kmsg.NewOffsetCommitValue() + if err = offsetCommitValue.ReadFrom(recordValue); err != nil { + rmoi.mgr.Logger().Debugf("Failed to decode offset commit value: %s", err) + return true + } + + msg.MetaSetMut("kafka_offset_topic", key.Topic) + msg.MetaSetMut("kafka_offset_group", key.Group) + msg.MetaSetMut("kafka_offset_partition", key.Partition) + msg.MetaSetMut("kafka_offset_commit_timestamp", offsetCommitValue.CommitTimestamp) + msg.MetaSetMut("kafka_offset_metadata", offsetCommitValue.Metadata) + + return false + }) + + if len(batch) == 0 { + _ = ack(ctx, nil) // TODO: Log this error? + continue + } + + return batch, ack, nil + } +} diff --git a/internal/impl/kafka/enterprise/redpanda_migrator_offsets_output.go b/internal/impl/kafka/enterprise/redpanda_migrator_offsets_output.go index 6167bba732..4e864a1c7e 100644 --- a/internal/impl/kafka/enterprise/redpanda_migrator_offsets_output.go +++ b/internal/impl/kafka/enterprise/redpanda_migrator_offsets_output.go @@ -12,13 +12,13 @@ import ( "context" "fmt" "slices" + "strconv" "sync" "time" "github.com/cenkalti/backoff/v4" "github.com/twmb/franz-go/pkg/kadm" "github.com/twmb/franz-go/pkg/kgo" - "github.com/twmb/franz-go/pkg/kmsg" "github.com/redpanda-data/benthos/v4/public/service" @@ -28,8 +28,15 @@ import ( ) const ( - rmooFieldMaxInFlight = "max_in_flight" + rmooFieldOffsetTopic = "offset_topic" + rmooFieldOffsetGroup = "offset_group" + rmooFieldOffsetPartition = "offset_partition" + rmooFieldOffsetCommitTimestamp = "offset_commit_timestamp" + rmooFieldOffsetMetadata = "offset_metadata" + + // Deprecated fields rmooFieldKafkaKey = "kafka_key" + rmooFieldMaxInFlight = "max_in_flight" ) func redpandaMigratorOffsetsOutputConfig() *service.ConfigSpec { @@ -39,20 +46,32 @@ func redpandaMigratorOffsetsOutputConfig() *service.ConfigSpec { Version("4.37.0"). 
Summary("Redpanda Migrator consumer group offsets output using the https://github.com/twmb/franz-go[Franz Kafka client library^]."). Description("This output can be used in combination with the `kafka_franz` input that is configured to read the `__consumer_offsets` topic."). - Fields(RedpandaMigratorOffsetsOutputConfigFields()...) + Fields(redpandaMigratorOffsetsOutputConfigFields()...) } -// RedpandaMigratorOffsetsOutputConfigFields returns the full suite of config fields for a redpanda_migrator_offsets output using the +// redpandaMigratorOffsetsOutputConfigFields returns the full suite of config fields for a redpanda_migrator_offsets output using the // franz-go client library. -func RedpandaMigratorOffsetsOutputConfigFields() []*service.ConfigField { +func redpandaMigratorOffsetsOutputConfigFields() []*service.ConfigField { return slices.Concat( kafka.FranzConnectionFields(), []*service.ConfigField{ + service.NewInterpolatedStringField(rmooFieldOffsetTopic). + Description("Kafka offset topic.").Default("${! @kafka_offset_topic }"), + service.NewInterpolatedStringField(rmooFieldOffsetGroup). + Description("Kafka offset group.").Default("${! @kafka_offset_group }"), + service.NewInterpolatedStringField(rmooFieldOffsetPartition). + Description("Kafka offset partition.").Default("${! @kafka_offset_partition }"), + service.NewInterpolatedStringField(rmooFieldOffsetCommitTimestamp). + Description("Kafka offset commit timestamp.").Default("${! @kafka_offset_commit_timestamp }"), + service.NewInterpolatedStringField(rmooFieldOffsetMetadata). + Description("Kafka offset metadata value.").Default(`${! @kafka_offset_metadata }`), + + // Deprecated fields service.NewInterpolatedStringField(rmooFieldKafkaKey). - Description("Kafka key.").Default("${! @kafka_key }"), + Description("Kafka key.").Default("${! @kafka_key }").Deprecated(), service.NewIntField(rmooFieldMaxInFlight). Description("The maximum number of batches to be sending in parallel at any given time."). - Default(1), + Default(1).Deprecated(), }, kafka.FranzProducerLimitsFields(), retries.CommonRetryBackOffFields(0, "1s", "5s", "30s"), @@ -70,10 +89,9 @@ func init() { return } - if maxInFlight, err = conf.FieldInt(rmooFieldMaxInFlight); err != nil { - return - } - output, err = NewRedpandaMigratorOffsetsWriterFromConfig(conf, mgr) + maxInFlight = 1 + + output, err = newRedpandaMigratorOffsetsWriterFromConfig(conf, mgr) return }) if err != nil { @@ -83,12 +101,16 @@ func init() { //------------------------------------------------------------------------------ -// RedpandaMigratorOffsetsWriter implements a Redpanda Migrator offsets writer using the franz-go library. -type RedpandaMigratorOffsetsWriter struct { - clientDetails *kafka.FranzConnectionDetails - clientOpts []kgo.Opt - kafkaKey *service.InterpolatedString - backoffCtor func() backoff.BackOff +// redpandaMigratorOffsetsWriter implements a Redpanda Migrator offsets writer using the franz-go library. 
+type redpandaMigratorOffsetsWriter struct { + clientDetails *kafka.FranzConnectionDetails + clientOpts []kgo.Opt + offsetTopic *service.InterpolatedString + offsetGroup *service.InterpolatedString + offsetPartition *service.InterpolatedString + offsetCommitTimestamp *service.InterpolatedString + offsetMetadata *service.InterpolatedString + backoffCtor func() backoff.BackOff connMut sync.Mutex client *kadm.Client @@ -96,9 +118,9 @@ type RedpandaMigratorOffsetsWriter struct { mgr *service.Resources } -// NewRedpandaMigratorOffsetsWriterFromConfig attempts to instantiate a RedpandaMigratorOffsetsWriter from a parsed config. -func NewRedpandaMigratorOffsetsWriterFromConfig(conf *service.ParsedConfig, mgr *service.Resources) (*RedpandaMigratorOffsetsWriter, error) { - w := RedpandaMigratorOffsetsWriter{ +// newRedpandaMigratorOffsetsWriterFromConfig attempts to instantiate a redpandaMigratorOffsetsWriter from a parsed config. +func newRedpandaMigratorOffsetsWriterFromConfig(conf *service.ParsedConfig, mgr *service.Resources) (*redpandaMigratorOffsetsWriter, error) { + w := redpandaMigratorOffsetsWriter{ mgr: mgr, } @@ -107,7 +129,23 @@ func NewRedpandaMigratorOffsetsWriterFromConfig(conf *service.ParsedConfig, mgr return nil, err } - if w.kafkaKey, err = conf.FieldInterpolatedString(rmooFieldKafkaKey); err != nil { + if w.offsetTopic, err = conf.FieldInterpolatedString(rmooFieldOffsetTopic); err != nil { + return nil, err + } + + if w.offsetGroup, err = conf.FieldInterpolatedString(rmooFieldOffsetGroup); err != nil { + return nil, err + } + + if w.offsetPartition, err = conf.FieldInterpolatedString(rmooFieldOffsetPartition); err != nil { + return nil, err + } + + if w.offsetCommitTimestamp, err = conf.FieldInterpolatedString(rmooFieldOffsetCommitTimestamp); err != nil { + return nil, err + } + + if w.offsetMetadata, err = conf.FieldInterpolatedString(rmooFieldOffsetMetadata); err != nil { return nil, err } @@ -125,7 +163,7 @@ func NewRedpandaMigratorOffsetsWriterFromConfig(conf *service.ParsedConfig, mgr //------------------------------------------------------------------------------ // Connect to the target seed brokers. -func (w *RedpandaMigratorOffsetsWriter) Connect(ctx context.Context) error { +func (w *redpandaMigratorOffsetsWriter) Connect(ctx context.Context) error { w.connMut.Lock() defer w.connMut.Unlock() @@ -162,7 +200,7 @@ func (w *RedpandaMigratorOffsetsWriter) Connect(ctx context.Context) error { } // Write attempts to write a message to the output cluster. -func (w *RedpandaMigratorOffsetsWriter) Write(ctx context.Context, msg *service.Message) error { +func (w *redpandaMigratorOffsetsWriter) Write(ctx context.Context, msg *service.Message) error { w.connMut.Lock() defer w.connMut.Unlock() @@ -170,46 +208,71 @@ func (w *RedpandaMigratorOffsetsWriter) Write(ctx context.Context, msg *service. return service.ErrNotConnected } - var kafkaKey []byte + var topic string var err error - // TODO: The `kafka_key` metadata field is cast from `[]byte` to string in the `kafka_franz` input, which is wrong. 
- if kafkaKey, err = w.kafkaKey.TryBytes(msg); err != nil { - return fmt.Errorf("failed to extract kafka key: %w", err) + if topic, err = w.offsetTopic.TryString(msg); err != nil { + return fmt.Errorf("failed to extract offset topic: %s", err) } - key := kmsg.NewOffsetCommitKey() - // Check the version to ensure that we process only offset commit keys - if err := key.ReadFrom(kafkaKey); err != nil || (key.Version != 0 && key.Version != 1) { - return nil + var group string + if group, err = w.offsetGroup.TryString(msg); err != nil { + return fmt.Errorf("failed to extract offset group: %s", err) } - msgBytes, err := msg.AsBytes() - if err != nil { - return fmt.Errorf("failed to get message bytes: %s", err) + var offsetPartition int32 + if p, err := w.offsetPartition.TryString(msg); err != nil { + return fmt.Errorf("failed to extract offset partition: %s", err) + } else { + i, err := strconv.Atoi(p) + if err != nil { + return fmt.Errorf("failed to parse offset partition: %s", err) + } + offsetPartition = int32(i) } - val := kmsg.NewOffsetCommitValue() - if err := val.ReadFrom(msgBytes); err != nil { - return fmt.Errorf("failed to decode offset commit value: %s", err) + var offsetCommitTimestamp int64 + if t, err := w.offsetCommitTimestamp.TryString(msg); err != nil { + return fmt.Errorf("failed to extract offset commit timestamp: %s", err) + } else { + offsetCommitTimestamp, err = strconv.ParseInt(t, 10, 64) + if err != nil { + return fmt.Errorf("failed to parse offset partition: %s", err) + } + } + + var offsetMetadata string + if w.offsetMetadata != nil { + if offsetMetadata, err = w.offsetMetadata.TryString(msg); err != nil { + return fmt.Errorf("failed to extract offset metadata: %w", err) + } } updateConsumerOffsets := func() error { - listedOffsets, err := w.client.ListOffsetsAfterMilli(ctx, val.CommitTimestamp, key.Topic) + listedOffsets, err := w.client.ListOffsetsAfterMilli(ctx, offsetCommitTimestamp, topic) if err != nil { return fmt.Errorf("failed to translate consumer offsets: %s", err) } if err := listedOffsets.Error(); err != nil { - return fmt.Errorf("listed offsets returned and error: %s", err) + return fmt.Errorf("listed offsets error: %s", err) } - // TODO: Add metadata to offsets! offsets := listedOffsets.Offsets() - offsets.KeepFunc(func(o kadm.Offset) bool { - return o.Partition == key.Partition - }) + // Logic extracted from offsets.KeepFunc() and adjusted to set the metadata. + for topic, partitionOffsets := range offsets { + for partition, offset := range partitionOffsets { + if offset.Partition != offsetPartition { + delete(partitionOffsets, partition) + } + offset.Metadata = offsetMetadata + partitionOffsets[partition] = offset + } + if len(partitionOffsets) == 0 { + delete(offsets, topic) + } + } - offsetResponses, err := w.client.CommitOffsets(ctx, key.Group, offsets) + offsetResponses, err := w.client.CommitOffsets(ctx, group, offsets) if err != nil { return fmt.Errorf("failed to commit consumer offsets: %s", err) } @@ -223,6 +286,7 @@ func (w *RedpandaMigratorOffsetsWriter) Write(ctx context.Context, msg *service. backOff := w.backoffCtor() for { + // TODO: Use `dispatch.TriggerSignal()` to consume new messages while `updateConsumerOffsets()` is running. err := updateConsumerOffsets() if err == nil { break @@ -230,7 +294,7 @@ func (w *RedpandaMigratorOffsetsWriter) Write(ctx context.Context, msg *service. 
wait := backOff.NextBackOff() if wait == backoff.Stop { - return fmt.Errorf("failed to update consumer offsets for topic %q and partition %d: %s", key.Topic, key.Partition, err) + return fmt.Errorf("failed to update consumer offsets for topic %q and partition %d: %s", topic, offsetPartition, err) } time.Sleep(wait) @@ -240,7 +304,7 @@ func (w *RedpandaMigratorOffsetsWriter) Write(ctx context.Context, msg *service. } // Close underlying connections. -func (w *RedpandaMigratorOffsetsWriter) Close(ctx context.Context) error { +func (w *redpandaMigratorOffsetsWriter) Close(ctx context.Context) error { w.connMut.Lock() defer w.connMut.Unlock() diff --git a/internal/impl/kafka/enterprise/redpanda_migrator_output.go b/internal/impl/kafka/enterprise/redpanda_migrator_output.go index a852c5dfb8..5f8d1ec62c 100644 --- a/internal/impl/kafka/enterprise/redpanda_migrator_output.go +++ b/internal/impl/kafka/enterprise/redpanda_migrator_output.go @@ -48,11 +48,12 @@ func redpandaMigratorOutputConfig() *service.ConfigSpec { Description(` Writes a batch of messages to a Kafka broker and waits for acknowledgement before propagating it back to the input. -This output should be used in combination with a `+"`redpanda_migrator`"+` input which it can query for topic and ACL configurations. +This output should be used in combination with a `+"`redpanda_migrator`"+` input identified by the label specified in +`+"`input_resource`"+` which it can query for topic and ACL configurations. Once connected, the output will attempt to +create all topics which the input consumes from along with their ACLs. -If the configured broker does not contain the current message `+"topic"+`, it attempts to create it along with the topic -ACLs which are read automatically from the `+"`redpanda_migrator`"+` input identified by the label specified in -`+"`input_resource`"+`. +If the configured broker does not contain the current message topic, this output attempts to create it along with its +ACLs. ACL migration adheres to the following principles: @@ -60,7 +61,7 @@ ACL migration adheres to the following principles: - `+"`ALLOW ALL`"+` ACLs for topics are downgraded to `+"`ALLOW READ`"+` - Only topic ACLs are migrated, group ACLs are not migrated `). - Fields(RedpandaMigratorOutputConfigFields()...). + Fields(redpandaMigratorOutputConfigFields()...). LintRule(kafka.FranzWriterConfigLints()). Example("Transfer data", "Writes messages to the configured broker and creates topics and topic ACLs if they don't exist. It also ensures that the message order is preserved.", ` output: @@ -76,17 +77,14 @@ output: `) } -// RedpandaMigratorOutputConfigFields returns the full suite of config fields for a `redpanda_migrator` output using -// the franz-go client library. -func RedpandaMigratorOutputConfigFields() []*service.ConfigField { +func redpandaMigratorOutputConfigFields() []*service.ConfigField { return slices.Concat( kafka.FranzConnectionFields(), kafka.FranzWriterConfigFields(), []*service.ConfigField{ service.NewIntField(rmoFieldMaxInFlight). Description("The maximum number of batches to be sending in parallel at any given time."). - Default(10), - service.NewBatchPolicyField(rmoFieldBatching), + Default(256), service.NewStringField(rmoFieldInputResource). Description("The label of the redpanda_migrator input from which to read the configurations for topics and ACLs which need to be created."). Default(rmiResourceDefaultLabel). 
@@ -107,6 +105,7 @@ func RedpandaMigratorOutputConfigFields() []*service.ConfigField { // Deprecated service.NewStringField(rmoFieldRackID).Deprecated(), + service.NewBatchPolicyField(rmoFieldBatching).Deprecated(), }, kafka.FranzProducerFields(), ) @@ -127,217 +126,195 @@ func init() { if maxInFlight, err = conf.FieldInt(rmoFieldMaxInFlight); err != nil { return } - if batchPolicy, err = conf.FieldBatchPolicy(rmoFieldBatching); err != nil { + + var inputResource string + if inputResource, err = conf.FieldString(rmoFieldInputResource); err != nil { return } - output, err = NewRedpandaMigratorWriterFromConfig(conf, mgr) - return - }) - if err != nil { - panic(err) - } -} - -//------------------------------------------------------------------------------ - -// RedpandaMigratorWriter implements a Kafka writer using the franz-go library. -type RedpandaMigratorWriter struct { - recordConverter *kafka.FranzWriter - replicationFactorOverride bool - replicationFactor int - translateSchemaIDs bool - inputResource string - schemaRegistryOutputResource srResourceKey - - clientDetails *kafka.FranzConnectionDetails - clientOpts []kgo.Opt - connMut sync.Mutex - client *kgo.Client - topicCache sync.Map - // Stores the source to destination SchemaID mapping. - schemaIDCache sync.Map - schemaRegistryOutput *schemaRegistryOutput - - clientLabel string - - mgr *service.Resources -} - -// NewRedpandaMigratorWriterFromConfig attempts to instantiate a RedpandaMigratorWriter from a parsed config. -func NewRedpandaMigratorWriterFromConfig(conf *service.ParsedConfig, mgr *service.Resources) (*RedpandaMigratorWriter, error) { - w := RedpandaMigratorWriter{ - mgr: mgr, - } - - var err error - - // NOTE: We do not provide closures for client access and yielding because - // this writer is only used for its BatchToRecords method. If we ever expand - // in order to use this as a full writer then we need to provide a full - // suite of arguments here. - if w.recordConverter, err = kafka.NewFranzWriterFromConfig(conf, nil, nil); err != nil { - return nil, err - } - - if w.clientDetails, err = kafka.FranzConnectionDetailsFromConfig(conf, mgr.Logger()); err != nil { - return nil, err - } - w.clientOpts = w.clientDetails.FranzOpts() - - var tmpOpts []kgo.Opt - if tmpOpts, err = kafka.FranzProducerOptsFromConfig(conf); err != nil { - return nil, err - } - w.clientOpts = append(w.clientOpts, tmpOpts...) - - if w.inputResource, err = conf.FieldString(rmoFieldInputResource); err != nil { - return nil, err - } - - if w.replicationFactorOverride, err = conf.FieldBool(rmoFieldRepFactorOverride); err != nil { - return nil, err - } - - if w.replicationFactor, err = conf.FieldInt(rmoFieldRepFactor); err != nil { - return nil, err - } - - if w.translateSchemaIDs, err = conf.FieldBool(rmoFieldTranslateSchemaIDs); err != nil { - return nil, err - } - - if w.translateSchemaIDs { - var res string - if res, err = conf.FieldString(rmoFieldSchemaRegistryOutputResource); err != nil { - return nil, err - } - w.schemaRegistryOutputResource = srResourceKey(res) - } - - if w.clientLabel = mgr.Label(); w.clientLabel == "" { - w.clientLabel = rmoResourceDefaultLabel - } - - return &w, nil -} - -//------------------------------------------------------------------------------ - -// Connect to the target seed brokers. 
-func (w *RedpandaMigratorWriter) Connect(ctx context.Context) error { - w.connMut.Lock() - defer w.connMut.Unlock() - - if w.client != nil { - return nil - } - - var err error - if w.client, err = kgo.NewClient(w.clientOpts...); err != nil { - return err - } - - // Check connectivity to cluster - if err := w.client.Ping(ctx); err != nil { - return fmt.Errorf("failed to connect to cluster: %s", err) - } - - if err = kafka.FranzSharedClientSet(w.clientLabel, &kafka.FranzSharedClientInfo{ - Client: w.client, - ConnDetails: w.clientDetails, - }, w.mgr); err != nil { - w.mgr.Logger().With("error", err).Warn("Failed to store client connection for sharing") - } - - if w.translateSchemaIDs { - if res, ok := w.mgr.GetGeneric(w.schemaRegistryOutputResource); ok { - w.schemaRegistryOutput = res.(*schemaRegistryOutput) - } else { - w.mgr.Logger().Warnf("schema_registry output resource %q not found; skipping schema ID translation", w.schemaRegistryOutputResource) - } - } - return nil -} - -// WriteBatch attempts to write a batch of messages to the target topics. -func (w *RedpandaMigratorWriter) WriteBatch(ctx context.Context, b service.MessageBatch) error { - w.connMut.Lock() - defer w.connMut.Unlock() - - if w.client == nil { - return service.ErrNotConnected - } - - records, err := w.recordConverter.BatchToRecords(ctx, b) - if err != nil { - return err - } - - var ch franz_sr.ConfluentHeader - if w.translateSchemaIDs && w.schemaRegistryOutput != nil { - for recordIdx, record := range records { - schemaID, _, err := ch.DecodeID(record.Value) - if err != nil { - return fmt.Errorf("failed to extract schema ID from message index %d: %s", recordIdx, err) + var replicationFactorOverride bool + if replicationFactorOverride, err = conf.FieldBool(rmoFieldRepFactorOverride); err != nil { + return } - var destSchemaID int - if cachedID, ok := w.schemaIDCache.Load(schemaID); !ok { - destSchemaID, err = w.schemaRegistryOutput.GetDestinationSchemaID(ctx, schemaID) - if err != nil { - return fmt.Errorf("failed to fetch destination schema ID from message index %d: %s", recordIdx, err) - } - w.schemaIDCache.Store(schemaID, destSchemaID) - } else { - destSchemaID = cachedID.(int) + var replicationFactor int + if replicationFactor, err = conf.FieldInt(rmoFieldRepFactor); err != nil { + return } - err = sr.UpdateID(record.Value, destSchemaID) - if err != nil { - return fmt.Errorf("failed to update schema ID in message index %d: %s", recordIdx, err) + var translateSchemaIDs bool + if translateSchemaIDs, err = conf.FieldBool(rmoFieldTranslateSchemaIDs); err != nil { + return } - } - } - if err := kafka.FranzSharedClientUse(w.inputResource, w.mgr, func(details *kafka.FranzSharedClientInfo) error { - for _, record := range records { - if _, ok := w.topicCache.Load(record.Topic); !ok { - if err := createTopic(ctx, record.Topic, w.replicationFactorOverride, w.replicationFactor, details.Client, w.client); err != nil && err != errTopicAlreadyExists { - return fmt.Errorf("failed to create topic %q: %s", record.Topic, err) - } else { - if err == errTopicAlreadyExists { - w.mgr.Logger().Debugf("Topic %q already exists", record.Topic) - } else { - w.mgr.Logger().Infof("Created topic %q", record.Topic) - } - if err := createACLs(ctx, record.Topic, details.Client, w.client); err != nil { - w.mgr.Logger().Errorf("Failed to create ACLs for topic %q: %s", record.Topic, err) - } - - w.topicCache.Store(record.Topic, struct{}{}) + var schemaRegistryOutputResource srResourceKey + if translateSchemaIDs { + var res string + if res, err = 
conf.FieldString(rmoFieldSchemaRegistryOutputResource); err != nil { + return } + schemaRegistryOutputResource = srResourceKey(res) } - } - return nil - }); err != nil { - w.mgr.Logger().With("error", err, "resource", w.inputResource).Warn("Failed to access shared client for given resource identifier") - } - return w.client.ProduceSync(ctx, records...).FirstErr() -} + var tmpOpts, clientOpts []kgo.Opt -func (w *RedpandaMigratorWriter) disconnect() { - if w.client == nil { - return - } - _, _ = kafka.FranzSharedClientPop(w.clientLabel, w.mgr) - w.client.Close() - w.client = nil -} + var connDetails *kafka.FranzConnectionDetails + if connDetails, err = kafka.FranzConnectionDetailsFromConfig(conf, mgr.Logger()); err != nil { + return + } + clientOpts = append(clientOpts, connDetails.FranzOpts()...) -// Close underlying connections. -func (w *RedpandaMigratorWriter) Close(ctx context.Context) error { - w.disconnect() - return nil + if tmpOpts, err = kafka.FranzProducerOptsFromConfig(conf); err != nil { + return + } + clientOpts = append(clientOpts, tmpOpts...) + + clientOpts = append(clientOpts, kgo.AllowAutoTopicCreation()) // TODO: Configure this? + + var client *kgo.Client + var clientMut sync.Mutex + // Stores the source to destination SchemaID mapping. + var schemaIDCache sync.Map + var topicCache sync.Map + var runOnce sync.Once + output, err = kafka.NewFranzWriterFromConfig( + conf, + kafka.NewFranzWriterHooks( + func(ctx context.Context, fn kafka.FranzSharedClientUseFn) error { + clientMut.Lock() + defer clientMut.Unlock() + + if client == nil { + var err error + if client, err = kgo.NewClient(clientOpts...); err != nil { + return err + } + } + + return fn(&kafka.FranzSharedClientInfo{Client: client, ConnDetails: connDetails}) + }).WithYieldClientFn( + func(context.Context) error { + clientMut.Lock() + defer clientMut.Unlock() + + if client == nil { + return nil + } + + client.Close() + client = nil + return nil + }).WithWriteHookFn( + func(ctx context.Context, client *kgo.Client, records []*kgo.Record) error { + // Try to create all topics which the input `redpanda_migrator` resource is configured to read + // from when we receive the first message. + runOnce.Do(func() { + err := kafka.FranzSharedClientUse(inputResource, mgr, func(details *kafka.FranzSharedClientInfo) error { + inputClient := details.Client + outputClient := client + topics := inputClient.GetConsumeTopics() + + for _, topic := range topics { + if err := createTopic(ctx, topic, replicationFactorOverride, replicationFactor, inputClient, outputClient); err != nil { + if err == errTopicAlreadyExists { + topicCache.Store(topic, struct{}{}) + mgr.Logger().Debugf("Topic %q already exists", topic) + } else { + // This may be a topic which doesn't have any messages in it, so if we + // failed to create it now, we log an error and continue. If it does contain + // messages, we'll attempt to create it again anyway when receiving a + // message from it. 
+ mgr.Logger().Errorf("Failed to create topic %q and ACLs: %s", topic, err) + } + + continue + } + + mgr.Logger().Infof("Created topic %q", topic) + + if err := createACLs(ctx, topic, inputClient, outputClient); err != nil { + mgr.Logger().Errorf("Failed to create ACLs for topic %q: %s", topic, err) + } + + topicCache.Store(topic, struct{}{}) + } + + return nil + }) + if err != nil { + mgr.Logger().Errorf("Failed to fetch topics from input %q: %s", inputResource, err) + } + }) + + if translateSchemaIDs { + if res, ok := mgr.GetGeneric(schemaRegistryOutputResource); ok { + srOutput := res.(*schemaRegistryOutput) + + var ch franz_sr.ConfluentHeader + for recordIdx, record := range records { + schemaID, _, err := ch.DecodeID(record.Value) + if err != nil { + mgr.Logger().Warnf("Failed to extract schema ID from message index %d on topic %q: %s", recordIdx, record.Topic, err) + continue + } + + var destSchemaID int + if cachedID, ok := schemaIDCache.Load(schemaID); !ok { + destSchemaID, err = srOutput.GetDestinationSchemaID(ctx, schemaID) + if err != nil { + mgr.Logger().Warnf("Failed to fetch destination schema ID from message index %d on topic %q: %s", recordIdx, record.Topic, err) + continue + } + schemaIDCache.Store(schemaID, destSchemaID) + } else { + destSchemaID = cachedID.(int) + } + + err = sr.UpdateID(record.Value, destSchemaID) + if err != nil { + mgr.Logger().Warnf("Failed to update schema ID in message index %d on topic %s: %q", recordIdx, record.Topic, err) + continue + } + } + } else { + mgr.Logger().Warnf("schema_registry output resource %q not found; skipping schema ID translation", schemaRegistryOutputResource) + return nil + } + + } + + // The current record may be coming from a topic which was created later during runtime, so we + // need to try and create it if we haven't done so already. 
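+						// Topics already recorded in topicCache are skipped below, so createTopic and createACLs only
+						// run for topics this output hasn't successfully created (or observed) yet.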
+ if err := kafka.FranzSharedClientUse(inputResource, mgr, func(details *kafka.FranzSharedClientInfo) error { + for _, record := range records { + if _, ok := topicCache.Load(record.Topic); !ok { + if err := createTopic(ctx, record.Topic, replicationFactorOverride, replicationFactor, details.Client, client); err != nil { + if err == errTopicAlreadyExists { + mgr.Logger().Debugf("Topic %q already exists", record.Topic) + } else { + return fmt.Errorf("failed to create topic %q and ACLs: %s", record.Topic, err) + } + } + + mgr.Logger().Infof("Created topic %q", record.Topic) + + if err := createACLs(ctx, record.Topic, details.Client, client); err != nil { + mgr.Logger().Errorf("Failed to create ACLs for topic %q: %s", record.Topic, err) + } + + topicCache.Store(record.Topic, struct{}{}) + } + } + return nil + }); err != nil { + mgr.Logger().With("error", err, "resource", inputResource).Warn("Failed to access shared client for given resource identifier") + } + + return nil + })) + return + }) + if err != nil { + panic(err) + } } diff --git a/internal/impl/kafka/franz_reader.go b/internal/impl/kafka/franz_reader.go index 345900e321..78bc75faa7 100644 --- a/internal/impl/kafka/franz_reader.go +++ b/internal/impl/kafka/franz_reader.go @@ -37,7 +37,8 @@ func bytesFromStrField(name string, pConf *service.ParsedConfig) (uint64, error) return fieldAsBytes, nil } -func bytesFromStrFieldAsInt32(name string, pConf *service.ParsedConfig) (int32, error) { +// BytesFromStrFieldAsInt32 attempts to parse string field containing a human-readable byte size +func BytesFromStrFieldAsInt32(name string, pConf *service.ParsedConfig) (int32, error) { ui64, err := bytesFromStrField(name, pConf) if err != nil { return 0, err @@ -168,13 +169,13 @@ func FranzConsumerDetailsFromConfig(conf *service.ParsedConfig) (*FranzConsumerD return nil, err } - if d.FetchMaxBytes, err = bytesFromStrFieldAsInt32(kfrFieldFetchMaxBytes, conf); err != nil { + if d.FetchMaxBytes, err = BytesFromStrFieldAsInt32(kfrFieldFetchMaxBytes, conf); err != nil { return nil, err } - if d.FetchMinBytes, err = bytesFromStrFieldAsInt32(kfrFieldFetchMinBytes, conf); err != nil { + if d.FetchMinBytes, err = BytesFromStrFieldAsInt32(kfrFieldFetchMinBytes, conf); err != nil { return nil, err } - if d.FetchMaxPartitionBytes, err = bytesFromStrFieldAsInt32(kfrFieldFetchMaxPartitionBytes, conf); err != nil { + if d.FetchMaxPartitionBytes, err = BytesFromStrFieldAsInt32(kfrFieldFetchMaxPartitionBytes, conf); err != nil { return nil, err } diff --git a/internal/impl/kafka/franz_reader_ordered.go b/internal/impl/kafka/franz_reader_ordered.go index d32f57a08e..b5fae83b99 100644 --- a/internal/impl/kafka/franz_reader_ordered.go +++ b/internal/impl/kafka/franz_reader_ordered.go @@ -17,6 +17,8 @@ package kafka import ( "context" "errors" + "fmt" + "strconv" "sync" "sync/atomic" "time" @@ -24,17 +26,20 @@ import ( "github.com/Jeffail/checkpoint" "github.com/Jeffail/shutdown" "github.com/cenkalti/backoff/v4" + "github.com/twmb/franz-go/pkg/kadm" "github.com/twmb/franz-go/pkg/kgo" "github.com/redpanda-data/benthos/v4/public/service" + "github.com/redpanda-data/connect/v4/internal/asyncroutine" "github.com/redpanda-data/connect/v4/internal/dispatch" ) const ( - kroFieldConsumerGroup = "consumer_group" - kroFieldCommitPeriod = "commit_period" - kroFieldPartitionBuffer = "partition_buffer_bytes" + kroFieldConsumerGroup = "consumer_group" + kroFieldCommitPeriod = "commit_period" + kroFieldPartitionBuffer = "partition_buffer_bytes" + kroFieldTopicLagRefreshPeriod = 
"topic_lag_refresh_period" ) // FranzReaderOrderedConfigFields returns config fields for customising the @@ -52,6 +57,10 @@ func FranzReaderOrderedConfigFields() []*service.ConfigField { Description("A buffer size (in bytes) for each consumed partition, allowing records to be queued internally before flushing. Increasing this may improve throughput at the cost of higher memory utilisation. Note that each buffer can grow slightly beyond this value."). Default("1MB"). Advanced(), + service.NewDurationField(kroFieldTopicLagRefreshPeriod). + Description("The period of time between each topic lag refresh cycle."). + Default("5s"). + Advanced(), } } @@ -61,21 +70,24 @@ func FranzReaderOrderedConfigFields() []*service.ConfigField { type FranzReaderOrdered struct { clientOpts func() ([]kgo.Opt, error) - partState *partitionState + partState *partitionState + lagUpdater *asyncroutine.Periodic + topicLagGauge *service.MetricGauge + topicLagCache sync.Map + Client *kgo.Client - consumerGroup string - commitPeriod time.Duration - cacheLimit uint64 - - readBackOff backoff.BackOff + consumerGroup string + commitPeriod time.Duration + topicLagRefreshPeriod time.Duration + cacheLimit uint64 + readBackOff backoff.BackOff res *service.Resources log *service.Logger shutSig *shutdown.Signaller } -// NewFranzReaderOrderedFromConfig attempts to instantiate a new -// FranzReaderOrdered reader from a parsed config. +// NewFranzReaderOrderedFromConfig attempts to instantiate a new FranzReaderOrdered reader from a parsed config. func NewFranzReaderOrderedFromConfig(conf *service.ParsedConfig, res *service.Resources, optsFn func() ([]kgo.Opt, error)) (*FranzReaderOrdered, error) { readBackOff := backoff.NewExponentialBackOff() readBackOff.InitialInterval = time.Millisecond @@ -83,11 +95,12 @@ func NewFranzReaderOrderedFromConfig(conf *service.ParsedConfig, res *service.Re readBackOff.MaxElapsedTime = 0 f := FranzReaderOrdered{ - readBackOff: readBackOff, - res: res, - log: res.Logger(), - shutSig: shutdown.NewSignaller(), - clientOpts: optsFn, + readBackOff: readBackOff, + res: res, + log: res.Logger(), + shutSig: shutdown.NewSignaller(), + clientOpts: optsFn, + topicLagGauge: res.Metrics().NewGauge("redpanda_lag", "topic", "partition"), } f.consumerGroup, _ = conf.FieldString(kroFieldConsumerGroup) @@ -101,6 +114,10 @@ func NewFranzReaderOrderedFromConfig(conf *service.ParsedConfig, res *service.Re return nil, err } + if f.topicLagRefreshPeriod, err = conf.FieldDuration(kroFieldTopicLagRefreshPeriod); err != nil { + return nil, err + } + return &f, nil } @@ -115,7 +132,17 @@ func (f *FranzReaderOrdered) recordsToBatch(records []*kgo.Record) *batchWithRec var batch service.MessageBatch for _, r := range records { length += uint64(len(r.Value) + len(r.Key)) - batch = append(batch, FranzRecordToMessageV1(r)) + + lag := int64(0) + if val, ok := f.topicLagCache.Load(fmt.Sprintf("%s_%d", r.Topic, r.Partition)); ok { + lag = val.(int64) + } + + msg := FranzRecordToMessageV1(r) + msg.MetaSetMut("kafka_lag", lag) + + batch = append(batch, msg) + // The record lives on for checkpointing, but we don't need the contents // going forward so discard these. This looked fine to me but could // potentially be a source of problems so treat this as sus. 
@@ -336,14 +363,13 @@ func (f *FranzReaderOrdered) Connect(ctx context.Context) error { return err } - var cl *kgo.Client commitFn := func(r *kgo.Record) {} if f.consumerGroup != "" { commitFn = func(r *kgo.Record) { - if cl == nil { + if f.Client == nil { return } - cl.MarkCommitRecords(r) + f.Client.MarkCommitRecords(r) } } @@ -376,13 +402,47 @@ func (f *FranzReaderOrdered) Connect(ctx context.Context) error { ) } - if cl, err = kgo.NewClient(clientOpts...); err != nil { + if f.Client, err = kgo.NewClient(clientOpts...); err != nil { return err } + // Check connectivity to cluster + if err = f.Client.Ping(ctx); err != nil { + return fmt.Errorf("failed to connect to cluster: %s", err) + } + + if f.lagUpdater != nil { + f.lagUpdater.Stop() + } + adminClient := kadm.NewClient(f.Client) + f.lagUpdater = asyncroutine.NewPeriodicWithContext(f.topicLagRefreshPeriod, func(ctx context.Context) { + ctx, done := context.WithTimeout(ctx, f.topicLagRefreshPeriod) + defer done() + + lags, err := adminClient.Lag(ctx, f.consumerGroup) + if err != nil { + f.log.Debugf("Failed to fetch group lags: %s", err) + } + + lags.Each(func(gl kadm.DescribedGroupLag) { + for _, gl := range gl.Lag { + for _, pl := range gl { + lag := pl.Lag + if lag < 0 { + lag = 0 + } + + f.topicLagGauge.Set(lag, pl.Topic, strconv.Itoa(int(pl.Partition))) + f.topicLagCache.Store(fmt.Sprintf("%s_%d", pl.Topic, pl.Partition), lag) + } + } + }) + }) + f.lagUpdater.Start() + go func() { defer func() { - cl.Close() + f.Client.Close() if f.shutSig.IsSoftStopSignalled() { f.shutSig.TriggerHasStopped() } @@ -399,7 +459,7 @@ func (f *FranzReaderOrdered) Connect(ctx context.Context) error { // In this case we don't want to actually resume any of them yet so // I add a forced timeout to deal with it. stallCtx, pollDone := context.WithTimeout(closeCtx, time.Second) - fetches := cl.PollFetches(stallCtx) + fetches := f.Client.PollFetches(stallCtx) pollDone() if errs := fetches.Errors(); len(errs) > 0 { @@ -424,7 +484,7 @@ func (f *FranzReaderOrdered) Connect(ctx context.Context) error { } if nonTemporalErr { - cl.Close() + f.Client.Close() return } } @@ -434,17 +494,24 @@ func (f *FranzReaderOrdered) Connect(ctx context.Context) error { pauseTopicPartitions := map[string][]int32{} fetches.EachPartition(func(p kgo.FetchTopicPartition) { - if len(p.Records) > 0 { - if checkpoints.addRecords(p.Topic, p.Partition, f.recordsToBatch(p.Records), f.cacheLimit) { - pauseTopicPartitions[p.Topic] = append(pauseTopicPartitions[p.Topic], p.Partition) - } + if len(p.Records) == 0 { + return + } + + batch := f.recordsToBatch(p.Records) + if len(batch.b) == 0 { + return + } + + if checkpoints.addRecords(p.Topic, p.Partition, batch, f.cacheLimit) { + pauseTopicPartitions[p.Topic] = append(pauseTopicPartitions[p.Topic], p.Partition) } }) - _ = cl.PauseFetchPartitions(pauseTopicPartitions) + _ = f.Client.PauseFetchPartitions(pauseTopicPartitions) noActivePartitions: for { - pausedPartitionTopics := cl.PauseFetchPartitions(nil) + pausedPartitionTopics := f.Client.PauseFetchPartitions(nil) // Walk all the disabled topic partitions and check whether any // of them can be resumed. 
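The `PauseFetchPartitions`/`ResumeFetchPartitions` calls above give the reader per-partition back-pressure: a partition is paused once its buffered records exceed `partition_buffer_bytes`, and resumed only after the checkpoint buffer drains. A stripped-down sketch of the same pattern with franz-go, using hypothetical `overBudget` and `drained` predicates in place of the checkpoint accounting:

```go
package example

import (
	"context"

	"github.com/twmb/franz-go/pkg/kgo"
)

// pollWithBackpressure mirrors the flow control used by FranzReaderOrdered:
// partitions whose buffered records exceed their budget are paused, and paused
// partitions are resumed once their buffers have drained. overBudget and
// drained are hypothetical predicates standing in for the checkpoint
// accounting in the real reader.
func pollWithBackpressure(ctx context.Context, cl *kgo.Client, overBudget, drained func(topic string, partition int32) bool) {
	fetches := cl.PollFetches(ctx)

	pause := map[string][]int32{}
	fetches.EachPartition(func(p kgo.FetchTopicPartition) {
		if len(p.Records) > 0 && overBudget(p.Topic, p.Partition) {
			pause[p.Topic] = append(pause[p.Topic], p.Partition)
		}
	})
	_ = cl.PauseFetchPartitions(pause)

	// Passing nil returns the full set of currently paused partitions, which
	// is then scanned for partitions that are ready to be resumed.
	resume := map[string][]int32{}
	for topic, parts := range cl.PauseFetchPartitions(nil) {
		for _, part := range parts {
			if drained(topic, part) {
				resume[topic] = append(resume[topic], part)
			}
		}
	}
	if len(resume) > 0 {
		cl.ResumeFetchPartitions(resume)
	}
}
```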
@@ -457,7 +524,7 @@ func (f *FranzReaderOrdered) Connect(ctx context.Context) error { } } if len(resumeTopicPartitions) > 0 { - cl.ResumeFetchPartitions(resumeTopicPartitions) + f.Client.ResumeFetchPartitions(resumeTopicPartitions) } if len(f.consumerGroup) == 0 || len(resumeTopicPartitions) > 0 || checkpoints.tallyActivePartitions(pausedPartitionTopics) > 0 { diff --git a/internal/impl/kafka/franz_reader_unordered.go b/internal/impl/kafka/franz_reader_unordered.go index 0e68b99468..4b36020395 100644 --- a/internal/impl/kafka/franz_reader_unordered.go +++ b/internal/impl/kafka/franz_reader_unordered.go @@ -17,6 +17,7 @@ package kafka import ( "context" "errors" + "fmt" "sync" "sync/atomic" "time" @@ -491,6 +492,11 @@ func (f *FranzReaderUnordered) Connect(ctx context.Context) error { return err } + // Check connectivity to cluster + if err = cl.Ping(ctx); err != nil { + return fmt.Errorf("failed to connect to cluster: %s", err) + } + go func() { defer func() { cl.Close() diff --git a/internal/impl/kafka/franz_writer.go b/internal/impl/kafka/franz_writer.go index 076437fea5..ea7e44befc 100644 --- a/internal/impl/kafka/franz_writer.go +++ b/internal/impl/kafka/franz_writer.go @@ -253,6 +253,29 @@ func FranzWriterConfigLints() string { }` } +type franzWriterHooks struct { + accessClientFn func(context.Context, FranzSharedClientUseFn) error + yieldClientFn func(context.Context) error + writeHookFn func(ctx context.Context, client *kgo.Client, records []*kgo.Record) error +} + +// NewFranzWriterHooks creates a new franzWriterHooks instance with a hook function that's executed to fetch the client. +func NewFranzWriterHooks(fn func(context.Context, FranzSharedClientUseFn) error) franzWriterHooks { + return franzWriterHooks{accessClientFn: fn} +} + +// WithYieldClientFn adds a hook function that's executed during close to yield the client. +func (h franzWriterHooks) WithYieldClientFn(fn func(context.Context) error) franzWriterHooks { + h.yieldClientFn = fn + return h +} + +// WithWriteHookFn adds a hook function that's executed before a message batch is written. +func (h franzWriterHooks) WithWriteHookFn(fn func(ctx context.Context, client *kgo.Client, records []*kgo.Record) error) franzWriterHooks { + h.writeHookFn = fn + return h +} + // FranzWriter implements a Kafka writer using the franz-go library. type FranzWriter struct { Topic *service.InterpolatedString @@ -261,18 +284,14 @@ type FranzWriter struct { Timestamp *service.InterpolatedString IsTimestampMs bool MetaFilter *service.MetadataFilter - - accessClientFn func(FranzSharedClientUseFn) error - yieldClientFn func(context.Context) error + hooks franzWriterHooks } -// NewFranzWriterFromConfig uses a parsed config to extract customisation for -// writing data to a Kafka broker. A closure function must be provided that is -// responsible for granting access to a connected client. -func NewFranzWriterFromConfig(conf *service.ParsedConfig, accessClientFn func(FranzSharedClientUseFn) error, yieldClientFn func(context.Context) error) (*FranzWriter, error) { +// NewFranzWriterFromConfig uses a parsed config to extract customisation for writing data to a Kafka broker. A closure +// function must be provided that is responsible for granting access to a connected client. 
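+// Optional hooks can be chained onto the value returned by NewFranzWriterHooks: WithYieldClientFn is invoked on close
+// to release the client, and WithWriteHookFn runs against the converted records just before each batch is produced.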
+func NewFranzWriterFromConfig(conf *service.ParsedConfig, hooks franzWriterHooks) (*FranzWriter, error) { w := FranzWriter{ - accessClientFn: accessClientFn, - yieldClientFn: yieldClientFn, + hooks: hooks, } var err error @@ -394,7 +413,7 @@ func (w *FranzWriter) BatchToRecords(ctx context.Context, b service.MessageBatch // Connect to the target seed brokers. func (w *FranzWriter) Connect(ctx context.Context) error { - return w.accessClientFn(func(details *FranzSharedClientInfo) error { + return w.hooks.accessClientFn(ctx, func(details *FranzSharedClientInfo) error { // Check connectivity to cluster if err := details.Client.Ping(ctx); err != nil { return fmt.Errorf("failed to connect to cluster: %s", err) @@ -408,12 +427,18 @@ func (w *FranzWriter) WriteBatch(ctx context.Context, b service.MessageBatch) er if len(b) == 0 { return nil } - return w.accessClientFn(func(details *FranzSharedClientInfo) error { + return w.hooks.accessClientFn(ctx, func(details *FranzSharedClientInfo) error { records, err := w.BatchToRecords(ctx, b) if err != nil { return err } + if w.hooks.writeHookFn != nil { + if err := w.hooks.writeHookFn(ctx, details.Client, records); err != nil { + return fmt.Errorf("on write hook failed: %s", err) + } + } + var ( wg sync.WaitGroup results = make(kgo.ProduceResults, 0, len(records)) @@ -438,5 +463,9 @@ func (w *FranzWriter) WriteBatch(ctx context.Context, b service.MessageBatch) er // Close calls into the provided yield client func. func (w *FranzWriter) Close(ctx context.Context) error { - return w.yieldClientFn(ctx) + if w.hooks.yieldClientFn != nil { + return w.hooks.yieldClientFn(ctx) + } + + return nil } diff --git a/internal/impl/kafka/input_redpanda.go b/internal/impl/kafka/input_redpanda.go index 5c8bd94664..b8a6217c35 100644 --- a/internal/impl/kafka/input_redpanda.go +++ b/internal/impl/kafka/input_redpanda.go @@ -60,6 +60,10 @@ output: Records are processed and delivered from each partition in batches as received from brokers. These batch sizes are therefore dynamically sized in order to optimise throughput, but can be tuned with the config fields ` + "`fetch_max_partition_bytes` and `fetch_max_bytes`" + `. Batches can be further broken down using the ` + "xref:components:processors/split.adoc[`split`] processor" + `. +== Metrics + +Emits a ` + "`redpanda_lag`" + ` metric with ` + "`topic`" + ` and ` + "`partition`" + ` labels for each consumed topic. 
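+The lag values are refreshed every ` + "`topic_lag_refresh_period`" + ` and the latest value for each partition is also attached to messages as the ` + "`kafka_lag`" + ` metadata field.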
+ == Metadata This input adds the following metadata fields to each message: @@ -69,6 +73,7 @@ This input adds the following metadata fields to each message: - kafka_topic - kafka_partition - kafka_offset +- kafka_lag - kafka_timestamp_ms - kafka_timestamp_unix - kafka_tombstone_message diff --git a/internal/impl/kafka/output_kafka_franz.go b/internal/impl/kafka/output_kafka_franz.go index 0938312a5f..033e321b54 100644 --- a/internal/impl/kafka/output_kafka_franz.go +++ b/internal/impl/kafka/output_kafka_franz.go @@ -97,25 +97,29 @@ func init() { var client *kgo.Client - output, err = NewFranzWriterFromConfig(conf, func(fn FranzSharedClientUseFn) error { - if client == nil { - var err error - if client, err = kgo.NewClient(clientOpts...); err != nil { - return err - } - } - return fn(&FranzSharedClientInfo{ - Client: client, - ConnDetails: connDetails, - }) - }, func(context.Context) error { - if client == nil { - return nil - } - client.Close() - client = nil - return nil - }) + output, err = NewFranzWriterFromConfig( + conf, + NewFranzWriterHooks( + func(_ context.Context, fn FranzSharedClientUseFn) error { + if client == nil { + var err error + if client, err = kgo.NewClient(clientOpts...); err != nil { + return err + } + } + return fn(&FranzSharedClientInfo{ + Client: client, + ConnDetails: connDetails, + }) + }).WithYieldClientFn( + func(context.Context) error { + if client == nil { + return nil + } + client.Close() + client = nil + return nil + })) return }) if err != nil { diff --git a/internal/impl/kafka/output_redpanda.go b/internal/impl/kafka/output_redpanda.go index 76ec443d2d..7433073a48 100644 --- a/internal/impl/kafka/output_redpanda.go +++ b/internal/impl/kafka/output_redpanda.go @@ -83,31 +83,35 @@ func init() { var client *kgo.Client var clientMut sync.Mutex - output, err = NewFranzWriterFromConfig(conf, func(fn FranzSharedClientUseFn) error { - clientMut.Lock() - defer clientMut.Unlock() - - if client == nil { - var err error - if client, err = kgo.NewClient(clientOpts...); err != nil { - return err - } - } - return fn(&FranzSharedClientInfo{ - Client: client, - ConnDetails: connDetails, - }) - }, func(context.Context) error { - clientMut.Lock() - defer clientMut.Unlock() - - if client == nil { - return nil - } - client.Close() - client = nil - return nil - }) + output, err = NewFranzWriterFromConfig( + conf, + NewFranzWriterHooks( + func(_ context.Context, fn FranzSharedClientUseFn) error { + clientMut.Lock() + defer clientMut.Unlock() + + if client == nil { + var err error + if client, err = kgo.NewClient(clientOpts...); err != nil { + return err + } + } + return fn(&FranzSharedClientInfo{ + Client: client, + ConnDetails: connDetails, + }) + }).WithYieldClientFn( + func(context.Context) error { + clientMut.Lock() + defer clientMut.Unlock() + + if client == nil { + return nil + } + client.Close() + client = nil + return nil + })) return }) if err != nil { diff --git a/internal/impl/ockam/output_kafka.go b/internal/impl/ockam/output_kafka.go index f07e5d6c66..ae77510eb9 100644 --- a/internal/impl/ockam/output_kafka.go +++ b/internal/impl/ockam/output_kafka.go @@ -204,24 +204,26 @@ func newOckamKafkaOutput(conf *service.ParsedConfig, log *service.Logger) (*ocka ) var client *kgo.Client - kafkaWriter, err := kafka.NewFranzWriterFromConfig(conf.Namespace("kafka"), func(fn kafka.FranzSharedClientUseFn) error { - if client == nil { - var err error - if client, err = kgo.NewClient(clientOpts...); err != nil { - return err + kafkaWriter, err := kafka.NewFranzWriterFromConfig( + 
conf.Namespace("kafka"), + kafka.NewFranzWriterHooks(func(_ context.Context, fn kafka.FranzSharedClientUseFn) error { + if client == nil { + var err error + if client, err = kgo.NewClient(clientOpts...); err != nil { + return err + } } - } - return fn(&kafka.FranzSharedClientInfo{ - Client: client, - }) - }, func(context.Context) error { - if client == nil { + return fn(&kafka.FranzSharedClientInfo{ + Client: client, + }) + }).WithYieldClientFn(func(context.Context) error { + if client == nil { + return nil + } + client.Close() + client = nil return nil - } - client.Close() - client = nil - return nil - }) + })) if err != nil { return nil, err } diff --git a/internal/plugins/info.csv b/internal/plugins/info.csv index 9265151f37..ad394732fc 100644 --- a/internal/plugins/info.csv +++ b/internal/plugins/info.csv @@ -206,6 +206,7 @@ redpanda_migrator ,input ,redpanda_migrator ,4.37.0 ,enterp redpanda_migrator ,output ,redpanda_migrator ,4.37.0 ,enterprise ,n ,y ,y redpanda_migrator_bundle ,input ,redpanda_migrator_bundle ,4.37.0 ,enterprise ,n ,y ,y redpanda_migrator_bundle ,output ,redpanda_migrator_bundle ,4.37.0 ,enterprise ,n ,y ,y +redpanda_migrator_offsets ,input ,redpanda_migrator_offsets ,4.45.0 ,enterprise ,n ,y ,y redpanda_migrator_offsets ,output ,redpanda_migrator_offsets ,4.37.0 ,enterprise ,n ,y ,y reject ,output ,reject ,0.0.0 ,certified ,n ,y ,y reject_errored ,output ,reject_errored ,0.0.0 ,certified ,n ,y ,y
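
Note on the new hooks API (illustrative, not part of the diff): the refactor above replaces the two closure arguments of `NewFranzWriterFromConfig` with a `franzWriterHooks` builder, so callers can opt into a yield hook and a pre-write hook independently. The sketch below is a minimal example of how a caller might wire all three hooks together; it mirrors the wiring in `output_kafka_franz.go` and `output_redpanda.go` above, with `WithWriteHookFn` added. It assumes it lives in the same `internal/impl/kafka` package, and `newExampleWriter` and `clientOpts` are placeholder names introduced here only for illustration.

```go
package kafka

import (
	"context"
	"fmt"

	"github.com/twmb/franz-go/pkg/kgo"

	"github.com/redpanda-data/benthos/v4/public/service"
)

// newExampleWriter is a hypothetical helper (not part of this change) showing
// how the hooks builder composes when constructing a FranzWriter.
func newExampleWriter(conf *service.ParsedConfig, clientOpts []kgo.Opt) (*FranzWriter, error) {
	var client *kgo.Client

	hooks := NewFranzWriterHooks(func(_ context.Context, fn FranzSharedClientUseFn) error {
		// Lazily create the shared client the first time access is requested.
		if client == nil {
			var err error
			if client, err = kgo.NewClient(clientOpts...); err != nil {
				return err
			}
		}
		return fn(&FranzSharedClientInfo{Client: client})
	}).WithYieldClientFn(func(context.Context) error {
		// Release the client when the writer is closed.
		if client == nil {
			return nil
		}
		client.Close()
		client = nil
		return nil
	}).WithWriteHookFn(func(ctx context.Context, client *kgo.Client, records []*kgo.Record) error {
		// Runs inside WriteBatch before the records are produced; returning
		// an error aborts the batch.
		fmt.Printf("about to produce %d records\n", len(records))
		return nil
	})

	return NewFranzWriterFromConfig(conf, hooks)
}
```

As the `WriteBatch` change above shows, the write hook is invoked after `BatchToRecords` and before any records are produced, so an error returned from it fails the whole batch (wrapped as "on write hook failed"); the yield hook remains optional, since `Close` now falls back to a no-op when none is set.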