Releases · lensesio/stream-reactor

DataLakes (S3, GCP) source fixes

Polling Backoff

Previously, when no data was available in the buckets, the connector incurred high costs because it polled the data lake continuously in a tight loop, as driven by Kafka Connect.

From this version, a backoff queue is used by default, providing a standard way to back off calls to the underlying cloud platform.
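The release notes do not describe how the backoff queue is implemented. As a rough illustration of the general idea only, the sketch below grows the wait between polls while the bucket keeps returning nothing and resets it as soon as data is found. The class name `PollBackoff` and its parameters (`initial_delay`, `max_delay`, `multiplier`) are hypothetical and are not connector settings.

```python
from dataclasses import dataclass, field


@dataclass
class PollBackoff:
    """Hypothetical exponential backoff for empty polls (not the connector's code)."""

    initial_delay: float = 1.0   # seconds to wait after the first empty poll
    max_delay: float = 60.0      # upper bound on the wait
    multiplier: float = 2.0      # growth factor between consecutive empty polls
    _current: float = field(default=0.0, init=False)

    def next_delay(self, found_data: bool) -> float:
        """Return how long to wait before the next poll."""
        if found_data:
            # Data arrived: poll again immediately and reset the backoff.
            self._current = 0.0
        elif self._current == 0.0:
            # First empty poll after a reset.
            self._current = self.initial_delay
        else:
            # Further empty polls: grow the delay, capped at max_delay.
            self._current = min(self._current * self.multiplier, self.max_delay)
        return self._current
```

With these assumed defaults, repeated empty polls wait 1, 2, 4, 8 seconds and so on up to the 60 second cap, so an idle bucket generates far fewer list calls than a tight loop.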

Avoid filtering by lastSeenFile where a post process action is configured

When objects are ordered by LastModified and a post-process action is configured, the connector no longer filters the listing by the last seen file.

This change avoids bugs caused by inconsistent LastModified dates used for sorting.
If LastModified sorting is used, ensure objects do not arrive late, or use a post-processing step to handle them.
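To illustrate why the filter matters, here is a minimal sketch under the assumption that a post-process action removes processed objects from the source location, so anything still listed is unprocessed. The names `StoredObject` and `objects_to_process` are hypothetical and do not come from the connector.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StoredObject:
    key: str
    last_modified: int  # epoch seconds


def objects_to_process(
    listing: List[StoredObject],
    last_seen: Optional[StoredObject],
    has_post_process_action: bool,
) -> List[StoredObject]:
    """Order a bucket listing by LastModified and decide what to read next."""
    ordered = sorted(listing, key=lambda o: (o.last_modified, o.key))
    if has_post_process_action or last_seen is None:
        # Assumption: processed objects were moved or deleted already,
        # so every listed object is still pending. No filter is applied.
        return ordered
    # Without a post-process action, skip everything up to the last seen object.
    marker = (last_seen.last_modified, last_seen.key)
    return [o for o in ordered if (o.last_modified, o.key) > marker]
```

In this sketch, an object that arrives late with an older LastModified than the last seen file is skipped when the filter is active, but is still picked up when a post-process action is configured.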

Add a flag to populate kafka headers with the watermark partition/offset

This adds a connector property for GCP Storage and S3 Sources:
connect.s3.source.write.watermark.header
connect.gcpstorage.source.write.watermark.header

If set to true, the headers of each produced source record include details of the source file and the line number within that file.

If set to false (the default), the headers are not set.

Currently, this does not apply when using envelope mode.
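As a small sketch of enabling the flag, the helper below adds the appropriate property to an existing connector configuration held as a dictionary. The two property names are taken from the notes above; the helper `with_watermark_headers` and the `source` labels `"s3"` and `"gcpstorage"` are hypothetical.

```python
# Property names as listed in the release notes, keyed by a hypothetical label.
WATERMARK_HEADER_PROPERTY = {
    "s3": "connect.s3.source.write.watermark.header",
    "gcpstorage": "connect.gcpstorage.source.write.watermark.header",
}


def with_watermark_headers(config: dict, source: str, enabled: bool = True) -> dict:
    """Return a copy of the connector config with the watermark header flag set."""
    updated = dict(config)  # leave the caller's dictionary untouched
    updated[WATERMARK_HEADER_PROPERTY[source]] = "true" if enabled else "false"
    return updated
```

Omitting the property entirely has the same effect as setting it to false, since false is the default.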

Enhance DataLake Source Connectors: Robust State Management and Move Location Path Handling

This release addresses two critical issues:

Corrupted connector state when DELETE/MOVE is used: For every message sent to Kafka, the connector stores the last processed object and the position within it in its state, so that it can resume from the correct point after a restart. However, when the connector is configured with a post-process action that moves or deletes processed objects in the data lake, the state still references the last processed object. If the connector restarts after the referenced object has been moved or deleted, the state points to a non-existent object and the connector fails. Until this release, the workaround was to clean the state manually and restart the connector, which is inefficient and error-prone.
Incorrect handling of move location prefixes: When configuring the move location within the data lake, a prefix that ends with a forward slash (/) results in malformed keys such as a//b. Such paths can break compatibility with query engines like Athena, which may not handle double slashes properly.
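The double-slash problem can be illustrated with a short sketch of one way to join a move prefix and an object key safely. This is not the connector's actual fix, which the notes do not show; the function name `move_target_key` is hypothetical.

```python
def move_target_key(prefix: str, key: str) -> str:
    """Join a move-location prefix and an object key with exactly one slash."""
    clean_prefix = prefix.rstrip("/")  # "a/" -> "a"
    clean_key = key.lstrip("/")        # "/b" -> "b"
    if not clean_prefix:
        # No prefix configured: the key is used as-is.
        return clean_key
    return f"{clean_prefix}/{clean_key}"
```

Naive concatenation of the prefix `a/`, a separator and the key `b` gives `a//b`, whereas the normalised join above gives `a/b` whether or not the configured prefix has a trailing slash.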

Releases: lensesio/stream-reactor

Stream Reactor 8.1.29

Stream Reactor 8.1.28

Stream Reactor 8.1.27

Stream Reactor 8.1.26

Stream Reactor 8.1.25

Stream Reactor 8.1.24

Stream Reactor 8.1.23

DataLakes (S3, GCP) source fixes

Polling Backoff

Avoid filtering by lastSeenFile where a post process action is configured

Add a flag to populate kafka headers with the watermark partition/offset

Stream Reactor 8.1.22

Enhance DataLake Source Connectors: Robust State Management and Move Location Path Handling

Stream Reactor 8.1.21

Azure Service Bus source

Stream Reactor 8.1.20