Search before asking
I searched in the issues and found nothing similar.
Motivation
This issue aims to enhance our Flink connector by introducing support for the $changelog auxiliary table, which makes it possible to consume and process change data capture (CDC) events directly within Flink streaming jobs.
Fluss primary key tables support change data capture to track row-level changes for updates and deletes. When streaming-reading a primary key table, the Flink connector emits records with Flink's native RowKind (INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE) to enable stateful computation on changelogs. On the other hand, there are many use cases where users want to consume the plain change log without converting it into Flink's native RowKind. This feature is similar to Paimon's $audit_log table and Databricks' table_changes(..) query.
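As an illustration of the intended user experience, a streaming read of the auxiliary table from Flink SQL might look like the sketch below; the catalog, database, and table names (fluss_catalog, fluss_db, orders) are hypothetical, and the exact query shape is an assumption modeled on the Paimon $audit_log analogy.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class ChangelogReadExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Hypothetical Fluss catalog and database; adjust to your setup.
        tEnv.executeSql("USE CATALOG fluss_catalog");
        tEnv.executeSql("USE fluss_db");

        // Streaming read of the auxiliary table: rows arrive as a plain (append-only) log
        // carrying _change_type / _log_offset / _commit_timestamp columns instead of being
        // converted into Flink's native RowKind.
        tEnv.executeSql("SELECT * FROM `orders$changelog`").print();
    }
}
```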
Solution
Implementation
FlinkCatalog should support getTable for the <table_name>$changelog table path, and the returned table should include the additional metadata columns (see the schema below).
FlinkRecordEmitter of FlinkSourceReader should use a dedicated FlussRowToFlinkRowConverter that converts the Fluss InternalRow into a Flink RowData with the additional metadata columns (a sketch of this conversion follows the list).
CoordinatorService#createTable should validate that the created table does not use the system-reserved column names (_change_type, _log_offset, _commit_timestamp).
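For the second point, a minimal sketch of the extra conversion step is given below, assuming the data columns have already been converted to a Flink RowData and the three metadata columns are appended behind them; the class and method names are illustrative and not the actual Fluss connector APIs.

```java
import org.apache.flink.table.data.GenericRowData;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.data.StringData;
import org.apache.flink.table.data.TimestampData;
import org.apache.flink.table.data.utils.JoinedRowData;
import org.apache.flink.types.RowKind;

/** Illustrative helper: appends the $changelog metadata columns to a converted row. */
public final class ChangelogRowEnricher {

    /** Maps a Flink RowKind to the _change_type string of the $changelog table. */
    static StringData changeType(RowKind kind) {
        switch (kind) {
            case INSERT:
                return StringData.fromString("+I");
            case UPDATE_BEFORE:
                return StringData.fromString("-U");
            case UPDATE_AFTER:
                return StringData.fromString("+U");
            case DELETE:
                return StringData.fromString("-D");
            default:
                throw new IllegalArgumentException("Unknown RowKind: " + kind);
        }
    }

    /** Joins the data columns with (_change_type, _log_offset, _commit_timestamp). */
    public static RowData enrich(RowData dataRow, long logOffset, long commitTimestampMs) {
        GenericRowData meta =
                GenericRowData.of(
                        changeType(dataRow.getRowKind()),
                        logOffset,
                        TimestampData.fromEpochMillis(commitTimestampMs));
        RowData result = new JoinedRowData(dataRow, meta);
        // The change kind is carried in the _change_type column rather than in the row's
        // own RowKind, so the emitted stream can be consumed as append-only.
        result.setRowKind(RowKind.INSERT);
        return result;
    }
}
```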
Schema of the $changelog table
| Column Name | Type | Values |
| --- | --- | --- |
| _change_type | STRING | +I, -U, +U, -D |
| _log_offset | BIGINT | the offset of the change in the log |
| _commit_timestamp | TIMESTAMP_LTZ | the timestamp at which the change happened |
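On the catalog side, the returned table's schema could be derived from the base table's schema with these metadata columns appended, as in the sketch below; the builder calls are standard Flink Table API, while the column ordering (metadata appended at the end) and the handling of the base table's primary key constraint are assumptions left open by this issue.

```java
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.Schema;

/** Illustrative construction of the $changelog schema from a base table schema. */
public final class ChangelogSchemas {

    /** Returns the base schema with the three reserved metadata columns appended. */
    public static Schema changelogSchema(Schema baseSchema) {
        // Note: fromSchema copies everything from the base schema, including any
        // primary key constraint; whether that constraint should be kept or dropped
        // for the append-only $changelog view is not specified here.
        return Schema.newBuilder()
                .fromSchema(baseSchema)
                .column("_change_type", DataTypes.STRING())
                .column("_log_offset", DataTypes.BIGINT())
                .column("_commit_timestamp", DataTypes.TIMESTAMP_LTZ(3))
                .build();
    }
}
```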
Reference: https://docs.databricks.com/en/delta/delta-change-data-feed.html#what-is-the-schema-for-the-change-data-feed
Anything else?
You can take the Paimon $audit_log implementation as an example: https://github.com/apache/paimon/blob/release-1.0/paimon-core/src/main/java/org/apache/paimon/table/system/SystemTableLoader.java#L69
Willingness to contribute