Support $changelog auxiliary table for flink connector #356

Open

wuchong opened this issue Feb 8, 2025 · 2 comments · May be fixed by #510
wuchong (Member) commented Feb 8, 2025

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

This issue aims to enhance our Flink connector by introducing support for the $changelog auxiliary table. This feature is essential for consuming and processing change data capture (CDC) events seamlessly within Flink streaming jobs.

Fluss primary key tables support change data capture to track row-level changes for updates and deletes. When streaming-reading a primary key table, the Flink connector emits records with Flink-native RowKind (INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE) to enable stateful computation on changelogs. On the other hand, there are many use cases where users want to consume the plain change logs without converting them into Flink-native RowKind. This feature is similar to Paimon's $audit_log table and Databricks' table_changes(..) query.

Solution

Implementation

  1. FlinkCatalog supports getTable for the <table_name>$changelog table path, and the returned table should include the additional metadata columns (see the schema below); a hedged sketch of this resolution follows the list.
  2. FlinkRecordEmitter of FlinkSourceReader should use a special FlussRowToFlinkRowConverter that converts the Fluss InternalRow into Flink RowData with the additional metadata columns.
  3. CoordinatorService#createTable should validate that the table being created does not use the system-reserved column names (_change_type, _log_offset, _commit_timestamp); see the validation sketch after the catalog sketch.
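
As a rough illustration of step 1, here is a minimal sketch of how the catalog could resolve the `$changelog` suffix and append the metadata columns. The `CHANGELOG_SUFFIX` constant and the `loadBaseTable` helper are assumptions made for this sketch, not existing Fluss identifiers:

```java
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.Schema;
import org.apache.flink.table.catalog.CatalogTable;
import org.apache.flink.table.catalog.ObjectPath;
import org.apache.flink.table.catalog.exceptions.TableNotExistException;

/** Sketch of resolving "<table_name>$changelog" inside FlinkCatalog#getTable. */
public class ChangelogTableResolutionSketch {

    // Assumed constant; the real suffix handling may live elsewhere.
    private static final String CHANGELOG_SUFFIX = "$changelog";

    public CatalogTable getTable(ObjectPath tablePath) throws TableNotExistException {
        String name = tablePath.getObjectName();
        if (!name.endsWith(CHANGELOG_SUFFIX)) {
            return loadBaseTable(tablePath);
        }
        // Resolve the underlying table, then expose it with the metadata columns.
        String baseName = name.substring(0, name.length() - CHANGELOG_SUFFIX.length());
        CatalogTable baseTable =
                loadBaseTable(new ObjectPath(tablePath.getDatabaseName(), baseName));
        Schema schema =
                Schema.newBuilder()
                        .fromSchema(baseTable.getUnresolvedSchema())
                        .column("_change_type", DataTypes.STRING().notNull())
                        .column("_log_offset", DataTypes.BIGINT().notNull())
                        .column("_commit_timestamp", DataTypes.TIMESTAMP_LTZ(3).notNull())
                        .build();
        return CatalogTable.of(
                schema,
                baseTable.getComment(),
                baseTable.getPartitionKeys(),
                baseTable.getOptions());
    }

    // Stand-in for the catalog's regular lookup path.
    private CatalogTable loadBaseTable(ObjectPath tablePath) throws TableNotExistException {
        throw new UnsupportedOperationException("stand-in for the real lookup");
    }
}
```

For step 3, a similarly hedged sketch of the server-side check; the method name and exception type are assumptions for illustration:

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of reserved-column validation for CoordinatorService#createTable. */
public class ReservedColumnValidationSketch {

    private static final List<String> RESERVED_COLUMNS =
            Arrays.asList("_change_type", "_log_offset", "_commit_timestamp");

    // Rejects tables that declare system-reserved column names.
    public static void validateColumnNames(List<String> columnNames) {
        for (String column : columnNames) {
            if (RESERVED_COLUMNS.contains(column)) {
                throw new IllegalArgumentException(
                        "Column name '" + column + "' is reserved for the $changelog table.");
            }
        }
    }
}
```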

Schema of the $changelog table

| Column Name | Type | Values |
|---|---|---|
| _change_type | STRING | +I, -U, +U, -D |
| _log_offset | BIGINT | the offset of the log record |
| _commit_timestamp | TIMESTAMP_LTZ | the timestamp at which the change was committed |

Reference: https://docs.databricks.com/en/delta/delta-change-data-feed.html#what-is-the-schema-for-the-change-data-feed
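
For reference, a hypothetical usage sketch of reading the auxiliary table from a Flink job. The catalog name `fluss_catalog` and table name `orders` are made up for illustration; note that identifiers containing `$` must be escaped with backticks in Flink SQL:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ChangelogReadExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // "fluss_catalog" and "orders" are hypothetical names.
        tEnv.executeSql("USE CATALOG fluss_catalog");
        // The $changelog table exposes the raw change log together with the
        // _change_type, _log_offset, and _commit_timestamp metadata columns,
        // instead of records carrying Flink-native RowKind.
        tEnv.executeSql("SELECT * FROM `orders$changelog`").print();
    }
}
```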

Anything else?

You can take Paimon's $audit_log implementation as an example: https://github.com/apache/paimon/blob/release-1.0/paimon-core/src/main/java/org/apache/paimon/table/system/SystemTableLoader.java#L69

Willingness to contribute

  • I'm willing to submit a PR!
MehulBatra (Contributor) commented:

I would like to take a stab at it! @wuchong

wuchong (Member, Author) commented Feb 22, 2025

Thank you, @MehulBatra. I've assigned the task to you. Please feel free to reach out if you need any additional guidance after your investigation.
