[lake/hudi] Introduce Hudi lake writer support for tiering tables #3507

Merged
luoyuxia merged 3 commits into apache:main from fhan688:Introduce-Hudi-LakeWriter-to-support-tiering-table on Jun 22, 2026

@fhan688 fhan688 commented Jun 22, 2026

Purpose

Linked issue: #3280

This PR introduces the writer-side implementation for tiering Fluss tables to Hudi.

It builds on the previous lake/hudi PRs that added Hudi catalog/source support and the tiering writer init metadata such as split index and tiering round timestamp. This PR only migrates the Hudi writer capability. The Hudi committer and full LakeStorage#createLakeTieringFactory enablement will be completed in follow-up PRs.

Brief change log

  • Add Hudi lake tiering writer scaffolding:

    • HudiLakeTieringFactory
    • HudiLakeWriter
    • HudiWriteResult
    • HudiWriteResultSerializer
    • HudiWriteTableInfo
  • Add writer-side record conversion and buffering:

    • Convert Fluss LogRecord / InternalRow into Hudi-compatible Flink RowData.
    • Map Fluss ChangeType to Flink RowKind.
    • Add Hudi bucket/file-id routing for insert/upsert writes.
    • Buffer and flush records through HoodieFlinkWriteClient.
  • Add Hudi instant coordination metadata:

    • Introduce DFS-backed checkpoint metadata for sharing the initialized Hudi instant across tiering writer subtasks.
    • Use WriterInitContext#splitIndex() to let only the first split initialize the Hudi instant.
    • Use WriterInitContext#tieringRoundTimestamp() to wait for the correct round instant.
  • Keep the feature intentionally writer-only for now:

    • Hudi committer APIs still throw UnsupportedOperationException.
    • HudiLakeStorage#createLakeTieringFactory() is not enabled yet to avoid exposing incomplete tiering-to-Hudi service behavior.
  • Add flink-table-runtime as a provided dependency for the Hudi writer buffer.
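The change-type mapping listed above can be pictured with a minimal sketch. It is not the code from this PR: the two enums are local stand-ins for Fluss `ChangeType` and Flink `RowKind` so the snippet compiles on its own, and mapping `APPEND_ONLY` to `INSERT` is an assumption about how log-table records would be treated.

```java
/** Local stand-in for Fluss's ChangeType (assumed constants), declared here so the sketch is self-contained. */
enum ChangeType { APPEND_ONLY, INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

/** Local stand-in for Flink's RowKind. */
enum RowKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

/** Hypothetical helper illustrating a ChangeType -> RowKind conversion. */
final class ChangeTypeMapping {
    private ChangeTypeMapping() {}

    static RowKind toRowKind(ChangeType changeType) {
        switch (changeType) {
            case APPEND_ONLY: // assumption: append-only log records are written as plain inserts
            case INSERT:
                return RowKind.INSERT;
            case UPDATE_BEFORE:
                return RowKind.UPDATE_BEFORE;
            case UPDATE_AFTER:
                return RowKind.UPDATE_AFTER;
            case DELETE:
                return RowKind.DELETE;
            default:
                throw new IllegalArgumentException("Unsupported change type: " + changeType);
        }
    }
}
```

Failing fast on an unknown change type, rather than defaulting to `INSERT`, keeps a future enum addition from being written silently with the wrong row kind.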

Tests

  • mvn -q -pl fluss-lake/fluss-lake-hudi -am -DskipITs -Dcheckstyle.skip=true -DfailIfNoTests=false -Dtest=FlussRecordAsHudiRowTest,HudiWriteResultSerializerTest test
  • git diff --check

mvn clean verify was not run locally.

API and Format

This PR does not introduce a new user-facing public API.

It adds internal Hudi tiering writer classes and a versioned HudiWriteResultSerializer. It also introduces internal checkpoint metadata files under Hudi auxiliary metadata paths for coordinating writer-side instant initialization. The full committable format and commit protocol will be finalized in the follow-up committer PR.
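As a rough illustration of what "versioned" means for a serializer like this, the sketch below rejects payloads whose version it does not know. It is not the actual `HudiWriteResultSerializer`: the class name, the `List<String>` payload, and the byte layout are all hypothetical, since the real result fields are not described here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical versioned serializer; the payload (a list of strings) is an assumption for illustration. */
final class VersionedResultSerializerSketch {
    static final int CURRENT_VERSION = 1;

    private VersionedResultSerializerSketch() {}

    /** Layout: entry count, then for each entry a length prefix followed by UTF-8 bytes. */
    static byte[] serialize(List<String> entries) {
        List<byte[]> encoded = new ArrayList<>(entries.size());
        int size = Integer.BYTES;
        for (String entry : entries) {
            byte[] bytes = entry.getBytes(StandardCharsets.UTF_8);
            encoded.add(bytes);
            size += Integer.BYTES + bytes.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(size);
        buffer.putInt(encoded.size());
        for (byte[] bytes : encoded) {
            buffer.putInt(bytes.length);
            buffer.put(bytes);
        }
        return buffer.array();
    }

    /** The caller supplies the version stored alongside the bytes; unknown versions are rejected. */
    static List<String> deserialize(int version, byte[] serialized) {
        if (version != CURRENT_VERSION) {
            throw new IllegalArgumentException("Unsupported serializer version: " + version);
        }
        ByteBuffer buffer = ByteBuffer.wrap(serialized);
        int count = buffer.getInt();
        List<String> entries = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] bytes = new byte[buffer.getInt()];
            buffer.get(bytes);
            entries.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return entries;
    }
}
```

Keeping the version check at the top of `deserialize` means the follow-up committer PR could change the layout by bumping the version and adding a branch, without misreading bytes written by an older writer.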

Documentation

No user-facing documentation is added in this PR because Hudi tiering is not fully enabled yet. Documentation should be added when the committer and end-to-end Hudi tiering service are enabled.

Copilot AI left a comment

Pull request overview

This PR introduces the writer-side implementation for tiering Fluss tables into Hudi within the fluss-lake-hudi module. It adds Hudi tiering writer scaffolding, Fluss→Flink/Hudi row conversions, buffered batch writes via HoodieFlinkWriteClient, and DFS-backed checkpoint metadata to coordinate Hudi instant initialization across writer subtasks (with committer support intentionally left for follow-up work).

Changes:

  • Add Hudi tiering writer components (HudiLakeTieringFactory, HudiLakeWriter, RecordWriter, buffering/conversion utilities).
  • Add DFS-backed checkpoint metadata utilities (CkpMetadata*) to coordinate instant initialization across splits/rounds.
  • Add initial serialization format for writer results (HudiWriteResult*) plus targeted unit tests, and add flink-table-runtime as a provided dependency.
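The instant coordination described in this overview can be sketched as follows. This is only an illustration under stated assumptions: `InstantBoard` is an in-memory stand-in for the DFS-backed checkpoint metadata, the instant string is made up rather than obtained from a Hudi write client, and the split index and round timestamp are passed as plain arguments instead of being read from `WriterInitContext`.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Hypothetical in-memory stand-in for the DFS-backed checkpoint metadata. */
final class InstantBoard {
    // tiering round timestamp -> instant published for that round
    private final Map<Long, String> instantsByRound = new ConcurrentHashMap<>();

    void publish(long roundTimestamp, String instant) {
        instantsByRound.put(roundTimestamp, instant);
    }

    Optional<String> lookup(long roundTimestamp) {
        return Optional.ofNullable(instantsByRound.get(roundTimestamp));
    }
}

/** Hypothetical coordination logic: split 0 initializes the instant, the other splits wait for it. */
final class InstantCoordinationSketch {
    private InstantCoordinationSketch() {}

    static String resolveInstant(
            InstantBoard board, int splitIndex, long roundTimestamp, long timeoutMillis) {
        if (splitIndex == 0) {
            // Assumption: a real writer would start the instant through the Hudi write client.
            String instant = "instant-" + roundTimestamp;
            board.publish(roundTimestamp, instant);
            return instant;
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            Optional<String> instant = board.lookup(roundTimestamp);
            if (instant.isPresent()) {
                return instant.get();
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException(
                        "Timed out waiting for the instant of round " + roundTimestamp);
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for the instant", e);
            }
        }
    }
}
```

Keying the lookup by round timestamp is what keeps a late subtask from picking up an instant left over from an earlier tiering round.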

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| fluss-lake/fluss-lake-hudi/src/test/java/org/apache/fluss/lake/hudi/tiering/writer/FlussRecordAsHudiRowTest.java | Adds unit tests for Fluss LogRecord→Hudi/Flink RowData wrapping and system columns/kind mapping. |
| fluss-lake/fluss-lake-hudi/src/test/java/org/apache/fluss/lake/hudi/tiering/HudiWriteResultSerializerTest.java | Adds unit tests for HudiWriteResultSerializer versioning and empty result round-trip. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/utils/meta/CkpMetadataProvider.java | Provides table-scoped access/caching for checkpoint metadata coordination objects. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/utils/meta/CkpMetadataFactory.java | Constructs DFS-backed checkpoint metadata using Hudi/Hadoop filesystem utilities. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/utils/meta/CkpMetadata.java | Implements DFS-backed “message bus” files to track instant lifecycle and retention. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/utils/meta/CkpMessage.java | Represents checkpoint metadata messages (instant + state) derived from filenames. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/utils/HudiConversions.java | Adds Fluss ChangeType→Flink RowKind conversion for Hudi row data. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/tiering/writer/RecordWriteBuffer.java | Buffers records into Hudi buckets and flushes batches via HoodieFlinkWriteClient. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/tiering/writer/HudiRecordWriter.java | Writes Fluss LogRecords by converting to HoodieFlinkInternalRow and routing/buffering. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/tiering/writer/HudiRecordConverter.java | Converts buffered RowData into HoodieRecords using key/partition generation. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/tiering/writer/FlussRowAsHudiRow.java | Wraps Fluss InternalRow as Flink RowData for Hudi ingestion. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/tiering/writer/FlussRecordAsHudiRow.java | Extends row wrapper to include Fluss system columns and change-type→row-kind mapping. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/tiering/writer/FlussMapAsHudiMap.java | Wraps Fluss InternalMap as Flink MapData. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/tiering/writer/FlussArrayAsHudiArray.java | Wraps Fluss InternalArray as Flink ArrayData. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/tiering/RecordWriter.java | Base writer implementing bucket/file-id routing and completing buffered writes. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/tiering/HudiWriteTableInfo.java | Resolves Hudi table metadata/config and creates the Hudi write client used by writers. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/tiering/HudiWriteResultSerializer.java | Introduces versioned serializer for Hudi writer results. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/tiering/HudiWriteResult.java | Defines writer result container to be consumed by a future committer. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/tiering/HudiLakeWriter.java | Implements LakeWriter for Hudi, including instant initialization + record writing lifecycle. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/tiering/HudiLakeTieringFactory.java | Implements LakeTieringFactory wiring writer + serializers; committer intentionally unsupported. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/tiering/HudiCatalogProvider.java | Serializable provider to create/open Hudi catalogs inside tiering subtasks. |
| fluss-lake/fluss-lake-hudi/pom.xml | Adds flink-table-runtime as a provided dependency for the writer buffer implementation. |
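To make the "instant + state derived from filenames" idea concrete, here is a small sketch. It is not the PR's `CkpMessage`: the `<instant>.<STATE>` filename layout and the three state names are assumptions chosen for illustration.

```java
/** Hypothetical message parsed from a checkpoint metadata filename of the assumed form "<instant>.<STATE>". */
final class CkpMessageSketch {
    /** Assumed lifecycle states; the real set may differ. */
    enum State { INFLIGHT, ABORTED, COMPLETED }

    final String instant;
    final State state;

    private CkpMessageSketch(String instant, State state) {
        this.instant = instant;
        this.state = state;
    }

    static CkpMessageSketch fromFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("Unexpected checkpoint metadata file name: " + fileName);
        }
        // State.valueOf throws IllegalArgumentException for an unknown state suffix.
        return new CkpMessageSketch(
                fileName.substring(0, dot), State.valueOf(fileName.substring(dot + 1)));
    }
}
```

Encoding the state in the filename lets a reader learn the lifecycle of each instant from a directory listing alone, without opening any file.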

@luoyuxia luoyuxia left a comment

@fhan688 Thanks for the PR. LGTM overall. I left a few minor comments, PTAL.

@luoyuxia luoyuxia left a comment

@fhan688 Thanks. LGTM

@luoyuxia luoyuxia merged commit dfbf1a5 into apache:main Jun 22, 2026
7 checks passed
3 participants