Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request]: [IcebergIO] Configure data writers to track metrics #34112

Open
2 of 17 tasks
ahmedabu98 opened this issue Feb 28, 2025 · 7 comments · May be fixed by #34140
Open
2 of 17 tasks

[Feature Request]: [IcebergIO] Configure data writers to track metrics #34112

ahmedabu98 opened this issue Feb 28, 2025 · 7 comments · May be fixed by #34140

Comments

@ahmedabu98
Copy link
Contributor

What would you like to happen?

We're configuring our Iceberg data writers without a metrics config! So we're losing out on writing important statistics for each data file (statistics that help with query planning).

This can be easily added over here with:

   .metricsConfig(metricsConfig)

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@Suvrat1629
Copy link

@ahmedabu98 Is this issue available to work on?

@ahmedabu98
Copy link
Contributor Author

Hey @Suvrat1629! Yeah give it a shot and tag me in the PR to review it. Let's make sure we add some unit testing that columns marked for statistic collection are respected.

Take a look at the following table options for enabling metrics on table columns:

  • write.metadata.metrics.default
  • write.metadata.metrics.column.

@Suvrat1629
Copy link

Suvrat1629 commented Mar 1, 2025

@ahmedabu98
Thank you i will give it a shot but since I am new to the codebase so before i start working I would like some guidance and like to verify my approach:

  • first we add the metricsConfig(metricsConfig) to the RecordWriter.java file
  • then update all the corresponding Write*.java files.

does this approach fit right? Do I need to look into anymore files?

@ahmedabu98
Copy link
Contributor Author

I think it'll end up being a lot simpler than that. RecordWriter is the essential piece that actually writes data. The other Write* files just take care of moving data around. The essential piece that actually writes data to files is RecordWriter. We just need to create a metrics config (something like MetricsConfig metricsConfig = MetricsConfig.forTable(table);) and pass it into the data writers

@Suvrat1629
Copy link

@ahmedabu98 So I need to add MetricsConfig metricsConfig = MetricsConfig.forTable(table); and then add test cases for the same?

@Suvrat1629
Copy link

@ahmedabu98 Since now that I have made the necessary change as you told me, do i add a new Test case file for the tests or make changes to and existing one?

@ahmedabu98
Copy link
Contributor Author

You can add a new test case to an existing class: RecordWriterManagerTest.java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants