Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: enable iceberg compat tests, more tests for complex types #1550

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

comphead
Copy link
Contributor

Which issue does this PR close?

Part of #1454 .

Rationale for this change

Enable Iceberg compat type tests, add more tests for complex types

What changes are included in this PR?

How are these changes tested?

@@ -63,4 +61,67 @@ class CometNativeReaderSuite extends CometTestBase with AdaptiveSparkPlanHelper
|""".stripMargin,
"select arr from tbl")
}

/*
Copy link
Contributor Author

@comphead comphead Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followup ticket to fix schema issues for native reader - read STRUCT of ARRAY fields #1551

@codecov-commenter
Copy link

codecov-commenter commented Mar 19, 2025

Codecov Report

Attention: Patch coverage is 55.00000% with 9 lines in your changes missing coverage. Please review.

Project coverage is 58.53%. Comparing base (f09f8af) to head (0e9423a).
Report is 104 commits behind head on main.

Files with missing lines Patch % Lines
.../scala/org/apache/comet/serde/QueryPlanSerde.scala 28.57% 4 Missing and 1 partial ⚠️
...org/apache/comet/CometSparkSessionExtensions.scala 66.66% 0 Missing and 4 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1550      +/-   ##
============================================
+ Coverage     56.12%   58.53%   +2.40%     
- Complexity      976      977       +1     
============================================
  Files           119      122       +3     
  Lines         11743    12237     +494     
  Branches       2251     2279      +28     
============================================
+ Hits           6591     7163     +572     
+ Misses         4012     3944      -68     
+ Partials       1140     1130      -10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@comphead
Copy link
Contributor Author

Not sure why test failed, it is okay on local

@comphead comphead requested a review from parthchandra March 19, 2025 17:50
Copy link
Contributor

@parthchandra parthchandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great tests @comphead
Do you think we need to add some cases with one more level of nesting -

  array
    +- struct 
            +- array

and

 struct
   +- array
           +- struct

or would that be overkill?

@comphead
Copy link
Contributor Author

comphead commented Mar 20, 2025

Great tests @comphead Do you think we need to add some cases with one more level of nesting -

  array
    +- struct 
            +- array

and

 struct
   +- array
           +- struct

or would that be overkill?

Thanks @parthchandra I'll try to add them once I fx other tests currently failing

UPD: it can be done once #1551 is fixed

@parthchandra
Copy link
Contributor

 struct
   +- array
           +- struct

fyi, currently some tests in Spark SQL (e.g. "add a nested column at the end of the leaf struct column") fail because of structs like this with.

Caused by: org.apache.comet.CometNativeException: no entry found for key
[info]         at comet::errors::init::{{closure}}(__internal__:0)
[info]         at std::panicking::rust_panic_with_hook(__internal__:0)
[info]         at std::panicking::begin_panic_handler::{{closure}}(__internal__:0)
[info]         at std::sys::backtrace::__rust_end_short_backtrace(__internal__:0)
[info]         at rust_begin_unwind(__internal__:0)
[info]         at core::panicking::panic_fmt(__internal__:0)
[info]         at core::option::expect_failed(__internal__:0)
[info]         at comet::parquet::parquet_support::cast_struct_to_struct(__internal__:0)
[info]         at comet::parquet::parquet_support::cast_array(__internal__:0)
[info]         at comet::parquet::parquet_support::cast_struct_to_struct(__internal__:0)
[info]         at comet::parquet::parquet_support::cast_array(__internal__:0)
[info]         at comet::parquet::parquet_support::spark_parquet_convert(__internal__:0)
[info]         at <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::next(__internal__:0)
[info]         at <comet::parquet::schema_adapter::SchemaMapping as datafusion::datasource::schema_adapter::SchemaMapper>::map_batch(__internal__:0)
[info]         at <futures_util::stream::stream::map::Map<St,F> as futures_core::stream::Stream>::poll_next(__internal__:0)
[info]         at <datafusion_datasource::file_stream::FileStream as futures_core::stream::Stream>::poll_next(__internal__:0)
[info]         at Java_org_apache_comet_parquet_Native_readNextRecordBatch(__internal__:0)
[info]         at <unknown>(__internal__:0)
[info] 	at org.apache.comet.parquet.Native.readNextRecordBatch(Native Method)

@comphead
Copy link
Contributor Author

 struct
   +- array
           +- struct

fyi, currently some tests in Spark SQL (e.g. "add a nested column at the end of the leaf struct column") fail because of structs like this with.

Caused by: org.apache.comet.CometNativeException: no entry found for key
[info]         at comet::errors::init::{{closure}}(__internal__:0)
[info]         at std::panicking::rust_panic_with_hook(__internal__:0)
[info]         at std::panicking::begin_panic_handler::{{closure}}(__internal__:0)
[info]         at std::sys::backtrace::__rust_end_short_backtrace(__internal__:0)
[info]         at rust_begin_unwind(__internal__:0)
[info]         at core::panicking::panic_fmt(__internal__:0)
[info]         at core::option::expect_failed(__internal__:0)
[info]         at comet::parquet::parquet_support::cast_struct_to_struct(__internal__:0)
[info]         at comet::parquet::parquet_support::cast_array(__internal__:0)
[info]         at comet::parquet::parquet_support::cast_struct_to_struct(__internal__:0)
[info]         at comet::parquet::parquet_support::cast_array(__internal__:0)
[info]         at comet::parquet::parquet_support::spark_parquet_convert(__internal__:0)
[info]         at <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::next(__internal__:0)
[info]         at <comet::parquet::schema_adapter::SchemaMapping as datafusion::datasource::schema_adapter::SchemaMapper>::map_batch(__internal__:0)
[info]         at <futures_util::stream::stream::map::Map<St,F> as futures_core::stream::Stream>::poll_next(__internal__:0)
[info]         at <datafusion_datasource::file_stream::FileStream as futures_core::stream::Stream>::poll_next(__internal__:0)
[info]         at Java_org_apache_comet_parquet_Native_readNextRecordBatch(__internal__:0)
[info]         at <unknown>(__internal__:0)
[info] 	at org.apache.comet.parquet.Native.readNextRecordBatch(Native Method)

appreciate if you can mention those tests in #1551 so we can double check if they can pass

supportedDataType(
a.dataType,
allowComplex =
usingDataFusionParquetExec(conf) || op.isInstanceOf[CometSparkToColumnarExec])) =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbutrovich I think this check require attention.
It needs to work for CometSparkToColumnarExec as well

Copy link
Contributor

@kazuyukitanimura kazuyukitanimura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one commment

supportedDataType(
a.dataType,
allowComplex =
usingDataFusionParquetExec(conf) || CometConf.COMET_CONVERT_FROM_PARQUET_ENABLED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why COMET_CONVERT_FROM_* is allowed only here but not other places?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants