feat: add DataFile aggregate evaluation #373
src/iceberg/expression/aggregate.cc
Outdated
public:
  explicit SingleValueStructLike(Literal literal) : literal_(std::move(literal)) {}

  Result<Scalar> GetField(size_t /*pos*/) const override {
Should we return error if pos is not 0?
> Should we return error if pos is not 0?
I moved LiteralToScalar/SingleValueStructLike into row/struct_like as the small adapter we use when evaluating aggregates from file metrics. I originally tried to return an error when pos != 0, but that breaks the metrics aggregation path: bound terms carry the original field position (often 1, 2, …), so Count/Max/Min on file metrics all fail (ctest reproduces this). The Java equivalent, ValueAggregate, also ignores the index for the same reason. This is not a reusable, general-purpose StructLike; it is a narrow, internal adapter, so I think we need to ignore the incoming index here for correctness. If anything in my understanding is off, please let me know.
SGTM. Thanks for the explanation!
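The reasoning in this thread can be illustrated with a minimal, self-contained sketch. It is not the PR's code: `Value`, `RowLike`, `SingleValueRow`, and `BoundRef` are hypothetical stand-ins for the project's Scalar, StructLike, SingleValueStructLike, and bound term types. It only shows why an adapter that holds a single metric value has to ignore the requested position when the bound term still carries the field's position from the table schema.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the project's Scalar/Literal types.
using Value = std::int64_t;

// Hypothetical minimal row interface, standing in for StructLike.
class RowLike {
 public:
  virtual ~RowLike() = default;
  virtual Value GetField(std::size_t pos) const = 0;
};

// Wraps one value taken from file metrics (for example a lower or upper bound).
// The position is ignored on purpose: callers pass the field's position in the
// original schema, not an index into this one-element wrapper.
class SingleValueRow : public RowLike {
 public:
  explicit SingleValueRow(Value value) : value_(value) {}
  Value GetField(std::size_t /*pos*/) const override { return value_; }

 private:
  Value value_;
};

// Hypothetical bound reference that remembers where its field sits in the
// table schema and reads that position from whatever row it is given.
struct BoundRef {
  std::size_t pos;
  Value Evaluate(const RowLike& row) const { return row.GetField(pos); }
};
```

With this shape, a reference bound to position 2 still reads the wrapped value. If `GetField` rejected every `pos != 0`, the same call would fail for any field that is not first in the schema, which is the failure described in the reply above.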
src/iceberg/expression/aggregate.cc
Outdated
  bool valid_ = true;
};

bool HasMapKey(const std::map<int32_t, int64_t>& map, int32_t key) {
Why do you wrap contains?
> Why do you wrap contains?
Got it. It's redundant.
This PR carries a lot of unrelated changes. Could you please fix that?
Absolutely.
If convenient, please review PR #400. I appreciate your time and help.
This PR implements the DataFile evaluation for aggregates, addressing issue #360.
- MAX/MIN now support evaluation from DataFile metrics.
- CountNonNull: requires value + null counts
- CountNull: requires null counts
- CountStar: requires non-negative record_count
- Missing metrics mark the aggregator invalid and produce null results.
- AggregateEvaluator::AllAggregatorsValid(). Mirrors Java’s allAggregatorsValid() to indicate when aggregates can be reliably computed from DataFile metrics.
- MAX/MIN overloads for UnboundTerm<BoundTransform>.
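
The count rules listed above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: `FileMetrics`, `Lookup`, the free functions, and `CountAccumulator` are hypothetical names, and the sketch assumes that the value count for a column includes its nulls, so the non-null count is the value count minus the null count.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical, reduced stand-in for the metrics carried by a DataFile.
struct FileMetrics {
  std::int64_t record_count = -1;  // negative means "unknown"
  std::map<std::int32_t, std::int64_t> value_counts;       // field id -> count
  std::map<std::int32_t, std::int64_t> null_value_counts;  // field id -> count
};

// Returns the metric for a field id, or nullopt when the file did not record it.
std::optional<std::int64_t> Lookup(
    const std::map<std::int32_t, std::int64_t>& metrics, std::int32_t field_id) {
  auto it = metrics.find(field_id);
  if (it == metrics.end()) return std::nullopt;
  return it->second;
}

// CountStar: requires a non-negative record count.
std::optional<std::int64_t> CountStar(const FileMetrics& file) {
  if (file.record_count < 0) return std::nullopt;
  return file.record_count;
}

// CountNull: requires the null count for the field.
std::optional<std::int64_t> CountNull(const FileMetrics& file, std::int32_t field_id) {
  return Lookup(file.null_value_counts, field_id);
}

// CountNonNull: requires both the value count and the null count.
// Assumption: value counts include nulls, so the result is their difference.
std::optional<std::int64_t> CountNonNull(const FileMetrics& file, std::int32_t field_id) {
  auto values = Lookup(file.value_counts, field_id);
  auto nulls = Lookup(file.null_value_counts, field_id);
  if (!values || !nulls) return std::nullopt;
  return *values - *nulls;
}

// Accumulates per-file counts. One missing metric marks the whole aggregate
// invalid, and an invalid aggregate reports a null (empty) result.
class CountAccumulator {
 public:
  void Update(std::optional<std::int64_t> file_count) {
    if (!file_count) {
      valid_ = false;
      return;
    }
    total_ += *file_count;
  }
  bool valid() const { return valid_; }
  std::optional<std::int64_t> Result() const {
    return valid_ ? std::optional<std::int64_t>(total_) : std::nullopt;
  }

 private:
  std::int64_t total_ = 0;
  bool valid_ = true;
};
```

A caller would feed one value per data file into an accumulator and check `valid()` before trusting the total, which is the role the description gives to AllAggregatorsValid() across all aggregators.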