Skip to content

Conversation

@SuKi2cn
Copy link
Contributor

@SuKi2cn SuKi2cn commented Nov 30, 2025

This PR implements the DataFil evaluation for aggregates addressing issue #360.

  • MAX/MIN now support evaluation from DataFile metrics.
  • Count-family aggregates now follow Java semantics:
    • CountNonNull: requires value + null counts
    • CountNull: requires null counts
    • CountStar: requires non-negative record_count
      Missing metrics mark the aggregator invalid and produce null results.
  • Added AggregateEvaluator::AllAggregatorsValid().
    Mirrors Java’s allAggregatorsValid() to indicate when aggregates can be reliably computed from DataFile metrics.
  • Added MAX/MIN overloads for UnboundTerm<BoundTransform>.
  • Updated and extended tests:

@SuKi2cn SuKi2cn marked this pull request as draft November 30, 2025 11:51
@SuKi2cn SuKi2cn changed the title feat: DataFile aggregate evaluation feat: add DataFile aggregate evaluation Nov 30, 2025
@SuKi2cn SuKi2cn marked this pull request as ready for review November 30, 2025 12:21
public:
explicit SingleValueStructLike(Literal literal) : literal_(std::move(literal)) {}

Result<Scalar> GetField(size_t /*pos*/) const override {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return error if pos is not 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return error if pos is not 0?

I moved LiteralToScalar/SingleValueStructLike into row/struct_like as the small adapter we use when evaluating aggregates from file metrics. I originally tried to return an error when pos != 0, but that breaks the metrics aggregation path: bound terms carry the original field position (often 1,2,…) so Count/Max/Min on file metrics all fail (ctest reproduces this). The Java equivalent ValueAggregate (ValueAggregate) also ignores the index for the same reason. It isn’t a reusable general StructLike; it’s a narrow, internal adapter, so I think we need to ignore the incoming index here for correctness. If anything in my understanding is off, please let me know.

Copy link
Member

@wgtmac wgtmac Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. Thanks for the explanation!

bool valid_ = true;
};

bool HasMapKey(const std::map<int32_t, int64_t>& map, int32_t key) {
Copy link
Contributor

@HuaHuaY HuaHuaY Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you wrap contains?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you wrap contains?

Got it. It's redundant.

@wgtmac
Copy link
Member

wgtmac commented Dec 6, 2025

This PR carries a lot of unrelated changes. Could you please fix that?

@SuKi2cn
Copy link
Contributor Author

SuKi2cn commented Dec 7, 2025

This PR carries a lot of unrelated changes. Could you please fix that?

Absolutely

@SuKi2cn
Copy link
Contributor Author

SuKi2cn commented Dec 7, 2025

This PR carries a lot of unrelated changes. Could you please fix that?

If convenient, please review pr #400. Appreciate for your time and help.

@SuKi2cn SuKi2cn closed this Dec 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants