BUG: Fix numeric_only ignored with list of functions in DataFrame.agg and GroupBy.agg (#49352) #62803
+552
−0
Fixes #49352
This PR implements the filtering approach suggested by @rhshadrach in issue #49352, making numeric_only=True work correctly when passing a list of aggregation functions to DataFrame.agg() and GroupBy.agg().
Problem
When calling .agg() with a list of functions and numeric_only=True, the parameter was being ignored, causing errors or incorrect behavior.
Root Cause
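For context, a list-based agg is evaluated one column at a time, so any keyword has to be handled before that split or it reaches each Series method unchanged. The sketch below uses a hypothetical all-numeric frame and only illustrates the per-column equivalence on stock pandas; it does not exercise numeric_only or the internal code path touched by this PR.

```python
import pandas as pd

# Hypothetical all-numeric frame, used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
funcs = ["sum", "mean"]

# List-based aggregation on the whole frame.
frame_result = df.agg(funcs)

# The same aggregation applied column by column, then stitched together.
per_column = pd.concat({col: df[col].agg(funcs) for col in df.columns}, axis=1)

print(frame_result)
print(per_column)
```

Both results carry the function names as the index and the original columns as columns, which is why a keyword such as numeric_only has no frame-level meaning once the split has happened.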
As identified by @rhshadrach in this comment:
The issue occurred because:
- numeric_only was not being intercepted before splitting the DataFrame into Series
- numeric_only)
Solution
This PR implements Option 1 from the discussion:
Implementation Details:
In NDFrameApply.agg_or_apply_list_like() (for DataFrame.agg):
- numeric_only=True is passed
- select_dtypes(include="number")
- numeric_only from kwargs before passing to Series methods
In GroupByApply.agg_list_like() (for GroupBy.agg):
Why This Approach?
- select_dtypes()
- numeric_only parameter
- numeric_only through to user functions (learned from API: add numeric_only support to groupby agg #58132)
After This PR
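As a rough illustration of the intended result, the sketch below emulates the interception approach by hand on stock pandas. A hypothetical helper, agg_numeric_only, pops numeric_only from the keyword arguments, filters with select_dtypes(include="number") when it is true, and only then runs the list-based agg. The helper name, the sample frame, and the GroupBy variant are assumptions for illustration; they are not the code added in this PR.

```python
import pandas as pd


def agg_numeric_only(df, funcs, **kwargs):
    """Hypothetical helper emulating the interception approach."""
    # Intercept numeric_only so it never reaches the per-column methods.
    numeric_only = kwargs.pop("numeric_only", False)
    if numeric_only:
        # Keep only numeric columns before the frame is split into Series.
        df = df.select_dtypes(include="number")
    return df.agg(funcs, **kwargs)


# Hypothetical frame with non-numeric columns.
df = pd.DataFrame(
    {
        "key": ["x", "x", "y"],
        "a": [1, 2, 3],
        "b": [4.0, 5.0, 6.0],
        "label": ["p", "q", "r"],
    }
)

result = agg_numeric_only(df.drop(columns="key"), ["sum", "mean"], numeric_only=True)
print(result)

# GroupBy variant: select the numeric columns explicitly, then aggregate.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
grouped = df.groupby("key")[numeric_cols].agg(["sum", "mean"])
print(grouped)
```

In this emulation the non-numeric label column is dropped before aggregation, so only a and b appear in the output, and the grouped result has one (column, function) pair per numeric column.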
Related Issues and PRs
- DataFrame.agg behavoir when passing as kwargs numeric_only=True #49352 (this fix)
- numeric_only in groupby ops #56946 (broader numeric_only support in groupby)
This PR avoids the problems encountered in #58132 by:
- numeric_only via **kwargs to user functions
Testing
New test files added:
- pandas/tests/apply/test_frame_apply_numeric_only.py (28 tests)
- pandas/tests/groupby/aggregate/test_aggregate_numeric_only.py (21 tests)
Test coverage includes:
- numeric_only=True/False)
All tests pass locally:
Backward Compatibility
✅ Fully backward compatible
- numeric_only parameter
cc @rhshadrach - This implements the interception approach you suggested in #49352. Would appreciate your review!
Checklist
- DataFrame.agg behavoir when passing as kwargs numeric_only=True #49352
- doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature