@ali-aqib

Fixes #49352

This PR implements the filtering approach suggested by @rhshadrach in issue #49352, making numeric_only=True work correctly when passing a list of aggregation functions to DataFrame.agg() and GroupBy.agg().


Problem

When calling .agg() with a list of functions and numeric_only=True, the parameter was not used to filter columns. It was forwarded to each per-column Series aggregation instead, which raised a TypeError for non-numeric columns:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10.5, 20.5, 30.5, 40.5, 50.5],
    'C': ['a', 'b', 'c', 'd', 'e']
})

# Before: Would fail with TypeError
print(df.agg(['min', 'max', 'mean'], numeric_only=True))
# TypeError: Series.min does not allow numeric_only=True with non-numeric dtypes.
# The error occurs because numeric_only=True is forwarded to Series.min for the string column 'C'

# GroupBy had the same issue
df2 = pd.DataFrame({
    'key': ['A', 'B', 'A', 'B'],
    'num': [1, 2, 3, 4],
    'text': ['a', 'b', 'c', 'd']
})

print(df2.groupby('key').agg(['sum', 'mean'], numeric_only=True))
# TypeError: Cannot use numeric_only=True with SeriesGroupBy.sum and non-numeric dtypes.
# numeric_only=True is forwarded to the per-column operation on the string column 'text' instead of filtering that column out

Root Cause

As identified by @rhshadrach in this comment:

"We should be applying numeric_only=True in df.agg prior to splitting the DataFrame into Series. We then also need to not pass numeric_only=True to the functions when they are called on the Series."

The issue occurred because:

  1. numeric_only was not intercepted before the DataFrame was split into Series
  2. It was then forwarded to the Series methods, which raise TypeError when given numeric_only=True on a non-numeric dtype
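
The second point can be reproduced on a single Series, without going through .agg() at all. This is a minimal sketch, assuming a pandas 2.x release; the string Series is given an explicit object dtype (an assumption made here so the example does not depend on how strings are inferred), and the exact error message may differ between versions.

```python
import pandas as pd

numeric = pd.Series([1, 2, 3])
# Explicit object dtype (assumption for this sketch): keeps the Series
# backed by a NumPy object array regardless of string-inference settings.
text = pd.Series(["a", "b", "c"], dtype=object)

# On a numeric dtype, numeric_only=True is accepted.
numeric_min = numeric.min(numeric_only=True)

# On a non-numeric dtype, the same call raises TypeError.
caught = None
try:
    text.min(numeric_only=True)
except TypeError as exc:
    caught = exc

print(numeric_min)
print(type(caught).__name__, caught)
```

This is the call that .agg() ends up making for column 'C' in the example above once the DataFrame has been split column by column.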

Solution

This PR implements Option 1 from the discussion:

"Intercept numeric_only arguments and apply them to the DataFrame prior to splitting the data up into Series"

Implementation Details:

  1. In NDFrameApply.agg_or_apply_list_like() (for DataFrame.agg):

    • Check if numeric_only=True is passed
    • Filter DataFrame to numeric columns using select_dtypes(include="number")
    • Remove numeric_only from kwargs before passing to Series methods
    • Process only numeric columns
  2. In GroupByApply.agg_list_like() (for GroupBy.agg):

    • Apply the same filtering logic
    • Ensures consistency between DataFrame and GroupBy behavior
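
The steps listed above can be sketched in user space as follows. This is not the code added to NDFrameApply or GroupByApply; agg_list_numeric_only is a hypothetical helper, written only to illustrate the order of operations (intercept, filter, then aggregate) under the assumption that select_dtypes(include="number") is the filter.

```python
import pandas as pd


def agg_list_numeric_only(frame, funcs, **kwargs):
    """Hypothetical helper mirroring the idea of the PR, not its actual code."""
    # Intercept numeric_only so it never reaches the per-column Series calls.
    numeric_only = kwargs.pop("numeric_only", False)
    if numeric_only:
        # Filter once, on the whole DataFrame, before it is split into Series.
        frame = frame.select_dtypes(include="number")
    # Remaining kwargs (if any) are forwarded unchanged.
    return frame.agg(funcs, **kwargs)


df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [10.5, 20.5, 30.5, 40.5, 50.5],
    "C": ["a", "b", "c", "d", "e"],
})

result = agg_list_numeric_only(df, ["min", "max", "mean"], numeric_only=True)
print(result)
```

Because the filter runs before the split, the string column 'C' never reaches Series.min, Series.max, or Series.mean.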

Why This Approach?

  • Simple: Minimal code changes, leverages existing select_dtypes()
  • Consistent: Follows pandas' existing filtering philosophy
  • No Breaking Changes: Uses existing numeric_only parameter
  • Avoids kwargs issues: Doesn't pass numeric_only through to user functions (a lesson from #58132, "API: add numeric_only support to groupby agg")

After This PR

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10.5, 20.5, 30.5, 40.5, 50.5],
    'C': ['a', 'b', 'c', 'd', 'e']
})

# Works correctly - excludes column 'C'
result = df.agg(['min', 'max', 'mean'], numeric_only=True)
print(result)
#        A     B
# min   1.0  10.5
# max   5.0  50.5
# mean  3.0  30.5

# GroupBy also works
df2 = pd.DataFrame({
    'key': ['A', 'B', 'A', 'B'],
    'num': [1, 2, 3, 4],
    'text': ['a', 'b', 'c', 'd']
})

result = df2.groupby('key').agg(['sum', 'mean'], numeric_only=True)
# Only 'num' column is aggregated, 'text' is excluded ✅
print(result)
#    num      
#    sum mean
# key        
# A    4  2.0
# B    6  3.0

Related Issues and PRs

This PR avoids the problems encountered in #58132 by:

  • Not passing numeric_only via **kwargs to user functions
  • Using simple filtering instead of complex caching mechanisms
  • Targeting the specific list-of-functions code path
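
The first point matters most for user-defined functions, which usually do not accept a numeric_only keyword. As a rough illustration using only existing pandas API, the equivalent filtering can be done by hand on a GroupBy; value_range is a hypothetical user function, and the column selection below is a workaround sketch rather than the PR's implementation.

```python
import pandas as pd


def value_range(series):
    # Hypothetical user function. It takes no keyword arguments, so a
    # forwarded numeric_only=True would fail with an unexpected-keyword error.
    return series.max() - series.min()


df2 = pd.DataFrame({
    "key": ["A", "B", "A", "B"],
    "num": [1, 2, 3, 4],
    "text": ["a", "b", "c", "d"],
})

# Decide on the numeric columns up front. 'key' is a string column here,
# so it is not selected and stays available as the grouping key.
numeric_cols = df2.select_dtypes(include="number").columns.tolist()

# No numeric_only kwarg is passed, so nothing leaks into value_range.
result = df2.groupby("key")[numeric_cols].agg(["sum", value_range])
print(result)
```

Filtering first keeps the keyword out of the call chain entirely, so built-in names and user callables can be mixed in the same list.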

Testing

New test files added:

  • pandas/tests/apply/test_frame_apply_numeric_only.py (28 tests)
  • pandas/tests/groupby/aggregate/test_aggregate_numeric_only.py (21 tests)

Test coverage includes:

  • Mixed dtypes (numeric + non-numeric columns)
  • All numeric columns
  • No numeric columns (edge case)
  • Various function types (string, numpy functions, mixed)
  • Different numbers of functions
  • Single vs multiple grouping columns
  • Parameter variations (numeric_only=True/False)
  • Column order preservation
  • Edge cases (NaNs, datetime columns, single row, large DataFrame)
  • as_index parameter for GroupBy
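
For readers who want a feel for the shape of these tests, a few of the listed cases might look like the sketch below. These are not the tests from the PR: _agg_numeric is a hypothetical stand-in for the patched behaviour so that the example runs on an unpatched pandas, and the test names are illustrative.

```python
import pandas as pd


def _agg_numeric(frame, funcs, numeric_only=False):
    # Hypothetical stand-in for the patched DataFrame.agg list path.
    if numeric_only:
        frame = frame.select_dtypes(include="number")
    return frame.agg(funcs)


def test_mixed_dtypes_preserves_column_order():
    # Numeric columns surround a string column; order must survive filtering.
    df = pd.DataFrame({"B": [0.5, 1.5], "C": ["x", "y"], "A": [1, 2]})
    result = _agg_numeric(df, ["min", "max"], numeric_only=True)
    assert list(result.columns) == ["B", "A"]


def test_all_numeric_matches_unfiltered():
    # With only numeric columns, filtering must be a no-op.
    df = pd.DataFrame({"A": [1, 2], "B": [0.5, 1.5]})
    expected = df.agg(["min", "max"])
    result = _agg_numeric(df, ["min", "max"], numeric_only=True)
    pd.testing.assert_frame_equal(result, expected)


def test_nan_values_are_skipped():
    # min and max skip NaN by default.
    df = pd.DataFrame({"A": [1.0, float("nan"), 3.0], "C": ["x", "y", "z"]})
    result = _agg_numeric(df, ["min", "max"], numeric_only=True)
    assert result.loc["min", "A"] == 1.0
    assert result.loc["max", "A"] == 3.0
```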

All tests pass locally:

pytest pandas/tests/apply/test_frame_apply_numeric_only.py -v  # 28 passed
pytest pandas/tests/groupby/aggregate/test_aggregate_numeric_only.py -v  # 21 passed

Backward Compatibility

Fully backward compatible

  • Uses existing numeric_only parameter
  • No API changes
  • Existing behavior preserved for non-list functions
  • Only affects previously broken/inconsistent list-of-functions case

cc

@rhshadrach - This implements the interception approach you suggested in #49352. Would appreciate your review!



@mroeschke mroeschke requested a review from rhshadrach October 23, 2025 16:03

Development

Successfully merging this pull request may close these issues.

BUG: inconsistent DataFrame.agg behavoir when passing as kwargs numeric_only=True