FutureWarning: DataFrameGroupBy.apply does not have a clear solution. #62513

@hunterhogan

Description

IDK how pandas will classify this issue.

df = df.sort_values(by=['oeisID', 'n', 'boundary'], ascending=[True, True, False])
def addColumnsGrowing(groupBy: pandas.DataFrame) -> pandas.DataFrame:
    groupBy['bucketsGrowing'] = groupBy['buckets'].diff().gt(0).fillna(True)
    groupBy['arcCodesGrowing'] = groupBy['arcCodes'].diff().gt(0).fillna(True)
    return groupBy
df = df.groupby(['oeisID', 'n'], group_keys=False).apply(addColumnsGrowing)

C:\Users\hunte\AppData\Local\Temp\ipykernel_25960\1489258679.py:6: FutureWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass include_groups=False to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
df = df.groupby(['oeisID', 'n'], group_keys=False).apply(addColumnsGrowing)

"... pass include_groups=False" to what method?

"... or explicitly select the grouping columns." Is that a method, a parameter, or a topic in the documentation?
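One reading of "explicitly select the grouping columns after groupby" is ordinary column selection on the GroupBy object, that is `df.groupby(keys)[columns]`, before calling `apply`. A minimal sketch under that reading follows. The sample data, the `growing` helper, and the `flags` and `out` names are made up for illustration and are not from the original notebook. Because only the value columns are selected, the function never sees `oeisID` or `n`, so the warning should have nothing to complain about.

```python
import pandas as pd

# Illustrative data only; the real DataFrame is not shown in the issue.
df = pd.DataFrame({
    "oeisID": ["A", "A", "A", "B", "B"],
    "n": [1, 1, 1, 2, 2],
    "buckets": [1, 3, 2, 5, 6],
    "arcCodes": [4, 4, 7, 9, 8],
})

keys = ["oeisID", "n"]
values = ["buckets", "arcCodes"]


def growing(group: pd.DataFrame) -> pd.DataFrame:
    """Flag rows whose value is larger than the previous row in the group."""
    # `group` holds only the columns selected after groupby (no grouping columns).
    return group.diff().gt(0).add_suffix("Growing")


# Column selection on the GroupBy object happens before apply.
flags = df.groupby(keys, group_keys=False)[values].apply(growing)

# The flags keep the original row index, so they can be joined straight back.
out = df.join(flags)
print(out)
```

With default NumPy-backed dtypes, `diff().gt(0)` already yields `False` for the first row of each group, so the `fillna(True)` in the original function appears to have no effect; the sketch leaves it out.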

df = df.sort_values(by=['oeisID', 'n', 'boundary'], ascending=[True, True, False])
def addColumnsGrowing(groupBy: pandas.DataFrame) -> pandas.DataFrame:
    groupBy['bucketsGrowing'] = groupBy['buckets'].diff().gt(0).fillna(True)
    groupBy['arcCodesGrowing'] = groupBy['arcCodes'].diff().gt(0).fillna(True)
    return groupBy
df = df.groupby(['oeisID', 'n'], group_keys=False, include_groups=False).apply(addColumnsGrowing)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[6], line 6
      4     groupBy['arcCodesGrowing'] = groupBy['arcCodes'].diff().gt(0).fillna(True)
      5     return groupBy
----> 6 df = df.groupby(['oeisID', 'n'], group_keys=False, include_groups=False).apply(addColumnsGrowing)

TypeError: DataFrame.groupby() got an unexpected keyword argument 'include_groups'

So include_groups is not a parameter of groupby.
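A quick way to see which callable accepts the keyword is to inspect the signatures directly. This is only a sketch: `accepts` is a hypothetical helper, it assumes `pandas.core.groupby.DataFrameGroupBy` is importable in the installed version, and the answer for `apply` depends on the pandas version (the keyword appears to have been added in 2.2).

```python
import inspect

import pandas as pd
from pandas.core.groupby import DataFrameGroupBy


def accepts(func, name: str) -> bool:
    """Return True if `name` is an explicit parameter of `func` (hypothetical helper)."""
    return name in inspect.signature(func).parameters


on_groupby = accepts(pd.DataFrame.groupby, "include_groups")
on_apply = accepts(DataFrameGroupBy.apply, "include_groups")

print(f"DataFrame.groupby accepts include_groups: {on_groupby}")
print(f"DataFrameGroupBy.apply accepts include_groups: {on_apply}")
```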

df = df.sort_values(by=['oeisID', 'n', 'boundary'], ascending=[True, True, False])
def addColumnsGrowing(groupBy: pandas.DataFrame) -> pandas.DataFrame:
    groupBy['bucketsGrowing'] = groupBy['buckets'].diff().gt(0).fillna(True)
    groupBy['arcCodesGrowing'] = groupBy['arcCodes'].diff().gt(0).fillna(True)
    return groupBy
df = df.groupby(['oeisID', 'n'], group_keys=False).apply(addColumnsGrowing, include_groups=False)

Passing it to apply doesn't look right either, because Pylance reports a diagnostic error:

[{
	"resource": "/c:/apps/mapFolding/mapFolding/reference/matrixMeandersAnalysis/buckets.ipynb",
	"owner": "pylance12",
	"code": {
		"value": "reportCallIssue",
		"target": {
			"$mid": 1,
			"path": "/microsoft/pylance-release/blob/main/docs/diagnostics/reportCallIssue.md",
			"scheme": "https",
			"authority": "github.com"
		}
	},
	"severity": 8,
	"message": "No parameter named \"include_groups\"",
	"source": "Pylance",
	"startLineNumber": 6,
	"startColumn": 77,
	"endLineNumber": 6,
	"endColumn": 91,
	"origin": "extHost1"
}]

But if I run the cell, it seems to work, and I don't get a FutureWarning.

Partial solution

As I have tried to express many times to the Python community: the original linter was called a "spell checker." The same CI/CD concepts and tools can easily be applied to the words-that-are-not-code, which would have many benefits, including fewer "Issues" from confused users.

Metadata

Labels: Apply (Apply, Aggregate, Transform, Map), Groupby
