Improve sorting on DataFrame #215

Open
@jordanmontt

Description

Currently, we don't support multi-column sorting. As an example, take this dataset: https://www.kaggle.com/datasets/prashant111/the-simpsons-dataset?resource=download

I have this DataFrame that I want to sort by season and by episode:
[Screenshot, 2023-04-11 at 11:31:12]

To sort it by season and then by episode, I need to sort by the episode first and by the season last:

df
    sortBy: #number_in_season;
    sortBy: #season

If I do it the other way around, sorting first by season and then by episode, it does not work.
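
The reversed order in the workaround is what one would expect if sortBy: is a stable sort: the last sort applied decides the primary key, and ties keep the order left by the earlier sort. Here is a minimal sketch of that effect on a plain Array of season -> episode associations rather than on a DataFrame. It assumes that sorting an Array in Pharo is stable and that the SortFunction protocol (ascending and the , operator) is available. The sample data is made up.

```smalltalk
| episodes twoPass chained |
"Hypothetical sample data: key = season, value = episode number in season"
episodes := { 2 -> 1. 1 -> 2. 2 -> 2. 1 -> 1 }.

"Two passes: secondary key first, primary key last"
twoPass := (episodes sorted: #value ascending) sorted: #key ascending.

"One pass with a chained sort function: primary key first"
chained := episodes sorted: #key ascending, #value ascending.

"Evaluates to true when the underlying sort is stable"
twoPass = chained
```

Whether DataFrame's sortBy: gives the same guarantee is an assumption here, which is one more reason to prefer an explicit multi-column API.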

This shows that we need a better API and mechanism for sorting a DataFrame. Some options are:

With ChainedSortFunction:

df sortBy: #season ascending, #number_in_season ascending
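
For reference, this is roughly how a chained sort function already behaves on ordinary Pharo collections, including a different direction per key. It is only a sketch on an Array of associations (key = season, value = episode, made-up data). How a DataFrame would map the symbols to column names is not shown and would be part of the new API.

```smalltalk
| episodes result |
"Hypothetical sample data: key = season, value = episode number in season"
episodes := { 2 -> 1. 1 -> 2. 2 -> 2. 1 -> 1 }.

"Season ascending, then episode descending within each season"
result := episodes sorted: #key ascending, #value descending.

"result is now { 1 -> 2. 1 -> 1. 2 -> 2. 2 -> 1 }"
result
```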

Or something more pandas-like: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html

df sortValuesDescending: #( #season #number_in_season ).
df sortValuesAscending: #( #season #number_in_season ).

Also, we need to add the corresponding sorted methods that return a new DataFrame instead of sorting the receiver in place.
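
The naming could follow the convention Pharo collections already use, where sort: changes the receiver and sorted: answers a sorted copy. A small sketch on an Array with made-up numbers (the DataFrame selectors themselves are still to be decided):

```smalltalk
| numbers copy |
numbers := { 3. 1. 2 }.

"sorted: answers a new collection and leaves the receiver untouched"
copy := numbers sorted: [ :a :b | a <= b ].

"sort: reorders the receiver in place"
numbers sort: [ :a :b | a >= b ].

"copy is { 1. 2. 3 } and numbers is now { 3. 2. 1 }"
{ copy. numbers }
```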