Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT-#4605: Handle small and empty dataframes #4606

Closed
wants to merge 5 commits into from

Conversation

naren-ponder
Copy link
Collaborator

@naren-ponder naren-ponder commented Jun 27, 2022

Signed-off-by: Naren Krishna [email protected]

What do these changes do?

  • commit message follows format outlined here
  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves Handle Empty/Small Data DataFrames as a separate case #4605 ?
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date
  • added (Issue Number: PR title (PR Number)) and github username to release notes for next major release

@naren-ponder naren-ponder requested a review from a team as a code owner June 27, 2022 15:52
@naren-ponder naren-ponder marked this pull request as draft June 27, 2022 15:52
Copy link
Collaborator

@devin-petersohn devin-petersohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@naren-ponder Would it be possible to transition this to using the query compiler approach? At the moment, this looks like the empty_dataframe.py file is a lot of shared code with the dataframe.py file, except for all the default_to_pandas. It might be easier to default to pandas at the query compiler since the base object already does that automatically.

@Garra1980
Copy link
Collaborator

Great! Looking forward to see these changes in

@lgtm-com
Copy link

lgtm-com bot commented Jun 30, 2022

This pull request introduces 11 alerts when merging 90ca31181c9c76b0ec27a3fc551356f9df9e4870 into 3982306 - view on LGTM.com

new alerts:

  • 10 for Unused import
  • 1 for `__init__` method returns a value

@lgtm-com
Copy link

lgtm-com bot commented Jul 8, 2022

This pull request introduces 11 alerts when merging fc42bd4d1a1024b6b6b6fa2503385b6a313dcbd4 into eddfda4 - view on LGTM.com

new alerts:

  • 11 for Unused import

@Garra1980
Copy link
Collaborator

@naren-ponder Any plans here?

@naren-ponder naren-ponder marked this pull request as ready for review July 26, 2022 18:21
Signed-off-by: Naren Krishna <[email protected]>
Signed-off-by: Naren Krishna <[email protected]>
Signed-off-by: Naren Krishna <[email protected]>
@codecov
Copy link

codecov bot commented Jul 26, 2022

Codecov Report

Merging #4606 (e4a2856) into master (f5f4c16) will increase coverage by 16.00%.
The diff coverage is 28.90%.

@@             Coverage Diff             @@
##           master    #4606       +/-   ##
===========================================
+ Coverage   71.43%   87.43%   +16.00%     
===========================================
  Files         259      261        +2     
  Lines       19198    19609      +411     
===========================================
+ Hits        13714    17146     +3432     
+ Misses       5484     2463     -3021     
Impacted Files Coverage Δ
...ore/storage_formats/pandas/small_query_compiler.py 28.00% <28.00%> (ø)
modin/pandas/dataframe.py 91.69% <66.66%> (+0.13%) ⬆️
...in/core/dataframe/algebra/default2pandas/binary.py 36.84% <0.00%> (-57.90%) ⬇️
.../core/dataframe/algebra/default2pandas/resample.py 50.00% <0.00%> (-44.45%) ⬇️
...n/core/dataframe/algebra/default2pandas/rolling.py 56.25% <0.00%> (-37.50%) ⬇️
modin/core/storage_formats/base/query_compiler.py 68.86% <0.00%> (-30.36%) ⬇️
...n/core/dataframe/algebra/default2pandas/groupby.py 71.83% <0.00%> (-24.65%) ⬇️
modin/core/dataframe/algebra/default2pandas/cat.py 83.33% <0.00%> (-16.67%) ⬇️
modin/core/dataframe/algebra/default2pandas/str.py 83.33% <0.00%> (-16.67%) ⬇️
...n/core/dataframe/algebra/default2pandas/default.py 87.93% <0.00%> (-8.63%) ⬇️
... and 76 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@naren-ponder naren-ponder changed the title FEAT-#4605: Basic approach layout FEAT-#4605: Handle small and empty dataframes Jul 27, 2022
@pyrito
Copy link
Collaborator

pyrito commented Jul 27, 2022

@naren-ponder were you planning to add any tests to this PR?

@Garra1980
Copy link
Collaborator

Any progress here?

@pyrito
Copy link
Collaborator

pyrito commented Sep 12, 2022

@Garra1980 yes, @Billy2551 is going to be taking this on. It'll probably be in a new PR, but we will keep this one up for now.

@YarShev
Copy link
Collaborator

YarShev commented Dec 5, 2023

@naren-ponder, do you have a plan to continue this?

@arunjose696
Copy link
Collaborator

Closing as #5113 is based on ideas from the PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle Empty/Small Data DataFrames as a separate case
6 participants